Natural - General natural language facilities for Node.
PyText - Natural language modeling framework based on PyTorch.
FlashText - Extract keywords from sentences or replace keywords in sentences.
LASER Language-Agnostic SEntence Representations - Library to calculate and use multilingual sentence embeddings.
StanfordNLP - Python NLP Library for Many Human Languages.
nlp-tutorial - Tutorial for those studying NLP (Natural Language Processing) using TensorFlow and PyTorch.
gpt-2 - Code for the paper "Language Models are Unsupervised Multitask Learners".
Lingvo - Framework for building neural networks in TensorFlow, particularly sequence models.
Fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
ParlAI - Framework for training and evaluating AI models on a variety of openly available dialogue datasets.
Olivia - Your new best friend built with an artificial neural network.
Project Alias - Open-source parasite to train custom wake-up names for smart home devices while disturbing their built-in microphone.
Transfer NLP library - Framework built on top of PyTorch to promote reproducible experimentation and Transfer Learning in NLP.
FARM - Fast & easy transfer learning for NLP. Harvesting language models for the industry.
Flair - Very simple framework for state-of-the-art NLP. Developed by Zalando Research.
Unsupervised Data Augmentation - Semi-supervised learning method which achieves state-of-the-art results on a wide variety of language and vision tasks.
Rasa - Open source machine learning framework to automate text- and voice-based conversations.
T5 - Text-To-Text Transfer Transformer.
NLP Library - Curated collection of papers for the NLP practitioner.
spacy-transformers - spaCy pipelines for pre-trained BERT, XLNet and GPT-2.
GloVe - Global Vectors for Word Representation.
Botpress - Open-source Virtual Assistant platform.
VizSeq - Visual Analysis Toolkit for Text Generation Tasks.
Introduction to Natural Language Processing book - Survey of computational methods for understanding, generating, and manipulating human language, which offers a synthesis of classical representations and algorithms with contemporary machine learning techniques.
SentenceRepresentation - Code accompanying the paper 'Learning Sentence Representations from Unlabelled Data' by Felix Hill, KyungHyun Cho and Anna Korhonen (2016).
Megatron LM - Ongoing research training transformer language models at scale, including: BERT & GPT-2.
XLNet - New unsupervised language representation learning method based on a novel generalized permutation language modeling objective.
ALBERT - Lite BERT for Self-supervised Learning of Language Representations.
BERT - TensorFlow code and pre-trained models for BERT.
sticker - Sequence labeler that uses either recurrent neural networks, transformers, or dilated convolution networks.
sticker-transformers - Pretrained transformer models for sticker.
pke - Python Keyphrase Extraction module.
Interactive Attention Visualization - Small example of an interactive visualization for attention values as used by transformer language models like GPT2 and BERT.
GluonNLP - Toolkit that enables easy text preprocessing, dataset loading and neural model building to help you speed up your NLP research.
Finetune - Scikit-learn style model finetuning for NLP.
Awesome NLP Paper Discussions - Papers & presentations from Hugging Face's weekly science day.
TTS - Deep learning for Text to Speech.
gpt-2-simple - Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts.
BERTScore - BERT score for text generation.
NLP Index - Collection of NLP resources.
Inferbeddings - Injecting Background Knowledge in Neural Models via Adversarial Set Regularisation.
nlp - Lightweight and extensible library to easily share and access datasets and evaluation metrics for NLP.
vtext - NLP in Rust with Python bindings.
Semantic Machines - Solving conversational artificial intelligence. Part of Microsoft.
Language Interpretability Tool (LIT) - Interactively analyze NLP models for model understanding in an extensible and framework agnostic interface.
BERTopic - Topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
NLP Pandect - Comprehensive reference for all topics related to Natural Language Processing.
Informers - State-of-the-art natural language processing for Ruby.
Sentence-BERT for spaCy - Wraps sentence-transformers (also known as sentence-BERT) directly in spaCy.
Lingua Franca - Mycroft's multilingual text parsing and formatting library.
Simple Transformers - Based on the Transformers library by HuggingFace. Lets you quickly train and evaluate Transformer models.
Deep Bidirectional Transformers for Language Understanding (2020) - Explains a legendary paper, BERT. (HN)
EasyTransfer - Designed to make the development of transfer learning in NLP applications easier.
LambdaBERT - Transformers-style implementation of BERT using LambdaNetworks instead of self-attention.
DialoGPT - State-of-the-Art Large-scale Pretrained Response Generation Model.
LAMA: LAnguage Model Analysis - Probe for analyzing the factual and commonsense knowledge contained in pretrained language models.
awesome-2vec - Curated list of 2vec-type embedding models.
The Pile - Large, diverse, open source language modelling data set that consists of many smaller datasets combined together.
Bort - Companion code for the paper "Optimal Subarchitecture Extraction for BERT."
Text Synth - Text completion using the GPT-2 language model.
Contextualized Topic Models - Family of topic models that use pre-trained representations of language (e.g., BERT) to support topic modeling.
Language Style Transfer - Code for Style Transfer from Non-Parallel Text by Cross-Alignment paper.
NLU - Power of Spark NLP, the Simplicity of Python. 1 line for hundreds of NLP models and algorithms.
duoBERT - Multi-stage passage ranking: monoBERT + duoBERT.
SMAC3 - Sequential Model-based Algorithm Configuration.
Semantic Experiences by Google - Experiments in understanding language.
Long-Range Arena - Systematic evaluation of efficient transformer models.
PaddleHub - Awesome pre-trained models toolkit based on PaddlePaddle.
FastSeq - Provides efficient implementation of popular sequence models (e.g. Bart, ProphetNet) for text generation, summarization, translation tasks etc.
FastFormers - Provides a set of recipes and methods to achieve highly efficient inference of Transformer models for Natural Language Understanding (NLU).
Adversarial NLI - Adversarial Natural Language Inference Benchmark.
Text Classification Models - All kinds of text classification models and more with deep learning.
huggingface_hub - Client library to download and publish models and other files on the huggingface.co hub.
OpenNRE - Open-Source Package for Neural Relation Extraction (NRE).
gpt-scrolls - Collaborative collection of open-source safe GPT-3 prompts that work well.
SLING - A natural language frame semantics parser - Built to learn to read and understand Wikipedia articles in many languages for the purpose of knowledge base completion.
VecMap - Framework to learn cross-lingual word embedding mappings.
GPT3 List - List of things that people are claiming are enabled by GPT3.
DeBERTa - Decoding-enhanced BERT with Disentangled Attention.
Robustness Gym - Python evaluation toolkit for natural language processing.
Deep Daze - Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network).
NLP Cloud - Serve spaCy pre-trained models, and your own custom models, through a RESTful API.
Reranker - Build Text Rerankers with Deep Language Models.
rust-bert - Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...).
rust-tokenizers - Offers high-performance tokenizers for modern language models.
Shifterator - Interpretable data visualizations for understanding how texts differ at the word level.
minnn - Exercise in developing a minimalist neural network toolkit for NLP.
retext - Natural language processor powered by plugins, part of the unified collective.
CLIP Playground - Try OpenAI's CLIP model in your browser.
GPT-3 Demo - GPT-3 Examples, Demos, Showcase, and NLP Use-cases.
Big Sleep - Simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN.
Beyond the Imitation Game Benchmark (BIG-bench) - Collaborative benchmark intended to probe large language models, and extrapolate their future capabilities.
AutoNLP - Automatic way to train, evaluate and deploy state-of-the-art NLP models for different tasks.
DeText - Deep Neural Text Understanding Framework for Ranking and Classification Tasks.
Natural Language YouTube Search - Search inside YouTube videos using natural language.
Accelerate - Simple way to train and use NLP models with multi-GPU, TPU, mixed-precision.
GENRE (Generative ENtity REtrieval) - Uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on fine-tuned BART architecture.
Teachable NLP - GPT-2 Training as a Service.
DensePhrases - Provides answers to your natural language questions from the entire Wikipedia in real-time.
Podium - Framework agnostic Python NLP library for data loading and preprocessing.
TextFlint - Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing.
nlpaug - Data augmentation for NLP.
Top2Vec - Learns jointly embedded topic, document and word vectors.
NLPretext - All the go-to functions you need to handle NLP use-cases.
adapter-transformers - Friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models.
TextAttack - Generating adversarial examples for NLP models.
GPT-NeoX - Implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library.
Transformers Interpret - Model explainability tool designed to work exclusively with the transformers package.
UniLM - Pre-trained models for natural language understanding (NLU) and generation (NLG) tasks.
AutoNLP - Faster and easier training and deployments of SOTA NLP models.
TAble PArSing (TAPAS) - End-to-end neural table-text understanding models.
Haystack - End-to-end Python framework for building natural language search interfaces to data.
PLMpapers - Must-read Papers on pre-trained language models.
Evaluation Harness for Large Language Models - Framework for few-shot evaluation of autoregressive language models.
MLP GPT - Jax - GPT, made only of MLPs, in Jax.
SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation.
PromptPapers - Must-read papers on prompt-based tuning for pre-trained language models.
Obsei - Automation tool for text analysis needs.
DALL·E Mini - Generate images from a text prompt.
Jury - Evaluation for Natural Language Generation.
Rubrix - Free and open-source tool to explore, label, and monitor data for NLP projects.
OpenCLIP - Open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training).
Spark NLP Workshop - Notebooks and code showcasing how to use Spark NLP in Python and Scala.
ConceptNet Numberbatch - Set of semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings.
NL-Augmenter - Collaborative Repository of Natural Language Transformations.
clip-retrieval - Easily compute CLIP embeddings and build a CLIP retrieval system with them.
NVIDIA NeMo - Toolkit for conversational AI.
BEIR - Heterogeneous benchmark containing diverse IR tasks. It also provides a common and easy framework for evaluation of your NLP-based retrieval models within the benchmark.
UER-py - Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo.
ExplainaBoard - Explainable Leaderboard for NLP.
Fast-BERT - Super easy library for BERT based NLP models.
Quantum Stat - Your NLP Model Training Platform.
NERDA - Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks.