spaCy - Industrial-strength Natural Language Processing (NLP) with Python and Cython.
NLP progress - Track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art across the most common NLP tasks and their corresponding datasets. (Web)
Natural - General natural language facilities for Node.
PyText - Natural language modeling framework based on PyTorch.
FlashText - Extract keywords from sentences or replace keywords in sentences.
LASER Language-Agnostic SEntence Representations - Library to calculate and use multilingual sentence embeddings.
StanfordNLP - Python NLP Library for Many Human Languages.
nlp-tutorial - Tutorial for those studying NLP (Natural Language Processing) using TensorFlow and PyTorch.
gpt-2 - Code for the paper "Language Models are Unsupervised Multitask Learners".
Lingvo - Framework for building neural networks in TensorFlow, particularly sequence models.
Fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
ParlAI - Framework for training and evaluating AI models on a variety of openly available dialogue datasets.
Olivia - Your new best friend built with an artificial neural network.
Project Alias - Open-source parasite to train custom wake-up names for smart home devices while disturbing their built-in microphone.
Transfer NLP library - Framework built on top of PyTorch to promote reproducible experimentation and Transfer Learning in NLP.
FARM - Fast & easy transfer learning for NLP. Harvesting language models for the industry.
Transformers - State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. (Web)
Flair - Very simple framework for state-of-the-art NLP. Developed by Zalando Research.
Unsupervised Data Augmentation - Semi-supervised learning method which achieves state-of-the-art results on a wide variety of language and vision tasks.
Rasa - Open source machine learning framework to automate text- and voice-based conversations.
T5 - Text-To-Text Transfer Transformer.
NLP Library - Curated collection of papers for the NLP practitioner.
spacy-transformers - spaCy pipelines for pre-trained BERT, XLNet and GPT-2.
AllenNLP - Open-source NLP research library, built on PyTorch. (Announcing AllenNLP 1.0)
GloVe - Global Vectors for Word Representation.
Botpress - Open-source Virtual Assistant platform.
VizSeq - Visual Analysis Toolkit for Text Generation Tasks.
Introduction to Natural Language Processing book - Survey of computational methods for understanding, generating, and manipulating human language, which offers a synthesis of classical representations and algorithms with contemporary machine learning techniques.
Tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production. (Article)
SentenceRepresentation - Code accompanying the paper 'Learning Sentence Representations from Unlabelled Data' by Felix Hill, KyungHyun Cho and Anna Korhonen (2016).
Megatron LM - Ongoing research training transformer language models at scale, including: BERT & GPT-2.
XLNet - New unsupervised language representation learning method based on a novel generalized permutation language modeling objective.
ALBERT - Lite BERT for Self-supervised Learning of Language Representations.
BERT - TensorFlow code and pre-trained models for BERT.
sticker - Sequence labeler that uses either recurrent neural networks, transformers, or dilated convolution networks.
sticker-transformers - Pretrained transformer models for sticker.
pke - Python Keyphrase Extraction module.
Interactive Attention Visualization - Small example of an interactive visualization for attention values as used by transformer language models like GPT2 and BERT.
GluonNLP - Toolkit that enables easy text preprocessing, dataset loading and neural model building to help you speed up your NLP research.
Finetune - Scikit-learn style model finetuning for NLP.
Natural Language Toolkit (NLTK) - Suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. (Web) (Book)
NLP 100 Exercise - Bootcamp designed for learning skills for programming, data analysis, and research activities. (Code)
Awesome NLP Paper Discussions - Papers & presentations from Hugging Face's weekly science day.
TTS - Deep learning for Text to Speech.
gpt-2-simple - Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts.
BERTScore - BERT score for text generation.
This Word Does Not Exist - Allows people to train a variant of GPT-2 that makes up words, definitions and examples from scratch. (Code) (HN)
Inferbeddings - Injecting Background Knowledge in Neural Models via Adversarial Set Regularisation.
nlp - Lightweight and extensible library to easily share and access datasets and evaluation metrics for NLP.
vtext - NLP in Rust with Python bindings.
Gwern on GPT-3 (HN)
Semantic Machines - Solving conversational artificial intelligence. Part of Microsoft.
GPT3 Examples (HN)
GPT-3 Explorer - Power tool for experimenting with GPT-3. (Code)
Project Insight - NLP as a Service. (Forum post)
Language Interpretability Tool (LIT) - Interactively analyze NLP models for model understanding in an extensible and framework agnostic interface.
Booste Pre Trained Models - Free-to-use GPT-2 API. (HN)
BERTopic - Topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
NLP Pandect - Comprehensive reference for all topics related to Natural Language Processing.
spacy-streamlit - spaCy building blocks for Streamlit apps. (Tweet)
Informers - State-of-the-art natural language processing for Ruby.
Sentence-BERT for spaCy - Wraps sentence-transformers (also known as sentence-BERT) directly in spaCy.
Lingua Franca - Mycroft's multilingual text parsing and formatting library.
Simple Transformers - Based on the Transformers library by HuggingFace. Lets you quickly train and evaluate Transformer models.
Deep Bidirectional Transformers for Language Understanding (2020) - Explains a legendary paper, BERT. (HN)
EasyTransfer - Designed to make the development of transfer learning in NLP applications easier.
LambdaBERT - Transformers-style implementation of BERT using LambdaNetworks instead of self-attention.
DialoGPT - State-of-the-Art Large-scale Pretrained Response Generation Model.
LAMA: LAnguage Model Analysis - Probe for analyzing the factual and commonsense knowledge contained in pretrained language models.
awesome-2vec - Curated list of 2vec-type embedding models.
The Pile - Large, diverse, open source language modelling data set that consists of many smaller datasets combined together.
Bort - Companion code for the paper "Optimal Subarchitecture Extraction for BERT."
GPT Neo - Implementation of model parallel GPT2 & GPT3-like models, with the ability to scale up to full GPT3 sizes (and possibly more!), using the mesh-tensorflow library.
Text Synth - Text completion using the GPT-2 language model.
Contextualized Topic Models - Family of topic models that use pre-trained representations of language (e.g., BERT) to support topic modeling.
Language Style Transfer - Code for Style Transfer from Non-Parallel Text by Cross-Alignment paper.
NLU - Power of Spark NLP, the Simplicity of Python. 1 line for hundreds of NLP models and algorithms.
duoBERT - Multi-stage passage ranking: monoBERT + duoBERT.
SMAC3 - Sequential Model-based Algorithm Configuration.
Semantic Experiences by Google - Experiments in understanding language.
Long-Range Arena - Systematic evaluation of efficient transformer models.
PaddleHub - Awesome pre-trained models toolkit based on PaddlePaddle.
FastSeq - Provides efficient implementation of popular sequence models (e.g. Bart, ProphetNet) for text generation, summarization, translation tasks etc.
FastFormers - Provides a set of recipes and methods to achieve highly efficient inference of Transformer models for Natural Language Understanding (NLU).
Adversarial NLI - Adversarial Natural Language Inference Benchmark.
Text Classification Models - All kinds of text classification models and more with deep learning.
huggingface_hub - Client library to download and publish models and other files on the huggingface.co hub.
OpenNRE - Open-Source Package for Neural Relation Extraction (NRE).
gpt-scrolls - Collaborative collection of open-source safe GPT-3 prompts that work well.
SLING - A natural language frame semantics parser - Built to learn to read and understand Wikipedia articles in many languages for the purpose of knowledge base completion.
VecMap - Framework to learn cross-lingual word embedding mappings.
GPT3 List - List of things that people are claiming are enabled by GPT3.
DeBERTa - Decoding-enhanced BERT with Disentangled Attention.
Robustness Gym - Python evaluation toolkit for natural language processing.
Deep Daze - Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network).
NLP Cloud - Serve spaCy pre-trained models, and your own custom models, through a RESTful API.
Reranker - Build Text Rerankers with Deep Language Models.
rust-bert - Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...).
rust-tokenizers - Offers high-performance tokenizers for modern language models.
Shifterator - Interpretable data visualizations for understanding how texts differ at the word level.