Project DeepSpeech - Open source Speech-To-Text engine, using a model trained by machine learning techniques, based on Baidu's Deep Speech research paper.
wav2letter++ - Fast, open source speech processing toolkit from the Speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition.
Kaldi - Speech Recognition Toolkit.
Real-Time Voice Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time.
Kaldi Active Grammar - Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time.
SpecAugment with PyTorch - PyTorch implementation of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition.
Dragonfly - Speech recognition framework for Python that makes it convenient to create custom commands to use with speech recognition software.
Gentle - Robust yet lenient forced-aligner built on Kaldi. A tool for aligning speech with text.
Porcupine - On-device wake word detection powered by deep learning.
Eesen - End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding.
Silero Models - Pre-trained STT models and benchmarks made embarrassingly simple.
Wavenet For Speech Denoising - Neural network for end-to-end speech denoising, as described in: "A Wavenet For Speech Denoising".
Vosk - Speech recognition toolkit with state-of-the-art accuracy and low latency in Rust.
Voicegain - Speech-to-text platform and APIs for speech recognition.
Speaker Diarization - Process to answer the question of 'who spoke when?' in an audio file.
SpeechRecognition - Local automatic speech recognition project based on Kaldi and ALSA.
Athena - Open-source implementation of a sequence-to-sequence-based speech processing engine.
Cheetah - On-device streaming speech-to-text engine powered by deep learning.
WaveRNN - PyTorch implementation of DeepMind's WaveRNN model from Efficient Neural Audio Synthesis.
Conformer - PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition.
libfvad - Voice activity detection (VAD) library, based on WebRTC's VAD engine.
ASR with PyTorch - Experimental code for speech recognition using PyTorch and Kaldi.
Parrot.PY - Computer interaction using audio and speech recognition.
Vosk API - Offline open source speech recognition toolkit.
Lyra - Very Low-Bitrate Codec for Speech Compression.
lasr - PyTorch Lightning implementation of Automatic Speech Recognition.
TTS - Library for advanced Text-to-Speech generation.
Common Voice - Mozilla's initiative to help teach machines how real people speak.