Parakeet
Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on the PaddlePaddle Fluid dynamic graph and includes many influential TTS models proposed by Baidu Research and other research groups.
In particular, it features the latest [WaveFlow](https://arxiv.org/abs/1912.01219) model proposed by Baidu Research. WaveFlow is a small-footprint generative flow for raw audio that is trained directly with maximum likelihood. It generates high-fidelity speech on par with WaveNet while synthesizing several orders of magnitude faster, because it requires only a few sequential steps to generate very long waveforms. Furthermore, it significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has only 5.9M parameters, which is 15 times smaller than WaveGlow. It can generate 22.05 kHz high-fidelity audio around 40 times faster than real-time on a V100 GPU without engineered inference kernels.
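To put the speed figure in concrete terms, the short calculation below converts "around 40 times faster than real-time" at 22.05 kHz into samples per second. The 10-second clip length is an arbitrary value chosen for illustration, not a benchmark from the paper.

```python
SAMPLE_RATE_HZ = 22_050   # 22.05 kHz, as stated above
REAL_TIME_FACTOR = 40     # "around 40 times faster than real-time"

# Samples that must be produced per wall-clock second to reach that factor.
samples_per_second = SAMPLE_RATE_HZ * REAL_TIME_FACTOR

# Hypothetical clip length, used only to illustrate the ratio.
clip_seconds = 10.0
synthesis_seconds = clip_seconds / REAL_TIME_FACTOR

print(f"{samples_per_second:,} samples/s")
print(f"{clip_seconds:.0f} s of audio in about {synthesis_seconds:.2f} s")
```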
Setup
Make sure the library libsndfile1 is installed, e.g., on Ubuntu:
sudo apt-get install libsndfile1
Install PaddlePaddle
See the PaddlePaddle installation guide for more details. This repo requires paddlepaddle 1.7 or above.
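One way to confirm the version requirement after installing is sketched below. The helper `meets_minimum` is a hypothetical name introduced here, not part of Parakeet or PaddlePaddle, and it only compares the leading `major.minor` part of the version string.

```python
import re

MIN_VERSION = (1, 7)  # "paddlepaddle 1.7 or above", as stated above


def meets_minimum(version, minimum=MIN_VERSION):
    """Return True if a 'major.minor[...]' version string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# Report the installed version, if any, without failing when paddle is absent.
try:
    import paddle
except ImportError:
    print("paddlepaddle is not installed")
else:
    status = "OK" if meets_minimum(paddle.__version__) else "too old"
    print(f"paddlepaddle {paddle.__version__}: {status}")
```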
Install Parakeet
# git clone this repo first
cd Parakeet
pip install -e .
Install CMUdict for nltk
CMUdict from nltk is used to transform text into phonemes.
import nltk
nltk.download("punkt")
nltk.download("cmudict")
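As a rough illustration of how a CMUdict lookup turns words into phonemes, the sketch below maps each word to its first listed pronunciation. This is not Parakeet's actual text frontend. The function `text_to_phonemes` and the small `toy_dict` are assumptions made for the example, and `toy_dict` only mimics the shape returned by `nltk.corpus.cmudict.dict()` so that the code runs without the downloaded corpus.

```python
def text_to_phonemes(words, pron_dict):
    """Map each word to its first listed pronunciation, or None if unknown."""
    phonemes = []
    for word in words:
        pronunciations = pron_dict.get(word.lower())
        phonemes.append(pronunciations[0] if pronunciations else None)
    return phonemes


# Toy stand-in with the same shape as nltk.corpus.cmudict.dict():
# word -> list of pronunciations, each a list of ARPAbet symbols.
# With nltk and the corpus installed, one could use instead:
#     from nltk.corpus import cmudict
#     pron_dict = cmudict.dict()
toy_dict = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "world": [["W", "ER1", "L", "D"]],
}

# "parakeet" is absent from the toy dictionary, so it maps to None.
result = text_to_phonemes(["Hello", "world", "parakeet"], toy_dict)
print(result)
```

Words missing from the dictionary come back as `None` here; a real frontend would need some fallback for them, such as spelling the word out as characters.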
Related Research
- Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
- Neural Speech Synthesis with Transformer Network
- FastSpeech: Fast, Robust and Controllable Text to Speech
- WaveFlow: A Compact Flow-based Model for Raw Audio
Examples
- Train a DeepVoice3 model on the LJSpeech dataset
- Train a TransformerTTS model on the LJSpeech dataset
- Train a FastSpeech model on the LJSpeech dataset
- Train a WaveFlow model on the LJSpeech dataset
Copyright and License
Parakeet is provided under the Apache-2.0 license.