# Parakeet

Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on PaddlePaddle Fluid dynamic graph and includes many influential TTS models proposed by [Baidu Research](http://research.baidu.com) and other research groups.  

In particular, it features the latest [WaveFlow](https://arxiv.org/abs/1912.01219) model proposed by Baidu Research. WaveFlow is a small-footprint generative flow for raw audio that is trained directly with maximum likelihood. It generates high-fidelity speech as WaveNet does, while synthesizing several orders of magnitude faster, since it requires only a few sequential steps to generate very long waveforms. Furthermore, it significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has only 5.9M parameters, which is 15 times smaller than WaveGlow. It can generate 22.05 kHz high-fidelity audio around 40 times faster than real time on a V100 GPU without engineered inference kernels.

<div align="center">
  <img src="images/logo.png" width=450 /> <br>
</div>

### Setup

Make sure the `libsndfile1` library is installed. For example, on Ubuntu:

```bash
sudo apt-get install libsndfile1
```

### Install PaddlePaddle

See [install](https://www.paddlepaddle.org.cn/install/quick) for more details. This repo requires paddlepaddle 1.7 or above.
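The version requirement above can be checked before installing Parakeet. The following is a minimal sketch, not part of Parakeet: `meets_minimum` is a hypothetical helper that compares a dotted version string against the 1.7 minimum stated above.

```python
import re


def meets_minimum(version, minimum=(1, 7)):
    """Return True if a dotted version string is at least `minimum`.

    Hypothetical helper for illustration; it is not part of Parakeet
    or PaddlePaddle. Only the leading digits of each component are
    compared, so a suffix such as "rc1" is ignored.
    """
    parts = []
    for piece in version.split(".")[:len(minimum)]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as "2" so the tuples have equal length.
    while len(parts) < len(minimum):
        parts.append(0)
    return tuple(parts) >= tuple(minimum)
```

With PaddlePaddle installed, `meets_minimum(paddle.__version__)` should return `True` for a release that satisfies the requirement. Builds that report a non-release version string may need to be checked by hand.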

### Install Parakeet

```bash
# git clone this repo first
cd Parakeet
pip install -e .
```

### Install CMUdict for nltk

The CMUdict corpus from nltk is used to convert text into phonemes. Download it together with the `punkt` tokenizer data:

```python
import nltk
nltk.download("punkt")
nltk.download("cmudict")
```
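To illustrate what the text-to-phoneme step looks like, here is a minimal sketch. It is not Parakeet's actual text frontend: `to_phonemes` is a hypothetical helper, and it assumes a pronunciation table shaped like the one returned by `nltk.corpus.cmudict.dict()`, which maps a lowercase word to a list of pronunciations, each a list of ARPAbet symbols.

```python
def to_phonemes(words, pronunciations):
    """Replace each word with its first listed pronunciation.

    Hypothetical helper for illustration only. `pronunciations` maps a
    lowercase word to a list of pronunciations, each a list of phoneme
    symbols. Words missing from the table are kept unchanged.
    """
    result = []
    for word in words:
        variants = pronunciations.get(word.lower())
        if variants:
            # Use the first pronunciation when several are listed.
            result.extend(variants[0])
        else:
            result.append(word)
    return result


# A tiny hand-written table in the same shape as CMUdict (assumed data).
toy_table = {
    "hello": [["HH", "AH0", "L", "OW1"]],
    "world": [["W", "ER1", "L", "D"]],
}
print(to_phonemes(["Hello", "world"], toy_table))
```

Once the downloads above have completed, the same function can be called with `nltk.corpus.cmudict.dict()` as the table and `nltk.word_tokenize(text)` as the word list.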


## Related Research

- [Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654)
- [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895)
- [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263)
- [WaveFlow: A Compact Flow-based Model for Raw Audio](https://arxiv.org/abs/1912.01219)

## Examples

- [Train a DeepVoice3 model with the LJSpeech dataset](./examples/deepvoice3)
- [Train a TransformerTTS model with the LJSpeech dataset](./examples/transformer_tts)
- [Train a FastSpeech model with the LJSpeech dataset](./examples/fastspeech)
- [Train a WaveFlow model with the LJSpeech dataset](./examples/waveflow)

## Copyright and License

Parakeet is provided under the [Apache-2.0 license](LICENSE).