Merge branch 'add_readme' into 'master'

Add readme

See merge request !14
liuyibing01 2020-02-17 20:32:21 +08:00
commit 5a325e2343
8 changed files with 269 additions and 11 deletions

View File

@ -2,3 +2,119 @@
Paddle fluid implementation of FastSpeech, a feed-forward network based on Transformer. The implementation is based on [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263).
We implement the FastSpeech model in paddle fluid with dynamic graph, which makes it convenient to build flexible network architectures.
## Installation
### Install paddlepaddle
This implementation requires the latest develop version of paddlepaddle. You can either download the compiled package or build paddle from source.
1. Install the compiled package, via pip, conda or docker. See [**Installation Manuals**](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/install/index_en.html) for more details.
2. Build paddlepaddle from source. See [**Compile From Source Code**](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/install/compile/fromsource_en.html) for more details. Note that if you want to enable data parallel training for multiple GPUs, you should set `-DWITH_DISTRIBUTE=ON` with cmake.
### Install parakeet
You can choose to install via pypi or clone the repository and install manually.
1. Install via pypi.
```bash
pip install parakeet
```
2. Install manually.
```bash
git clone <url>
cd Parakeet/
pip install -e .
```
### Download cmudict for nltk
You also need to download cmudict for nltk, because text is converted into phonemes with `cmudict`.
```python
import nltk
nltk.download("punkt")
nltk.download("cmudict")
```
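As a rough illustration of the phoneme conversion, the sketch below looks words up in a pronunciation dictionary shaped like the one returned by `nltk.corpus.cmudict.dict()` (each word maps to a list of pronunciations, each a list of ARPAbet symbols). The helper name `text_to_phonemes` and the small `toy_dict` are made up for this example, and this is not necessarily how Parakeet's own text frontend works.

```python
def text_to_phonemes(text, pron_dict):
    """Convert text to a flat phoneme list using a word -> pronunciations mapping.

    The first listed pronunciation is used; words missing from the
    dictionary are passed through unchanged.
    """
    phonemes = []
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'")
        if not word:
            continue
        pronunciations = pron_dict.get(word)
        if pronunciations:
            phonemes.extend(pronunciations[0])
        else:
            phonemes.append(word)  # out-of-vocabulary word, kept as-is
    return phonemes


# Hypothetical stand-in with the same shape as nltk.corpus.cmudict.dict().
toy_dict = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "world": [["W", "ER1", "L", "D"]],
}

result = text_to_phonemes("Hello, world!", toy_dict)
print(result)
```

Once the corpus has been downloaded, `cmudict.dict()` could be passed in place of `toy_dict`.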
If you have completed all of the installation steps above but still get the following error at runtime:
``` OSError: sndfile library not found ```
you need to install ```libsndfile``` using your distribution's package manager, e.g.:
``` sudo apt-get install libsndfile1 ```
## Dataset
We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
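The extracted `LJSpeech-1.1` directory contains a `metadata.csv` file whose rows are separated by `|` into a clip ID, the raw transcription, and the normalized transcription. The sketch below shows one way to read that file. The function name `read_ljspeech_metadata` and the two sample rows are made up for illustration, and the data loading code in this repository may differ.

```python
import csv
import io


def read_ljspeech_metadata(fileobj):
    """Parse LJSpeech-style rows: clip_id|raw_text|normalized_text."""
    # QUOTE_NONE keeps quotation marks inside transcriptions as plain text.
    reader = csv.reader(fileobj, delimiter="|", quoting=csv.QUOTE_NONE)
    entries = []
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed rows
        clip_id, raw_text, normalized_text = row
        entries.append((clip_id, raw_text, normalized_text))
    return entries


# Made-up rows in the LJSpeech layout, used instead of the real file.
sample = io.StringIO(
    'CLIP-0001|He said "go" in 1999.|He said "go" in nineteen ninety-nine.\n'
    "CLIP-0002|Dr. Smith arrived.|Doctor Smith arrived.\n"
)
entries = read_ljspeech_metadata(sample)
print(len(entries), entries[0][0])
```

For the real dataset, the same function can be called on `open("LJSpeech-1.1/metadata.csv", encoding="utf-8", newline="")`.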
## Model Architecture
![FastSpeech model architecture](./images/model_architecture.png)
FastSpeech is a feed-forward network based on Transformer, rather than an encoder-attention-decoder architecture. The model extracts attention alignments from an encoder-decoder teacher model to train a phoneme duration predictor. A length regulator uses the predicted durations to expand the source phoneme sequence to match the length of the target mel-spectrogram sequence, which allows mel-spectrograms to be generated in parallel. We use TransformerTTS as the teacher model.
The model consists of three parts: the encoder, the length regulator, and the decoder.
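The length regulator can be sketched in a few lines of plain Python: each phoneme's hidden state is repeated as many times as its predicted duration, optionally scaled by a speed factor `alpha` as described in the FastSpeech paper. The function name `length_regulate` and the string stand-ins for hidden states are assumptions made for this example. This is a simplified illustration, not the code used in this repository.

```python
def length_regulate(hidden_states, durations, alpha=1.0):
    """Repeat each hidden state according to its (scaled) duration.

    hidden_states: one entry per phoneme.
    durations: number of mel-spectrogram frames per phoneme.
    alpha: values above 1.0 lengthen the output (slower speech),
           values below 1.0 shorten it (faster speech).
    """
    if len(hidden_states) != len(durations):
        raise ValueError("hidden_states and durations must have equal length")
    expanded = []
    for state, duration in zip(hidden_states, durations):
        repeats = max(0, int(round(duration * alpha)))
        expanded.extend([state] * repeats)
    return expanded


# Three phonemes with durations of 2, 1 and 3 frames.
frames = length_regulate(["h1", "h2", "h3"], [2, 1, 3])
print(frames)
```

The expanded sequence has one entry per output frame, so the decoder can produce all mel-spectrogram frames in a single parallel pass.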
## Project Structure
```text
├── config # yaml configuration files
├── synthesis.py # script to synthesize waveform from text
├── train.py # script for model training
```
## Train FastSpeech
The FastSpeech model can be trained with ``train.py``.
```bash
python train.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--transtts_path='../transformer_tts/checkpoint' \
--transformer_step=160000 \
--config_path='config/fastspeech.yaml'
```
Alternatively, you can run the script file directly.
```bash
sh train.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--transtts_path='../transformer_tts/checkpoint' \
--transformer_step=160000 \
--config_path='config/fastspeech.yaml'
```
If you wish to resume from an existing model, set ``--checkpoint_path`` and ``--fastspeech_step``.
For more help on arguments:
``python train.py --help``.
## Synthesis
After training FastSpeech, audio can be synthesized with ``synthesis.py``.
```bash
python synthesis.py \
--use_gpu=1 \
--alpha=1.0 \
--checkpoint_path='checkpoint/' \
--fastspeech_step=112000
```
Alternatively, you can run the script file directly.
```bash
sh synthesis.sh
```
For more help on arguments:
``python synthesis.py --help``.

Binary file not shown.


View File

@ -1,6 +1,6 @@
# train model
# if you wish to resume from an existing model, uncomment --checkpoint_path and --fastspeech_step
#CUDA_VISIBLE_DEVICES=0,1,2,3 \
CUDA_VISIBLE_DEVICES=0 \
python -u train.py \
--batch_size=32 \
--epochs=10000 \

View File

@ -2,3 +2,145 @@
Paddle fluid implementation of TransformerTTS, a neural TTS model based on Transformer. The implementation is based on [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895).
We implement the TransformerTTS model in paddle fluid with dynamic graph, which makes it convenient to build flexible network architectures.
## Installation
### Install paddlepaddle
This implementation requires the latest develop version of paddlepaddle. You can either download the compiled package or build paddle from source.
1. Install the compiled package, via pip, conda or docker. See [**Installation Manuals**](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/install/index_en.html) for more details.
2. Build paddlepaddle from source. See [**Compile From Source Code**](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/install/compile/fromsource_en.html) for more details. Note that if you want to enable data parallel training for multiple GPUs, you should set `-DWITH_DISTRIBUTE=ON` with cmake.
### Install parakeet
You can choose to install via pypi or clone the repository and install manually.
1. Install via pypi.
```bash
pip install parakeet
```
2. Install manually.
```bash
git clone <url>
cd Parakeet/
pip install -e .
```
### Download cmudict for nltk
You also need to download cmudict for nltk, because text is converted into phonemes with `cmudict`.
```python
import nltk
nltk.download("punkt")
nltk.download("cmudict")
```
If you have completed all of the installation steps above but still get the following error at runtime:
``` OSError: sndfile library not found ```
you need to install ```libsndfile``` using your distribution's package manager, e.g.:
``` sudo apt-get install libsndfile1 ```
## Dataset
We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
## Model Architecture
![TransformerTTS model architecture](./images/model_architecture.jpg)
The model adopts multi-head attention to replace the RNN structures and the original attention mechanism in [Tacotron2](https://arxiv.org/abs/1712.05884). It consists of two main parts, an encoder and a decoder. We also implement the CBHG module from Tacotron as the vocoder and convert the spectrogram into a raw waveform using the Griffin-Lim algorithm.
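To make the attention mechanism more concrete, here is a minimal single-head scaled dot-product attention written with plain Python lists. Multi-head attention runs several such heads in parallel on linearly projected inputs and concatenates the results. The function names are chosen for this example only. It is a didactic sketch rather than the paddle fluid implementation in this repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for query in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
            for key in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output = [
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
    return outputs


# A zero query attends equally to both keys, so the output is the mean value.
attended = scaled_dot_product_attention(
    queries=[[0.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
print(attended)
```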
## Project Structure
```text
├── config # yaml configuration files
├── data.py # dataset and dataloader settings for LJSpeech
├── synthesis.py # script to synthesize waveform from text
├── train_transformer.py # script for transformer model training
├── train_vocoder.py # script for vocoder model training
```
## Train Transformer
The TransformerTTS model can be trained with ``train_transformer.py``.
```bash
python train_transformer.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml'
```
Alternatively, you can run the script file directly.
```bash
sh train_transformer.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_transformer.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml'
```
If you wish to resume from an existing model, set ``--checkpoint_path`` and ``--transformer_step``.
For more help on arguments:
``python train_transformer.py --help``.
## Train Vocoder
The vocoder model can be trained with ``train_vocoder.py``.
```bash
python train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml'
```
Alternatively, you can run the script file directly.
```bash
sh train_vocoder.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml'
```
If you wish to resume from an existing model, set ``--checkpoint_path`` and ``--vocoder_step``.
For more help on arguments:
``python train_vocoder.py --help``.
## Synthesis
After training the TransformerTTS and vocoder models, audio can be synthesized with ``synthesis.py``.
```bash
python synthesis.py \
--max_len=50 \
--transformer_step=160000 \
--vocoder_step=70000 \
--use_gpu=1 \
--checkpoint_path='./checkpoint' \
--sample_path='./sample' \
--config_path='config/synthesis.yaml'
```
Alternatively, you can run the script file directly.
```bash
sh synthesis.sh
```
The synthesized audio files are saved to the directory given by ``--sample_path``.
For more help on arguments:
``python synthesis.py --help``.

Binary file not shown.


View File

@ -1,10 +1,10 @@
# train model
#CUDA_VISIBLE_DEVICES=0,1,2,3 \
CUDA_VISIBLE_DEVICES=0 \
python -u synthesis.py \
--max_len=50 \
--transformer_step=160000 \
--postnet_step=70000 \
--vocoder_step=70000 \
--use_gpu=1 \
--checkpoint_path='./checkpoint' \
--log_dir='./log' \

View File

@ -1,7 +1,7 @@
# train model
# if you wish to resume from an existing model, uncomment --checkpoint_path and --transformer_step
#CUDA_VISIBLE_DEVICES=0,1,2,3 \
CUDA_VISIBLE_DEVICES=0 \
python -u train_transformer.py \
--batch_size=32 \
--epochs=10000 \

View File

@ -1,7 +1,7 @@
# train model
# if you wish to resume from an exists model, uncomment --checkpoint_path and --transformer_step
#CUDA_VISIBLE_DEVICES=0,1,2,3 \
# if you wish to resume from an existing model, uncomment --checkpoint_path and --vocoder_step
CUDA_VISIBLE_DEVICES=0 \
python -u train_vocoder.py \
--batch_size=32 \
--epochs=10000 \