# TransformerTTS
PaddlePaddle dynamic graph implementation of TransformerTTS, a Transformer-based neural TTS model. The implementation is based on [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895).
## Dataset
We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
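The commands in the following sections read the dataset location from the ``DATAPATH`` environment variable. As a minimal sketch, assuming the archive was extracted in the current directory:
```bash
# Point DATAPATH at the extracted LJSpeech folder; adjust the path
# if you unpacked the archive somewhere else.
export DATAPATH=$(pwd)/LJSpeech-1.1
```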
## Model Architecture
<div align="center" name="TransformerTTS model architecture">
<img src="./images/model_architecture.jpg" width=400 height=600 /><br>
</div>
<div align="center">
TransformerTTS model architecture
</div>
The model replaces the RNN structures, as well as the original attention mechanism of [Tacotron2](https://arxiv.org/abs/1712.05884), with multi-head attention. It consists of two main parts: an encoder and a decoder. We also implement the CBHG model of Tacotron as the vocoder and convert the spectrogram into a raw waveform using the Griffin-Lim algorithm.
## Project Structure
```text
├── config # yaml configuration files
├── data.py # dataset and dataloader settings for LJSpeech
├── synthesis.py # script to synthesize waveform from text
├── train_transformer.py # script for transformer model training
├── train_vocoder.py # script for vocoder model training
```
## Train Transformer
The TransformerTTS model can be trained by running ``train_transformer.py``.
```bash
python train_transformer.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml'
```
Or you can run the script file directly.
```bash
sh train_transformer.sh
```
If you want to train on multiple GPUs, set ``--use_data_parallel=1`` and start training as follows:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_transformer.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml'
```
If you wish to resume from an existing model, please set ``--checkpoint_path`` and ``--transformer_step``.
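For example, a resume command might look like the following; the checkpoint directory and step value are illustrative and should match an actual saved checkpoint:
```bash
# Resume single-GPU training from a saved checkpoint
# (checkpoint path and step number are examples).
python train_transformer.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml' \
--checkpoint_path='./checkpoint' \
--transformer_step=160000
```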
**Note: To ensure good training results, we recommend multi-GPU training to enlarge the effective batch size, with at least 16 samples per GPU in each batch.**
For more help on arguments:
``python train_transformer.py --help``.
## Train Vocoder
The vocoder model can be trained by running ``train_vocoder.py``.
```bash
python train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml'
```
Or you can run the script file directly.
```bash
sh train_vocoder.sh
```
If you want to train on multiple GPUs, set ``--use_data_parallel=1`` and start training as follows:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml'
```
If you wish to resume from an existing model, please set ``--checkpoint_path`` and ``--vocoder_step``.
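For example, a resume command might look like the following; the checkpoint directory and step value are illustrative and should match an actual saved checkpoint:
```bash
# Resume vocoder training from a saved checkpoint
# (checkpoint path and step number are examples).
python train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml' \
--checkpoint_path='./checkpoint' \
--vocoder_step=70000
```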
For more help on arguments:
``python train_vocoder.py --help``.
## Synthesis
After training the TransformerTTS and vocoder models, audio can be synthesized by running ``synthesis.py``.
```bash
python synthesis.py \
--max_len=50 \
--transformer_step=160000 \
--vocoder_step=70000 \
--use_gpu=1 \
--checkpoint_path='./checkpoint' \
--sample_path='./sample' \
--config_path='config/synthesis.yaml'
```
Or you can run the script file directly.
```bash
sh synthesis.sh
```
The synthesized audio files will be saved under ``--sample_path``.
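To quickly check the output, list the generated files (assuming the default ``./sample`` directory from the command above):
```bash
ls ./sample
```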
For more help on arguments:
``python synthesis.py --help``.