ParakeetRebeccaRosario/examples/transformer_tts/README.md

# TransformerTTS
Paddle fluid implementation of TransformerTTS, a neural TTS with Transformer. The implementation is based on [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895).

## Dataset

We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
## Model Architecture

![TransformerTTS model architecture](./images/model_architecture.jpg)
The model adapt the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in [Tacotron2](https://arxiv.org/abs/1712.05884). The model consists of two main parts, encoder and decoder. We also implemented CBHG model of tacotron as a vocoder part and converted the spectrogram into raw wave using griffin-lim algorithm.

## Project Structure
```text
├── config                 # yaml configuration files
├── data.py                # dataset and dataloader settings for LJSpeech
├── synthesis.py           # script to synthesize waveform from text
├── train_transformer.py   # script for transformer model training
├── train_vocoder.py       # script for vocoder model training
```

## Train Transformer

TransformerTTS model can train with ``train_transformer.py``.
```bash
python train_trasformer.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml' \
```
or you can run the script file directly.
```bash
sh train_transformer.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follow:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_transformer.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml' \
```

if you wish to resume from an exists model, please set ``--checkpoint_path`` and ``--transformer_step``

For more help on arguments:
``python train_transformer.py --help``.

## Train Vocoder
Vocoder model can train with ``train_vocoder.py``.
```bash
python train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml' \
```
or you can run the script file directly.
```bash
sh train_vocoder.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follow:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml' \
```
if you wish to resume from an exists model, please set ``--checkpoint_path`` and ``--vocoder_step``

For more help on arguments:
``python train_vocoder.py --help``.

## Synthesis
After training the transformerTTS and vocoder model, audio can be synthesized with ``synthesis.py``.
```bash
python synthesis.py \
--max_len=50 \
--transformer_step=160000 \
--vocoder_step=70000 \
--use_gpu=1
--checkpoint_path='./checkpoint' \
--sample_path='./sample' \
--config_path='config/synthesis.yaml' \
```

or you can run the script file directly.
```bash
sh synthesis.sh
```

And the audio file will be saved in ``--sample_path``.

For more help on arguments:
``python synthesis.py --help``.
add transformerTTS and fastspeech 2020-02-10 15:38:29 +08:00			`# TransformerTTS`
			`Paddle fluid implementation of TransformerTTS, a neural TTS with Transformer. The implementation is based on [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895).`

add README of TransformerTTS 2020-02-17 15:53:54 +08:00			`## Dataset`

			`We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).`

			```bash
			`wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2`
			`tar xjvf LJSpeech-1.1.tar.bz2`
			```
			`## Model Architecture`

			`![TransformerTTS model architecture](./images/model_architecture.jpg)`
			`The model adapt the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in [Tacotron2](https://arxiv.org/abs/1712.05884). The model consists of two main parts, encoder and decoder. We also implemented CBHG model of tacotron as a vocoder part and converted the spectrogram into raw wave using griffin-lim algorithm.`

			`## Project Structure`
			```text
			`├── config # yaml configuration files`
			`├── data.py # dataset and dataloader settings for LJSpeech`
			`├── synthesis.py # script to synthesize waveform from text`
			`├── train_transformer.py # script for transformer model training`
			`├── train_vocoder.py # script for vocoder model training`
			```

			`## Train Transformer`

			TransformerTTS model can train with ``train_transformer.py``.
			```bash
			`python train_trasformer.py \`
			`--use_gpu=1 \`
			`--use_data_parallel=0 \`
			`--data_path=${DATAPATH} \`
			`--config_path='config/train_transformer.yaml' \`
			```
			`or you can run the script file directly.`
			```bash
			`sh train_transformer.sh`
			```
			If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follow:

			```bash
			`CUDA_VISIBLE_DEVICES=0,1,2,3`
			`python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_transformer.py \`
			`--use_gpu=1 \`
add README of FastSpeech 2020-02-17 16:44:53 +08:00			`--use_data_parallel=1 \`
add README of TransformerTTS 2020-02-17 15:53:54 +08:00			`--data_path=${DATAPATH} \`
			`--config_path='config/train_transformer.yaml' \`
			```

			if you wish to resume from an exists model, please set ``--checkpoint_path`` and ``--transformer_step``

add license 2020-02-26 21:03:51 +08:00			`For more help on arguments:`
add README of TransformerTTS 2020-02-17 15:53:54 +08:00			``python train_transformer.py --help``.

			`## Train Vocoder`
			Vocoder model can train with ``train_vocoder.py``.
			```bash
			`python train_vocoder.py \`
			`--use_gpu=1 \`
			`--use_data_parallel=0 \`
			`--data_path=${DATAPATH} \`
			`--config_path='config/train_vocoder.yaml' \`
			```
			`or you can run the script file directly.`
			```bash
			`sh train_vocoder.sh`
			```
			If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follow:

			```bash
			`CUDA_VISIBLE_DEVICES=0,1,2,3`
			`python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_vocoder.py \`
			`--use_gpu=1 \`
add README of FastSpeech 2020-02-17 16:44:53 +08:00			`--use_data_parallel=1 \`
add README of TransformerTTS 2020-02-17 15:53:54 +08:00			`--data_path=${DATAPATH} \`
			`--config_path='config/train_vocoder.yaml' \`
			```
			if you wish to resume from an exists model, please set ``--checkpoint_path`` and ``--vocoder_step``

add license 2020-02-26 21:03:51 +08:00			`For more help on arguments:`
add README of TransformerTTS 2020-02-17 15:53:54 +08:00			``python train_vocoder.py --help``.

			`## Synthesis`
			After training the transformerTTS and vocoder model, audio can be synthesized with ``synthesis.py``.
			```bash
			`python synthesis.py \`
			`--max_len=50 \`
			`--transformer_step=160000 \`
			`--vocoder_step=70000 \`
			`--use_gpu=1`
			`--checkpoint_path='./checkpoint' \`
			`--sample_path='./sample' \`
			`--config_path='config/synthesis.yaml' \`
			```

			`or you can run the script file directly.`
			```bash
			`sh synthesis.sh`
			```

			And the audio file will be saved in ``--sample_path``.

add license 2020-02-26 21:03:51 +08:00			`For more help on arguments:`
add README of TransformerTTS 2020-02-17 15:53:54 +08:00			``python synthesis.py --help``.