# TransformerTTS
A Paddle fluid implementation of TransformerTTS, a Transformer-based neural text-to-speech model. The implementation is based on [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895).
## Dataset
We experiment with the LJSpeech dataset. Download and extract [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
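The archive unpacks to a `wavs/` directory plus a `metadata.csv` whose `|`-delimited rows pair each clip id with its raw and normalized transcription. A minimal sketch of reading that file (the `load_metadata` helper below is illustrative, not part of this repo):

```python
from pathlib import Path

def load_metadata(path):
    """Parse LJSpeech's metadata.csv into (wav_path, normalized_text) pairs."""
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line:
            continue
        # Each row: <clip id>|<raw transcription>|<normalized transcription>
        clip_id, _raw, normalized = line.split("|")
        pairs.append((Path(path).parent / "wavs" / f"{clip_id}.wav", normalized))
    return pairs
```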
## Model Architecture
![TransformerTTS model architecture](./images/model_architecture.jpg)
The model adapts the multi-head attention mechanism to replace both the RNN structures and the original attention mechanism of [Tacotron2](https://arxiv.org/abs/1712.05884). It consists of two main parts, an encoder and a decoder. We also implemented the CBHG model of Tacotron as the vocoder and convert the spectrogram into a raw waveform using the Griffin-Lim algorithm.
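As a rough illustration of the scaled dot-product multi-head attention that replaces the RNN and attention structures, here is a NumPy sketch with randomly initialized stand-in projections (not the repo's Paddle implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, key, value, n_heads, rng):
    """Scaled dot-product attention over n_heads subspaces.

    query: (T_q, d_model); key/value: (T_k, d_model).
    The projection weights are random stand-ins for learned parameters.
    """
    d_model = query.shape[-1]
    d_head = d_model // n_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))

    def split_heads(x, W):
        # (T, d_model) -> (n_heads, T, d_head)
        return (x @ W).reshape(x.shape[0], n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(query, Wq), split_heads(key, Wk), split_heads(value, Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, T_q, T_k)
    weights = softmax(scores)                            # each row sums to 1
    heads = weights @ v                                  # (n_heads, T_q, d_head)
    # Concatenate the heads and project back to d_model.
    out = heads.transpose(1, 0, 2).reshape(query.shape[0], d_model) @ Wo
    return out, weights
```

In the encoder all three inputs are the phoneme representations (self-attention); in the decoder the cross-attention queries come from mel frames while keys and values come from the encoder output.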
## Project Structure
```text
├── config # yaml configuration files
├── data.py # dataset and dataloader settings for LJSpeech
├── synthesis.py # script to synthesize waveform from text
├── train_transformer.py # script for transformer model training
├── train_vocoder.py # script for vocoder model training
```
## Train Transformer
The TransformerTTS model can be trained with ``train_transformer.py``.
```bash
python train_transformer.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml'
```
or you can run the script file directly.
```bash
sh train_transformer.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_transformer.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_transformer.yaml'
```
If you wish to resume training from an existing model, set ``--checkpoint_path`` and ``--transformer_step``.

For more help on arguments:
``python train_transformer.py --help``.
## Train Vocoder
The vocoder model can be trained with ``train_vocoder.py``.
```bash
python train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=0 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml'
```
or you can run the script file directly.
```bash
sh train_vocoder.sh
```
If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_vocoder.py \
--use_gpu=1 \
--use_data_parallel=1 \
--data_path=${DATAPATH} \
--config_path='config/train_vocoder.yaml'
```
If you wish to resume training from an existing model, set ``--checkpoint_path`` and ``--vocoder_step``.

For more help on arguments:
``python train_vocoder.py --help``.
## Synthesis
After training the TransformerTTS and vocoder models, audio can be synthesized with ``synthesis.py``.
```bash
python synthesis.py \
--max_len=50 \
--transformer_step=160000 \
--vocoder_step=70000 \
--use_gpu=1 \
--checkpoint_path='./checkpoint' \
--sample_path='./sample' \
--config_path='config/synthesis.yaml'
```
or you can run the script file directly.
```bash
sh synthesis.sh
```
The synthesized audio files will be saved under ``--sample_path``.
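The final Griffin-Lim step mentioned above recovers a phase for the predicted magnitude spectrogram by repeatedly resynthesizing a waveform and re-adopting its analysis phase. A self-contained NumPy sketch of the idea (toy STFT sizes, not the config values this repo uses):

```python
import numpy as np

def stft(x, n_fft, hop):
    win = np.hanning(n_fft)
    frames = [np.fft.rfft(x[s:s + n_fft] * win)
              for s in range(0, len(x) - n_fft + 1, hop)]
    return np.array(frames).T                        # (n_fft//2 + 1, n_frames)

def istft(spec, n_fft, hop):
    win = np.hanning(n_fft)
    n_frames = spec.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = np.fft.irfft(spec[:, i], n=n_fft)
        out[i * hop:i * hop + n_fft] += seg * win    # windowed overlap-add
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_fft=512, hop=128, n_iter=32, seed=0):
    """Iteratively estimate a phase consistent with a magnitude spectrogram."""
    rng = np.random.default_rng(seed)
    spec = mag * np.exp(2j * np.pi * rng.random(mag.shape))  # random phase init
    for _ in range(n_iter):
        wav = istft(spec, n_fft, hop)
        # Keep the target magnitude, adopt the phase of the re-analysis.
        spec = mag * np.exp(1j * np.angle(stft(wav, n_fft, hop)))
    return istft(spec, n_fft, hop)
```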
For more help on arguments:
``python synthesis.py --help``.