Update README.md

zhaokexin01 2020-03-06 08:21:50 +08:00
parent b1c64eecdc
commit 1dacf4caf4
1 changed file with 7 additions and 7 deletions


@@ -1,5 +1,5 @@
 # Fastspeech
-Paddle fluid implementation of Fastspeech, a feed-forward network based on Transformer. The implementation is based on [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263).
+PaddlePaddle dynamic graph implementation of Fastspeech, a feed-forward network based on Transformer. The implementation is based on [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263).
 ## Dataset
@@ -14,7 +14,7 @@ tar xjvf LJSpeech-1.1.tar.bz2
 ![FastSpeech model architecture](./images/model_architecture.png)
-FastSpeech is a feed-forward structure based on Transformer, instead of using the encoder-attention-decoder based architecture. This model extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction, which is used by a length
+FastSpeech is a feed-forward structure based on Transformer, instead of using the encoder-attention-decoder based architecture. This model extracts attention alignments from an encoder-decoder based teacher model for phoneme duration prediction, which is used by a length
 regulator to expand the source phoneme sequence to match the length of the target
 mel-spectrogram sequence for parallel mel-spectrogram generation. We use the TransformerTTS as teacher model.
 The model consists of encoder, decoder and length regulator three parts.
@@ -28,7 +28,7 @@ The model consists of encoder, decoder and length regulator three parts.
 ## Train Transformer
-FastSpeech model can train with ``train.py``.
+FastSpeech model can be trained with ``train.py``.
 ```bash
 python train.py \
 --use_gpu=1 \
@@ -38,11 +38,11 @@ python train.py \
 --transformer_step=160000 \
 --config_path='config/fastspeech.yaml' \
 ```
-or you can run the script file directly.
+Or you can run the script file directly.
 ```bash
 sh train.sh
 ```
-If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follow:
+If you want to train on multiple GPUs, you must set ``--use_data_parallel=1``, and then start training as follows:
 ```bash
 CUDA_VISIBLE_DEVICES=0,1,2,3
@@ -55,7 +55,7 @@ python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog tr
 --config_path='config/fastspeech.yaml' \
 ```
-if you wish to resume from an exists model, please set ``--checkpoint_path`` and ``--fastspeech_step``
+If you wish to resume from an existing model, please set ``--checkpoint_path`` and ``--fastspeech_step``.
 For more help on arguments:
 ``python train.py --help``.
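As an illustration of the resume note in the hunk above, a minimal sketch of such an invocation is shown below; the checkpoint directory and step value are placeholders rather than paths from this repository, and any other training arguments (data paths, etc.) are omitted.
```bash
# Sketch only: resume FastSpeech training from a saved checkpoint.
# './checkpoint' and 112000 are placeholder values; the remaining flags
# mirror the training command shown earlier in this README.
python train.py \
--use_gpu=1 \
--checkpoint_path='./checkpoint' \
--fastspeech_step=112000 \
--config_path='config/fastspeech.yaml' \
```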
@@ -70,7 +70,7 @@ python synthesis.py \
 --fastspeech_step=112000 \
 ```
-or you can run the script file directly.
+Or you can run the script file directly.
 ```bash
 sh synthesis.sh
 ```