Tacotron2

PaddlePaddle dynamic graph implementation of Tacotron2, a neural network architecture for speech synthesis directly from text. The implementation is based on Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.

Project Structure

├── config.py              # default configuration file
├── ljspeech.py            # dataset and dataloader settings for LJSpeech
├── preprocess.py          # script to preprocess LJSpeech dataset
├── synthesize.py          # script to synthesize spectrogram from text
├── train.py               # script for tacotron2 model training
├── synthesize.ipynb       # notebook example for end-to-end TTS

Dataset

We experiment with the LJSpeech dataset. Download and unzip LJSpeech.

wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2

Then preprocess the data by running preprocess.py; the preprocessed data will be placed in the --output directory.

python preprocess.py \
--input=${DATAPATH} \
--output=${PREPROCESSEDDATAPATH} \
-v

For more help on the arguments, run

python preprocess.py --help

Train the model

The Tacotron2 model can be trained by running train.py.

python train.py \
--data=${PREPROCESSEDDATAPATH} \
--output=${OUTPUTPATH} \
--device=gpu

If you want to train on CPU, set --device=cpu. To train on multiple GPUs, set --nprocs to the number of GPUs. By default, training resumes from the latest checkpoint in --output; to start a new training run, use a new ${OUTPUTPATH} that contains no checkpoint. To resume from another existing model, set --checkpoint_path to the path of the checkpoint you want to load.

Note: The checkpoint path must not include the file extension.
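For example, if checkpoints are saved with a parameter-file extension (the file name below is hypothetical), strip the extension before passing the path:

```python
import os

# Hypothetical checkpoint file written during training.
ckpt_file = "output/checkpoints/step-10000.pdparams"

# Pass the path without the extension, e.g. to --checkpoint_path.
ckpt_path, _ = os.path.splitext(ckpt_file)
print(ckpt_path)  # output/checkpoints/step-10000
```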

For more help on the arguments, run

python train.py --help

Synthesis

After training Tacotron2, spectrograms can be synthesized by running synthesize.py.

python synthesize.py \
--config=${CONFIGPATH} \
--checkpoint_path=${CHECKPOINTPATH} \
--input=${TEXTPATH} \
--output=${OUTPUTPATH} \
--device=gpu

${CONFIGPATH} must match the configuration used to produce ${CHECKPOINTPATH}.

For more help on the arguments, run

python synthesize.py --help

Then you can find the spectrogram files in ${OUTPUTPATH}; they can be used as input to a vocoder such as WaveFlow to generate audio files.
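Before vocoding, you may want to inspect a generated spectrogram. A minimal sketch, assuming each spectrogram is saved as a NumPy .npy array; the file name and array shape below are hypothetical:

```python
import numpy as np

# Fabricate a dummy mel spectrogram so this snippet is self-contained;
# in practice, load one of the files written to ${OUTPUTPATH} instead.
dummy = np.random.randn(80, 200).astype("float32")  # (n_mels, n_frames), sizes assumed
np.save("sentence_0.npy", dummy)

mel = np.load("sentence_0.npy")
print(mel.shape, mel.dtype)  # (80, 200) float32
```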

Notebook: End-to-end TTS

See synthesize.ipynb for details about end-to-end TTS with tacotron2 and waveflow.