# Tacotron2
PaddlePaddle dynamic graph implementation of Tacotron2, a neural network architecture for speech synthesis directly from text. The implementation is based on *Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions*.
## Project Structure
```text
├── config.py        # default configuration file
├── ljspeech.py      # dataset and dataloader settings for LJSpeech
├── preprocess.py    # script to preprocess LJSpeech dataset
├── synthesis.py     # script to synthesize spectrogram from text
├── train.py         # script for tacotron2 model training
```
## Dataset
We experiment with the LJSpeech dataset. Download and unzip LJSpeech.
```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
Then preprocess the data by running `preprocess.py`; the preprocessed data will be placed in the `--output` directory.
```bash
python preprocess.py \
    --input=${DATAPATH} \
    --output=${PREPROCESSEDDATAPATH} \
    -v
```
For more help on the arguments, run `python preprocess.py --help`.
## Train the model
The Tacotron2 model can be trained by running `train.py`.
```bash
python train.py \
    --data=${PREPROCESSEDDATAPATH} \
    --output=${OUTPUTPATH} \
    --device=gpu
```
If you want to train on CPU, just set `--device=cpu`. If you want to train on multiple GPUs, set `--nprocs` to the number of GPUs, as shown below.
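For example, a run on 4 GPUs might look like the following (the `--nprocs` value is illustrative):

```bash
python train.py \
    --data=${PREPROCESSEDDATAPATH} \
    --output=${OUTPUTPATH} \
    --device=gpu \
    --nprocs=4
```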
By default, training resumes from the latest checkpoint in `--output`. If you want to start a new training run, use a new `${OUTPUTPATH}` that contains no checkpoints. If you want to resume from another existing model, set `--checkpoint_path` to the checkpoint you want to load.
Note: the checkpoint path must not include the file extension.
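For example, a resumed run might look like the following; the checkpoint path shown is purely illustrative, the point being that it is passed without a file extension:

```bash
# The checkpoint path below is illustrative; pass it without the file extension.
python train.py \
    --data=${PREPROCESSEDDATAPATH} \
    --output=${OUTPUTPATH} \
    --device=gpu \
    --checkpoint_path=${OUTPUTPATH}/checkpoints/step-10000
```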
For more help on the arguments, run `python train.py --help`.
## Synthesis
After training Tacotron2, spectrograms can be synthesized by running `synthesis.py`.
```bash
python synthesis.py \
    --config=${CONFIGPATH} \
    --checkpoint_path=${CHECKPOINTPATH} \
    --input=${TEXTPATH} \
    --output=${OUTPUTPATH} \
    --device=gpu
```
The `${CONFIGPATH}` must match the `${CHECKPOINTPATH}`, i.e. use the configuration that the checkpoint was trained with.
For more help on the arguments, run `python synthesis.py --help`.
You can then find the spectrogram files in `${OUTPUTPATH}`; they can be used as input to a vocoder such as WaveFlow to generate audio files.
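As a quick sanity check before running a vocoder, you can inspect one of the saved spectrograms. This sketch assumes `synthesis.py` writes NumPy `.npy` files; the filename is illustrative:

```bash
# Illustrative filename; replace it with an actual file found in ${OUTPUTPATH}.
python -c "import numpy as np; mel = np.load('${OUTPUTPATH}/sentence_0.npy'); print(mel.shape)"
```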