# Tacotron2

PaddlePaddle dynamic graph implementation of Tacotron2, a neural network architecture for speech synthesis directly from text. The implementation is based on [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884).

## Project Structure

```text
├── config.py      # default configuration file
├── ljspeech.py    # dataset and dataloader settings for LJSpeech
├── preprocess.py  # script to preprocess LJSpeech dataset
├── synthesis.py   # script to synthesize spectrogram from text
└── train.py       # script for tacotron2 model training
```

## Dataset

We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```

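Before preprocessing, it can help to confirm that the archive was extracted correctly. The sketch below is not part of this repository. It assumes the standard LJSpeech-1.1 layout, in which a pipe-separated `metadata.csv` (clip ID, raw transcript, normalized transcript) sits next to a `wavs/` directory. The function names `read_ljspeech_metadata` and `missing_wavs` are hypothetical.

```python
import csv
from pathlib import Path


def read_ljspeech_metadata(root):
    """Return (clip_id, raw_text, normalized_text) tuples from <root>/metadata.csv.

    Assumes pipe-separated fields and no header row. If a row has no
    normalized transcript, the raw transcript is reused in its place.
    """
    records = []
    with open(Path(root) / "metadata.csv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps literal double quotes inside transcripts intact.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                raise ValueError(f"unexpected metadata row: {row!r}")
            clip_id, raw_text = row[0], row[1]
            normalized_text = row[2] if len(row) > 2 else raw_text
            records.append((clip_id, raw_text, normalized_text))
    return records


def missing_wavs(root, records):
    """List clip IDs from the metadata that have no <root>/wavs/<id>.wav file."""
    wav_dir = Path(root) / "wavs"
    return [clip_id for clip_id, _, _ in records
            if not (wav_dir / f"{clip_id}.wav").is_file()]
```

For a complete download, `missing_wavs("LJSpeech-1.1", read_ljspeech_metadata("LJSpeech-1.1"))` should return an empty list.
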
Then preprocess the data by running ``preprocess.py``. The preprocessed data will be placed in the ``--output`` directory.

```bash
python preprocess.py \
--input=${DATAPATH} \
--output=${PREPROCESSEDDATAPATH} \
-v
```

For more help on arguments, run ``python preprocess.py --help``.

## Train the model

The Tacotron2 model can be trained by running ``train.py``.

```bash
python train.py \
--data=${PREPROCESSEDDATAPATH} \
--output=${OUTPUTPATH} \
--device=gpu
```

To train on CPU, set ``--device=cpu``.
To train on multiple GPUs, set ``--nprocs`` to the number of GPUs.
By default, training resumes from the latest checkpoint in ``--output``. To start a new training run, use a new ``${OUTPUTPATH}`` that contains no checkpoint. To resume from another existing model, set ``checkpoint_path`` to the checkpoint you want to load.

**Note: The checkpoint path must not include the file extension.**

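To illustrate the note above, the sketch below turns a saved file name into the extension-free form. It assumes checkpoints are stored with PaddlePaddle's usual `.pdparams` (parameters) and `.pdopt` (optimizer state) suffixes. The helper name `checkpoint_prefix` and the example path are hypothetical.

```python
import os

# Assumed suffixes: PaddlePaddle conventionally saves parameters as
# .pdparams and optimizer state as .pdopt.
CHECKPOINT_EXTENSIONS = (".pdparams", ".pdopt")


def checkpoint_prefix(path):
    """Drop a known checkpoint extension; leave any other path unchanged."""
    root, ext = os.path.splitext(path)
    return root if ext in CHECKPOINT_EXTENSIONS else path


print(checkpoint_prefix("output/checkpoints/step-10000.pdparams"))
# output/checkpoints/step-10000
```

Paths that already lack an extension pass through unchanged, so the helper is safe to apply to any value before it is used as the checkpoint path.
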
For more help on arguments, run ``python train.py --help``.

## Synthesis

After training Tacotron2, spectrograms can be synthesized by running ``synthesis.py``.

```bash
python synthesis.py \
--config=${CONFIGPATH} \
--checkpoint_path=${CHECKPOINTPATH} \
--input=${TEXTPATH} \
--output=${OUTPUTPATH} \
--device=gpu
```

The configuration in ``${CONFIGPATH}`` must match the model stored at ``${CHECKPOINTPATH}``.

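When synthesis is run from a larger Python pipeline, the command above can be assembled programmatically, which avoids line-continuation mistakes in shell scripts. This is a minimal sketch and not part of this repository: `build_synthesis_command` and `run_synthesis` are hypothetical helpers, and they use only the flags shown above.

```python
import subprocess
import sys


def build_synthesis_command(config, checkpoint_path, text_path, output_dir, device="gpu"):
    """Assemble the synthesis.py invocation as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "synthesis.py",
        f"--config={config}",
        f"--checkpoint_path={checkpoint_path}",
        f"--input={text_path}",
        f"--output={output_dir}",
        f"--device={device}",
    ]


def run_synthesis(*args, **kwargs):
    """Run synthesis.py from the current directory and raise if it exits with an error."""
    subprocess.run(build_synthesis_command(*args, **kwargs), check=True)
```

Passing the arguments as a list means paths containing spaces need no extra quoting.
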
For more help on arguments, run ``python synthesis.py --help``.

The spectrogram files are written to ``${OUTPUTPATH}``. They can then be used as input to a vocoder such as [waveflow](../waveflow/README.md#Synthesis) to generate audio files.