Merge branch 'add_readme' into 'master'
Add readme

See merge request !14

commit 5a325e2343

@@ -1,4 +1,120 @@

# FastSpeech

Paddle fluid implementation of FastSpeech, a feed-forward network based on Transformer. The implementation is based on [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263).

We implement the FastSpeech model in Paddle fluid with the dynamic graph mode, which is convenient for flexible network architectures.

## Installation

### Install paddlepaddle

This implementation requires the latest develop version of paddlepaddle. You can either install the compiled package or build paddlepaddle from source.

1. Install the compiled package via pip, conda or docker. See [**Installation Manuals**](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/install/index_en.html) for more details.

2. Build paddlepaddle from source. See [**Compile From Source Code**](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/install/compile/fromsource_en.html) for more details. Note that if you want to enable data parallel training on multiple GPUs, you should pass `-DWITH_DISTRIBUTE=ON` to cmake.

### Install parakeet

You can either install parakeet from PyPI or clone the repository and install it manually.

1. Install from PyPI.

```bash
pip install parakeet
```

2. Install manually.

```bash
git clone <url>
cd Parakeet/
pip install -e .
```

### Download cmudict for nltk

You also need to download the `punkt` and `cmudict` data for nltk, because text is converted into phonemes with `cmudict`.

```python
import nltk

nltk.download("punkt")
nltk.download("cmudict")
```

If you have completed all the above installations but still get the following error at runtime:

```text
OSError: sndfile library not found
```

you need to install `libsndfile` with your distribution's package manager, e.g. on Debian or Ubuntu:

```bash
sudo apt-get install libsndfile1
```

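This error is most likely raised by the `soundfile` Python package when it cannot locate the shared library. As a quick diagnostic (a sketch, not part of parakeet), the standard library's `ctypes.util.find_library` reports whether `libsndfile` is discoverable; the helper names `locate_sndfile` and `report` are hypothetical, and `find_library` can miss a library installed in a non-standard location.

```python
import ctypes.util
from typing import Optional


def locate_sndfile() -> Optional[str]:
    """Return the libsndfile name that ctypes can find, or None if it is missing."""
    return ctypes.util.find_library("sndfile")


def report() -> str:
    """Human-readable status line for the libsndfile lookup."""
    lib = locate_sndfile()
    if lib is None:
        return "libsndfile not found: install it with your system package manager"
    return f"libsndfile found: {lib}"


print(report())
```

If the lookup reports the library as missing, installing `libsndfile1` (or your distribution's equivalent) should resolve the `OSError`.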
## Dataset

We experiment with the LJSpeech dataset. Download and extract [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```

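After extraction, `LJSpeech-1.1/` contains `metadata.csv` and a `wavs/` directory. Each line of `metadata.csv` holds a clip ID, the raw transcription and a normalized transcription, separated by `|`. The sketch below reads it into `(wav_path, text)` pairs; it is an illustration, not the loader used by this project, and the function names and the choice of the normalized column are assumptions.

```python
import csv
import os
from typing import Iterable, List, Tuple


def parse_metadata(lines: Iterable[str], root: str) -> List[Tuple[str, str]]:
    """Turn LJSpeech metadata lines into (wav_path, normalized_text) pairs."""
    # '|' separates the fields; quote characters inside transcripts are
    # literal text, so CSV quote handling is switched off.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    items = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        clip_id, _raw_text, normalized_text = row[:3]
        items.append((os.path.join(root, "wavs", clip_id + ".wav"), normalized_text))
    return items


def load_ljspeech(root: str = "LJSpeech-1.1") -> List[Tuple[str, str]]:
    """Read <root>/metadata.csv from the extracted LJSpeech-1.1 directory."""
    with open(os.path.join(root, "metadata.csv"), encoding="utf-8", newline="") as f:
        return parse_metadata(f, root)
```

`load_ljspeech()` returns one pair per clip, with audio paths pointing into `LJSpeech-1.1/wavs/`.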
## Model Architecture

![FastSpeech model architecture](./images/model_architecture.png)

FastSpeech is a feed-forward structure based on Transformer, instead of the encoder-attention-decoder based architecture. The model extracts attention alignments from an encoder-decoder based teacher model for phoneme duration prediction; the durations are used by a length regulator to expand the source phoneme sequence to match the length of the target mel-spectrogram sequence, which allows the mel-spectrogram to be generated in parallel. We use TransformerTTS as the teacher model.

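The duration extraction step can be illustrated with a small, framework-free sketch. Following the FastSpeech paper, the duration of a phoneme is the number of mel frames whose teacher attention peaks on that phoneme. The paper picks one attention head from the teacher; here that head is assumed to be already selected, and the function name and list-of-lists input are hypothetical, not this repository's code.

```python
from typing import List, Sequence


def durations_from_alignment(alignment: Sequence[Sequence[float]]) -> List[int]:
    """Count, for each phoneme, how many mel frames attend to it most.

    alignment[t][i] is the teacher's attention weight from mel frame t
    to phoneme i, taken from a single (already chosen) attention head.
    """
    if not alignment:
        return []
    num_phonemes = len(alignment[0])
    durations = [0] * num_phonemes
    for frame_weights in alignment:
        # argmax over phonemes; ties go to the earlier phoneme
        best = max(range(num_phonemes), key=lambda i: frame_weights[i])
        durations[best] += 1
    return durations
```

By construction the durations sum to the number of mel frames, so expanding the phonemes with them reproduces the target length.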
The model consists of three parts: an encoder, a length regulator and a decoder.

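As a rough illustration of the length regulator, the sketch below repeats each phoneme state according to its duration. Following the FastSpeech paper, durations can be multiplied by a factor `alpha` to control speech speed (larger values give slower speech). The half-up rounding rule and the helper names are assumptions, and whether the ``--alpha`` flag of ``synthesis.py`` is implemented exactly this way is not confirmed here.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def scale_durations(durations: Sequence[int], alpha: float = 1.0) -> List[int]:
    """Multiply each duration by alpha and round half up (assumed rounding rule)."""
    return [math.floor(d * alpha + 0.5) for d in durations]


def length_regulate(hidden: Sequence[T], durations: Sequence[int], alpha: float = 1.0) -> List[T]:
    """Expand phoneme-level states to frame level by repeating each one."""
    if len(hidden) != len(durations):
        raise ValueError("hidden and durations must have the same length")
    expanded: List[T] = []
    for state, frames in zip(hidden, scale_durations(durations, alpha)):
        expanded.extend([state] * frames)  # frames == 0 drops the phoneme
    return expanded
```

For example, `length_regulate(["h1", "h2", "h3", "h4"], [2, 2, 3, 1])` returns eight frame-level states; with `alpha=1.3` the scaled durations become `[3, 3, 4, 1]`, and with `alpha=0.5` they become `[1, 1, 2, 1]`.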
## Project Structure

```text
├── config          # yaml configuration files
├── synthesis.py    # script to synthesize waveform from text
├── train.py        # script for model training
```

## Train FastSpeech

The FastSpeech model can be trained with ``train.py``. Here ``--transtts_path`` and ``--transformer_step`` select the trained TransformerTTS checkpoint that serves as the teacher model.

```bash
python train.py \
    --use_gpu=1 \
    --use_data_parallel=0 \
    --data_path=${DATAPATH} \
    --transtts_path='../transformer_tts/checkpoint' \
    --transformer_step=160000 \
    --config_path='config/fastspeech.yaml'
```

Or you can run the script directly.

```bash
sh train.sh
```

If you want to train on multiple GPUs, you must set ``--use_data_parallel=1`` and then start training as follows:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train.py \
    --use_gpu=1 \
    --use_data_parallel=1 \
    --data_path=${DATAPATH} \
    --transtts_path='../transformer_tts/checkpoint' \
    --transformer_step=160000 \
    --config_path='config/fastspeech.yaml'
```

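If you prefer to start the multi-GPU run from Python, the sketch below assembles the same command. `build_launch_command` is a hypothetical helper, not part of parakeet or Paddle; it only builds the argument list and environment from the bash example and assumes `paddle.distributed.launch` accepts the flags exactly as used there. Note that `CUDA_VISIBLE_DEVICES` has to be in the environment of the launched process to take effect.

```python
import os
import sys
from typing import Dict, List, Sequence, Tuple


def build_launch_command(
    gpus: Sequence[int],
    script: str,
    script_args: Sequence[str],
    log_dir: str = "./mylog",
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a multi-GPU run via paddle.distributed.launch."""
    ids = ",".join(str(g) for g in gpus)
    # Copy the current environment and expose only the selected GPUs
    # to the child process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=ids)
    argv = [
        sys.executable, "-m", "paddle.distributed.launch",
        f"--selected_gpus={ids}", "--log_dir", log_dir,
        script, *script_args,
    ]
    return argv, env
```

To actually start training, pass the result to `subprocess.run(argv, env=env, check=True)`, with `script_args` such as `["--use_gpu=1", "--use_data_parallel=1", "--config_path=config/fastspeech.yaml"]` plus the data and teacher-checkpoint arguments.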
If you wish to resume from an existing model, please set ``--checkpoint_path`` and ``--fastspeech_step``.

For more help on arguments:

``python train.py --help``.

## Synthesis

After training FastSpeech, audio can be synthesized with ``synthesis.py``.

```bash
python synthesis.py \
    --use_gpu=1 \
    --alpha=1.0 \
    --checkpoint_path='checkpoint/' \
    --fastspeech_step=112000
```

Or you can run the script directly.

```bash
sh synthesis.sh
```

For more help on arguments:

``python synthesis.py --help``.

Binary file not shown (new image, 513 KiB).

@@ -1,6 +1,6 @@
# train model
# if you wish to resume from an exists model, uncomment --checkpoint_path and --fastspeech_step
-#CUDA_VISIBLE_DEVICES=0,1,2,3 \
+CUDA_VISIBLE_DEVICES=0\
python -u train.py \
--batch_size=32 \
--epochs=10000 \

@@ -1,4 +1,146 @@

# TransformerTTS

Paddle fluid implementation of TransformerTTS, a neural TTS model based on Transformer. The implementation is based on [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895).

We implement the TransformerTTS model in Paddle fluid with the dynamic graph mode, which is convenient for flexible network architectures.

## Installation

### Install paddlepaddle

This implementation requires the latest develop version of paddlepaddle. You can either install the compiled package or build paddlepaddle from source.

1. Install the compiled package via pip, conda or docker. See [**Installation Manuals**](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/install/index_en.html) for more details.

2. Build paddlepaddle from source. See [**Compile From Source Code**](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/install/compile/fromsource_en.html) for more details. Note that if you want to enable data parallel training on multiple GPUs, you should pass `-DWITH_DISTRIBUTE=ON` to cmake.

### Install parakeet

You can either install parakeet from PyPI or clone the repository and install it manually.

1. Install from PyPI.

```bash
pip install parakeet
```

2. Install manually.

```bash
git clone <url>
cd Parakeet/
pip install -e .
```

### Download cmudict for nltk

You also need to download the `punkt` and `cmudict` data for nltk, because text is converted into phonemes with `cmudict`.

```python
import nltk

nltk.download("punkt")
nltk.download("cmudict")
```

If you have completed all the above installations but still get the following error at runtime:

```text
OSError: sndfile library not found
```

you need to install `libsndfile` with your distribution's package manager, e.g. on Debian or Ubuntu:

```bash
sudo apt-get install libsndfile1
```

## Dataset

We experiment with the LJSpeech dataset. Download and extract [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```

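If `tar` is not available (for example on Windows), the downloaded archive can also be extracted with Python's standard `tarfile` module. `extract_archive` is a hypothetical helper name; the safer `filter="data"` option is passed only when the running Python provides it (newer Python versions do).

```python
import os
import tarfile


def extract_archive(archive: str, dest: str = ".") -> str:
    """Extract a .tar.bz2 archive into dest (like `tar xjvf`) and return dest."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive, mode="r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction: refuses members that would land outside
            # dest, device files and unsafe links.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return dest
```

For example, `extract_archive("LJSpeech-1.1.tar.bz2")` produces the same `LJSpeech-1.1/` directory as the `tar` command.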
## Model Architecture

![TransformerTTS model architecture](./images/model_architecture.jpg)

The model adopts the multi-head attention mechanism to replace both the RNN structures and the original attention mechanism in [Tacotron2](https://arxiv.org/abs/1712.05884). The model consists of two main parts: an encoder and a decoder. We also implement the CBHG model of Tacotron as the vocoder part, and convert the spectrogram into a raw waveform using the Griffin-Lim algorithm.

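To make the attention mechanism concrete, here is a minimal pure-Python sketch of scaled dot-product attention and the multi-head split, concatenate pattern built on it. It is a simplification: the learned per-head projections and the output projection of a real Transformer layer are omitted (treated as identity), and the function names are hypothetical rather than taken from this repository.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """softmax(q k^T / sqrt(d_k)) v, returning (outputs, attention weights)."""
    d_k = len(k[0])
    outputs, weights = [], []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value rows, one output feature at a time.
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return outputs, weights


def multi_head_attention(q: Matrix, k: Matrix, v: Matrix, num_heads: int) -> Matrix:
    """Split features into heads, attend per head, then concatenate the heads.

    Learned projections are omitted, so this only shows the
    split/attend/concatenate structure of multi-head attention.
    """
    d_model = len(q[0])
    if d_model % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d_model // num_heads
    concatenated: Matrix = [[] for _ in q]
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        out_h, _ = scaled_dot_product_attention(
            [row[cols] for row in q], [row[cols] for row in k], [row[cols] for row in v]
        )
        for i, row in enumerate(out_h):
            concatenated[i].extend(row)
    return concatenated
```

Unlike an RNN, every query row attends to all key rows independently, which is what allows the encoder to process a whole sequence in parallel.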
## Project Structure

```text
├── config                  # yaml configuration files
├── data.py                 # dataset and dataloader settings for LJSpeech
├── synthesis.py            # script to synthesize waveform from text
├── train_transformer.py    # script for transformer model training
├── train_vocoder.py        # script for vocoder model training
```

## Train Transformer

The TransformerTTS model can be trained with ``train_transformer.py``.

```bash
python train_transformer.py \
    --use_gpu=1 \
    --use_data_parallel=0 \
    --data_path=${DATAPATH} \
    --config_path='config/train_transformer.yaml'
```

Or you can run the script directly.

```bash
sh train_transformer.sh
```

If you want to train on multiple GPUs, you must set ``--use_data_parallel=1`` and then start training as follows:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_transformer.py \
    --use_gpu=1 \
    --use_data_parallel=1 \
    --data_path=${DATAPATH} \
    --config_path='config/train_transformer.yaml'
```

If you wish to resume from an existing model, please set ``--checkpoint_path`` and ``--transformer_step``.

For more help on arguments:

``python train_transformer.py --help``.

## Train Vocoder

The vocoder model can be trained with ``train_vocoder.py``.

```bash
python train_vocoder.py \
    --use_gpu=1 \
    --use_data_parallel=0 \
    --data_path=${DATAPATH} \
    --config_path='config/train_vocoder.yaml'
```

Or you can run the script directly.

```bash
sh train_vocoder.sh
```

If you want to train on multiple GPUs, you must set ``--use_data_parallel=1`` and then start training as follows:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train_vocoder.py \
    --use_gpu=1 \
    --use_data_parallel=1 \
    --data_path=${DATAPATH} \
    --config_path='config/train_vocoder.yaml'
```

If you wish to resume from an existing model, please set ``--checkpoint_path`` and ``--vocoder_step``.

For more help on arguments:

``python train_vocoder.py --help``.

## Synthesis

After training the TransformerTTS and vocoder models, audio can be synthesized with ``synthesis.py``.

```bash
python synthesis.py \
    --max_len=50 \
    --transformer_step=160000 \
    --vocoder_step=70000 \
    --use_gpu=1 \
    --checkpoint_path='./checkpoint' \
    --sample_path='./sample' \
    --config_path='config/synthesis.yaml'
```

Or you can run the script directly.

```bash
sh synthesis.sh
```

The synthesized audio files will be saved in ``--sample_path``.

For more help on arguments:

``python synthesis.py --help``.

Binary file not shown (new image, 322 KiB).

@@ -1,10 +1,10 @@

# train model
-#CUDA_VISIBLE_DEVICES=0,1,2,3 \
+CUDA_VISIBLE_DEVICES=0 \
python -u synthesis.py \
--max_len=50 \
--transformer_step=160000 \
---postnet_step=70000 \
+--vocoder_step=70000 \
--use_gpu=1
--checkpoint_path='./checkpoint' \
--log_dir='./log' \

@@ -15,4 +15,4 @@ if [ $? -ne 0 ]; then
echo "Failed in training!"
exit 1
fi
exit 0

@@ -1,7 +1,7 @@

# train model
# if you wish to resume from an exists model, uncomment --checkpoint_path and --transformer_step
-#CUDA_VISIBLE_DEVICES=0,1,2,3 \
+CUDA_VISIBLE_DEVICES=0 \
python -u train_transformer.py \
--batch_size=32 \
--epochs=10000 \

@@ -22,4 +22,4 @@ if [ $? -ne 0 ]; then
echo "Failed in training!"
exit 1
fi
exit 0

@@ -1,7 +1,7 @@

# train model
-# if you wish to resume from an exists model, uncomment --checkpoint_path and --transformer_step
+# if you wish to resume from an exists model, uncomment --checkpoint_path and --vocoder_step
-#CUDA_VISIBLE_DEVICES=0,1,2,3 \
+CUDA_VISIBLE_DEVICES=0 \
python -u train_vocoder.py \
--batch_size=32 \
--epochs=10000 \

@@ -21,4 +21,4 @@ if [ $? -ne 0 ]; then
echo "Failed in training!"
exit 1
fi
exit 0