fix readme of aishell3

TianYuan 2021-08-30 03:48:11 +00:00
parent 372208dd5b
commit d1163daa70
4 changed files with 20 additions and 16 deletions

@@ -1,7 +1,8 @@
 # FastSpeech2 with AISHELL-3
 ## Introduction
-AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems.
+[AISHELL-3](http://www.aishelltech.com/aishell_3) is a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems.
 We use AISHELL-3 to train a multi-speaker fastspeech2 model here.
 ## Dataset
@@ -11,7 +12,7 @@ Download AISHELL-3.
 ```bash
 wget https://www.openslr.org/resources/93/data_aishell3.tgz
 ```
-Extract AISHELL.
+Extract AISHELL-3.
 ```bash
 mkdir data_aishell3
 tar zxvf data_aishell3.tgz -C data_aishell3
@@ -20,7 +21,7 @@ tar zxvf data_aishell3.tgz -C data_aishell3
 ### Get MFA result of BZNSYP and Extract it.
 We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
-You can download from here [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model reference to [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) of our repo.
+You can download from here [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model reference to [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) (use MFA1.x now) of our repo.
 ### Preprocess the dataset.
@@ -38,7 +39,7 @@ Run the command below to preprocess the dataset.
 If you want to train fastspeech2 with cpu, please add `--device=cpu` arguments for `python3 train.py` in `run.sh`.
 ## Synthesize
 We use [parallel wavegan](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/parallelwave_gan/baker) as the neural vocoder.
-Download pretrained parallel wavegan model (Trained with baker) from [parallel_wavegan_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip) and unzip it.
+Download pretrained parallel wavegan model (Trained with baker) from [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip) and unzip it.
 ```bash
 unzip parallel_wavegan_baker_ckpt_0.4.zip
 ```
@@ -56,19 +57,23 @@ or
 You can see the bash files for more datails of input parameters.
 ## Pretrained Model
-Pretrained Model with no sil in the edge of audios can be downloaded here. [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
-Then, you can use the following scripts to synthesize for `sentences.txt` using pretrained fastspeech2 model.
+Pretrained Model with no sil in the edge of audios can be downloaded here. [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
+Then, you can use the following scripts to synthesize for `../sentences.txt` using pretrained fastspeech2 model.
 ```bash
 python3 synthesize_e2e.py \
---fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
---fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz \
---fastspeech2-stat=fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
+--fastspeech2-config=fastspeech2_nosil_aishell3_ckpt_0.4/default.yaml \
+--fastspeech2-checkpoint=fastspeech2_nosil_aishell3_ckpt_0.4/snapshot_iter_96400.pdz \
+--fastspeech2-stat=fastspeech2_nosil_aishell3_ckpt_0.4/speech_stats.npy \
 --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
 --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
 --pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
---text=sentences.txt \
+--text=../sentences.txt \
 --output-dir=exp/debug/test_e2e \
 --device="gpu" \
---phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
+--phones-dict=fastspeech2_nosil_aishell3_ckpt_0.4/phone_id_map.txt \
+--speaker-dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt
 ```
+## Future work
+A multi-speaker vocoder is needed.
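
This commit adds a `--speaker-dict` argument to the AISHELL-3 end-to-end command. The argument points at the checkpoint's `speaker_id_map.txt`. The following is a minimal sketch of how a speaker name might be resolved to its id. It assumes each line of that file is a whitespace-separated `<speaker_name> <speaker_id>` pair, which the diff does not confirm. The shell function `speaker_id` is hypothetical and is not part of the repo.

```bash
#!/bin/bash
# Hypothetical helper, not part of the repo.
# ASSUMPTION: each line of the map file is "<speaker_name> <speaker_id>".
speaker_id() {
    local map_file="$1" speaker="$2"
    # Print the second field of the first line whose first field equals the speaker name.
    # The END block turns "no match" into exit status 1.
    awk -v spk="$speaker" '$1 == spk { print $2; found = 1; exit } END { exit !found }' "$map_file"
}

# Example (the speaker name is illustrative only):
# speaker_id fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt SSB0005
```

The function returns a non-zero status when the name is absent, so a script can stop early with `|| exit 1` before starting a synthesis run.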

@@ -1,9 +1,8 @@
 #!/bin/bash
 python3 synthesize.py \
 --fastspeech2-config=conf/default.yaml \
---fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_153.pdz \
+--fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
 --fastspeech2-stat=dump/train/speech_stats.npy \
 --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
 --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
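
The synthesis scripts hard-code the checkpoint iteration (`snapshot_iter_153` before this commit, `snapshot_iter_96400` after). A hard-coded path breaks whenever training stops at a different step. One possible alternative is sketched below as a hypothetical helper, not something in the repo. It picks the `snapshot_iter_<N>.pdz` file with the largest `N` in a checkpoint directory and assumes the naming pattern seen in the paths above.

```bash
#!/bin/bash
# Hypothetical helper, not part of the repo.
# ASSUMPTION: checkpoints are named snapshot_iter_<N>.pdz with an integer N.
latest_snapshot() {
    local ckpt_dir="$1" best="" best_iter=-1 f n
    for f in "$ckpt_dir"/snapshot_iter_*.pdz; do
        [ -e "$f" ] || continue       # the glob matched nothing
        n="${f##*snapshot_iter_}"     # drop everything up to the iteration number
        n="${n%.pdz}"                 # drop the extension
        if [ "$n" -gt "$best_iter" ]; then
            best_iter="$n"
            best="$f"
        fi
    done
    # Print the best match; return non-zero if no checkpoint was found.
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Example:
# ckpt=$(latest_snapshot exp/default/checkpoints) || exit 1
# then pass --fastspeech2-checkpoint="$ckpt" to synthesize.py
```

The helper compares iteration numbers as integers, not as strings, so `snapshot_iter_96400` correctly wins over `snapshot_iter_9000`.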

@@ -3,7 +3,7 @@
 python3 synthesize_e2e.py \
 --fastspeech2-config=conf/default.yaml \
---fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_153.pdz \
+--fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
 --fastspeech2-stat=dump/train/speech_stats.npy \
 --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
 --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \

@@ -45,7 +45,7 @@ You can see the bash files for more datails of input parameters.
 ## Pretrained Model
 Pretrained Model with no sil in the edge of audios can be downloaded here. [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
-Then, you can use the following scripts to synthesize for `sentences.txt` using pretrained fastspeech2 model.
+Then, you can use the following scripts to synthesize for `../sentences.txt` using pretrained fastspeech2 model.
 ```bash
 python3 synthesize_e2e.py \
 --fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
@@ -54,7 +54,7 @@ python3 synthesize_e2e.py \
 --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
 --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
 --pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
---text=sentences.txt \
+--text=../sentences.txt \
 --output-dir=exp/debug/test_e2e \
 --device="gpu" \
 --phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
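
The README now passes `--text=../sentences.txt`. That relative path is resolved against the directory the command is run from. A quick pre-flight check can catch a wrong working directory or an unzipped checkpoint in the wrong place before synthesis starts. The helper below is a hypothetical sketch, not part of the repo. It reports every listed path that does not exist.

```bash
#!/bin/bash
# Hypothetical pre-flight check, not part of the repo.
# Relative paths resolve against the current directory, so run this from the
# same directory you run synthesize_e2e.py in.
check_inputs() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (paths taken from the command above):
# check_inputs ../sentences.txt \
#     fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
#     fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt \
#     parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams || exit 1
```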