From d1163daa707bf7c3bbc347c8e6796a261594aea5 Mon Sep 17 00:00:00 2001
From: TianYuan
Date: Mon, 30 Aug 2021 03:48:11 +0000
Subject: [PATCH] fix readme of aishell3

---
 examples/fastspeech2/aishell3/README.md     | 29 ++++++++++++--------
 examples/fastspeech2/aishell3/synthesize.sh |  3 +--
 .../fastspeech2/aishell3/synthesize_e2e.sh  |  2 +-
 examples/fastspeech2/baker/README.md        |  4 +--
 4 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/examples/fastspeech2/aishell3/README.md b/examples/fastspeech2/aishell3/README.md
index ef7b71e..eff0be7 100644
--- a/examples/fastspeech2/aishell3/README.md
+++ b/examples/fastspeech2/aishell3/README.md
@@ -1,7 +1,8 @@
+
 # FastSpeech2 with AISHELL-3
 ## Introduction
-AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems.
+[AISHELL-3](http://www.aishelltech.com/aishell_3) is a large-scale, high-fidelity multi-speaker Mandarin speech corpus that can be used to train multi-speaker Text-to-Speech (TTS) systems.
 We use AISHELL-3 to train a multi-speaker fastspeech2 model here.
 
 ## Dataset
 ### Download and Extract the dataset.
@@ -11,7 +12,7 @@ Download AISHELL-3.
 ```bash
 wget https://www.openslr.org/resources/93/data_aishell3.tgz
 ```
-Extract AISHELL.
+Extract AISHELL-3.
 ```bash
 mkdir data_aishell3
 tar zxvf data_aishell3.tgz -C data_aishell3
 ```
@@ -20,7 +21,7 @@ tar zxvf data_aishell3.tgz -C data_aishell3
 
-### Get MFA result of BZNSYP and Extract it.
+### Get MFA result of AISHELL-3 and Extract it.
 We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
-You can download from here [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model reference to [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) of our repo.
+You can download it from here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) (which uses MFA1.x for now) in our repo.
 
 ### Preprocess the dataset.
 
@@ -38,7 +39,7 @@ Run the command below to preprocess the dataset.
 If you want to train fastspeech2 with cpu, please add `--device=cpu` arguments for `python3 train.py` in `run.sh`.
 ## Synthesize
 We use [parallel wavegan](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/parallelwave_gan/baker) as the neural vocoder.
-Download pretrained parallel wavegan model (Trained with baker) from [parallel_wavegan_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip) and unzip it.
+Download the pretrained parallel wavegan model (trained with baker) from [parallel_wavegan_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip) and unzip it.
 ```bash
 unzip parallel_wavegan_baker_ckpt_0.4.zip
 ```
@@ -56,19 +57,23 @@ or
 You can see the bash files for more datails of input parameters.
 
 ## Pretrained Model
-Pretrained Model with no sil in the edge of audios can be downloaded here. [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
+A pretrained model trained with no silence at the edges of audio can be downloaded here: [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
 
-Then, you can use the following scripts to synthesize for `sentences.txt` using pretrained fastspeech2 model.
+Then you can use the following script to synthesize text in `../sentences.txt` using the pretrained fastspeech2 model.
 ```bash
 python3 synthesize_e2e.py \
-    --fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
-    --fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz \
-    --fastspeech2-stat=fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
+    --fastspeech2-config=fastspeech2_nosil_aishell3_ckpt_0.4/default.yaml \
+    --fastspeech2-checkpoint=fastspeech2_nosil_aishell3_ckpt_0.4/snapshot_iter_96400.pdz \
+    --fastspeech2-stat=fastspeech2_nosil_aishell3_ckpt_0.4/speech_stats.npy \
     --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
     --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
     --pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
-    --text=sentences.txt \
+    --text=../sentences.txt \
     --output-dir=exp/debug/test_e2e \
     --device="gpu" \
-    --phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
+    --phones-dict=fastspeech2_nosil_aishell3_ckpt_0.4/phone_id_map.txt \
+    --speaker-dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt
+
 ```
+## Future work
+A multi-speaker vocoder is needed.
diff --git a/examples/fastspeech2/aishell3/synthesize.sh b/examples/fastspeech2/aishell3/synthesize.sh
index dbde9eb..b656bc4 100755
--- a/examples/fastspeech2/aishell3/synthesize.sh
+++ b/examples/fastspeech2/aishell3/synthesize.sh
@@ -1,9 +1,8 @@
-
 #!/bin/bash
 
 python3 synthesize.py \
     --fastspeech2-config=conf/default.yaml \
-    --fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_153.pdz \
+    --fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
     --fastspeech2-stat=dump/train/speech_stats.npy \
     --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
     --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
diff --git a/examples/fastspeech2/aishell3/synthesize_e2e.sh b/examples/fastspeech2/aishell3/synthesize_e2e.sh
index af46126..86171b6 100755
--- a/examples/fastspeech2/aishell3/synthesize_e2e.sh
+++ b/examples/fastspeech2/aishell3/synthesize_e2e.sh
@@ -3,7 +3,7 @@
 
 python3 synthesize_e2e.py \
     --fastspeech2-config=conf/default.yaml \
-    --fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_153.pdz \
+    --fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
     --fastspeech2-stat=dump/train/speech_stats.npy \
     --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
     --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
diff --git a/examples/fastspeech2/baker/README.md b/examples/fastspeech2/baker/README.md
index 32bd109..646c017 100644
--- a/examples/fastspeech2/baker/README.md
+++ b/examples/fastspeech2/baker/README.md
@@ -45,7 +45,7 @@ You can see the bash files for more datails of input parameters.
 ## Pretrained Model
 Pretrained Model with no sil in the edge of audios can be downloaded here. [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
 
-Then, you can use the following scripts to synthesize for `sentences.txt` using pretrained fastspeech2 model.
+Then you can use the following script to synthesize text in `../sentences.txt` using the pretrained fastspeech2 model.
 ```bash
 python3 synthesize_e2e.py \
     --fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
@@ -54,7 +54,7 @@ python3 synthesize_e2e.py \
     --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
     --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
     --pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
-    --text=sentences.txt \
+    --text=../sentences.txt \
     --output-dir=exp/debug/test_e2e \
     --device="gpu" \
     --phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
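
For reference, the aishell3 README changes above amount to the following end-to-end session. This is a minimal sketch that only restates the wget/unzip/synthesize steps already present in the diff; it assumes you run it from `examples/fastspeech2/aishell3` on a machine with a GPU, and it is illustrative rather than part of the patch itself.

```bash
#!/bin/bash
# Sketch of the synthesis flow described in the updated aishell3 README.
# Assumed to run from examples/fastspeech2/aishell3 with a GPU available.

# Pretrained multi-speaker fastspeech2 acoustic model (AISHELL-3).
wget https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip
unzip fastspeech2_nosil_aishell3_ckpt_0.4.zip

# Pretrained parallel wavegan vocoder (trained with baker).
wget https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip
unzip parallel_wavegan_baker_ckpt_0.4.zip

# End-to-end synthesis for ../sentences.txt, as in the README's example.
python3 synthesize_e2e.py \
    --fastspeech2-config=fastspeech2_nosil_aishell3_ckpt_0.4/default.yaml \
    --fastspeech2-checkpoint=fastspeech2_nosil_aishell3_ckpt_0.4/snapshot_iter_96400.pdz \
    --fastspeech2-stat=fastspeech2_nosil_aishell3_ckpt_0.4/speech_stats.npy \
    --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
    --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
    --pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
    --text=../sentences.txt \
    --output-dir=exp/debug/test_e2e \
    --device="gpu" \
    --phones-dict=fastspeech2_nosil_aishell3_ckpt_0.4/phone_id_map.txt \
    --speaker-dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt
```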