fix readme of aishell3 (commit d1163daa70, parent 372208dd5b)
# FastSpeech2 with AISHELL-3
## Introduction
[AISHELL-3](http://www.aishelltech.com/aishell_3) is a large-scale, high-fidelity multi-speaker Mandarin speech corpus that can be used to train multi-speaker Text-to-Speech (TTS) systems.
We use AISHELL-3 to train a multi-speaker FastSpeech2 model here.
## Dataset
Download AISHELL-3.
```bash
wget https://www.openslr.org/resources/93/data_aishell3.tgz
```
Extract AISHELL-3.
```bash
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
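As a quick sanity check after extraction, you can list the unpacked corpus. The directory names in the comments below are assumptions based on the public AISHELL-3 release, not something this README specifies.

```bash
# List the unpacked corpus; the train/ and test/ split and spk-info.txt
# are assumptions based on the public AISHELL-3 release.
ls data_aishell3
# The corpus is roughly 85 hours of recordings, so expect a large footprint:
du -sh data_aishell3
```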
### Get MFA result of AISHELL-3 and Extract it.
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it from [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) (which uses MFA1.x for now) in our repo.
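For example, a minimal sketch of fetching and unpacking the alignment archive (the target directory name is an arbitrary choice, not mandated by the example):

```bash
# Download the precomputed MFA alignment and unpack it next to the corpus.
wget https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz
mkdir -p aishell3_alignment_tone   # directory name is an assumption
tar zxvf aishell3_alignment_tone.tar.gz -C aishell3_alignment_tone
```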
### Preprocess the dataset.
Run the command below to preprocess the dataset (the command itself is elided in this diff excerpt).
If you want to train FastSpeech2 on CPU, add the `--device=cpu` argument to `python3 train.py` in `run.sh`.
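A minimal sketch of that edit inside `run.sh`; everything except `--device=cpu` is omitted because this excerpt does not show which arguments `run.sh` already passes:

```bash
# In run.sh, append --device=cpu to the existing training command.
# All other train.py arguments are whatever run.sh already contains (not shown here).
python3 train.py \
    --device=cpu
```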
## Synthesize
We use [parallel wavegan](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/parallelwave_gan/baker) as the neural vocoder.
Download the pretrained parallel wavegan model (trained with baker) from [parallel_wavegan_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip) and unzip it.
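For example, fetching the archive first (a sketch; any download tool works):

```bash
# Download the pretrained vocoder archive referenced above.
wget https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip
```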
```bash
unzip parallel_wavegan_baker_ckpt_0.4.zip
```
You can see the bash files for more details of the input parameters.
## Pretrained Model
A pretrained model, trained on audio with no silence at the edges, can be downloaded here: [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip).
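For example (a sketch; the unzip step assumes the archive unpacks into a `fastspeech2_nosil_aishell3_ckpt_0.4` directory, matching the paths used below):

```bash
# Download and unpack the pretrained acoustic model.
wget https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip
unzip fastspeech2_nosil_aishell3_ckpt_0.4.zip
```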
Then, you can use the following script to synthesize the sentences in `../sentences.txt` with the pretrained FastSpeech2 model.
```bash
python3 synthesize_e2e.py \
--fastspeech2-config=fastspeech2_nosil_aishell3_ckpt_0.4/default.yaml \
--fastspeech2-checkpoint=fastspeech2_nosil_aishell3_ckpt_0.4/snapshot_iter_96400.pdz \
--fastspeech2-stat=fastspeech2_nosil_aishell3_ckpt_0.4/speech_stats.npy \
--pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
--pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
--pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
--text=../sentences.txt \
--output-dir=exp/debug/test_e2e \
--device="gpu" \
--phones-dict=fastspeech2_nosil_aishell3_ckpt_0.4/phone_id_map.txt \
--speaker-dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt
```
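The layout of `../sentences.txt` is not shown in this README. The sketch below assumes the Parakeet convention of one utterance per line, an utterance id followed by the Mandarin text; both example lines are made up.

```bash
# Hypothetical contents for ../sentences.txt; the "<utt_id> <text>" layout
# is an assumption, and both sentences are invented examples.
cat > ../sentences.txt <<'EOF'
001 欢迎使用飞桨语音合成系统
002 今天天气很好
EOF
```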
## Future work
A multi-speaker vocoder is needed.
The same commit also fixes the checkpoint iteration number in the example's two synthesis scripts. The first fragment (non-e2e synthesis; its remaining arguments are cut off in this excerpt) now reads:

```bash
#!/bin/bash

python3 synthesize.py \
--fastspeech2-config=conf/default.yaml \
--fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
--fastspeech2-stat=dump/train/speech_stats.npy \
--pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
--pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
```
The second fragment (end-to-end synthesis; likewise truncated here):

```bash
python3 synthesize_e2e.py \
--fastspeech2-config=conf/default.yaml \
--fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
--fastspeech2-stat=dump/train/speech_stats.npy \
--pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
--pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
```
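Assuming these fragments come from the example's `synthesize.sh` and `synthesize_e2e.sh` (the diff does not show the filenames, so this is an assumption), they would be invoked from the example directory:

```bash
bash synthesize.sh        # synthesize from preprocessed test features
bash synthesize_e2e.sh    # synthesize directly from text
```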
Finally, the commit applies the same `../sentences.txt` path fix to the corresponding single-speaker FastSpeech2 example (trained on baker), whose README reads, after the fix:

## Pretrained Model
A pretrained model, trained on audio with no silence at the edges, can be downloaded here: [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip).

Then, you can use the following script to synthesize the sentences in `../sentences.txt` with the pretrained FastSpeech2 model.
```bash
python3 synthesize_e2e.py \
--fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
--fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz \
--fastspeech2-stat=fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
--pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
--pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
--pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
--text=../sentences.txt \
--output-dir=exp/debug/test_e2e \
--device="gpu" \
--phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
```