fix readme of aishell3 (commit d1163daa70, parent 372208dd5b)
# FastSpeech2 with AISHELL-3
## Introduction
[AISHELL-3](http://www.aishelltech.com/aishell_3) is a large-scale, high-fidelity multi-speaker Mandarin speech corpus that can be used to train multi-speaker Text-to-Speech (TTS) systems.
We use AISHELL-3 to train a multi-speaker FastSpeech2 model here.
## Dataset
Download AISHELL-3.
```bash
wget https://www.openslr.org/resources/93/data_aishell3.tgz
```
Extract AISHELL-3.
```bash
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
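As a quick sanity check after extraction, you can list the unpacked corpus. The directory names in the comments below are assumptions based on the public AISHELL-3 release, not something this README specifies.

```bash
# List the unpacked corpus; the train/ and test/ split and spk-info.txt
# are assumptions based on the public AISHELL-3 release.
ls data_aishell3
# The corpus is roughly 85 hours of recordings, so expect a large footprint:
du -sh data_aishell3
```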
### Get MFA result of AISHELL-3 and Extract it.
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it from [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) (which uses MFA1.x for now) in our repo.
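For example, a minimal sketch of fetching and unpacking the alignment archive (the target directory name is an arbitrary choice, not mandated by the example):

```bash
# Download the precomputed MFA alignment and unpack it next to the corpus.
wget https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz
mkdir -p aishell3_alignment_tone   # directory name is an assumption
tar zxvf aishell3_alignment_tone.tar.gz -C aishell3_alignment_tone
```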
### Preprocess the dataset.
Run the command below to preprocess the dataset (the command itself is elided in this diff excerpt).
If you want to train FastSpeech2 on CPU, add the `--device=cpu` argument to `python3 train.py` in `run.sh`.
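A minimal sketch of that edit inside `run.sh`; everything except `--device=cpu` is omitted because this excerpt does not show which arguments `run.sh` already passes:

```bash
# In run.sh, append --device=cpu to the existing training command.
# All other train.py arguments are whatever run.sh already contains (not shown here).
python3 train.py \
    --device=cpu
```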
## Synthesize
We use [parallel wavegan](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/parallelwave_gan/baker) as the neural vocoder.
Download the pretrained parallel wavegan model (trained with baker) from [parallel_wavegan_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip) and unzip it.
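For example, fetching the archive first (a sketch; any download tool works):

```bash
# Download the pretrained vocoder archive referenced above.
wget https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip
```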
```bash
unzip parallel_wavegan_baker_ckpt_0.4.zip
```
You can see the bash files for more details of the input parameters.
## Pretrained Model
A pretrained model, trained on audio with no silence at the edges, can be downloaded here: [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip).
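For example (a sketch; the unzip step assumes the archive unpacks into a `fastspeech2_nosil_aishell3_ckpt_0.4` directory, matching the paths used below):

```bash
# Download and unpack the pretrained acoustic model.
wget https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip
unzip fastspeech2_nosil_aishell3_ckpt_0.4.zip
```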
Then, you can use the following script to synthesize the sentences in `../sentences.txt` with the pretrained FastSpeech2 model.
```bash
python3 synthesize_e2e.py \
--fastspeech2-config=fastspeech2_nosil_aishell3_ckpt_0.4/default.yaml \
--fastspeech2-checkpoint=fastspeech2_nosil_aishell3_ckpt_0.4/snapshot_iter_96400.pdz \
--fastspeech2-stat=fastspeech2_nosil_aishell3_ckpt_0.4/speech_stats.npy \
--pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
--pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
--pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
--text=../sentences.txt \
--output-dir=exp/debug/test_e2e \
--device="gpu" \
--phones-dict=fastspeech2_nosil_aishell3_ckpt_0.4/phone_id_map.txt \
--speaker-dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt
```
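The layout of `../sentences.txt` is not shown in this README. The sketch below assumes the Parakeet convention of one utterance per line, an utterance id followed by the Mandarin text; both example lines are made up.

```bash
# Hypothetical contents for ../sentences.txt; the "<utt_id> <text>" layout
# is an assumption, and both sentences are invented examples.
cat > ../sentences.txt <<'EOF'
001 欢迎使用飞桨语音合成系统
002 今天天气很好
EOF
```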
## Future work
A multi-speaker vocoder is needed.
The same commit also fixes the checkpoint iteration number in the example's two synthesis scripts. The first fragment (non-e2e synthesis; its remaining arguments are cut off in this excerpt) now reads:

```bash
#!/bin/bash

python3 synthesize.py \
--fastspeech2-config=conf/default.yaml \
--fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
--fastspeech2-stat=dump/train/speech_stats.npy \
--pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
--pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
```
The second fragment (end-to-end synthesis; likewise truncated here):

```bash
python3 synthesize_e2e.py \
--fastspeech2-config=conf/default.yaml \
--fastspeech2-checkpoint=exp/default/checkpoints/snapshot_iter_96400.pdz \
--fastspeech2-stat=dump/train/speech_stats.npy \
--pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
--pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
```
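Assuming these fragments come from the example's `synthesize.sh` and `synthesize_e2e.sh` (the diff does not show the filenames, so this is an assumption), they would be invoked from the example directory:

```bash
bash synthesize.sh        # synthesize from preprocessed test features
bash synthesize_e2e.sh    # synthesize directly from text
```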
Finally, the commit applies the same `../sentences.txt` path fix to the corresponding single-speaker FastSpeech2 example (trained on baker), whose README reads, after the fix:

## Pretrained Model
A pretrained model, trained on audio with no silence at the edges, can be downloaded here: [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip).

Then, you can use the following script to synthesize the sentences in `../sentences.txt` with the pretrained FastSpeech2 model.
```bash
python3 synthesize_e2e.py \
--fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
--fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz \
--fastspeech2-stat=fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
--pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
--pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
--pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
--text=../sentences.txt \
--output-dir=exp/debug/test_e2e \
--device="gpu" \
--phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
```