ParakeetEricRoss/examples/fastspeech2/baker
TianYuan 5bd64d3869 update readme 2021-09-06 12:10:01 +00:00

README.md

FastSpeech2 with the Baker dataset

This example contains code used to train a FastSpeech2 model with the Chinese Standard Mandarin Speech Corpus (CSMSC).

Dataset

Download and extract the dataset

Download CSMSC from its official website.

Get the MFA results for CSMSC and extract them

We use MFA (Montreal Forced Aligner) to get phoneme durations for FastSpeech2. You can download precomputed alignments from baker_alignment_tone.tar.gz, or train your own MFA model; see the use_mfa example in our repo.

Preprocess the dataset

Assume the path to the dataset is ~/datasets/BZNSYP and the path to the MFA results of BZNSYP is ./baker_alignment_tone. Run the command below to preprocess the dataset.

./preprocess.sh

Train the model

./run.sh

If you want to train FastSpeech2 on CPU, add the --device=cpu argument to the python3 train.py command in run.sh.
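As a sketch, the modified training invocation inside run.sh would look like the following; only --device=cpu is added here, and any other flags run.sh already passes to train.py stay unchanged (check run.sh for the actual argument list):

```shell
# Inside run.sh: append --device=cpu to the existing training command.
# All other arguments are whatever run.sh already passes.
python3 train.py --device=cpu
```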

Synthesize

We use Parallel WaveGAN as the neural vocoder. Download the pretrained Parallel WaveGAN model parallel_wavegan_baker_ckpt_0.4.zip and unzip it.

unzip parallel_wavegan_baker_ckpt_0.4.zip
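After unzipping, the parallel_wavegan_baker_ckpt_0.4/ directory should contain at least the three files that the synthesize command later in this README references (there may be additional files in the archive):

```text
parallel_wavegan_baker_ckpt_0.4/
├── pwg_default.yaml        # vocoder config
├── pwg_generator.pdparams  # generator weights
└── pwg_stats.npy           # feature normalization statistics
```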

synthesize.sh synthesizes waveforms from metadata.jsonl, while synthesize_e2e.sh synthesizes waveforms from a list of text sentences.

./synthesize.sh

or

./synthesize_e2e.sh

See the bash files for more details on the input parameters.

Pretrained Model

A pretrained model, trained on audio with no silence at the edges, can be downloaded here: fastspeech2_nosil_baker_ckpt_0.4.zip

Then you can use the following script to synthesize speech for ../sentences.txt using the pretrained FastSpeech2 model.

python3 synthesize_e2e.py \
  --fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
  --fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz \
  --fastspeech2-stat=fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
  --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
  --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
  --pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
  --text=../sentences.txt \
  --output-dir=exp/default/test_e2e \
  --device="gpu" \
  --phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt