ParakeetRebeccaRosario/examples/fastspeech2/baker
TianYuan a141d39b38 fix scripts 2021-08-03 10:10:39 +00:00

FastSpeech2 with BZNSYP

Dataset

Download and extract the dataset.

Download BZNSYP from its official website.

Get the MFA result of BZNSYP and extract it.

We use the Montreal Forced Aligner (MFA) to obtain phone durations for FastSpeech2. You can download the precomputed alignments here: baker_alignment_tone.tar.gz, or train your own MFA model by following the use_mfa example in our repo.
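To make the role of the alignments concrete, here is a minimal sketch of how per-phone time intervals could be turned into frame-level durations. It is not the code in gen_duration_from_textgrid.py; the function name, the sample rate (24000 Hz) and the hop length (300 samples) are assumptions for illustration, so check the files under conf for the values actually used.

```python
from typing import List, Tuple


def intervals_to_frame_durations(
    intervals: List[Tuple[str, float, float]],
    sample_rate: int = 24000,  # assumed value, not taken from this README
    hop_length: int = 300,     # assumed value, not taken from this README
) -> List[Tuple[str, int]]:
    """Convert (phone, start_sec, end_sec) intervals to (phone, n_frames)."""
    durations = []
    for phone, start, end in intervals:
        # Round the boundaries rather than the lengths, so that the
        # per-phone frame counts add up to the final boundary frame
        # without accumulating rounding error.
        start_frame = int(round(start * sample_rate / hop_length))
        end_frame = int(round(end * sample_rate / hop_length))
        durations.append((phone, end_frame - start_frame))
    return durations


# Made-up alignment for illustration only.
example = [("sil", 0.0, 0.25), ("k", 0.25, 0.31), ("a2", 0.31, 0.50)]
print(intervals_to_frame_durations(example))
```

Rounding boundaries instead of lengths keeps the total number of frames consistent with the length of the utterance, which matters because the durations must line up with the extracted spectrogram frames.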

Preprocess the dataset.

Assume the dataset is located at ~/datasets/BZNSYP and the MFA result of BZNSYP is located at ./baker_alignment_tone. Run the command below to preprocess the dataset.

./preprocess.sh

Train the model

./run.sh

Synthesize

We use Parallel WaveGAN as the neural vocoder. Download the pretrained Parallel WaveGAN model from parallel_wavegan_baker_ckpt_1.0.zip and unzip it.

unzip parallel_wavegan_baker_ckpt_1.0.zip

synthesize.sh synthesizes waveforms from metadata.jsonl. synthesize_e2e.sh synthesizes waveforms from a list of text sentences.

./synthesize.sh

or

./synthesize_e2e.sh

See the bash files for more details on the input parameters.
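The .jsonl extension suggests that metadata.jsonl is a JSON Lines file, that is, one JSON object per line. Assuming that is the case, a file like it can be inspected with the standard library alone. The helper name read_jsonl and the keys utt_id and num_phones in the demo are hypothetical; the real keys are whatever the preprocessing step writes.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def read_jsonl(path: Path) -> List[Dict]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo with made-up records; "utt_id" and "num_phones" are hypothetical keys.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "metadata.jsonl"
    demo_path.write_text(
        '{"utt_id": "000001", "num_phones": 12}\n'
        '{"utt_id": "000002", "num_phones": 9}\n',
        encoding="utf-8",
    )
    records = read_jsonl(demo_path)

print(len(records), records[0]["utt_id"])
```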

Pretrained Model

A pretrained model, trained on audio with the silence at the beginning and end removed, can be downloaded here: fastspeech2_nosil_baker_ckpt_1.0.zip

After unzipping it, you can use the following command to synthesize the sentences in sentences.txt with the pretrained FastSpeech2 model.

python3 synthesize_e2e.py \
  --fastspeech2-config=fastspeech2_nosil_baker_ckpt_1.0/default.yaml \
  --fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_1.0/snapshot_iter_76000.pdz \
  --fastspeech2-stat=fastspeech2_nosil_baker_ckpt_1.0/speech_stats.npy \
  --pwg-config=parallel_wavegan_baker_ckpt_1.0/pwg_default.yaml \
  --pwg-params=parallel_wavegan_baker_ckpt_1.0/pwg_generator.pdparams \
  --pwg-stat=parallel_wavegan_baker_ckpt_1.0/pwg_stats.npy \
  --text=sentences.txt \
  --output-dir=exp/debug/test_e2e \
  --device="gpu" \
  --phones=fastspeech2_nosil_baker_ckpt_1.0/phone_id_map.txt
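
The command above reads eight input files spread over two unzipped archives, so a missing or misplaced file is an easy mistake. As a small optional sketch (not part of this example's scripts; missing_files is a hypothetical helper), you could check that every path exists before starting synthesis. The paths are copied from the command and are relative to the current directory.

```python
from pathlib import Path
from typing import Iterable, List

# Input files referenced by the synthesize_e2e.py command.
REQUIRED = [
    "fastspeech2_nosil_baker_ckpt_1.0/default.yaml",
    "fastspeech2_nosil_baker_ckpt_1.0/snapshot_iter_76000.pdz",
    "fastspeech2_nosil_baker_ckpt_1.0/speech_stats.npy",
    "fastspeech2_nosil_baker_ckpt_1.0/phone_id_map.txt",
    "parallel_wavegan_baker_ckpt_1.0/pwg_default.yaml",
    "parallel_wavegan_baker_ckpt_1.0/pwg_generator.pdparams",
    "parallel_wavegan_baker_ckpt_1.0/pwg_stats.npy",
    "sentences.txt",
]


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not Path(p).is_file()]


missing = missing_files(REQUIRED)
if missing:
    print("Missing input files:")
    for p in missing:
        print("  " + p)
else:
    print("All input files found.")
```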