# FastSpeech2 with BZNSYP
## Dataset
### Download and extract the dataset
Download BZNSYP from its Official Website.
### Get the MFA result of BZNSYP and extract it
We use MFA to get durations for FastSpeech2. You can download the alignment results from here, or train your own MFA model by referring to the use_mfa example in our repo.
### Preprocess the dataset
Assume the path to the dataset is `~/datasets/BZNSYP`.
Assume the path to the MFA result of BZNSYP is `./baker_alignment_tone`.
Run the command below to preprocess the dataset.
```bash
./preprocess.sh
```
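Before moving on to training, it can help to sanity-check the preprocessing output. The layout below is an assumption (a `dump/` directory with `train`/`dev`/`test` splits, each holding a `metadata.jsonl`); check what `preprocess.sh` actually produces in your version before relying on these paths.

```bash
# Assumed layout of the preprocessing output; the mkdir/touch lines only
# mock it up for illustration -- preprocess.sh creates the real files.
mkdir -p dump/train dump/dev dump/test
touch dump/train/metadata.jsonl dump/dev/metadata.jsonl dump/test/metadata.jsonl

# Sanity check: every split should have a metadata.jsonl before training.
for split in train dev test; do
    test -f "dump/$split/metadata.jsonl" && echo "$split: ok"
done
```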
## Train the model
```bash
./run.sh
```
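On a multi-GPU machine you can pin training to a single device before launching `run.sh`. This assumes training runs on CUDA devices; `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable, not a flag specific to this example.

```bash
# Restrict the training process to GPU 0 (adjust the id for your machine).
export CUDA_VISIBLE_DEVICES=0
echo "training will see GPU(s): $CUDA_VISIBLE_DEVICES"
# then launch training as above: ./run.sh
```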
## Synthesize
We use Parallel WaveGAN as the neural vocoder.
`synthesize.sh` synthesizes waveforms from `metadata.jsonl`, while `synthesize_e2e.sh` synthesizes waveforms from a text list.
```bash
./synthesize.sh
```
or
```bash
./synthesize_e2e.sh
```
See the bash scripts for more details on the input parameters.
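For end-to-end synthesis, the text list is a plain file of sentences such as `sentences.txt`. The format sketched below (an utterance id followed by the text, one sentence per line) is an assumption; check the `sentences.txt` shipped in this directory for the exact convention before pointing `synthesize_e2e.sh` at your own file.

```bash
# Write a small text list (format assumed: "<utt_id> <text>", one per line).
cat > my_sentences.txt <<'EOF'
001 早上好，欢迎使用语音合成系统。
002 今天天气不错。
EOF
wc -l < my_sentences.txt    # one line per sentence
```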