# FastSpeech2 with the Baker dataset
This example contains code used to train a [FastSpeech2](https://arxiv.org/abs/2006.04558) model with the [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html) (CSMSC).

## Dataset

### Download and Extract the dataset
Download CSMSC from its [official website](https://test.data-baker.com/data/index/source).

### Get MFA result of CSMSC and Extract it
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get the durations for FastSpeech2.
You can download the alignment result [baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz), or train your own MFA model by referring to the [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) in our repo.
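
The download-and-extract step can be sketched as follows. This is a minimal sketch, assuming `wget` and `tar` are installed; `fetch_alignment` and `extract_alignment` are hypothetical helper names, and the layout of the extracted archive is not verified here.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_alignment and extract_alignment are hypothetical helpers,
# not part of this repo.
ALIGNMENT_URL="https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz"

# Download the archive into the current directory (needs network access).
# -c resumes a partially downloaded file instead of starting over.
fetch_alignment() {
    wget -c "$ALIGNMENT_URL"
}

# Extract a .tar.gz archive ($1) into a destination directory ($2, default: .).
extract_alignment() {
    local archive="$1"
    local dest="${2:-.}"
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}

# Typical use (uncomment to run):
# fetch_alignment && extract_alignment baker_alignment_tone.tar.gz .
```

Keeping the two steps in separate functions lets you rerun the extraction without downloading the archive again.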

### Preprocess the dataset
Assume the path to the dataset is `~/datasets/BZNSYP`.
Assume the path to the MFA result of BZNSYP is `./baker_alignment_tone`.
Run the command below to preprocess the dataset.

```bash
./preprocess.sh
```
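
Before preprocessing, it may help to confirm that both assumed paths exist. The sketch below is illustrative only: `require_dir` is a hypothetical helper and is not part of `preprocess.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: require_dir is a hypothetical helper, not part of preprocess.sh.
# Return non-zero and print a message to stderr if directory $1 does not exist.
require_dir() {
    if [ ! -d "$1" ]; then
        echo "missing directory: $1" >&2
        return 1
    fi
}

# Typical use with the paths assumed above (uncomment to run):
# require_dir ~/datasets/BZNSYP && require_dir ./baker_alignment_tone && ./preprocess.sh
```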

## Train the model
```bash
./run.sh
```
To train FastSpeech2 on CPU, add the `--device=cpu` argument to the `python3 train.py` command in `run.sh`.

## Synthesize
We use [Parallel WaveGAN](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/parallelwave_gan/baker) as the neural vocoder.
Download the pretrained Parallel WaveGAN model [parallel_wavegan_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip) and unzip it.
```bash
unzip parallel_wavegan_baker_ckpt_0.4.zip
```
`synthesize.sh` synthesizes waveforms from `metadata.jsonl`.
`synthesize_e2e.sh` synthesizes waveforms from a text list.

```bash
./synthesize.sh
```
or
```bash
./synthesize_e2e.sh
```

See the bash scripts for more details on the input parameters.

## Pretrained Model
A pretrained model trained on audio with no silence at the edges can be downloaded here: [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)

Then, you can use the following script to synthesize speech for `../sentences.txt` with the pretrained FastSpeech2 model.
```bash
python3 synthesize_e2e.py \
  --fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
  --fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz \
  --fastspeech2-stat=fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
  --pwg-config=parallel_wavegan_baker_ckpt_0.4/pwg_default.yaml \
  --pwg-params=parallel_wavegan_baker_ckpt_0.4/pwg_generator.pdparams \
  --pwg-stat=parallel_wavegan_baker_ckpt_0.4/pwg_stats.npy \
  --text=../sentences.txt \
  --output-dir=exp/default/test_e2e \
  --device="gpu" \
  --phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
```
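
The command above expects several files from the two unzipped archives. As a hedged sketch, the check below lists any that are missing before you run it; `list_missing` is a hypothetical helper, and the file names are taken only from the command above, not from inspecting the archives.

```shell
#!/usr/bin/env bash
# Sketch only: list_missing is a hypothetical helper, not part of this repo.
# Print the path of every file name (arguments 2..n) that does not exist
# under directory $1. Prints nothing when all files are present.
list_missing() {
    local dir="$1"
    shift
    local name
    for name in "$@"; do
        [ -f "$dir/$name" ] || echo "$dir/$name"
    done
}

# Typical use (uncomment to run):
# list_missing fastspeech2_nosil_baker_ckpt_0.4 \
#     default.yaml snapshot_iter_76000.pdz speech_stats.npy phone_id_map.txt
# list_missing parallel_wavegan_baker_ckpt_0.4 \
#     pwg_default.yaml pwg_generator.pdparams pwg_stats.npy
```

An empty output means every file referenced by the command was found.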