This example contains code used to train a [Parallel WaveGAN](http://arxiv.org/abs/1910.11480) model with the [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html).
Download the dataset from the [official website of data-baker](https://www.data-baker.com/data/index/source) and extract it to `~/datasets`. The dataset is then located in `~/datasets/BZNSYP`.
The dataset is split into 3 parts, namely `train`, `dev` and `test`, each of which contains a `norm` and a `raw` subfolder. The `raw` folder contains the log-magnitude mel spectrogram of each utterance, while the `norm` folder contains the normalized spectrogram. The statistics used to normalize the spectrograms are computed from the training set and stored in `dump/train/stats.npy`.
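
The normalization described above can be sketched as follows. This is a minimal illustration, not the preprocessing script itself: it assumes the statistics are a per-mel-bin mean and standard deviation stacked into a `(2, n_mels)` array, which may differ from the actual layout of `stats.npy`. The helper names `compute_stats` and `normalize` are hypothetical, and random data stands in for real spectrograms.

```python
import numpy as np


def compute_stats(features):
    """Stack per-bin mean and std over all training frames into a (2, n_mels) array.

    Assumption: this layout is illustrative; check the real stats.npy before relying on it.
    """
    stacked = np.concatenate(features, axis=0)  # (total_frames, n_mels)
    return np.stack([stacked.mean(axis=0), stacked.std(axis=0)], axis=0)


def normalize(logmel, stats):
    """Apply z-score normalization to one utterance of shape (frames, n_mels)."""
    mean, std = stats  # each of shape (n_mels,)
    return (logmel - mean) / std


# Synthetic stand-ins for the `raw` log-mel features of three training utterances.
rng = np.random.default_rng(0)
train = [rng.normal(loc=-4.0, scale=2.0, size=(n, 80)) for n in (120, 95, 143)]

stats = compute_stats(train)         # computed from the training set only
normed = normalize(train[0], stats)  # the same stats would be reused for dev and test
```

Computing the statistics on the training set alone and reusing them for `dev` and `test` keeps all three splits on the same scale without leaking information from held-out data.
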
4. `--device` is the type of device on which to run the experiment; 'cpu' and 'gpu' are supported.
5. `--nprocs` is the number of processes to run in parallel. Note that `nprocs > 1` is only supported when `--device` is 'gpu'.
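
The constraint between `--device` and `--nprocs` can be expressed as a small argument check. The sketch below is hypothetical and is not taken from the training script: the parser only declares these two flags, the default values are assumptions, and `check_args` is an illustrative helper name.

```python
import argparse


def build_parser():
    """Declare only the two flags discussed above (hypothetical, simplified parser)."""
    parser = argparse.ArgumentParser(description="sketch of the device options")
    parser.add_argument("--device", type=str, default="gpu", choices=["cpu", "gpu"])
    parser.add_argument("--nprocs", type=int, default=1)
    return parser


def check_args(args):
    """Reject combinations that the text above says are unsupported."""
    if args.nprocs < 1:
        raise ValueError("--nprocs must be at least 1")
    if args.nprocs > 1 and args.device != "gpu":
        raise ValueError("--nprocs > 1 is only supported when --device is 'gpu'")
    return args


# Passing an explicit list keeps the example independent of sys.argv.
args = check_args(build_parser().parse_args(["--device", "gpu", "--nprocs", "2"]))
```

With `["--device", "cpu", "--nprocs", "2"]` the same call raises `ValueError`, which fails early instead of starting a multi-process run on CPU.
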
## Pretrained Models
Pretrained models can be downloaded here:
1. Parallel WaveGAN checkpoint. [pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip), which is used as a vocoder in the end-to-end inference script.
```text
  --config CONFIG       config file to overwrite default config
  --checkpoint CHECKPOINT
                        snapshot to load
  --test-metadata TEST_METADATA
                        dev data
  --output-dir OUTPUT_DIR
                        output dir
  --device DEVICE       device to run
  --verbose VERBOSE     verbose
```
1. `--config` is the extra configuration file used to overwrite the default config. Use the same config with which the model was trained.
2. `--checkpoint` is the checkpoint to load. Pick one of the checkpoints from `/checkpoints` inside the training output directory. If you use the pretrained model, use `pwg_snapshot_iter_400000.pdz`.
3. `--test-metadata` is the metadata of the test dataset. Use the `metadata.jsonl` in the `dev/norm` subfolder of the processed directory.
4. `--output-dir` is the directory in which to save the synthesized audio files.
5. `--device` is the type of device on which to run synthesis; 'cpu' and 'gpu' are supported.
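
Before running synthesis it can help to inspect the metadata file passed to `--test-metadata`. The sketch below assumes only that `metadata.jsonl` follows the JSON Lines convention (one JSON object per line). The field names `utt_id` and `feats` and the sample paths are hypothetical placeholders, and `read_jsonl` is an illustrative helper rather than part of the example's code.

```python
import io
import json


def read_jsonl(stream):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    records = []
    for line in stream:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


# In-memory stand-in for a metadata.jsonl file; field names are hypothetical.
sample = io.StringIO(
    '{"utt_id": "utt_a", "feats": "path/to/utt_a.npy"}\n'
    '{"utt_id": "utt_b", "feats": "path/to/utt_b.npy"}\n'
)

records = read_jsonl(sample)
utt_ids = [record["utt_id"] for record in records]
```

For a real file, replace the `io.StringIO` object with `open(path, encoding="utf-8")` and print the keys of the first record to see which fields are actually present.
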