WaveNet

A Paddle dynamic-graph implementation of WaveNet, a convolutional vocoder. WaveNet was proposed in WaveNet: A Generative Model for Raw Audio, but in this experiment the implementation follows the teacher model in ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech.

Dataset

We experiment with the LJSpeech dataset. Download and extract it.

wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
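
After extraction you should have an LJSpeech-1.1 folder containing metadata.csv and a wavs/ directory. Below is a quick sanity check, a minimal sketch that assumes only the standard LJSpeech-1.1 layout:

import csv
from pathlib import Path

root = Path("LJSpeech-1.1")
# metadata.csv is pipe-delimited: file id | raw transcript | normalized transcript
with open(root / "metadata.csv", encoding="utf-8") as f:
    rows = list(csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE))
print(len(rows), "utterances")  # LJSpeech-1.1 contains 13100 clips
print((root / "wavs" / (rows[0][0] + ".wav")).exists())  # True if the layout is intact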

Project Structure

├── data.py          data processing
├── configs/         example configuration files
├── synthesis.py     script to synthesize waveforms from mel spectrograms
├── train.py         script to train a model
└── utils.py         utility functions

Train

Train the model using train.py. For usage, run python train.py --help.

usage: train.py [-h] [--data DATA] [--config CONFIG] [--output OUTPUT]
                [--device DEVICE] [--resume RESUME]

Train a wavenet model with LJSpeech.

optional arguments:
  -h, --help       show this help message and exit
  --data DATA      path of the LJspeech dataset.
  --config CONFIG  path of the config file.
  --output OUTPUT  path to save results.
  --device DEVICE  device to use.
  --resume RESUME  checkpoint to resume from.
  1. --config is the configuration file to use. The provided configurations can be used directly, or you can copy one, change some values, and train the model with a different configuration (see the YAML-loading sketch after this list).
  2. --data is the path of the LJSpeech dataset, i.e. the folder extracted from the downloaded archive (the folder which contains metadata.csv).
  3. --resume is the path of a checkpoint. If it is provided, the model loads the checkpoint before training.
  4. --output is the directory to save results; all results are saved in this directory. The structure of the output directory is shown below.
├── checkpoints      # checkpoints
└── log              # tensorboard log
  5. --device is the device (gpu id) to use for training. -1 means CPU.
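
The provided configurations are YAML files. To inspect or tweak one programmatically, here is a minimal sketch; it assumes only that the file is plain YAML, as the extension suggests, and the actual keys are defined by this repo:

import yaml  # pip install pyyaml

with open("configs/wavenet_single_gaussian.yaml") as f:
    config = yaml.safe_load(f)
print(config)  # a nested structure of hyperparameters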

example script:

python train.py --config=./configs/wavenet_single_gaussian.yaml --data=./LJSpeech-1.1/ --output=experiment --device=0
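
To resume an interrupted run, add --resume. The checkpoint path below is hypothetical; use whatever actually appears under experiment/checkpoints (the Synthesis example suggests names like step_500000).

python train.py --config=./configs/wavenet_single_gaussian.yaml --data=./LJSpeech-1.1/ --output=experiment --device=0 --resume=experiment/checkpoints/step_100000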

You can monitor the training log with TensorBoard using the script below.

cd experiment/log
tensorboard --logdir=.

Synthesis

Synthesize waveforms from mel spectrograms using synthesis.py. For usage, run python synthesis.py --help.

usage: synthesis.py [-h] [--data DATA] [--config CONFIG] [--device DEVICE]
                    checkpoint output

Synthesize valid data from LJspeech with a wavenet model.

positional arguments:
  checkpoint       checkpoint to load.
  output           path to save results.

optional arguments:
  -h, --help       show this help message and exit
  --data DATA      path of the LJspeech dataset.
  --config CONFIG  path of the config file.
  --device DEVICE  device to use.
  1. --config is the configuration file to use. You should use the same configuration with which you trained your model.
  2. --data is the path of the LJSpeech dataset. A dataset is not strictly needed for synthesis, but since the input is mel spectrogram, the mel spectrograms are computed from the dataset's audio files (see the sketch after this list).
  3. checkpoint is the checkpoint to load.
  4. output is the directory to save results; it will contain the generated audio files (*.wav).
  5. --device is the device (gpu id) to use for synthesis. -1 means CPU.
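
For reference, this is roughly what extracting a mel spectrogram from an LJSpeech audio file looks like. The repo computes these features itself in data.py with parameters from the config file; the values below (sample rate, FFT size, hop length, mel-band count) are common defaults, not necessarily the repo's:

import librosa
import numpy as np

# Illustrative only: the exact feature parameters used by this example
# live in data.py and the YAML config, not here.
wav, sr = librosa.load("LJSpeech-1.1/wavs/LJ001-0001.wav", sr=22050)
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=1024, hop_length=256, n_mels=80)
log_mel = np.log(np.clip(mel, 1e-5, None))  # log compression with a floor
print(log_mel.shape)  # (n_mels, frames)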

example script:

python synthesis.py --config=./configs/wavenet_single_gaussian.yaml --data=./LJSpeech-1.1/ --device=0 experiment/checkpoints/step_500000 generated
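
After synthesis finishes, the output directory holds the generated waveforms. Here is a minimal sketch for inspecting them with soundfile (the file naming inside generated/ is an assumption; list the directory to see the actual names):

import soundfile as sf  # pip install soundfile
from pathlib import Path

for path in sorted(Path("generated").glob("*.wav")):
    wav, sr = sf.read(str(path))
    print(path.name, sr, f"{len(wav) / sr:.2f}s")  # name, sample rate, duration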