note on conv queue

This commit is contained in:
Kexin Zhao 2020-03-09 23:32:22 -07:00
parent d16abc4952
commit c7176a8712
2 changed files with 10 additions and 2 deletions


@@ -22,11 +22,13 @@ PaddlePaddle dynamic graph implementation of [WaveFlow: A Compact Flow-based Mod
There are many hyperparameters to be tuned depending on the specification of the model and dataset you are working on.
We provide `wavenet_ljspeech.yaml` as a hyperparameter set that works well on the LJSpeech dataset.
Note that we use a [convolutional queue](https://arxiv.org/abs/1611.09482) during audio synthesis to cache the intermediate hidden states, which speeds up the autoregressive inference over the height dimension. The current implementation only supports a height dimension of 8 or 16, i.e., where there is no dilation on the height dimension. Therefore, you can only set the value of the `n_group` key in the yaml config file to either 8 or 16.
Also note that `train.py`, `synthesis.py`, and `benchmark.py` all accept a `--config` parameter. To ensure consistency, you should use the same config yaml file for training, synthesizing, and benchmarking. You can also overwrite these preset hyperparameters on the command line by updating parameters after `--config`.
For example, `--config=${yaml} --batch_size=8` overwrites the corresponding hyperparameter in the `${yaml}` config file. For more details about these hyperparameters, check `utils.add_config_options_to_parser`.
Additionally, you need to specify some parameters for `train.py`, `synthesis.py`, and `benchmark.py`; the details can be found in `train.add_options_to_parser`, `synthesis.add_options_to_parser`, and `benchmark.add_options_to_parser`, respectively.
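The convolutional-queue idea described above can be sketched in plain Python. This is a minimal, hypothetical illustration (it is not the repository's actual implementation, and `ConvQueueLayer` is an invented name): each layer keeps a FIFO of its most recent inputs so that, at every autoregressive step, a causal convolution reads cached values instead of recomputing the whole receptive field.

```python
from collections import deque

import numpy as np


class ConvQueueLayer:
    """Caches the last kernel_size - 1 inputs so each autoregressive
    step only convolves the newest sample plus the cached ones."""

    def __init__(self, kernel_size, weights):
        self.weights = np.asarray(weights)  # shape: (kernel_size,)
        # The queue holds the previous kernel_size - 1 inputs, zero-initialized.
        self.queue = deque([0.0] * (kernel_size - 1), maxlen=kernel_size - 1)

    def step(self, x):
        # Window = cached past inputs + the current input.
        window = np.array(list(self.queue) + [x])
        y = float(window @ self.weights)  # causal convolution at one time step
        self.queue.append(x)              # slide the queue forward by one step
        return y


# Usage: a kernel of size 3 whose weights simply pass the newest input through.
layer = ConvQueueLayer(kernel_size=3, weights=[0.0, 0.0, 1.0])
outputs = [layer.step(x) for x in [1.0, 2.0, 3.0]]
# With these weights the layer echoes its input: [1.0, 2.0, 3.0]
```

In WaveFlow the same bookkeeping is done per hidden channel along the height dimension, which is why the queue-based inference only supports the fixed `n_group` sizes without dilation.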
### Dataset


@@ -391,6 +391,12 @@ class WaveFlowModule(dg.Layer):
These hidden states along with initial random gaussian latent variable
are passed to a stack of Flow modules to obtain the audio output.
Note that we use a convolutional queue (https://arxiv.org/abs/1611.09482)
to cache the intermediate hidden states, which speeds up the
autoregressive inference over the height dimension. The current
implementation only supports a height dimension (self.n_group) of
8 or 16, i.e., where there is no dilation on the height dimension.
Args:
mel (obj): mel spectrograms.
sigma (float, optional): standard deviation of the gaussian latent