Merge branch 'develop' into release/v0.2

2021-01-19 17:08:06 +08:00 · 2021-01-19 17:08:06 +08:00 · 5eebbd0716
parent 9f256e325c a0ce65211c
commit 5eebbd0716
32 changed files with 603 additions and 257 deletions
--- a/.readthedocs.yml
+++ b/.readthedocs.yml
@@ -0,0 +1,30 @@
+# .readthedocs.yml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Build documentation in the docs/ directory with Sphinx
+sphinx:
+  configuration: docs/source/conf.py
+
+# Build documentation with MkDocs
+#mkdocs:
+#  configuration: mkdocs.yml
+
+# Optionally build your docs in additional formats such as PDF
+formats: []
+
+# Optionally set the version of Python and requirements required to build your docs
+python:
+  version: 3.7
+  install:
+    - method: pip
+      path: .
+      extra_requirements:
+        - doc
+    
+    - requirements: docs/requirements.txt
+
+
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on PaddlePaddle Fluid dynamic graph and includes many influential TTS models proposed by [Baidu Research](http://research.baidu.com) and other research groups.  

 <div align="center">
-  <img src="images/logo.png" width=450 /> <br>
+  <img src="images/logo.png" width=300 /> <br>
 </div>

 In particular, it features the latest [WaveFlow](https://arxiv.org/abs/1912.01219) model proposed by Baidu Research.
@@ -18,17 +18,15 @@ In order to facilitate exploiting the existing TTS models directly and developin

 - Vocoders
  - [WaveFlow: A Compact Flow-based Model for Raw Audio](https://arxiv.org/abs/1912.01219)
-  - [ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech](https://arxiv.org/abs/1807.07281)
  - [WaveNet: A Generative Model for Raw Audio](https://arxiv.org/abs/1609.03499)

 - TTS models
-  - [Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654)
  - [Neural Speech Synthesis with Transformer Network (Transformer TTS)](https://arxiv.org/abs/1809.08895)
-  - [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263)
+  - [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
+

 And more will be added in the future.

-See the [guide](docs/experiment_guide.md) for details about how to build your own model and experiment in Parakeet.

 ## Setup

@@ -40,221 +38,37 @@ sudo apt-get install libsndfile1

 ### Install PaddlePaddle

-See [install](https://www.paddlepaddle.org.cn/install/quick) for more details. This repo requires PaddlePaddle **1.8.2** or above.
+See [install](https://www.paddlepaddle.org.cn/install/quick) for more details. This repo requires PaddlePaddle **2.0.0rc1** or above.

 ### Install Parakeet
+```bash
+pip install -U paddle-parakeet
+```

+or 
 ```bash
 git clone https://github.com/PaddlePaddle/Parakeet
 cd Parakeet
 pip install -e .
 ```

-### Install CMUdict for nltk
-
-CMUdict from nltk is used to transform text into phonemes.
-
-```python
-import nltk
-nltk.download("punkt")
-nltk.download("cmudict")
-```
+See [install](https://paddle-parakeet.readthedocs.io/en/latest/install.html) for more details.

 ## Examples

 Entries to the introduction, and the launch of training and synthesis for different example models:

 - [>>> WaveFlow](./examples/waveflow)
-- [>>> Clarinet](./examples/clarinet)
 - [>>> WaveNet](./examples/wavenet)
-- [>>> Deep Voice 3](./examples/deepvoice3)
 - [>>> Transformer TTS](./examples/transformer_tts)
-- [>>> FastSpeech](./examples/fastspeech)
+- [>>> Tacotron2](./examples/tacotron2)


-## Pre-trained models and audio samples
+## Audio samples

-Parakeet also releases some well-trained parameters for the example models, which can be accessed in the following tables. Each column of these tables lists resources for one model, including the url link to the pre-trained model, the dataset that the model is trained on, and synthesized audio samples based on the pre-trained model. Click each model name to download, then you can get the compressed package which contains the pre-trained model and the `yaml` config describing how the model is trained.
+### TTS models (Acoustic Model + Neural Vocoder)

-#### Vocoders
-
-We provide the model checkpoints of WaveFlow with 64, 96 and 128 residual channels, ClariNet and WaveNet.
-
-<div align="center">
-<table>
-    <thead>
-        <tr>
-            <th  style="width: 250px">
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_ckpt_1.0.zip">WaveFlow (res. channels 64)</a>
-            </th>
-            <th  style="width: 250px">
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_ckpt_1.0.zip">WaveFlow (res. channels 96)</a>
-            </th>
-            <th  style="width: 250px">
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_ckpt_1.0.zip">WaveFlow (res. channels 128)</a>
-            </th>
-        </tr>
-    </thead>
-    <tbody>
-        <tr>
-            <th>LJSpeech </th>
-            <th>LJSpeech </th>
-            <th>LJSpeech </th>
-        </tr>
-        <tr>
-            <th>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_0.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_1.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_2.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_3.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_4.wav">
-            <img src="images/audio_icon.png" width=250 /></a>
-            </th>
-            <th>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_0.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_1.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_2.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_3.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_4.wav">
-            <img src="images/audio_icon.png" width=250 /></a>
-            </th>
-            <th>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_0.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_1.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_2.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_3.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_4.wav">
-            <img src="images/audio_icon.png" width=250 /></a>
-            </th>
-        </tr>
-    </tbody>
-    <thead>
-        <tr>
-            <th  style="width: 250px">
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_ckpt_1.0.zip">ClariNet</a>
-            </th>
-            <th  style="width: 250px">
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_ckpt_1.0.zip">WaveNet</a>
-            </th>
-        </tr>
-    </thead>
-    <tbody>
-        <tr>
-            <th>LJSpeech </th>
-            <th>LJSpeech </th>
-        </tr>
-        <tr>
-            <th>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_0.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_1.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_2.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_3.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_4.wav">
-            <img src="images/audio_icon.png" width=250 /></a>  
-            </th>
-            <th>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_0.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_1.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_2.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_3.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_4.wav">
-            <img src="images/audio_icon.png" width=250 /></a>  
-            </th>
-        </tr>
-    </tbody>
-</table>
-</div>
-
-
-&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**Note:** The input mel spectrogams are from validation dataset, which are not seen during training.
-
-#### TTS models
-
-We also provide checkpoints for different end-to-end TTS models, and present the synthesized audio examples for some randomly chosen famous quotes. The corresponding texts are displayed as follows.
-
-||Text | From |
-|:-:|:-- | :--: |
-0|*Life was like a box of chocolates, you never know what you're gonna get.* | *Forrest Gump* |  
-1|*With great power there must come great responsibility.* | *Spider-Man*|
-2|*To be or not to be, that’s a question.*|*Hamlet*|
-3|*Death is just a part of life, something we're all destined to do.*| *Forrest Gump*|
-4|*Don’t argue with the people of strong determination, because they may change the fact!*| *William Shakespeare* |
-
-Users have the option to use different vocoders to convert the linear/mel spectrogam to the raw audio in TTS models. Taking this into account, we are going to release the checkpoints for TTS models adapted to different vocoders, including the [Griffin-Lim](https://ieeexplore.ieee.org/document/1164317) algorithm and some neural vocoders.
-
-##### 1) Griffin-Lim
-
-<div align="center">
-<table>
-    <thead>
-        <tr>
-            <th  style="width: 250px">
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_1.0.zip">Transformer TTS</a>
-            </th>
-            <th  style="width: 250px">
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_ckpt_1.0.zip">FastSpeech</a>
-            </th>
-                    </tr>
-    </thead>
-    <tbody>
-        <tr>
-            <th>LJSpeech </th>
-            <th>LJSpeech </th>
-        </tr>
-        <tr>
-            <th >
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_0.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_1.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_2.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_3.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_4.wav">
-            <img src="images/audio_icon.png" width=250 /></a>
-            </th>
-            <th >
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_0.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_1.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_2.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_3.wav">
-            <img src="images/audio_icon.png" width=250 /></a><br>
-            <a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_4.wav">
-            <img src="images/audio_icon.png" width=250 /></a>
-            </th>
-        </tr>
-    </tbody>
-    <thead>
-</table>
-</div>
-
-##### 2) Neural vocoders
-
-under preparation
+Check our [website](https://paddle-parakeet.readthedocs.io/en/latest/demo.html) for audio samples.

 ## Copyright and License

--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -1,20 +0,0 @@
-.. parakeet documentation master file, created by
-   sphinx-quickstart on Thu Dec 17 20:01:34 2020.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
-
-Welcome to parakeet's documentation!
-====================================
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Contents:
-
-
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
--- a/docs/Makefile
+++ b/docs/Makefile
--- a/docs/make.bat
+++ b/docs/make.bat
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -0,0 +1 @@
+paddlepaddle==2.0.0.rc1
--- a/docs/source/advanced.rst
+++ b/docs/source/advanced.rst
@@ -0,0 +1,155 @@
+======================
+Advanced Usage
+======================
+
+This section covers how to extend parakeet by implementing your own models and 
+experiments. Guidelines on implementation are also provided.
+
+Model
+-------------
+
+As a common practice with paddlepaddle, models are implemented as subclasses
+of ``paddle.nn.Layer``. Models could be simple, like a single layer RNN. For 
+complicated models, it is recommended to split the model into different 
+components.
+
+For an encoder-decoder model, it is natural to split it into the encoder and 
+the decoder. For a model composed of several similar layers, it is natural to 
+extract the sublayer as a separate layer.
+
+There are two common ways to define a model which consists of several modules.
+
+#. Define a module given the specifications. Here is an example with a multilayer 
+   perceptron.
+
+   .. code-block:: python
+
+      import paddle
+      from paddle import nn
+
+      class MLP(nn.Layer):
+          def __init__(self, input_size, hidden_size, output_size):
+              super().__init__()  # required before registering sublayers
+              self.linear1 = nn.Linear(input_size, hidden_size)
+              self.linear2 = nn.Linear(hidden_size, output_size)
+
+          def forward(self, x):
+              # one hidden layer with a tanh activation
+              return self.linear2(paddle.tanh(self.linear1(x)))
+
+      module = MLP(16, 32, 4)  # initialize a module
+
+   When the module is intended to be a generic and reusable layer that can be 
+   integrated into a larger model, we prefer to define it in this way.
+
+   For considerations of readability and usability, we strongly recommend 
+   **NOT** packing specifications into a single object, as in the example below.
+
+   .. code-block:: python
+
+      class MLP(nn.Layer):
+          def __init__(self, hparams):
+              super().__init__()  # required before registering sublayers
+              # all sizes are hidden inside ``hparams``
+              self.linear1 = nn.Linear(hparams.input_size, hparams.hidden_size)
+              self.linear2 = nn.Linear(hparams.hidden_size, hparams.output_size)
+
+          def forward(self, x):
+              return self.linear2(paddle.tanh(self.linear1(x)))
+
+   For a module defined in this way, it's harder for the user to initialize an 
+   instance. Users have to read the code to check what attributes are used.
+
+   Also, code in this style tends to be abused by passing a huge config object 
+   to initialize every module used in an experiment, even though each module may 
+   not need the whole configuration.
+   
+   We prefer to be explicit.
+
+#. Define a module as a combination given its components. Here is an example 
+   for a sequence-to-sequence model.
+
+   .. code-block:: python
+   
+      class Seq2Seq(nn.Layer):
+          def __init__(self, encoder, decoder):
+              super().__init__()  # required before registering sublayers
+              self.encoder = encoder
+              self.decoder = decoder
+              
+          def forward(self, x):
+              encoder_output = self.encoder(x)
+              output = self.decoder(encoder_output)
+              return output
+      
+      encoder = Encoder(...)
+      decoder = Decoder(...)
+      model = Seq2Seq(encoder, decoder) # compose two components
+
+   When a model is complicated and made up of several components, each of which 
+   has a separate functionality and can be replaced by other components with the 
+   same functionality, we prefer to define it in this way.
+
+Data
+-------------
+
+Another critical component of a deep learning project is data. As a common 
+practice, we use the dataset and dataloader abstractions.
+
+Dataset
+^^^^^^^^^^
+A dataset represents the set of examples used by a project. In most 
+cases, it is a collection of examples. A dataset is an object with the 
+methods below.
+
+#. ``__len__``, to get the size of the dataset.
+#. ``__getitem__``, to get an example by key or index.
+
+An example is a record consisting of several fields. In practice, we usually 
+represent it as a namedtuple for convenience, yet dicts and user-defined objects 
+are also supported.
+
+We define our own dataset by subclassing ``paddle.io.Dataset``.
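The two-method interface described above can be sketched without Paddle. This is a minimal illustration, not Parakeet's own code: the `Example` record, its field names, and `ToyDataset` are hypothetical, and a real dataset would subclass `paddle.io.Dataset` as noted above.

```python
from collections import namedtuple

# Hypothetical record type: one example with two fields.
Example = namedtuple("Example", ["text", "mel_frames"])


class ToyDataset:
    """In-memory dataset exposing the two methods described above.

    In a Paddle project this class would subclass ``paddle.io.Dataset``;
    a plain class is used here so the sketch runs without Paddle installed.
    """

    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        # size of the dataset
        return len(self._records)

    def __getitem__(self, index):
        # fetch one example by integer index
        return self._records[index]


dataset = ToyDataset([
    Example(text="hello", mel_frames=120),
    Example(text="good morning", mel_frames=310),
])
```

Because each example is a namedtuple, fields can be read by name (`dataset[1].text`) or unpacked positionally.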
+
+DataLoader
+^^^^^^^^^^^
+In deep learning practice, models are trained with minibatches. DataLoader 
+meets the need for iterating the dataset in batches. It is done by providing 
+a sampler and a batch function in addition to a dataset.
+
+#. sampler, sample indices or keys used to get examples from the dataset.
+#. batch function, transform a list of examples into a batch.
+
+A commonly used sampler is ``RandomSampler``, which shuffles all the valid 
+indices and then iterates over them sequentially. ``DistributedBatchSampler`` is 
+a sampler used for distributed data parallel training, where the sampler handles 
+data sharding in a dynamic way.
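The shuffle-then-iterate behaviour described for a random sampler can be sketched in plain Python. The class below is a hypothetical stand-in written for illustration; it is not the Paddle `RandomSampler` implementation and only mirrors the behaviour described in the text.

```python
import random


class ShuffledIndexSampler:
    """Yields every valid index exactly once per epoch, in random order."""

    def __init__(self, dataset_size, seed=None):
        self.dataset_size = dataset_size
        self._rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(self.dataset_size))
        self._rng.shuffle(indices)  # shuffle all the valid indices
        return iter(indices)        # then iterate over them sequentially

    def __len__(self):
        return self.dataset_size


sampler = ShuffledIndexSampler(5, seed=0)
epoch_indices = list(sampler)
```

Each call to `iter(sampler)` reshuffles, so successive epochs generally visit the examples in a different order.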
+
+The batch function transforms the selected examples into a batch. In the simple 
+case where an example is composed of several fields, each of which is represented 
+by a fixed-size array, the batch function can simply stack each field. When 
+variable-size arrays are included in the example, batching may 
+involve padding and stacking. In theory, a batch function can do more, such as 
+random slicing.
+
+For a custom dataset used for a custom model, it is required to define a batch 
+function for it.
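A batch function that pads and stacks might look like the sketch below. It assumes each example is a `(token_ids, speaker_id)` pair; that layout and the function names are made up for illustration and do not come from Parakeet.

```python
import numpy as np


def pad_and_stack(sequences, pad_value=0):
    """Pad 1-D arrays of different lengths to a common length, then stack."""
    max_len = max(len(seq) for seq in sequences)
    batch = np.full((len(sequences), max_len), pad_value, dtype=sequences[0].dtype)
    for row, seq in enumerate(sequences):
        batch[row, :len(seq)] = seq
    return batch


def batch_examples(examples):
    """Turn a list of (token_ids, speaker_id) examples into one batch."""
    token_ids = pad_and_stack(
        [np.asarray(ids, dtype=np.int64) for ids, _ in examples])
    # keep the true lengths so padding can be masked out later
    lengths = np.array([len(ids) for ids, _ in examples], dtype=np.int64)
    # fixed-size field: stacking is enough
    speaker_ids = np.array([spk for _, spk in examples], dtype=np.int64)
    return token_ids, lengths, speaker_ids


tokens, lengths, speakers = batch_examples([([3, 7, 9], 0), ([5, 2], 1)])
```

Returning the original lengths alongside the padded array is a common choice, since downstream code usually needs them to build masks.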
+
+Config
+-------------
+
+It's common to change the running configuration to compare results. To keep track 
+of running configuration, we use ``yaml`` configuration files.
+
+Also, we want to interact with command line options. Some options that usually 
+change according to the running environment are provided by command line arguments. 
+In addition, we want to override an option in the config file without editing 
+it. 
+
+Taking these requirements into consideration, we use `yacs <https://github.com/rbgirshick/yacs>`_ 
+as a config management tool. Other tools like `omegaconf <https://github.com/omry/omegaconf>`_ 
+are also powerful and have similar functions.
+
+In each example provided, there is a ``config.py``, where the default config is 
+defined. To get the default config, import ``config.py`` and call 
+``get_cfg_defaults()``. The result can then be updated with a 
+yaml config file or command line arguments if needed.
+
+For details about how to use yacs in experiments, see `yacs <https://github.com/rbgirshick/yacs>`_.
+
+
+Experiment
+--------------
+
--- a/docs/source/basic.rst
+++ b/docs/source/basic.rst
@@ -0,0 +1,69 @@
+===========
+Basic Usage
+===========
+
+This section shows how to use pretrained models provided by parakeet and run 
+inference with them.
+
+Pretrained models are provided in an archive. Extract it to get a folder like 
+this::
+
+    checkpoint_name/
+    ├──config.yaml
+    └──step-310000.pdparams
+
+The ``config.yaml`` stores the config used to train the model, and 
+``step-N.pdparams`` is the parameter file, where N is the number of steps the 
+model has been trained for.
+
+The example code below shows how to use the models for prediction.
+
+text to spectrogram
+^^^^^^^^^^^^^^^^^^^^^^
+
+The code below shows how to use a transformer_tts model. After loading the 
+pretrained model, use ``model.predict(sentence)`` to generate spectrograms 
+(in numpy.ndarray format), which can be further used to synthesize raw audio
+with a vocoder.
+
+>>> import parakeet
+>>> from parakeet.frontend import English
+>>> from parakeet.models import TransformerTTS
+>>> from pathlib import Path
+>>> import yacs.config
+>>> 
+>>> # load the pretrained model
+>>> frontend = English()
+>>> checkpoint_dir = Path("transformer_tts_pretrained")
+>>> # load_cfg expects an open file (or a YAML string), not a path
+>>> with open(checkpoint_dir / "config.yaml") as f:
+...     config = yacs.config.CfgNode.load_cfg(f)
+>>> checkpoint_path = str(checkpoint_dir / "step-310000")
+>>> model = TransformerTTS.from_pretrained(
+...     frontend, config, checkpoint_path)
+>>> model.eval()
+>>> 
+>>> # text to spectrogram
+>>> sentence = "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition"
+>>> outputs = model.predict(sentence, verbose=True)
+>>> mel_output = outputs["mel_output"]
+
+vocoder
+^^^^^^^^^^
+
+Like the example above, after loading the pretrained ``ConditionalWaveFlow`` 
+model, call ``model.predict(mel)`` to synthesize raw audio (in wav format).
+
+>>> import soundfile as sf
+>>> from parakeet.models import ConditionalWaveFlow
+>>> 
+>>> # load the pretrained model
+>>> checkpoint_dir = Path("waveflow_pretrained")
+>>> # load_cfg expects an open file (or a YAML string), not a path
+>>> with open(checkpoint_dir / "config.yaml") as f:
+...     config = yacs.config.CfgNode.load_cfg(f)
+>>> checkpoint_path = str(checkpoint_dir / "step-2000000")
+>>> vocoder = ConditionalWaveFlow.from_pretrained(config, checkpoint_path)
+>>> vocoder.eval()
+>>> 
+>>> # synthesize
+>>> audio = vocoder.predict(mel_output)
+>>> audio_path = "synthesized.wav"  # placeholder output path
+>>> sf.write(audio_path, audio, config.data.sample_rate)
+
+For more details on how to use the model, please refer to the documentation.
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -24,9 +24,10 @@
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #
-# import os
-# import sys
-# sys.path.insert(0, os.path.abspath('.'))
+import os
+import sys
+sys.path.insert(0, os.path.abspath('../..'))
+autodoc_mock_imports = ["soundfile", "librosa"]

 # -- Project information -----------------------------------------------------

@@ -48,6 +49,7 @@ extensions = [
    "sphinx_rtd_theme",
    'sphinx.ext.mathjax',
    'numpydoc',
+    'sphinx.ext.autosummary',
 ]

 # Add any paths that contain templates here, relative to this directory.
@@ -63,8 +65,10 @@ exclude_patterns = []
 # The theme to use for HTML and HTML Help pages.  See the documentation for
 # a list of builtin themes.
 #
+
 html_theme = "sphinx_rtd_theme"

+
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
--- a/docs/source/demo.rst
+++ b/docs/source/demo.rst
@@ -0,0 +1,143 @@
+Audio Sample 
+==================
+
+TTS audio samples
+-------------------
+
+Audio samples generated by a TTS system. Text is first transformed into a spectrogram 
+by a text-to-spectrogram model, then the spectrogram is converted into raw audio by 
+a vocoder.
+
+.. raw:: html
+
+    <embed>
+    <table>
+        <tr>
+            <th  align="left"> TransformerTTS + WaveFlow</th>
+            <th  align="left"> Tacotron2 + WaveFlow </th>
+        </tr>
+        <tr>
+            <td>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_1.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_2.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_3.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_4.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_5.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_6.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_7.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_8.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_9.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+            </td>
+            <td>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_2.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_3.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_4.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_5.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_6.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_7.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_8.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+                <audio controls="controls">
+                    <source
+                        src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_9.wav"
+                        type="audio/wav">
+                    Your browser does not support the <code>audio</code> element.
+                </audio>
+            </td>
+        </tr>
+        </tabel>
+    </table>
+    </embed>
+
+
+Vocoder audio samples
+--------------------------
+
+Audio samples generated from ground-truth spectrograms with a vocoder.
+
+
--- a/docs/source/design.rst
+++ b/docs/source/design.rst
@ -0,0 +1,4 @@
+==============================
+Design of Parakeet
+==============================
+
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@ -0,0 +1,55 @@
+.. parakeet documentation master file, created by
+   sphinx-quickstart on Thu Dec 17 20:01:34 2020.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Parakeet 
+====================================
+
+``parakeet`` is a deep-learning-based text-to-speech toolkit built upon the ``paddlepaddle`` framework. It aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It includes many influential TTS models proposed by `Baidu Research <http://research.baidu.com>`_ and other research groups.
+
+``parakeet`` mainly consists of the components below.
+
+#. Implementation of models and commonly used neural network layers.
+#. Dataset abstraction and common data preprocessing pipelines.
+#. Ready-to-run experiments.
+
+.. toctree::
+    :caption: Getting started
+    :maxdepth: 1
+
+    install
+    basic
+    advanced 
+
+.. toctree::
+    :caption: Demos
+    :maxdepth: 1
+    
+    demo
+
+.. toctree::
+    :caption: Design of Parakeet
+    :maxdepth: 1
+    
+    design
+
+.. toctree::
+    :caption: Documentation
+    :maxdepth: 1
+
+    parakeet.audio
+    parakeet.data
+    parakeet.datasets
+    parakeet.frontend
+    parakeet.modules
+    parakeet.models
+    parakeet.training
+    parakeet.utils
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
--- a/docs/source/install.rst
+++ b/docs/source/install.rst
@ -0,0 +1,83 @@
+=============
+Installation
+=============
+
+
+Install PaddlePaddle
+------------------------
+Parakeet requires PaddlePaddle as its backend. Note that version 2.0.0rc1 or
+newer of paddle is required.
+
+Since paddlepaddle ships multiple packages depending on the device (cpu or gpu)
+and the dependency libraries, it is recommended to use pip to install the
+package that matches your device and the versions of your dependency
+libraries.
+
+Installing paddlepaddle with conda or building paddlepaddle from source is also
+supported. Please refer to `PaddlePaddle installation <https://www.paddlepaddle.org.cn/install/quick/>`_ for more details.
+
+Example commands for installing paddlepaddle via pip are listed below.
+
+**PaddlePaddle with gpu**
+
+.. code-block:: bash
+
+    # pick one: post101 is the build for CUDA 10.1
+    python -m pip install paddlepaddle-gpu==2.0.0rc1.post101 -f https://paddlepaddle.org.cn/whl/stable.html
+    # post100 is the build for CUDA 10.0
+    python -m pip install paddlepaddle-gpu==2.0.0rc1.post100 -f https://paddlepaddle.org.cn/whl/stable.html
+
+
+**PaddlePaddle with cpu**
+
+.. code-block:: bash
+
+    python -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple
+
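To confirm that an installed build satisfies the 2.0.0rc1 minimum stated above, the version strings can be compared. The sketch below is illustrative only: ``parse_version`` and ``meets_minimum`` are hypothetical helpers, not part of paddle or parakeet, and the code assumes versions look like ``X.Y.Z`` with an optional ``rcN`` suffix, as the releases named on this page do.

```python
import re


def parse_version(text):
    """Turn 'X.Y.Z' or 'X.Y.ZrcN' (also 'X.Y.Z-rcN') into a sortable tuple.

    A final release sorts after every release candidate of the same X.Y.Z.
    Anything after the optional rc suffix (e.g. '.post101') is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-?rc(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    major, minor, patch, rc = match.groups()
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def meets_minimum(installed, minimum="2.0.0rc1"):
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import paddle
    except ImportError:
        print("paddlepaddle is not installed")
    else:
        # paddle.__version__ holds the installed version string.
        print(paddle.__version__, meets_minimum(paddle.__version__))
```

Builds whose version string does not follow this pattern (for example a development build) are not handled by this sketch and should be checked by hand.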
+
+Install libsndfile
+-------------------
+
+Experiments in parakeet often involve audio and spectrum processing, thus
+``librosa`` and ``soundfile`` are required. ``soundfile`` requires an extra
+C library ``libsndfile``, which is not always handled by pip.
+
+For Windows and Mac users, ``libsndfile`` is also installed when installing
+``soundfile`` via pip, but for Linux users, installing ``libsndfile`` via
+the system package manager is required. Example commands for popular
+distributions are listed below.
+
+.. code-block:: bash
+
+    # ubuntu, debian
+    sudo apt-get install libsndfile1
+
+    # centos, fedora
+    sudo yum install libsndfile
+
+    # openSUSE
+    sudo zypper in libsndfile
+
+For any problem with the installation of soundfile, please refer to
+`SoundFile <https://pypi.org/project/SoundFile>`_.
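On Linux, one way to check whether a system-wide ``libsndfile`` is visible before installing ``soundfile`` is to ask the standard library's ``ctypes.util.find_library``. This is a minimal sketch; ``find_libsndfile`` is a hypothetical helper name, not something provided by parakeet or soundfile.

```python
import ctypes.util


def find_libsndfile():
    """Return the name or path of a system libsndfile, or None if not found.

    find_library takes the library name without the 'lib' prefix and without
    a file extension, so libsndfile is looked up as 'sndfile'.
    """
    return ctypes.util.find_library("sndfile")


if __name__ == "__main__":
    location = find_libsndfile()
    if location is None:
        print("libsndfile not found by the system loader")
    else:
        print("libsndfile found:", location)
```

A ``None`` result is only meaningful on Linux. As noted above, on Windows and Mac the library comes with ``soundfile`` itself, so it may not be visible to the system loader even when everything works.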
+
+Install Parakeet
+------------------
+
+There are two ways to install parakeet, depending on how you intend to use it.
+
+#. If you want to run experiments provided by parakeet or add new models and 
+   experiments, it is recommended to clone the project from github 
+   (`Parakeet <https://github.com/PaddlePaddle/Parakeet>`_), and install it in 
+   editable mode.
+
+   .. code-block:: bash
+       
+       git clone https://github.com/PaddlePaddle/Parakeet
+       cd Parakeet
+       pip install -e .
+
+#. If you only need to use the models provided by parakeet for inference,
+   installing from pypi is recommended.
+
+   .. code-block:: bash
+   
+       pip install paddle-parakeet
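After either installation route, a quick check that the package can be found by the interpreter is to look it up with ``importlib``. This is a small sketch under the assumption that the importable package name is ``parakeet`` (as used in the module list of these docs); ``is_installed`` is a hypothetical helper.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    for name in ("paddle", "soundfile", "librosa", "parakeet"):
        status = "found" if is_installed(name) else "missing"
        print("%-10s %s" % (name, status))
```

With an editable install, ``parakeet`` resolves to the cloned project directory rather than to a copy in site-packages.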
--- a/docs/source/modules.rst
+++ b/docs/source/modules.rst
--- a/docs/source/parakeet.audio.rst
+++ b/docs/source/parakeet.audio.rst
--- a/docs/source/parakeet.data.rst
+++ b/docs/source/parakeet.data.rst
--- a/docs/source/parakeet.datasets.rst
+++ b/docs/source/parakeet.datasets.rst
--- a/docs/source/parakeet.frontend.normalizer.rst
+++ b/docs/source/parakeet.frontend.normalizer.rst
--- a/docs/source/parakeet.frontend.rst
+++ b/docs/source/parakeet.frontend.rst
--- a/docs/source/parakeet.models.rst
+++ b/docs/source/parakeet.models.rst
--- a/docs/source/parakeet.modules.rst
+++ b/docs/source/parakeet.modules.rst
--- a/docs/source/parakeet.rst
+++ b/docs/source/parakeet.rst
--- a/docs/source/parakeet.training.rst
+++ b/docs/source/parakeet.training.rst
--- a/docs/source/parakeet.utils.rst
+++ b/docs/source/parakeet.utils.rst
--- a/docs_cn/config_cn.md
+++ b/docs_cn/config_cn.md
--- a/docs_cn/data_cn.md
+++ b/docs_cn/data_cn.md
--- a/docs_cn/experiment_cn.md
+++ b/docs_cn/experiment_cn.md
--- a/docs_cn/experiment_guide_cn.md
+++ b/docs_cn/experiment_guide_cn.md
--- a/docs/installation_cn.md
+++ b/docs/installation_cn.md
@ -1,57 +1,63 @@
-# Installation
-
-[TOC]
+=============
+Installation
+=============


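-## Install PaddlePaddle
-
-Parakeet uses PaddlePaddle as its backend and therefore depends on it. Note that Parakeet requires PaddlePaddle 2.0 or newer. You can install it via pip. To install the gpu build of PaddlePaddle, choose the wheel package that matches the cuda and cudnn versions in your environment. For installation with conda or building from source, please refer to [PaddlePaddle Quick Installation](https://www.paddlepaddle.org.cn/install/quick/zh/2.0rc-linux-pip).
+Install PaddlePaddle
+------------------------
+Parakeet uses PaddlePaddle as its backend and therefore depends on it. Note that Parakeet requires PaddlePaddle 2.0 or newer. You can install it via pip. To install the gpu build of PaddlePaddle, choose the wheel package that matches the cuda and cudnn versions in your environment. For installation with conda or building from source, please refer to `PaddlePaddle Quick Installation <https://www.paddlepaddle.org.cn/install/quick/>`_.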

 **PaddlePaddle with gpu**

-```bash
-python -m pip install paddlepaddle-gpu==2.0.0rc0.post101 -f https://paddlepaddle.org.cn/whl/stable.html
-python -m pip install paddlepaddle-gpu==2.0.0rc0.post100 -f https://paddlepaddle.org.cn/whl/stable.html
-```
+.. code-block:: bash
+
+    python -m pip install paddlepaddle-gpu==2.0.0rc1.post101 -f https://paddlepaddle.org.cn/whl/stable.html
+    python -m pip install paddlepaddle-gpu==2.0.0rc1.post100 -f https://paddlepaddle.org.cn/whl/stable.html
+

 **PaddlePaddle with cpu**

-```bash
-python -m pip install paddlepaddle==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
-```
+.. code-block:: bash

-## Install libsndfile
+    python -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple

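-Experiments in Parakeet often need audio and spectrum processing, so we rely on librosa and soundfile for audio processing. librosa and soundfile depend on a C library, libsndfile, which is not a Python package. For Windows and Mac users, libsndfile is installed along with soundfile when installing it via pip. If you run into problems, refer to [SoundFile](https://pypi.org/project/SoundFile).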
+
+Install libsndfile
+-------------------
+
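+Experiments in Parakeet often need audio and spectrum processing, so we rely on librosa and soundfile for audio processing. librosa and soundfile depend on a C library, libsndfile, which is not a Python package. For Windows and Mac users, libsndfile is installed along with soundfile when installing it via pip. If you run into problems, refer to `SoundFile <https://pypi.org/project/SoundFile>`_.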

 Linux users need to install this library with the system package manager. Example commands for common distributions are listed below.


-```bash
-# ubuntu, debian
-sudo apt-get install libsndfile1
+.. code-block:: 

-# centos, fedora,
-sudo yum install libsndfile
+    # ubuntu, debian
+    sudo apt-get install libsndfile1

-# openSUSE
-sudo zypper in libsndfile
-```
+    # centos, fedora,
+    sudo yum install libsndfile

-## Install Parakeet
+    # openSUSE
+    sudo zypper in libsndfile


+Install Parakeet
+------------------
+
 We provide two ways to use Parakeet.

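-1. If you want to run the experiment code shipped with Parakeet, or to do further development on it, first clone this project from github, cd into the project directory, and install it in editable mode (the package is not copied to site-packages, and changes to the project take effect immediately without reinstalling). After that it is ready to use.
+#. If you want to run the experiment code shipped with Parakeet, or to do further development on it, first clone this project from github, cd into the project directory, and install it in editable mode (the package is not copied to site-packages, and changes to the project take effect immediately without reinstalling). After that it is ready to use.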

-    ```bash
-    # -e means an editable install
-    pip install -e .
-    ```
+   .. code-block:: bash

-2. If you only need to run inference with the trained models we provide, you can install the wheel package from pypi directly.
+     # -e means an editable install
+     pip install -e .
+
+
+#. If you only need to run inference with the trained models we provide, you can install the wheel package from pypi directly.
+
+   .. code-block:: bash
+
+     pip install paddle-parakeet

-    ```bash
-    pip install paddle-parakeet
-    ```
--- a/docs_cn/overview_cn.md
+++ b/docs_cn/overview_cn.md
--- a/images/logo-small.png
+++ b/images/logo-small.png
--- a/parakeet/data/__init__.py
+++ b/parakeet/data/__init__.py
@ -1,3 +1,5 @@
+"""Parakeet's infrastructure for data processing.
+"""
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@ -12,5 +14,5 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from .dataset import *
-from .batch import *
+from parakeet.data.dataset import *
+from parakeet.data.batch import *