add tutorials into advanced

2021-01-14 15:12:36 +08:00 · 2021-01-14 15:12:36 +08:00 · 73374528d0
parent b017c73100
commit 73374528d0
4 changed files with 111 additions and 45 deletions
--- a/docs/source/advanced.rst
+++ b/docs/source/advanced.rst
@ -9,16 +9,18 @@ Model
 -------------

 As a common practice with paddlepaddle, models are implemented as subclasses
-of ``paddle.nn.Layer``. More complicated models, it is recommended to split 
-the model into different components.
+of ``paddle.nn.Layer``. Models can be simple, like a single-layer RNN. For
+complicated models, it is recommended to split the model into different
+components.

 For an encoder-decoder model, it is natural to split it into the encoder and 
 the decoder. For a model composed of several similar layers, it is natural to 
-extract the sublayer as a seperate layer.
+extract the sublayer as a separate layer.

 There are two common ways to define a model which consists of several modules.

-#. Define a module given the specifications.
+#. Define a module given the specifications. Here is an example with a
+   multilayer perceptron.

   .. code-block:: python

@ -32,11 +34,11 @@ There are two common ways to define a model which consists of several modules.

      module = MLP(16, 32, 4) # initialize a module

-   When the module is intended to be a generic reusable layer that can be 
+   When the module is intended to be a generic and reusable layer that can be 
   integrated into a larger model, we prefer to define it in this way.

-   For considerations of readability and usability, we strongly recommend **NOT** to 
-   pack specifications into a single object. Here's an example below.
+   For considerations of readability and usability, we strongly recommend
+   **NOT** packing specifications into a single object. Here's an example below.

   .. code-block:: python

@ -48,16 +50,17 @@ There are two common ways to define a model which consists of several modules.
          def forward(self, x):
              return self.linear2(paddle.tanh(self.linear1(x)))

-   For a module defined in this way, it's harder for the user to initialize a 
-   instance. The user have to read the code to check what attributes are used.
+   For a module defined in this way, it's harder for the user to initialize an 
+   instance. Users have to read the code to check what attributes are used.

-   Code in this style tend to pass a huge config object to initialize every 
-   module used in an experiment, thought each module may not need the whole 
-   configuration.
+   Also, code in this style tends to be abused by passing a huge config object
+   to initialize every module used in an experiment, though each module may
+   not need the whole configuration.
   
   We prefer to be explicit.

-#. Define a module as a combination given its components.
+#. Define a module as a combination given its components. Here is an example 
+   for a sequence-to-sequence model.

   .. code-block:: python
   
@ -75,15 +78,78 @@ There are two common ways to define a model which consists of several modules.
      decoder = Decoder(...)
      model = Seq2Seq(encoder, decoder) # compose two components

-   When a model is a complicated one made up of several components, each of which 
+   When a model is complicated and made up of several components, each of which
   has a separate functionality, and can be replaced by other components with the 
   same functionality, we prefer to define it in this way.

 Data
 -------------

+Another critical component for a deep learning project is data. As a common
+practice, we use the dataset and dataloader abstraction.
+
+Dataset
+^^^^^^^^^^
+A dataset is the representation of a set of examples used for a project. In most
+cases, a dataset is a collection of examples. A dataset is an object which has
+the methods below.
+
+#. ``__len__``, to get the size of the dataset.
+#. ``__getitem__``, to get an example by key or index.
+
+An example is a record consisting of several fields. In practice, we usually
+represent it as a namedtuple for convenience, yet dicts and user-defined objects
+are also supported.
+
+We define our own dataset by subclassing ``paddle.io.Dataset``.
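
The two-method interface described above can be sketched as follows. This is a minimal illustration, not Parakeet's own code: ``ToyDataset``, ``Example`` and its fields ``ids`` and ``mel`` are hypothetical names, and plain Python lists stand in for arrays so the sketch has no dependencies. In a real project the class would subclass ``paddle.io.Dataset`` instead of being a plain class.

```python
from collections import namedtuple

# Hypothetical record type: one example with two fields.
Example = namedtuple("Example", ["ids", "mel"])


class ToyDataset:
    """Minimal dataset: an object with __len__ and __getitem__.

    Assumption: a real project would subclass paddle.io.Dataset here.
    """

    def __init__(self, records):
        # records is any iterable of (ids, mel) pairs
        self.records = list(records)

    def __len__(self):
        # size of the dataset
        return len(self.records)

    def __getitem__(self, index):
        # fetch one example by index and wrap it in a namedtuple
        ids, mel = self.records[index]
        return Example(ids=ids, mel=mel)


dataset = ToyDataset([([1, 2, 3], [0.1, 0.2]), ([4, 5], [0.3])])
first = dataset[0]  # fields are accessed by name: first.ids, first.mel
```

Returning a namedtuple keeps field access explicit (``first.ids``) while still allowing tuple unpacking.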
+
+DataLoader
+^^^^^^^^^^^
+In deep learning practice, models are trained with minibatches. DataLoader 
+meets the need for iterating the dataset in batches. It is done by providing 
+a sampler and a batch function in addition to a dataset.
+
+#. sampler, which samples indices or keys used to get examples from the dataset.
+#. batch function, which transforms a list of examples into a batch.
+
+A commonly used sampler is ``RandomSampler``, which shuffles all the valid
+indices and then iterates over them sequentially. ``DistributedBatchSampler`` is
+a sampler used for distributed data parallel training, where the sampler handles
+data sharding in a dynamic way.
+
+A batch function is used to transform selected examples into a batch. For a simple
+case where an example is composed of several fields, each of which is represented
+by a fixed-size array, the batch function can simply stack each field. For
+cases where variable-size arrays are included in the example, batching could
+involve padding and stacking. In theory, a batch function can do more, such as
+random slicing.
+
+For a custom dataset used for a custom model, it is required to define a batch 
+function for it.
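
As a rough sketch of how a sampler and a batch function work together, the snippet below shuffles indices and pads variable-length fields. All names here (``Example``, ``Batch``, ``batch_examples``, ``iterate_minibatches``) are hypothetical and not part of Parakeet or paddle, and plain lists stand in for arrays. In practice ``paddle.io.DataLoader`` plays the role of ``iterate_minibatches``.

```python
import random
from collections import namedtuple

# Hypothetical example and batch layouts used only for this sketch.
Example = namedtuple("Example", ["ids", "speaker"])
Batch = namedtuple("Batch", ["ids", "lengths", "speakers"])


def batch_examples(examples, pad_id=0):
    """Batch function: turn a list of examples into one batch.

    The variable-length `ids` field is right-padded to the longest
    sequence; the fixed-size `speaker` field is collected as-is.
    """
    lengths = [len(ex.ids) for ex in examples]
    max_len = max(lengths)
    padded = [list(ex.ids) + [pad_id] * (max_len - len(ex.ids))
              for ex in examples]
    speakers = [ex.speaker for ex in examples]
    return Batch(ids=padded, lengths=lengths, speakers=speakers)


def iterate_minibatches(dataset, batch_size, batch_fn, seed=0):
    """Random sampling: shuffle all valid indices, then walk them in order."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield batch_fn([dataset[i] for i in chunk])


# A list already has __len__ and __getitem__, so it can act as a dataset.
dataset = [Example([1, 2, 3], 0), Example([4], 1), Example([5, 6], 0)]
batches = list(iterate_minibatches(dataset, batch_size=2,
                                   batch_fn=batch_examples))
```

Keeping the original lengths in the batch lets the model mask out the padded positions later.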
+
 Config
 -------------

+It's common to change the running configuration to compare results. To keep track 
+of running configuration, we use ``yaml`` configuration files.
+
+Also, we want to interact with command line options. Some options that usually
+change according to running environments are provided by command line arguments.
+In addition, we want to override an option in the config file without editing
+it.
+
+Taking these requirements into consideration, we use `yacs <https://github.com/rbgirshick/yacs>`_
+as a config management tool. Other tools like `omegaconf <https://github.com/omry/omegaconf>`_
+are also powerful and have similar functions.
+
+In each example provided, there is a ``config.py``, where the default config is
+defined. To get the default config, import ``config.py`` and call
+``get_cfg_defaults()``. The result can then be updated with a yaml config file
+or command line arguments if needed.
+
+For details about how to use yacs in experiments, see `yacs <https://github.com/rbgirshick/yacs>`_.
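
The layering described in this section (defaults first, then overrides from the command line) can be illustrated with the standard library alone. This is only a sketch of the idea, not how yacs is implemented: the default values, the ``--opts`` flag and ``apply_overrides`` are assumptions made for the example, and the yaml step is omitted. With yacs, a ``CfgNode`` offers ``merge_from_file`` and ``merge_from_list`` for the same purpose.

```python
import argparse
import copy

# Hypothetical defaults, standing in for what a config.py would define.
_DEFAULTS = {
    "data": {"batch_size": 16, "sample_rate": 22050},
    "training": {"lr": 1e-3},
}


def get_cfg_defaults():
    """Return a fresh copy so callers never mutate the shared defaults."""
    return copy.deepcopy(_DEFAULTS)


def apply_overrides(cfg, opts):
    """Apply KEY VALUE pairs such as ["data.batch_size", "32"] in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come as KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError on unknown keys is intentional
        # Naive cast to the type of the existing default (int, float, str).
        node[leaf] = type(node[leaf])(raw)
    return cfg


parser = argparse.ArgumentParser()
# An option that depends on the running environment.
parser.add_argument("--device", default="cpu")
# Overrides for config entries, given as alternating keys and values.
parser.add_argument("--opts", nargs="*", default=[])

args = parser.parse_args(
    ["--device", "gpu", "--opts", "data.batch_size", "32", "training.lr", "0.01"]
)
cfg = apply_overrides(get_cfg_defaults(), args.opts)
```

Rejecting unknown keys catches typos in overrides early, at the cost of requiring every option to have a default.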
+
+
 Experiment
 --------------
+
--- a/docs/source/tutorials.rst
+++ b/docs/source/tutorials.rst
@ -1,24 +1,30 @@
 ===========
-Tutorials
-===========
-
 Basic Usage
-------------------
+===========

-Pretrained models are provided in a archive. Extract it to get a folder like this::
+This section shows how to use pretrained models provided by parakeet and run
+inference with them.
+
+Pretrained models are provided in an archive. Extract it to get a folder like
+this::

    checkpoint_name/
    ├──config.yaml
    └──step-310000.pdparams

-The ``config.yaml`` stores the config used to train the model, the ``step-N.pdparams`` is the parameter file, where N is the steps it has been trained.
+The ``config.yaml`` stores the config used to train the model, and
+``step-N.pdparams`` is the parameter file, where N is the number of steps the
+model has been trained for.

 The example code below shows how to use the models for prediction.

 text to spectrogram
 ^^^^^^^^^^^^^^^^^^^^^^

-The code below show how to use a transformer_tts model. After loading the pretrained model, use ``model.predict(sentence)`` to generate spectrogram (in numpy.ndarray format), which can be further used to synthesize waveflow.
+The code below shows how to use a transformer_tts model. After loading the
+pretrained model, use ``model.predict(sentence)`` to generate spectrograms
+(in numpy.ndarray format), which can be further used to synthesize raw audio
+with a vocoder.

 >>> import parakeet
 >>> from parakeet.frontend import English
@ -43,7 +49,8 @@ The code below show how to use a transformer_tts model. After loading the pretra
 vocoder
 ^^^^^^^^^^

-Like the example above, after loading the pretrained ConditionalWaveFlow model, call ``model.predict(mel)`` to synthesize waveflow (in numpy.ndarray format).
+Like the example above, after loading the pretrained ``ConditionalWaveFlow`` 
+model, call ``model.predict(mel)`` to synthesize raw audio (in wav format).

 >>> import soundfile as sf
 >>> from parakeet.models import ConditionalWaveFlow
@ -60,8 +67,3 @@ Like the example above, after loading the pretrained ConditionalWaveFlow model,
 >>> sf.write(audio_path, audio, config.data.sample_rate)

 For more details on how to use the model, please refer to the documentation.
-
-
-
-
-
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@ -19,7 +19,7 @@ Parakeet
    :maxdepth: 1

    install
-    tutorials
+    basic
    advanced 

 .. toctree::
--- a/docs/source/install.rst
+++ b/docs/source/install.rst
@ -4,8 +4,8 @@ Installation


 Install PaddlePaddle
-------------------
-Parakeet requires PaddlePaddle as its backend. Not that 2.0rc or newer versions
+------------------------
+Parakeet requires PaddlePaddle as its backend. Note that version 2.0.0rc1 or newer
 of paddle is required.

 Since paddlepaddle has multiple packages depending on the device (cpu or gpu) 
@ -50,7 +50,7 @@ are listed below.
    # ubuntu, debian
    sudo apt-get install libsndfile1

-    # centos, fedora,
+    # centos, fedora
    sudo yum install libsndfile

    # openSUSE
@ -64,7 +64,7 @@ Install Parakeet

 There are two ways to install parakeet according to the purpose of using it.

-1. If you want to run experiments provided by parakeet or add new models and 
+#. If you want to run experiments provided by parakeet or add new models and 
   experiments, it is recommended to clone the project from github 
   (`Parakeet <https://github.com/PaddlePaddle/Parakeet>`_), and install it in 
   editable mode.
@ -75,11 +75,9 @@ editable mode.
       cd Parakeet
       pip install -e .

-
 #. If you only need to use the models for inference, installing parakeet from
-pypi is recommended。
+   pypi is recommended.

   .. code-block:: bash
   
       pip install paddle-parakeet
-