add tutorials into advanced

This commit is contained in:
iclementine 2021-01-14 15:12:36 +08:00
parent b017c73100
commit 73374528d0
4 changed files with 111 additions and 45 deletions

View File

@ -9,16 +9,18 @@ Model
-------------
As a common practice with paddlepaddle, models are implemented as subclasses
of ``paddle.nn.Layer``. Models can be simple, like a single-layer RNN. For
complicated models, it is recommended to split the model into different
components.
For an encoder-decoder model, it is natural to split it into the encoder and
the decoder. For a model composed of several similar layers, it is natural to
extract the sublayer as a separate layer.
There are two common ways to define a model which consists of several modules.
#. Define a module given the specifications. Here is an example with a
   multilayer perceptron.
.. code-block:: python
@ -32,11 +34,11 @@ There are two common ways to define a model which consists of several modules.
module = MLP(16, 32, 4) # initialize a module
When the module is intended to be a generic and reusable layer that can be
integrated into a larger model, we prefer to define it in this way.
For considerations of readability and usability, we strongly recommend
**NOT** to pack specifications into a single object. Here's an example below.
.. code-block:: python
@ -48,16 +50,17 @@ There are two common ways to define a model which consists of several modules.
def forward(self, x):
return self.linear2(paddle.tanh(self.linear1(x)))
For a module defined in this way, it's harder for the user to initialize an
instance. Users have to read the code to check what attributes are used.
Also, code in this style tends to be abused by passing a huge config object
to initialize every module used in an experiment, though each module may not
need the whole configuration.
We prefer to be explicit.
#. Define a module as a combination given its components. Here is an example
for a sequence-to-sequence model.
.. code-block:: python
@ -75,15 +78,78 @@ There are two common ways to define a model which consists of several modules.
decoder = Decoder(...)
model = Seq2Seq(encoder, decoder) # compose two components
When a model is complicated and made up of several components, each of which
has a separate functionality and can be replaced by other components with the
same functionality, we prefer to define it in this way.
Data
-------------
Another critical component for a deep learning project is data. As a common
practice, we use the dataset and dataloader abstractions.
Dataset
^^^^^^^^^^
Dataset is the representation of a set of examples used for a project. In
most cases, a dataset is a collection of examples. A dataset is an object
which has the methods below.
#. ``__len__``, to get the size of the dataset.
#. ``__getitem__``, to get an example by key or index.
An example is a record consisting of several fields. In practice, we usually
represent it as a namedtuple for convenience, yet dicts and user-defined
objects are also supported.
We define our own dataset by subclassing ``paddle.io.Dataset``.
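As a minimal sketch (the ``Example`` fields below are hypothetical), a custom
dataset can look like this.

.. code-block:: python

    from collections import namedtuple

    from paddle.io import Dataset

    # a hypothetical record with two fields
    Example = namedtuple("Example", ["text_ids", "mel"])

    class MyDataset(Dataset):
        """A toy dataset that holds a list of examples in memory."""
        def __init__(self, examples):
            self.examples = examples

        def __getitem__(self, index):
            # get an example by index
            return self.examples[index]

        def __len__(self):
            # the size of the dataset
            return len(self.examples)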
DataLoader
^^^^^^^^^^^
In deep learning practice, models are trained with minibatches. DataLoader
meets the need for iterating over the dataset in batches. This is done by
providing a sampler and a batch function in addition to a dataset.
#. sampler, which samples indices or keys used to get examples from the dataset.
#. batch function, which transforms a list of examples into a batch.
A commonly used sampler is ``RandomSampler``, which shuffles all the valid
indices and then iterates over them sequentially. ``DistributedBatchSampler``
is a sampler used for distributed data parallel training, where the sampler
handles data sharding in a dynamic way.
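For example, here is a sketch of constructing samplers for the toy dataset
above (the batch size is arbitrary).

.. code-block:: python

    from paddle.io import RandomSampler, DistributedBatchSampler

    # shuffles all valid indices and iterates over them sequentially
    sampler = RandomSampler(dataset)

    # for distributed data parallel training; each trainer only
    # iterates over its own shard of the shuffled indices
    batch_sampler = DistributedBatchSampler(dataset, batch_size=32, shuffle=True)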
A batch function is used to transform selected examples into a batch. For a
simple case where an example is composed of several fields, each of which is
represented by a fixed-size array, the batch function can simply stack each
field. For cases where variable-size arrays are included in the example,
batching could involve padding and stacking. In theory, a batch function can
do more, such as random slicing.
For a custom dataset used for a custom model, it is required to define a batch
function for it.
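Continuing the toy example above, here is a sketch of a batch function that
pads the variable-length ``text_ids`` field and stacks the rest, passed to
``paddle.io.DataLoader`` as its ``collate_fn`` along with the
``batch_sampler`` built earlier.

.. code-block:: python

    import numpy as np
    from paddle.io import DataLoader

    def collate(examples):
        """Pad text_ids to the longest in the batch, then stack each field."""
        max_len = max(len(ex.text_ids) for ex in examples)
        text_ids = np.stack([
            np.pad(ex.text_ids, (0, max_len - len(ex.text_ids)))
            for ex in examples
        ])
        # assume mel is a fixed-size array in this toy example
        mels = np.stack([ex.mel for ex in examples])
        return text_ids, mels

    loader = DataLoader(dataset,
                        batch_sampler=batch_sampler,
                        collate_fn=collate)
    for text_ids, mels in loader:
        ...  # one training step per batch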
Config
-------------
It's common to change the running configuration to compare results. To keep track
of running configuration, we use ``yaml`` configuration files.
Also, we want to interact with command line options. Some options that
usually change according to running environments are provided by command line
arguments. In addition, we want to override an option in the config file
without editing it.
Taking these requirements into consideration, we use
`yacs <https://github.com/rbgirshick/yacs>`_ as a config management tool.
Other tools like `omegaconf <https://github.com/omry/omegaconf>`_ are also
powerful and have similar functions.
In each example provided, there is a ``config.py``, where the default config is
defined. If you want to get the default config, import ``config.py`` and call
``get_cfg_defaults()`` to get the default config. Then it can be updated with
a yaml config file or command line arguments if needed.
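A sketch of this workflow with yacs is shown below (the option names here are
hypothetical; see each example's ``config.py`` for the real ones).

.. code-block:: python

    from yacs.config import CfgNode as CN

    def get_cfg_defaults():
        _C = CN()
        _C.data = CN()
        _C.data.sample_rate = 22050
        _C.training = CN()
        _C.training.lr = 1e-3
        return _C.clone()

    cfg = get_cfg_defaults()
    cfg.merge_from_file("config.yaml")            # override with a yaml file
    cfg.merge_from_list(["training.lr", 5e-4])    # override from the command line
    cfg.freeze()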
For details about how to use yacs in experiments, see `yacs <https://github.com/rbgirshick/yacs>`_.
Experiment
--------------

View File

@ -1,24 +1,30 @@
Basic Usage
===========
This section shows how to use pretrained models provided by parakeet and run
inference with them.
Pretrained models are provided in an archive. Extract it to get a folder like
this::
checkpoint_name/
├──config.yaml
└──step-310000.pdparams
The ``config.yaml`` stores the config used to train the model, and the
``step-N.pdparams`` is the parameter file, where N is the number of steps the
model has been trained for.
The example code below shows how to use the models for prediction.
text to spectrogram
^^^^^^^^^^^^^^^^^^^^^^
The code below shows how to use a transformer_tts model. After loading the
pretrained model, use ``model.predict(sentence)`` to generate spectrograms
(in numpy.ndarray format), which can be further used to synthesize raw audio
with a vocoder.
>>> import parakeet
>>> from parakeet.frontend import English
@ -43,7 +49,8 @@ The code below show how to use a transformer_tts model. After loading the pretra
vocoder
^^^^^^^^^^
Like the example above, after loading the pretrained ``ConditionalWaveFlow``
model, call ``model.predict(mel)`` to synthesize raw audio (as a
numpy.ndarray), which can then be saved to a wav file.
>>> import soundfile as sf
>>> from parakeet.models import ConditionalWaveFlow
@ -60,8 +67,3 @@ Like the example above, after loading the pretrained ConditionalWaveFlow model,
>>> sf.write(audio_path, audio, config.data.sample_rate)
For more details on how to use the model, please refer to the documentation.

View File

@ -19,7 +19,7 @@ Parakeet
:maxdepth: 1
install
basic
advanced
.. toctree::

View File

@ -4,8 +4,8 @@ Installation
Install PaddlePaddle
------------------------
Parakeet requires PaddlePaddle as its backend. Note that 2.0.0rc1 or newer
versions of paddle are required.
Since paddlepaddle has multiple packages depending on the device (cpu or gpu)
@ -50,7 +50,7 @@ are listed below.
# ubuntu, debian
sudo apt-get install libsndfile1
# centos, fedora
sudo yum install libsndfile
# openSUSE
@ -64,10 +64,10 @@ Install Parakeet
There are two ways to install parakeet according to the purpose of using it.
#. If you want to run experiments provided by parakeet or add new models and
experiments, it is recommended to clone the project from github
(`Parakeet <https://github.com/PaddlePaddle/Parakeet>`_), and install it in
editable mode.
.. code-block:: bash
@ -75,11 +75,9 @@ editable mode.
cd Parakeet
pip install -e .
#. If you only need to use the models for inference by parakeet, installing
   from pypi is recommended.
.. code-block:: bash
pip install paddle-parakeet