Merge branch 'develop' into release/v0.2

iclementine 2021-01-19 17:08:06 +08:00
commit 5eebbd0716
32 changed files with 603 additions and 257 deletions

.readthedocs.yml

@ -0,0 +1,30 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/source/conf.py
# Build documentation with MkDocs
#mkdocs:
#  configuration: mkdocs.yml
# Optionally build your docs in additional formats such as PDF
formats: []
# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.7
  install:
    - method: pip
      path: .
      extra_requirements:
        - doc
    - requirements: docs/requirements.txt

README.md

@ -3,7 +3,7 @@
Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on PaddlePaddle Fluid dynamic graph and includes many influential TTS models proposed by [Baidu Research](http://research.baidu.com) and other research groups.
<div align="center">
<img src="images/logo.png" width=450 /> <br>
<img src="images/logo.png" width=300 /> <br>
</div>
In particular, it features the latest [WaveFlow](https://arxiv.org/abs/1912.01219) model proposed by Baidu Research.
@ -18,17 +18,15 @@ In order to facilitate exploiting the existing TTS models directly and developin
- Vocoders
- [WaveFlow: A Compact Flow-based Model for Raw Audio](https://arxiv.org/abs/1912.01219)
- [ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech](https://arxiv.org/abs/1807.07281)
- [WaveNet: A Generative Model for Raw Audio](https://arxiv.org/abs/1609.03499)
- TTS models
- [Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654)
- [Neural Speech Synthesis with Transformer Network (Transformer TTS)](https://arxiv.org/abs/1809.08895)
- [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263)
- [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
And more will be added in the future.
See the [guide](docs/experiment_guide.md) for details about how to build your own model and experiment in Parakeet.
## Setup
@ -40,221 +38,37 @@ sudo apt-get install libsndfile1
### Install PaddlePaddle
See [install](https://www.paddlepaddle.org.cn/install/quick) for more details. This repo requires PaddlePaddle **1.8.2** or above.
See [install](https://www.paddlepaddle.org.cn/install/quick) for more details. This repo requires PaddlePaddle **2.0.0rc1** or above.
### Install Parakeet
```bash
pip install -U paddle-parakeet
```
or
```bash
git clone https://github.com/PaddlePaddle/Parakeet
cd Parakeet
pip install -e .
```
### Install CMUdict for nltk
CMUdict from nltk is used to transform text into phonemes.
```python
import nltk
nltk.download("punkt")
nltk.download("cmudict")
```
See [install](https://paddle-parakeet.readthedocs.io/en/latest/install.html) for more details.
## Examples
Entries to the introduction, training and synthesis of each example model:
- [>>> WaveFlow](./examples/waveflow)
- [>>> Clarinet](./examples/clarinet)
- [>>> WaveNet](./examples/wavenet)
- [>>> Deep Voice 3](./examples/deepvoice3)
- [>>> Transformer TTS](./examples/transformer_tts)
- [>>> FastSpeech](./examples/fastspeech)
- [>>> Tacotron2](./examples/tacotron2)
## Pre-trained models and audio samples
## Audio samples
Parakeet also releases pre-trained parameters for the example models, which are listed in the following tables. Each column lists the resources for one model: a link to the pre-trained model, the dataset the model is trained on, and audio samples synthesized with the pre-trained model. Click a model name to download a compressed package containing the pre-trained model and the `yaml` config describing how the model was trained.
### TTS models (Acoustic Model + Neural Vocoder)
#### Vocoders
We provide the model checkpoints of WaveFlow with 64, 96 and 128 residual channels, ClariNet and WaveNet.
<div align="center">
<table>
<thead>
<tr>
<th style="width: 250px">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_ckpt_1.0.zip">WaveFlow (res. channels 64)</a>
</th>
<th style="width: 250px">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_ckpt_1.0.zip">WaveFlow (res. channels 96)</a>
</th>
<th style="width: 250px">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_ckpt_1.0.zip">WaveFlow (res. channels 128)</a>
</th>
</tr>
</thead>
<tbody>
<tr>
<th>LJSpeech </th>
<th>LJSpeech </th>
<th>LJSpeech </th>
</tr>
<tr>
<th>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_0.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_1.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_2.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_3.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_samples_1.0/step_3020k_sentence_4.wav">
<img src="images/audio_icon.png" width=250 /></a>
</th>
<th>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_0.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_1.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_2.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_3.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res96_ljspeech_samples_1.0/step_2000k_sentence_4.wav">
<img src="images/audio_icon.png" width=250 /></a>
</th>
<th>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_0.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_1.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_2.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_3.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_samples_1.0/step_2000k_sentence_4.wav">
<img src="images/audio_icon.png" width=250 /></a>
</th>
</tr>
</tbody>
<thead>
<tr>
<th style="width: 250px">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_ckpt_1.0.zip">ClariNet</a>
</th>
<th style="width: 250px">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_ckpt_1.0.zip">WaveNet</a>
</th>
</tr>
</thead>
<tbody>
<tr>
<th>LJSpeech </th>
<th>LJSpeech </th>
</tr>
<tr>
<th>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_0.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_1.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_2.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_3.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/clarinet_ljspeech_samples_1.0/step_500000_sentence_4.wav">
<img src="images/audio_icon.png" width=250 /></a>
</th>
<th>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_0.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_1.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_2.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_3.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/wavenet_ljspeech_samples_1.0/step_2450k_sentence_4.wav">
<img src="images/audio_icon.png" width=250 /></a>
</th>
</tr>
</tbody>
</table>
</div>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**Note:** The input mel spectrograms are from the validation dataset and were not seen during training.
#### TTS models
We also provide checkpoints for different end-to-end TTS models, and present the synthesized audio examples for some randomly chosen famous quotes. The corresponding texts are displayed as follows.
||Text | From |
|:-:|:-- | :--: |
0|*Life was like a box of chocolates, you never know what you're gonna get.* | *Forrest Gump* |
1|*With great power there must come great responsibility.* | *Spider-Man*|
2|*To be or not to be, that's a question.*|*Hamlet*|
3|*Death is just a part of life, something we're all destined to do.*| *Forrest Gump*|
4|*Don't argue with the people of strong determination, because they may change the fact!*| *William Shakespeare* |
Users can choose different vocoders to convert the linear/mel spectrogram to raw audio in TTS models. Taking this into account, we are going to release checkpoints for TTS models adapted to different vocoders, including the [Griffin-Lim](https://ieeexplore.ieee.org/document/1164317) algorithm and some neural vocoders.
##### 1) Griffin-Lim
<div align="center">
<table>
<thead>
<tr>
<th style="width: 250px">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_1.0.zip">Transformer TTS</a>
</th>
<th style="width: 250px">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_ckpt_1.0.zip">FastSpeech</a>
</th>
</tr>
</thead>
<tbody>
<tr>
<th>LJSpeech </th>
<th>LJSpeech </th>
</tr>
<tr>
<th >
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_0.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_1.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_2.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_3.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_griffin-lim_samples_1.0/step_120000_sentence_4.wav">
<img src="images/audio_icon.png" width=250 /></a>
</th>
<th >
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_0.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_1.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_2.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_3.wav">
<img src="images/audio_icon.png" width=250 /></a><br>
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech_ljspeech_griffin-lim_samples_1.0/step_162000_sentence_4.wav">
<img src="images/audio_icon.png" width=250 /></a>
</th>
</tr>
</tbody>
</table>
</div>
##### 2) Neural vocoders
under preparation
Check our [website](https://paddle-parakeet.readthedocs.io/en/latest/demo.html) for audio samples.
## Copyright and License


@ -1,20 +0,0 @@
.. parakeet documentation master file, created by
sphinx-quickstart on Thu Dec 17 20:01:34 2020.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to parakeet's documentation!
====================================
.. toctree::
:maxdepth: 2
:caption: Contents:
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

docs/requirements.txt

@ -0,0 +1 @@
paddlepaddle==2.0.0.rc1

docs/source/advanced.rst

@ -0,0 +1,155 @@
======================
Advanced Usage
======================
This section covers how to extend parakeet by implementing your own models and
experiments. Guidelines for implementation are also given.
Model
-------------
As a common practice with paddlepaddle, models are implemented as subclasses
of ``paddle.nn.Layer``. Models could be simple, like a single layer RNN. For
complicated models, it is recommended to split the model into different
components.
For an encoder-decoder model, it is natural to split it into the encoder and
the decoder. For a model composed of several similar layers, it is natural to
extract the sublayer as a separate layer.
There are two common ways to define a model which consists of several modules.

#. Define a module given the specifications. Here is an example with a
   multilayer perceptron.

   .. code-block:: python

      import paddle
      from paddle import nn

      class MLP(nn.Layer):
          def __init__(self, input_size, hidden_size, output_size):
              super().__init__()  # required before registering sublayers
              self.linear1 = nn.Linear(input_size, hidden_size)
              self.linear2 = nn.Linear(hidden_size, output_size)

          def forward(self, x):
              return self.linear2(paddle.tanh(self.linear1(x)))

      module = MLP(16, 32, 4)  # initialize a module

   When the module is intended to be a generic and reusable layer that can be
   integrated into a larger model, we prefer to define it in this way.

   For the sake of readability and usability, we strongly recommend **NOT**
   packing the specifications into a single object, as in the example below.

   .. code-block:: python

      class MLP(nn.Layer):
          def __init__(self, hparams):
              super().__init__()
              self.linear1 = nn.Linear(hparams.input_size, hparams.hidden_size)
              self.linear2 = nn.Linear(hparams.hidden_size, hparams.output_size)

          def forward(self, x):
              return self.linear2(paddle.tanh(self.linear1(x)))

   For a module defined in this way, it is harder for the user to initialize an
   instance: users have to read the code to find out which attributes are used.
   Also, code in this style tends to be abused by passing a huge config object
   to initialize every module used in an experiment, even though each module
   may not need the whole configuration.

   We prefer to be explicit.

#. Define a module as a combination of its components. Here is an example
   of a sequence-to-sequence model.

   .. code-block:: python

      class Seq2Seq(nn.Layer):
          def __init__(self, encoder, decoder):
              super().__init__()
              self.encoder = encoder
              self.decoder = decoder

          def forward(self, x):
              encoder_output = self.encoder(x)
              output = self.decoder(encoder_output)
              return output

      encoder = Encoder(...)
      decoder = Decoder(...)
      model = Seq2Seq(encoder, decoder)  # compose two components

   When a model is complicated and made up of several components, each of
   which has a separate functionality and can be replaced by other components
   with the same functionality, we prefer to define it in this way.
Data
-------------
Another critical component of a deep learning project is data. As a common
practice, we use the dataset and dataloader abstractions.
Dataset
^^^^^^^^^^
A dataset represents the set of examples used by a project. In most cases, a
dataset is a collection of examples. It is an object with the methods below.

#. ``__len__``, to get the size of the dataset.
#. ``__getitem__``, to get an example by key or index.

An example is a record consisting of several fields. In practice, we usually
represent it as a namedtuple for convenience, yet dicts and user-defined objects
are also supported.
We define our own dataset by subclassing ``paddle.io.Dataset``.
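As a minimal sketch of this protocol, the hypothetical ``ToyTTSDataset`` below
stores ``(phonemes, mel)`` pairs and returns each one as a namedtuple. It is a
plain class so that it runs without paddle installed; in a real project it
would subclass ``paddle.io.Dataset`` as described above. The class name and
field names are assumptions for illustration.

.. code-block:: python

    from collections import namedtuple

    # Hypothetical record type: phoneme ids and a mel spectrogram.
    Example = namedtuple("Example", ["phonemes", "mel"])


    class ToyTTSDataset:
        """Implements the __len__/__getitem__ protocol of a dataset."""

        def __init__(self, records):
            # records: a list of (phonemes, mel) pairs loaded elsewhere
            self._records = list(records)

        def __len__(self):
            # size of the dataset
            return len(self._records)

        def __getitem__(self, index):
            # get an example by index, packed as a namedtuple
            phonemes, mel = self._records[index]
            return Example(phonemes=phonemes, mel=mel)

Because ``Example`` is a namedtuple, its fields can be accessed either by name
(``example.mel``) or by position (``example[1]``), which keeps batch functions
simple.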
DataLoader
^^^^^^^^^^^
In deep learning practice, models are trained with minibatches. A DataLoader
iterates over the dataset in batches. It does so by using a sampler and a
batch function in addition to the dataset.

#. sampler: samples the indices or keys used to get examples from the dataset.
#. batch function: transforms a list of examples into a batch.

A commonly used sampler is ``RandomSampler``, which shuffles all the valid
indices and then iterates over them sequentially. ``DistributedBatchSampler`` is
a sampler used for distributed data-parallel training, where the sampler handles
data sharding dynamically.
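The shuffling sampler described above can be sketched in plain Python.
``ToyRandomSampler`` and ``batched`` are hypothetical names, not paddle's
classes: the sampler reshuffles all indices on every pass, and ``batched``
groups the indices it yields into minibatches, roughly what happens before the
batch function is applied.

.. code-block:: python

    import random


    class ToyRandomSampler:
        """Yields every valid index once per pass, in a freshly shuffled order."""

        def __init__(self, dataset_size, seed=None):
            self.dataset_size = dataset_size
            self._rng = random.Random(seed)

        def __iter__(self):
            indices = list(range(self.dataset_size))
            self._rng.shuffle(indices)  # new permutation on every pass (epoch)
            return iter(indices)

        def __len__(self):
            return self.dataset_size


    def batched(sampler, batch_size, drop_last=False):
        """Group the indices yielded by ``sampler`` into lists of ``batch_size``."""
        batch = []
        for index in sampler:
            batch.append(index)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch and not drop_last:
            yield batch  # last, smaller batch

Each batch of indices would then be used to fetch examples from the dataset
and passed to the batch function.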
The batch function transforms the selected examples into a batch. In the simple
case where an example is composed of several fields, each represented by a
fixed-size array, the batch function can simply stack each field. When an
example contains variable-size arrays, batching may involve padding and then
stacking. In principle, a batch function can do more, such as random slicing.

A custom dataset used for a custom model requires its own batch function.
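A minimal sketch of a padding-and-stacking batch function, assuming each
example is a ``(phonemes, mel)`` pair where ``phonemes`` is a 1-D sequence of
ids and ``mel`` has shape ``(n_mels, n_frames)``. These layouts and the function
names are assumptions for illustration; padding is applied at the end of the
last (time) axis before stacking.

.. code-block:: python

    import numpy as np


    def pad_and_stack(arrays, pad_value=0):
        """Right-pad arrays along the last axis to a common length, then stack.

        All arrays must agree on every dimension except the last one.
        """
        max_len = max(a.shape[-1] for a in arrays)
        padded = []
        for a in arrays:
            # no padding on leading axes, pad only the end of the last axis
            pad_width = [(0, 0)] * (a.ndim - 1) + [(0, max_len - a.shape[-1])]
            padded.append(np.pad(a, pad_width, mode="constant",
                                 constant_values=pad_value))
        return np.stack(padded)


    def collate_tts_examples(examples):
        """Hypothetical batch function for (phonemes, mel) examples.

        Returns phoneme ids (B, T_text), mels (B, n_mels, T_frames) and the
        original lengths of both.
        """
        phonemes = [np.asarray(e[0], dtype=np.int64) for e in examples]
        mels = [np.asarray(e[1], dtype=np.float32) for e in examples]
        text_lengths = np.array([p.shape[-1] for p in phonemes], dtype=np.int64)
        mel_lengths = np.array([m.shape[-1] for m in mels], dtype=np.int64)
        return (pad_and_stack(phonemes), pad_and_stack(mels),
                text_lengths, mel_lengths)

The lengths are returned as well, since a model usually needs them to build
masks that ignore the padded positions.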
Config
-------------
It is common to change the running configuration to compare results. To keep
track of the running configuration, we use ``yaml`` configuration files.

Also, we want to interact with command line options. Some options that usually
change according to the running environment are provided as command line
arguments. In addition, we want to override an option in the config file
without editing it.

Taking these requirements into consideration, we use `yacs <https://github.com/rbgirshick/yacs>`_
as a config management tool. Other tools like `omegaconf <https://github.com/omry/omegaconf>`_
are also powerful and have similar functions.

In each example provided, there is a ``config.py``, where the default config is
defined. To get the default config, import ``config.py`` and call
``get_cfg_defaults()``. The config can then be updated with a yaml config file
or command line arguments if needed.
For details about how to use yacs in experiments, see `yacs <https://github.com/rbgirshick/yacs>`_.
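A minimal sketch of such a ``config.py`` is shown below. The yacs methods it
uses (``clone``, ``merge_from_file``, ``merge_from_list`` and ``freeze``) belong
to ``yacs.config.CfgNode``, but the keys and default values are hypothetical;
each example in parakeet defines its own.

.. code-block:: python

    from yacs.config import CfgNode as CN

    # Hypothetical defaults; each example's real config.py defines its own keys.
    _C = CN()

    _C.data = CN()
    _C.data.batch_size = 32
    _C.data.sample_rate = 22050

    _C.training = CN()
    _C.training.lr = 0.001
    _C.training.max_iteration = 100000


    def get_cfg_defaults():
        """Return a clone so callers never mutate the module-level defaults."""
        return _C.clone()


    if __name__ == "__main__":
        config = get_cfg_defaults()
        # Values from a yaml file would be merged first, e.g.:
        # config.merge_from_file("my_experiment.yaml")
        # Then command line overrides, given as a flat list of key/value pairs.
        config.merge_from_list(["training.lr", "0.0005", "data.batch_size", "16"])
        config.freeze()  # make the config read-only for the rest of the run
        print(config)

Values passed to ``merge_from_list`` are strings, as they would be on the
command line; yacs parses them into Python literals and checks that each key
exists and that the new value has the same type as the default, so a
misspelled key or a type mismatch fails early.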
Experiment
--------------

docs/source/basic.rst

@ -0,0 +1,69 @@
===========
Basic Usage
===========
This section shows how to use the pretrained models provided by parakeet and
run inference with them.

Pretrained models are provided as archives. Extract one to get a folder like
this::

    checkpoint_name/
    ├──config.yaml
    └──step-310000.pdparams

The ``config.yaml`` stores the config used to train the model, and
``step-N.pdparams`` is the parameter file, where N is the number of steps the
model has been trained for.
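Since the step count is encoded in the file name, the newest checkpoint in a
folder can be found by parsing the names. The helper below is a hypothetical
sketch, not part of parakeet's API; it returns the path without the
``.pdparams`` suffix, which is the form passed as ``checkpoint_path`` in the
examples that follow.

.. code-block:: python

    import re
    from pathlib import Path

    # Parameter files are assumed to be named like "step-310000.pdparams".
    _CKPT_PATTERN = re.compile(r"^step-(\d+)\.pdparams$")


    def latest_checkpoint(checkpoint_dir):
        """Return the newest checkpoint path without its ".pdparams" suffix.

        Hypothetical helper, not part of parakeet. Returns None when no file in
        ``checkpoint_dir`` matches "step-N.pdparams".
        """
        best_step, best_path = -1, None
        for path in Path(checkpoint_dir).iterdir():
            match = _CKPT_PATTERN.match(path.name)
            if match and int(match.group(1)) > best_step:
                best_step = int(match.group(1))
                best_path = path.with_suffix("")  # ".../step-N"
        return None if best_path is None else str(best_path)

Steps are compared as integers rather than as strings, so ``step-9000`` is
correctly treated as older than ``step-310000``.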
The example code below shows how to use the models for prediction.
text to spectrogram
^^^^^^^^^^^^^^^^^^^^^^
The code below shows how to use a transformer_tts model. After loading the
pretrained model, use ``model.predict(sentence)`` to generate spectrograms
(in numpy.ndarray format), which can then be used to synthesize raw audio
with a vocoder.
>>> import parakeet
>>> from parakeet.frontend import English
>>> from parakeet.models import TransformerTTS
>>> from pathlib import Path
>>> from yacs.config import CfgNode
>>>
>>> # load the pretrained model
>>> frontend = English()
>>> checkpoint_dir = Path("transformer_tts_pretrained")
>>> with open(checkpoint_dir / "config.yaml") as f:
...     config = CfgNode.load_cfg(f)
>>> checkpoint_path = str(checkpoint_dir / "step-310000")
>>> model = TransformerTTS.from_pretrained(
...     frontend, config, checkpoint_path)
>>> model.eval()
>>>
>>> # text to spectrogram
>>> sentence = "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition"
>>> outputs = model.predict(sentence, verbose=False)
>>> mel_output = outputs["mel_output"]
vocoder
^^^^^^^^^^
Like the example above, after loading the pretrained ``ConditionalWaveFlow``
model, call ``model.predict(mel)`` to synthesize raw audio (in wav format).
>>> import soundfile as sf
>>> from yacs.config import CfgNode
>>> from parakeet.models import ConditionalWaveFlow
>>>
>>> # load the pretrained model
>>> checkpoint_dir = Path("waveflow_pretrained")
>>> with open(checkpoint_dir / "config.yaml") as f:
...     config = CfgNode.load_cfg(f)
>>> checkpoint_path = str(checkpoint_dir / "step-2000000")
>>> vocoder = ConditionalWaveFlow.from_pretrained(config, checkpoint_path)
>>> vocoder.eval()
>>>
>>> # synthesize and save the audio
>>> audio = vocoder.predict(mel_output)
>>> audio_path = "synthesized.wav"  # hypothetical output path
>>> sf.write(audio_path, audio, config.data.sample_rate)
For more details on how to use the model, please refer to the documentation.


@ -24,9 +24,10 @@
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))
autodoc_mock_imports = ["soundfile", "librosa"]
# -- Project information -----------------------------------------------------
@ -48,6 +49,7 @@ extensions = [
"sphinx_rtd_theme",
'sphinx.ext.mathjax',
'numpydoc',
'sphinx.ext.autosummary',
]
# Add any paths that contain templates here, relative to this directory.
@ -63,8 +65,10 @@ exclude_patterns = []
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".

docs/source/demo.rst

@ -0,0 +1,143 @@
Audio Sample
==================
TTS audio samples
-------------------
Audio samples generated by a TTS system. Text is first transformed into spectrogram
by a text-to-spectrogram model, then the spectrogram is converted into raw audio by
a vocoder.
.. raw:: html
<embed>
<table>
<tr>
<th align="left"> TransformerTTS + WaveFlow</th>
<th align="left"> Tacotron2 + WaveFlow </th>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_1.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_2.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_3.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_4.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_5.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_6.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_7.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_8.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_waveflow_samples_0.2/sentence_9.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_2.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_3.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_4.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_5.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_6.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_7.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_8.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_waveflow_samples_0.2/sentence_9.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
</table>
</embed>
Vocoder audio samples
--------------------------
Audio samples generated from ground-truth spectrograms with a vocoder.

docs/source/design.rst

@ -0,0 +1,4 @@
==============================
Design of Parakeet
==============================

docs/source/index.rst

@ -0,0 +1,55 @@
.. parakeet documentation master file, created by
sphinx-quickstart on Thu Dec 17 20:01:34 2020.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Parakeet
====================================
``parakeet`` is a deep learning based text-to-speech toolkit built upon the ``paddlepaddle`` framework. It aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It includes many influential TTS models proposed by `Baidu Research <http://research.baidu.com>`_ and other research groups.
``parakeet`` mainly consists of the components below.
#. Implementation of models and commonly used neural network layers.
#. Dataset abstraction and common data preprocessing pipelines.
#. Ready-to-run experiments.
.. toctree::
:caption: Getting started
:maxdepth: 1
install
basic
advanced
.. toctree::
:caption: Demos
:maxdepth: 1
demo
.. toctree::
:caption: Design of Parakeet
:maxdepth: 1
design
.. toctree::
:caption: Documentation
:maxdepth: 1
parakeet.audio
parakeet.data
parakeet.datasets
parakeet.frontend
parakeet.modules
parakeet.models
parakeet.training
parakeet.utils
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

docs/source/install.rst

@ -0,0 +1,83 @@
=============
Installation
=============
Install PaddlePaddle
------------------------
Parakeet requires PaddlePaddle as its backend. Note that PaddlePaddle 2.0.0rc1
or newer is required.

Since paddlepaddle has multiple packages depending on the device (cpu or gpu)
and the dependency libraries, it is recommended to install, via pip, the
paddlepaddle package that matches your device and dependency library versions.

Installing paddlepaddle with conda or building paddlepaddle from source is also
supported. Please refer to `PaddlePaddle installation <https://www.paddlepaddle.org.cn/install/quick/>`_ for more details.

Example instructions to install paddlepaddle via pip are listed below.
**PaddlePaddle with gpu**

.. code-block:: bash

    # choose the package that matches the CUDA version in your environment
    # CUDA 10.1
    python -m pip install paddlepaddle-gpu==2.0.0rc1.post101 -f https://paddlepaddle.org.cn/whl/stable.html
    # CUDA 10.0
    python -m pip install paddlepaddle-gpu==2.0.0rc1.post100 -f https://paddlepaddle.org.cn/whl/stable.html

**PaddlePaddle with cpu**

.. code-block:: bash

    python -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple
Install libsndfile
-------------------
Experiments in parakeet often involve audio and spectrum processing, thus
``librosa`` and ``soundfile`` are required. ``soundfile`` requires an extra
C library, ``libsndfile``, which is not always handled by pip.

For windows and mac users, ``libsndfile`` is also installed when installing
``soundfile`` via pip, but linux users need to install ``libsndfile`` with the
system package manager. Example commands for popular distributions are listed
below.
.. code-block:: bash

    # ubuntu, debian
    sudo apt-get install libsndfile1

    # centos, fedora
    sudo yum install libsndfile

    # openSUSE
    sudo zypper in libsndfile
For any problem with the installation of soundfile, please refer to
`SoundFile <https://pypi.org/project/SoundFile>`_.
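On linux, a quick way to check whether the system package is in place is to
ask the standard library's ``ctypes.util.find_library`` for ``sndfile``. The
helper name below is hypothetical. The check is mainly meaningful on linux:
the ``soundfile`` wheels for windows and mac bundle their own copy of
``libsndfile``, so a ``None`` result there does not necessarily indicate a
problem.

.. code-block:: python

    import ctypes.util


    def find_libsndfile():
        """Return the name of the libsndfile shared library, or None.

        ctypes.util.find_library searches the standard system library
        locations, so None on linux usually means the package is missing.
        """
        return ctypes.util.find_library("sndfile")


    if __name__ == "__main__":
        name = find_libsndfile()
        if name is None:
            print("libsndfile not found; install it with your package manager")
        else:
            print("found", name)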
Install Parakeet
------------------
There are two ways to install parakeet, depending on how you intend to use it.

#. If you want to run the experiments provided by parakeet or add new models and
   experiments, it is recommended to clone the project from github
   (`Parakeet <https://github.com/PaddlePaddle/Parakeet>`_) and install it in
   editable mode.

   .. code-block:: bash

       git clone https://github.com/PaddlePaddle/Parakeet
       cd Parakeet
       pip install -e .

#. If you only need parakeet's models for inference, installing from PyPI is
   recommended.

   .. code-block:: bash

       pip install paddle-parakeet


@ -1,57 +1,63 @@
# Installation
[TOC]
=============
Installation
=============
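## Install PaddlePaddle
Parakeet uses PaddlePaddle as its backend and therefore depends on it. Note that Parakeet requires PaddlePaddle 2.0 or later. You can install it via pip. To install the gpu version of PaddlePaddle, choose the wheel package version according to the cuda and cudnn versions in your environment. For installing with conda or building from source, please refer to [PaddlePaddle Quick Install](https://www.paddlepaddle.org.cn/install/quick/zh/2.0rc-linux-pip).
Install PaddlePaddle
--------------------
Parakeet uses PaddlePaddle as its backend and therefore depends on it. Note that Parakeet requires PaddlePaddle 2.0 or later. You can install it via pip. To install the gpu version of PaddlePaddle, choose the wheel package version according to the cuda and cudnn versions in your environment. For installing with conda or building from source, please refer to `PaddlePaddle Quick Install <https://www.paddlepaddle.org.cn/install/quick/>`_.
**PaddlePaddle with gpu**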
```bash
python -m pip install paddlepaddle-gpu==2.0.0rc0.post101 -f https://paddlepaddle.org.cn/whl/stable.html
python -m pip install paddlepaddle-gpu==2.0.0rc0.post100 -f https://paddlepaddle.org.cn/whl/stable.html
```
.. code-block:: bash
python -m pip install paddlepaddle-gpu==2.0.0rc1.post101 -f https://paddlepaddle.org.cn/whl/stable.html
python -m pip install paddlepaddle-gpu==2.0.0rc1.post100 -f https://paddlepaddle.org.cn/whl/stable.html
**PaddlePaddle with cpu**
```bash
python -m pip install paddlepaddle==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
```
.. code-block:: bash
## Install libsndfile
python -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple
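Because experiments in Parakeet often involve audio processing and spectrum processing, we rely on librosa and soundfile for audio processing. librosa and soundfile depend on a C library, libsndfile, which is not a Python package. For Windows and Mac users, libsndfile is installed along with soundfile when soundfile is installed via pip. If you run into problems, you can also refer to [SoundFile](https://pypi.org/project/SoundFile).
Install libsndfile
-------------------
Because experiments in Parakeet often involve audio processing and spectrum processing, we rely on librosa and soundfile for audio processing. librosa and soundfile depend on a C library, libsndfile, which is not a Python package. For Windows and Mac users, libsndfile is installed along with soundfile when soundfile is installed via pip. If you run into problems, you can also refer to `SoundFile <https://pypi.org/project/SoundFile>`_.
Linux users need to install this library with the system package manager. Example commands for popular distributions are listed below.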
```bash
# ubuntu, debian
sudo apt-get install libsndfile1
.. code-block::
# centos, fedora,
sudo yum install libsndfile
# ubuntu, debian
sudo apt-get install libsndfile1
# openSUSE
sudo zypper in libsndfile
```
# centos, fedora,
sudo yum install libsndfile
## Install Parakeet
# openSUSE
sudo zypper in libsndfile
Install Parakeet
------------------
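We provide two ways to use Parakeet.
1. Users who want to run the experiment code shipped with Parakeet, or who want to build on it, can first clone the project from github, cd into the project directory, and install it in editable mode (the code is not copied into site-packages, and changes to the project take effect immediately without reinstalling). After that, it is ready to use.
#. Users who want to run the experiment code shipped with Parakeet, or who want to build on it, can first clone the project from github, cd into the project directory, and install it in editable mode (the code is not copied into site-packages, and changes to the project take effect immediately without reinstalling). After that, it is ready to use.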
```bash
# -e means an editable install
pip install -e .
```
.. code-block:: bash
2. Users who only need to make predictions with the trained models we provide can install the wheel package from PyPI directly.
# -e means an editable install
pip install -e .
#. Users who only need to make predictions with the trained models we provide can install the wheel package from PyPI directly.
.. code-block:: bash
pip install paddle-parakeet
```bash
pip install paddle-parakeet
```

images/logo-small.png
Binary file not shown (33 KiB).


@ -1,3 +1,5 @@
"""Parakeet's infrastructure for data processing.
"""
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
@ -12,5 +14,5 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .dataset import *
from .batch import *
from parakeet.data.dataset import *
from parakeet.data.batch import *