From 5bd64d3869a83c138e2438d37584029bb7260c70 Mon Sep 17 00:00:00 2001
From: TianYuan
Date: Mon, 6 Sep 2021 12:10:01 +0000
Subject: [PATCH] update readme

---
 README.md                               | 79 +++++++++++++++----------
 examples/fastspeech2/aishell3/README.md | 15 ++---
 examples/fastspeech2/baker/README.md    | 14 +++--
 3 files changed, 63 insertions(+), 45 deletions(-)

diff --git a/README.md b/README.md
index 1baf612..036f929 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,4 @@
 # Parakeet
-
 Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on PaddlePaddle Fluid dynamic graph and includes many influential TTS models proposed by [Baidu Research](http://research.baidu.com) and other research groups.
@@ -13,20 +12,29 @@ In particular, it features the latest [WaveFlow](https://arxiv.org/abs/1912.0121
 - WaveFlow is directly trained with maximum likelihood without probability density distillation and auxiliary losses as used in Parallel WaveNet and ClariNet, which simplifies the training pipeline and reduces the cost of development.
 ## Overview
-
-In order to facilitate exploiting the existing TTS models directly and developing the new ones, Parakeet selects typical models and provides their reference implementations in PaddlePaddle. Further more, Parakeet abstracts the TTS pipeline and standardizes the procedure of data preprocessing, common modules sharing, model configuration, and the process of training and synthesis. The models supported here include Vocoders and end-to-end TTS models:
+In order to facilitate exploiting the existing TTS models directly and developing the new ones, Parakeet selects typical models and provides their reference implementations in PaddlePaddle. Furthermore, Parakeet abstracts the TTS pipeline and standardizes the procedure of data preprocessing, common modules sharing, model configuration, and the process of training and synthesis. The models supported here include Vocoders and end-to-end Acoustic models:
 - Vocoders
-  - [WaveFlow: A Compact Flow-based Model for Raw Audio](https://arxiv.org/abs/1912.01219)
+  - [【Parallel WaveGAN】Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)
+  - [【WaveFlow】WaveFlow: A Compact Flow-based Model for Raw Audio](https://arxiv.org/abs/1912.01219)
-- TTS models
-  - [Neural Speech Synthesis with Transformer Network (Transformer TTS)](https://arxiv.org/abs/1809.08895)
-  - [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](arxiv.org/abs/1712.05884)
+- Acoustic models
+  - [【FastSpeech2】FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558)
+  - [【SpeedySpeech】SpeedySpeech: Efficient Neural Speech Synthesis](https://arxiv.org/abs/2008.03802)
+  - [【Transformer TTS】Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895)
+  - [【Tacotron2】Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
+
+- Voice Conversion
+  - [【GE2E】Generalized End-to-End Loss for Speaker Verification](https://arxiv.org/abs/1710.10467)
 ## Updates
-
-May-07-2021, Add an example for voice cloning in Chinese. Check [examples/tacotron2_aishell3](./examples/tacotron2_aishell3).
-
+- Aug-31-2021, Add an example for Chinese Text Frontend. Check [examples/text_frontend](./examples/text_frontend)
+- Aug-23-2021, Add an example for FastSpeech2 with AISHELL-3. Check [examples/fastspeech2/aishell3](./examples/fastspeech2/aishell3)
+- Aug-03-2021, Add an example for FastSpeech2 with CSMSC. Check [examples/fastspeech2/baker](./examples/fastspeech2/baker)
+- Jul-19-2021, Add an example for SpeedySpeech with CSMSC. Check [examples/speedyspeech/baker](./examples/speedyspeech/baker)
+- Jul-01-2021, Add an example for Parallel WaveGAN with CSMSC. Check [examples/parallelwave_gan/baker](./examples/parallelwave_gan/baker)
+- Jul-01-2021, Add an example for usage of Montreal-Forced-Aligner. Check [examples/use_mfa](./examples/use_mfa).
+- May-07-2021, Add an example for voice cloning in Chinese. Check [examples/tacotron2_aishell3](./examples/tacotron2_aishell3).
 ## Setup
 It's difficult to install some dependent libraries for this repo in Windows system, we recommend that you **DO NOT** use Windows system, please use `Linux`.
@@ -36,9 +44,7 @@ Make sure the library `libsndfile1` is installed, e.g., on Ubuntu.
 ```bash
 sudo apt-get install libsndfile1
 ```
-
 ### Install PaddlePaddle
-
 See [install](https://www.paddlepaddle.org.cn/install/quick) for more details. This repo requires PaddlePaddle **2.1.2** or above.
 ### Install Parakeet
@@ -62,41 +68,50 @@ sudo apt install -y python3.6-dev
 See [install](https://paddle-parakeet.readthedocs.io/en/latest/install.html) for more details.
 ## Examples
-
 Entries to the introduction, and the launch of training and synthsis for different example models:
-- [>>> WaveFlow](./examples/waveflow)
-- [>>> Transformer TTS](./examples/transformer_tts)
-- [>>> Tacotron2](./examples/tacotron2)
+- [>>> Chinese Text Frontend](./examples/text_frontend)
+- [>>> FastSpeech2](./examples/fastspeech2)
+- [>>> Montreal-Forced-Aligner](./examples/use_mfa)
+- [>>> Parallel WaveGAN](./examples/parallelwave_gan)
+- [>>> SpeedySpeech](./examples/speedyspeech)
 - [>>> Tacotron2_AISHELL3](./examples/tacotron2_aishell3)
 - [>>> GE2E](./examples/ge2e)
-
+- [>>> WaveFlow](./examples/waveflow)
+- [>>> TransformerTTS](./examples/transformer_tts)
+- [>>> Tacotron2](./examples/tacotron2)
 ## Audio samples
-
 ### TTS models (Acoustic Model + Neural Vocoder)
-
 Check our [website](https://paddle-parakeet.readthedocs.io/en/latest/demo.html) for audio sampels.
-
 ## Checkpoints
+### FastSpeech2
+1. [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
+2. [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
+
+### Parallel WaveGAN
+1. [pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip)
+
+### SpeedySpeech
+1. [speedyspeech_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_baker_ckpt_0.4.zip)
+
+### Tacotron2_AISHELL3
+1. [tacotron2_aishell3_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_aishell3_ckpt_0.3.zip)
+
+### GE2E
+1. [ge2e_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/ge2e_ckpt_0.3.zip)
+
+### WaveFlow
+1. [waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_ljspeech_ckpt_0.3.zip)
+
+### TransformerTTS
+1. [transformer_tts_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_0.3.zip)
 ### Tacotron2
 1. [tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3.zip)
 2. [tacotron2_ljspeech_ckpt_0.3_alternative.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3_alternative.zip)
-### Tacotron2_AISHELL3
-1. [tacotron2_aishell3_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_aishell3_ckpt_0.3.zip)
-
-### TransformerTTS
-1. [transformer_tts_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_0.3.zip)
-
-### WaveFlow
-1. [waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_ljspeech_ckpt_0.3.zip)
-
-### GE2E
-1. [ge2e_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/ge2e_ckpt_0.3.zip)
-
 ## Copyright and License
 Parakeet is provided under the [Apache-2.0 license](LICENSE).
diff --git a/examples/fastspeech2/aishell3/README.md b/examples/fastspeech2/aishell3/README.md
index 7f33ae9..15dcbc6 100644
--- a/examples/fastspeech2/aishell3/README.md
+++ b/examples/fastspeech2/aishell3/README.md
@@ -1,13 +1,13 @@
-
 # FastSpeech2 with AISHELL-3
+This example contains code used to train a [FastSpeech2](https://arxiv.org/abs/2006.04558) model with [AISHELL-3](http://www.aishelltech.com/aishell_3).
 ## Introduction
-[AISHELL-3](http://www.aishelltech.com/aishell_3) is a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems.
+AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems.
 We use AISHELL-3 to train a multi-speaker fastspeech2 model here.
 ## Dataset
-### Download and Extract the datasaet.
+### Download and Extract the dataset
 Download AISHELL-3.
 ```bash
 wget https://www.openslr.org/resources/93/data_aishell3.tgz
@@ -18,13 +18,11 @@ mkdir data_aishell3
 tar zxvf data_aishell3.tgz -C data_aishell3
 ```
-### Get MFA result of BZNSYP and Extract it.
-
+### Get MFA result of AISHELL-3 and Extract it
 We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
 You can download from here [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model reference to [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) (use MFA1.x now) of our repo.
-### Preprocess the dataset.
-
+### Preprocess the dataset
 Assume the path to the dataset is `~/datasets/data_aishell3`.
 Assume the path to the MFA result of AISHELL-3 is `./aishell3_alignment_tone`.
 Run the command below to preprocess the dataset.
@@ -32,11 +30,13 @@ Run the command below to preprocess the dataset.
 ```bash
 ./preprocess.sh
 ```
+
 ## Train the model
 ```bash
 ./run.sh
 ```
 If you want to train fastspeech2 with cpu, please add `--device=cpu` arguments for `python3 train.py` in `run.sh`.
+
 ## Synthesize
 We use [parallel wavegan](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/parallelwave_gan/baker) as the neural vocoder.
 Download pretrained parallel wavegan model (Trained with baker) from [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip) and unzip it.
@@ -75,5 +75,6 @@ python3 synthesize_e2e.py \
   --speaker-dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt
 ```
+
 ## Future work
 A multi-speaker vocoder is needed.
diff --git a/examples/fastspeech2/baker/README.md b/examples/fastspeech2/baker/README.md
index c597d01..fc0b383 100644
--- a/examples/fastspeech2/baker/README.md
+++ b/examples/fastspeech2/baker/README.md
@@ -1,16 +1,16 @@
-# FastSpeech2 with BZNSYP
+# FastSpeech2 with the Baker dataset
+This example contains code used to train a [FastSpeech2](https://arxiv.org/abs/2006.04558) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html).
 ## Dataset
-### Download and Extract the datasaet.
-Download BZNSYP from it's [Official Website](https://test.data-baker.com/data/index/source).
-### Get MFA result of BZNSYP and Extract it.
+### Download and Extract the dataset
+Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/source).
+### Get MFA result of CSMSC and Extract it
 We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2.
 You can download from here [baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz), or train your own MFA model reference to [use_mfa example](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/use_mfa) of our repo.
-### Preprocess the dataset.
-
+### Preprocess the dataset
 Assume the path to the dataset is `~/datasets/BZNSYP`.
 Assume the path to the MFA result of BZNSYP is `./baker_alignment_tone`.
 Run the command below to preprocess the dataset.
@@ -18,11 +18,13 @@ Run the command below to preprocess the dataset.
 ```bash
 ./preprocess.sh
 ```
+
 ## Train the model
 ```bash
 ./run.sh
 ```
 If you want to train fastspeech2 with cpu, please add `--device=cpu` arguments for `python3 train.py` in `run.sh`.
+
 ## Synthesize
 We use [parallel wavegan](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/parallelwave_gan/baker) as the neural vocoder.
 Download pretrained parallel wavegan model from [parallel_wavegan_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/parallel_wavegan_baker_ckpt_0.4.zip) and unzip it.