33 Commits

Author SHA1 Message Date
Feiyu Chan 4f288a6d4f
add ge2e and tacotron2_aishell3 example (#107)
* hacky thing, add tone support for acoustic model

* fix experiments for waveflow and wavenet, only write visual log in rank-0

* use emb add in tacotron2

* 1. remove space from numericalized representation;
2. fix decoder padding mask's unsqueeze dim.

* remove bn in postnet

* refactoring code

* add an option to normalize volume when loading audio.

* add an embedding layer.

* 1. change the default min value of LogMagnitude to 1e-5;
2. remove stop logit prediction from tacotron2 model.

* WIP: baker

* add ge2e

* fix lstm speaker encoder

* fix lstm speaker encoder

* fix speaker encoder and add support for 2 more datasets

* simplify visualization code

* add a simple strategy to support multispeaker for tacotron.

* add vctk example for refactored tacotron

* fix indentation

* fix class name

* fix visualizer

* fix root path

* fix root path

* fix root path

* fix typos

* fix bugs

* fix text log extension name

* add example for baker and aishell3

* update experiment and display

* format code for tacotron_vctk, add plot_waveform to display

* add new trainer

* minor fix

* add global condition support for tacotron2

* add gst layer

* add 2 frontend

* fix fmax for example/waveflow

* update collate function, data loader now does not convert nested list into numpy array.

* WIP: add hifigan

* WIP: update hifigan

* change stft to use conv1d

* add audio datasets

* change batch_text_id, batch_spec, batch_wav to include valid lengths in the returned value

* change wavenet to use on-the-fly preprocessing

* fix typos

* resolve conflict

* remove imports that are removed

* remove files not included in this release

* remove imports to deleted modules

* move tacotron2_msp

* clean code

* fix argument order

* fix argument name

* clean code for data processing

* WIP: add README

* add more details to the README, fix some preprocess scripts

* add voice cloning notebook

* add an option to alter the loss and model structure of tacotron2, add an alternative config

* add plot_multiple_attentions and update visualization code in transformer_tts

* format code

* remove tacotron2_msp

* update tacotron2 from_pretrained, update setup.py

* update tacotron2

* update tacotron_aishell3's README

* add images for examples/tacotron2_aishell3's README

* update README for examples/ge2e

* add STFT back

* add extra_config keys into the default config of tacotron

* fix typos and docs

* update README and doc

* update docstrings for tacotron

* update doc

* update README

* add links to download pretrained models

* refine READMEs and clean code

* add praatio into requirements for running the experiments

* format code with pre-commit

* simplify text processing code and update notebook
2021-05-13 17:49:50 +08:00
iclementine c321fcd098 polish documentation 2021-01-13 14:58:26 +08:00
iclementine c2a279c433 add documentation sections 2021-01-13 11:06:15 +08:00
iclementine e03e96d9e4 format all the code with yapf 2020-12-20 13:15:07 +08:00
chenfeiyu 29cc759241 add access control by __all__ in modules 2020-12-09 15:58:39 +08:00
iclementine 49231ca8e5 move datasets 2020-11-19 22:04:25 +08:00
iclementine abee3ecdd4 move datasets into parakeet.datasets 2020-11-19 20:31:21 +08:00
chenfeiyu 57d820f055 add support for channel last in batch_spec, and Conv1dBatchNorm 2020-10-30 15:13:57 +08:00
chenfeiyu c43216ae9b 1. API renaming Conv1d -> Conv1D, BatchNorm1d -> BatchNorm1D;
2. add losses in parakeet/modules;
3. fix a bug in phonetics;
4. TransformerTTS update: encoder dim can be different from decoder dim;
5. MultiHeadAttention in TransformerTTS: add k_input_dim & v_input_dim in __init__ to allow different feature sizes for k and v.
2020-10-22 05:04:45 +00:00
iclementine 6aa7af1aa4 add AudioFolderDataset 2020-10-15 23:15:27 +08:00
iclementine 53d0382fc7 clean code: remove deprecated modules 2020-10-15 23:07:30 +08:00
iclementine a8192c79cc WIP: refactor 2020-10-10 15:51:54 +08:00
chenfeiyu 6aac18278e refactor for deep voice 3, update wavenet and clarinet to use enable_dygraph 2020-05-20 12:37:19 +00:00
chenfeiyu 6a9eab4b73 fix typos and refine doc 2020-03-09 15:33:13 +00:00
chenfeiyu e0e40c5379 Merge branch 'master' of upstream. 2020-03-09 07:30:19 +00:00
chenfeiyu 4b2b974eb4 refine docstring for parakeet.data and deep voice 3, wavenet and clarinet 2020-03-09 03:06:28 +00:00
lifuchen a302bf21f4 fix conflicts of dataset.py 2020-03-06 11:49:53 +00:00
chenfeiyu 86fff7a077 add doc for parakeet.data, python2 compatibility for DataIterator and lazy CacheDataset 2020-03-06 02:55:42 +00:00
lifuchen d08779d61e Modified data.py to generate masks as models inputs 2020-03-05 07:22:50 +00:00
lifuchen 078d22e51c Modified data.py to generate masks as models inputs 2020-03-05 07:08:12 +00:00
chenfeiyu 424c16a68d staged clarinet 2020-02-27 10:23:05 +00:00
lifuchen 9d79699432 add license 2020-02-26 21:03:51 +08:00
chenfeiyu 78582dbecd make DataIterator compatible for python 2 2020-02-24 06:54:57 +00:00
chenfeiyu 173693f469 fix missing imports, fix ljspeech.yaml config key: encoder_channels 2020-02-16 17:54:11 +00:00
lifuchen d0015239db Eliminated conflict 2020-02-07 01:07:51 +00:00
chenfeiyu 837749a32c update statset and datacargo's design 2020-02-06 15:40:04 +08:00
lifuchen 9fe6ad11f0 Training with multi-GPU 2019-12-17 06:23:34 +00:00
lifuchen 8a9bbc2634 add_TransformerTTS 2019-12-16 09:04:22 +00:00
Kexin Zhao 98841ee48a clean code 2019-12-02 22:58:17 -08:00
Kexin Zhao b15c313423 working integration with parakeet 2019-12-02 14:00:53 -08:00
chenfeiyu 5bd396712c fix sampler length 2019-11-25 10:47:31 +00:00
chenfeiyu 34bd1e984d add setup.py 2019-11-22 11:32:59 +08:00
chenfeiyu 617605c8fe place parakeet into Parakeet/parakeet, and add tests 2019-11-21 23:02:32 +08:00