Merge branch 'master' into 'master'
clean unused files and refine doc
See merge request !37
commit 5b8eee0892
@@ -52,7 +52,7 @@ Finally, if preprocessing the dataset is slow and the processed dataset is too l

 ## DataCargo

-`DataCargo`, like `Dataset`, is an iterable, but it is an iterable of batches. We need `DataCargo` because, in deep learning, grouping examples into batches makes better use of the computational resources of modern hardware. You can iterate over it with `iter(datacargo)` or `for batch in datacargo`. `DataCargo` is an iterable but not an iterator, so it can be iterated more than once.
+`DataCargo`, like `Dataset`, is an iterable object, but it is an iterable object of batches. We need `DataCargo` because, in deep learning, grouping examples into batches makes better use of the computational resources of modern hardware. You can iterate over it with `iter(datacargo)` or `for batch in datacargo`. `DataCargo` is an iterable object but not an iterator, so it can be iterated more than once.
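The difference between an iterable and an iterator that the paragraph above relies on can be sketched in plain Python. The class `BatchIterable` below is a hypothetical stand-in written for illustration only; it is not the `DataCargo` implementation and assumes nothing about its constructor.

```python
class BatchIterable:
    """Toy stand-in for a batch container: iterable, but not an iterator."""

    def __init__(self, examples, batch_size):
        self.examples = list(examples)
        self.batch_size = batch_size

    def __iter__(self):
        # A fresh generator (an iterator) is created on every call,
        # so the object itself can be iterated any number of times.
        for start in range(0, len(self.examples), self.batch_size):
            yield self.examples[start:start + self.batch_size]


cargo = BatchIterable(range(5), batch_size=2)
first_pass = list(cargo)
second_pass = list(cargo)   # a second pass yields the same batches

iterator = iter(cargo)      # an iterator, by contrast, is single-use
consumed = list(iterator)
leftover = list(iterator)   # the iterator is already exhausted
```

Because `__iter__` returns a new iterator each time, a training loop can run one `for batch in cargo` per epoch without rebuilding the object.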

 ### batch function

@@ -91,7 +91,7 @@ Usually we need to define the batch function as a callable object which stores

 Equipped with a batch function (we now know __how to batch__), the next question is __what to batch__. We need to decide which examples to pick when creating a batch. Since a dataset is a list of examples, we only need to pick the indices of the corresponding examples. A sampler object is what we use to do this.

-A `Sampler` is represented as an iterable of integers. If the dataset has `N` examples, then an iterable of integers in the range `[0, N)` is an appropriate sampler for building a `DataCargo` from this dataset.
+A `Sampler` is represented as an iterable object of integers. If the dataset has `N` examples, then an iterable object of integers in the range `[0, N)` is an appropriate sampler for building a `DataCargo` from this dataset.

 We provide several samplers that are ready to use, such as `SequentialSampler` and `RandomSampler`.
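To make the idea of "an iterable of integers in `[0, N)`" concrete, here is a minimal sketch of two samplers. The names `ToySequentialSampler` and `ToyRandomSampler`, and the `seed` parameter, are hypothetical and chosen for illustration; they are not the library's `SequentialSampler` or `RandomSampler`, whose signatures are not shown in this diff.

```python
import random


class ToySequentialSampler:
    """Yields 0, 1, ..., n - 1 in order."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))


class ToyRandomSampler:
    """Yields each index in [0, n) exactly once, in shuffled order."""

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(self.n))
        self.rng.shuffle(indices)
        return iter(indices)


dataset = ["a", "b", "c", "d"]
sequential = [dataset[i] for i in ToySequentialSampler(len(dataset))]
shuffled = [dataset[i] for i in ToyRandomSampler(len(dataset), seed=0)]
```

Both samplers only produce indices; looking up the examples is left to whatever consumes them, which is what lets one dataset be paired with different sampling strategies.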
@@ -323,7 +323,7 @@ for batch in train_cargo:
     # your training code here
 ```

-In the code above, data processing and model training run in the same process, so the next batch starts to load only after training on the current batch has finished. There is a better solution: data processing and model training can run asynchronously. To accomplish this, we use `DataLoader` from Paddle. It serves as an adapter that transforms an Iterable of batches into another iterable of batches, which runs asynchronously and transforms each ndarray into a `Variable`.
+In the code above, data processing and model training run in the same process, so the next batch starts to load only after training on the current batch has finished. There is a better solution: data processing and model training can run asynchronously. To accomplish this, we use `DataLoader` from Paddle. It serves as an adapter that transforms an iterable object of batches into another iterable object of batches, which runs asynchronously and transforms each ndarray into a `Variable`.

 ```python
 # connects our data cargos with the corresponding DataLoader
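The asynchronous loading described in the `DataLoader` paragraph above can be pictured with a standard-library sketch. This is not Paddle's `DataLoader` and it does not convert arrays to `Variable`; it only illustrates, under simplifying assumptions, how one iterable of batches can be wrapped into another that loads ahead in a background thread. The function `prefetch` and its `buffer_size` parameter are hypothetical names, and errors raised in the worker are not handled.

```python
import queue
import threading


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a worker thread loads ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def worker():
        for batch in batches:
            buffer.put(batch)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = buffer.get()  # blocks until a batch is ready
        if batch is sentinel:
            return
        yield batch


loaded = list(prefetch([[0, 1], [2, 3], [4]]))
```

While the consumer is busy with one batch, the worker can already be preparing the next ones, up to `buffer_size` batches ahead.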
@@ -11,9 +11,9 @@ For a general deep learning experiment, there are 4 parts to care for.

 For processing data, `parakeet.data` provides `DatasetMixin`, `DataCargo` and `DataIterator`.

-Dataset is an iterable of examples. `DatasetMixin` provides the standard indexing interface, and other classes in [parakeet.data.dataset](../parakeet/data/dataset.py) provide flexible interfaces for building customized datasets.
+Dataset is an iterable object of examples. `DatasetMixin` provides the standard indexing interface, and other classes in [parakeet.data.dataset](../parakeet/data/dataset.py) provide flexible interfaces for building customized datasets.

-`DataCargo` is an iterable of batches. It differs from a dataset in that it can be iterated in batches. In addition to a dataset, a `Sampler` and a `batch function` are required to build a `DataCargo`. The `Sampler` specifies which examples to pick, and the `batch function` specifies how to create a batch from them. Commonly used `Samplers` are provided by [parakeet.data](../parakeet/data/). Users should define a `batch function` for a dataset in order to batch its examples.
+`DataCargo` is an iterable object of batches. It differs from a dataset in that it can be iterated in batches. In addition to a dataset, a `Sampler` and a `batch function` are required to build a `DataCargo`. The `Sampler` specifies which examples to pick, and the `batch function` specifies how to create a batch from them. Commonly used `Samplers` are provided by [parakeet.data](../parakeet/data/). Users should define a `batch function` for a dataset in order to batch its examples.
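How a dataset, a sampler and a batch function fit together can be sketched as follows. The helpers `pad_batch` and `iterate_batches` are hypothetical and written only for illustration; the actual `DataCargo` constructor and its arguments are not reproduced here, and plain Python lists stand in for arrays.

```python
def pad_batch(examples, pad_value=0):
    """Batch function: pad variable-length lists to a common length."""
    max_len = max(len(example) for example in examples)
    return [example + [pad_value] * (max_len - len(example))
            for example in examples]


def iterate_batches(dataset, sampler, batch_fn, batch_size):
    """Pick examples in sampler order and group them with batch_fn."""
    picked = []
    for index in sampler:
        picked.append(dataset[index])
        if len(picked) == batch_size:
            yield batch_fn(picked)
            picked = []
    if picked:  # final, smaller batch
        yield batch_fn(picked)


dataset = [[1], [2, 3], [4, 5, 6], [7]]
sampler = range(len(dataset))  # any iterable of indices in [0, N)
batches = list(iterate_batches(dataset, sampler, pad_batch, batch_size=2))
```

The sampler decides which examples end up together, and the batch function decides how they are combined, so either can be swapped without touching the other.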

 `DataIterator` is an iterator class for `DataCargo`. It is created when explicitly creating an iterator of a `DataCargo` with `iter(DataCargo)`, or when iterating over a `DataCargo` with a `for` loop.

@@ -1 +0,0 @@
-# train deepvoice 3 with ljspeech (just a placeholder now)
File diff suppressed because one or more lines are too long
@@ -1,24 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from parakeet.datasets.ljspeech import LJSpeech
-from parakeet.data.datacargo import DataCargo
-
-from pathlib import Path
-
-LJSPEECH_ROOT = Path("/workspace/datasets/LJSpeech-1.1")
-ljspeech = LJSpeech(LJSPEECH_ROOT)
-ljspeech_cargo = DataCargo(ljspeech, batch_size=16, shuffle=True)
-for i, batch in enumerate(ljspeech_cargo):
-    print(i)
@@ -1,25 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from parakeet.datasets import vctk
-from pathlib import Path
-from parakeet.data.datacargo import DataCargo
-
-root = Path("/workspace/datasets/VCTK-Corpus")
-vctk_dataset = vctk.VCTK(root)
-vctk_cargo = DataCargo(
-    vctk_dataset, batch_size=16, shuffle=True, drop_last=True)
-
-for i, batch in enumerate(vctk_cargo):
-    print(i)