# Add new algorithm

PaddleOCR decomposes an algorithm into the following parts and modularizes each of them, which makes developing new algorithms more convenient.

* Data loading and processing
* Network
* Post-processing
* Loss
* Metric
* Optimizer

The following sections introduce each part in turn and explain how to add the modules required for a new algorithm.

## Data loading and processing

Data loading and processing are composed of different modules, which handle image reading, data augmentation and label production. This part is under [ppocr/data](../../ppocr/data). The files and folders are explained below:

```bash
ppocr/data/
├── imaug             # Scripts for image reading, data augmentation and label production
│   ├── label_ops.py  # Modules that transform the label
│   ├── operators.py  # Modules that transform the image
│   ├──.....
├── __init__.py
├── lmdb_dataset.py   # The dataset that reads the lmdb
└── simple_dataset.py # Read the dataset saved in the form of `image_path\tgt`
```
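
As a rough illustration of the `image_path\tgt` layout mentioned for `simple_dataset.py`, the sketch below splits one label-file line into an image path and its ground truth. The function name `parse_label_line` and the sample line are hypothetical, and the real dataset class may handle delimiters or encodings differently.

```python
def parse_label_line(line, delimiter="\t"):
    """Split one 'image_path<TAB>ground_truth' line into its two fields."""
    # Strip only the line ending so that spaces inside the label survive.
    line = line.rstrip("\r\n")
    # Split on the first delimiter only, in case the label contains more.
    image_path, _, label = line.partition(delimiter)
    if not image_path or not label:
        raise ValueError(f"malformed label line: {line!r}")
    return image_path, label


# Hypothetical sample line, not taken from a real dataset.
sample = "train/word_001.jpg\thello world\n"
image_path, label = parse_label_line(sample)
```
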

PaddleOCR has a large number of built-in modules for image operations. Modules that are not built in can be added through the following steps:

1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
2. Add code to the my_module.py file. The sample code is as follows:

```python
class MyModule:
    def __init__(self, *args, **kwargs):
        # your init code
        pass

    def __call__(self, data):
        img = data['image']
        label = data['label']
        # your process code

        data['image'] = img
        data['label'] = label
        return data
```

3. Import the added module in the [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) file.

The data processing modules are listed in the config file and executed in sequence, in the order in which they appear. For example:

```yaml
# angle class data process
transforms:
  - DecodeImage: # load image
      img_mode: BGR
      channel_first: False
  - MyModule:
      args1: args1
      args2: args2
  - KeepKeys:
      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
```
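
To make the "list of modules executed in sequence" idea concrete, here is a minimal sketch of how such a config list could be turned into operator objects and applied to one sample. It is not PaddleOCR's actual loader: the `OPERATORS` table, `create_operators`, `run_transforms`, the toy `UpperCaseLabel` operator and this simplified `KeepKeys` are all assumptions made for illustration, as is treating a `None` return value as "drop this sample".

```python
class UpperCaseLabel:
    """Hypothetical operator: upper-cases the label string."""

    def __init__(self, **kwargs):
        pass

    def __call__(self, data):
        data["label"] = data["label"].upper()
        return data


class KeepKeys:
    """Simplified stand-in: returns the selected values as a list, in order."""

    def __init__(self, keep_keys, **kwargs):
        self.keep_keys = keep_keys

    def __call__(self, data):
        return [data[key] for key in self.keep_keys]


# Explicit name -> class table (an assumption of this sketch).
OPERATORS = {"UpperCaseLabel": UpperCaseLabel, "KeepKeys": KeepKeys}


def create_operators(op_param_list):
    """Instantiate one operator per {name: params} entry of the config list."""
    ops = []
    for entry in op_param_list:
        (name, params), = entry.items()  # each entry holds exactly one operator
        ops.append(OPERATORS[name](**(params or {})))
    return ops


def run_transforms(data, ops):
    """Feed the sample through every operator, in config order."""
    for op in ops:
        data = op(data)
        if data is None:  # assumed convention: an operator may reject a sample
            return None
    return data


# Python equivalent of a YAML `transforms:` list.
config = [
    {"UpperCaseLabel": None},
    {"KeepKeys": {"keep_keys": ["image", "label"]}},
]
sample = {"image": [[0, 1], [2, 3]], "label": "abc", "path": "x.jpg"}
result = run_transforms(sample, create_operators(config))
```

Here `result` is a two-element list holding the image and the upper-cased label; the `path` key is dropped because it is not listed in `keep_keys`.
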

## Network

The network part builds the network. PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms->backbones->necks->heads).

```bash
├── architectures # Code for building network
├── transforms    # Image Transformation Module
├── backbones     # Feature extraction module
├── necks         # Feature enhancement module
└── heads         # Output module
```

PaddleOCR has built-in modules for commonly used algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added through the following steps. The steps are the same for all four parts; take backbones as an example:

1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code to the my_backbone.py file. The sample code is as follows:

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, *args, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code
        self.conv = nn.xxxx  # placeholder: replace with a real paddle.nn layer

    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y
```

3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.

After adding the modules for the four parts of the network, you only need to reference them in the configuration file to use them. For example:

```yaml
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
    name: MyTransform
    args1: args1
    args2: args2
  Backbone:
    name: MyBackbone
    args1: args1
  Neck:
    name: MyNeck
    args1: args1
  Head:
    name: MyHead
    args1: args1
```
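
The transforms->backbones->necks->heads flow can be pictured with a small, framework-free sketch. Everything below is hypothetical: `build_model`, the `REGISTRY` table and the toy stages `Scale`, `Shift` and `Total` stand in for real Paddle layers, and skipping a stage that is missing from the config is an assumption of this sketch rather than documented behaviour.

```python
class Scale:
    """Toy 'backbone': multiplies every value by a factor."""

    def __init__(self, factor=1):
        self.factor = factor

    def __call__(self, x):
        return [v * self.factor for v in x]


class Shift:
    """Toy 'neck': adds an offset to every value."""

    def __init__(self, offset=0):
        self.offset = offset

    def __call__(self, x):
        return [v + self.offset for v in x]


class Total:
    """Toy 'head': reduces the features to a single number."""

    def __call__(self, x):
        return sum(x)


REGISTRY = {"Scale": Scale, "Shift": Shift, "Total": Total}


def build_model(config, registry):
    """Chain the configured stages in the fixed order used by the network."""
    stages = []
    for key in ("Transform", "Backbone", "Neck", "Head"):
        stage_cfg = config.get(key)
        if not stage_cfg:  # assumption: a missing stage is simply skipped
            continue
        params = dict(stage_cfg)   # copy so the config is not mutated
        name = params.pop("name")  # `name` selects the class, the rest are kwargs
        stages.append(registry[name](**params))

    def model(x):
        for stage in stages:
            x = stage(x)
        return x

    return model


architecture = {
    "Backbone": {"name": "Scale", "factor": 2},
    "Neck": {"name": "Shift", "offset": 1},
    "Head": {"name": "Total"},
}
model = build_model(architecture, REGISTRY)
output = model([1, 2, 3])
```

The design point is that `name` picks the class and every other key becomes a constructor argument, which is why the YAML entries use `args1`, `args2` and so on next to `name`.
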

## Post-processing

Post-processing converts the network output into human-readable results. This part is under [ppocr/postprocess](../../ppocr/postprocess).
PaddleOCR has built-in post-processing modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Components that are not built in can be added through the following steps:

1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
2. Add code to the my_postprocess.py file. The sample code is as follows:

```python
import paddle


class MyPostProcess:
    def __init__(self, *args, **kwargs):
        # your init code
        pass

    def __call__(self, preds, label=None, *args, **kwargs):
        if isinstance(preds, paddle.Tensor):
            preds = preds.numpy()
        # your preds decode code
        preds = self.decode_preds(preds)
        if label is None:
            return preds
        # your label decode code
        label = self.decode_label(label)
        return preds, label

    def decode_preds(self, preds):
        # your preds decode code
        pass

    def decode_label(self, label):
        # your label decode code
        pass
```

3. Import the added module in the [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py) file.

After the post-processing module is added, you only need to reference it in the configuration file to use it. For example:

```yaml
PostProcess:
  name: MyPostProcess
  args1: args1
  args2: args2
```
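
As one possible body for a `decode_preds` method, the sketch below performs greedy CTC-style decoding on plain Python lists. It assumes a recognition head that emits per-time-step class scores with index 0 reserved for the blank symbol; neither that layout nor the name `ctc_greedy_decode` comes from the text above, so treat it as an illustration of turning raw scores into readable text rather than as PaddleOCR's decoder.

```python
def ctc_greedy_decode(scores, charset, blank=0):
    """Greedy CTC decoding: argmax per step, merge repeats, drop blanks.

    scores  -- list of time steps, each a list of per-class scores
    charset -- characters for classes 1..N (class 0 is assumed to be blank)
    """
    chars = []
    previous = None
    for step in scores:
        # Index of the highest-scoring class at this time step.
        index = max(range(len(step)), key=step.__getitem__)
        # Emit a character only when it is not blank and not a repeat.
        if index != blank and index != previous:
            chars.append(charset[index - 1])
        previous = index
    return "".join(chars)


# Toy scores whose per-step argmax sequence is 1, 1, 0, 1, 2, 2.
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.0, 0.1, 0.9],
]
decoded = ctc_greedy_decode(scores, charset="ab")
```

The blank at the third step separates the two runs of class 1, so both survive as separate characters and `decoded` is `"aab"`.
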

## Loss

The loss function is used to calculate the distance between the network output and the label. This part is under [ppocr/losses](../../ppocr/losses).
PaddleOCR has built-in loss function modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added through the following steps:

1. Create a new file in the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
2. Add code to the my_loss.py file. The sample code is as follows:

```python
import paddle
from paddle import nn


class MyLoss(nn.Layer):
    def __init__(self, **kwargs):
        super(MyLoss, self).__init__()
        # your init code; it must define self.loss, which forward() calls
        pass

    def forward(self, predicts, batch):
        # nn.Layer.__call__ dispatches to forward()
        label = batch[1]
        # your loss code
        loss = self.loss(input=predicts, label=label)
        return {'loss': loss}
```

3. Import the added module in the [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py) file.

After the loss function module is added, you only need to reference it in the configuration file to use it. For example:

```yaml
Loss:
  name: MyLoss
  args1: args1
  args2: args2
```
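
To show the calling convention in isolation (predictions plus the whole batch in, a dict with a `loss` key out), here is a framework-free sketch that computes a mean softmax cross-entropy on plain Python lists. The choice of cross-entropy, the function name `cross_entropy_loss` and the toy batch are assumptions for illustration; a real loss module would operate on Paddle tensors.

```python
import math


def cross_entropy_loss(predicts, batch):
    """Mean softmax cross-entropy over a batch of logit vectors."""
    labels = batch[1]  # same convention as the template: labels sit at index 1
    total = 0.0
    for logits, label in zip(predicts, labels):
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[label]
    return {"loss": total / len(labels)}


# Two samples with uniform logits over two classes (toy data).
batch = (["img_0", "img_1"], [0, 1])
predicts = [[0.0, 0.0], [0.0, 0.0]]
result = cross_entropy_loss(predicts, batch)
```

With uniform logits over two classes each sample contributes ln 2, so `result["loss"]` equals `math.log(2)` up to floating-point error.
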

## Metric

Metric is used to calculate the performance of the network on the current batch. This part is under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in evaluation modules for tasks such as detection, classification and recognition. Modules that are not built in can be added through the following steps:

1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
2. Add code to the my_metric.py file. The sample code is as follows:

```python

class MyMetric(object):
    def __init__(self, main_indicator='acc', **kwargs):
        # main_indicator is used to select the best model
        self.main_indicator = main_indicator
        self.reset()

    def __call__(self, preds, batch, *args, **kwargs):
        # preds is the output of post-processing
        # batch is the output of the dataloader
        labels = batch[1]
        cur_correct_num = 0
        cur_all_num = 0
        # your metric code
        self.correct_num += cur_correct_num
        self.all_num += cur_all_num
        # max(..., 1) avoids a division by zero on an empty batch
        return {'acc': cur_correct_num / max(cur_all_num, 1), }

    def get_metric(self):
        """
        return metrics {
                 'acc': 0,
                 'norm_edit_dis': 0,
            }
        """
        acc = self.correct_num / max(self.all_num, 1)
        self.reset()
        return {'acc': acc}

    def reset(self):
        # reset metric
        self.correct_num = 0
        self.all_num = 0

```

3. Import the added module in the [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py) file.

After the metric module is added, you only need to reference it in the configuration file to use it. For example:

```yaml
Metric:
  name: MyMetric
  main_indicator: acc
```
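
The per-batch call and the accumulated `get_metric` result are easy to confuse, so here is a small worked example. `ExactMatchMetric` is a hypothetical metric that counts predictions equal to their labels; it assumes `preds` is a list of strings and `batch[1]` a list of label strings, which the text above does not specify.

```python
class ExactMatchMetric:
    """Hypothetical metric: fraction of predictions identical to the label."""

    def __init__(self, main_indicator="acc", **kwargs):
        self.main_indicator = main_indicator
        self.reset()

    def __call__(self, preds, batch, *args, **kwargs):
        labels = batch[1]
        correct = sum(1 for pred, label in zip(preds, labels) if pred == label)
        # Accumulate for the epoch-level result returned by get_metric().
        self.correct_num += correct
        self.all_num += len(labels)
        # Per-batch accuracy; max(..., 1) guards against an empty batch.
        return {"acc": correct / max(len(labels), 1)}

    def get_metric(self):
        acc = self.correct_num / max(self.all_num, 1)
        self.reset()  # start the next evaluation round from zero
        return {"acc": acc}

    def reset(self):
        self.correct_num = 0
        self.all_num = 0


metric = ExactMatchMetric()
first = metric(["ab", "cd"], (None, ["ab", "ce"]))   # 1 of 2 correct
second = metric(["x"], (None, ["x"]))                # 1 of 1 correct
overall = metric.get_metric()                        # 2 of 3 correct overall
after_reset = metric.get_metric()                    # counters were cleared
```

Note that `overall` is computed from the accumulated counts (2 of 3), not from the mean of the per-batch accuracies, and that calling `get_metric` clears the counters.
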

## Optimizer

The optimizer is used to train the network. It also contains the network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in commonly used optimizer modules such as `Momentum`, `Adam` and `RMSProp`, common learning rate decay modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and common regularization modules such as `L1Decay` and `L2Decay`.
Modules that are not built in can be added through the following steps; take `optimizer` as an example:

1. Create your own optimizer in the [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) file. The sample code is as follows:

```python
from paddle import optimizer as optim


class MyOptim(object):
    def __init__(self, learning_rate=0.001, *args, **kwargs):
        self.learning_rate = learning_rate

    def __call__(self, parameters):
        # It is recommended to wrap the built-in optimizer of paddle
        # XXX is a placeholder for a paddle.optimizer class
        opt = optim.XXX(
            learning_rate=self.learning_rate,
            parameters=parameters)
        return opt

```

After the optimizer module is added, you only need to reference it in the configuration file to use it. For example:

```yaml
Optimizer:
  name: MyOptim
  args1: args1
  args2: args2
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0
```
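
For intuition about what a cosine learning rate decay does to the `learning_rate: 0.001` value, the sketch below evaluates the textbook cosine annealing formula, lr(t) = 0.5 * base_lr * (1 + cos(pi * t / T)). It assumes a minimum learning rate of zero and no warmup; the function name `cosine_lr` and the step counts are hypothetical, and the `Cosine` module in ppocr/optimizer may be parameterized differently.

```python
import math


def cosine_lr(base_lr, step, total_steps):
    """Textbook cosine annealing from base_lr down to zero over total_steps."""
    # Clamp so that steps outside [0, total_steps] do not extend the curve.
    step = min(max(step, 0), total_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


# Learning rate at the start, the midpoint and the end of a 100-step run.
schedule = [cosine_lr(0.001, step, 100) for step in (0, 50, 100)]
```

The schedule starts at the base rate, passes through half of it at the midpoint and reaches zero at the final step.
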