Merge pull request #3909 from tink2123/doc_2.1
update doc for rec and training
@ -14,8 +14,8 @@ Global:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path: ppocr/utils/ic15_dict.txt
character_type: ch
character_dict_path: ppocr/utils/en_dict.txt
character_type: EN
max_text_length: 25
infer_mode: False
use_space_char: False
Binary file not shown.
After Width: | Height: | Size: 921 KiB |
@ -126,7 +126,6 @@
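## 3. Multi-language config yml file generation
PaddleOCR currently supports recognition of 80 languages (in addition to Chinese). A multi-language configuration file template is provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).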
@ -176,7 +175,7 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi
2. Manually modify the configuration file
@ -1,5 +1,6 @@
## Text Recognition
# Text Recognition
- [1 Data Preparation](#数据准备)
- [1.1 Custom Dataset](#自定义数据集)
@ -9,22 +10,21 @@
- [2 Training](#启动训练)
- [2.1 Data Augmentation](#数据增强)
- [2.2 Training](#训练)
- [2.3 Multi-language](#小语种)
- [2.2 General Training](#通用模型训练)
- [2.3 Multi-language Training](#多语言模型训练)
- [3 Evaluation](#评估)
- [4 Prediction](#预测)
- [4.1 Training engine prediction](#训练引擎预测)
<a name="数据准备"></a>
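### 1. Data Preparation
## 1. Data Preparation
PaddleOCR supports two data formats:
- `lmdb` is used to train on datasets stored in lmdb format;
- `general data` is used to train on datasets stored as text files:
- `lmdb` is used to train on datasets stored in lmdb format (LMDBDataSet);
- `general data` is used to train on datasets stored as text files (SimpleDataSet);
The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on disk, just create a soft link to the dataset directory: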
@ -36,7 +36,7 @@ mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
<a name="准备数据集"></a>
#### 1.1 Custom Dataset
### 1.1 Custom Dataset
The following uses the general dataset as an example to show how to prepare a dataset:
* Training set
@ -82,14 +82,13 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
<a name="数据下载"></a>
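1.2 Dataset download
### 1.2 Dataset download
If you do not have a dataset locally, you can download the [icdar2015]( data from the official website for quick verification. You can also refer to [DTRB]( to download the lmdb-format datasets required for the benchmark.
- ICDAR2015
If you are using the public icdar2015 dataset, PaddleOCR provides label files for training on icdar2015, which can be downloaded as follows:
To reproduce the metrics reported in the SRN paper, you need to download the offline [augmented data](, extraction code: y3ry. The augmented data is obtained by applying rotation and perturbation to MJSynth and SynthText. After downloading, unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_release/training/.
If you do not have a dataset locally, you can download the [ICDAR2015]( data from the official website for quick verification. You can also refer to [DTRB]( to download the lmdb-format datasets required for the benchmark.
If you are using the public icdar2015 dataset, PaddleOCR provides label files for training on ICDAR2015, which can be downloaded as follows:
# Training set labels
wget -P ./train_data/ic15_data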
@ -97,15 +96,25 @@ wget -P ./train_data/ic15_data
wget -P ./train_data/ic15_data
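PaddleOCR also provides a data format conversion script that can convert official labels to a supported data format. The conversion tool is in `ppocr/utils/`; the training set is used as an example here:
PaddleOCR also provides a data format conversion script that can convert labels from the ICDAR official website to the data format supported by PaddleOCR. The conversion tool is in `ppocr/utils/`; the training set is used as an example here:
# Convert the label file downloaded from the official website to rec_gt_label.txt
python --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
The data format is as follows, where (a) is the original image and (b) is the Ground Truth text file corresponding to each image: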

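- Multi-language dataset
The training datasets for the multi-language models each consist of 1 million synthetic samples generated with the open-source synthesis tool [text_renderer]( . A small number of fonts can be downloaded in the following two ways.
* [Baidu Netdisk]( Extraction code: frgi
* [Google Drive](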
<a name="字典"></a>
1.3 Dictionary
### 1.3 Dictionary
@ -152,13 +161,27 @@ PaddleOCR内置了一部分字典,可以按需使用。
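and set `character_type` to `ch`.
<a name="支持空格"></a>
1.4 Add space category
### 1.4 Add space category
If you want to support recognition of the "space" category, set the `use_space_char` field in the yml file to `True`.
<a name="启动训练"></a>
### 2. Training
## 2. Training
<a name="数据增强"></a>
### 2.1 Data Augmentation
The default perturbation methods are: color space conversion (cvtColor), blur, jitter, Gaussian noise, random crop, perspective, color reverse, and TIA augmentation.
<a name="通用模型训练"></a>
### 2.2 General Training
PaddleOCR provides training, evaluation, and prediction scripts. This section uses the CRNN recognition model as an example: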
@ -178,23 +201,16 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc
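*If you installed the CPU version, set the `use_gpu` field in the configuration file to false*
# GPU training supports single-card and multi-card training; specify the card IDs with the --gpus parameter
# GPU training supports single-card and multi-card training
# Train on icdar15 English data; the training log is automatically saved as train.log under "{save_model_dir}"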
python3 tools/ -c configs/rec/rec_icdar15_train.yml
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/ -c configs/rec/rec_icdar15_train.yml
<a name="数据增强"></a>
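#### 2.1 Data Augmentation
The default perturbation methods are: color space conversion (cvtColor), blur, jitter, Gaussian noise, random crop, perspective, color reverse, and TIA augmentation.
<a name="训练"></a>
#### 2.2 Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency; by default, evaluation runs every 500 iterations. During evaluation, the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy` by default.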
@ -283,85 +299,12 @@ Eval:
<a name="小语种"></a>
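#### 2.3 Multi-language
<a name="多语言模型训练"></a>
### 2.3 Multi-language Training
PaddleOCR currently supports recognition of 80 languages (in addition to Chinese). A multi-language configuration file template is provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
1. Automatically generated by script
[](../../configs/rec/multi_language/ can help you generate configuration files for multi-language models
- Taking Italian as an example, if your data is prepared in the following format:
|- it_train.txt # training set labels
|- it_val.txt # validation set labels
|- data
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
# This command must be run in the specified directory
cd PaddleOCR/configs/rec/multi_language/
# Use the -l or --language parameter to set the language whose configuration file is generated; this command writes the default parameters into the configuration file
python3 -l it
- If your data is stored elsewhere, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:
# The -l or --language field is required
# --train sets the training label file, --val sets the validation label file, --data_dir sets the dataset root directory, --dict sets the dictionary path, -o overrides the corresponding default parameters (here, whether to use the GPU)
cd PaddleOCR/configs/rec/multi_language/
python3 -l it \
--train {path/of/train_label.txt} \
--val {path/of/val_label.txt} \
--data_dir {train_data/path} \
--dict {path/of/dict} \
-o Global.use_gpu=False
Italian is written with Latin letters, so executing the command produces a configuration file named rec_latin_lite_train.yml.
2. Manually modify the configuration file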
use_gpu: True
epoch_num: 500
character_type: it # language to recognize
character_dict_path: {path/of/dict} # path of the dictionary file
name: SimpleDataSet
data_dir: train_data/ # root directory of the data
label_file_list: ["./train_data/train_list.txt"] # training set label path
name: SimpleDataSet
data_dir: train_data/ # root directory of the data
label_file_list: ["./train_data/val_list.txt"] # validation set label path
| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: |
@ -378,10 +321,6 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi
For more supported languages, please refer to: [Multi-language model](
* [Baidu Netdisk](. Extraction code: frgi.
* [Google Drive](
Taking `rec_french_lite_train` as an example:
@ -417,7 +356,7 @@ Eval:
<a name="评估"></a>
### 3 Evaluation
## 3 Evaluation
The evaluation dataset can be set by modifying `label_file_path` under Eval in `configs/rec/rec_icdar15_train.yml`.
@ -427,14 +366,29 @@ python3 -m paddle.distributed.launch --gpus '0' tools/ -c configs/rec/rec
<a name="预测"></a>
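### 4 Prediction
<a name="训练引擎预测"></a>
#### 4.1 Training engine prediction
## 4 Prediction
Using a model trained with PaddleOCR, you can quickly run prediction with the following script.
The default prediction image is stored in `infer_img`, and the weights are specified via `-o Global.checkpoints`:
The default prediction image is stored in `infer_img`, and the trained parameter file is loaded via `-o Global.checkpoints`:
According to the `save_model_dir` and `save_epoch_step` fields set in the configuration file, the following parameters are saved: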
├── best_accuracy.pdopt
├── best_accuracy.pdparams
├── best_accuracy.states
├── config.yml
├── iter_epoch_3.pdopt
├── iter_epoch_3.pdparams
├── iter_epoch_3.states
├── latest.pdopt
├── latest.pdparams
├── latest.states
└── train.log
Among them, best_accuracy.* is the best model on the evaluation set; iter_epoch_x.* is the model saved at intervals of `save_epoch_step`; latest.* is the model of the last epoch.
# Predict English results
@ -0,0 +1,128 @@
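# Model Training
- [1. Basic concepts](#基本概念)
* [1.1 Learning rate](#学习率)
* [1.2 Regularization](#正则化)
* [1.3 Evaluation indicators](#评估指标)
- [2. FAQ](#常见问题)
- [3. Data and vertical scenes](#数据与垂类场景)
* [3.1 Training data](#训练数据)
* [3.2 Vertical scene](#垂类场景)
* [3.3 Build your own data set](#自己构建数据集)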
<a name="基本概念"></a>
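## 1. Basic concepts
OCR (Optical Character Recognition) refers to the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task.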
<a name="学习率"></a>
### 1.1 Learning rate
name: Piecewise
decay_epochs : [700, 800]
values : [0.001, 0.0001]
warmup_epoch: 5
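Piecewise stands for piecewise constant decay: a different learning rate is specified for each training stage, and the learning rate is constant within each stage.
warmup_epoch means that during the first 5 epochs the learning rate gradually increases from 0 to base_lr. For all strategies, please refer to the code [](../../ppocr/optimizer/ .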
<a name="正则化"></a>
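### 1.2 Regularization
Regularization can effectively prevent overfitting. PaddleOCR provides L1 and L2 regularization, which are the most commonly used regularization methods. L1 regularization adds a regularization term to the objective function to reduce the sum of the absolute values of the parameters, while the term added in L2 regularization reduces the sum of the squared parameters. The configuration method is as follows: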
name: L2
factor: 2.0e-05
<a name="评估指标"></a>
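### 1.3 Evaluation indicators
(2) Recognition stage: character recognition accuracy, that is, the ratio of correctly recognized text lines to the number of labeled text lines. A text line counts as correctly recognized only if the entire line is recognized correctly.
(3) End-to-end statistics: End-to-end recall: the proportion of text lines that are accurately detected and correctly recognized among all labeled text lines. End-to-end precision: the proportion of text lines that are accurately detected and correctly recognized among all detected text lines. Accurate detection means that the IOU of the detection box and the labeled box is greater than a certain threshold; correct recognition means that the text in the detection box is the same as the labeled text.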
<a name="常见问题"></a>
## 2. FAQ
(2) Increase the [L2 decay value]( of the system
**Q**: When training the recognition model, the loss decreases normally, but acc is always 0
<a name="数据与垂类场景"></a>
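## 3. Data and vertical scenes
<a name="训练数据"></a>
### 3.1 Training data
- Detection:
- English dataset: ICDAR2015
- Chinese dataset: 30,000 training images from the LSVT street view dataset
- Recognition:
- English dataset: MJSynth and SynthText synthetic data, tens of millions of samples.
- Chinese dataset: text regions cropped from the LSVT street view dataset according to the ground truth and position-calibrated, 300,000 images in total. In addition, 5 million samples were synthesized from the LSVT corpus.
- Minority-language datasets: 1 million synthetic samples were generated for each language using different corpora and fonts, with ICDAR-MLT used as the validation set.
The public datasets are all open source; users can search for and download them, or refer to [Chinese data set](./. The synthetic data is not open source for now; users can synthesize their own with open-source tools such as [text_renderer]( , [SynthText]( , and [TextRecognitionDataGenerator]( .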
<a name="垂类场景"></a>
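### 3.2 Vertical scene
<a name="自己构建数据集"></a>
### 3.3 Build your own data set
(1) Amount of training data:
a. Detection needs relatively little data. When fine-tuning from a PaddleOCR model, about 500 images are generally enough to achieve good results.
b. Recognition differs between English and Chinese. English scenarios generally need hundreds of thousands of samples to achieve good results, while Chinese needs several million or more.
a. Manually collect more training data, which is the most direct and effective way.
b. Apply basic image processing or transformations with PIL and opencv. For example, use the ImageFont, Image, and ImageDraw modules in PIL to write text onto a background, or use opencv for rotation, affine transformation, Gaussian filtering, and so on.
c. Use data generation algorithms to synthesize data, such as pix2pix or StyleText.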
@ -122,7 +122,7 @@ In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck
| num_workers | The number of sub-processes used to load data, if it is 0, the sub-process is not started, and the data is loaded in the main process | 8 | \ |
## 3. Multi-language config yml file generation
PaddleOCR currently supports recognition of 80 languages (in addition to Chinese). A multi-language configuration file template is
provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
@ -204,6 +204,7 @@ Italian is made up of Latin letters, so after executing the command, you will ge
Currently, the multi-language algorithms supported by PaddleOCR are:
| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type |
@ -1,4 +1,4 @@
- [1.1 Custom Dataset](#Costom_Dataset)
@ -8,8 +8,8 @@
- [2.1 Data Augmentation](#Data_Augmentation)
- [2.2 Training](#Training)
- [2.3 Multi-language](#Multi_language)
- [2.2 General Training](#Training)
- [2.3 Multi-language Training](#Multi_language)
@ -17,12 +17,12 @@
- [4.1 Training engine prediction](#Training_engine_prediction)
<a name="DATA_PREPARATION"></a>
PaddleOCR supports two data formats:
- `LMDB` is used to train data sets stored in lmdb format;
- `general data` is used to train data sets stored in text files:
- `LMDB` is used to train data sets stored in lmdb format(LMDBDataSet);
- `general data` is used to train data sets stored in text files(SimpleDataSet):
Please organize the dataset as follows:
@ -36,7 +36,7 @@ mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
<a name="Costom_Dataset"></a>
#### 1.1 Custom dataset
### 1.1 Custom dataset
If you want to use your own data for training, please refer to the following to organize your data.
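As a rough illustration of the label file layout, the sketch below splits each line into an image path and its transcription. It assumes the two fields are separated by a single tab, which is an assumption based on the example line `train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单` in this document; `parse_label_lines` and the `word_001.jpg` entry are hypothetical names used only for illustration, not part of PaddleOCR.

```python
from typing import List, Tuple


def parse_label_lines(lines: List[str], sep: str = "\t") -> List[Tuple[str, str]]:
    """Split each non-empty label line into (image_path, text)."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split only on the first separator so the text itself may contain it.
        image_path, _, text = line.partition(sep)
        samples.append((image_path, text))
    return samples


# Hypothetical label lines; the separator is assumed to be a tab.
example = [
    "train_data/rec/train/word_001.jpg\tsimple\n",
    "train_data/rec/train/word_002.jpg\t用科技让复杂的世界更简单\n",
]
samples = parse_label_lines(example)
```

Running a check like this over a label file before training is a cheap way to catch lines with a missing separator, since their text field comes back empty.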
@ -84,11 +84,12 @@ Similar to the training set, the test set also needs to be provided a folder con
<a name="Dataset_download"></a>
#### 1.2 Dataset download
### 1.2 Dataset download
If you do not have a dataset locally, you can download it from the official website [icdar2015]( You can also refer to [DTRB]( to download the lmdb-format dataset required for the benchmark
- ICDAR2015
If you want to reproduce the metrics reported in the SRN paper, you need to download the offline [augmented data](, extraction code: y3ry. The augmented data is obtained by applying rotation and perturbation to MJSynth and SynthText. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
If you do not have a dataset locally, you can download it from the official website [icdar2015](
You can also refer to [DTRB]( to download the lmdb-format dataset required for the benchmark
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
@ -99,8 +100,28 @@ wget -P ./train_data/ic15_data
wget -P ./train_data/ic15_data
PaddleOCR also provides a data format conversion script, which can convert ICDAR official website label to a data format
supported by PaddleOCR. The data conversion tool is in `ppocr/utils/`, here is the training set as an example:
# convert the official gt to rec_gt_label.txt
python --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
The data format is as follows, (a) is the original picture, (b) is the Ground Truth text file corresponding to each picture:

- Multilingual dataset
The multi-language model training method is the same as for the Chinese model. Each training dataset consists of 1 million synthetic samples. A small number of fonts and test data can be downloaded in the following two ways.
* [Baidu Netdisk]( , Extraction code: frgi.
* [Google Drive](
<a name="Dictionary"></a>
#### 1.3 Dictionary
### 1.3 Dictionary
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
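To make the mapping concrete, here is a minimal sketch of turning a dictionary file into a character-to-index table and encoding a label with it. It assumes one character per line in the dictionary file and ignores any indices a real model may reserve (for example a blank token); `build_char_index` and `encode` are hypothetical helpers, not PaddleOCR functions.

```python
from typing import Dict, Iterable, List


def build_char_index(dict_lines: Iterable[str], use_space_char: bool = False) -> Dict[str, int]:
    """Map each dictionary character to its line position (0-based)."""
    # Strip only line endings so that a line holding a space would survive.
    chars = [line.rstrip("\r\n") for line in dict_lines]
    chars = [c for c in chars if c]
    if use_space_char:
        # Assumption: the space category is appended after the dictionary entries.
        chars.append(" ")
    return {ch: idx for idx, ch in enumerate(chars)}


def encode(text: str, char_to_idx: Dict[str, int]) -> List[int]:
    """Convert text to indices, skipping characters missing from the dictionary."""
    return [char_to_idx[ch] for ch in text if ch in char_to_idx]


mapping = build_char_index(["a\n", "b\n", "c\n"])
indices = encode("cab", mapping)
```

Silently skipping unknown characters is one possible policy; raising an error instead makes gaps in the dictionary visible earlier.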
@ -145,14 +166,26 @@ To customize the dict file, please modify the `character_dict_path` field in `co
To customize the dict file, add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` pointing to your dictionary path, and set `character_type` to `ch`.
<a name="Add_space_category"></a>
#### 1.4 Add space category
### 1.4 Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: use_space_char only takes effect when character_type=ch**
<a name="TRAINING"></a>
<a name="Data_Augmentation"></a>
### 2.1 Data Augmentation
PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default.
The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse, TIA augmentation.
Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [](../../ppocr/data/imaug/
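The selection rule can be pictured with the following sketch, in which every augmentation is applied independently with probability 0.4. This is an illustration under stated assumptions rather than the PaddleOCR implementation: `apply_augmentations` and the toy augmenters are hypothetical, and the sample is a plain list instead of an image.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Augmenter = Tuple[str, Callable[[list], list]]


def apply_augmentations(
    sample: list,
    augmenters: Sequence[Augmenter],
    prob: float = 0.4,
    rng: Optional[random.Random] = None,
) -> Tuple[list, List[str]]:
    """Apply each augmenter independently with probability `prob`."""
    if rng is None:
        rng = random.Random()
    applied = []
    for name, fn in augmenters:
        # rng.random() is in [0, 1), so prob=1.0 always applies and prob=0.0 never does.
        if rng.random() < prob:
            sample = fn(sample)
            applied.append(name)
    return sample, applied


# Toy stand-ins for image perturbations (hypothetical).
toy_augmenters = [
    ("reverse", lambda xs: xs[::-1]),
    ("negate", lambda xs: [-x for x in xs]),
]
augmented, names = apply_augmentations([1, 2, 3], toy_augmenters, prob=0.4)
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing augmentation settings.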
<a name="Training"></a>
### 2.2 General Training
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
@ -170,21 +203,15 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc
Start training:
# GPU training supports single-card and multi-card training; specify the card number through --gpus
# GPU training supports single-card and multi-card training
# Train on icdar15 English data; the training log is automatically saved as train.log under "{save_model_dir}"
# Single-card training (long training time, not recommended)
python3 tools/ -c configs/rec/rec_icdar15_train.yml
# Multi-card training: specify the card numbers through --gpus
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/ -c configs/rec/rec_icdar15_train.yml
<a name="Data_Augmentation"></a>
#### 2.1 Data Augmentation
PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default.
The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse, TIA augmentation.
Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [](../../ppocr/data/imaug/
<a name="Training"></a>
#### 2.2 Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, evaluation runs every 500 iterations, and the model with the best accuracy is saved under `output/rec_CRNN/best_accuracy` during evaluation.
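The alternation between training and evaluation can be sketched as a loop that evaluates every `eval_batch_step` iterations and remembers the best accuracy seen so far. This is a simplified illustration, not the PaddleOCR training loop: `train_with_periodic_eval`, `train_step`, and `evaluate` are hypothetical names, and the accuracy values are made up for the example.

```python
from typing import Callable, Optional, Tuple


def train_with_periodic_eval(
    num_iters: int,
    eval_batch_step: int,
    train_step: Callable[[int], None],
    evaluate: Callable[[int], float],
) -> Tuple[float, Optional[int]]:
    """Run training and evaluate every `eval_batch_step` iterations."""
    best_acc, best_iter = -1.0, None
    for it in range(1, num_iters + 1):
        train_step(it)
        if it % eval_batch_step == 0:
            acc = evaluate(it)
            if acc > best_acc:
                # A real trainer would save the best_accuracy checkpoint here.
                best_acc, best_iter = acc, it
    return best_acc, best_iter


# Made-up accuracies keyed by iteration, for illustration only.
fake_acc = {500: 0.2, 1000: 0.6, 1500: 0.5}
result = train_with_periodic_eval(1500, 500, lambda it: None, lambda it: fake_acc[it])
```

In this toy run the best model comes from iteration 1000 rather than the last evaluation, which is why the best checkpoint is tracked separately from the latest one.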
@ -277,87 +304,7 @@ Eval:
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
<a name="Multi_language"></a>
#### 2.3 Multi-language
PaddleOCR currently supports recognition of 80 languages (in addition to Chinese). A multi-language configuration file template is
provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
There are two ways to create the required configuration file:
1. Automatically generated by script
[](../../configs/rec/multi_language/ can help you generate configuration files for multi-language models
- Take Italian as an example, if your data is prepared in the following format:
|- it_train.txt # train_set label
|- it_val.txt # val_set label
|- data
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
You can use the default parameters to generate a configuration file:
# The code needs to be run in the specified directory
cd PaddleOCR/configs/rec/multi_language/
# Set the configuration file of the language to be generated through the -l or --language parameter.
# This command will write the default parameters into the configuration file
python3 -l it
- If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:
# -l or --language field is required
# --train to modify the training set
# --val to modify the validation set
# --data_dir to modify the data set directory
# --dict to modify the dict path
# -o to modify the corresponding default parameters
cd PaddleOCR/configs/rec/multi_language/
python3 -l it \
--train {path/of/train_label.txt} \
--val {path/of/val_label.txt} \
--data_dir {train_data/path} \
--dict {path/of/dict} \
-o Global.use_gpu=False
Italian is made up of Latin letters, so after executing the command, you will get the rec_latin_lite_train.yml.
2. Manually modify the configuration file
You can also manually modify the following fields in the template:
use_gpu: True
epoch_num: 500
character_type: it # language
character_dict_path: {path/of/dict} # path of dict
name: SimpleDataSet
data_dir: train_data/ # root directory of training data
label_file_list: ["./train_data/train_list.txt"] # train label path
name: SimpleDataSet
data_dir: train_data/ # root directory of val data
label_file_list: ["./train_data/val_list.txt"] # val label path
### 2.3 Multi-language Training
Currently, the multi-language algorithms supported by PaddleOCR are:
@ -376,9 +323,6 @@ Currently, the multi-language algorithms supported by PaddleOCR are:
For more supported languages, please refer to : [Multi-language model](
The multi-language model training method is the same as for the Chinese model. Each training dataset consists of 1 million synthetic samples. A small number of fonts and test data can be downloaded in the following two ways.
* [Baidu Netdisk](, Extraction code: frgi.
* [Google drive](
If you want to fine-tune from an existing model, please refer to the following instructions to modify the configuration file:
@ -417,7 +361,7 @@ Eval:
<a name="EVALUATION"></a>
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
@ -427,20 +371,39 @@ python3 -m paddle.distributed.launch --gpus '0' tools/ -c configs/rec/rec
<a name="PREDICTION"></a>
<a name="Training_engine_prediction"></a>
#### 4.1 Training engine prediction
Using a model trained with PaddleOCR, you can quickly run prediction with the following script.
The default prediction picture is stored in `infer_img`, and the weight is specified via `-o Global.checkpoints`:
The default prediction picture is stored in `infer_img`, and the trained weight is specified via `-o Global.checkpoints`:
According to the `save_model_dir` and `save_epoch_step` fields set in the configuration file, the following parameters will be saved:
├── best_accuracy.pdopt
├── best_accuracy.pdparams
├── best_accuracy.states
├── config.yml
├── iter_epoch_3.pdopt
├── iter_epoch_3.pdparams
├── iter_epoch_3.states
├── latest.pdopt
├── latest.pdparams
├── latest.states
└── train.log
Among them, best_accuracy.* is the best model on the evaluation set; iter_epoch_x.* is the model saved at intervals of `save_epoch_step`; latest.* is the model of the last epoch.
# Predict English results
python3 tools/ -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.load_static_weights=false Global.infer_img=doc/imgs_words/en/word_1.jpg
Input image:

@ -0,0 +1,135 @@
- [1. Basic concepts](#1-basic-concepts)
* [1.1 Learning rate](#11-learning-rate)
* [1.2 Regularization](#12-regularization)
* [1.3 Evaluation indicators](#13-evaluation-indicators-)
- [2. FAQ](#2-faq)
- [3. Data and vertical scenes](#3-data-and-vertical-scenes)
* [3.1 Training data](#31-training-data)
* [3.2 Vertical scene](#32-vertical-scene)
* [3.3 Build your own data set](#33-build-your-own-data-set)
This article introduces the basic concepts you need to master for model training and the tuning methods used during training.
It also briefly describes the composition of the PaddleOCR training data and how to prepare data to fine-tune a model for vertical scenes.
<a name="1-basic-concepts"></a>
# 1. Basic concepts
OCR (Optical Character Recognition) refers to the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task.
It usually consists of two subtasks: text detection and text recognition.
The following parameters need to be paid attention to when tuning the model:
<a name="11-learning-rate"></a>
## 1.1 Learning rate
The learning rate is one of the most important hyperparameters for training neural networks. It sets the step size taken along the gradient toward a minimum of the loss function in each iteration.
A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through configuration files, for example:
name: Piecewise
decay_epochs : [700, 800]
values : [0.001, 0.0001]
warmup_epoch: 5
Piecewise stands for piecewise constant decay: a different learning rate is specified for each training stage,
and the learning rate is constant within each stage.
warmup_epoch means that during the first 5 epochs the learning rate gradually increases from 0 to base_lr. For all strategies, please refer to the code [](../../ppocr/optimizer/
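The schedule described above can be sketched as a plain function of the epoch. This is an illustration under stated assumptions, not the code in `ppocr/optimizer`: it treats `values[0]` as base_lr, uses linear warmup, and reuses the last entry of `values` once the epoch passes the final boundary. `piecewise_lr_with_warmup` is a hypothetical name.

```python
import bisect
from typing import Sequence


def piecewise_lr_with_warmup(
    epoch: float,
    decay_epochs: Sequence[int],
    values: Sequence[float],
    warmup_epoch: int = 0,
) -> float:
    """Piecewise constant learning rate with a linear warmup from 0."""
    base_lr = values[0]  # assumption: the first value is the base learning rate
    if warmup_epoch > 0 and epoch < warmup_epoch:
        # Linear ramp: 0 at epoch 0, base_lr at epoch == warmup_epoch.
        return base_lr * epoch / warmup_epoch
    # Count how many boundaries have been reached to pick the stage.
    stage = bisect.bisect_right(decay_epochs, epoch)
    # Assumption: stay on the last value if there are more stages than values.
    return values[min(stage, len(values) - 1)]


# Settings taken from the configuration shown above.
lrs = [piecewise_lr_with_warmup(e, [700, 800], [0.001, 0.0001], warmup_epoch=5)
       for e in (0, 5, 699, 700, 900)]
```

With these settings the rate ramps up over the first 5 epochs, holds at 0.001 until epoch 700, and then drops to 0.0001.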
<a name="12-regularization"></a>
## 1.2 Regularization
Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods.
L1 and L2 regularization are the most commonly used regularization methods.
L1 regularization adds a regularization term to the objective function to reduce the sum of absolute values of the parameters;
while in L2 regularization, the purpose of adding a regularization term is to reduce the sum of squared parameters.
The configuration method is as follows:
name: L2
factor: 2.0e-05
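For intuition, the two penalty terms can be computed directly, as in the sketch below. It assumes the penalty is simply `factor` times the sum (some frameworks scale the L2 term by an extra 1/2), and `l1_penalty` and `l2_penalty` are hypothetical helpers rather than PaddleOCR functions.

```python
from typing import Iterable


def l1_penalty(params: Iterable[float], factor: float) -> float:
    """factor * sum of absolute parameter values."""
    return factor * sum(abs(p) for p in params)


def l2_penalty(params: Iterable[float], factor: float) -> float:
    """factor * sum of squared parameter values."""
    return factor * sum(p * p for p in params)


weights = [1.0, -2.0, 3.0]
l1 = l1_penalty(weights, factor=0.5)
l2 = l2_penalty(weights, factor=0.5)
```

Either term is added to the training loss, so a larger `factor` pushes the parameters more strongly toward zero.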
<a name="13-evaluation-indicators-"></a>
## 1.3 Evaluation indicators
(1) Detection stage: detections are first evaluated by the IOU between the detection box and the labeled box. If the IOU is greater than a certain threshold, the detection is judged to be accurate. Unlike in general object detection, the detection and labeled boxes here are represented by polygons. Detection precision: the proportion of correct detection boxes among all detection boxes, which is mainly used to judge detection quality. Detection recall: the proportion of correct detection boxes among all labeled boxes, which is mainly an indicator of missed detections.
(2) Recognition stage: character recognition accuracy, that is, the ratio of correctly recognized text lines to the number of labeled text lines. A text line counts as correctly recognized only if the entire line is recognized correctly.
(3) End-to-end statistics: End-to-end recall: the proportion of text lines that are accurately detected and correctly recognized among all labeled text lines. End-to-end precision: the proportion of text lines that are accurately detected and correctly recognized among all detected text lines. Accurate detection means that the IOU of the detection box and the labeled box is greater than a certain threshold; correct recognition means that the text in the detection box is the same as the labeled text.
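The metrics above can be sketched in a few lines. This is a simplified illustration with several assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` rectangles rather than polygons, the IOU threshold is 0.5, and each labeled box is matched greedily at most once. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned simplification


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_precision_recall(
    preds: Sequence[Box], labels: Sequence[Box], iou_thresh: float = 0.5
) -> Tuple[float, float]:
    """Precision = correct / detected, recall = correct / labeled."""
    matched: set = set()
    correct = 0
    for p in preds:
        for i, g in enumerate(labels):
            # Greedy one-to-one matching (an assumption of this sketch).
            if i not in matched and box_iou(p, g) > iou_thresh:
                matched.add(i)
                correct += 1
                break
    precision = correct / len(preds) if preds else 0.0
    recall = correct / len(labels) if labels else 0.0
    return precision, recall


def line_accuracy(preds: List[str], labels: List[str]) -> float:
    """A line is correct only when the whole string matches its label."""
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels) if labels else 0.0


precision, recall = detection_precision_recall(
    [(0, 0, 10, 10), (20, 20, 30, 30)],
    [(0, 0, 10, 10), (50, 50, 60, 60), (100, 100, 110, 110)],
)
acc = line_accuracy(["abc", "de"], ["abc", "df"])
```

The end-to-end metrics combine both checks: a text line counts as correct only if its box passes the IOU test and its recognized string equals the label.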
<a name="2-faq"></a>
# 2. FAQ
**Q**: How to choose a suitable network input shape when training CRNN recognition?
A: The height is generally set to 32. There are two ways to choose the maximum width:
(1) Compute the aspect ratio distribution of the training images and choose the maximum aspect ratio so that it covers 80% of the training samples.
(2) Count the number of characters in the training samples and choose the maximum character count so that it covers 80% of the training samples. Then estimate the maximum width, treating the aspect ratio of a Chinese character as approximately 1 and that of English as 3:1.
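Method (1) can be sketched as follows: take the aspect ratio (width divided by height) that covers 80% of the training images and scale it by the input height of 32. The nearest-rank percentile and the function names are assumptions made for this illustration, and the aspect ratios are made-up numbers.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], percent: int) -> float:
    """Smallest value that is >= `percent`% of the samples (nearest-rank method)."""
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100 avoids floating-point edge cases.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def estimate_input_width(aspect_ratios: Sequence[float], height: int = 32, coverage: int = 80) -> int:
    """Input width that fits `coverage`% of the images at the given height."""
    ratio = percentile_nearest_rank(aspect_ratios, coverage)
    return math.ceil(height * ratio)


# Made-up width/height ratios of ten training images.
ratios = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
width = estimate_input_width(ratios)
```

Here the ratio covering 80% of the samples is 8, giving an input shape of 32 x 256; the widest 20% of images would then be resized more aggressively.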
**Q**: During recognition training, the accuracy on the training set has reached 90, but the accuracy on the validation set stays at 70. What should I do?
A: If the training accuracy is 90 and the test accuracy is only a little over 70, the model is probably overfitting. There are two methods to try:
(1) Add more augmentation methods or increase the augmentation [probability]( ; the default is 0.4.
(2) Increase the [L2 decay value]( of the system.
**Q**: When training the recognition model, the loss decreases normally, but acc is always 0.
A: It is normal for acc to be 0 at the beginning of recognition training; the metric will rise after training for longer.
<a name="3-data-and-vertical-scenes"></a>
# 3. Data and vertical scenes
<a name="31-training-data"></a>
## 3.1 Training data
The current open-source models were trained on the following datasets and data volumes:
- Detection:
- English dataset: ICDAR2015
- Chinese dataset: 30,000 training images from the LSVT street view dataset
- Recognition:
- English dataset: MJSynth and SynthText synthetic data, tens of millions of samples.
- Chinese dataset: text regions cropped from the LSVT street view dataset according to the ground truth and position-calibrated, 300,000 images in total. In addition, 5 million samples were synthesized from the LSVT corpus.
- Minority-language datasets: 1 million synthetic samples were generated for each language using different corpora and fonts, with ICDAR-MLT used as the validation set.
The public datasets are all open source; users can search for and download them, or refer to [Chinese data set](./. The synthetic data is not open source; users can synthesize their own with open-source tools such as [text_renderer](, [SynthText](, and [TextRecognitionDataGenerator](.
<a name="32-vertical-scene"></a>
## 3.2 Vertical scene
PaddleOCR mainly targets general OCR. If you have domain-specific (vertical) requirements, you can train your own model with PaddleOCR and vertical data;
If you lack labeled data or do not want to invest in research and development, it is recommended to call the open API directly, which covers some of the more common vertical categories.
<a name="33-build-your-own-data-set"></a>
## 3.3 Build your own data set
Some rules of thumb for building a dataset:
(1) Amount of training data:
a. Detection needs relatively little data. When fine-tuning from a PaddleOCR model, about 500 images are generally enough to achieve good results.
b. Recognition differs between English and Chinese. English scenarios generally need hundreds of thousands of samples to achieve good results, while Chinese needs several million or more.
(2) When training data is scarce, you can try the following three ways to get more data:
a. Manually collect more training data, which is the most direct and effective way.
b. Apply basic image processing or transformations with PIL and opencv. For example, use the ImageFont, Image, and ImageDraw modules in PIL to write text onto a background, or use opencv for rotation, affine transformation, Gaussian filtering, and so on.
c. Use data generation algorithms to synthesize data, such as pix2pix.