fix typo and add srn for inference,test=document_fix
This commit is contained in:
parent
1614282b23
commit
e5d7571503
|
@ -23,8 +23,9 @@ inference 模型(`fluid.io.save_inference_model`保存的模型)
|
|||
- [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理)
|
||||
- [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理)
|
||||
- [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理)
|
||||
- [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
|
||||
- [5. 多语言模型的推理](#多语言模型的推理)
|
||||
- [4. 基于SRN损失的识别模型推理](#基于SRN损失的识别模型推理)
|
||||
- [5. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
|
||||
- [6. 多语言模型的推理](#多语言模型的推理)
|
||||
|
||||
- [四、方向分类模型推理](#方向识别模型推理)
|
||||
- [1. 方向分类模型推理](#方向分类模型推理)
|
||||
|
@ -297,9 +298,21 @@ Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555]
|
|||
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
|
||||
dict_character = list(self.character_str)
|
||||
```
|
||||
<a name="基于SRN损失的识别模型推理"></a>
|
||||
### 4. 基于SRN损失的识别模型推理
|
||||
|
||||
基于SRN损失的识别模型,需要额外设置识别算法参数 --rec_algorithm="SRN"。 同时需要保证预测shape与训练时一致,如: --rec_image_shape="1, 64, 256"
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \
|
||||
--rec_model_dir="./inference/srn/" \
|
||||
--rec_image_shape="1, 64, 256" \
|
||||
--rec_char_type="en" \
|
||||
--rec_algorithm="SRN"
|
||||
```
|
||||
|
||||
<a name="自定义文本识别字典的推理"></a>
|
||||
### 4. 自定义文本识别字典的推理
|
||||
### 5. 自定义文本识别字典的推理
|
||||
如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径
|
||||
|
||||
```
|
||||
|
@ -307,7 +320,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
|
|||
```
|
||||
|
||||
<a name="多语言模型的推理"></a>
|
||||
### 5. 多语言模型的推理
|
||||
### 6. 多语言模型的推理
|
||||
如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果,
|
||||
需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别:
|
||||
|
||||
|
|
|
@ -1,5 +1,24 @@
|
|||
## 文字识别
|
||||
|
||||
|
||||
- [一、数据准备](#数据准备)
|
||||
- [数据下载](#数据下载)
|
||||
- [自定义数据集](#自定义数据集)
|
||||
- [字典](#字典)
|
||||
- [支持空格](#支持空格)
|
||||
|
||||
- [二、启动训练](#文本检测模型推理)
|
||||
- [1. 数据增强](#数据增强)
|
||||
- [2. 训练](#训练)
|
||||
- [3. 小语种](#小语种)
|
||||
|
||||
- [三、评估](#评估)
|
||||
|
||||
- [四、预测](#预测)
|
||||
- [1. 训练引擎预测](#训练引擎预测)
|
||||
|
||||
|
||||
<a name="数据准备"></a>
|
||||
### 数据准备
|
||||
|
||||
|
||||
|
@ -13,13 +32,14 @@ PaddleOCR 支持两种数据格式: `lmdb` 用于训练公开数据,调试算
|
|||
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
|
||||
```
|
||||
|
||||
|
||||
<a name="数据下载"></a>
|
||||
* 数据下载
|
||||
|
||||
若您本地没有数据集,可以在官网下载 [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据,用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),下载 benchmark 所需的lmdb格式数据集。
|
||||
|
||||
如果希望复现SRN的论文指标,需要下载离线[增广数据](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA),提取码: y3ry。增广数据是由MJSynth和SynthText做旋转和扰动得到的。数据下载完成后请解压到 {your_path}/PaddleOCR/train_data/data_lmdb_release/training/ 路径下。
|
||||
|
||||
<a name="自定义数据集"></a>
|
||||
* 使用自己数据集
|
||||
|
||||
若您希望使用自己的数据进行训练,请参考下文组织您的数据。
|
||||
|
@ -78,7 +98,7 @@ python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_
|
|||
|- word_003.jpg
|
||||
| ...
|
||||
```
|
||||
|
||||
<a name="字典"></a>
|
||||
- 字典
|
||||
|
||||
最后需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。
|
||||
|
@ -119,13 +139,14 @@ word_dict.txt 每行有一个单字,将字符与数字索引映射在一起,
|
|||
如需自定义dic文件,请在 `configs/rec/rec_icdar15_train.yml` 中添加 `character_dict_path` 字段, 指向您的字典路径。
|
||||
并将 `character_type` 设置为 `ch`。
|
||||
|
||||
<a name="支持空格"></a>
|
||||
- 添加空格类别
|
||||
|
||||
如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `true`。
|
||||
|
||||
**注意:`use_space_char` 仅在 `character_type=ch` 时生效**
|
||||
|
||||
|
||||
<a name="启动训练"></a>
|
||||
### 启动训练
|
||||
|
||||
PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 CRNN 识别模型为例:
|
||||
|
@ -151,7 +172,7 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
|
|||
# 训练icdar15英文数据 并将训练日志保存为 tain_rec.log
|
||||
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
|
||||
```
|
||||
|
||||
<a name="数据增强"></a>
|
||||
- 数据增强
|
||||
|
||||
PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入扰动,请在配置文件中设置 `distort: true`。
|
||||
|
@ -162,6 +183,7 @@ PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入
|
|||
|
||||
*由于OpenCV的兼容性问题,扰动操作暂时只支持Linux*
|
||||
|
||||
<a name="训练"></a>
|
||||
- 训练
|
||||
|
||||
PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_train.yml` 中修改 `eval_batch_step` 设置评估频率,默认每500个iter评估一次。评估过程中默认将最佳acc模型,保存为 `output/rec_CRNN/best_accuracy` 。
|
||||
|
@ -224,17 +246,18 @@ Optimizer:
|
|||
```
|
||||
**注意,预测/评估时的配置文件请务必与训练一致。**
|
||||
|
||||
<a name="小语种"></a>
|
||||
- 小语种
|
||||
|
||||
PaddleOCR也提供了多语言的, `configs/rec/multi_languages` 路径下的提供了多语言的配置文件,目前PaddleOCR支持的多语言算法有:
|
||||
|
||||
| 配置文件 | 算法名称 | backbone | trans | seq | pred | language |
|
||||
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
|
||||
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语 |
|
||||
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 |
|
||||
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 |
|
||||
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 |
|
||||
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 |
|
||||
| 配置文件 | 算法名称 | backbone | trans | seq | pred | language |
|
||||
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
|
||||
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语 |
|
||||
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 |
|
||||
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 |
|
||||
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 |
|
||||
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 |
|
||||
|
||||
多语言模型训练方式与中文模型一致,训练数据集均为100w的合成数据,少量的字体可以在 [百度网盘](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) 上下载,提取码:frgi。
|
||||
|
||||
|
@ -269,7 +292,7 @@ TrainReader:
|
|||
|
||||
...
|
||||
```
|
||||
|
||||
<a name="评估"></a>
|
||||
### 评估
|
||||
|
||||
评估数据集可以通过 `configs/rec/rec_icdar15_reader.yml` 修改EvalReader中的 `label_file_path` 设置。
|
||||
|
@ -281,8 +304,10 @@ export CUDA_VISIBLE_DEVICES=0
|
|||
python3 tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
|
||||
```
|
||||
|
||||
<a name="预测"></a>
|
||||
### 预测
|
||||
|
||||
<a name="训练引擎预测"></a>
|
||||
* 训练引擎的预测
|
||||
|
||||
使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。
|
||||
|
|
|
@ -26,8 +26,10 @@ Next, we first introduce how to convert a trained model into an inference model,
|
|||
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
|
||||
- [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
|
||||
- [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
|
||||
- [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
|
||||
|
||||
- [4. SRN-BASED TEXT RECOGNITION MODEL INFERENCE](#SRN-BASED_RECOGNITION)
|
||||
- [5. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
|
||||
- [6. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE)
|
||||
|
||||
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
||||
- [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
||||
|
||||
|
@ -299,17 +301,31 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
|
|||
dict_character = list(self.character_str)
|
||||
```
|
||||
|
||||
<a name="SRN-BASED_RECOGNITION"></a>
|
||||
### 4. SRN-BASED TEXT RECOGNITION MODEL INFERENCE
|
||||
|
||||
The recognition model based on SRN requires additional setting of the recognition algorithm parameter --rec_algorithm="SRN".
|
||||
At the same time, it is necessary to ensure that the predicted shape is consistent with the training, such as: --rec_image_shape="1, 64, 256"
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \
|
||||
--rec_model_dir="./inference/srn/" \
|
||||
--rec_image_shape="1, 64, 256" \
|
||||
--rec_char_type="en" \
|
||||
--rec_algorithm="SRN"
|
||||
```
|
||||
|
||||
|
||||
<a name="USING_CUSTOM_CHARACTERS"></a>
|
||||
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
|
||||
### 5. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
|
||||
If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict.
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
|
||||
```
|
||||
|
||||
<a name="Multilingual model inference"></a>
|
||||
|
||||
### 5. Multilingual Model Reasoning
|
||||
<a name="MULTILINGUAL_MODEL_INFERENCE"></a>
|
||||
### 6. MULTILINGAUL MODEL INFERENCE
|
||||
If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
|
||||
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition:
|
||||
|
||||
|
|
|
@ -1,5 +1,22 @@
|
|||
## TEXT RECOGNITION
|
||||
|
||||
- [DATA PREPARATION](#DATA_PREPARATION)
|
||||
- [Dataset Download](#Dataset_download)
|
||||
- [Costom Dataset](#Costom_Dataset)
|
||||
- [Dictionary](#Dictionary)
|
||||
- [Add Space Category](#Add_space_category)
|
||||
|
||||
- [TRAINING](#TRAINING)
|
||||
- [Data Augmentation](#Data_Augmentation)
|
||||
- [Training](#Training)
|
||||
- [Multi-language](#Multi_language)
|
||||
|
||||
- [EVALUATION](#EVALUATION)
|
||||
|
||||
- [PREDICTION](#PREDICTION)
|
||||
- [Training engine prediction](#Training_engine_prediction)
|
||||
|
||||
<a name="DATA_PREPARATION"></a>
|
||||
### DATA PREPARATION
|
||||
|
||||
|
||||
|
@ -13,13 +30,14 @@ The default storage path for training data is `PaddleOCR/train_data`, if you alr
|
|||
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
|
||||
```
|
||||
|
||||
|
||||
<a name="Dataset_download"></a>
|
||||
* Dataset download
|
||||
|
||||
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),download the lmdb format dataset required for benchmark
|
||||
|
||||
If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
|
||||
|
||||
<a name="Costom_Dataset"></a>
|
||||
* Use your own dataset:
|
||||
|
||||
If you want to use your own data for training, please refer to the following to organize your data.
|
||||
|
@ -72,7 +90,7 @@ Similar to the training set, the test set also needs to be provided a folder con
|
|||
|- word_003.jpg
|
||||
| ...
|
||||
```
|
||||
|
||||
<a name="Dictionary"></a>
|
||||
- Dictionary
|
||||
|
||||
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
|
||||
|
@ -114,12 +132,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
|
|||
|
||||
If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch.
|
||||
|
||||
<a name="Add_space_category"></a>
|
||||
- Add space category
|
||||
|
||||
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `true`.
|
||||
|
||||
**Note: use_space_char only takes effect when character_type=ch**
|
||||
|
||||
<a name="TRAINING"></a>
|
||||
### TRAINING
|
||||
|
||||
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
|
||||
|
@ -143,7 +163,7 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
|
|||
# Training icdar15 English data and saving the log as train_rec.log
|
||||
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
|
||||
```
|
||||
|
||||
<a name="Data_Augmentation"></a>
|
||||
- Data Augmentation
|
||||
|
||||
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
|
||||
|
@ -152,7 +172,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
|
|||
|
||||
Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
|
||||
|
||||
|
||||
<a name="Training"></a>
|
||||
- Training
|
||||
|
||||
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
|
||||
|
@ -215,7 +235,8 @@ Optimizer:
|
|||
```
|
||||
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
|
||||
|
||||
-Minor language
|
||||
<a name="Multi_language"></a>
|
||||
- Multi-language
|
||||
|
||||
PaddleOCR also provides multi-language. The configuration file in `configs/rec/multi_languages` provides multi-language configuration files. Currently, the multi-language algorithms supported by PaddleOCR are:
|
||||
|
||||
|
@ -250,7 +271,7 @@ Global:
|
|||
...
|
||||
```
|
||||
|
||||
|
||||
<a name="EVALUATION"></a>
|
||||
### EVALUATION
|
||||
|
||||
The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` setting of `label_file_path` in EvalReader.
|
||||
|
@ -261,8 +282,10 @@ export CUDA_VISIBLE_DEVICES=0
|
|||
python3 tools/eval.py -c configs/rec/rec_icdar15_reader.yml -o Global.checkpoints={path/to/weights}/best_accuracy
|
||||
```
|
||||
|
||||
<a name="PREDICTION"></a>
|
||||
### PREDICTION
|
||||
|
||||
<a name="Training_engine_prediction"></a>
|
||||
* Training engine prediction
|
||||
|
||||
Using the model trained by paddleocr, you can quickly get prediction through the following script.
|
||||
|
|
Loading…
Reference in New Issue