Merge remote-tracking branch 'origin/dygraph' into dygraph

commit 3db92cf3eb

README.md (47 lines changed)
@@ -22,27 +22,21 @@ English | [简体中文](README_ch.md)

PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.

## Notice

PaddleOCR supports both the dynamic graph and static graph programming paradigms

- Dynamic graph: dygraph branch (default), **supported by paddle 2.0.0 ([installation](./doc/doc_en/installation_en.md))**
- Static graph: develop branch

**Recent updates**

-- The PaddleOCR R&D team would like to share the released tools with developers at 20:15 on August 4th, [Live Address](https://live.bilibili.com/21689802).
+- The PaddleOCR R&D team would like to share the key points of PP-OCRv2 at 20:15 on September 8th, [Live Address](https://live.bilibili.com/21689802).
+- 2021.9.7 Released PaddleOCR v2.3, in which [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server on CPU, and the F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding a new structured document analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), which supports layout analysis and table recognition (one-click export of table images to Excel files).
- 2021.4.8 Released the end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf), published in AAAI 2021. Find the tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md); released multi-language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md) supporting more than 80 languages; especially, the performance of the [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is optimized.
- 2021.1.21 Updated more than 25 multilingual recognition models ([models list](./doc/doc_en/models_list_en.md)), including English, Chinese, German, French, Japanese, Spanish, Portuguese, Russian, Arabic, and so on. Models for more languages will continue to be updated ([Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048)).
- 2020.12.15 Updated the data synthesis tool [Style-Text](./StyleText/README.md), which makes it easy to synthesize a large number of images similar to the target scene images.
- 2020.11.25 Updated a new data annotation tool, i.e., [PPOCRLabel](./PPOCRLabel/README.md), which is helpful for improving labeling efficiency. Moreover, the labeling results can be used directly for training the PP-OCR system.
- 2020.9.22 Updated the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- [more](./doc/doc_en/update_en.md)

## Features

-- PPOCR series of high-quality pre-trained models, comparable to commercial effects
-  - Ultra lightweight ppocr_mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
-  - General ppocr_server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
+- PP-OCR series of high-quality pre-trained models, comparable to commercial effects
+  - Ultra lightweight PP-OCRv2 series models: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
+  - Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
+  - General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Rich toolkits related to the OCR areas
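The per-component sizes in the feature list above add up to the stated totals; a quick sanity check of that arithmetic (illustrative only, sizes copied from the list):

```python
# Component sizes in MB for each PP-OCR model series, taken from the list above
series = {
    "PP-OCRv2":      {"det": 3.1,  "cls": 1.4, "rec": 8.5},   # 13.0M total
    "PP-OCR mobile": {"det": 3.0,  "cls": 1.4, "rec": 5.0},   # 9.4M total
    "PP-OCR server": {"det": 47.1, "cls": 1.4, "rec": 94.9},  # 143.4M total
}

# round() guards against floating-point residue when summing the parts
totals = {name: round(sum(parts.values()), 1) for name, parts in series.items()}
print(totals)  # {'PP-OCRv2': 13.0, 'PP-OCR mobile': 9.4, 'PP-OCR server': 143.4}
```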
@@ -88,16 +82,16 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr

<a name="Supported-Chinese-model-list"></a>

-## PP-OCR 2.0 series model list (updated on Dec 15)
-**Note**: Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with the static graph programming paradigm, models 2.0 are the dynamic-graph-trained version and achieve close performance.
+## PP-OCR series model list (updated on September 8th)

| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| Chinese and English ultra-lightweight OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
-| Chinese and English general OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
+| Chinese and English ultra-lightweight PP-OCRv2 model (11.6M) | ch_PP-OCRv2_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/ch/ch_PP-OCRv2_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
+| Chinese and English ultra-lightweight PP-OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
+| Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |

-For more model downloads (including multiple languages), please refer to [PP-OCR v2.0 series model downloads](./doc/doc_en/models_list_en.md).
+For more model downloads (including multiple languages), please refer to [PP-OCR series model downloads](./doc/doc_en/models_list_en.md).

For a new language request, please refer to [Guideline for new language_requests](#language_requests).
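The download links in the table above follow a regular naming pattern; a small helper that reconstructs them (the URL scheme here is inferred from the table, so treat it as an illustration rather than a guaranteed API — note the recognition pre-trained archives use a `_pre` suffix instead of `_train`):

```python
BASE = "https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch"

def model_url(model_name, task, kind="infer"):
    """Build a PP-OCR v2.0 model download URL.

    task: 'det', 'cls', or 'rec'; kind: 'infer' or 'train'
    (recognition pre-trained models use kind='pre').
    """
    return f"{BASE}/{model_name}_{task}_{kind}.tar"

# Matches the detection inference link in the table above
url = model_url("ch_ppocr_mobile_v2.0", "det")
```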
@@ -109,13 +103,12 @@ For a new language request, please refer to [Guideline for new language_requests

- [PP-OCR Model and Configuration](./doc/doc_en/models_and_config_en.md)
  - [PP-OCR Model Download](./doc/doc_en/models_list_en.md)
  - [Yml Configuration](./doc/doc_en/config_en.md)
-  - [Python Inference](./doc/doc_en/inference_en.md)
+  - [Python Inference for PP-OCR Model Library](./doc/doc_en/inference_ppocr_en.md)
- [PP-OCR Training](./doc/doc_en/training_en.md)
  - [Text Detection](./doc/doc_en/detection_en.md)
  - [Text Recognition](./doc/doc_en/recognition_en.md)
  - [Direction Classification](./doc/doc_en/angle_class_en.md)
- Inference and Deployment
  - [Python Inference](./doc/doc_en/inference_en.md)
  - [C++ Inference](./deploy/cpp_infer/readme_en.md)
  - [Serving](./deploy/pdserving/README.md)
  - [Mobile](./deploy/lite/readme_en.md)
@@ -126,6 +119,7 @@ For a new language request, please refer to [Guideline for new language_requests

- Academic Circles
  - [Two-stage Algorithm](./doc/doc_en/algorithm_overview_en.md)
+  - [PGNet Algorithm](./doc/doc_en/algorithm_overview_en.md)
  - [Python Inference](./doc/doc_en/inference_en.md)
- Data Annotation and Synthesis
  - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
  - [Data Synthesis Tool: Style-Text](./StyleText/README.md)
@@ -143,17 +137,18 @@ For a new language request, please refer to [Guideline for new language_requests

- [License](#LICENSE)
- [Contribution](#CONTRIBUTION)

+<a name="PP-OCRv2"></a>

<a name="PP-OCR-Pipeline"></a>

-## PP-OCR Pipeline
+## PP-OCRv2 Pipeline
<div align="center">
-<img src="./doc/ppocr_framework.png" width="800">
+<img src="./doc/ppocrv2_framework.jpg" width="800">
</div>

-PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection [2], detection frame correction, and CRNN text recognition [7]. The system adopts 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-trained model use, and automatic model tailoring and quantization, to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digit OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). Besides, the implementation of the FPGM Pruner [8] and PACT quantization [9] is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
+[1] PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction, and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-trained model use, and automatic model tailoring and quantization, to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digit OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
+
+[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data expansion strategy. The recognition model adopts the LCNet lightweight backbone network, the U-DML knowledge distillation strategy, and an enhanced CTC loss improvement (as shown in the red box above), which further improve inference speed and prediction quality. For more details, please refer to the technical report of PP-OCRv2 (arXiv link coming soon).

## Visualization [more](./doc/doc_en/visualization_en.md)
README_ch.md (41 lines changed)
@@ -21,17 +21,11 @@

## Introduction

PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice.

## Notice
PaddleOCR supports both the dynamic graph and static graph programming paradigms

- Dynamic graph version: release/2.3 (the default branch; development happens on the dygraph branch); requires upgrading paddle to version 2.0.0 or above ([Quick Installation](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/installation.md))
- Static graph version: develop branch

**Recent updates**

- The PaddleOCR R&D team will give an in-depth technical interpretation of the latest release at 20:15 on September 8th, [Live Address](https://live.bilibili.com/21689802).
-- 2021.9.7 Released PaddleOCR v2.3 with the PP-OCRv2 algorithm; CPU inference speed improves by 220% over PP-OCR server, and accuracy by 7% over PP-OCR mobile.
+- 2021.9.7 Released PaddleOCR v2.3 with [PP-OCRv2](#PP-OCRv2); CPU inference speed improves by 220% over PP-OCR server, and accuracy by 7% over PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding the document structure analysis toolkit [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md), which supports layout analysis and table recognition (including Excel export).
- 2021.6.29 Added 5 frequently asked questions to the [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md), for a total of 248; updated every Monday, stay tuned.
- 2021.4.8 Released version 2.1, open-sourcing the AAAI 2021 paper on the [end-to-end recognition algorithm PGNet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/pgnet.md) and expanding [multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/multi_languages.md) support to 80+ languages.
@@ -39,15 +33,16 @@ PaddleOCR supports both the dynamic graph and static graph programming paradigms

## Features

-- PPOCR series of high-quality pre-trained models with accurate recognition
-  - Ultra lightweight ppocrv2 series: detection (3.1M) + recognition (8.5M) = 11.6M
-  - Ultra lightweight ppocr_mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
-  - General ppocr_server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
+- PP-OCR series of high-quality pre-trained models with accurate recognition
+  - Ultra lightweight PP-OCRv2 series: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
+  - Ultra lightweight PP-OCR mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
+  - General PP-OCR server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Rich and easy-to-use OCR-related tool components
  - Semi-automatic annotation tool PPOCRLabel: supports fast and efficient data annotation
  - Data synthesis tool Style-Text: batch-synthesizes large numbers of images similar to the target scene
  - Document analysis capability PP-Structure: layout analysis and table recognition
- Support user-defined training and provide rich inference and deployment solutions
- Support quick installation and use via PIP
- Runs on Linux, Windows, macOS, and other systems
@@ -59,7 +54,7 @@ PaddleOCR supports both the dynamic graph and static graph programming paradigms

<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>

-The image above shows the effect of the general ppocr_server model. For more visualizations, see the [visualization page](./doc/doc_ch/visualization.md).
+The image above shows the effect of the general PP-OCR server model. For more visualizations, see the [visualization page](./doc/doc_ch/visualization.md).

<a name="欢迎加入PaddleOCR技术交流群"></a>
## Join the PaddleOCR technical discussion group
@@ -82,15 +77,15 @@ PaddleOCR supports both the dynamic graph and static graph programming paradigms

- Code experience: start from [Quick Installation](./doc/doc_ch/quickstart.md)

<a name="模型下载"></a>
-## PP-OCR 2.0 series model list (updating)
-**Note**: The main difference between the 2.0 models and the [1.1 models](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md) is dynamic graph versus static graph training; model performance is essentially the same.
+## PP-OCR series model list (updating)

| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
-| Chinese and English ultra-lightweight OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
-| Chinese and English general OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
+| Chinese and English ultra-lightweight PP-OCRv2 model (13.0M) | ch_PP-OCRv2_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [training model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [training model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
+| Chinese and English ultra-lightweight PP-OCR mobile model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
+| Chinese and English general PP-OCR server model (143.4M) | ch_ppocr_server_v2.0_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |

-For more model downloads (including multi-language), refer to [PP-OCR v2.0 series model downloads](./doc/doc_ch/models_list.md)
+For more model downloads (including multi-language), refer to [PP-OCR series model downloads](./doc/doc_ch/models_list.md)

## Documentation tutorials
- [Environment Preparation](./doc/doc_ch/environment.md)
@@ -100,13 +95,12 @@ PaddleOCR supports both the dynamic graph and static graph programming paradigms

- [PP-OCR Model and Configuration](./doc/doc_ch/models_and_config.md)
  - [PP-OCR Model Download](./doc/doc_ch/models_list.md)
  - [Configuration File Content and Generation](./doc/doc_ch/config.md)
-  - [Quick Use of the Model Library](./doc/doc_ch/inference.md)
+  - [Quick Inference with the PP-OCR Model Library](./doc/doc_ch/inference_ppocr.md)
- [PP-OCR Model Training](./doc/doc_ch/training.md)
  - [Text Detection](./doc/doc_ch/detection.md)
  - [Text Recognition](./doc/doc_ch/recognition.md)
  - [Direction Classifier](./doc/doc_ch/angle_class.md)
- PP-OCR Model Inference and Deployment
  - [Python Inference](./doc/doc_ch/inference.md)
  - [C++ Inference](./deploy/cpp_infer/readme.md)
  - [Serving Deployment](./deploy/pdserving/README_CN.md)
  - [Mobile Deployment](./deploy/lite/readme.md)
@@ -122,6 +116,7 @@ PaddleOCR supports both the dynamic graph and static graph programming paradigms

- OCR Academic Circles
  - [Two-stage Models: Introduction and Download](./doc/doc_ch/algorithm_overview.md)
+  - [End-to-end PGNet Algorithm](./doc/doc_ch/pgnet.md)
  - [Python Inference](./doc/doc_ch/inference.md)
- Datasets
  - [General Chinese and English OCR Datasets](./doc/doc_ch/datasets.md)
  - [Handwritten Chinese OCR Datasets](./doc/doc_ch/handwritten_datasets.md)
@@ -129,8 +124,8 @@ PaddleOCR supports both the dynamic graph and static graph programming paradigms

- [Visualization](#效果展示)
- FAQ
  - [[Selected] 10 selected OCR questions](./doc/doc_ch/FAQ.md)
-  - [[Theory] 32 general OCR questions](./doc/doc_ch/FAQ.md)
-  - [[Practice] 110 PaddleOCR questions](./doc/doc_ch/FAQ.md)
+  - [[Theory] 50 general OCR questions](./doc/doc_ch/FAQ.md)
+  - [[Practice] 183 PaddleOCR questions](./doc/doc_ch/FAQ.md)
- [Technical discussion group](#欢迎加入PaddleOCR技术交流群)
- [References](./doc/doc_ch/reference.md)
- [License](#许可证书)
@@ -147,15 +142,13 @@ PaddleOCR supports both the dynamic graph and static graph programming paradigms

[1] PP-OCR is a practical ultra-lightweight OCR system, composed mainly of three parts: DB text detection, detection box rectification, and CRNN text recognition. The system adopts 19 effective strategies across 8 aspects, including backbone selection and adjustment, prediction head design, data augmentation, learning rate schedule, regularization parameter selection, pre-trained model use, and automatic model pruning and quantization, to tune and slim the model of each module (as shown in the green box), finally obtaining an ultra-lightweight Chinese and English OCR model of 3.5M overall and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical report: https://arxiv.org/abs/2009.09941

[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five key aspects: the detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy; the recognition model adopts the LCNet lightweight backbone, the U-DML knowledge distillation strategy, and an Enhanced CTC loss improvement (as shown in the red box above), achieving notable gains in both inference speed and prediction quality. For more details, please refer to the PP-OCRv2 technical report (arXiv link in preparation).

<a name="效果展示"></a>
## Visualization [more](./doc/doc_ch/visualization.md)
- Chinese model
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
</div>
@@ -8,7 +8,7 @@ Global:
  # evaluation starts at iteration 3000 and runs every 2000 iterations thereafter
  eval_batch_step: [3000, 2000]
  cal_metric_during_train: False
-  pretrained_model: ./pretrain_models/ch_ppocr_mobile_v2.1_det_distill_train/best_accuracy
+  pretrained_model: ./pretrain_models/ch_PP-OCRv2_det_distill_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: False
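The `eval_batch_step: [start, interval]` pair means evaluation begins at iteration `start` and then repeats every `interval` iterations. A minimal sketch of the check a training loop would make (the function name is illustrative, not PaddleOCR's internals):

```python
def should_evaluate(global_step, eval_batch_step):
    """Return True when a loop configured with
    eval_batch_step: [start, interval] should run evaluation."""
    start, interval = eval_batch_step
    return global_step >= start and (global_step - start) % interval == 0

# With eval_batch_step: [3000, 2000], evaluation runs at steps 3000, 5000, 7000, ...
steps = [s for s in range(0, 8000) if should_evaluate(s, [3000, 2000])]
```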
@@ -128,4 +128,4 @@ Eval:
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 8
    use_shared_memory: False
@@ -98,7 +98,7 @@ Train:
    shuffle: True
    drop_last: False
    batch_size_per_card: 16
-    num_workers: 8
+    num_workers: 4

Eval:
  dataset:
@@ -125,4 +125,4 @@ Eval:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 8
@@ -0,0 +1,99 @@
Global:
  use_gpu: true
  epoch_num: 5
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./sar_rec
  save_epoch_step: 1
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 2000]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img:
  # for data or label process
  character_dict_path: ppocr/utils/dict90.txt
  character_type: EN_symbol
  max_text_length: 30
  infer_mode: False
  use_space_char: False
  rm_symbol: True
  save_res_path: ./output/rec/predicts_sar.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Piecewise
    decay_epochs: [3, 4]
    values: [0.001, 0.0001, 0.00001]
  regularizer:
    name: 'L2'
    factor: 0

Architecture:
  model_type: rec
  algorithm: SAR
  Transform:
  Backbone:
    name: ResNet31
  Head:
    name: SARHead

Loss:
  name: SARLoss

PostProcess:
  name: SARLabelDecode

Metric:
  name: RecMetric

Train:
  dataset:
    name: SimpleDataSet
    label_file_list: ['./train_data/train_list.txt']
    data_dir: ./train_data/
    ratio_list: 1.0
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SARLabelEncode: # Class handling label
      - SARRecResizeImg:
          image_shape: [3, 48, 48, 160] # h:48 w:[48,160]
          width_downsample_ratio: 0.25
      - KeepKeys:
          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 64
    drop_last: True
    num_workers: 8
    use_shared_memory: False

Eval:
  dataset:
    name: LMDBDataSet
    data_dir: ./eval_data/evaluation/
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SARLabelEncode: # Class handling label
      - SARRecResizeImg:
          image_shape: [3, 48, 48, 160]
          width_downsample_ratio: 0.25
      - KeepKeys:
          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 64
    num_workers: 4
    use_shared_memory: False
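The `Piecewise` learning-rate schedule in the Optimizer section above holds each value until the corresponding decay epoch is reached. A small sketch of that mapping (a simplified stand-in for Paddle's scheduler, not its actual implementation):

```python
def piecewise_lr(epoch, decay_epochs, values):
    """Return the learning rate for a given epoch under a Piecewise
    schedule: values[i] applies until decay_epochs[i] is reached."""
    for boundary, value in zip(decay_epochs, values):
        if epoch < boundary:
            return value
    return values[-1]

# With decay_epochs [3, 4] and values [0.001, 0.0001, 0.00001]:
# epochs 0-2 train at 0.001, epoch 3 at 0.0001, epoch 4+ at 0.00001
```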
Binary image file added (19 MiB); content not shown.
@@ -5,20 +5,20 @@ PaddleOCR has been tested on Windows with `Visual Studio 2019 Community`

## Prerequisites
* Visual Studio 2019
-* CUDA 9.0 / CUDA 10.0, cudnn 7+ (only needed when using the GPU version of the inference library)
+* CUDA 10.2, cudnn 7+ (only needed when using the GPU version of the inference library)
* CMake 3.0+

Please make sure the basic software above is installed; we use the Community edition of `VS2019`.

**All examples below use `D:\projects` as the working directory.**

-### Step 1: Download the PaddlePaddle C++ inference library fluid_inference
+### Step 1: Download the PaddlePaddle C++ inference library paddle_inference

PaddlePaddle provides different precompiled C++ inference libraries for different `CPU` and `CUDA` versions; download the one matching your environment: [C++ inference library download list](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#windows)

-After extraction, the `D:\projects\fluid_inference` directory contains:
+After extraction, the `D:\projects\paddle_inference` directory contains:
```
-fluid_inference
+paddle_inference
├── paddle # paddle core library and headers
├── third_party # third-party dependency libraries and headers
```
@@ -46,13 +46,13 @@ fluid_inference

![step2.2](https://paddleseg.bj.bcebos.com/inference/vs2019_step3.png)

-3. Click: `Project` -> `CMake settings for cpp_inference_demo`
+3. Click: `Project` -> `CMake settings`

![step3](https://paddleseg.bj.bcebos.com/inference/vs2019_step4.png)

-4. Click `Browse` and set the build options to specify the paths of `CUDA`, `CUDNN_LIB`, `OpenCV`, and the `Paddle inference library`
+4. Set the build options to specify the paths of `CUDA`, `CUDNN_LIB`, `OpenCV`, and the `Paddle inference library`

-The build parameters are explained below (`*` marks parameters only needed with the **GPU version** of the inference library; keep the CUDA library versions aligned, and **use CUDA 9.0 or 10.0, not 9.2, 10.1, etc.**):
+The build parameters are explained below (`*` marks parameters only needed with the **GPU version** of the inference library; keep the CUDA library versions aligned):

| Parameter | Meaning |
| ---- | ---- |
@@ -67,6 +67,11 @@ fluid_inference

![step4](https://paddleseg.bj.bcebos.com/inference/vs2019_step5.png)

+An example configuration with GPU:
+![step5](./vs2019_build_withgpu_config.png)
+**Note:**
+Set the CMAKE_BACKWARDS version according to the cmake version installed on your platform.

**After configuring**, click `保存并生成CMake缓存以加载变量` (save and generate the CMake cache to load variables) in the screenshot above.

5. Click `生成` -> `全部生成` (Build -> Build All)
@@ -74,24 +79,34 @@ fluid_inference

![step6](https://paddleseg.bj.bcebos.com/inference/vs2019_step6.png)

-### Step 4: Prediction and visualization
+### Step 4: Prediction

-The executable built by `Visual Studio 2019` above is located in the `out\build\x64-Release` directory. Open `cmd` and switch to that directory:
+The executable built by `Visual Studio 2019` above is located in the `out\build\x64-Release\Release` directory. Open `cmd` and switch to `D:\projects\PaddleOCR\deploy\cpp_infer\`:

```
-cd D:\projects\PaddleOCR\deploy\cpp_infer\out\build\x64-Release
+cd D:\projects\PaddleOCR\deploy\cpp_infer
```

-The executable `ocr_system.exe` is the sample prediction program; its basic usage is as follows
+The executable `ppocr.exe` is the sample prediction program; its basic usage is as follows. For more usage, see the `运行demo` (run demo) section of the [documentation](../readme.md).

```shell
-# predict the image `D:\projects\PaddleOCR\doc\imgs\10.jpg`
-.\ocr_system.exe D:\projects\PaddleOCR\deploy\cpp_infer\tools\config.txt D:\projects\PaddleOCR\doc\imgs\10.jpg
+# recognize the Chinese images in `D:\projects\PaddleOCR\doc\imgs_words\ch\`
+.\out\build\x64-Release\Release\ppocr.exe rec --rec_model_dir=D:\projects\PaddleOCR\ch_ppocr_mobile_v2.0_rec_infer --image_dir=D:\projects\PaddleOCR\doc\imgs_words\ch\
+
+# recognize the English images in `D:\projects\PaddleOCR\doc\imgs_words\en\`
+.\out\build\x64-Release\Release\ppocr.exe rec --rec_model_dir=D:\projects\PaddleOCR\inference\rec_mv3crnn --image_dir=D:\projects\PaddleOCR\doc\imgs_words\en\ --char_list_file=D:\projects\PaddleOCR\ppocr\utils\dict\en_dict.txt
```

-The first argument is the path of the configuration file, and the second is the path of the image to predict.
+The first argument is the path of the configuration file, the second is the path of the image to predict, and the third configures the dictionary for text recognition.

-### Notes
+### FAQ
* When running an exe in a Windows terminal, the output may be garbled. Enter `CHCP 65001` in the terminal to switch its encoding from GBK (the default) to UTF-8; for a more detailed explanation see this blog post: [https://blog.csdn.net/qq_35038153/article/details/78430359](https://blog.csdn.net/qq_35038153/article/details/78430359).

-* If the build fails with `error C1083: cannot open include file: "dirent.h": No such file or directory`, refer to this [document](https://blog.csdn.net/Dora_blank/article/details/117740837#41_C1083_direnthNo_such_file_or_directory_54): create a `dirent.h` file and add it to the `VC++` include directories.
+* If the build fails with `error C1083: cannot open include file: "dirent.h": No such file or directory`, refer to this [document](https://blog.csdn.net/Dora_blank/article/details/117740837#41_C1083_direnthNo_such_file_or_directory_54): create a `dirent.h` file and add it to the header includes of `utility.cpp`; also change `lstat` to `stat` on line 70 of `utility.cpp`.

+* If the build fails with `AutoLog undefined`, create an `autolog.h` file with the contents of [autolog.h](https://github.com/LDOUBLEV/AutoLog/blob/main/auto_log/autolog.h), add it to the header includes of `main.cpp`, and build again.

+* If a popup at runtime reports that `paddle_inference.dll` or `openblas.dll` cannot be found, locate the two files inside the `D:\projects\paddle_inference` inference library and copy them into `D:\projects\PaddleOCR\deploy\cpp_infer\out\build\x64-Release\Release`. No rebuild is needed; just run again.

+* If a popup at runtime reports `The application was unable to start correctly (0xc0000142)` and the `cmd` window shows `You are using Paddle compiled with TensorRT, but TensorRT dynamic library is not found.`, copy all dll files from the lib folder of the TensorRT directory into the Release directory and run again.
@@ -44,7 +44,7 @@ DEFINE_int32(cpu_threads, 10, "Num of threads with CPU.");
DEFINE_bool(enable_mkldnn, false, "Whether use mkldnn with CPU.");
DEFINE_bool(use_tensorrt, false, "Whether use tensorrt.");
DEFINE_string(precision, "fp32", "Precision be one of fp32/fp16/int8");
-DEFINE_bool(benchmark, true, "Whether use benchmark.");
+DEFINE_bool(benchmark, false, "Whether use benchmark.");
DEFINE_string(save_log_path, "./log_output/", "Save benchmark log path.");
// detection related
DEFINE_string(image_dir, "", "Dir of input image.");
@@ -127,9 +127,15 @@ int main_det(std::vector<cv::String> cv_all_img_names) {

int main_rec(std::vector<cv::String> cv_all_img_names) {
  std::vector<double> time_info = {0, 0, 0};

+  std::string char_list_file = FLAGS_char_list_file;
+  if (FLAGS_benchmark)
+    char_list_file = FLAGS_char_list_file.substr(6);
+  cout << "label file: " << char_list_file << endl;
+
  CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
                     FLAGS_gpu_mem, FLAGS_cpu_threads,
-                     FLAGS_enable_mkldnn, FLAGS_char_list_file,
+                     FLAGS_enable_mkldnn, char_list_file,
                     FLAGS_use_tensorrt, FLAGS_precision);

  for (int i = 0; i < cv_all_img_names.size(); ++i) {
@@ -148,12 +154,28 @@ int main_rec(std::vector<cv::String> cv_all_img_names) {
    time_info[1] += rec_times[1];
    time_info[2] += rec_times[2];
  }

  if (FLAGS_benchmark) {
    AutoLogger autolog("ocr_rec",
                       FLAGS_use_gpu,
                       FLAGS_use_tensorrt,
                       FLAGS_enable_mkldnn,
                       FLAGS_cpu_threads,
                       1,
                       "dynamic",
                       FLAGS_precision,
                       time_info,
                       cv_all_img_names.size());
    autolog.report();
  }
  return 0;
}

int main_system(std::vector<cv::String> cv_all_img_names) {
  std::vector<double> time_info_det = {0, 0, 0};
  std::vector<double> time_info_rec = {0, 0, 0};

  DBDetector det(FLAGS_det_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
                 FLAGS_gpu_mem, FLAGS_cpu_threads,
                 FLAGS_enable_mkldnn, FLAGS_max_side_len, FLAGS_det_db_thresh,
@@ -169,17 +191,20 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
                   FLAGS_use_tensorrt, FLAGS_precision);
  }

  std::string char_list_file = FLAGS_char_list_file;
  if (FLAGS_benchmark)
    char_list_file = FLAGS_char_list_file.substr(6);
  cout << "label file: " << char_list_file << endl;

  CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
                     FLAGS_gpu_mem, FLAGS_cpu_threads,
                     FLAGS_enable_mkldnn, FLAGS_char_list_file,
                     FLAGS_enable_mkldnn, char_list_file,
                     FLAGS_use_tensorrt, FLAGS_precision);

  auto start = std::chrono::system_clock::now();

  for (int i = 0; i < cv_all_img_names.size(); ++i) {
    LOG(INFO) << "The predict img: " << cv_all_img_names[i];

    cv::Mat srcimg = cv::imread(FLAGS_image_dir, cv::IMREAD_COLOR);
    cv::Mat srcimg = cv::imread(cv_all_img_names[i], cv::IMREAD_COLOR);
    if (!srcimg.data) {
      std::cerr << "[ERROR] image read failed! image path: " << cv_all_img_names[i] << endl;
      exit(1);
@@ -189,7 +214,10 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
    std::vector<double> rec_times;

    det.Run(srcimg, boxes, &det_times);

    time_info_det[0] += det_times[0];
    time_info_det[1] += det_times[1];
    time_info_det[2] += det_times[2];

    cv::Mat crop_img;
    for (int j = 0; j < boxes.size(); j++) {
      crop_img = Utility::GetRotateCropImage(srcimg, boxes[j]);
@@ -198,18 +226,36 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
        crop_img = cls->Run(crop_img);
      }
      rec.Run(crop_img, &rec_times);
      time_info_rec[0] += rec_times[0];
      time_info_rec[1] += rec_times[1];
      time_info_rec[2] += rec_times[2];
    }

    auto end = std::chrono::system_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << "Cost "
              << double(duration.count()) *
                     std::chrono::microseconds::period::num /
                     std::chrono::microseconds::period::den
              << "s" << std::endl;
  }

  if (FLAGS_benchmark) {
    AutoLogger autolog_det("ocr_det",
                           FLAGS_use_gpu,
                           FLAGS_use_tensorrt,
                           FLAGS_enable_mkldnn,
                           FLAGS_cpu_threads,
                           1,
                           "dynamic",
                           FLAGS_precision,
                           time_info_det,
                           cv_all_img_names.size());
    AutoLogger autolog_rec("ocr_rec",
                           FLAGS_use_gpu,
                           FLAGS_use_tensorrt,
                           FLAGS_enable_mkldnn,
                           FLAGS_cpu_threads,
                           1,
                           "dynamic",
                           FLAGS_precision,
                           time_info_rec,
                           cv_all_img_names.size());
    autolog_det.report();
    std::cout << endl;
    autolog_rec.report();
  }
  return 0;
}
@@ -0,0 +1,77 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.web_service import WebService, Op

import logging
import numpy as np
import cv2
import base64
# from paddle_serving_app.reader import OCRReader
from ocr_reader import OCRReader, DetResizeForTest
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes

_LOGGER = logging.getLogger()


class DetOp(Op):
    def init_op(self):
        self.det_preprocess = Sequential([
            DetResizeForTest(), Div(255),
            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            Transpose((2, 0, 1))
        ])
        self.filter_func = FilterBoxes(10, 10)
        self.post_func = DBPostProcess({
            "thresh": 0.3,
            "box_thresh": 0.5,
            "max_candidates": 1000,
            "unclip_ratio": 1.5,
            "min_size": 3
        })

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        data = base64.b64decode(input_dict["image"].encode('utf8'))
        self.raw_im = data
        data = np.fromstring(data, np.uint8)
        # Note: class variables (self.var) can only be used in process op mode
        im = cv2.imdecode(data, cv2.IMREAD_COLOR)
        self.ori_h, self.ori_w, _ = im.shape
        det_img = self.det_preprocess(im)
        _, self.new_h, self.new_w = det_img.shape
        return {"x": det_img[np.newaxis, :].copy()}, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, log_id):
        det_out = fetch_dict["save_infer_model/scale_0.tmp_1"]
        ratio_list = [
            float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
        ]
        dt_boxes_list = self.post_func(det_out, [ratio_list])
        dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
        out_dict = {"dt_boxes": str(dt_boxes)}

        return out_dict, None, ""


class OcrService(WebService):
    def get_pipeline_response(self, read_op):
        det_op = DetOp(name="det", input_ops=[read_op])
        return det_op


uci_service = OcrService(name="ocr")
uci_service.prepare_pipeline_config("config.yml")
uci_service.run_service()
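A request body for this pipeline service can be sketched as follows. The `{"key": [...], "value": [...]}` layout and the `"image"` key mirror what `DetOp.preprocess` decodes; the actual endpoint and port depend on `config.yml` and are not shown here, and `build_ocr_request` is an illustrative helper, not part of the repo:

```python
import base64


def build_ocr_request(image_bytes):
    # The pipeline web service expects the image as a base64 string; the
    # server side reverses this with base64.b64decode(input_dict["image"]).
    b64 = base64.b64encode(image_bytes).decode("utf8")
    return {"key": ["image"], "value": [b64]}


payload = build_ocr_request(b"\x89PNG-fake-bytes")
# The payload round-trips: decoding the value gives back the raw bytes.
assert base64.b64decode(payload["value"][0]) == b"\x89PNG-fake-bytes"
```

This payload would then be POSTed as JSON to the pipeline's HTTP endpoint.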
@@ -0,0 +1,86 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.web_service import WebService, Op

import logging
import numpy as np
import cv2
import base64
# from paddle_serving_app.reader import OCRReader
from ocr_reader import OCRReader, DetResizeForTest
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose

_LOGGER = logging.getLogger()


class RecOp(Op):
    def init_op(self):
        self.ocr_reader = OCRReader(
            char_dict_path="../../ppocr/utils/ppocr_keys_v1.txt")

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        raw_im = base64.b64decode(input_dict["image"].encode('utf8'))
        data = np.fromstring(raw_im, np.uint8)
        im = cv2.imdecode(data, cv2.IMREAD_COLOR)
        feed_list = []
        max_wh_ratio = 0
        # Many mini-batches; the type of feed_data is list.
        max_batch_size = 6  # len(dt_boxes)

        # If max_batch_size is 0, skip the predict stage
        if max_batch_size == 0:
            return {}, True, None, ""
        boxes_size = max_batch_size
        rem = boxes_size % max_batch_size

        h, w = im.shape[0:2]
        wh_ratio = w * 1.0 / h
        max_wh_ratio = max(max_wh_ratio, wh_ratio)
        _, w, h = self.ocr_reader.resize_norm_img(im, max_wh_ratio).shape
        norm_img = self.ocr_reader.resize_norm_img(im, max_batch_size)
        norm_img = norm_img[np.newaxis, :]
        feed = {"x": norm_img.copy()}
        feed_list.append(feed)
        return feed_list, False, None, ""

    def postprocess(self, input_dicts, fetch_data, log_id):
        res_list = []
        if isinstance(fetch_data, dict):
            if len(fetch_data) > 0:
                rec_batch_res = self.ocr_reader.postprocess(
                    fetch_data, with_score=True)
                for res in rec_batch_res:
                    res_list.append(res[0])
        elif isinstance(fetch_data, list):
            for one_batch in fetch_data:
                one_batch_res = self.ocr_reader.postprocess(
                    one_batch, with_score=True)
                for res in one_batch_res:
                    res_list.append(res[0])

        res = {"res": str(res_list)}
        return res, None, ""


class OcrService(WebService):
    def get_pipeline_response(self, read_op):
        rec_op = RecOp(name="rec", input_ops=[read_op])
        return rec_op


uci_service = OcrService(name="ocr")
uci_service.prepare_pipeline_config("config.yml")
uci_service.run_service()
@@ -9,38 +9,42 @@

## PaddleOCR FAQ (continuously updated)

* [Recent updates (2021.2.1)](#近期更新)
* [Recent updates (2021.6.29)](#近期更新)
* [[Featured] Top 10 OCR questions](#OCR精选10个问题)
* [[Theory] 32 general OCR questions](#OCR通用问题)
  * [Basics: 7 questions](#基础知识)
  * [Datasets: 7 questions](#数据集2)
  * [Model training and tuning: 18 questions](#模型训练调优2)
* [[Practice] 120 PaddleOCR questions](#PaddleOCR实战问题)
  * [Usage: 38 questions](#使用咨询)
  * [Datasets: 18 questions](#数据集3)
  * [Model training and tuning: 30 questions](#模型训练调优3)
  * [Inference and deployment: 34 questions](#预测部署3)
* [[Theory] 51 general OCR questions](#OCR通用问题)
  * [Basics: 16 questions](#基础知识)
  * [Datasets: 10 questions](#数据集2)
  * [Model training and tuning: 25 questions](#模型训练调优2)
* [[Practice] 187 PaddleOCR questions](#PaddleOCR实战问题)
  * [Usage: 80 questions](#使用咨询)
  * [Datasets: 19 questions](#数据集3)
  * [Model training and tuning: 39 questions](#模型训练调优3)
  * [Inference and deployment: 49 questions](#预测部署3)

<a name="近期更新"></a>
## Recent updates (2021.2.1)
## Recent updates (2021.6.29)
#### Q3.2.18: How do I finetune with the dynamic-graph version of PaddleOCR?

**A**: For finetuning, set Global.load_static_weights to false in the config file (add the field manually if it is missing), then point Global.pretrained_model to the model path.

#### Q2.3.25: Text in an upright image is recognized fine, but results degrade after rotating the image 90 degrees. How can this be improved?

**A**: Worse results after rotating the whole image by 90 degrees are possible, because PPOCR currently assumes upright input images. You can train a whole-image orientation classifier and place it at the very front of the pipeline (following the pattern of the existing direction classifier), or apply rule-based preprocessing, e.g. comparing width and height.

#### Q3.1.78: Does the online demo support Arabic?

**A**: The online demo currently supports only Chinese and English; other languages need to be handled via the whl package.

#### Q3.3.29: When finetuning the v1.1 pretrained model, can I directly use images with vertically arranged or upside-down text, or must the text be horizontal?

**A**: The 1.1 and 2.0 models behave the same: when finetuning, vertically arranged text must be rotated 90° counterclockwise before being added to training, and upside-down text must be rotated to horizontal.

#### Q3.1.79: A certain class has few samples. Can increasing the number of training iterations or epochs, thereby implicitly increasing the number of small-class samples, alleviate the problem?

**A**: Try to keep the classes balanced. For classes with few samples, supplement them with synthetic data. Experiments show that characters appearing infrequently in the training set are recognized poorly, and increasing iterations does not change the fact that there are few samples.

#### Q3.3.30: How do I obtain the best_accuracy model during training?

**A**: The eval_batch_step field in the config file controls how many iterations pass between evaluations; a best_accuracy model is generated automatically after each eval. To get a best_accuracy model quickly, set eval_batch_step to a smaller value (e.g. 10).

#### Q3.1.80: After recognizing the text on a resume, I want to pair related fields, e.g. "Name" with the actual name, and likewise hometown, email, education, etc. How should this be handled, and does PPOCR support it?

**A**: This kind of requirement is indeed common in enterprise applications, but it is usually highly customized and has no uniform, one-size-fits-all solution. Two common approaches:
1. For a single layout, or layouts with little variation, pair the recognized content using prior knowledge about the scene; for example, use the form structure: the value following the common form keyword "Name" is usually the name itself.
2. For diverse or free-form layouts, NER techniques from NLP are needed to assign key values to certain fields in the recognized content.

#### Q3.4.33: How do I run paddleocr with multiple processes?

**A**: Instantiate multiple paddleocr services, register them with a registry, and dispatch through the registry. For registries, look up eureka for concrete usage; other registries work as well.

Since this requirement is strongly tied to the business scenario and hard to handle with one unified model, PPOCR does not support it for now. If NER is needed, refer to another open-source suite from the Paddle team, https://github.com/PaddlePaddle/ERNIE; its pretrained model ERNIE can help improve NER accuracy.

#### Q3.4.34: Can a model trained with 2.0 be deployed with version 1.1?

**A**: This is not recommended; for models trained with 2.0, use the deployment code provided in the dygraph branch.

#### Q3.4.49: The same model gives inconsistent results between the C++ and Python deployments. How do I locate the problem?

**A**: Some debugging experience:
1. First check whether the threshold parameters are identical;
2. Check whether the pre- and post-processing in the C++ code match the Python code;
3. With Python, save binary files at the model input and output to rule out differences in the inference itself
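Point 3 (dumping the tensors at the model input and output on the Python side so they can be diffed against the C++ side) can be sketched as below; `dump_tensor`/`load_tensor` are illustrative helpers, not part of PaddleOCR:

```python
import os
import tempfile

import numpy as np


def dump_tensor(arr, path):
    # Write the tensor as raw float32 bytes; the C++ side can produce the
    # same file with fwrite, so the two dumps can be compared byte-for-byte.
    arr.astype(np.float32).tofile(path)


def load_tensor(path, shape):
    return np.fromfile(path, dtype=np.float32).reshape(shape)


model_input = np.arange(6, dtype=np.float32).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "model_input.bin")
dump_tensor(model_input, path)
restored = load_tensor(path, (2, 3))
assert np.array_equal(restored, model_input)
```

If the dumped inputs match but the outputs differ, the discrepancy is inside the inference engine rather than in pre/post-processing.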
<a name="OCR精选10个问题"></a>
## [Featured] Top 10 OCR questions

@@ -76,8 +80,7 @@

**A**: (1) Provided a human eye can recognize the text: for text with background interference, first make sure the detection boxes are accurate enough. If they are not, consider preprocessing the image (e.g. by color filtering) and adding more relevant training data; on the recognition side, make sure the training data includes augmented images with background interference.

(2) If the MobileNet model cannot meet your needs, try the larger ResNet-series models for better results
.
(2) If the MobileNet model cannot meet your needs, try the larger ResNet-series models for better results.

#### Q1.1.6: What evaluation metrics are commonly used in OCR?

@@ -125,7 +128,7 @@

#### Q1.1.10: In PaddleOCR, what are the ways to speed up model inference on CPU? What input requirements does TensorRT-based GPU acceleration have?

**A**: (1) On CPU, mkldnn can be used for acceleration: for Python inference, set enable_mkldnn to true ([reference code](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/tools/infer/utility.py#L84)); for cpp inference, set use_mkldnn 1 in the config file ([reference code](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/deploy/cpp_infer/tools/config.txt#L6))
**A**: (1) On CPU, mkldnn can be used for acceleration: for Python inference, set enable_mkldnn to true ([reference code](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/tools/infer/utility.py#L99)); for cpp inference, set use_mkldnn 1 in the config file ([reference code](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/deploy/cpp_infer/tools/config.txt#L6))

(2) On GPU, watch out for issues such as variable-length input; variable-length input is only supported from TRT6 onward
@@ -161,6 +164,39 @@

**A**: When handling such characters, treat a multi-character unit as a single "character"; each line of the dictionary is one character.

#### Q2.1.8: Roughly how many kinds of end-to-end scene text recognition methods are there?

**A**: End-to-end scene text recognition methods fall roughly into two kinds: two-stage methods and character-level methods. Two-stage methods generally detect text blocks first and then extract features from the blocks for recognition, e.g. ABCNet; character-level methods perform character detection and recognition directly, outputting the word boxes, character boxes, and corresponding character classes in one pass, e.g. CharNet.

#### Q2.1.9: What are the shortcomings of two-stage end-to-end scene text recognition methods?

**A**: These methods generally require a purpose-built method for extracting ROI features, and ROI operations are usually time-consuming.

#### Q2.1.10: What are the shortcomings of character-level end-to-end scene text recognition methods?

**A**: On one hand, training requires character-level annotations, which usually come from synthetic data, and there is a distribution gap between synthetic and real data. On the other hand, most existing work assumes a fixed reading direction, top to bottom and left to right, and does not solve text-direction prediction.

#### Q2.1.11: What is distinctive about PGNet, the latest end-to-end scene text recognition algorithm from AAAI 2021?

**A**: PGNet needs no character-level annotations, no NMS, and no ROI operations. It also proposes a module that predicts the reading order within a text line and a graph-based refinement module to improve recognition quality. The algorithm was developed in-house at Baidu and will be open-sourced in PaddleOCR soon.

#### Q2.1.12: What problem does the PubTabNet dataset target?

**A**: PubTabNet is an image-based table recognition dataset proposed by IBM, containing 568k table images together with HTML-format annotations. Its release has pushed forward the development and deployment of table-structure recognition algorithms.

#### Q2.1.13: Which text recognition algorithms does PaddleOCR provide?

**A**: PaddleOCR mainly provides five text recognition algorithms: CRNN, StarNet, RARE, Rosetta, and SRN. CRNN, StarNet, and Rosetta are CTC-based; RARE is attention-based; SRN is Baidu's in-house algorithm, which introduces semantic information and significantly improves accuracy. For details see [text recognition algorithms](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_ch/algorithm_overview.md#%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95).

#### Q2.1.14: In the recognition model, why is the stride of the downsampling residual block (2, 1)?

**A**: A stride of (2, 1) means stride 2 along the y axis (height) of the image and stride 1 along the x axis (width). Since text images to be recognized are usually wide rectangles, downsampling only along the height preserves as much of the sequence information along the width as possible and avoids losing too much text information.
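The effect of the (2, 1) strides can be checked with a little shape arithmetic. The stage layout below (one (2, 2) stem plus four (2, 1) stages, giving a 32x height reduction but only a 2x width reduction) is a simplified, hypothetical backbone, not PaddleOCR's exact architecture:

```python
def downsample(h, w, strides):
    # Apply a sequence of (stride_h, stride_w) downsampling blocks.
    for sh, sw in strides:
        h, w = h // sh, w // sw
    return h, w


# A 32-pixel-high, 320-pixel-wide text line:
feat_h, feat_w = downsample(32, 320, [(2, 2), (2, 1), (2, 1), (2, 1), (2, 1)])
# Height collapses to 1 (a pure sequence) while the width keeps 160 time steps.
```

With (2, 2) strides everywhere, the width would shrink to 10 steps and most of the per-character sequence information would be lost.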

#### Q2.1.15: What are the key techniques in the CRNN text recognition method?

**A**: CRNN consists of three key parts: (1) a CNN that extracts convolutional image features; (2) a deep bidirectional LSTM that extracts text-sequence features on top of the convolutional features; (3) Connectionist Temporal Classification (CTC), which solves the character-alignment problem during training.

#### Q2.1.16: What are the distinctive features of Baidu's in-house SRN text recognition method?

**A**: SRN has four main components: (1) Transformer Units (TUs) that strengthen the representation power of the convolutional image features; (2) a Parallel Visual Attention Module (PVAM) that mines relationships between features; (3) a Global Semantic Reasoning Module (GSRM) that mines semantic relationships in the recognition results; (4) a Visual-Semantic Fusion Decoder (VSFD) that effectively fuses the visual features from PVAM with the semantic features from GSRM.

<a name="数据集2"></a>
### Datasets
@@ -192,6 +228,16 @@

**A**: SRNet synthesizes text data by borrowing ideas from image-to-image translation and style transfer in GANs. Unlike generic GAN methods that use a single branch, SRNet decomposes text synthesis into three simple sub-modules to improve synthesis quality: a background-free text style transfer module, a background extraction module, and a fusion module. PaddleOCR plans to open-source a practical SRNet-based model in mid-December 2020.

#### Q2.2.8: If I want to feed polygons to DBNet, how should the data labels be formatted?

**A**: If you want to use polygons as DBNet input, the data labels should also be polygons, which fit curved text better. Note that PPOCRLabel currently only supports rectangle and quadrilateral annotations.

#### Q2.2.9: What kind of datasets does the end-to-end algorithm PGNet use?

**A**: PGNet can currently use four-point annotated datasets as well as multi-point (fourteen-point) ones. Training on multi-point annotations gives better results than four-point; one strategy worth trying is to train on a four-point dataset first and then continue training on a multi-point dataset.

#### Q2.2.10: What are the common datasets for document layout analysis?

**A**: Common datasets for document layout analysis include PubLayNet, TableBank word, TableBank latex, and others.

<a name="模型训练调优2"></a>
### Model training and tuning
@@ -254,7 +300,7 @@

**A**: It is recommended to learn the basics of OCR first and get a rough idea of the basic detection and recognition algorithms, then browse OCR-related repos on GitHub. In terms of completeness, PaddleOCR's bilingual Chinese/English tutorials have a clear advantage: the documentation on datasets, model training, and inference deployment is thorough, so you can get started quickly, and there is a WeChat user group for Q&A, which makes it well suited for hands-on learning. Project: [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)

#### Q3.12: How do I recognize English text lines that contain spaces?
#### Q2.3.12: How do I recognize English text lines that contain spaces?

**A**: Two approaches can be considered for space recognition:
@@ -286,6 +332,33 @@

**A**: The SE block is an important module in MobileNetV3; it estimates the importance of each feature channel in the feature map, assigns a weight to each feature, and improves the network's representation power. For text detection, however, the input resolution is relatively large, typically 640\*640, so estimating per-channel importance with SE is difficult and brings limited gains, while the block itself is rather time-consuming. Therefore the text detection backbone in the PP-OCR system does not use SE. Experiments also show that removing SE shrinks the ultra-lightweight model by 40% with essentially no impact on detection quality. For details see the PP-OCR technical report, https://arxiv.org/abs/2009.09941.

#### Q2.3.19: When following the docs for a real project, should I train from scratch or start from the officially trained models? How exactly?

**A**: Finetuning from the officially provided models converges faster. Concretely, taking recognition-model training as an example: if you modified the character file, you can set pretrained_model to the official pretrained model.

#### Q2.3.20: How do I choose a backbone for different hardware platforms?

**A**: Different backbones have different speed advantages on different hardware; pick a backbone from the per-platform speed-accuracy plots, see the [PaddleClas model speed-accuracy plots](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/docs/zh_CN/models).

#### Q2.3.21: Does the end-to-end algorithm PGNet support Chinese recognition, and is it slow?

**A**: The currently open-sourced PGNet model mainly detects English and digits; for Chinese you need to train it yourself, e.g. on open end-to-end Chinese datasets, and for complex (curved) text you can also build a dedicated dataset and train on it. As for inference speed, convert the model to inference format first and then predict; the speed should be quite acceptable.

#### Q2.3.22: What are the main practical approaches to knowledge distillation at present?

**A**: Knowledge distillation means using a teacher model to guide the training of a student model. There are currently 3 main approaches:
1. Output-based distillation: the student learns the teacher's soft labels (classification or OCR recognition tasks) or probability heat maps (segmentation tasks).
2. Feature-map-based distillation: the student learns the teacher's intermediate feature maps, fitting the features of the middle layers.
3. Relation-based distillation: for different samples (say N of them), the teacher produces different outputs; from these an NxN correlation matrix can be computed, and the student learns the teacher's correlation matrix across samples.

Of course, knowledge distillation methods evolve quickly, and further summaries and suggestions are welcome.
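Approach 1 above (output-based distillation) can be sketched numerically. The loss below is a hypothetical minimal numpy version for illustration, not PaddleOCR's or PaddleClas's actual implementation:

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = logits / temperature
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def soft_label_kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher's softened distribution (soft labels)
    # and the student's softened prediction.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean())


teacher = np.array([[3.0, 0.2, 0.1]])
student = np.array([[1.0, 0.8, 0.4]])
# The loss is smallest when the student matches the teacher's distribution.
loss_matched = soft_label_kd_loss(teacher, teacher)
loss_mismatched = soft_label_kd_loss(student, teacher)
assert loss_mismatched >= loss_matched
```

The temperature softens both distributions so the student also learns the teacher's relative preferences among the non-top classes.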

#### Q2.3.23: What are the common methods for document layout analysis?

**A**: Document layout analysis usually uses generic object detection methods, including the Faster RCNN family, the YOLO family, and so on. For industrial practice, training with PP-YOLO v2, the accurate and efficient detector in PaddleDetection, is recommended.

#### Q2.3.24: How do I recognize stylized (artistic) text on signboards or advertisements?

**A**: Artistic text on signboards and advertisements is a very challenging text recognition problem, because individual artistic characters vary greatly compared with printed fonts. If the artistic words to recognize come from a dictionary list, you can treat each dictionary entry as an image template to be recognized and solve the problem with a general image retrieval and recognition system; consider trying the PaddleClas image recognition system.

#### Q2.3.25: Text in an upright image is recognized fine, but results degrade after rotating the image 90 degrees. How can this be improved?

**A**: Worse results after rotating the whole image by 90 degrees are possible, because PPOCR currently assumes upright input images. You can train a whole-image orientation classifier and place it at the very front of the pipeline (following the pattern of the existing direction classifier), or apply rule-based preprocessing, e.g. comparing width and height.

<a name="PaddleOCR实战问题"></a>
## [Practice] PaddleOCR questions
@@ -361,13 +434,13 @@
(2) When downloading inference models, if wget is not installed, you can click the model link directly or copy it into a browser to download, then unzip into the corresponding directory.

#### Q3.1.17: What is the difference between PaddleOCR's open-sourced ultra-lightweight model and the general OCR model?

**A**: PaddleOCR currently open-sources 2 Chinese models: the 9.4M ultra-lightweight Chinese model and the general Chinese OCR model. Comparison:
**A**: PaddleOCR currently open-sources 2 Chinese models: the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. Comparison:
- Similarities: both use the same **algorithms** and **training data**;
- Differences: they differ in **backbone** and **channel parameters**. The ultra-lightweight model uses MobileNetV3 as the backbone; the general model uses Resnet50_vd as the detection backbone and Resnet34_vd as the recognition backbone. See the training config files of the two models for the exact parameter differences.

|Model|Backbone|Detection config|Recognition config|
|-|-|-|-|
|9.4M ultra-lightweight Chinese OCR model|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml|
|8.6M ultra-lightweight Chinese OCR model|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml|
|General Chinese OCR model|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml|

#### Q3.1.18: How do I add my own detection algorithm?
@@ -482,7 +555,239 @@ StyleText的用途主要是:提取style_image中的字体、背景等style信

**A**: A Paddle version problem; install Paddle 2.0: pip install paddlepaddle==2.0.0.

#### Q3.1.39: How should characters missing from the dictionary be annotated: replaced with a space, or simply ignored?

**A**: Annotate exactly what is in the image; characters not present in the dictionary are ignored during encoding.

#### Q3.1.40: What is the difference between the dygraph, release/2.0-rc1-0, and release/2.0 branches?

**A**: dygraph is the dynamic-graph branch; it tracks Paddle-develop (it currently also runs on Paddle 2.0), and new features land there.
release/2.0-rc1-0 is the stable branch based on Paddle 2.0rc1, and release/2.0 is the stable branch based on Paddle 2.0. If you want a stable release or stable code, use the release/2.0 branch; if you want the latest features as they land, use the dygraph branch.

#### Q3.1.41: Is the input of the StyleText fusion module the generated foreground image plus background feature weights?

**A**: The current version fuses the two images directly as input; no feature_map is used, and replacing the background image does not affect the result.

#### Q3.1.42: Training a recognition task on CPU fails with `The setting of Parameter-Server must has server_num or servers`.

**A**: This is caused by launching the training job the wrong way.

1. When training on CPU or a single GPU, launch directly with `python3 tools/train.py -c xxx.yml`.
2. When training on multiple GPUs, launch with `distributed.launch`, e.g. `python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c xxx.yml`; this requires the NCCL library and errors out without it.

#### Q3.1.43: When synthesizing data with StyleText, the text (TextInput) is much longer than the StyleInput. How should it be handled and synthesized?

**A**: When synthesizing data with StyleText, the StyleInput should be longer than the TextInput. There are 2 ways to handle the situation above:

1. Tile the StyleInput along the column direction until it exceeds the length of the TextInput.
2. Split the TextInput so that each segment is slightly shorter than the StyleInput, synthesize the segments separately, and stitch them together.

In practice, the second method works better for long-text synthesis, and it is also the synthesis logic StyleText ships with.

#### Q3.1.44: Text recognition training fails when the image height is set to something other than 32

**A**: For ctc decoding, the input must be a 1-D vector, so the feature-map height after downsampling should be 1. In ppocr the feature map is downsampled 32x, after which the height is exactly 1. Two solutions:
- Set the input shape height to 32 (recommended)
- Add more downsampling blocks to the mv3 backbone so that the output feature-map height is 1

#### Q3.1.45: Increasing batch_size does not noticeably speed up training

**A**: If a large batch_size does not bring a clear speedup, try increasing the initial memory allocation by setting an environment variable before running:
```
export FLAGS_initial_cpu_memory_in_mb=2000  # initialize roughly 2 GB of memory
```

#### Q3.1.46: On the dynamic-graph branches (dygraph, release/2.0), the trained model and the inference model give different results

**A**: The symptom: testing directly with the trained model gives good results, but after converting to an inference model the predictions differ. There are generally two causes:
1. Inconsistent preprocessing functions
2. Inconsistent post-processing parameters

The pre/post-processing parameters in the repo's config.yml and the default inference hyperparameters differ in places; check the pre/post-processing of trained-model prediction against inference prediction, see this [issue](https://github.com/PaddlePaddle/PaddleOCR/issues/2080).

#### Q3.1.47: The paddleocr package fails with FatalError: `Process abort signal` is detected by the operating system

**A**: First, set up the PaddleOCR runtime environment following the [installation docs](./installation.md); also check the Python version: the problem can appear on python3.6/3.8, so python3.7 is recommended, see this [issue](https://github.com/PaddlePaddle/PaddleOCR/issues/2069).

#### Q3.1.48: Files are missing after unzipping a downloaded recognition model; the expected inference.pdiparams, inference.pdmodel, etc. are absent

**A**: This can happen with GUI unzip tools; try extracting again, or use the command line: `tar xf `

#### Q3.1.49: I only want to recognize certain fragments of a receipt. To retrain, is training only the text detection model enough? Can text recognition and direction classification keep using the original models?

**A**: Yes. PaddleOCR's detection, recognition, and direction classification models are independent; in practice any one of them can be optimized or replaced.

#### Q3.1.50: Why does loading a downloaded pretrained model via checkpoints fail?

**A**: These are two different concepts:
- pretrained_model: a pretrained, i.e. fully trained, model. Only the model parameters are loaded, not the learning rate, optimizer, or training state. For finetuning, use pretrained_model.
- checkpoints: an intermediate training result, e.g. you trained to epoch 100 and want to continue. Loading attempts to restore everything, including the model parameters and previous state.

Here, pretrained_model should be used rather than checkpoints.

#### Q3.1.51: How do I use PaddleOCR to recognize text in videos?

**A**: PaddleOCR currently targets images. For video, extract frames first, then run PPOCR on them.

#### Q3.1.52: Camera images come in four channels. How should they be handled?

**A**: Two options:
- If nothing else needs the alpha channel, decode the data directly in 3-channel mode, e.g. with opencv use cv::imread(img_path, cv::IMREAD_COLOR).
- If other modules need the 4-channel image, convert it before feeding PaddleOCR, e.g. with cvCvtColor(&img, img3chan, CV_RGBA2RGB).
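The channel-dropping conversion can also be sketched with plain numpy (a hypothetical `drop_alpha` helper for the case where the alpha channel carries no needed information; with OpenCV you would use the cvtColor call mentioned above):

```python
import numpy as np


def drop_alpha(img):
    # Convert a 4-channel (e.g. BGRA) image to 3 channels before OCR by
    # discarding the alpha plane; leave 3-channel images untouched.
    if img.ndim == 3 and img.shape[2] == 4:
        return img[:, :, :3].copy()
    return img


bgra = np.zeros((4, 4, 4), dtype=np.uint8)
assert drop_alpha(bgra).shape == (4, 4, 3)
```

Note this simply discards transparency; if alpha is meaningful (e.g. text on transparent background), compositing onto a solid background first is safer.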

#### Q3.1.53: Prediction reports that the image is too large and GPU/host memory overflows. What should I do?

**A**: You can follow the changes in this PR to reduce memory usage: [#2230](https://github.com/PaddlePaddle/PaddleOCR/pull/2230)

#### Q3.1.54: For C++ deployment, are Paddle 2.0 models currently supported?

**A**: For running PPOCR 2.0 models on arm, refer to this PR: [#1877](https://github.com/PaddlePaddle/PaddleOCR/pull/1877)

#### Q3.1.55: Does PaddleOCR have a knowledge distillation demo yet?

**A**: We have not yet provided a PaddleOCR knowledge distillation demo. PaddleClas open-sourced a scheme that works quite well; see the [SSLD knowledge distillation scheme](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/docs/zh_CN/advanced_tutorials/distillation/distillation.md), paper: https://arxiv.org/abs/2103.05959. Distillation for PaddleOCR will be supported in the future.

#### Q3.1.56: How do I annotate slanted text in PPOCRLabel?

**A**: If rectangle annotation leaves too much blank slack, try PPOCRLabel's four-point annotation, which can annotate text at arbitrary slants.

#### Q3.1.57: The end-to-end algorithm PGNet offers two post-processing modes. What is the difference?

**A**: The difference is mainly inference speed. PostProcess in the config has fast/slow modes: slow post-processing is slower with relatively higher accuracy; fast is faster with accuracy still in an acceptable range. The fast post-processing mode is recommended.

#### Q3.1.58: Running eval with PGNet fails?

**A**: Note that the evaluation code was updated in release/2.1; two evaluation modes, A and B, are now supported:
* Mode A: mainly for user convenience; eval runs normally with annotation files in the same format as the training set. This is the default in the code.
* Mode B: mainly to align our evaluation with the official Total-Text evaluation; in this mode the official mat files are loaded directly for eval.

#### Q3.1.59: Using the pretrained model for prediction, certain specific characters are recognized poorly. How do I fix this?

**A**: Since the recognition models we provide are trained on large general-purpose datasets, some characters may be underrepresented in the training set. You can build a dataset for your specific scenario and finetune from our pretrained model. In the finetuning dataset, it is recommended that each character appear in at least 300 samples, while keeping the character counts balanced. See: [finetuning](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/recognition.md#2-%E5%90%AF%E5%8A%A8%E8%AE%AD%E7%BB%83).

#### Q3.1.60: Is there a Chinese pretrained model for PGNet?

**A**: We do not yet provide a Chinese pretrained model; if needed, you can train one yourself. Changes required:
1. In the [config file](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/configs/e2e/e2e_r50_vd_pg.yml#L23-L24), the dictionary path and language settings;
1. In the [network structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/ppocr/modeling/heads/e2e_pg_head.py#L181), change `out_channels` to the number of characters in the dictionary + 1 (to account for the space);
1. In the [loss](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/ppocr/losses/e2e_pg_loss.py#L93), change `37` to the number of characters in the dictionary + 1 (to account for the space);

#### Q3.1.61: Are there requirements on the text-box annotations in a PGNet training set?

**A**: PGNet supports multi-point annotation, e.g. 4, 8, or 14 points. Note that annotation points should be distributed as evenly as possible (uniform spacing between adjacent points), and in the label file the points must start from the top-left corner of the box and be listed clockwise; violating any of these can hurt training accuracy.
The PGNet pretrained model we provide, based on the Total-Text dataset, uses 14-point annotation.

#### Q3.1.62: Missed detections on curved text (e.g. slightly deformed document images)

**A**: In DB post-processing, the average text-box score is computed over the rectangle region, which easily causes curved text to be missed. Computing the average over the polygon region has been added; it is more accurate but somewhat slower, and can be chosen as needed; see the [visual comparison](https://github.com/PaddlePaddle/PaddleOCR/pull/2604) in the related PR. The behavior is selected via the [det_db_score_mode](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/tools/infer/utility.py#L51) parameter, with values [`fast` (default), `slow`]; `fast` is the original rectangle method, `slow` the polygon method. Thanks to user [buptlihang](https://github.com/buptlihang) for the [pr](https://github.com/PaddlePaddle/PaddleOCR/pull/2574) that helped solve this problem 🌹.

#### Q3.1.63: Does the end-to-end pgnet have an accuracy advantage over DB+CRNN? Or what scenarios is pgnet best at?

**A**: pgnet is an end-to-end algorithm: detection and recognition in one step, no need to train 2 separate models, and it supports curved-text recognition, but its performance on Chinese has not been fully validated. db+crnn is more thoroughly validated and relatively mature in applications, and handles ordinary non-curved text quite well.

#### Q3.1.64: What does the ratio_list parameter in the config yml do?

**A**: In dynamic-graph mode, ratio_list is used when there are multiple data sources; each value in ratio_list is the fraction of the corresponding source sampled per epoch. For example, with ratio_list=[0.3,0.2] and label_file_list=['data1','data2'], each epoch's training data contains 30% of the data in data1 and 20% of the data in data2; the values in ratio_list need not sum to 1. ratio_list and label_file_list must have the same length.

The static-graph detection data sampling logic differs from the dynamic-graph one, but this basically does not affect training accuracy.

In static-graph mode, when reading data with the detection dataloader, the per-epoch data volume is set first, say 1000; the values in ratio_list are then fractions of that 1000. For example, a ratio_list of [0.3, 0.7] with two sources means sampling 1000*0.3=300 images from the first source and 700 from the second per epoch. The ratio_list values again need not sum to 1.
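The dynamic-graph sampling rule can be sketched as below (illustrative only, assuming in-memory label lists; the real dataloader works on label files):

```python
import random


def sample_epoch(label_file_list, ratio_list, seed=0):
    # One epoch of training data in dygraph mode: from each source, draw
    # len(source) * ratio samples; per-source ratios need not sum to 1.
    rng = random.Random(seed)
    epoch = []
    for labels, ratio in zip(label_file_list, ratio_list):
        k = int(len(labels) * ratio)
        epoch.extend(rng.sample(labels, k))
    return epoch


data1 = ["data1/%d.jpg" % i for i in range(100)]
data2 = ["data2/%d.jpg" % i for i in range(50)]
epoch = sample_epoch([data1, data2], [0.3, 0.2])
assert len(epoch) == 30 + 10  # 100 * 0.3 from data1, 50 * 0.2 from data2
```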

#### Q3.1.65: When will android and ios demos supporting dynamic-graph models be released?

**A**: An android demo supporting dynamic-graph models has been merged into the dygraph branch; feel free to try it (https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/deploy/android_demo/README.md). An ios demo for dynamic-graph models is not yet available; you can adapt the static-graph version (https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/ios_demo) yourself.

#### Q3.1.66: For the iaa augmentations, is every training image augmented, or is it random? How do I add a new augmentation?

**A**: See this training config for iaa augmentation: https://github.com/PaddlePaddle/PaddleOCR/blob/0ccc1720c252beb277b9e522a1b228eb6abffb8a/configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml#L82,
where in { 'type': Fliplr, 'args': { 'p': 0.5 } } p is the probability. To add a new augmentation, follow this guide: https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.1/doc/doc_ch/add_new_algorithm.md#%E6%95%B0%E6%8D%AE%E5%8A%A0%E8%BD%BD%E5%92%8C%E5%A4%84%E7%90%86

#### Q3.1.67: Training PGNet on a Chinese curved-text dataset, curved text does not show up in the visualization.

**A**: This may be because cv2.putText in the installed OpenCV cannot render Chinese. You can use Pillow to draw the Chinese text instead by modifying the code in the draw_e2e_res function, for example:
```
box = box.astype(np.int32).reshape((-1, 1, 2))
cv2.polylines(src_im, [box], True, color=(255, 255, 0), thickness=2)

from PIL import ImageFont, ImageDraw, Image
img = Image.fromarray(cv2.cvtColor(src_im, cv2.COLOR_BGR2RGB))
draw = ImageDraw.Draw(img)
fontStyle = ImageFont.truetype(
    "font/msyh.ttc", 16, encoding="utf-8")
draw.text((int(box[0, 0, 0]), int(box[0, 0, 1])), text, (0, 255, 0), font=fontStyle)

src_im = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)
```
#### Q3.1.68: For end-to-end training with PGNet, must all annotations in the dataset have the same number of points, or can I use any number of points as long as they start from the top-left corner and go clockwise?

**A**: The current code requires a uniform number of annotation points.

#### Q3.1.69: How can the training process be sped up?

**A**: OCR training usually involves heavy data augmentation, which is time-consuming; you can therefore generate a large number of augmented images offline and feed them to the network directly. With sufficient machine resources, you can also use distributed training, see the [distributed training tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/distributed_training.md).

#### Q3.1.70: The text recognition model's output matrix must be decoded to obtain the recognized text. The code implements preds_idx = preds.argmax(axis=2), i.e. best-path decoding. This greedy algorithm takes the most probable character at each time step as that step's prediction, but the result is not necessarily optimal. Why not decode with beam search?

**A**: Experiments show that greedy decoding barely affects recognition accuracy while having a clear speed advantage, so PaddleOCR uses greedy decoding for recognition.
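Best-path (greedy) CTC decoding as described above can be sketched in a few lines; this is an illustrative stand-alone version, not PaddleOCR's actual post-processing code:

```python
import numpy as np


def ctc_greedy_decode(probs, blank=0):
    # Best-path decoding: argmax per time step, collapse consecutive
    # repeats, then drop the blank token.
    best_path = probs.argmax(axis=1)
    decoded = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != blank:
            decoded.append(int(idx))
        prev = idx
    return decoded


# T=5 time steps, 3 classes (class 0 = blank);
# the best path is [1, 1, 0, 2, 2], which decodes to [1, 2].
probs = np.array([
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
    [0.20, 0.10, 0.70],
])
assert ctc_greedy_decode(probs) == [1, 2]
```

Beam search would instead keep the k most probable prefixes at each step, summing probability over all alignments; the snippet shows why the greedy path is so much cheaper.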
|
||||
|
||||
#### Q3.1.71: 遇到中英文识别模型不支持的字符,该如何对模型做微调?
|
||||
|
||||
**A**:如果希望识别中英文识别模型中不支持的字符,需要更新识别的字典,并完成微调过程。比如说如果希望模型能够进一步识别罗马数字,可以按照以下步骤完成模型微调过程。
|
||||
1. 准备中英文识别数据以及罗马数字的识别数据,用于训练,同时保证罗马数字和中英文识别数字的效果;
|
||||
2. 修改默认的字典文件,在后面添加罗马数字的字符;
|
||||
3. 下载PaddleOCR提供的预训练模型,配置预训练模型和数据的路径,开始训练。
|
||||
|
||||
|
||||
#### Q3.1.72: 文字识别主要有CRNN和Attention两种方式,但是在我们的说明文档中,CRNN有对应的论文,但是Attention没看到,这个具体在哪里呢?
|
||||
|
||||
**A**:文字识别主要有CTC和Attention两种方式,基于CTC的算法有CRNN、Rosetta、StarNet,基于Attention的方法有RARE、其他的算法PaddleOCR里没有提供复现代码。论文的链接可以参考:[PaddleOCR文本识别算法教程文档](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/algorithm_overview.md#%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
|
||||
|
||||
|
||||
#### Q3.1.73: 如何使用TensorRT加速PaddleOCR预测?
|
||||
|
||||
**A**: 目前paddle的dygraph分支已经支持了python和C++ TensorRT预测的代码,python端inference预测时把参数[--use_tensorrt=True](https://github.com/PaddlePaddle/PaddleOCR/blob/3ec57e8df9263de6fa897e33d2d91bc5d0849ef3/tools/infer/utility.py#L37)即可,
|
||||
C++TensorRT预测需要使用支持TRT的预测库并在编译时打开[-DWITH_TENSORRT=ON](https://github.com/PaddlePaddle/PaddleOCR/blob/3ec57e8df9263de6fa897e33d2d91bc5d0849ef3/deploy/cpp_infer/tools/build.sh#L15)。
|
||||
如果想修改其他分支代码支持TensorRT预测,可以参考[PR](https://github.com/PaddlePaddle/PaddleOCR/pull/2921)。
|
||||
|
||||
注:建议使用TensorRT大于等于6.1.0.5以上的版本。
|
||||
|
||||
#### Q3.1.74: The ppocr detection results are poor. How can I optimize them?

**A**: It depends on the specific failure mode:

1. If detection is unusable in your scenario, the first choice is to finetune on your own data.
2. If the images are large and the text very dense, avoid over-compressing the image: try modifying the [resize logic](https://github.com/PaddlePaddle/PaddleOCR/blob/3ec57e8df9263de6fa897e33d2d91bc5d0849ef3/tools/infer/predict_det.py#L42) in the detection preprocessing so the image is not shrunk too aggressively.
3. If detection boxes hug the text too tightly or are too loose, tune [db_unclip_ratio](https://github.com/PaddlePaddle/PaddleOCR/blob/3ec57e8df9263de6fa897e33d2d91bc5d0849ef3/tools/infer/utility.py#L51): increasing it enlarges the boxes, decreasing it shrinks them.
4. If many text lines are missed, lower the DB post-processing threshold [det_db_box_thresh](https://github.com/PaddlePaddle/PaddleOCR/blob/3ec57e8df9263de6fa897e33d2d91bc5d0849ef3/tools/infer/utility.py#L50) so fewer boxes are filtered out, and try setting [det_db_score_mode](https://github.com/PaddlePaddle/PaddleOCR/blob/3ec57e8df9263de6fa897e33d2d91bc5d0849ef3/tools/infer/utility.py#L54) to 'slow'.
5. You can also set [use_dilation](https://github.com/PaddlePaddle/PaddleOCR/blob/3ec57e8df9263de6fa897e33d2d91bc5d0849ef3/tools/infer/utility.py#L53) to True, which dilates the feature map output by the detector and usually improves results.
#### Q3.1.75: The Paddle-Lite prediction library and the nb model versions do not match. How do I resolve this?

**A**: If prediction still works normally, you can ignore the warning. If the mismatch prevents prediction, build the prediction library and the opt tool from the same Paddle-Lite commit; see the [mobile deployment tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.1/deploy/lite/readme.md).
#### Q3.1.76: How do I handle the error 'SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.'?

**A**: This error means the dataloader raised an exception. If it occurs before training starts, check that your data and label format are correct. The ppocr label format is:
```
" image_file_name    json.dumps-encoded_annotation "
ch4_test_images/img_61.jpg	[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The two fields in the label file are separated by a tab character ("\t"), not four spaces.
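For clarity, such a line can be validated by splitting on the first tab and JSON-decoding the remainder; a minimal sketch (the helper name is illustrative):

```python
import json

def parse_label_line(line):
    """Parse one ppocr detection label line: '<image path>\\t<json annotations>'."""
    img_path, anno = line.rstrip("\n").split("\t", 1)
    return img_path, json.loads(anno)

sample = ('ch4_test_images/img_61.jpg\t'
          '[{"transcription": "MASA", "points": '
          '[[310, 104], [416, 141], [418, 216], [312, 179]]}]')
path, boxes = parse_label_line(sample)
print(path, boxes[0]["transcription"])  # ch4_test_images/img_61.jpg MASA
```

If this parse fails on any line of your label file, the dataloader will raise the exception above.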
If the error occurs during training, check whether the data contains abnormal samples, or whether shared memory is insufficient; you can debug with the test_reader in tools/train.py. On Linux, shared memory lives under /dev/shm; if it is running low, clean up that directory. When using Docker, pass `--shm-size=8G` (or larger) when creating the container to enlarge shared memory.
#### Q3.1.77: When using mkldnn to speed up inference I get 'Please compile with MKLDNN first to use MKLDNN'

**A**: The error means mkldnn is unavailable in the current environment. Check whether your CPU supports mkldnn (it cannot be used on Mac), and whether your prediction library was built with mkldnn support. A CPU prediction library with mkldnn enabled can be downloaded [here](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html#linux).
#### Q3.1.78: Does the online demo support Arabic?

**A**: The online demo currently supports only Chinese and English. For other languages, process them yourself with the whl package.
#### Q3.1.79: A class has few samples. Can increasing the number of training iterations or epochs, effectively duplicating the small-sample data, alleviate the problem?

**A**: Keep the classes as balanced as possible; for under-represented classes, add synthetic data instead. Experiments show that characters appearing infrequently in the training set are recognized poorly, and increasing the iteration count does not change the underlying lack of samples.
#### Q3.1.80: After recognizing the text on a resume, I want to pair related fields, e.g. "Name" with the actual name, and likewise for hometown, email, education, and so on. How should this be handled? Does PPOCR support it?

**A**: This kind of requirement is indeed common in enterprise applications, but it is usually highly customized and has no single uniform solution. Two common approaches:

1. For a single layout, or layouts with little variation, pair the recognized content using prior knowledge of the scene. For example, exploit form structure: in typical forms, the value immediately following the "Name" key is the name itself.
2. For diverse or free-form layouts, use NER techniques from NLP to assign key labels to fields in the recognized content.

Because such requirements are strongly tied to the business scenario and hard to cover with a single model, PPOCR does not currently support this. If you need NER, see the Paddle team's other open-source toolkit: https://github.com/PaddlePaddle/ERNIE; its pretrained ERNIE model can help improve NER accuracy.
<a name="数据集3"></a>

### Datasets

#### Q3.2.1: How do I prepare data in the format PaddleOCR supports?
#### Q3.2.18: How do I finetune with the dygraph version of PaddleOCR?

**A**: For finetuning, set Global.load_static_weights to false in the config file (add the field manually if it is missing), then point Global.pretrained_model to the pretrained model path.
#### Q3.2.19: How do I synthesize a handwritten Chinese dataset?

**A**: A handwriting dataset can be synthesized from a single-character handwriting dataset. Randomly sample a number of single-character images with their labels, resize them to a common random height, and concatenate them horizontally to obtain a synthetic text line. If a textured background is needed, threshold the white background of each character image to transparency before compositing onto a real background image. See the [handwritten datasets document](https://github.com/PaddlePaddle/PaddleOCR/blob/a72d6f23be9979e0c103d911a9dca3e4613e8ccf/doc/doc_ch/handwritten_datasets.md).
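The resize-and-concatenate step can be sketched with toy 2-D pixel arrays (a real pipeline would use PIL or OpenCV; all names here are illustrative):

```python
def resize_to_height(img, h):
    """Nearest-neighbour row resampling of a 2-D pixel list to height h."""
    src_h = len(img)
    return [img[min(int(r * src_h / h), src_h - 1)] for r in range(h)]

def hconcat(imgs, h=32):
    """Resize each single-character image to height h, then join left to right."""
    rows = [[] for _ in range(h)]
    for img in imgs:
        for r, row in enumerate(resize_to_height(img, h)):
            rows[r].extend(row)
    return rows
```

The matching label is simply the concatenation of the sampled single-character labels.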
<a name="模型训练调优3"></a>

### Model training and tuning
**A**: The 1.1 and 2.0 models are the same. When finetuning, vertically arranged text must be rotated 90° counter-clockwise before being added to training, and upside-down text must be rotated to horizontal.
#### Q3.3.30: How do I obtain the best_accuracy model during training?

**A**: The eval_batch_step field in the config file controls how many iterations pass between evaluations; after each evaluation the best_accuracy model is updated automatically. To obtain a best_accuracy model sooner, reduce eval_batch_step, for example to [10, 10], which means evaluation starts at iteration 10 and then runs every 10 iterations.
#### Q3.3.31: How is the Cosine learning-rate schedule updated? Why does the value seem to stay constant for a long time during training?

**A**: See the [CosineAnnealingDecay documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/CosineAnnealingDecay_cn.html#cosineannealingdecay) for details.

In PaddleOCR, to make the learning-rate curve smoother, the schedule is stepped per iteration rather than per epoch. The update therefore depends on the total number of iterations: when that number is large, many iterations must pass before a visible change in the learning rate.
#### Q3.3.32: What happened to the old CosineWarmup method?

**A**: The code structure was reorganized; the current Cosine schedule covers the old CosineWarmup functionality through an extra config field. For example, the following sets a 2-epoch warmup:

```
lr:
  name: Cosine
  learning_rate: 0.001
  warmup_epoch: 2
```
#### Q3.3.33: Why add warmup to the learning rate when training recognition and detection models?

**A**: Warmup ramps the learning rate up from a small value instead of starting directly at the full rate, which helps the model converge stably. In OCR detection and recognition it typically brings an accuracy gain of about 0.5%.
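A sketch of warmup followed by cosine decay (the step counts and rates below are illustrative, not PaddleOCR defaults):

```python
import math

def lr_at(step, base_lr=0.001, warmup_steps=100, total_steps=1000):
    """Linear warmup from 0 to base_lr, then cosine decay toward 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * t))
```

Plotting `lr_at` over the step range shows the ramp-then-decay shape that warmup_epoch adds to the Cosine schedule.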
#### Q3.3.34: In table recognition, how do I improve single-character recognition?

**A**: First confirm that the detection model actually detects the individual characters; if it does not, add corresponding single-character data to the training set.
#### Q3.3.35: SRN training does not converge (loss does not drop) or accuracy stays at 0.

**A**: If the loss behaves abnormally, first confirm that image_shape in the yml file was not changed from the default [1, 64, 256]; the code hard-codes this configuration, and changing it may prevent convergence. If the parameters are correct and the loss is dropping normally, train for longer and observe: accuracy starting at 0 is normal.
#### Q3.3.36: When training the StarNet network, can seal (stamp) data be trained together with non-curved data?

**A**: Yes. The TPS module in StarNet rectifies the seal images so that they resemble non-curved images.
#### Q3.3.37: The training program unexpectedly exits or hangs during training. How do I fix this?

**A**: Check whether memory (and GPU memory, if training on GPU) is insufficient; if so, reduce the training and evaluation batch size in the config file. Note that when the batch size is reduced, the learning rate should generally be reduced in proportion.
#### Q3.3.38: After the training program starts, no training log appears before it ends?

**A**: Consider three things:

1. Check whether the training process exited abnormally, whether GPU memory was released, and whether orphan processes remain. If the program is truly stuck, check the environment configuration; for environment issues Docker is recommended, see the [installation guide](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/installation.md).
2. Check whether the dataset is too small. You can reduce the batch size to increase the number of training steps per epoch, or set print_batch_step to 1 in the training config to log every step.
3. If training on a private dataset, first train on a dataset provided/recommended by PaddleOCR to rule out problems in the private data.
#### Q3.3.39: What does the num_workers parameter in the config file mean, and how should it be set?

**A**: Reading training data requires disk IO, which is far slower than GPU computation; to keep data loading from becoming the training bottleneck, data can be read with multiple processes. num_workers is the number of data-loading processes; 0 disables multiprocess loading. On Linux, multiprocess data loading communicates through shared memory, so at least 2GB (ideally 8GB) of shared memory is recommended, in which case num_workers can be set to the number of CPU cores. On low-end hardware, or if training hangs or the dataloader errors out, set num_workers to 0.
<a name="预测部署3"></a>

### Inference and deployment
**A**: This cannot be set in the Android APK; the interface is not exposed. If you are using the demo in PaddleOCR/deploy/lite/, you can modify the corresponding parameters in config.txt.
#### Q3.4.9: Can PaddleOCR models be converted to ONNX models?

**A**: ONNX conversion is not supported yet; related work is in development.
#### Q3.4.10: Using the opt tool to convert the detection model fails with "can not found op arguments for node conv2_b_attr"

**A**: This is most likely because the opt tool was built from a Paddle-Lite branch other than develop. Build the opt tool from the Paddle-Lite develop branch.
**A**: When running inference with the EAST or SAST model, specify --det_algorithm="EAST" or --det_algorithm="SAST" on the command line. DB needs no flag because it is the default value of that parameter: https://github.com/PaddlePaddle/PaddleOCR/blob/e7a708e9fdaf413ed7a14da8e4a7b4ac0b211e42/tools/infer/utility.py#L43
#### Q3.4.25: PaddleOCR Python and C++ inference results differ?

**A**: Normally, Python and C++ inference produce the same text. If the results differ significantly, first determine whether the diff comes from the detection model or the recognition model, and check whether other models show a similar problem. Next, compare the data pre- and post-processing between the Python and C++ code, and try updating the PaddleOCR code and re-running. If updating the code does not resolve it, raise your question in the PaddleOCR WeChat group or in a GitHub issue.
#### Q3.4.34: Can a model trained with 2.0 be deployed on version 1.1?

**A**: This is not recommended; for models trained with 2.0, use the deployment code provided in the dygraph branch.
#### Q3.4.35: How do I fix PaddleOCR inference getting slower and slower on a T4 GPU?

**A**:
1. The T4 GPU has no active cooling, so when benchmarking you need to sleep 30ms after each inference; otherwise the card throttles due to overheating (inference slows down), and excessive temperature can even crash the machine.
2. A T4 may also downclock when idle, so lock the clocks when benchmarking with these two commands:
```
nvidia-smi -i 0 -pm ENABLED
nvidia-smi --lock-gpu-clocks=1590 -i 0
```
#### Q3.4.36: Some DB boxes hug the text so tightly that corners of characters are cut off, hurting recognition. How can this be mitigated?

**A**: Increase the post-processing parameter unclip_ratio slightly.
#### Q3.4.37: When deploying C++ inference on Windows, it keeps complaining that `paddle_fluid.dll` and `opencv_world346.dll` cannot be found.

**A**: There are two ways to solve this:

1. Add the paths of the Paddle prediction library and the OpenCV library to the system environment variables.
2. Copy the missing dll files into the folder containing the built `ocr_system.exe`.
#### Q3.4.38: I want to deploy on Mac. Where do I download the prediction library?

**A**: The Paddle prediction library for Mac can be downloaded here: [https://paddle-inference-lib.bj.bcebos.com/mac/2.0.0/cpu_avx_openblas/paddle_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/mac/2.0.0/cpu_avx_openblas/paddle_inference.tgz)
#### Q3.4.39: How do I deploy as a service in an intranet environment?

**A**: You can still use PaddleServing or HubServing for service deployment, as long as the intranet address is reachable.
#### Q3.4.40: Deployment with hub_serving shows high latency. What are possible causes?

**A**: First, the first image tested always has higher latency; test several images and look at the speed of the later ones. Second, if you deploy a server-side model (e.g. with a ResNet34 backbone) on CPU, it will be slow; on CPU, deploy a mobile model (e.g. with a MobileNetV3 backbone) instead.
#### Q3.4.41: Does PaddleOCR support TensorRT inference?

**A**: Yes. When building, change `option(WITH_TENSORRT "Compile demo with TensorRT." OFF)` in CMakeLists.txt from OFF to ON. For more server-side deployment settings, see the [Paddle Inference documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/native_infer.html).
#### Q3.4.42: When deploying with PaddleLite, inference hangs or the phone freezes after starting?

**A**: Check that the PaddleLite version used for model conversion matches the prediction library version; e.g. if PaddleLite is 2.8, the prediction library must also be 2.8.
#### Q3.4.43: GPU memory explosion or memory leak during inference?

**A**: Enable the memory optimization switch `enable_memory_optim`; the relevant code has been merged, [see details](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/tools/infer/utility.py#L153).
#### Q3.4.44: How do I run inference with multiple processes?

**A**: PaddleOCR recently added [multi-process inference parameters](https://github.com/PaddlePaddle/PaddleOCR/blob/a312647be716776c1aac33ff939ae358a39e8188/tools/infer/utility.py#L103): `use_mp` toggles multiprocessing and `total_process_num` sets the number of processes. See the [documentation](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/inference.md#1-%E8%B6%85%E8%BD%BB%E9%87%8F%E4%B8%AD%E6%96%87ocr%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86) for usage.
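As an illustration of how an image list can be split across worker processes (the helper names below are hypothetical, not PaddleOCR APIs), a minimal sketch:

```python
from multiprocessing import Pool

def fake_predict(path):
    # stand-in for running detection + recognition on one image
    return f"result:{path}"

def chunks(lst, n):
    """Assign items round-robin to n workers: chunks(lst, n)[i] is worker i's share."""
    return [lst[i::n] for i in range(n)]

if __name__ == "__main__":
    images = [f"img_{i}.jpg" for i in range(8)]
    with Pool(processes=2) as pool:  # cf. total_process_num=2
        results = pool.map(fake_predict, images)
    print(len(results))  # 8
```

With `use_mp=True`, each PaddleOCR process similarly handles its own share of the input images.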
#### Q3.4.45: How to fix garbled Chinese recognition output in C++ deployment on Windows

**A**: The Windows console encoding is not utf-8, while ppocr_keys_v1.txt is utf-8 encoded. Converting ppocr_keys_v1.txt from utf-8 to the ANSI encoding fixes the display.
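On a Simplified-Chinese Windows system, "ANSI" corresponds to the GBK codepage (an assumption to adjust for other locales); a minimal conversion sketch with placeholder file names:

```python
def convert_encoding(src, dst, src_enc="utf-8", dst_enc="gbk"):
    """Re-encode a text file, e.g. the recognition dictionary, for Windows consoles."""
    with open(src, encoding=src_enc) as f:
        text = f.read()
    with open(dst, "w", encoding=dst_enc) as f:
        f.write(text)
```

Back up the original utf-8 dictionary before overwriting it, since training tools expect utf-8.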
#### Q3.4.46: On Windows with an RTX 3060, loading the model in GPU mode is slow.

**A**: 30-series GPUs require CUDA 11.
#### Q3.4.47: How can the detection stage be sped up?

**A**: Predicting a single image is slower because of one-time initialization: when predicting a batch, the first image is slow and subsequent ones are fast. With service deployment, initialization happens only on the first request; with plain inference scripts, it happens on every run.
#### Q3.4.48: Paddle Serving fails when started and called locally. How do I tell whether it is working?

**A**: If no prediction result is printed, the startup failed. Reconfigure dygraph Paddle Serving following this document: https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/deploy/pdserving/README_CN.md
#### Q3.4.49: The same model gives different results under C++ and Python deployment. How do I localize the problem?

**A**: Some debugging tips:

1. First check that the threshold parameters are identical on both sides;
2. Check whether the pre- and post-processing in the C++ and Python code match;
3. On the Python side, save binary dumps of the model input and output to isolate any differences in the inference step itself.
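A minimal way to dump tensors as raw bytes on the Python side (helper names are illustrative; real tensors would first be flattened to a list of floats), so the C++ side can read the exact same bytes for comparison:

```python
import struct

def save_f32(path, values):
    """Write float32 values as little-endian raw bytes (layout C++ can fread)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))

def load_f32(path, n):
    """Read n little-endian float32 values back from a raw binary file."""
    with open(path, "rb") as f:
        return list(struct.unpack(f"<{n}f", f.read()))
```

Dumping right before and right after the inference call pinpoints whether the diff originates in preprocessing, the engine, or postprocessing.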
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))[12]
|
||||
- [x] SRN([paper](https://arxiv.org/abs/2003.12294))[5]
|
||||
- [x] NRTR([paper](https://arxiv.org/abs/1806.00926v2))
|
||||
- [x] SAR([paper](https://arxiv.org/abs/1811.00751v2))
|
||||
|
||||
参考[DTRB][3](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:
|
||||
|
||||
|
@ -60,6 +61,6 @@ PaddleOCR基于动态图开源的文本识别算法列表:
|
|||
|RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)|
|
||||
|SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn | [下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar) |
|
||||
|NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
|
||||
|
||||
|SAR|Resnet31| 87.2% | rec_r31_sar | [下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
|
||||
|
||||
PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。
|
||||
|
|
|
@ -19,15 +19,16 @@
|
|||
|
||||
<a name="11-----"></a>
## 1.1 数据准备

icdar2015 TextLocalization数据集是文本检测的数据集,包含1000张训练图像和500张测试图像。
icdar2015数据集可以从[官网](https://rrc.cvc.uab.es/?ch=4&com=downloads)下载到,首次下载需注册。
注册完成登陆后,下载下图中红色框标出的部分,其中, `Training Set Images`下载的内容保存在`icdar_c4_train_imgs`文件夹下,`Test Set Images` 下载的内容保存在`ch4_test_images`文件夹下。

<p align="center">
<img src="../datasets/ic15_location_download.png" align="middle" width = "700"/>
</p>

将下载到的数据集解压到工作目录下,假设解压在 PaddleOCR/train_data/下。另外,PaddleOCR将零散的标注文件整理成单独的标注文件,您可以通过wget的方式进行下载。
```shell
# 在PaddleOCR路径下
# 运行环境准备
Windows和Mac用户推荐使用Anaconda搭建Python环境,Linux用户建议使用Docker搭建Python环境。

对Python环境熟悉的用户可以直接跳到第2步安装PaddlePaddle。
* [1. Python环境搭建](#1)
|
||||
+ [1.1 Windows](#1.1)
|
||||
|
@ -63,9 +66,9 @@
|
|||
```
|
||||
|
||||
<img src="../install/windows/conda_list_env.png" alt="create environment" width="600" align="center"/>
以上Anaconda环境和Python环境安装完毕。
|
||||
|
||||
|
@ -80,9 +83,9 @@
|
|||
- 安装完Anaconda后,可以安装python环境,以及numpy等所需的工具包环境
|
||||
- Anaconda下载:
|
||||
- 地址:https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D
|
||||
|
||||
|
||||
<img src="../install/mac/anaconda_start.png" alt="anaconda download" width="800" align="center"/>
|
||||
|
||||
|
||||
- 选择最下方的`Anaconda3-2021.05-MacOSX-x86_64.pkg`下载
|
||||
- 下载完成后,双击.pkg文件进入图形界面
|
||||
- 按默认设置即可,安装需要花费一段时间
|
||||
|
@ -177,7 +180,7 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker
|
|||
- 说明:使用paddlepaddle需要先安装python环境,这里我们选择python集成环境Anaconda工具包
|
||||
- Anaconda是一个常用的Python包管理程序
|
||||
- 安装完Anaconda后,可以安装python环境,以及numpy等所需的工具包环境
|
||||
|
||||
|
||||
- **下载Anaconda**:
|
||||
|
||||
- 下载地址:https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D
|
||||
|
@ -185,22 +188,22 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker
|
|||
|
||||
- 选择适合您操作系统的版本
|
||||
- 可在终端输入`uname -m`查询系统所用的指令集
|
||||
|
||||
|
||||
- 下载法1:本地下载,再将安装包传到linux服务器上
|
||||
|
||||
|
||||
- 下载法2:直接使用linux命令行下载
|
||||
|
||||
|
||||
```shell
|
||||
# 首先安装wget
|
||||
sudo apt-get install wget # Ubuntu
|
||||
sudo yum install wget # CentOS
|
||||
```
|
||||
|
||||
|
||||
```shell
|
||||
# 然后使用wget从清华源上下载
|
||||
# 如要下载Anaconda3-2021.05-Linux-x86_64.sh,则下载命令如下:
|
||||
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh
|
||||
|
||||
|
||||
# 若您要下载其他版本,需要将最后1个/后的文件名改成您希望下载的版本
|
||||
```
|
||||
|
||||
|
@ -210,7 +213,7 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker
|
|||
- 若您下载的是其它版本,则将该命令的文件名替换为您下载的文件名
|
||||
- 按照安装提示安装即可
|
||||
- 查看许可时可输入q来退出
|
||||
|
||||
|
||||
- **将conda加入环境变量**
|
||||
|
||||
- 加入环境变量是为了让系统能识别conda命令,若您在安装时已将conda加入环境变量path,则可跳过本步
|
||||
|
@ -277,13 +280,13 @@ Linux用户可选择Anaconda或Docker两种方式运行。如果你熟悉Docker
|
|||
# 激活paddle_env环境
|
||||
conda activate paddle_env
|
||||
```
|
||||
|
||||
|
||||
|
||||
以上anaconda环境和python环境安装完毕
|
||||
|
||||
#### 1.3.2 Docker环境配置
|
||||
|
||||
**注意:第一次使用这个镜像,会自动下载该镜像,请耐心等待。**
|
||||
**注意:第一次使用这个镜像,会自动下载该镜像,请耐心等待。您也可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。**
|
||||
|
||||
```bash
|
||||
# 切换到工作目录下
|
||||
|
@ -297,8 +300,6 @@ sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/pad
|
|||
如果使用CUDA10,请运行以下命令创建容器,设置docker容器共享内存shm-size为64G,建议设置32G以上
|
||||
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
|
||||
|
||||
您也可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。
|
||||
|
||||
# ctrl+P+Q可退出docker 容器,重新进入docker 容器使用如下命令
|
||||
sudo docker container exec -it ppocr /bin/bash
|
||||
```
|
||||
|
|
|
@ -39,7 +39,7 @@ PaddleOCR中集成了知识蒸馏的算法,具体地,有以下几个主要
|
|||
|
||||
### 2.1 识别配置文件解析
|
||||
|
||||
配置文件在[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)。
|
||||
配置文件在[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)。
|
||||
|
||||
#### 2.1.1 模型结构
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
# PP-OCR模型与配置文件
|
||||
PP-OCR模型与配置文件一章主要补充一些OCR模型的基本概念、配置文件的内容与作用以便对模型后续的参数调整和训练中拥有更好的体验。
|
||||
|
||||
本节包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference.md)是对第一节PP-OCR模型库使用方法的介绍,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。
|
||||
本章包含三个部分,首先在[PP-OCR模型下载](./models_list.md)中解释PP-OCR模型的类型概念,并提供所有模型的下载链接。然后在[配置文件内容与生成](./config.md)中详细说明调整PP-OCR模型所需的参数。最后的[模型库快速使用](./inference_ppocr.md)是对第一节PP-OCR模型库使用方法的介绍,可以通过Python推理引擎快速利用丰富的模型库模型获得测试结果。
|
||||
|
||||
------
|
||||
|
||||
|
|
|
## OCR模型列表(V2.1,2021年9月6日更新)

> **说明**
> 1. 2.1版模型相比2.0版模型,在模型精度上做了提升。
> 2. 2.0版模型和[1.1版模型](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md) 的主要区别在于动态图训练vs.静态图训练,模型性能上无明显差距。
> 3. 本文档提供的是PPOCR自研模型列表,更多基于公开数据集的算法介绍与预训练模型可以参考:[算法概览文档](./algorithm_overview.md)。
- [一、文本检测模型](#文本检测模型)
|
||||
|
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
| --- | --- | --- | --- | --- |
|ch_PP-OCRv2_det_slim|slim量化+蒸馏版超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
|ch_PP-OCRv2_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)|
|ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| 2.6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)|
|ch_ppocr_mobile_v2.0_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|
|ch_ppocr_server_v2.0_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)|
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
| --- | --- | --- | --- | --- |
|ch_PP-OCRv2_rec_slim|slim量化版超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|ch_PP-OCRv2_rec|原始超轻量模型,支持中英文、数字识别|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
|ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) |
|ch_ppocr_mobile_v2.0_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
|ch_ppocr_server_v2.0_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
|ch_ppocr_mobile_slim_v2.0_cls|slim量化版模型,对检测到的文本行文字角度分类|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| 2.1M |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) |
|ch_ppocr_mobile_v2.0_cls|原始分类器模型,对检测到的文本行文字角度分类|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |

<a name="Paddle-Lite模型"></a>
### 四、Paddle-Lite 模型

|模型版本|模型简介|模型大小|检测模型|文本方向分类模型|识别模型|Paddle-Lite版本|
|---|---|---|---|---|---|---|
|PP-OCRv2|蒸馏版超轻量中文OCR移动端模型|11M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9|
|PP-OCRv2(slim)|蒸馏版超轻量中文OCR移动端模型|4.9M|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9|
|V2.0|ppocr_v2.0超轻量中文OCR移动端模型|7.8M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9|
|V2.0(slim)|ppocr_v2.0超轻量中文OCR移动端模型|3.3M|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9|
@ -2,7 +2,7 @@
|
|||
|
||||
|
||||
- [PaddleOCR快速开始](#paddleocr)
|
||||
|
||||
|
||||
+ [1. 安装PaddleOCR whl包](#1)
|
||||
* [2. 便捷使用](#2)
|
||||
+ [2.1 命令行使用](#21)
|
||||
|
@ -90,7 +90,7 @@ cd /path/to/ppocr_img
|
|||
```
|
||||
|
||||
|
||||
如需使用2.0模型,请指定参数`--version 2.0`,paddleocr默认使用2.1模型。更多whl包使用可参考[whl包文档](./whl.md)
|
||||
|
||||
|
||||
<a name="212"></a>
|
||||
|
@ -166,7 +166,7 @@ paddleocr --image_dir=./table/1.png --type=structure
|
|||
|
||||
```
|
||||
/output/table/1/
|
||||
└─ res.txt
|
||||
└─ [454, 360, 824, 658].xlsx 表格识别结果
|
||||
└─ [16, 2, 828, 305].jpg 被裁剪出的图片区域
|
||||
└─ [17, 361, 404, 711].xlsx 表格识别结果
|
||||
|
@ -183,7 +183,7 @@ paddleocr --image_dir=./table/1.png --type=structure
|
|||
|
||||
大部分参数和paddleocr whl包保持一致,见 [whl包文档](./whl.md)
|
||||
|
||||
|
||||
|
||||
|
||||
<a name="22"></a>
|
||||
### 2.2 Python脚本使用
|
||||
|
@ -232,6 +232,7 @@ im_show.save('result.jpg')
|
|||
<img src="../imgs_results/whl/11_det_rec.jpg" width="800">
|
||||
</div>
|
||||
<a name="222"></a>
|
||||
|
||||
#### 2.2.2 版面分析
|
||||
|
||||
```python
|
||||
|
|
|
@ -7,15 +7,13 @@
|
|||
- [1.2 数据下载](#数据下载)
|
||||
- [1.3 字典](#字典)
|
||||
- [1.4 支持空格](#支持空格)
|
||||
|
||||
- [2 启动训练](#启动训练)
|
||||
- [2.1 数据增强](#数据增强)
|
||||
- [2.2 通用模型训练](#通用模型训练)
|
||||
- [2.3 多语言模型训练](#多语言模型训练)
|
||||
|
||||
- [3 评估](#评估)
|
||||
|
||||
- [4 预测](#预测)
|
||||
- [5 转Inference模型测试](#Inference)
|
||||
|
||||
|
||||
<a name="数据准备"></a>
|
||||
|
@ -88,7 +86,10 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
|
|||
|
||||
若您本地没有数据集,可以在官网下载 [ICDAR2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据,用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ,下载 benchmark 所需的lmdb格式数据集。
|
||||
|
||||
如果希望复现SAR的论文指标,需要下载[SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), 提取码:627x。此外,真实数据集icdar2013, icdar2015, cocotext, IIIT5K也作为训练数据的一部分。具体数据细节可以参考论文SAR。
|
||||
|
||||
如果你使用的是icdar2015的公开数据集,PaddleOCR 提供了一份用于训练 ICDAR2015 数据集的标签文件,通过以下方式下载:
|
||||
|
||||
```
|
||||
# 训练集标签
|
||||
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
|
||||
|
@ -232,6 +233,7 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
|
|||
| rec_r34_vd_tps_bilstm_att.yml | CRNN | Resnet34_vd | TPS | BiLSTM | att |
|
||||
| rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn |
|
||||
| rec_mtb_nrtr.yml | NRTR | nrtr_mtb | None | transformer encoder | transformer decoder |
|
||||
| rec_r31_sar.yml | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
|
||||
|
||||
训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件:
|
||||
|
||||
|
@ -424,3 +426,39 @@ python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v
|
|||
infer_img: doc/imgs_words/ch/word_1.jpg
|
||||
result: ('韩国小馆', 0.997218)
|
||||
```
|
||||
|
||||
<a name="Inference"></a>
|
||||
|
||||
## 5. 转Inference模型测试
|
||||
|
||||
识别模型转inference模型与检测的方式相同,如下:
|
||||
|
||||
```
|
||||
# -c 后面设置训练算法的yml配置文件
|
||||
# -o 配置可选参数
|
||||
# Global.pretrained_model 参数设置待转换的训练模型地址,不用添加文件后缀 .pdmodel,.pdopt或.pdparams。
|
||||
# Global.save_inference_dir参数设置转换的模型将保存的地址。
|
||||
|
||||
python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_rec_train/best_accuracy Global.save_inference_dir=./inference/rec_crnn/
|
||||
```
|
||||
|
||||
**注意:**如果您是在自己的数据集上训练的模型,并且调整了中文字符的字典文件,请注意修改配置文件中的`character_dict_path`是否是所需要的字典文件。
|
||||
|
||||
转换成功后,在目录下有三个文件:
|
||||
|
||||
```
|
||||
/inference/rec_crnn/
|
||||
├── inference.pdiparams # 识别inference模型的参数文件
|
||||
├── inference.pdiparams.info # 识别inference模型的参数信息,可忽略
|
||||
└── inference.pdmodel # 识别inference模型的program文件
|
||||
```
|
||||
|
||||
- 自定义模型推理
|
||||
|
||||
如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径,并且设置 `rec_char_type=ch`
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path"
|
||||
```
|
||||
|
||||
|
||||
|
|
|
@ -1,4 +1,8 @@
|
|||
# 更新
|
||||
- 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
|
||||
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
|
||||
- 2021.6.29 [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md)新增5个高频问题,总数248个,每周一都会更新,欢迎大家持续关注。
|
||||
- 2021.4.8 release 2.1版本,新增AAAI 2021论文[端到端识别算法PGNet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/pgnet.md)开源,[多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/multi_languages.md)支持种类增加到80+。
|
||||
- 2020.12.15 更新数据合成工具[Style-Text](../../StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。
|
||||
- 2020.12.07 [FAQ](../../doc/doc_ch/FAQ.md)新增5个高频问题,总数124个,并且计划以后每周一都会更新,欢迎大家持续关注。
|
||||
- 2020.11.25 更新半自动标注工具[PPOCRLabel](../../PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。
|
||||
|
|
|
@ -1,7 +1,13 @@
|
|||
# 效果展示
|
||||
|
||||
<a name="超轻量PP-OCRv2效果展示"></a>
|
||||
## 超轻量PP-OCRv2效果展示
|
||||
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic001.jpg" width="800">
|
||||
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic002.jpg" width="800">
|
||||
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg" width="800">
|
||||
|
||||
<a name="超轻量ppocr_server_2.0效果展示"></a>
## 通用PP-OCR server 效果展示
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
|
||||
|
@ -10,8 +16,6 @@
|
|||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00059985.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00111002.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg" width="800">
|
||||
</div>
|
||||
|
|
|
@ -47,6 +47,7 @@ PaddleOCR open-source text recognition algorithms list:
|
|||
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))[12]
|
||||
- [x] SRN([paper](https://arxiv.org/abs/2003.12294))[5]
|
||||
- [x] NRTR([paper](https://arxiv.org/abs/1806.00926v2))
|
||||
- [x] SAR([paper](https://arxiv.org/abs/1811.00751v2))
|
||||
|
||||
Refer to [DTRB](https://arxiv.org/abs/1904.01906) for the training and evaluation protocol: the text recognition algorithms above are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
|
||||
|
||||
|
@ -62,5 +63,6 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
|
|||
|RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)|
|
||||
|SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)|
|
||||
|NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
|
||||
|SAR|Resnet31| 87.2% | rec_r31_sar | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
|
||||
|
||||
For the training guide and use of PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./recognition_en.md).
|
||||
|
|
|
@ -18,13 +18,14 @@
|
|||
This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
|
||||
|
||||
## 1.1 DATA PREPARATION
|
||||
The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.
|
||||
|
||||
The icdar2015 dataset contains a training set of 1,000 images and a test set of 500 images, all captured with wearable cameras. The dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads); registration is required for downloading.
|
||||
|
||||
|
||||
After registering and logging in, download the parts marked in the red boxes in the figure below. Save the content downloaded via `Training Set Images` as the folder `icdar_c4_train_imgs`, and the content downloaded via `Test Set Images` as the folder `ch4_test_images`.
|
||||
|
||||
<p align="center">
|
||||
<img src="./doc/datasets/ic15_location_download.png" align="middle" width = "600"/>
|
||||
<img src="../datasets/ic15_location_download.png" align="middle" width = "700"/>
|
||||
</p>
|
||||
|
||||
Decompress the downloaded dataset into the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded with wget:
|
||||
|
|
|
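Each line of these detection label files pairs an image path with a JSON list of annotated boxes, separated by a tab. A minimal parsing sketch is shown below; the sample line is illustrative rather than copied from the actual files:

```
import json

def parse_det_label_line(line):
    # One line of a PaddleOCR detection label file:
    # <image path>\t<JSON list of {"transcription": ..., "points": ...}>
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(annotation)
    return image_path, boxes

# Illustrative sample line in the detection label format.
sample = ('icdar_c4_train_imgs/img_1.jpg\t'
          '[{"transcription": "Genaxis Theatre", '
          '"points": [[377, 117], [463, 117], [465, 130], [378, 130]]}]')
path, boxes = parse_det_label_line(sample)
print(path)                       # icdar_c4_train_imgs/img_1.jpg
print(boxes[0]["transcription"])  # Genaxis Theatre
```

The `points` list gives the four corners of each text box in image coordinates, and `transcription` holds the ground-truth text.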
@ -1,22 +1,22 @@
|
|||
|
||||
# Reasoning based on Python prediction engine
|
||||
# Python Inference for PP-OCR Model Library
|
||||
|
||||
This article introduces how to use the Python inference engine for the PP-OCR model library. The content covers, in order, text detection, text recognition, the direction classifier, and the concatenated prediction of the three in series on CPU and GPU.
|
||||
|
||||
|
||||
- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE)
|
||||
- [Text Detection Model Inference](#DETECTION_MODEL_INFERENCE)
|
||||
|
||||
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
|
||||
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
|
||||
- [2. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE)
|
||||
- [Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE)
|
||||
- [1. Lightweight Chinese Recognition Model Inference](#LIGHTWEIGHT_RECOGNITION)
|
||||
- [2. Multilingual Model Inference](#MULTILINGUAL_MODEL_INFERENCE)
|
||||
|
||||
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
||||
- [Angle Classification Model Inference](#ANGLE_CLASS_MODEL_INFERENCE)
|
||||
|
||||
- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
|
||||
- [Text Detection Angle Classification and Recognition Inference Concatenation](#CONCATENATION)
|
||||
|
||||
<a name="DETECTION_MODEL_INFERENCE"></a>
|
||||
|
||||
## TEXT DETECTION MODEL INFERENCE
|
||||
## Text Detection Model Inference
|
||||
|
||||
The default configuration is based on the inference setting of the DB text detection model. For lightweight Chinese detection model inference, you can execute the following commands:
|
||||
|
||||
|
@ -52,11 +52,11 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_di
|
|||
|
||||
<a name="RECOGNITION_MODEL_INFERENCE"></a>
|
||||
|
||||
## TEXT RECOGNITION MODEL INFERENCE
|
||||
## Text Recognition Model Inference
|
||||
|
||||
|
||||
<a name="LIGHTWEIGHT_RECOGNITION"></a>
|
||||
### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL REFERENCE
|
||||
### 1. Lightweight Chinese Recognition Model Inference
|
||||
|
||||
For lightweight Chinese recognition model inference, you can execute the following commands:
|
||||
|
||||
|
@ -77,7 +77,7 @@ Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658)
|
|||
|
||||
<a name="MULTILINGUAL_MODEL_INFERENCE"></a>
|
||||
|
||||
### 2. MULTILINGAUL MODEL INFERENCE
|
||||
### 2. Multilingual Model Inference
|
||||
If you need to predict with other language models, you need to specify the dictionary path used via `--rec_char_dict_path` when running inference. At the same time, in order to get the correct visualization results,
|
||||
you need to specify the visualization font path through `--vis_font_path`. Fonts for several languages are provided by default under the `doc/fonts` path. For example, for Korean recognition:
|
||||
|
||||
|
@ -94,7 +94,7 @@ Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904)
|
|||
|
||||
<a name="ANGLE_CLASS_MODEL_INFERENCE"></a>
|
||||
|
||||
## ANGLE CLASSIFICATION MODEL INFERENCE
|
||||
## Angle Classification Model Inference
|
||||
|
||||
For angle classification model inference, you can execute the following commands:
|
||||
|
||||
|
@ -114,7 +114,7 @@ After executing the command, the prediction results (classification angle and sc
|
|||
```
|
||||
|
||||
<a name="CONCATENATION"></a>
|
||||
## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION
|
||||
## Text Detection Angle Classification and Recognition Inference Concatenation
|
||||
|
||||
When performing prediction, specify the path of a single image or a folder of images through the parameter `image_dir`. The parameter `det_model_dir` specifies the path to the detection inference model, `cls_model_dir` specifies the path to the angle classification inference model, and `rec_model_dir` specifies the path to the recognition inference model. The parameter `use_angle_cls` controls whether to enable the angle classification model. The parameter `use_mp` specifies whether to use multi-process inference, and `total_process_num` specifies the number of processes when multi-process is enabled. The visualized recognition results are saved to the `./inference_results` folder by default.
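The flags described above can be assembled into a `predict_system.py` invocation. The sketch below only builds the command; the model directories are placeholders and the helper function is ours, not part of PaddleOCR:

```
import shlex

def build_system_cmd(image_dir, det_model_dir, cls_model_dir, rec_model_dir,
                     use_angle_cls=True, use_mp=False, total_process_num=1):
    # Assemble the concatenated-inference command from the parameters above.
    args = [
        "python3", "tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--cls_model_dir={cls_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
        f"--use_angle_cls={str(use_angle_cls).lower()}",
    ]
    if use_mp:
        args += ["--use_mp=true", f"--total_process_num={total_process_num}"]
    return args

# Placeholder paths for illustration only.
cmd = build_system_cmd("./doc/imgs/00018069.jpg", "./inference/det/",
                       "./inference/cls/", "./inference/rec/",
                       use_mp=True, total_process_num=4)
print(" ".join(shlex.quote(a) for a in cmd))
```

Running the printed command requires the corresponding inference models to exist at those paths.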
|
||||
|
||||
|
|
|
@ -1,4 +1,12 @@
|
|||
# PP-OCR Model and Configuration
|
||||
This chapter on the PP-OCR model and configuration file introduces the basic concepts of OCR models and the content and role of the configuration file, so that you have a better experience with the subsequent parameter tuning and training of the model.
|
||||
|
||||
This chapter contains three parts. First, [PP-OCR Model Download](./models_list_en.md) explains the concept of PP-OCR model types and provides links to download all models. Then, [Yml Configuration](./config_en.md) details the parameters needed to fine-tune the PP-OCR models. Finally, [Python Inference for PP-OCR Model Library](./inference_ppocr_en.md) introduces the use of the model library from the first part, so that you can quickly obtain test results from the rich set of models through the Python inference engine.
|
||||
|
||||
------
|
||||
|
||||
Let's first understand some basic concepts.
|
||||
|
||||
- [INTRODUCTION ABOUT OCR](#introduction-about-ocr)
|
||||
* [BASIC CONCEPTS OF OCR DETECTION MODEL](#basic-concepts-of-ocr-detection-model)
|
||||
* [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model)
|
||||
|
|
|
@ -1,7 +1,8 @@
|
|||
## OCR model list(V2.0, updated on 2021.1.20)
|
||||
## OCR model list (V2.1, updated on 2021.9.6)
|
||||
> **Note**
|
||||
> 1. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
|
||||
> 2. All models in this tutorial are all ppocr-series models, for more introduction of algorithms and models based on public dataset, you can refer to [algorithm overview tutorial](./algorithm_overview_en.md).
|
||||
> 1. Compared with model v2.0, the v2.1 detection model has improved accuracy, and the v2.1 recognition model is optimized in both accuracy and CPU speed.
|
||||
> 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with the static graph programming paradigm, models 2.0 are the dynamic-graph-trained versions and achieve comparable performance.
|
||||
> 3. All models in this tutorial are ppocr-series models; for more algorithms and models trained on public datasets, you can refer to the [algorithm overview tutorial](./algorithm_overview_en.md).
|
||||
|
||||
- [1. Text Detection Model](#Detection)
|
||||
- [2. Text Recognition Model](#Recognition)
|
||||
|
@ -28,8 +29,8 @@ Relationship of the above models is as follows.
|
|||
|
||||
|model name|description|config|model size|download|
|
||||
| --- | --- | --- | --- | --- |
|
||||
|ch_ppocr_mobile_slim_v2.1_det|slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_lite_train_cml_v2.1.yml](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_quant_infer.tar)|
|
||||
|ch_ppocr_mobile_v2.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_lite_train_cml_v2.1.ym](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_distill_train.tar)|
|
||||
|ch_PP-OCRv2_det_slim|slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
|
||||
|ch_PP-OCRv2_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)|
|
||||
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)|
|
||||
|ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|
|
||||
|ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)|
|
||||
|
@ -42,8 +43,8 @@ Relationship of the above models is as follows.
|
|||
|
||||
|model name|description|config|model size|download|
|
||||
| --- | --- | --- | --- | --- |
|
||||
|ch_ppocr_mobile_slim_v2.1_rec|Slim qunatization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_train.tar) |
|
||||
|ch_ppocr_mobile_v2.1_rec|Original lightweight model, supporting Chinese, English, multilingual text detection|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_train.tar) |
|
||||
|ch_PP-OCRv2_rec_slim|Slim quantization with distillation lightweight model, supporting Chinese, English and multilingual text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|
||||
|ch_PP-OCRv2_rec|Original lightweight model, supporting Chinese, English and multilingual text recognition|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
|
||||
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) |
|
||||
|ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
|
||||
|ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
|
||||
|
@ -92,7 +93,7 @@ For more supported languages, please refer to : [Multi-language model](./multi_l
|
|||
### 4. Paddle-Lite Model
|
||||
|Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch|
|
||||
|---|---|---|---|---|---|---|
|
||||
|V2.1|ppocr_v2.1 extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_infer_opt.nb)|v2.9|
|
||||
|V2.1(slim)|extra-lightweight chinese OCR optimized model|4.9M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_opt.nb)|v2.9|
|
||||
|PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9|
|
||||
|PP-OCRv2(slim)|extra-lightweight chinese OCR optimized model|4.9M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_opt.nb)|v2.9|
|
||||
|V2.0|ppocr_v2.0 extra-lightweight chinese OCR optimized model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9|
|
||||
|V2.0(slim)|ppocr_v2.0 extra-lightweight chinese OCR optimized model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9|
|
||||
|
|
|
@ -95,7 +95,7 @@ If you do not use the provided test image, you can replace the following `--imag
|
|||
['PAIN', 0.990372]
|
||||
```
|
||||
|
||||
More whl package usage can be found in [whl package](./whl_en.md)
|
||||
If you need to use the 2.0 models, please specify the parameter `--version 2.0`; paddleocr uses the 2.1 models by default. More whl package usage can be found in [whl package](./whl_en.md)
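Under the hood, the version switch selects an entry in a nested model-URL table, falling back to the default version when a model is missing for the requested one. A simplified sketch of that lookup (the table is abridged from `paddleocr.py`, and the helper name is ours):

```
DEFAULT_MODEL_VERSION = '2.0'

# Abridged MODEL_URLS table; only a few entries kept for illustration.
MODEL_URLS = {
    '2.1': {
        'det': {
            'ch': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar'},
        },
    },
    '2.0': {
        'det': {
            'ch': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar'},
            'en': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_ppocr_mobile_v2.0_det_infer.tar'},
        },
    },
}

def resolve_url(version, model_type, lang):
    # Pick the requested version if it provides this model/lang pair,
    # otherwise fall back to DEFAULT_MODEL_VERSION.
    table = MODEL_URLS.get(version, {})
    if lang not in table.get(model_type, {}):
        table = MODEL_URLS[DEFAULT_MODEL_VERSION]
    return table[model_type][lang]['url']

print(resolve_url('2.1', 'det', 'ch'))  # PP-OCRv2 detection model
print(resolve_url('2.1', 'det', 'en'))  # falls back to the 2.0 English model
```

This is why requesting an English detection model with the 2.1 version silently serves the 2.0 model: 2.1 only ships Chinese detection and recognition models.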
|
||||
<a name="212-multi-language-model"></a>
|
||||
|
||||
#### 2.1.2 Multi-language Model
|
||||
|
|
|
@ -15,6 +15,7 @@
|
|||
|
||||
- [4 PREDICTION](#PREDICTION)
|
||||
- [4.1 Training engine prediction](#Training_engine_prediction)
|
||||
- [5 CONVERT TO INFERENCE MODEL](#Inference)
|
||||
|
||||
<a name="DATA_PREPARATION"></a>
|
||||
## 1 DATA PREPARATION
|
||||
|
@ -91,6 +92,8 @@ Similar to the training set, the test set also needs to be provided a folder con
|
|||
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads).
|
||||
You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb-format dataset required for the benchmark.
|
||||
|
||||
If you want to reproduce the paper SAR, you need to download the extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), extraction code: 627x. Besides, the icdar2013, icdar2015, cocotext, and IIIT5k datasets are also used for training. For specific details, please refer to the SAR paper.
|
||||
|
||||
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
|
||||
|
||||
```
|
||||
|
@ -235,6 +238,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
|
|||
| rec_r34_vd_tps_bilstm_att.yml | CRNN | Resnet34_vd | TPS | BiLSTM | att |
|
||||
| rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn |
|
||||
| rec_mtb_nrtr.yml | NRTR | nrtr_mtb | None | transformer encoder | transformer decoder |
|
||||
| rec_r31_sar.yml | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
|
||||
|
||||
|
||||
For training Chinese data, it is recommended to use
|
||||
[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
|
||||
|
@ -361,6 +366,7 @@ Eval:
|
|||
```
|
||||
|
||||
<a name="EVALUATION"></a>
|
||||
|
||||
## 3 EVALUATION
|
||||
|
||||
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
|
||||
|
@ -432,3 +438,40 @@ Get the prediction result of the input image:
|
|||
infer_img: doc/imgs_words/ch/word_1.jpg
|
||||
result: ('韩国小馆', 0.997218)
|
||||
```
|
||||
|
||||
<a name="Inference"></a>
|
||||
|
||||
## 5 CONVERT TO INFERENCE MODEL
|
||||
|
||||
The recognition model is converted to an inference model in the same way as the detection model, as follows:
|
||||
|
||||
```
|
||||
# -c Set the training algorithm yml configuration file
|
||||
# -o Set optional parameters
|
||||
# Global.pretrained_model: set the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt or .pdparams.
|
||||
# Global.save_inference_dir: set the directory where the converted model will be saved.
|
||||
|
||||
python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_rec_train/best_accuracy Global.save_inference_dir=./inference/rec_crnn/
|
||||
```
|
||||
|
||||
If you have a model trained on your own dataset with a different dictionary file, please make sure that you modify `character_dict_path` in the configuration file to point to your dictionary file.
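For example, a minimal override in the config's `Global` section might look like the fragment below; the dictionary path is a placeholder for your own file:

```
Global:
  character_dict_path: ./ppocr/utils/your_dict.txt
  character_type: ch
```

`character_type` should match the dictionary you supply, just as during training.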
|
||||
|
||||
After the conversion is successful, there are three files in the model save directory:
|
||||
|
||||
```
|
||||
inference/det_db/
|
||||
├── inference.pdiparams # The parameter file of recognition inference model
|
||||
├── inference.pdiparams.info # The parameter information of recognition inference model, which can be ignored
|
||||
└── inference.pdmodel # The program file of recognition model
|
||||
```
|
||||
|
||||
- Text recognition model inference using a custom character dictionary
|
||||
|
||||
If the text dictionary was modified during training, you need to specify the dictionary path via `--rec_char_dict_path` when predicting with the inference model, and set `rec_char_type=ch`:
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path"
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
|
|
@ -1,4 +1,9 @@
|
|||
# RECENT UPDATES
|
||||
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The CPU inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
|
||||
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
|
||||
- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf), which is published in AAAI 2021. Find the tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md); release multilingual recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), supporting recognition of more than 80 languages; especially, the performance of the [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is optimized.
|
||||
|
||||
- 2021.1.21 update more than 25 multilingual recognition models ([models list](./doc/doc_en/models_list_en.md)), including English, Chinese, German, French, Japanese, Spanish, Portuguese, Russian, Arabic, and so on. Models for more languages will continue to be updated ([Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048)).
|
||||
- 2020.12.15 update the data synthesis tool [Style-Text](../../StyleText/README.md), which makes it easy to synthesize a large number of images similar to the target scene image.
|
||||
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly.
|
||||
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
|
||||
|
|
|
@ -1,5 +1,10 @@
|
|||
# Visualization
|
||||
|
||||
<a name="PP-OCRv2"></a>
|
||||
## PP-OCRv2
|
||||
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic001.jpg" width="800">
|
||||
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic002.jpg" width="800">
|
||||
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg" width="800">
|
||||
|
||||
<a name="ppocr_server_2.0"></a>
|
||||
## ch_ppocr_server_2.0
|
||||
|
|
Binary file not shown.
After Width: | Height: | Size: 192 KiB |
Binary file not shown.
After Width: | Height: | Size: 94 KiB |
Binary file not shown.
After Width: | Height: | Size: 246 KiB |
331
paddleocr.py
331
paddleocr.py
|
@ -33,104 +33,141 @@ from tools.infer.utility import draw_ocr, str2bool
|
|||
from ppstructure.utility import init_args, draw_structure_result
|
||||
from ppstructure.predict_system import OCRSystem, save_structure_res
|
||||
|
||||
__all__ = ['PaddleOCR', 'PPStructure', 'draw_ocr', 'draw_structure_result', 'save_structure_res','download_with_progressbar']
|
||||
|
||||
model_urls = {
|
||||
'det': {
|
||||
'ch':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar',
|
||||
'en':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_ppocr_mobile_v2.0_det_infer.tar',
|
||||
'structure': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar'
|
||||
},
|
||||
'rec': {
|
||||
'ch': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
|
||||
},
|
||||
'en': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/en_dict.txt'
|
||||
},
|
||||
'french': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/french_dict.txt'
|
||||
},
|
||||
'german': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/german_dict.txt'
|
||||
},
|
||||
'korean': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/korean_dict.txt'
|
||||
},
|
||||
'japan': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/japan_dict.txt'
|
||||
},
|
||||
'chinese_cht': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt'
|
||||
},
|
||||
'ta': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/ta_dict.txt'
|
||||
},
|
||||
'te': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/te_dict.txt'
|
||||
},
|
||||
'ka': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/ka_dict.txt'
|
||||
},
|
||||
'latin': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/latin_dict.txt'
|
||||
},
|
||||
'arabic': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/arabic_dict.txt'
|
||||
},
|
||||
'cyrillic': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/cyrillic_dict.txt'
|
||||
},
|
||||
'devanagari': {
|
||||
'url':
|
||||
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar',
|
||||
'dict_path': './ppocr/utils/dict/devanagari_dict.txt'
|
||||
},
|
||||
'structure': {
|
||||
'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar',
|
||||
'dict_path': 'ppocr/utils/dict/table_dict.txt'
|
||||
}
|
||||
},
|
||||
'cls': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar',
|
||||
'table': {
|
||||
'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar',
|
||||
'dict_path': 'ppocr/utils/dict/table_structure_dict.txt'
|
||||
}
|
||||
}
|
||||
__all__ = [
|
||||
'PaddleOCR', 'PPStructure', 'draw_ocr', 'draw_structure_result',
|
||||
'save_structure_res', 'download_with_progressbar'
|
||||
]
|
||||
|
||||
SUPPORT_DET_MODEL = ['DB']
|
||||
VERSION = '2.2.0.1'
|
||||
VERSION = '2.2.1'
SUPPORT_REC_MODEL = ['CRNN']
BASE_DIR = os.path.expanduser("~/.paddleocr/")

DEFAULT_MODEL_VERSION = '2.0'
MODEL_URLS = {
    '2.1': {
        'det': {
            'ch': {
                'url':
                'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar',
            },
        },
        'rec': {
            'ch': {
                'url':
                'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar',
                'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
            }
        }
    },
    '2.0': {
        'det': {
            'ch': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar',
            },
            'en': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_ppocr_mobile_v2.0_det_infer.tar',
            },
            'structure': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar'
            }
        },
        'rec': {
            'ch': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
            },
            'en': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/en_dict.txt'
            },
            'french': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/french_dict.txt'
            },
            'german': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/german_dict.txt'
            },
            'korean': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/korean_dict.txt'
            },
            'japan': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/japan_dict.txt'
            },
            'chinese_cht': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt'
            },
            'ta': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/ta_dict.txt'
            },
            'te': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/te_dict.txt'
            },
            'ka': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/ka_dict.txt'
            },
            'latin': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/latin_dict.txt'
            },
            'arabic': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/arabic_dict.txt'
            },
            'cyrillic': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/cyrillic_dict.txt'
            },
            'devanagari': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar',
                'dict_path': './ppocr/utils/dict/devanagari_dict.txt'
            },
            'structure': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar',
                'dict_path': 'ppocr/utils/dict/table_dict.txt'
            }
        },
        'cls': {
            'ch': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar',
            }
        },
        'table': {
            'en': {
                'url':
                'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar',
                'dict_path': 'ppocr/utils/dict/table_structure_dict.txt'
            }
        }
    }
}


def parse_args(mMain=True):
    import argparse
@@ -140,6 +177,7 @@ def parse_args(mMain=True):
     parser.add_argument("--det", type=str2bool, default=True)
     parser.add_argument("--rec", type=str2bool, default=True)
     parser.add_argument("--type", type=str, default='ocr')
+    parser.add_argument("--version", type=str, default='2.1')

     for action in parser._actions:
         if action.dest in ['rec_char_dict_path', 'table_char_dict_path']:
@@ -155,19 +193,19 @@ def parse_args(mMain=True):
 def parse_lang(lang):
     latin_lang = [
-        'af', 'az', 'bs', 'cs', 'cy', 'da', 'de', 'es', 'et', 'fr', 'ga',
-        'hr', 'hu', 'id', 'is', 'it', 'ku', 'la', 'lt', 'lv', 'mi', 'ms',
-        'mt', 'nl', 'no', 'oc', 'pi', 'pl', 'pt', 'ro', 'rs_latin', 'sk',
-        'sl', 'sq', 'sv', 'sw', 'tl', 'tr', 'uz', 'vi'
+        'af', 'az', 'bs', 'cs', 'cy', 'da', 'de', 'es', 'et', 'fr', 'ga', 'hr',
+        'hu', 'id', 'is', 'it', 'ku', 'la', 'lt', 'lv', 'mi', 'ms', 'mt', 'nl',
+        'no', 'oc', 'pi', 'pl', 'pt', 'ro', 'rs_latin', 'sk', 'sl', 'sq', 'sv',
+        'sw', 'tl', 'tr', 'uz', 'vi'
     ]
     arabic_lang = ['ar', 'fa', 'ug', 'ur']
     cyrillic_lang = [
-        'ru', 'rs_cyrillic', 'be', 'bg', 'uk', 'mn', 'abq', 'ady', 'kbd',
-        'ava', 'dar', 'inh', 'che', 'lbe', 'lez', 'tab'
+        'ru', 'rs_cyrillic', 'be', 'bg', 'uk', 'mn', 'abq', 'ady', 'kbd', 'ava',
+        'dar', 'inh', 'che', 'lbe', 'lez', 'tab'
     ]
     devanagari_lang = [
-        'hi', 'mr', 'ne', 'bh', 'mai', 'ang', 'bho', 'mah', 'sck', 'new',
-        'gom', 'sa', 'bgc'
+        'hi', 'mr', 'ne', 'bh', 'mai', 'ang', 'bho', 'mah', 'sck', 'new', 'gom',
+        'sa', 'bgc'
     ]
     if lang in latin_lang:
         lang = "latin"
@@ -177,9 +215,9 @@ def parse_lang(lang):
         lang = "cyrillic"
     elif lang in devanagari_lang:
         lang = "devanagari"
-    assert lang in model_urls[
-        'rec'], 'param lang must in {}, but got {}'.format(
-            model_urls['rec'].keys(), lang)
+    assert lang in MODEL_URLS[DEFAULT_MODEL_VERSION][
+        'rec'], 'param lang must in {}, but got {}'.format(
+            MODEL_URLS[DEFAULT_MODEL_VERSION]['rec'].keys(), lang)
     if lang == "ch":
         det_lang = "ch"
     elif lang == 'structure':
@@ -189,6 +227,35 @@ def parse_lang(lang):
     return lang, det_lang
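The grouping performed by `parse_lang` can be sketched in isolation: many individual language codes are collapsed onto a handful of shared recognition models. This is a minimal, abbreviated sketch; the lists are shortened and the helper name `to_model_lang` is illustrative, not part of the library API.

```python
# Abbreviated sketch of parse_lang's grouping (lists shortened; the
# helper name is illustrative, not part of the PaddleOCR API).
latin_lang = ['fr', 'de', 'es', 'it', 'pt', 'nl']
arabic_lang = ['ar', 'fa', 'ug', 'ur']
cyrillic_lang = ['ru', 'be', 'bg', 'uk']
devanagari_lang = ['hi', 'mr', 'ne']


def to_model_lang(lang):
    if lang in latin_lang:
        return 'latin'
    if lang in arabic_lang:
        return 'arabic'
    if lang in cyrillic_lang:
        return 'cyrillic'
    if lang in devanagari_lang:
        return 'devanagari'
    return lang  # e.g. 'ch', 'en', 'korean' keep dedicated models
```

Codes that do not fall in any group pass through unchanged and must therefore have their own entry in the recognition model table.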
def get_model_config(version, model_type, lang):
    if version not in MODEL_URLS:
        logger.warning('version {} not in {}, use version {} instead'.format(
            version, MODEL_URLS.keys(), DEFAULT_MODEL_VERSION))
        version = DEFAULT_MODEL_VERSION
    if model_type not in MODEL_URLS[version]:
        if model_type in MODEL_URLS[DEFAULT_MODEL_VERSION]:
            logger.warning(
                'version {} not support {} models, use version {} instead'.
                format(version, model_type, DEFAULT_MODEL_VERSION))
            version = DEFAULT_MODEL_VERSION
        else:
            logger.error('{} models is not support, we only support {}'.format(
                model_type, MODEL_URLS[DEFAULT_MODEL_VERSION].keys()))
            sys.exit(-1)
    if lang not in MODEL_URLS[version][model_type]:
        if lang in MODEL_URLS[DEFAULT_MODEL_VERSION][model_type]:
            logger.warning('lang {} is not support in {}, use {} instead'.
                           format(lang, version, DEFAULT_MODEL_VERSION))
            version = DEFAULT_MODEL_VERSION
        else:
            logger.error(
                'lang {} is not support, we only support {} for {} models'.
                format(lang, MODEL_URLS[DEFAULT_MODEL_VERSION][model_type].keys(
                ), model_type))
            sys.exit(-1)
    return MODEL_URLS[version][model_type][lang]
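The version-fallback behaviour of `get_model_config` above can be exercised on its own. This is a toy stand-in: the dict contents, URLs, and the `lookup` name are illustrative placeholders, not the real `MODEL_URLS` entries, and the logging/exit branches are omitted.

```python
# Toy stand-in for get_model_config's fallback chain: each missing key
# demotes the lookup to the default version. Names and URLs are made up.
MODEL_TABLE = {
    '2.1': {'rec': {'ch': {'url': 'v2.1-ch-rec'}}},
    '2.0': {'rec': {'ch': {'url': 'v2.0-ch-rec'},
                    'en': {'url': 'v2.0-en-rec'}}},
}
DEFAULT = '2.0'


def lookup(version, model_type, lang):
    if version not in MODEL_TABLE:
        version = DEFAULT          # unknown version -> default
    if model_type not in MODEL_TABLE[version]:
        version = DEFAULT          # version lacks this model type -> default
    if lang not in MODEL_TABLE[version][model_type]:
        version = DEFAULT          # version lacks this language -> default
    return MODEL_TABLE[version][model_type][lang]
```

For example, requesting an English recognizer under version `2.1` silently falls back to the `2.0` entry, because only `ch` models exist for `2.1` in this toy table.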
class PaddleOCR(predict_system.TextSystem):
    def __init__(self, **kwargs):
        """
@@ -204,15 +271,21 @@ class PaddleOCR(predict_system.TextSystem):
         lang, det_lang = parse_lang(params.lang)

         # init model dir
-        params.det_model_dir, det_url = confirm_model_dir_url(params.det_model_dir,
-            os.path.join(BASE_DIR, VERSION, 'ocr', 'det', det_lang),
-            model_urls['det'][det_lang])
-        params.rec_model_dir, rec_url = confirm_model_dir_url(params.rec_model_dir,
-            os.path.join(BASE_DIR, VERSION, 'ocr', 'rec', lang),
-            model_urls['rec'][lang]['url'])
-        params.cls_model_dir, cls_url = confirm_model_dir_url(params.cls_model_dir,
-            os.path.join(BASE_DIR, VERSION, 'ocr', 'cls'),
-            model_urls['cls'])
+        det_model_config = get_model_config(params.version, 'det', det_lang)
+        params.det_model_dir, det_url = confirm_model_dir_url(
+            params.det_model_dir,
+            os.path.join(BASE_DIR, VERSION, 'ocr', 'det', det_lang),
+            det_model_config['url'])
+        rec_model_config = get_model_config(params.version, 'rec', lang)
+        params.rec_model_dir, rec_url = confirm_model_dir_url(
+            params.rec_model_dir,
+            os.path.join(BASE_DIR, VERSION, 'ocr', 'rec', lang),
+            rec_model_config['url'])
+        cls_model_config = get_model_config(params.version, 'cls', 'ch')
+        params.cls_model_dir, cls_url = confirm_model_dir_url(
+            params.cls_model_dir,
+            os.path.join(BASE_DIR, VERSION, 'ocr', 'cls'),
+            cls_model_config['url'])
         # download model
         maybe_download(params.det_model_dir, det_url)
         maybe_download(params.rec_model_dir, rec_url)
|
@ -226,7 +299,8 @@ class PaddleOCR(predict_system.TextSystem):
|
|||
sys.exit(0)
|
||||
|
||||
if params.rec_char_dict_path is None:
|
||||
params.rec_char_dict_path = str(Path(__file__).parent / model_urls['rec'][lang]['dict_path'])
|
||||
params.rec_char_dict_path = str(
|
||||
Path(__file__).parent / rec_model_config['dict_path'])
|
||||
|
||||
print(params)
|
||||
# init det_model and rec_model
|
||||
|
@@ -293,24 +367,32 @@ class PPStructure(OCRSystem):
         lang, det_lang = parse_lang(params.lang)

         # init model dir
-        params.det_model_dir, det_url = confirm_model_dir_url(params.det_model_dir,
-            os.path.join(BASE_DIR, VERSION, 'ocr', 'det', det_lang),
-            model_urls['det'][det_lang])
-        params.rec_model_dir, rec_url = confirm_model_dir_url(params.rec_model_dir,
-            os.path.join(BASE_DIR, VERSION, 'ocr', 'rec', lang),
-            model_urls['rec'][lang]['url'])
-        params.table_model_dir, table_url = confirm_model_dir_url(params.table_model_dir,
-            os.path.join(BASE_DIR, VERSION, 'ocr', 'table'),
-            model_urls['table']['url'])
+        det_model_config = get_model_config(params.version, 'det', det_lang)
+        params.det_model_dir, det_url = confirm_model_dir_url(
+            params.det_model_dir,
+            os.path.join(BASE_DIR, VERSION, 'ocr', 'det', det_lang),
+            det_model_config['url'])
+        rec_model_config = get_model_config(params.version, 'rec', lang)
+        params.rec_model_dir, rec_url = confirm_model_dir_url(
+            params.rec_model_dir,
+            os.path.join(BASE_DIR, VERSION, 'ocr', 'rec', lang),
+            rec_model_config['url'])
+        table_model_config = get_model_config(params.version, 'table', 'en')
+        params.table_model_dir, table_url = confirm_model_dir_url(
+            params.table_model_dir,
+            os.path.join(BASE_DIR, VERSION, 'ocr', 'table'),
+            table_model_config['url'])
         # download model
         maybe_download(params.det_model_dir, det_url)
         maybe_download(params.rec_model_dir, rec_url)
         maybe_download(params.table_model_dir, table_url)

         if params.rec_char_dict_path is None:
-            params.rec_char_dict_path = str(Path(__file__).parent / model_urls['rec'][lang]['dict_path'])
+            params.rec_char_dict_path = str(
+                Path(__file__).parent / rec_model_config['dict_path'])
         if params.table_char_dict_path is None:
-            params.table_char_dict_path = str(Path(__file__).parent / model_urls['table']['dict_path'])
+            params.table_char_dict_path = str(
+                Path(__file__).parent / table_model_config['dict_path'])

         print(params)
         super().__init__(params)
@@ -374,4 +456,3 @@ def main():
         for item in result:
             item.pop('img')
             logger.info(item)
@@ -21,7 +21,7 @@ from .make_border_map import MakeBorderMap
 from .make_shrink_map import MakeShrinkMap
 from .random_crop_data import EastRandomCropData, PSERandomCrop

-from .rec_img_aug import RecAug, RecResizeImg, ClsResizeImg, SRNRecResizeImg, NRTRRecResizeImg
+from .rec_img_aug import RecAug, RecResizeImg, ClsResizeImg, SRNRecResizeImg, NRTRRecResizeImg, SARRecResizeImg
 from .randaugment import RandAugment
 from .copy_paste import CopyPaste
 from .operators import *
@@ -549,3 +549,49 @@ class TableLabelEncode(object):
             assert False, "Unsupport type %s in char_or_elem" \
                 % char_or_elem
         return idx
+
+
+class SARLabelEncode(BaseRecLabelEncode):
+    """ Convert between text-label and text-index """
+
+    def __init__(self,
+                 max_text_length,
+                 character_dict_path=None,
+                 character_type='ch',
+                 use_space_char=False,
+                 **kwargs):
+        super(SARLabelEncode,
+              self).__init__(max_text_length, character_dict_path,
+                             character_type, use_space_char)
+
+    def add_special_char(self, dict_character):
+        beg_end_str = "<BOS/EOS>"
+        unknown_str = "<UKN>"
+        padding_str = "<PAD>"
+        dict_character = dict_character + [unknown_str]
+        self.unknown_idx = len(dict_character) - 1
+        dict_character = dict_character + [beg_end_str]
+        self.start_idx = len(dict_character) - 1
+        self.end_idx = len(dict_character) - 1
+        dict_character = dict_character + [padding_str]
+        self.padding_idx = len(dict_character) - 1
+
+        return dict_character
+
+    def __call__(self, data):
+        text = data['label']
+        text = self.encode(text)
+        if text is None:
+            return None
+        if len(text) >= self.max_text_len - 1:
+            return None
+        data['length'] = np.array(len(text))
+        target = [self.start_idx] + text + [self.end_idx]
+        padded_text = [self.padding_idx for _ in range(self.max_text_len)]
+        padded_text[:len(target)] = target
+        data['label'] = np.array(padded_text)
+        return data
+
+    def get_ignored_tokens(self):
+        return [self.padding_idx]
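The target construction in `SARLabelEncode.__call__` wraps the encoded text with the shared `<BOS/EOS>` token and right-pads with `<PAD>` up to `max_text_len`. A toy sketch of that arithmetic; the index values are made up (the real ones depend on the character dictionary):

```python
# Toy sketch of SARLabelEncode's target construction. Index values are
# illustrative; SAR uses one shared symbol for both start and end.
start_idx = end_idx = 38
padding_idx = 39
max_text_len = 10
text = [5, 1, 7]  # already-encoded characters

target = [start_idx] + text + [end_idx]
padded_text = [padding_idx] * max_text_len
padded_text[:len(target)] = target
```

The result is always exactly `max_text_len` long, which is why `__call__` rejects texts with `len(text) >= self.max_text_len - 1`: there must be room for both the start and end tokens.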
@@ -113,7 +113,7 @@ class NormalizeImage(object):
         assert isinstance(img,
                           np.ndarray), "invalid input 'img' in NormalizeImage"
         data['image'] = (
             img.astype('float32') * self.scale - self.mean) / self.std
         return data
@@ -144,6 +144,34 @@ class KeepKeys(object):
         return data_list


+class Resize(object):
+    def __init__(self, size=(640, 640), **kwargs):
+        self.size = size
+
+    def resize_image(self, img):
+        resize_h, resize_w = self.size
+        ori_h, ori_w = img.shape[:2]  # (h, w, c)
+        ratio_h = float(resize_h) / ori_h
+        ratio_w = float(resize_w) / ori_w
+        img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        return img, [ratio_h, ratio_w]
+
+    def __call__(self, data):
+        img = data['image']
+        text_polys = data['polys']
+
+        img_resize, [ratio_h, ratio_w] = self.resize_image(img)
+        new_boxes = []
+        for box in text_polys:
+            new_box = []
+            for cord in box:
+                new_box.append([cord[0] * ratio_w, cord[1] * ratio_h])
+            new_boxes.append(new_box)
+        data['image'] = img_resize
+        data['polys'] = np.array(new_boxes, dtype=np.float32)
+        return data
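The `Resize` transform above rescales the polygon annotations together with the image: each x coordinate is multiplied by the width ratio and each y by the height ratio. The coordinate arithmetic in isolation, with a toy box (no `cv2`/`numpy` needed for the sketch):

```python
# Polygon-scaling arithmetic from Resize.__call__, in isolation.
# Box and image sizes are toy values.
def scale_polys(polys, ori_hw, new_hw):
    ratio_h = new_hw[0] / ori_hw[0]
    ratio_w = new_hw[1] / ori_hw[1]
    return [[[x * ratio_w, y * ratio_h] for x, y in box] for box in polys]


box = [[10, 20], [30, 20], [30, 40], [10, 40]]
# 100x200 image stretched to 640x640: ratio_h = 6.4, ratio_w = 3.2
scaled = scale_polys([box], (100, 200), (640, 640))[0]
```

Because the target is square while the input is not, the two ratios differ and the boxes are stretched anisotropically, matching the stretched image.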

class DetResizeForTest(object):
    def __init__(self, **kwargs):
        super(DetResizeForTest, self).__init__()
@@ -215,7 +243,7 @@ class DetResizeForTest(object):
             else:
                 ratio = 1.
         elif self.limit_type == 'resize_long':
-            ratio = float(limit_side_len) / max(h,w)
+            ratio = float(limit_side_len) / max(h, w)
         else:
             raise Exception('not support limit type, image ')
         resize_h = int(h * ratio)
@@ -102,6 +102,56 @@ class SRNRecResizeImg(object):
         return data


+class SARRecResizeImg(object):
+    def __init__(self, image_shape, width_downsample_ratio=0.25, **kwargs):
+        self.image_shape = image_shape
+        self.width_downsample_ratio = width_downsample_ratio
+
+    def __call__(self, data):
+        img = data['image']
+        norm_img, resize_shape, pad_shape, valid_ratio = resize_norm_img_sar(img, self.image_shape, self.width_downsample_ratio)
+        data['image'] = norm_img
+        data['resized_shape'] = resize_shape
+        data['pad_shape'] = pad_shape
+        data['valid_ratio'] = valid_ratio
+        return data
+
+
+def resize_norm_img_sar(img, image_shape, width_downsample_ratio=0.25):
+    imgC, imgH, imgW_min, imgW_max = image_shape
+    h = img.shape[0]
+    w = img.shape[1]
+    valid_ratio = 1.0
+    # make sure new_width is an integral multiple of width_divisor.
+    width_divisor = int(1 / width_downsample_ratio)
+    # resize
+    ratio = w / float(h)
+    resize_w = math.ceil(imgH * ratio)
+    if resize_w % width_divisor != 0:
+        resize_w = round(resize_w / width_divisor) * width_divisor
+    if imgW_min is not None:
+        resize_w = max(imgW_min, resize_w)
+    if imgW_max is not None:
+        valid_ratio = min(1.0, 1.0 * resize_w / imgW_max)
+        resize_w = min(imgW_max, resize_w)
+    resized_image = cv2.resize(img, (resize_w, imgH))
+    resized_image = resized_image.astype('float32')
+    # norm
+    if image_shape[0] == 1:
+        resized_image = resized_image / 255
+        resized_image = resized_image[np.newaxis, :]
+    else:
+        resized_image = resized_image.transpose((2, 0, 1)) / 255
+    resized_image -= 0.5
+    resized_image /= 0.5
+    resize_shape = resized_image.shape
+    padding_im = -1.0 * np.ones((imgC, imgH, imgW_max), dtype=np.float32)
+    padding_im[:, :, 0:resize_w] = resized_image
+    pad_shape = padding_im.shape
+
+    return padding_im, resize_shape, pad_shape, valid_ratio
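The width arithmetic of `resize_norm_img_sar` can be exercised without images: keep the aspect ratio, snap the width to a multiple of `1 / width_downsample_ratio`, clamp it to `[imgW_min, imgW_max]`, and report how much of the padded width is real content. The default shape values in this sketch (`imgH=48`, `imgW_min=48`, `imgW_max=160`) are assumed typical SAR settings, not taken from this diff:

```python
import math

# Width/valid_ratio computation of resize_norm_img_sar, in isolation.
# Default shape values here are assumed, not stated in the diff.
def sar_width(h, w, imgH=48, imgW_min=48, imgW_max=160,
              width_downsample_ratio=0.25):
    width_divisor = int(1 / width_downsample_ratio)  # snap width to multiple of 4
    resize_w = math.ceil(imgH * (w / float(h)))
    if resize_w % width_divisor != 0:
        resize_w = round(resize_w / width_divisor) * width_divisor
    resize_w = max(imgW_min, resize_w)
    valid_ratio = min(1.0, 1.0 * resize_w / imgW_max)
    resize_w = min(imgW_max, resize_w)
    return resize_w, valid_ratio
```

`valid_ratio` is what the SAR head later uses to mask attention over the `-1`-padded region on the right of `padding_im`.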


def resize_norm_img(img, image_shape):
    imgC, imgH, imgW = image_shape
    h = img.shape[0]
@@ -26,6 +26,7 @@ from .rec_ctc_loss import CTCLoss
 from .rec_att_loss import AttentionLoss
 from .rec_srn_loss import SRNLoss
 from .rec_nrtr_loss import NRTRLoss
+from .rec_sar_loss import SARLoss
 # cls loss
 from .cls_loss import ClsLoss

@@ -44,7 +45,7 @@ from .table_att_loss import TableAttentionLoss
 def build_loss(config):
     support_dict = [
         'DBLoss', 'EASTLoss', 'SASTLoss', 'CTCLoss', 'ClsLoss', 'AttentionLoss',
-        'SRNLoss', 'PGLoss', 'CombinedLoss', 'NRTRLoss', 'TableAttentionLoss'
+        'SRNLoss', 'PGLoss', 'CombinedLoss', 'NRTRLoss', 'TableAttentionLoss', 'SARLoss'
     ]

     config = copy.deepcopy(config)
@@ -0,0 +1,25 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import paddle
from paddle import nn


class SARLoss(nn.Layer):
    def __init__(self, **kwargs):
        super(SARLoss, self).__init__()
        self.loss_func = paddle.nn.loss.CrossEntropyLoss(reduction="mean", ignore_index=96)

    def forward(self, predicts, batch):
        predict = predicts[:, :-1, :]  # ignore last index of outputs to be in same seq_len with targets
        label = batch[1].astype("int64")[:, 1:]  # ignore first index of target in loss calculation
        batch_size, num_steps, num_classes = predict.shape[0], predict.shape[
            1], predict.shape[2]
        assert len(label.shape) == len(list(predict.shape)) - 1, \
            "The target's shape and inputs's shape is [N, d] and [N, num_steps]"

        inputs = paddle.reshape(predict, [-1, num_classes])
        targets = paddle.reshape(label, [-1])
        loss = self.loss_func(inputs, targets)
        return {'loss': loss}
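The two slices in `SARLoss.forward` implement the usual teacher-forcing shift: the model emits one prediction per input step (including the start token), so the last prediction step and the first (start) label are dropped before cross-entropy. A paddle-free sketch of the alignment, with illustrative token and step names:

```python
# Sketch of the shift in SARLoss.forward: drop the last prediction step
# and the first (start) label so both sequences line up one-to-one.
labels = ['<BOS>', 'c', 'a', 't', '<BOS/EOS>']  # one row of batch[1]
steps = ['p0', 'p1', 'p2', 'p3', 'p4']          # one logit vector per step

aligned_preds = steps[:-1]    # predicts[:, :-1, :]
aligned_labels = labels[1:]   # batch[1][:, 1:]
```

After the shift, step `p0` (which consumed `<BOS>`) is scored against `'c'`, and so on; padding positions are excluded via `ignore_index` in the real loss.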
@@ -27,8 +27,9 @@ def build_backbone(config, model_type):
         from .rec_resnet_fpn import ResNetFPN
         from .rec_mv1_enhance import MobileNetV1Enhance
         from .rec_nrtr_mtb import MTB
+        from .rec_resnet_31 import ResNet31
         support_dict = [
-            'MobileNetV1Enhance', 'MobileNetV3', 'ResNet', 'ResNetFPN', 'MTB'
+            'MobileNetV1Enhance', 'MobileNetV3', 'ResNet', 'ResNetFPN', 'MTB', "ResNet31"
         ]
     elif model_type == "e2e":
         from .e2e_resnet_vd_pg import ResNet
@@ -0,0 +1,176 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import paddle
from paddle import ParamAttr
import paddle.nn as nn
import paddle.nn.functional as F
import numpy as np

__all__ = ["ResNet31"]


def conv3x3(in_channel, out_channel, stride=1):
    return nn.Conv2D(
        in_channel,
        out_channel,
        kernel_size=3,
        stride=stride,
        padding=1,
        bias_attr=False)


class BasicBlock(nn.Layer):
    expansion = 1

    def __init__(self, in_channels, channels, stride=1, downsample=False):
        super().__init__()
        self.conv1 = conv3x3(in_channels, channels, stride)
        self.bn1 = nn.BatchNorm2D(channels)
        self.relu = nn.ReLU()
        self.conv2 = conv3x3(channels, channels)
        self.bn2 = nn.BatchNorm2D(channels)
        self.downsample = downsample
        if downsample:
            self.downsample = nn.Sequential(
                nn.Conv2D(
                    in_channels,
                    channels * self.expansion,
                    1,
                    stride,
                    bias_attr=False),
                nn.BatchNorm2D(channels * self.expansion))
        else:
            self.downsample = nn.Sequential()
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class ResNet31(nn.Layer):
    '''
    Args:
        in_channels (int): Number of channels of input image tensor.
        layers (list[int]): List of BasicBlock number for each stage.
        channels (list[int]): List of out_channels of Conv2d layer.
        out_indices (None | Sequence[int]): Indices of output stages.
        last_stage_pool (bool): If True, add `MaxPool2d` layer to last stage.
    '''

    def __init__(self,
                 in_channels=3,
                 layers=[1, 2, 5, 3],
                 channels=[64, 128, 256, 256, 512, 512, 512],
                 out_indices=None,
                 last_stage_pool=False):
        super(ResNet31, self).__init__()
        assert isinstance(in_channels, int)
        assert isinstance(last_stage_pool, bool)

        self.out_indices = out_indices
        self.last_stage_pool = last_stage_pool

        # conv 1 (Conv Conv)
        self.conv1_1 = nn.Conv2D(
            in_channels, channels[0], kernel_size=3, stride=1, padding=1)
        self.bn1_1 = nn.BatchNorm2D(channels[0])
        self.relu1_1 = nn.ReLU()

        self.conv1_2 = nn.Conv2D(
            channels[0], channels[1], kernel_size=3, stride=1, padding=1)
        self.bn1_2 = nn.BatchNorm2D(channels[1])
        self.relu1_2 = nn.ReLU()

        # conv 2 (Max-pooling, Residual block, Conv)
        self.pool2 = nn.MaxPool2D(
            kernel_size=2, stride=2, padding=0, ceil_mode=True)
        self.block2 = self._make_layer(channels[1], channels[2], layers[0])
        self.conv2 = nn.Conv2D(
            channels[2], channels[2], kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2D(channels[2])
        self.relu2 = nn.ReLU()

        # conv 3 (Max-pooling, Residual block, Conv)
        self.pool3 = nn.MaxPool2D(
            kernel_size=2, stride=2, padding=0, ceil_mode=True)
        self.block3 = self._make_layer(channels[2], channels[3], layers[1])
        self.conv3 = nn.Conv2D(
            channels[3], channels[3], kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2D(channels[3])
        self.relu3 = nn.ReLU()

        # conv 4 (Max-pooling, Residual block, Conv)
        self.pool4 = nn.MaxPool2D(
            kernel_size=(2, 1), stride=(2, 1), padding=0, ceil_mode=True)
        self.block4 = self._make_layer(channels[3], channels[4], layers[2])
        self.conv4 = nn.Conv2D(
            channels[4], channels[4], kernel_size=3, stride=1, padding=1)
        self.bn4 = nn.BatchNorm2D(channels[4])
        self.relu4 = nn.ReLU()

        # conv 5 ((Max-pooling), Residual block, Conv)
        self.pool5 = None
        if self.last_stage_pool:
            self.pool5 = nn.MaxPool2D(
                kernel_size=2, stride=2, padding=0, ceil_mode=True)
        self.block5 = self._make_layer(channels[4], channels[5], layers[3])
        self.conv5 = nn.Conv2D(
            channels[5], channels[5], kernel_size=3, stride=1, padding=1)
        self.bn5 = nn.BatchNorm2D(channels[5])
        self.relu5 = nn.ReLU()

        self.out_channels = channels[-1]

    def _make_layer(self, input_channels, output_channels, blocks):
        layers = []
        for _ in range(blocks):
            downsample = None
            if input_channels != output_channels:
                downsample = nn.Sequential(
                    nn.Conv2D(
                        input_channels,
                        output_channels,
                        kernel_size=1,
                        stride=1,
                        bias_attr=False),
                    nn.BatchNorm2D(output_channels))

            layers.append(
                BasicBlock(
                    input_channels, output_channels, downsample=downsample))
            input_channels = output_channels
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1_1(x)
        x = self.bn1_1(x)
        x = self.relu1_1(x)

        x = self.conv1_2(x)
        x = self.bn1_2(x)
        x = self.relu1_2(x)

        outs = []
        for i in range(4):
            layer_index = i + 2
            pool_layer = getattr(self, f'pool{layer_index}')
            block_layer = getattr(self, f'block{layer_index}')
            conv_layer = getattr(self, f'conv{layer_index}')
            bn_layer = getattr(self, f'bn{layer_index}')
            relu_layer = getattr(self, f'relu{layer_index}')

            if pool_layer is not None:
                x = pool_layer(x)
            x = block_layer(x)
            x = conv_layer(x)
            x = bn_layer(x)
            x = relu_layer(x)

            outs.append(x)

        if self.out_indices is not None:
            return tuple([outs[i] for i in self.out_indices])

        return x
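`_make_layer` above only inserts a 1x1 projection shortcut when a stage changes channel width; after the first block, `input_channels` equals `output_channels` and the remaining blocks keep identity shortcuts. A small sketch of that rule, using stage sizes from the default ResNet31 config (`stage_plan` is an illustrative helper, not part of the file):

```python
# Sketch of _make_layer's shortcut rule: only the first block of a
# channel-widening stage gets a 1x1 projection.
def stage_plan(input_channels, output_channels, blocks):
    plan = []
    for _ in range(blocks):
        plan.append('proj' if input_channels != output_channels else 'identity')
        input_channels = output_channels  # later blocks see the widened input
    return plan
```

For the third stage of the default config (`channels[3]=256 -> channels[4]=512`, `layers[2]=5` blocks), only the first block projects.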
@@ -27,12 +27,13 @@ def build_head(config):
     from .rec_att_head import AttentionHead
     from .rec_srn_head import SRNHead
     from .rec_nrtr_head import Transformer
+    from .rec_sar_head import SARHead

     # cls head
     from .cls_head import ClsHead
     support_dict = [
         'DBHead', 'EASTHead', 'SASTHead', 'CTCHead', 'ClsHead', 'AttentionHead',
-        'SRNHead', 'PGHead', 'Transformer', 'TableAttentionHead'
+        'SRNHead', 'PGHead', 'Transformer', 'TableAttentionHead', 'SARHead'
     ]

     #table head
@@ -0,0 +1,383 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math
import paddle
from paddle import ParamAttr
import paddle.nn as nn
import paddle.nn.functional as F


class SAREncoder(nn.Layer):
    """
    Args:
        enc_bi_rnn (bool): If True, use bidirectional RNN in encoder.
        enc_drop_rnn (float): Dropout probability of RNN layer in encoder.
        enc_gru (bool): If True, use GRU, else LSTM in encoder.
        d_model (int): Dim of channels from backbone.
        d_enc (int): Dim of encoder RNN layer.
        mask (bool): If True, mask padding in RNN sequence.
    """

    def __init__(self,
                 enc_bi_rnn=False,
                 enc_drop_rnn=0.1,
                 enc_gru=False,
                 d_model=512,
                 d_enc=512,
                 mask=True,
                 **kwargs):
        super().__init__()
        assert isinstance(enc_bi_rnn, bool)
        assert isinstance(enc_drop_rnn, (int, float))
        assert 0 <= enc_drop_rnn < 1.0
        assert isinstance(enc_gru, bool)
        assert isinstance(d_model, int)
        assert isinstance(d_enc, int)
        assert isinstance(mask, bool)

        self.enc_bi_rnn = enc_bi_rnn
        self.enc_drop_rnn = enc_drop_rnn
        self.mask = mask

        # LSTM Encoder
        if enc_bi_rnn:
            direction = 'bidirectional'
        else:
            direction = 'forward'
        kwargs = dict(
            input_size=d_model,
            hidden_size=d_enc,
            num_layers=2,
            time_major=False,
            dropout=enc_drop_rnn,
            direction=direction)
        if enc_gru:
            self.rnn_encoder = nn.GRU(**kwargs)
        else:
            self.rnn_encoder = nn.LSTM(**kwargs)

        # global feature transformation
        encoder_rnn_out_size = d_enc * (int(enc_bi_rnn) + 1)
        self.linear = nn.Linear(encoder_rnn_out_size, encoder_rnn_out_size)

    def forward(self, feat, img_metas=None):
        if img_metas is not None:
            assert len(img_metas[0]) == feat.shape[0]

        valid_ratios = None
        if img_metas is not None and self.mask:
            valid_ratios = img_metas[-1]

        h_feat = feat.shape[2]  # bsz c h w
        feat_v = F.max_pool2d(
            feat, kernel_size=(h_feat, 1), stride=1, padding=0)
        feat_v = feat_v.squeeze(2)  # bsz * C * W
        feat_v = paddle.transpose(feat_v, perm=[0, 2, 1])  # bsz * W * C
        holistic_feat = self.rnn_encoder(feat_v)[0]  # bsz * T * C

        if valid_ratios is not None:
            valid_hf = []
            T = holistic_feat.shape[1]
            for i, valid_ratio in enumerate(valid_ratios):
                valid_step = min(T, math.ceil(T * valid_ratio)) - 1
                valid_hf.append(holistic_feat[i, valid_step, :])
            valid_hf = paddle.stack(valid_hf, axis=0)
        else:
            valid_hf = holistic_feat[:, -1, :]  # bsz * C
        holistic_feat = self.linear(valid_hf)  # bsz * C

        return holistic_feat
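The `valid_step` expression in `SAREncoder.forward` picks, per image, the last RNN time step that still corresponds to real (unpadded) image content rather than the `-1` padding on the right. A standalone sketch of that index computation:

```python
import math

# How SAREncoder picks the last valid RNN step for a width-padded image:
# valid_ratio is the fraction of the padded width holding real content.
def last_valid_step(T, valid_ratio):
    return min(T, math.ceil(T * valid_ratio)) - 1
```

With `T = 40` steps and `valid_ratio = 0.95`, the encoder reads its holistic feature from step 37 instead of the final (padding-influenced) step 39; a fully valid image (`valid_ratio = 1.0`) uses the last step as usual.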

class BaseDecoder(nn.Layer):
    def __init__(self, **kwargs):
        super().__init__()

    def forward_train(self, feat, out_enc, targets, img_metas):
        raise NotImplementedError

    def forward_test(self, feat, out_enc, img_metas):
        raise NotImplementedError

    def forward(self,
                feat,
                out_enc,
                label=None,
                img_metas=None,
                train_mode=True):
        self.train_mode = train_mode

        if train_mode:
            return self.forward_train(feat, out_enc, label, img_metas)
        return self.forward_test(feat, out_enc, img_metas)


class ParallelSARDecoder(BaseDecoder):
    """
    Args:
        out_channels (int): Output class number.
        enc_bi_rnn (bool): If True, use bidirectional RNN in encoder.
        dec_bi_rnn (bool): If True, use bidirectional RNN in decoder.
        dec_drop_rnn (float): Dropout of RNN layer in decoder.
        dec_gru (bool): If True, use GRU, else LSTM in decoder.
        d_model (int): Dim of channels from backbone.
        d_enc (int): Dim of encoder RNN layer.
        d_k (int): Dim of channels of attention module.
        pred_dropout (float): Dropout probability of prediction layer.
        max_seq_len (int): Maximum sequence length for decoding.
        mask (bool): If True, mask padding in feature map.
        start_idx (int): Index of start token.
        padding_idx (int): Index of padding token.
        pred_concat (bool): If True, concat glimpse feature from
            attention with holistic feature and hidden state.
    """

    def __init__(
            self,
            out_channels,  # 90 + unknown + start + padding
            enc_bi_rnn=False,
            dec_bi_rnn=False,
            dec_drop_rnn=0.0,
            dec_gru=False,
            d_model=512,
            d_enc=512,
            d_k=64,
            pred_dropout=0.1,
            max_text_length=30,
            mask=True,
            pred_concat=True,
            **kwargs):
        super().__init__()

        self.num_classes = out_channels
        self.enc_bi_rnn = enc_bi_rnn
        self.d_k = d_k
        self.start_idx = out_channels - 2
        self.padding_idx = out_channels - 1
        self.max_seq_len = max_text_length
        self.mask = mask
        self.pred_concat = pred_concat

        encoder_rnn_out_size = d_enc * (int(enc_bi_rnn) + 1)
        decoder_rnn_out_size = encoder_rnn_out_size * (int(dec_bi_rnn) + 1)

        # 2D attention layer
        self.conv1x1_1 = nn.Linear(decoder_rnn_out_size, d_k)
        self.conv3x3_1 = nn.Conv2D(
            d_model, d_k, kernel_size=3, stride=1, padding=1)
        self.conv1x1_2 = nn.Linear(d_k, 1)

        # Decoder RNN layer
        if dec_bi_rnn:
            direction = 'bidirectional'
        else:
            direction = 'forward'

        kwargs = dict(
            input_size=encoder_rnn_out_size,
            hidden_size=encoder_rnn_out_size,
            num_layers=2,
            time_major=False,
            dropout=dec_drop_rnn,
            direction=direction)
        if dec_gru:
            self.rnn_decoder = nn.GRU(**kwargs)
        else:
            self.rnn_decoder = nn.LSTM(**kwargs)

        # Decoder input embedding
        self.embedding = nn.Embedding(
            self.num_classes,
            encoder_rnn_out_size,
            padding_idx=self.padding_idx)

        # Prediction layer
        self.pred_dropout = nn.Dropout(pred_dropout)
        pred_num_classes = self.num_classes - 1
        if pred_concat:
            fc_in_channel = decoder_rnn_out_size + d_model + d_enc
        else:
            fc_in_channel = d_model
        self.prediction = nn.Linear(fc_in_channel, pred_num_classes)

    def _2d_attention(self,
                      decoder_input,
                      feat,
                      holistic_feat,
                      valid_ratios=None):

        y = self.rnn_decoder(decoder_input)[0]
        # y: bsz * (seq_len + 1) * hidden_size

        attn_query = self.conv1x1_1(y)  # bsz * (seq_len + 1) * attn_size
        bsz, seq_len, attn_size = attn_query.shape
        attn_query = paddle.unsqueeze(attn_query, axis=[3, 4])
        # (bsz, seq_len + 1, attn_size, 1, 1)

        attn_key = self.conv3x3_1(feat)
        # bsz * attn_size * h * w
        attn_key = attn_key.unsqueeze(1)
        # bsz * 1 * attn_size * h * w

        attn_weight = paddle.tanh(paddle.add(attn_key, attn_query))
        # bsz * (seq_len + 1) * attn_size * h * w
        attn_weight = paddle.transpose(attn_weight, perm=[0, 1, 3, 4, 2])
        # bsz * (seq_len + 1) * h * w * attn_size
        attn_weight = self.conv1x1_2(attn_weight)
        # bsz * (seq_len + 1) * h * w * 1
        bsz, T, h, w, c = attn_weight.shape
        assert c == 1

        if valid_ratios is not None:
            # cal mask of attention weight
            for i, valid_ratio in enumerate(valid_ratios):
                valid_width = min(w, math.ceil(w * valid_ratio))
                attn_weight[i, :, :, valid_width:, :] = float('-inf')

        attn_weight = paddle.reshape(attn_weight, [bsz, T, -1])
        attn_weight = F.softmax(attn_weight, axis=-1)

        attn_weight = paddle.reshape(attn_weight, [bsz, T, h, w, c])
        attn_weight = paddle.transpose(attn_weight, perm=[0, 1, 4, 2, 3])
        # attn_weight: bsz * T * c * h * w
        # feat: bsz * c * h * w
        attn_feat = paddle.sum(paddle.multiply(feat.unsqueeze(1), attn_weight),
|
||||
(3, 4),
|
||||
keepdim=False)
|
||||
# bsz * (seq_len + 1) * C
|
||||
|
||||
# Linear transformation
|
||||
if self.pred_concat:
|
||||
hf_c = holistic_feat.shape[-1]
|
||||
holistic_feat = paddle.expand(
|
||||
holistic_feat, shape=[bsz, seq_len, hf_c])
|
||||
y = self.prediction(paddle.concat((y, attn_feat, holistic_feat), 2))
|
||||
else:
|
||||
y = self.prediction(attn_feat)
|
||||
# bsz * (seq_len + 1) * num_classes
|
||||
if self.train_mode:
|
||||
y = self.pred_dropout(y)
|
||||
|
||||
return y
|
||||
|
||||
    def forward_train(self, feat, out_enc, label, img_metas):
        '''
        img_metas: [label, valid_ratio]
        '''
        if img_metas is not None:
            assert len(img_metas[0]) == feat.shape[0]

        valid_ratios = None
        if img_metas is not None and self.mask:
            valid_ratios = img_metas[-1]

        label = label.cuda()
        lab_embedding = self.embedding(label)
        # bsz * seq_len * emb_dim
        out_enc = out_enc.unsqueeze(1)
        # bsz * 1 * emb_dim
        in_dec = paddle.concat((out_enc, lab_embedding), axis=1)
        # bsz * (seq_len + 1) * C
        out_dec = self._2d_attention(
            in_dec, feat, out_enc, valid_ratios=valid_ratios)
        # bsz * (seq_len + 1) * num_classes

        return out_dec[:, 1:, :]  # bsz * seq_len * num_classes

    def forward_test(self, feat, out_enc, img_metas):
        if img_metas is not None:
            assert len(img_metas[0]) == feat.shape[0]

        valid_ratios = None
        if img_metas is not None and self.mask:
            valid_ratios = img_metas[-1]

        seq_len = self.max_seq_len
        bsz = feat.shape[0]
        start_token = paddle.full(
            (bsz, ), fill_value=self.start_idx, dtype='int64')
        # bsz
        start_token = self.embedding(start_token)
        # bsz * emb_dim
        emb_dim = start_token.shape[1]
        start_token = start_token.unsqueeze(1)
        start_token = paddle.expand(start_token, shape=[bsz, seq_len, emb_dim])
        # bsz * seq_len * emb_dim
        out_enc = out_enc.unsqueeze(1)
        # bsz * 1 * emb_dim
        decoder_input = paddle.concat((out_enc, start_token), axis=1)
        # bsz * (seq_len + 1) * emb_dim

        outputs = []
        for i in range(1, seq_len + 1):
            decoder_output = self._2d_attention(
                decoder_input, feat, out_enc, valid_ratios=valid_ratios)
            char_output = decoder_output[:, i, :]  # bsz * num_classes
            char_output = F.softmax(char_output, -1)
            outputs.append(char_output)
            max_idx = paddle.argmax(char_output, axis=1, keepdim=False)
            char_embedding = self.embedding(max_idx)  # bsz * emb_dim
            if i < seq_len:
                decoder_input[:, i + 1, :] = char_embedding

        outputs = paddle.stack(outputs, 1)  # bsz * seq_len * num_classes

        return outputs
class SARHead(nn.Layer):
    def __init__(self,
                 out_channels,
                 enc_bi_rnn=False,
                 enc_drop_rnn=0.1,
                 enc_gru=False,
                 dec_bi_rnn=False,
                 dec_drop_rnn=0.0,
                 dec_gru=False,
                 d_k=512,
                 pred_dropout=0.1,
                 max_text_length=30,
                 pred_concat=True,
                 **kwargs):
        super(SARHead, self).__init__()

        # encoder module
        self.encoder = SAREncoder(
            enc_bi_rnn=enc_bi_rnn, enc_drop_rnn=enc_drop_rnn, enc_gru=enc_gru)

        # decoder module
        self.decoder = ParallelSARDecoder(
            out_channels=out_channels,
            enc_bi_rnn=enc_bi_rnn,
            dec_bi_rnn=dec_bi_rnn,
            dec_drop_rnn=dec_drop_rnn,
            dec_gru=dec_gru,
            d_k=d_k,
            pred_dropout=pred_dropout,
            max_text_length=max_text_length,
            pred_concat=pred_concat)

    def forward(self, feat, targets=None):
        '''
        img_metas: [label, valid_ratio]
        '''
        holistic_feat = self.encoder(feat, targets)  # bsz c

        if self.training:
            label = targets[0]  # label
            label = paddle.to_tensor(label, dtype='int64')
            final_out = self.decoder(
                feat, holistic_feat, label, img_metas=targets)
        if not self.training:
            final_out = self.decoder(
                feat,
                holistic_feat,
                label=None,
                img_metas=targets,
                train_mode=False)
            # (bsz, seq_len, num_classes)

        return final_out
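The valid-ratio masking inside `_2d_attention` above (columns beyond the unpadded part of the image are driven to `-inf` before the spatial softmax, so they receive zero attention) can be sketched outside Paddle with plain NumPy. The shapes and the `valid_ratio` value below are made-up example values, not taken from the diff:

```python
import math

import numpy as np

# Toy attention logits for one image: T decode steps over an h x w feature map.
T, h, w = 2, 4, 8
attn_weight = np.zeros((T, h, w))

# Mask columns beyond the valid (unpadded) width of the resized image,
# as _2d_attention does for each batch element.
valid_ratio = 0.5  # assumed example value
valid_width = min(w, math.ceil(w * valid_ratio))
attn_weight[:, :, valid_width:] = float('-inf')

# Softmax over all spatial positions of each decode step.
flat = attn_weight.reshape(T, -1)
flat = np.exp(flat - flat.max(axis=-1, keepdims=True))  # exp(-inf) -> 0
probs = (flat / flat.sum(axis=-1, keepdims=True)).reshape(T, h, w)

# Masked columns get exactly zero probability; the rest sum to 1 per step.
```

Because the mask is applied before normalization, padded positions contribute nothing to the glimpse feature `attn_feat`.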
@@ -25,7 +25,7 @@ from .db_postprocess import DBPostProcess, DistillationDBPostProcess
from .east_postprocess import EASTPostProcess
from .sast_postprocess import SASTPostProcess
from .rec_postprocess import CTCLabelDecode, AttnLabelDecode, SRNLabelDecode, DistillationCTCLabelDecode, NRTRLabelDecode, \
    TableLabelDecode, SARLabelDecode
from .cls_postprocess import ClsPostProcess
from .pg_postprocess import PGPostProcess

@@ -33,7 +33,8 @@ def build_post_process(config, global_config=None):
    support_dict = [
        'DBPostProcess', 'EASTPostProcess', 'SASTPostProcess', 'CTCLabelDecode',
        'AttnLabelDecode', 'ClsPostProcess', 'SRNLabelDecode', 'PGPostProcess',
        'DistillationCTCLabelDecode', 'TableLabelDecode',
        'DistillationDBPostProcess', 'NRTRLabelDecode', 'SARLabelDecode'
    ]

    config = copy.deepcopy(config)
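The `build_post_process` hunk above extends a simple name-based registry: the factory looks up `config['name']` in `support_dict` and instantiates that class with the remaining config keys. A minimal sketch of the same pattern, using a stand-in class rather than the real decoders (`DemoDecode` is hypothetical):

```python
import copy


class DemoDecode(object):
    """Stand-in for a post-process class such as SARLabelDecode."""

    def __init__(self, use_space_char=False, **kwargs):
        self.use_space_char = use_space_char


def build_post_process(config, support_dict):
    """Instantiate the post-process named by config['name']."""
    config = copy.deepcopy(config)
    module_name = config.pop('name')
    assert module_name in support_dict, \
        'post process only support {}'.format(list(support_dict))
    # PaddleOCR resolves the name to a class in the module namespace;
    # a dict lookup is the equivalent used in this sketch.
    return support_dict[module_name](**config)


op = build_post_process({'name': 'DemoDecode', 'use_space_char': True},
                        {'DemoDecode': DemoDecode})
```

Registering a new decoder (here, `SARLabelDecode`) therefore only requires importing it and adding its name to `support_dict`.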
@@ -15,6 +15,7 @@ import numpy as np
import string
import paddle
from paddle.nn import functional as F
import re


class BaseRecLabelDecode(object):
@@ -165,21 +166,21 @@ class NRTRLabelDecode(BaseRecLabelDecode):
                 use_space_char=True,
                 **kwargs):
        super(NRTRLabelDecode, self).__init__(character_dict_path,
                                              character_type, use_space_char)

    def __call__(self, preds, label=None, *args, **kwargs):
        if preds.dtype == paddle.int64:
            if isinstance(preds, paddle.Tensor):
                preds = preds.numpy()
            if preds[0][0] == 2:
                preds_idx = preds[:, 1:]
            else:
                preds_idx = preds
            text = self.decode(preds_idx)
            if label is None:
                return text
            label = self.decode(label[:, 1:])
        else:
            if isinstance(preds, paddle.Tensor):
                preds = preds.numpy()

@@ -188,13 +189,13 @@ class NRTRLabelDecode(BaseRecLabelDecode):
        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=False)
        if label is None:
            return text
        label = self.decode(label[:, 1:])
        return text, label

    def add_special_char(self, dict_character):
        dict_character = ['blank', '<unk>', '<s>', '</s>'] + dict_character
        return dict_character

    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
        """ convert text-index into text-label. """
        result_list = []
@@ -203,10 +204,11 @@ class NRTRLabelDecode(BaseRecLabelDecode):
            char_list = []
            conf_list = []
            for idx in range(len(text_index[batch_idx])):
                if text_index[batch_idx][idx] == 3:  # end
                    break
                try:
                    char_list.append(self.character[int(text_index[batch_idx][
                        idx])])
                except:
                    continue
                if text_prob is not None:

@@ -218,7 +220,6 @@ class NRTRLabelDecode(BaseRecLabelDecode):
        return result_list


class AttnLabelDecode(BaseRecLabelDecode):
    """ Convert between text-label and text-index """

@@ -256,7 +257,8 @@ class AttnLabelDecode(BaseRecLabelDecode):
                if idx > 0 and text_index[batch_idx][idx - 1] == text_index[
                        batch_idx][idx]:
                    continue
                char_list.append(self.character[int(text_index[batch_idx][
                    idx])])
                if text_prob is not None:
                    conf_list.append(text_prob[batch_idx][idx])
                else:
@@ -386,10 +388,9 @@ class SRNLabelDecode(BaseRecLabelDecode):
class TableLabelDecode(object):
    """ """

    def __init__(self, character_dict_path, **kwargs):
        list_character, list_elem = self.load_char_elem_dict(
            character_dict_path)
        list_character = self.add_special_char(list_character)
        list_elem = self.add_special_char(list_elem)
        self.dict_character = {}

@@ -408,7 +409,8 @@ class TableLabelDecode(object):
        list_elem = []
        with open(character_dict_path, "rb") as fin:
            lines = fin.readlines()
            substr = lines[0].decode('utf-8').strip("\n").strip("\r\n").split(
                "\t")
            character_num = int(substr[0])
            elem_num = int(substr[1])
            for cno in range(1, 1 + character_num):

@@ -428,14 +430,14 @@ class TableLabelDecode(object):
    def __call__(self, preds):
        structure_probs = preds['structure_probs']
        loc_preds = preds['loc_preds']
        if isinstance(structure_probs, paddle.Tensor):
            structure_probs = structure_probs.numpy()
        if isinstance(loc_preds, paddle.Tensor):
            loc_preds = loc_preds.numpy()
        structure_idx = structure_probs.argmax(axis=2)
        structure_probs = structure_probs.max(axis=2)
        structure_str, structure_pos, result_score_list, result_elem_idx_list = self.decode(
            structure_idx, structure_probs, 'elem')
        res_html_code_list = []
        res_loc_list = []
        batch_num = len(structure_str)

@@ -450,8 +452,13 @@ class TableLabelDecode(object):
        res_loc = np.array(res_loc)
        res_html_code_list.append(res_html_code)
        res_loc_list.append(res_loc)
        return {
            'res_html_code': res_html_code_list,
            'res_loc': res_loc_list,
            'res_score_list': result_score_list,
            'res_elem_idx_list': result_elem_idx_list,
            'structure_str_list': structure_str
        }

    def decode(self, text_index, structure_probs, char_or_elem):
        """convert text-label into text-index.
@@ -516,3 +523,82 @@ class TableLabelDecode(object):
        assert False, "Unsupport type %s in char_or_elem" \
            % char_or_elem
        return idx


class SARLabelDecode(BaseRecLabelDecode):
    """ Convert between text-label and text-index """

    def __init__(self,
                 character_dict_path=None,
                 character_type='ch',
                 use_space_char=False,
                 **kwargs):
        super(SARLabelDecode, self).__init__(character_dict_path,
                                             character_type, use_space_char)

        self.rm_symbol = kwargs.get('rm_symbol', False)

    def add_special_char(self, dict_character):
        beg_end_str = "<BOS/EOS>"
        unknown_str = "<UKN>"
        padding_str = "<PAD>"
        dict_character = dict_character + [unknown_str]
        self.unknown_idx = len(dict_character) - 1
        dict_character = dict_character + [beg_end_str]
        self.start_idx = len(dict_character) - 1
        self.end_idx = len(dict_character) - 1
        dict_character = dict_character + [padding_str]
        self.padding_idx = len(dict_character) - 1
        return dict_character

    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
        """ convert text-index into text-label. """
        result_list = []
        ignored_tokens = self.get_ignored_tokens()

        batch_size = len(text_index)
        for batch_idx in range(batch_size):
            char_list = []
            conf_list = []
            for idx in range(len(text_index[batch_idx])):
                if text_index[batch_idx][idx] in ignored_tokens:
                    continue
                if int(text_index[batch_idx][idx]) == int(self.end_idx):
                    if text_prob is None and idx == 0:
                        continue
                    else:
                        break
                if is_remove_duplicate:
                    # only for predict
                    if idx > 0 and text_index[batch_idx][idx - 1] == text_index[
                            batch_idx][idx]:
                        continue
                char_list.append(self.character[int(text_index[batch_idx][
                    idx])])
                if text_prob is not None:
                    conf_list.append(text_prob[batch_idx][idx])
                else:
                    conf_list.append(1)
            text = ''.join(char_list)
            if self.rm_symbol:
                comp = re.compile('[^A-Z^a-z^0-9^\u4e00-\u9fa5]')
                text = text.lower()
                text = comp.sub('', text)
            result_list.append((text, np.mean(conf_list)))
        return result_list

    def __call__(self, preds, label=None, *args, **kwargs):
        if isinstance(preds, paddle.Tensor):
            preds = preds.numpy()
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)

        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=False)

        if label is None:
            return text
        label = self.decode(label, is_remove_duplicate=False)
        return text, label

    def get_ignored_tokens(self):
        return [self.padding_idx]
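As a rough illustration of what a SAR-style `decode` does (skip padding, stop at the end token, average the kept per-character confidences), here is a standalone sketch. The five-symbol dictionary is made up; only the index layout (characters, then `<UKN>`, `<BOS/EOS>`, `<PAD>`) mirrors `add_special_char`:

```python
import numpy as np

# Hypothetical dictionary laid out as add_special_char does:
# ordinary characters, then <UKN>, then <BOS/EOS>, then <PAD>.
character = ['a', 'b', 'c', '<UKN>', '<BOS/EOS>', '<PAD>']
unknown_idx, end_idx, padding_idx = 3, 4, 5


def simple_sar_decode(indices, probs):
    """Decode one sequence of class indices into (text, mean_confidence)."""
    chars, confs = [], []
    for idx, prob in zip(indices, probs):
        if idx == padding_idx:  # ignored token
            continue
        if idx == end_idx:      # stop at <BOS/EOS>
            break
        chars.append(character[idx])
        confs.append(prob)
    return ''.join(chars), float(np.mean(confs))


text, score = simple_sar_decode([0, 1, 2, 4, 5], [0.9, 0.8, 0.7, 0.99, 1.0])
# text == 'abc'; score averages only the confidences kept before the end token
```

The real class additionally handles duplicate removal and the optional `rm_symbol` regex filtering.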
@@ -0,0 +1,126 @@
Global:
  use_gpu: false
  epoch_num: 5
  log_smooth_window: 20
  print_batch_step: 1
  save_model_dir: ./output/db_mv3/
  save_epoch_step: 1200
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 400]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt

Architecture:
  model_type: det
  algorithm: DB
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
    disable_se: True
  Neck:
    name: DBFPN
    out_channels: 96
  Head:
    name: DBHead
    k: 50

Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3

Optimizer:
  name: Adam #Momentum
  #momentum: 0.9
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/icdar2015/text_localization/
    label_file_list:
      - ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - Resize:
          # size: [640, 640]
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1
    num_workers: 0
    use_shared_memory: False

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/icdar2015/text_localization/
    label_file_list:
      - ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
          image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 0
    use_shared_memory: False
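In these test configs `eval_batch_step: [0, 400]` is a `[start_iter, interval]` pair (the inline "every 2000 iterations" comment appears to be copied from another config), so evaluation runs once training reaches `start_iter` and then every `interval` iterations. A small sketch of that trigger, under the assumption that this is the semantics the trainer applies:

```python
def should_eval(global_step, eval_batch_step):
    """Return True when evaluation should run at this iteration.

    eval_batch_step is a [start_iter, interval] pair, as in the
    Global section above (assumed semantics).
    """
    start_iter, interval = eval_batch_step
    if global_step < start_iter:
        return False
    return (global_step - start_iter) % interval == 0


# With [0, 400] over a 1200-iteration run, evaluation fires three times.
steps = [s for s in range(1, 1201) if should_eval(s, [0, 400])]
```

Setting a non-zero `start_iter` is the usual way to skip expensive evaluation during early, noisy iterations.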
@@ -0,0 +1,124 @@
Global:
  use_gpu: false
  epoch_num: 5
  log_smooth_window: 20
  print_batch_step: 1
  save_model_dir: ./output/db_mv3/
  save_epoch_step: 1200
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 400]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt

Architecture:
  model_type: det
  algorithm: DB
  Transform:
  Backbone:
    name: ResNet #MobileNetV3
    layers: 50
  Neck:
    name: DBFPN
    out_channels: 256
  Head:
    name: DBHead
    k: 50

Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5 #5
  beta: 10 #10
  ohem_ratio: 3

Optimizer:
  name: Adam #Momentum
  #momentum: 0.9
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/icdar2015/text_localization/
    label_file_list:
      - ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - Resize:
          # size: [640, 640]
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1
    num_workers: 0
    use_shared_memory: False

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/icdar2015/text_localization/
    label_file_list:
      - ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
          image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 0
    use_shared_memory: False
@@ -13,34 +13,34 @@ train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:norm_train|pact_train
norm_train:tools/train.py -c tests/configs/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
pact_train:deploy/slim/quantization/quant.py -c tests/configs/det_mv3_db.yml -o
fpgm_train:deploy/slim/prune/sensitivity_anal.py -c tests/configs/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/det_mv3_db_v2.0_train/best_accuracy
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c tests/configs/det_mv3_db.yml -o
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.pretrained_model:
norm_export:tools/export_model.py -c tests/configs/det_mv3_db.yml -o
quant_export:deploy/slim/quantization/export_model.py -c tests/configs/det_mv3_db.yml -o
fpgm_export:deploy/slim/prune/export_prune_model.py -c tests/configs/det_mv3_db.yml -o
distill_export:null
export1:null
export2:null
##
train_model:./inference/ch_ppocr_mobile_v2.0_det_train/best_accuracy
infer_export:tools/export_model.py -c configs/det/det_mv3_db.yml -o
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
@@ -62,6 +62,21 @@ inference:./deploy/cpp_infer/build/ppocr det
--precision:fp32|fp16
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
===========================serving_params===========================
trans_model:-m paddle_serving_client.convert
--dirname:./inference/ch_ppocr_mobile_v2.0_det_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--serving_server:./deploy/pdserving/ppocr_det_mobile_2.0_serving/
--serving_client:./deploy/pdserving/ppocr_det_mobile_2.0_client/
serving_dir:./deploy/pdserving
web_service:web_service_det.py --config=config.yml --opt op.det.concurrency=1
op.det.local_service_conf.devices:null|0
op.det.local_service_conf.use_mkldnn:True|False
op.det.local_service_conf.thread_num:1|6
op.det.local_service_conf.use_trt:False|True
op.det.local_service_conf.precision:fp32|fp16|int8
pipline:pipeline_http_client.py --image_dir=../../doc/imgs
@@ -13,7 +13,7 @@ train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:norm_train|pact_train
norm_train:tools/train.py -c tests/configs/det_r50_vd_db.yml -o Global.pretrained_model=""
pact_train:null
fpgm_train:null
distill_train:null
@@ -21,13 +21,13 @@ null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c tests/configs/det_r50_vd_db.yml -o
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.pretrained_model:
norm_export:tools/export_model.py -c tests/configs/det_r50_vd_db.yml -o
quant_export:null
fpgm_export:null
distill_export:null
@@ -0,0 +1,67 @@
===========================train_params===========================
model_name:ocr_system
python:python3.7
gpu_list:null
Global.use_gpu:null
Global.auto_cast:null
Global.epoch_num:null
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:null
Global.pretrained_model:null
train_model_name:null
train_infer_img_dir:null
null:null
##
trainer:
norm_train:null
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.pretrained_model:
norm_export:null
quant_export:null
fpgm_export:null
distill_export:null
export1:null
export2:null
##
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_export:null
infer_quant:False
inference:tools/infer/predict_system.py
--use_gpu:True
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
--save_log_path:null
--benchmark:True
--rec_model_dir:./inference/ch_ppocr_mobile_v2.0_rec_infer/
===========================cpp_infer_params===========================
use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_quant:False
inference:./deploy/cpp_infer/build/ppocr system
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
--rec_model_dir:./inference/ch_ppocr_mobile_v2.0_rec_infer/
--benchmark:True
@@ -1,7 +1,7 @@
===========================train_params===========================
model_name:ocr_rec
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:null
Global.epoch_num:lite_train_infer=2|whole_train_infer=300
@@ -9,7 +9,7 @@ Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_infer=128|whole_train_infer=128
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./inference/rec_inference
null:null
##
trainer:norm_train|pact_train
@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1|6
--use_tensorrt:True|False
--precision:fp32|fp16|int8
--rec_model_dir:
@@ -49,3 +49,18 @@ inference:tools/infer/predict_rec.py
--save_log_path:./test/output/
--benchmark:True
null:null
===========================cpp_infer_params===========================
use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_rec_infer/
infer_quant:False
inference:./deploy/cpp_infer/build/ppocr rec
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16
--rec_model_dir:
--image_dir:./inference/rec_inference/
null:null
--benchmark:True

tests/prepare.sh
@@ -1,6 +1,7 @@
#!/bin/bash
FILENAME=$1

# MODE be one of ['lite_train_infer' 'whole_infer' 'whole_train_infer', 'infer', 'cpp_infer', 'serving_infer']
MODE=$2

dataline=$(cat ${FILENAME})
@@ -40,11 +41,13 @@ if [ ${MODE} = "lite_train_infer" ];then
    rm -rf ./train_data/ic15_data
    wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/icdar2015_lite.tar
    wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ic15_data.tar  # todo change to bcebos
    wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/rec_inference.tar
    wget -nc -P ./deploy/slim/prune https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/sen.pickle

    cd ./train_data/ && tar xf icdar2015_lite.tar && tar xf ic15_data.tar
    ln -s ./icdar2015_lite ./icdar2015
    cd ../
    cd ./inference && tar xf rec_inference.tar && cd ../
elif [ ${MODE} = "whole_train_infer" ];then
    wget -nc -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
    rm -rf ./train_data/icdar2015
@@ -61,64 +64,99 @@ elif [ ${MODE} = "whole_infer" ];then
    cd ./train_data/ && tar xf icdar2015_infer.tar && tar xf ic15_data.tar
    ln -s ./icdar2015_infer ./icdar2015
    cd ../
elif [ ${MODE} = "infer" ] || [ ${MODE} = "cpp_infer" ];then
elif [ ${MODE} = "infer" ];then
    if [ ${model_name} = "ocr_det" ]; then
        eval_model_name="ch_ppocr_mobile_v2.0_det_infer"
        eval_model_name="ch_ppocr_mobile_v2.0_det_train"
        rm -rf ./train_data/icdar2015
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar
        cd ./inference && tar xf ${eval_model_name}.tar && tar xf ch_det_data_50.tar && cd ../
    elif [ ${model_name} = "ocr_server_det" ]; then
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
        cd ./inference && tar xf ch_ppocr_server_v2.0_det_infer.tar && tar xf ch_det_data_50.tar && cd ../
    elif [ ${model_name} = "ocr_system" ]; then
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
        cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_det_data_50.tar && cd ../
    else
        rm -rf ./train_data/ic15_data
        eval_model_name="ch_ppocr_mobile_v2.0_rec_infer"
        wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ic15_data.tar
        wget -nc -P ./inference/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/rec_inference.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
        cd ./inference && tar xf ${eval_model_name}.tar && tar xf ic15_data.tar && cd ../
        cd ./inference && tar xf ${eval_model_name}.tar && tar xf rec_inference.tar && cd ../
    fi
elif [ ${MODE} = "cpp_infer" ];then
    if [ ${model_name} = "ocr_det" ]; then
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
        cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_det_data_50.tar && cd ../
    elif [ ${model_name} = "ocr_rec" ]; then
        wget -nc -P ./inference/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/rec_inference.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
        cd ./inference && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf rec_inference.tar && cd ../
    elif [ ${model_name} = "ocr_system" ]; then
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
        wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
        cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_det_data_50.tar && cd ../
    fi
fi

if [ ${MODE} = "serving_infer" ];then
    # prepare serving env
    python_name=$(func_parser_value "${lines[2]}")
    ${python_name} -m pip install paddle-serving-server-gpu==0.6.1.post101
    ${python_name} -m pip install paddle_serving_client==0.6.1
    ${python_name} -m pip install paddle-serving-app==0.6.1
    wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
    wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
    cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && cd ../
fi

if [ ${MODE} = "cpp_infer" ];then
    cd deploy/cpp_infer
    use_opencv=$(func_parser_value "${lines[52]}")
    if [ ${use_opencv} = "True" ]; then
        echo "################### build opencv ###################"
        rm -rf 3.4.7.tar.gz opencv-3.4.7/
        wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz
        tar -xf 3.4.7.tar.gz
        if [ -d "opencv-3.4.7/opencv3/" ] && [ $(md5sum opencv-3.4.7.tar.gz | awk -F ' ' '{print $1}') = "faa2b5950f8bee3f03118e600c74746a" ];then
            echo "################### build opencv skipped ###################"
        else
            echo "################### build opencv ###################"
            rm -rf opencv-3.4.7.tar.gz opencv-3.4.7/
            wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/opencv-3.4.7.tar.gz
            tar -xf opencv-3.4.7.tar.gz

        cd opencv-3.4.7/
        install_path=$(pwd)/opencv-3.4.7/opencv3
        cd opencv-3.4.7/
        install_path=$(pwd)/opencv3

        rm -rf build
        mkdir build
        cd build
        rm -rf build
        mkdir build
        cd build

        cmake .. \
            -DCMAKE_INSTALL_PREFIX=${install_path} \
            -DCMAKE_BUILD_TYPE=Release \
            -DBUILD_SHARED_LIBS=OFF \
            -DWITH_IPP=OFF \
            -DBUILD_IPP_IW=OFF \
            -DWITH_LAPACK=OFF \
            -DWITH_EIGEN=OFF \
            -DCMAKE_INSTALL_LIBDIR=lib64 \
            -DWITH_ZLIB=ON \
            -DBUILD_ZLIB=ON \
            -DWITH_JPEG=ON \
            -DBUILD_JPEG=ON \
            -DWITH_PNG=ON \
            -DBUILD_PNG=ON \
            -DWITH_TIFF=ON \
            -DBUILD_TIFF=ON
        cmake .. \
            -DCMAKE_INSTALL_PREFIX=${install_path} \
            -DCMAKE_BUILD_TYPE=Release \
            -DBUILD_SHARED_LIBS=OFF \
            -DWITH_IPP=OFF \
            -DBUILD_IPP_IW=OFF \
            -DWITH_LAPACK=OFF \
            -DWITH_EIGEN=OFF \
            -DCMAKE_INSTALL_LIBDIR=lib64 \
            -DWITH_ZLIB=ON \
            -DBUILD_ZLIB=ON \
            -DWITH_JPEG=ON \
            -DBUILD_JPEG=ON \
            -DWITH_PNG=ON \
            -DBUILD_PNG=ON \
            -DWITH_TIFF=ON \
            -DBUILD_TIFF=ON

        make -j
        make install
        cd ../
        echo "################### build opencv finished ###################"
        make -j
        make install
        cd ../
        echo "################### build opencv finished ###################"
    fi
fi

@@ -149,4 +187,4 @@ if [ ${MODE} = "cpp_infer" ];then
    make -j
    echo "################### build PaddleOCR demo finished ###################"
fi
fi
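The OpenCV step above skips a rebuild when an extracted `opencv3/` directory already exists and the downloaded tarball matches a hard-coded md5. A small sketch of that guard (`already_built` is a hypothetical helper; the real script inlines the test, and the file names and hash below are illustrative):

```shell
# Skip an expensive build step when a previously built directory exists
# and the source archive's checksum matches the expected value.
already_built() {
    local dir=$1 tarball=$2 expected_md5=$3
    [ -d "$dir" ] && [ "$(md5sum "$tarball" | awk '{print $1}')" = "$expected_md5" ]
}

printf 'demo payload' > demo.tar.gz
mkdir -p demo_build
expected=$(md5sum demo.tar.gz | awk '{print $1}')
if already_built demo_build demo.tar.gz "$expected"; then
    echo "build skipped"
fi
```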


@@ -23,36 +23,46 @@ test.sh is used together with params.txt to take the lightweight OCR detection and recognition models from

```bash
tests/
├── ocr_det_params.txt          # parameter config file for testing the OCR detection model
├── ocr_rec_params.txt          # parameter config file for testing the OCR recognition model
├── ocr_ppocr_mobile_params.txt # parameter config file for testing the chained OCR detection + recognition pipeline
├── prepare.sh                  # downloads the data and models needed to run test.sh
└── test.sh                     # main test script
```

# Usage

test.sh supports several run modes; each mode uses different data, and the modes are used to test speed and accuracy:
- Mode 1: lite_train_infer, train with a small amount of data, to quickly verify that the training-to-inference pipeline runs end to end; accuracy and speed are not checked;
```shell
bash tests/prepare.sh ./tests/ocr_det_params.txt 'lite_train_infer'
bash tests/test.sh ./tests/ocr_det_params.txt 'lite_train_infer'
```

- Mode 2: whole_infer, train with a small amount of data and predict on a moderate amount of data, to verify that the trained model runs inference and that inference speed is reasonable;
```shell
bash tests/prepare.sh ./tests/ocr_det_params.txt 'whole_infer'
bash tests/test.sh ./tests/ocr_det_params.txt 'whole_infer'
```

- Mode 3: infer, no training; predict on the full dataset to exercise open-source model evaluation and dynamic-to-static conversion, and to check the inference model's prediction time and accuracy;
```shell
bash tests/prepare.sh ./tests/ocr_det_params.txt 'infer'
# Usage 1:
bash tests/test.sh ./tests/ocr_det_params.txt 'infer'
# Usage 2: predict on a specified GPU; the third argument is the GPU id
bash tests/test.sh ./tests/ocr_det_params.txt 'infer' '1'
```

- Mode 4: whole_train_infer, CE: train and predict on the full dataset, to verify model training accuracy, prediction accuracy, and prediction speed;
```shell
bash tests/prepare.sh ./tests/ocr_det_params.txt 'whole_train_infer'
bash tests/test.sh ./tests/ocr_det_params.txt 'whole_train_infer'
```

- Mode 5: cpp_infer, CE: verify that C++ inference of the inference model runs;
```shell
bash tests/prepare.sh ./tests/ocr_det_params.txt 'cpp_infer'
bash tests/test.sh ./tests/ocr_det_params.txt 'cpp_infer'
```
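The modes above can also be driven in sequence. A dry-run sketch that prints the command pair for every supported mode instead of executing it (each real run downloads data and models, so this only echoes):

```shell
# Print the prepare/test command pair for every supported run mode.
for mode in lite_train_infer whole_infer infer whole_train_infer cpp_infer; do
    echo "bash tests/prepare.sh ./tests/ocr_det_params.txt '$mode'"
    echo "bash tests/test.sh ./tests/ocr_det_params.txt '$mode'"
done
```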
File diff suppressed because one or more lines are too long
tests/test.sh

@@ -144,6 +144,32 @@ benchmark_key=$(func_parser_key "${lines[49]}")
benchmark_value=$(func_parser_value "${lines[49]}")
infer_key1=$(func_parser_key "${lines[50]}")
infer_value1=$(func_parser_value "${lines[50]}")
# parser serving
trans_model_py=$(func_parser_value "${lines[67]}")
infer_model_dir_key=$(func_parser_key "${lines[68]}")
infer_model_dir_value=$(func_parser_value "${lines[68]}")
model_filename_key=$(func_parser_key "${lines[69]}")
model_filename_value=$(func_parser_value "${lines[69]}")
params_filename_key=$(func_parser_key "${lines[70]}")
params_filename_value=$(func_parser_value "${lines[70]}")
serving_server_key=$(func_parser_key "${lines[71]}")
serving_server_value=$(func_parser_value "${lines[71]}")
serving_client_key=$(func_parser_key "${lines[72]}")
serving_client_value=$(func_parser_value "${lines[72]}")
serving_dir_value=$(func_parser_value "${lines[73]}")
web_service_py=$(func_parser_value "${lines[74]}")
web_use_gpu_key=$(func_parser_key "${lines[75]}")
web_use_gpu_list=$(func_parser_value "${lines[75]}")
web_use_mkldnn_key=$(func_parser_key "${lines[76]}")
web_use_mkldnn_list=$(func_parser_value "${lines[76]}")
web_cpu_threads_key=$(func_parser_key "${lines[77]}")
web_cpu_threads_list=$(func_parser_value "${lines[77]}")
web_use_trt_key=$(func_parser_key "${lines[78]}")
web_use_trt_list=$(func_parser_value "${lines[78]}")
web_precision_key=$(func_parser_key "${lines[79]}")
web_precision_list=$(func_parser_value "${lines[79]}")
pipeline_py=$(func_parser_value "${lines[80]}")


if [ ${MODE} = "cpp_infer" ]; then
    # parser cpp inference model
@@ -166,7 +192,8 @@ if [ ${MODE} = "cpp_infer" ]; then
    cpp_infer_model_key=$(func_parser_key "${lines[62]}")
    cpp_image_dir_key=$(func_parser_key "${lines[63]}")
    cpp_infer_img_dir=$(func_parser_value "${lines[63]}")
    cpp_save_log_key=$(func_parser_key "${lines[64]}")
    cpp_infer_key1=$(func_parser_key "${lines[64]}")
    cpp_infer_value1=$(func_parser_value "${lines[64]}")
    cpp_benchmark_key=$(func_parser_key "${lines[65]}")
    cpp_benchmark_value=$(func_parser_value "${lines[65]}")
fi

@@ -244,6 +271,81 @@ function func_inference(){
        fi
    done
}

function func_serving(){
    IFS='|'
    _python=$1
    _script=$2
    _model_dir=$3
    # pdserving
    set_dirname=$(func_set_params "${infer_model_dir_key}" "${infer_model_dir_value}")
    set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
    set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
    set_serving_server=$(func_set_params "${serving_server_key}" "${serving_server_value}")
    set_serving_client=$(func_set_params "${serving_client_key}" "${serving_client_value}")
    trans_model_cmd="${python} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
    eval $trans_model_cmd
    cd ${serving_dir_value}
    echo $PWD
    unset https_proxy
    unset http_proxy
    for use_gpu in ${web_use_gpu_list[*]}; do
        echo ${use_gpu}
        if [ ${use_gpu} = "null" ]; then
            for use_mkldnn in ${web_use_mkldnn_list[*]}; do
                if [ ${use_mkldnn} = "False" ]; then
                    continue
                fi
                for threads in ${web_cpu_threads_list[*]}; do
                    _save_log_path="${_log_path}/server_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_1.log"
                    set_cpu_threads=$(func_set_params "${web_cpu_threads_key}" "${threads}")
                    web_service_cmd="${python} ${web_service_py} ${web_use_gpu_key}=${use_gpu} ${web_use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} &>${_save_log_path} &"
                    eval $web_service_cmd
                    sleep 2s
                    pipeline_cmd="${python} ${pipeline_py}"
                    eval $pipeline_cmd
                    last_status=${PIPESTATUS[0]}
                    eval "cat ${_save_log_path}"
                    status_check $last_status "${pipeline_cmd}" "${status_log}"
                    PID=$!
                    kill $PID
                    sleep 2s
                    ps ux | grep -E 'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9
                done
            done
        elif [ ${use_gpu} = "0" ]; then
            for use_trt in ${web_use_trt_list[*]}; do
                for precision in ${web_precision_list[*]}; do
                    if [[ ${_flag_quant} = "False" ]] && [[ ${precision} =~ "int8" ]]; then
                        continue
                    fi
                    if [[ ${precision} =~ "fp16" || ${precision} =~ "int8" ]] && [ ${use_trt} = "False" ]; then
                        continue
                    fi
                    if [[ ${use_trt} = "False" || ${precision} =~ "int8" ]]; then
                        continue
                    fi
                    _save_log_path="${_log_path}/infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_1.log"
                    set_tensorrt=$(func_set_params "${web_use_trt_key}" "${use_trt}")
                    set_precision=$(func_set_params "${web_precision_key}" "${precision}")
                    web_service_cmd="${python} ${web_service_py} ${web_use_gpu_key}=${use_gpu} ${set_tensorrt} ${set_precision} &>${_save_log_path} & "
                    eval $web_service_cmd
                    sleep 2s
                    pipeline_cmd="${python} ${pipeline_py}"
                    eval $pipeline_cmd
                    last_status=${PIPESTATUS[0]}
                    eval "cat ${_save_log_path}"
                    status_check $last_status "${pipeline_cmd}" "${status_log}"
                    PID=$!
                    kill $PID
                    sleep 2s
                    ps ux | grep -E 'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9
                done
            done
        else
            echo "Does not support hardware other than CPU and GPU currently!"
        fi
    done
}

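Every `set_*` variable above is built with `func_set_params`, which turns a parsed key/value pair into a `key=value` CLI flag so that `null` keys drop out of the assembled command. A hypothetical stand-in for that helper (the repo's exact version may differ):

```shell
# Turn a (key, value) pair into a "key=value" CLI flag; a "null" or empty
# key yields only whitespace, so the flag disappears from the command line.
func_set_params() {
    local key=$1 value=$2
    if [ "$key" = "null" ] || [ -z "$key" ]; then
        echo " "
    else
        echo "${key}=${value}"
    fi
}

func_set_params "--cpu_threads" "6"   # -> --cpu_threads=6
func_set_params "null" "ignored"      # -> (whitespace only)
```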
function func_cpp_inference(){
    IFS='|'

@@ -267,7 +369,8 @@ function func_cpp_inference(){
                set_batchsize=$(func_set_params "${cpp_batch_size_key}" "${batch_size}")
                set_cpu_threads=$(func_set_params "${cpp_cpu_threads_key}" "${threads}")
                set_model_dir=$(func_set_params "${cpp_infer_model_key}" "${_model_dir}")
                command="${_script} ${cpp_use_gpu_key}=${use_gpu} ${cpp_use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} > ${_save_log_path} 2>&1 "
                set_infer_params1=$(func_set_params "${cpp_infer_key1}" "${cpp_infer_value1}")
                command="${_script} ${cpp_use_gpu_key}=${use_gpu} ${cpp_use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 "
                eval $command
                last_status=${PIPESTATUS[0]}
                eval "cat ${_save_log_path}"

@@ -295,7 +398,8 @@ function func_cpp_inference(){
                set_tensorrt=$(func_set_params "${cpp_use_trt_key}" "${use_trt}")
                set_precision=$(func_set_params "${cpp_precision_key}" "${precision}")
                set_model_dir=$(func_set_params "${cpp_infer_model_key}" "${_model_dir}")
                command="${_script} ${cpp_use_gpu_key}=${use_gpu} ${set_tensorrt} ${set_precision} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} > ${_save_log_path} 2>&1 "
                set_infer_params1=$(func_set_params "${cpp_infer_key1}" "${cpp_infer_value1}")
                command="${_script} ${cpp_use_gpu_key}=${use_gpu} ${set_tensorrt} ${set_precision} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 "
                eval $command
                last_status=${PIPESTATUS[0]}
                eval "cat ${_save_log_path}"
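After each `eval $command`, the harness reads `${PIPESTATUS[0]}` rather than `$?`. A minimal bash sketch of why: when the command string ends in a pipeline or redirection stage, `$?` reports the last stage, while `PIPESTATUS[0]` keeps the exit code of the stage that actually ran the inference.

```shell
# PIPESTATUS (bash) records the exit code of every stage of the last
# pipeline; plain $? would only report the final (successful) `cat`.
false | cat
status=("${PIPESTATUS[@]}")
echo "pipeline stages exited with: ${status[*]}"   # prints: pipeline stages exited with: 1 0
```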

@@ -332,9 +436,7 @@ if [ ${MODE} = "infer" ]; then
        export_cmd="${python} ${norm_export} ${set_export_weight} ${set_save_infer_key}"
        eval $export_cmd
        status_export=$?
        if [ ${status_export} = 0 ];then
            status_check $status_export "${export_cmd}" "${status_log}"
        fi
        status_check $status_export "${export_cmd}" "${status_log}"
    else
        save_infer_dir=${infer_model}
    fi

@@ -362,6 +464,20 @@ elif [ ${MODE} = "cpp_infer" ]; then
        func_cpp_inference "${inference_cmd}" "${infer_model}" "${LOG_PATH}" "${cpp_infer_img_dir}" ${is_quant}
        Count=$(($Count + 1))
    done

elif [ ${MODE} = "serving_infer" ]; then
    GPUID=$3
    if [ ${#GPUID} -le 0 ];then
        env=" "
    else
        env="export CUDA_VISIBLE_DEVICES=${GPUID}"
    fi
    # set CUDA_VISIBLE_DEVICES
    eval $env
    export Count=0
    IFS="|"
    # run serving
    func_serving "${web_service_cmd}"

else
    IFS="|"
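The serving branch above builds an `export CUDA_VISIBLE_DEVICES=...` string only when an optional GPU id was passed as `$3`, then `eval`s it. A sketch of that selection as a function (`gpu_env_for` is a hypothetical name; the script inlines this logic):

```shell
# Return the env-setup command for an optional GPU id: empty input means
# "leave the environment alone", anything else pins CUDA_VISIBLE_DEVICES.
gpu_env_for() {
    local gpuid=$1
    if [ ${#gpuid} -le 0 ]; then
        echo " "
    else
        echo "export CUDA_VISIBLE_DEVICES=${gpuid}"
    fi
}

gpu_env_for "1"   # -> export CUDA_VISIBLE_DEVICES=1
```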


@@ -55,6 +55,7 @@ def main():

    model = build_model(config['Architecture'])
    use_srn = config['Architecture']['algorithm'] == "SRN"
    use_sar = config['Architecture']['algorithm'] == "SAR"
    if "model_type" in config['Architecture'].keys():
        model_type = config['Architecture']['model_type']
    else:

@@ -71,7 +72,7 @@ def main():

    # start eval
    metric = program.eval(model, valid_dataloader, post_process_class,
                          eval_class, model_type, use_srn)
                          eval_class, model_type, use_srn, use_sar)
    logger.info('metric eval ***************')
    for k, v in metric.items():
        logger.info('{}:{}'.format(k, v))
@@ -141,6 +141,7 @@ if __name__ == "__main__":
        img, flag = check_and_read_gif(image_file)
        if not flag:
            img = cv2.imread(image_file)
            img = img[:, :, ::-1]
        if img is None:
            logger.info("error in loading image:{}".format(image_file))
            continue

@@ -173,6 +173,9 @@ def main(args):

    logger.info("The predict total time is {}".format(time.time() - _st))
    logger.info("\nThe predict total time is {}".format(total_time))
    if args.benchmark:
        text_sys.text_detector.autolog.report()
        text_sys.text_recognizer.autolog.report()


if __name__ == "__main__":

@@ -236,11 +236,11 @@ def create_predictor(args, mode, logger):
            max_input_shape.update(max_pact_shape)
            opt_input_shape.update(opt_pact_shape)
        elif mode == "rec":
            min_input_shape = {"x": [args.rec_batch_num, 3, 32, 10]}
            min_input_shape = {"x": [1, 3, 32, 10]}
            max_input_shape = {"x": [args.rec_batch_num, 3, 32, 2000]}
            opt_input_shape = {"x": [args.rec_batch_num, 3, 32, 320]}
        elif mode == "cls":
            min_input_shape = {"x": [args.rec_batch_num, 3, 48, 10]}
            min_input_shape = {"x": [1, 3, 48, 10]}
            max_input_shape = {"x": [args.rec_batch_num, 3, 48, 2000]}
            opt_input_shape = {"x": [args.rec_batch_num, 3, 48, 320]}
        else:

@@ -74,6 +74,10 @@ def main():
                    'image', 'encoder_word_pos', 'gsrm_word_pos',
                    'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2'
                ]
            elif config['Architecture']['algorithm'] == "SAR":
                op[op_name]['keep_keys'] = [
                    'image', 'valid_ratio'
                ]
            else:
                op[op_name]['keep_keys'] = ['image']
        transforms.append(op)

@@ -106,11 +110,16 @@ def main():
                paddle.to_tensor(gsrm_slf_attn_bias1_list),
                paddle.to_tensor(gsrm_slf_attn_bias2_list)
            ]
        if config['Architecture']['algorithm'] == "SAR":
            valid_ratio = np.expand_dims(batch[-1], axis=0)
            img_metas = [paddle.to_tensor(valid_ratio)]

        images = np.expand_dims(batch[0], axis=0)
        images = paddle.to_tensor(images)
        if config['Architecture']['algorithm'] == "SRN":
            preds = model(images, others)
        elif config['Architecture']['algorithm'] == "SAR":
            preds = model(images, img_metas)
        else:
            preds = model(images)
        post_result = post_process_class(preds)

@@ -187,7 +187,7 @@ def train(config,

    use_srn = config['Architecture']['algorithm'] == "SRN"
    use_nrtr = config['Architecture']['algorithm'] == "NRTR"

    use_sar = config['Architecture']['algorithm'] == 'SAR'
    try:
        model_type = config['Architecture']['model_type']
    except:

@@ -215,7 +215,7 @@ def train(config,
            images = batch[0]
            if use_srn:
                model_average = True
            if use_srn or model_type == 'table' or use_nrtr:
            if use_srn or model_type == 'table' or use_nrtr or use_sar:
                preds = model(images, data=batch[1:])
            else:
                preds = model(images)

@@ -279,7 +279,8 @@ def train(config,
                                    post_process_class,
                                    eval_class,
                                    model_type,
                                    use_srn=use_srn)
                                    use_srn=use_srn,
                                    use_sar=use_sar)
                cur_metric_str = 'cur metric, {}'.format(', '.join(
                    ['{}: {}'.format(k, v) for k, v in cur_metric.items()]))
                logger.info(cur_metric_str)

@@ -351,7 +352,8 @@ def eval(model,
         post_process_class,
         eval_class,
         model_type,
         use_srn=False):
         use_srn=False,
         use_sar=False):
    model.eval()
    with paddle.no_grad():
        total_frame = 0.0

@@ -364,7 +366,7 @@ def eval(model,
                break
            images = batch[0]
            start = time.time()
            if use_srn or model_type == 'table':
            if use_srn or model_type == 'table' or use_sar:
                preds = model(images, data=batch[1:])
            else:
                preds = model(images)

@@ -400,7 +402,7 @@ def preprocess(is_train=False):
    alg = config['Architecture']['algorithm']
    assert alg in [
        'EAST', 'DB', 'SAST', 'Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN',
        'CLS', 'PGNet', 'Distillation', 'NRTR', 'TableAttn'
        'CLS', 'PGNet', 'Distillation', 'NRTR', 'TableAttn', 'SAR'
    ]

    device = 'gpu:{}'.format(dist.ParallelEnv().dev_id) if use_gpu else 'cpu'