269
README.md
|
@ -1,209 +1,138 @@
|
|||
[English](README_en.md) | 简体中文
|
||||
English | [简体中文](README_ch.md)
|
||||
|
||||
## 简介
|
||||
PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。
|
||||
## Introduction
|
||||
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and put them into practice.
|
||||
|
||||
**近期更新**
|
||||
- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./doc/doc_ch/FAQ.md)
|
||||
- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./doc/doc_ch/whl.md)
|
||||
- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519)
|
||||
- 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294)
|
||||
- 2020.7.23 发布7月21日B站直播课回放和PPT,课节1,PaddleOCR开源大礼包全面解读,[获取地址](https://aistudio.baidu.com/aistudio/course/introduce/1519)
|
||||
- 2020.7.15 添加基于EasyEdge和Paddle-Lite的移动端DEMO,支持iOS和Android系统
|
||||
- [more](./doc/doc_ch/update.md)
|
||||
**Recent updates**
|
||||
|
||||
- 2020.9.19 Released the ultra-lightweight compressed ppocr_mobile_slim series models. The overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR-Pipline)), which makes them suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list)
|
||||
- 2020.9.17 Released the ultra-lightweight ppocr_mobile series and the general ppocr_server series of Chinese and English OCR models, whose results are comparable to commercial products. [Model Downloads](#Supported-Chinese-model-list)
|
||||
- 2020.8.24 Added support for using PaddleOCR through whl package installation; please refer to [PaddleOCR Package](./doc/doc_en/whl_en.md)
|
||||
- 2020.8.21 Published the replay and slides of the August 18 live lesson on Bilibili (lesson 2: an easy-to-learn, easy-to-use OCR toolkit). [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
|
||||
- [more](./doc/doc_en/update_en.md)
|
||||
|
||||
## 特性
|
||||
- 超轻量级中文OCR模型,总模型仅8.6M
|
||||
- 单模型支持中英文数字组合识别、竖排文本识别、长文本识别
|
||||
- 检测模型DB(4.1M)+识别模型CRNN(4.5M)
|
||||
- 实用通用中文OCR模型
|
||||
- 多种预测推理部署方案,包括服务部署和端侧部署
|
||||
- 多种文本检测训练算法,EAST、DB、SAST
|
||||
- 多种文本识别训练算法,Rosetta、CRNN、STAR-Net、RARE、SRN
|
||||
- 可运行于Linux、Windows、MacOS等多种系统
|
||||
## Features
|
||||
- PPOCR series of high-quality pre-trained models, with results comparable to commercial products
|
||||
- Ultra lightweight ppocr_mobile series models: detection (2.6M) + direction classifier (0.9M) + recognition (4.6M) = 8.1M
|
||||
- General ppocr_server series models: detection (47.2M) + direction classifier (0.9M) + recognition (107M) = 155.1M
|
||||
- Ultra-lightweight compressed ppocr_mobile_slim series models: detection (1.4M) + direction classifier (0.5M) + recognition (1.6M) = 3.5M
|
||||
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
|
||||
- Support multi-language recognition: Korean, Japanese, German, French
|
||||
- Support user-defined training and provide a rich set of inference and deployment solutions
|
||||
- Support installation via pip for ease of use
|
||||
- Support Linux, Windows, MacOS and other systems
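
As a quick sanity check on the sizes quoted in the feature list, the per-component figures can be summed in a few lines of Python. This is only an illustrative sketch: the dictionary layout and the `total_size_mb` helper are assumptions made for this example and are not part of PaddleOCR. The numbers themselves are the ones listed in the bullets above.

```python
# Component sizes in MB, copied from the feature list above.
# The dict layout and the helper name are illustrative, not PaddleOCR API.
PPOCR_SIZES_MB = {
    "ppocr_mobile": {"detection": 2.6, "direction_classifier": 0.9, "recognition": 4.6},
    "ppocr_server": {"detection": 47.2, "direction_classifier": 0.9, "recognition": 107.0},
    "ppocr_mobile_slim": {"detection": 1.4, "direction_classifier": 0.5, "recognition": 1.6},
}


def total_size_mb(series: str) -> float:
    """Sum the detection, direction-classifier and recognition sizes of one series."""
    return round(sum(PPOCR_SIZES_MB[series].values()), 1)


if __name__ == "__main__":
    for name in PPOCR_SIZES_MB:
        print(f"{name}: {total_size_mb(name)}M")
```

The totals computed this way match the 8.1M, 155.1M, and 3.5M figures stated for the three series.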
|
||||
|
||||
## 快速体验
|
||||
## Visualization
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/11.jpg" width="800">
|
||||
<img src="doc/imgs_results/1101.jpg" width="800">
|
||||
<img src="doc/imgs_results/1103.jpg" width="800">
|
||||
</div>
|
||||
|
||||
上图是超轻量级中文OCR模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。
|
||||
The images above show results from the general ppocr_server model. For more examples, please see [More visualization](./doc/doc_en/visualization_en.md).
|
||||
|
||||
- 超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
|
||||
- 移动端DEMO体验(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统):[安装包二维码获取地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)
|
||||
## Quick Experience
|
||||
|
||||
Android手机也可以扫描下面二维码安装体验。
|
||||
You can also try the ultra-lightweight OCR model online: [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
|
||||
|
||||
Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)
|
||||
|
||||
You can also scan the QR code below to install the App (**Android only**)
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md)
|
||||
|
||||
## 中文OCR模型列表
|
||||
<a name="Supported-Chinese-model-list"></a>
|
||||
|
||||
|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址|
|
||||
|-|-|-|-|-|
|
||||
|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|
||||
|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
|
||||
## PP-OCR 1.1 series model list (updated on Sep 17)
|
||||
|
||||
## 文档教程
|
||||
- [快速安装](./doc/doc_ch/installation.md)
|
||||
- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md)
|
||||
- 算法介绍
|
||||
- [文本检测](#文本检测算法)
|
||||
- [文本识别](#文本识别算法)
|
||||
- 模型训练/评估
|
||||
- [文本检测](./doc/doc_ch/detection.md)
|
||||
- [文本识别](./doc/doc_ch/recognition.md)
|
||||
- [yml参数配置文件介绍](./doc/doc_ch/config.md)
|
||||
- [中文OCR训练预测技巧](./doc/doc_ch/tricks.md)
|
||||
- 预测部署
|
||||
- [基于Python预测引擎推理](./doc/doc_ch/inference.md)
|
||||
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
|
||||
- [服务化部署](./doc/doc_ch/serving.md)
|
||||
- [端侧部署](./deploy/lite/readme.md)
|
||||
- 模型量化压缩(coming soon)
|
||||
- [Benchmark](./doc/doc_ch/benchmark.md)
|
||||
- 数据集
|
||||
- [通用中英文OCR数据集](./doc/doc_ch/datasets.md)
|
||||
- [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md)
|
||||
- [垂类多语言OCR数据集](./doc/doc_ch/vertical_and_multilingual_datasets.md)
|
||||
- [常用数据标注工具](./doc/doc_ch/data_annotation.md)
|
||||
- [常用数据合成工具](./doc/doc_ch/data_synthesis.md)
|
||||
- 效果展示
|
||||
- [超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示)
|
||||
- [通用中文OCR效果展示](#通用中文OCR效果展示)
|
||||
- [支持空格的中文OCR效果展示](#支持空格的中文OCR效果展示)
|
||||
- FAQ
|
||||
- [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md)
|
||||
- [【理论篇】OCR通用21个问题](./doc/doc_ch/FAQ.md)
|
||||
- [【实战篇】PaddleOCR实战53个问题](./doc/doc_ch/FAQ.md)
|
||||
- [技术交流群](#欢迎加入PaddleOCR技术交流群)
|
||||
- [参考文献](./doc/doc_ch/reference.md)
|
||||
- [许可证书](#许可证书)
|
||||
- [贡献代码](#贡献代码)
|
||||
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model | |
|
||||
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ---- |
|
||||
| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | |
|
||||
| Chinese and English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | |
|
||||
| Chinese and English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) | |
|
||||
|
||||
<a name="算法介绍"></a>
|
||||
## 算法介绍
|
||||
<a name="文本检测算法"></a>
|
||||
### 1.文本检测算法
|
||||
|
||||
PaddleOCR开源的文本检测算法列表:
|
||||
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
|
||||
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
|
||||
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(百度自研)
|
||||
|
||||
On the ICDAR2015 public text detection dataset, the algorithms perform as follows:
|
||||
|
||||
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
||||
|-|-|-|-|-|-|
|
||||
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|
||||
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|
||||
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|
||||
|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
|
||||
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
|
||||
|
||||
On the Total-text public text detection dataset, the algorithms perform as follows:
|
||||
|
||||
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
||||
|-|-|-|-|-|-|
|
||||
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|
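
The Hmean column in the detection tables above is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the `hmean` helper and the `ROWS` list are illustrative names introduced here, not PaddleOCR code, and the input values are copied from the ICDAR2015 table.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, backbone, precision %, recall %) copied from the ICDAR2015 table above.
ROWS = [
    ("EAST", "ResNet50_vd", 88.18, 85.51),
    ("DB", "ResNet50_vd", 83.79, 80.65),
    ("SAST", "ResNet50_vd", 92.18, 82.96),
]

if __name__ == "__main__":
    for model, backbone, p, r in ROWS:
        print(f"{model} ({backbone}): Hmean = {hmean(p, r):.2f}%")
```

Because the published precision and recall are already rounded to two decimals, a value recomputed this way can differ from the table in the last digit.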
|
||||
|
||||
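**Note:** SAST model training additionally used the public datasets icdar2013, icdar2017, COCO-Text, and ArT for tuning. The public English datasets used by PaddleOCR, reorganized into a unified format, can be downloaded here: [Baidu Cloud](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (extraction code: 2bpi)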
|
||||
For more model downloads (including multiple languages), please refer to [PP-OCR v1.1 series model downloads](./doc/doc_en/models_list_en.md)
|
||||
|
||||
|
||||
使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集共3w张数据,训练中文检测模型的相关配置和预训练文件如下:
|
||||
## Tutorials
|
||||
- [Installation](./doc/doc_en/installation_en.md)
|
||||
- [Quick Start](./doc/doc_en/quickstart_en.md)
|
||||
- [Code Structure](./doc/doc_en/tree_en.md)
|
||||
- Algorithm introduction
|
||||
- [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md)
|
||||
- [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md)
|
||||
- [PP-OCR Pipeline](#PP-OCR-Pipline)
|
||||
- Model training/evaluation
|
||||
- [Text Detection](./doc/doc_en/detection_en.md)
|
||||
- [Text Recognition](./doc/doc_en/recognition_en.md)
|
||||
- [Yml Configuration](./doc/doc_en/config_en.md)
|
||||
- Inference and Deployment
|
||||
- [Quick inference based on pip](./doc/doc_en/whl_en.md)
|
||||
- [Python Inference](./doc/doc_en/inference_en.md)
|
||||
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
|
||||
- [Serving](./deploy/hubserving/readme_en.md)
|
||||
- [Mobile](./deploy/lite/readme_en.md)
|
||||
- [Model Quantization](./deploy/slim/quantization/README_en.md)
|
||||
- [Model Compression](./deploy/slim/prune/README_en.md)
|
||||
- [Benchmark](./doc/doc_en/benchmark_en.md)
|
||||
- Datasets
|
||||
- [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md)
|
||||
- [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md)
|
||||
- [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md)
|
||||
- [Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
|
||||
- [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
|
||||
- [Visualization](#Visualization)
|
||||
- [FAQ](./doc/doc_en/FAQ_en.md)
|
||||
- [Community](#Community)
|
||||
- [References](./doc/doc_en/reference_en.md)
|
||||
- [License](#LICENSE)
|
||||
- [Contribution](#CONTRIBUTION)
|
||||
|
||||
|模型|骨干网络|配置文件|预训练模型|
|
||||
|-|-|-|-|
|
||||
|超轻量中文模型|MobileNetV3|det_mv3_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|
||||
|通用中文OCR模型|ResNet50_vd|det_r50_vd_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
|
||||
<a name="PP-OCR-Pipline"></a>
|
||||
|
||||
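* Note: Training and evaluating the DB models above requires setting the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or different models, these two parameters can be adjusted for better results.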
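
To give a feel for what the two post-processing parameters in the note above control, here is a rough sketch. It assumes the expansion rule described in the DB paper, where a detected region is grown outward by a distance of area × unclip_ratio / perimeter, and it assumes that box_thresh is a minimum mean score a candidate box must reach to be kept. The function names are hypothetical, and the polygon offsetting itself (which needs a clipping library) is not shown. This is not PaddleOCR's actual implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points: List[Point]) -> float:
    """Sum of the edge lengths of a closed polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def unclip_distance(points: List[Point], unclip_ratio: float = 1.5) -> float:
    """Offset distance used to expand a shrunk text region (assumed DB-paper rule)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(mean_score: float, box_thresh: float = 0.6) -> bool:
    """Assumed filter: keep a candidate box only if its mean score reaches box_thresh."""
    return mean_score >= box_thresh


if __name__ == "__main__":
    region = [(0.0, 0.0), (100.0, 0.0), (100.0, 20.0), (0.0, 20.0)]
    print("expand by", unclip_distance(region), "pixels")
    print("keep box with score 0.55?", keep_box(0.55))
```

Under these assumptions, a larger unclip_ratio yields wider boxes around each text line, and a higher box_thresh discards more low-confidence detections.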
|
||||
|
||||
PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./doc/doc_ch/detection.md)。
|
||||
|
||||
<a name="文本识别算法"></a>
|
||||
### 2.文本识别算法
|
||||
|
||||
PaddleOCR开源的文本识别算法列表:
|
||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
|
||||
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
||||
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
||||
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
|
||||
- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(百度自研)
|
||||
|
||||
参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:
|
||||
|
||||
|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接|
|
||||
|-|-|-|-|-|
|
||||
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|
||||
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|
||||
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|
||||
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|
||||
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|
||||
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|
||||
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|
||||
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
|
||||
|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
|
||||
|
||||
**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。
|
||||
原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。
|
||||
|
||||
使用[LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt)街景数据集根据真值将图crop出来30w数据,进行位置校准。此外基于LSVT语料生成500w合成数据训练中文模型,相关配置和预训练文件如下:
|
||||
|
||||
|模型|骨干网络|配置文件|预训练模型|
|
||||
|-|-|-|-|
|
||||
|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|
||||
|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
|
||||
|
||||
PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./doc/doc_ch/recognition.md)。
|
||||
|
||||
## 效果展示
|
||||
|
||||
<a name="超轻量级中文OCR效果展示"></a>
|
||||
### 1.超轻量级中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
|
||||
## PP-OCR Pipeline
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/1.jpg" width="800">
|
||||
<img src="./doc/ppocr_framework.png" width="800">
|
||||
</div>
|
||||
|
||||
<a name="通用中文OCR效果展示"></a>
|
||||
### 2.通用中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
|
||||
PP-OCR is a practical ultra-lightweight OCR system. It is composed of three main parts: DB text detection, detection box correction, and CRNN text recognition. The system applies 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate schedule, regularization parameter selection, use of pre-trained models, and automatic model pruning and quantization, to tune and slim down the model of each module. The result is an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article (the arXiv link is being generated).
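
The three-stage structure described above can be sketched as a simple composition of functions. Everything in this example is hypothetical: `run_pipeline`, `OcrResult`, and the three callables stand in for the detector, the box correction step, and the recognizer, and do not correspond to actual PaddleOCR classes or signatures.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Sequence[Tuple[float, float]]  # polygon corner points of one text region


@dataclass
class OcrResult:
    box: Box
    text: str
    score: float


def run_pipeline(
    image: Any,
    detect: Callable[[Any], List[Box]],
    correct: Callable[[Any, Box], Any],
    recognize: Callable[[Any], Tuple[str, float]],
) -> List[OcrResult]:
    """Chain detection, box correction and recognition (hypothetical interfaces)."""
    results = []
    for box in detect(image):            # stage 1: find text regions
        crop = correct(image, box)       # stage 2: crop and correct each region
        text, score = recognize(crop)    # stage 3: read the text in the crop
        results.append(OcrResult(box, text, score))
    return results


if __name__ == "__main__":
    # Stub stages so the sketch runs without any model files.
    demo = run_pipeline(
        image="demo-image",
        detect=lambda img: [[(0, 0), (10, 0), (10, 5), (0, 5)]],
        correct=lambda img, box: (img, box),
        recognize=lambda crop: ("hello", 0.99),
    )
    print(demo)
```

Keeping each stage behind its own callable mirrors the idea that every module can be tuned or slimmed down independently of the others.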
|
||||
|
||||
## Visualization [more](./doc/doc_en/visualization_en.md)
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/chinese_db_crnn_server/11.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1102.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1104.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1106.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1105.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1110.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1112.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<a name="支持空格的中文OCR效果展示"></a>
|
||||
### 3.支持空格的中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/chinese_db_crnn_server/en_paper.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<a name="欢迎加入PaddleOCR技术交流群"></a>
|
||||
## 欢迎加入PaddleOCR技术交流群
|
||||
请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍
|
||||
<a name="Community"></a>
|
||||
## Community
|
||||
Scan the QR code below with WeChat and complete the questionnaire to join the official technical exchange group.
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/joinus.PNG" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
<a name="许可证书"></a>
|
||||
## 许可证书
|
||||
本项目的发布受<a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>许可认证。
|
||||
<a name="LICENSE"></a>
|
||||
## License
|
||||
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
|
||||
|
||||
<a name="贡献代码"></a>
|
||||
## 贡献代码
|
||||
我们非常欢迎你为PaddleOCR贡献代码,也十分感谢你的反馈。
|
||||
<a name="CONTRIBUTION"></a>
|
||||
## Contribution
|
||||
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
|
||||
|
||||
- 非常感谢 [Khanh Tran](https://github.com/xxxpsyduck) 和 [Karl Horky](https://github.com/karlhorky) 贡献修改英文文档
|
||||
- 非常感谢 [zhangxin](https://github.com/ZhangXinNan)([Blog](https://blog.csdn.net/sdlypyzq)) 贡献新的可视化方式、添加.gitgnore、处理手动设置PYTHONPATH环境变量的问题
|
||||
- 非常感谢 [lyl120117](https://github.com/lyl120117) 贡献打印网络结构的代码
|
||||
- 非常感谢 [xiangyubo](https://github.com/xiangyubo) 贡献手写中文OCR数据集
|
||||
- 非常感谢 [authorfu](https://github.com/authorfu) 贡献Android和[xiadeye](https://github.com/xiadeye) 贡献IOS的demo代码
|
||||
- 非常感谢 [BeyondYourself](https://github.com/BeyondYourself) 给PaddleOCR提了很多非常棒的建议,并简化了PaddleOCR的部分代码风格。
|
||||
- 非常感谢 [tangmq](https://gitee.com/tangmq) 给PaddleOCR增加Docker化部署服务,支持快速发布可调用的Restful API服务。
|
||||
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation.
|
||||
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualization function, adding .gitignore, and removing the need to set PYTHONPATH manually.
|
||||
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure.
|
||||
- Thanks to [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR datasets.
|
||||
- Thanks to [authorfu](https://github.com/authorfu) for contributing the Android demo and to [xiadeye](https://github.com/xiadeye) for contributing the iOS demo.
|
||||
- Thanks to [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
|
||||
- Thanks to [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable RESTful API services.
|
||||
|
|
|
@ -0,0 +1,140 @@
|
|||
[English](README.md) | 简体中文
|
||||
|
||||
## 简介
|
||||
PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。
|
||||
|
||||
**近期更新**
|
||||
- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](#PP-OCR)),适合在移动端部署使用。[模型下载](#模型下载)
|
||||
- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](#模型下载)
|
||||
- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./doc/doc_ch/FAQ.md)
|
||||
- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./doc/doc_ch/whl.md)
|
||||
- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519)
|
||||
- [More](./doc/doc_ch/update.md)
|
||||
|
||||
|
||||
## 特性
|
||||
|
||||
- PPOCR系列高质量预训练模型,媲美商业效果
|
||||
- 超轻量ppocr_mobile移动端系列:检测(2.6M)+方向分类器(0.9M)+ 识别(4.6M)= 8.1M
|
||||
- 通用ppocr_server系列:检测(47.2M)+方向分类器(0.9M)+ 识别(107M)= 155.1M
|
||||
- 超轻量压缩ppocr_mobile_slim系列:检测(1.4M)+方向分类器(0.5M)+ 识别(1.6M)= 3.5M
|
||||
- 支持中英文数字组合识别、竖排文本识别、长文本识别
|
||||
- 支持多语言识别:韩语、日语、德语、法语
|
||||
- 支持用户自定义训练,提供丰富的预测推理部署方案
|
||||
- 支持PIP快速安装使用
|
||||
- 可运行于Linux、Windows、MacOS等多种系统
|
||||
|
||||
## 效果展示
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/1101.jpg" width="800">
|
||||
<img src="doc/imgs_results/1103.jpg" width="800">
|
||||
</div>
|
||||
|
||||
上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。
|
||||
|
||||
## 快速体验
|
||||
- PC端:超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
|
||||
|
||||
- 移动端:[安装包DEMO下载地址](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)(基于EasyEdge和Paddle-Lite, 支持iOS和Android系统),Android手机也可以直接扫描下面二维码安装体验。
|
||||
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
- 代码体验:从[快速安装](./doc/doc_ch/installation.md) 开始
|
||||
|
||||
<a name="模型下载"></a>
|
||||
## PP-OCR 1.1系列模型列表(9月17日更新)
|
||||
|
||||
| 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 | |
|
||||
| ------------ | --------------- | ----------------|---- | ---------- | -------- | ---- |
|
||||
| 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v1.1_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | |
|
||||
| 中英文通用OCR模型(155.1M) |ch_ppocr_server_v1.1_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | |
|
||||
| 中英文超轻量压缩OCR模型(3.5M) | ch_ppocr_mobile_slim_v1.1_xx| 移动端 |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) |[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)| |
|
||||
|
||||
更多模型下载(包括多语言),可以参考[PP-OCR v1.1 系列模型下载](./doc/doc_ch/models_list.md)
|
||||
|
||||
## 文档教程
|
||||
- [快速安装](./doc/doc_ch/installation.md)
|
||||
- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md)
|
||||
- [代码组织结构](./doc/doc_ch/tree.md)
|
||||
- 算法介绍
|
||||
- [文本检测](./doc/doc_ch/algorithm_overview.md)
|
||||
- [文本识别](./doc/doc_ch/algorithm_overview.md)
|
||||
- [PP-OCR Pipeline](#PP-OCR)
|
||||
- 模型训练/评估
|
||||
- [文本检测](./doc/doc_ch/detection.md)
|
||||
- [文本识别](./doc/doc_ch/recognition.md)
|
||||
- [yml参数配置文件介绍](./doc/doc_ch/config.md)
|
||||
- 预测部署
|
||||
- [基于pip安装whl包快速推理](./doc/doc_ch/whl.md)
|
||||
- [基于Python脚本预测引擎推理](./doc/doc_ch/inference.md)
|
||||
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
|
||||
- [服务化部署](./deploy/hubserving/readme.md)
|
||||
- [端侧部署](./deploy/lite/readme.md)
|
||||
- [模型量化](./deploy/slim/quantization/README.md)
|
||||
- [模型裁剪](./deploy/slim/prune/README_ch.md)
|
||||
- [Benchmark](./doc/doc_ch/benchmark.md)
|
||||
- 数据集
|
||||
- [通用中英文OCR数据集](./doc/doc_ch/datasets.md)
|
||||
- [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md)
|
||||
- [垂类多语言OCR数据集](./doc/doc_ch/vertical_and_multilingual_datasets.md)
|
||||
- [常用数据标注工具](./doc/doc_ch/data_annotation.md)
|
||||
- [常用数据合成工具](./doc/doc_ch/data_synthesis.md)
|
||||
- [效果展示](#效果展示)
|
||||
- FAQ
|
||||
- [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md)
|
||||
- [【理论篇】OCR通用21个问题](./doc/doc_ch/FAQ.md)
|
||||
- [【实战篇】PaddleOCR实战53个问题](./doc/doc_ch/FAQ.md)
|
||||
- [技术交流群](#欢迎加入PaddleOCR技术交流群)
|
||||
- [参考文献](./doc/doc_ch/reference.md)
|
||||
- [许可证书](#许可证书)
|
||||
- [贡献代码](#贡献代码)
|
||||
|
||||
<a name="PP-OCR"></a>
|
||||
## PP-OCR Pipeline
|
||||
<div align="center">
|
||||
<img src="./doc/ppocr_framework.png" width="800">
|
||||
</div>
|
||||
|
||||
PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框矫正和CRNN文本识别三部分组成。该系统从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身,最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术文章(Arxiv文章链接生成中)。
|
||||
|
||||
|
||||
<a name="效果展示"></a>
|
||||
## 效果展示 [more](./doc/doc_ch/visualization.md)
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/imgs_results/1102.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1104.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1106.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1105.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1110.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1112.jpg" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
<a name="欢迎加入PaddleOCR技术交流群"></a>
|
||||
## 欢迎加入PaddleOCR技术交流群
|
||||
请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/joinus.PNG" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
<a name="许可证书"></a>
|
||||
## 许可证书
|
||||
本项目的发布受<a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>许可认证。
|
||||
|
||||
<a name="贡献代码"></a>
|
||||
## 贡献代码
|
||||
我们非常欢迎你为PaddleOCR贡献代码,也十分感谢你的反馈。
|
||||
|
||||
- 非常感谢 [Khanh Tran](https://github.com/xxxpsyduck) 和 [Karl Horky](https://github.com/karlhorky) 贡献修改英文文档
|
||||
- 非常感谢 [zhangxin](https://github.com/ZhangXinNan)([Blog](https://blog.csdn.net/sdlypyzq)) 贡献新的可视化方式、添加.gitgnore、处理手动设置PYTHONPATH环境变量的问题
|
||||
- 非常感谢 [lyl120117](https://github.com/lyl120117) 贡献打印网络结构的代码
|
||||
- 非常感谢 [xiangyubo](https://github.com/xiangyubo) 贡献手写中文OCR数据集
|
||||
- 非常感谢 [authorfu](https://github.com/authorfu) 贡献Android和[xiadeye](https://github.com/xiadeye) 贡献IOS的demo代码
|
||||
- 非常感谢 [BeyondYourself](https://github.com/BeyondYourself) 给PaddleOCR提了很多非常棒的建议,并简化了PaddleOCR的部分代码风格。
|
||||
- 非常感谢 [tangmq](https://gitee.com/tangmq) 给PaddleOCR增加Docker化部署服务,支持快速发布可调用的Restful API服务。
|
226
README_en.md
|
@ -1,226 +0,0 @@
|
|||
English | [简体中文](README.md)
|
||||
|
||||
## Introduction
|
||||
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and put them into practice.
|
||||
|
||||
**Recent updates**
|
||||
- 2020.8.24 Added support for using PaddleOCR through whl package installation; please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)
|
||||
- 2020.8.16, Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
|
||||
- 2020.7.23, Released the replay and slides of the Bilibili live class, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
|
||||
- 2020.7.15, Added a mobile App demo that supports both iOS and Android (based on EasyEdge and Paddle-Lite)
|
||||
- 2020.7.15, Improved deployment capabilities by adding C++ inference and serving deployment. In addition, benchmarks of the ultra-lightweight OCR model are provided.
|
||||
- 2020.7.15, Add several related datasets, data annotation and synthesis tools.
|
||||
- [more](./doc/doc_en/update_en.md)
|
||||
|
||||
## Features
|
||||
- Ultra-lightweight OCR model, total model size is only 8.6M
|
||||
- A single model supports combined Chinese, English, and digit recognition, vertical text recognition, and long text recognition
|
||||
- Detection model DB (4.1M) + recognition model CRNN (4.5M)
|
||||
- Various text detection algorithms: EAST, DB
|
||||
- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE
|
||||
- Support Linux, Windows, macOS and other systems.
|
||||
|
||||
## Visualization
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
[More visualization](./doc/doc_en/visualization_en.md)
|
||||
|
||||
You can also try the ultra-lightweight OCR model online: [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
|
||||
|
||||
Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)
|
||||
|
||||
Also, you can scan the QR code below to install the App (**Android support only**)
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md)
|
||||
|
||||
<a name="Supported-Chinese-model-list"></a>
|
||||
|
||||
### Supported Models:
|
||||
|
||||
|Model Name|Description|Detection model link|Recognition model link|Recognition model link (with space support)|
|
||||
|-|-|-|-|-|
|
||||
|db_crnn_mobile|ultra-lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|
||||
|db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
|
||||
|
||||
|
||||
## Tutorials
|
||||
- [Installation](./doc/doc_en/installation_en.md)
|
||||
- [Quick Start](./doc/doc_en/quickstart_en.md)
|
||||
- Algorithm introduction
|
||||
- [Text Detection Algorithm](#TEXTDETECTIONALGORITHM)
|
||||
- [Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM)
|
||||
- Model training/evaluation
|
||||
- [Text Detection](./doc/doc_en/detection_en.md)
|
||||
- [Text Recognition](./doc/doc_en/recognition_en.md)
|
||||
- [Yml Configuration](./doc/doc_en/config_en.md)
|
||||
- [Tricks](./doc/doc_en/tricks_en.md)
|
||||
- Deployment
|
||||
- [Python Inference](./doc/doc_en/inference_en.md)
|
||||
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
|
||||
- [Serving](./doc/doc_en/serving_en.md)
|
||||
- [Mobile](./deploy/lite/readme_en.md)
|
||||
- Model Quantization and Compression (coming soon)
|
||||
- [Benchmark](./doc/doc_en/benchmark_en.md)
|
||||
- Datasets
|
||||
- [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md)
|
||||
- [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md)
|
||||
- [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md)
|
||||
- [Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
|
||||
- [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
|
||||
- [FAQ](#FAQ)
|
||||
- Visualization
|
||||
- [Ultra-lightweight Chinese/English OCR Visualization](#UCOCRVIS)
|
||||
- [General Chinese/English OCR Visualization](#GeOCRVIS)
|
||||
- [Chinese/English OCR Visualization (Support Space Recognition )](#SpaceOCRVIS)
|
||||
- [Community](#Community)
|
||||
- [References](./doc/doc_en/reference_en.md)
|
||||
- [License](#LICENSE)
|
||||
- [Contribution](#CONTRIBUTION)
|
||||
|
||||
<a name="TEXTDETECTIONALGORITHM"></a>
|
||||
## Text Detection Algorithm
|
||||
|
||||
PaddleOCR provides the following open-source text detection algorithms:
|
||||
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
|
||||
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
|
||||
- [x]  SAST([paper](https://arxiv.org/abs/1908.05498)) (developed by Baidu)
|
||||
|
||||
On the ICDAR2015 dataset, the text detection result is as follows:
|
||||
|
||||
|Model|Backbone|precision|recall|Hmean|Download link|
|
||||
|-|-|-|-|-|-|
|
||||
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|
||||
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|
||||
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|
||||
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
|
||||
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
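
The Hmean column in the table above is consistent with the harmonic mean of precision and recall (the F1 score). A minimal sketch, assuming that definition; the `hmean` helper is hypothetical and not part of PaddleOCR:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the ICDAR2015 table above.
print(round(hmean(88.18, 85.51), 2))  # EAST, ResNet50_vd -> 86.82
print(round(hmean(92.18, 82.96), 2))  # SAST, ResNet50_vd -> 87.33
```

Because the harmonic mean is pulled toward the smaller of the two values, a model with high precision but lower recall (such as SAST here) scores closer to its recall than a plain average would suggest.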
|
||||
|
||||
On Total-Text dataset, the text detection result is as follows:
|
||||
|
||||
|Model|Backbone|precision|recall|Hmean|Download link|
|
||||
|-|-|-|-|-|-|
|
||||
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|
|
||||
|
||||
**Note:** Additional data, such as ICDAR2013, ICDAR2017, COCO-Text, and ArT, was added to the training set for SAST. The public English datasets, organized in the format used by PaddleOCR, can be downloaded from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
|
||||
|
||||
For the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset, which contains 30,000 training images, the related configuration files and pre-trained models for the text detection task are as follows:
|
||||
|Model|Backbone|Configuration file|Pre-trained model|
|
||||
|-|-|-|-|
|
||||
|ultra-lightweight OCR model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|
||||
|General OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
|
||||
|
||||
* Note: To train and evaluate the DB models above, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or models, these two parameters can be adjusted for better results.
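
To illustrate what `unclip_ratio` controls: in the DB paper, a detected (shrunk) text region with area A and perimeter L is expanded by an offset D = A × r / L, where r is the unclip ratio. The sketch below computes only that offset for a simple polygon. The function names are hypothetical, and PaddleOCR's actual post-processing may differ in detail.

```python
import math


def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def unclip_distance(points, unclip_ratio):
    """Expansion offset D = A * r / L, as described in the DB paper."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


box = [(0, 0), (100, 0), (100, 20), (0, 20)]  # a 100 x 20 text box
print(unclip_distance(box, 1.5))  # 12.5
print(unclip_distance(box, 2.0))  # a larger ratio expands the box further
```

Under this formula, a larger `unclip_ratio` produces looser boxes around the text and a smaller one produces tighter boxes, which is why the value may need tuning per dataset.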
|
||||
|
||||
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
|
||||
|
||||
<a name="TEXTRECOGNITIONALGORITHM"></a>
|
||||
## Text Recognition Algorithm
|
||||
|
||||
PaddleOCR provides the following open-source text recognition algorithms:
|
||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
|
||||
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
||||
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
||||
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
|
||||
- [x]  SRN([paper](https://arxiv.org/abs/2003.12294)) (developed by Baidu)
|
||||
|
||||
Following [DTRB](https://arxiv.org/abs/1904.01906), the text recognition algorithms above were trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
|
||||
|
||||
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|
||||
|-|-|-|-|-|
|
||||
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|
||||
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|
||||
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|
||||
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|
||||
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|
||||
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|
||||
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|
||||
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
|
||||
|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
|
||||
|
||||
**Note:** The SRN model was trained on augmented versions of the two training sets mentioned above. The augmented data can be downloaded from [Baidu Drive](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (download code: y3ry).
|
||||
|
||||
The average accuracy of two-stage training in the original paper is 89.74%, and that of one-stage training in PaddleOCR is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).
|
||||
|
||||
We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and crop 300,000 training images from the original photos using the position ground truth, with calibration applied where needed. In addition, 5 million synthetic images are generated from the LSVT corpus to train the model. The related configuration files and pre-trained models are as follows:
|
||||
|
||||
|Model|Backbone|Configuration file|Pre-trained model|Space-support model|
|
||||
|-|-|-|-|-|
|
||||
|ultra-lightweight OCR model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
|
||||
|General OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
|
||||
|
||||
For the training guide and usage of PaddleOCR text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md).
|
||||
|
||||
## Visualization
|
||||
|
||||
<a name="UCOCRVIS"></a>
|
||||
### 1. Ultra-lightweight Chinese/English OCR Visualization [more](./doc/doc_en/visualization_en.md)
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/1.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<a name="GeOCRVIS"></a>
|
||||
### 2. General Chinese/English OCR Visualization [more](./doc/doc_en/visualization_en.md)
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/chinese_db_crnn_server/11.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<a name="SpaceOCRVIS"></a>
|
||||
### 3. Chinese/English OCR Visualization (with space support) [more](./doc/doc_en/visualization_en.md)
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/chinese_db_crnn_server/en_paper.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<a name="FAQ"></a>
|
||||
|
||||
## FAQ
|
||||
1. Error when using attention-based recognition model: KeyError: 'predict'
|
||||
|
||||
The inference of recognition model based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, it is also found that the recognition model based on attention loss is not as effective as the one based on CTC loss.
|
||||
|
||||
2. About inference speed
|
||||
|
||||
When an image contains a lot of text, prediction takes longer. You can use `--rec_batch_num` to set a smaller prediction batch size. The default value is 30, which can be changed to 10 or another value.
|
||||
|
||||
3. Service deployment and mobile deployment
|
||||
|
||||
It is expected that the service deployment based on Serving and the mobile deployment based on Paddle Lite will be released successively in mid-to-late June. Stay tuned for more updates.
|
||||
|
||||
4. Release time of self-developed algorithm
|
||||
|
||||
Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient.
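
Regarding FAQ item 2 above, the following is a rough illustration of what a recognition batch size means: cropped text regions are processed in groups of at most N, so a smaller N means smaller groups and more forward passes. The sketch shows only the chunking step. `split_into_batches` is a hypothetical helper, not PaddleOCR code.

```python
def split_into_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


crops = [f"crop_{i}" for i in range(75)]  # stand-ins for cropped text regions
print([len(b) for b in split_into_batches(crops, 30)])  # [30, 30, 15]
print(len(list(split_into_batches(crops, 10))))  # 8
```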
|
||||
|
||||
[more](./doc/doc_en/FAQ_en.md)
|
||||
|
||||
<a name="Community"></a>
|
||||
## Community
|
||||
Scan the QR code below with WeChat and complete the questionnaire to join the official technical exchange group.
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/joinus.PNG" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
<a name="LICENSE"></a>
|
||||
## License
|
||||
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
|
||||
|
||||
<a name="CONTRIBUTION"></a>
|
||||
## Contribution
|
||||
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
|
||||
|
||||
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation.
|
||||
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualization function, adding .gitignore, and removing the need to set PYTHONPATH manually.
|
||||
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure.
|
||||
- Thanks [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR datasets.
|
||||
- Thanks to [authorfu](https://github.com/authorfu) for contributing the Android demo and to [xiadeye](https://github.com/xiadeye) for contributing the iOS demo.
|
||||
- Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
|
||||
- Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services.
|
|
@ -0,0 +1,44 @@
|
|||
Global:
|
||||
algorithm: CLS
|
||||
use_gpu: False
|
||||
epoch_num: 100
|
||||
log_smooth_window: 20
|
||||
print_batch_step: 100
|
||||
save_model_dir: output/cls_mv3
|
||||
save_epoch_step: 3
|
||||
eval_batch_step: 500
|
||||
train_batch_size_per_card: 512
|
||||
test_batch_size_per_card: 512
|
||||
image_shape: [3, 48, 192]
|
||||
label_list: ['0','180']
|
||||
distort: True
|
||||
reader_yml: ./configs/cls/cls_reader.yml
|
||||
pretrain_weights:
|
||||
checkpoints:
|
||||
save_inference_dir:
|
||||
infer_img:
|
||||
|
||||
Architecture:
|
||||
function: ppocr.modeling.architectures.cls_model,ClsModel
|
||||
|
||||
Backbone:
|
||||
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
|
||||
scale: 0.35
|
||||
model_name: small
|
||||
|
||||
Head:
|
||||
function: ppocr.modeling.heads.cls_head,ClsHead
|
||||
class_dim: 2
|
||||
|
||||
Loss:
|
||||
function: ppocr.modeling.losses.cls_loss,ClsLoss
|
||||
|
||||
Optimizer:
|
||||
function: ppocr.optimizer,AdamDecay
|
||||
base_lr: 0.001
|
||||
beta1: 0.9
|
||||
beta2: 0.999
|
||||
decay:
|
||||
function: cosine_decay
|
||||
step_each_epoch: 1169
|
||||
total_epoch: 100
|
|
@ -0,0 +1,13 @@
|
|||
TrainReader:
|
||||
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader
|
||||
num_workers: 8
|
||||
img_set_dir: ./train_data/cls
|
||||
label_file_path: ./train_data/cls/train.txt
|
||||
|
||||
EvalReader:
|
||||
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader
|
||||
img_set_dir: ./train_data/cls
|
||||
label_file_path: ./train_data/cls/test.txt
|
||||
|
||||
TestReader:
|
||||
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader
|
|
@ -24,6 +24,7 @@ Backbone:
|
|||
function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3
|
||||
scale: 0.5
|
||||
model_name: large
|
||||
disable_se: true
|
||||
|
||||
Head:
|
||||
function: ppocr.modeling.heads.det_db_head,DBHead
|
||||
|
@ -49,6 +50,6 @@ Optimizer:
|
|||
PostProcess:
|
||||
function: ppocr.postprocess.db_postprocess,DBPostProcess
|
||||
thresh: 0.3
|
||||
box_thresh: 0.7
|
||||
box_thresh: 0.6
|
||||
max_candidates: 1000
|
||||
unclip_ratio: 2.0
|
||||
unclip_ratio: 1.5
|
||||
|
|
|
@ -0,0 +1,53 @@
|
|||
Global:
|
||||
algorithm: CRNN
|
||||
use_gpu: true
|
||||
epoch_num: 500
|
||||
log_smooth_window: 20
|
||||
print_batch_step: 10
|
||||
save_model_dir: ./output/en_number
|
||||
save_epoch_step: 3
|
||||
eval_batch_step: 2000
|
||||
train_batch_size_per_card: 256
|
||||
test_batch_size_per_card: 256
|
||||
image_shape: [3, 32, 320]
|
||||
max_text_length: 30
|
||||
character_type: ch
|
||||
character_dict_path: ./ppocr/utils/ic15_dict.txt
|
||||
loss_type: ctc
|
||||
distort: false
|
||||
use_space_char: false
|
||||
reader_yml: ./configs/rec/multi_languages/rec_en_reader.yml
|
||||
pretrain_weights:
|
||||
checkpoints:
|
||||
save_inference_dir:
|
||||
infer_img:
|
||||
|
||||
Architecture:
|
||||
function: ppocr.modeling.architectures.rec_model,RecModel
|
||||
|
||||
Backbone:
|
||||
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
|
||||
scale: 0.5
|
||||
model_name: small
|
||||
small_stride: [1, 2, 2, 2]
|
||||
|
||||
Head:
|
||||
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
|
||||
encoder_type: rnn
|
||||
SeqRNN:
|
||||
hidden_size: 48
|
||||
|
||||
Loss:
|
||||
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
|
||||
|
||||
Optimizer:
|
||||
function: ppocr.optimizer,AdamDecay
|
||||
l2_decay: 0.00001
|
||||
base_lr: 0.001
|
||||
beta1: 0.9
|
||||
beta2: 0.999
|
||||
decay:
|
||||
function: cosine_decay_warmup
|
||||
warmup_minibatch: 1000
|
||||
step_each_epoch: 6530
|
||||
total_epoch: 500
|
|
@ -0,0 +1,13 @@
|
|||
TrainReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
num_workers: 8
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/en_train.txt
|
||||
|
||||
EvalReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/en_eval.txt
|
||||
|
||||
TestReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
|
@ -0,0 +1,52 @@
|
|||
Global:
|
||||
algorithm: CRNN
|
||||
use_gpu: true
|
||||
epoch_num: 500
|
||||
log_smooth_window: 20
|
||||
print_batch_step: 10
|
||||
save_model_dir: ./output/rec_french
|
||||
save_epoch_step: 1
|
||||
eval_batch_step: 2000
|
||||
train_batch_size_per_card: 256
|
||||
test_batch_size_per_card: 256
|
||||
image_shape: [3, 32, 320]
|
||||
max_text_length: 25
|
||||
character_type: french
|
||||
character_dict_path: ./ppocr/utils/french_dict.txt
|
||||
loss_type: ctc
|
||||
distort: true
|
||||
use_space_char: false
|
||||
reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
|
||||
pretrain_weights:
|
||||
checkpoints:
|
||||
save_inference_dir:
|
||||
infer_img:
|
||||
|
||||
Architecture:
|
||||
function: ppocr.modeling.architectures.rec_model,RecModel
|
||||
|
||||
Backbone:
|
||||
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
|
||||
scale: 0.5
|
||||
model_name: small
|
||||
small_stride: [1, 2, 2, 2]
|
||||
|
||||
Head:
|
||||
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
|
||||
encoder_type: rnn
|
||||
SeqRNN:
|
||||
hidden_size: 48
|
||||
|
||||
Loss:
|
||||
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
|
||||
|
||||
Optimizer:
|
||||
function: ppocr.optimizer,AdamDecay
|
||||
l2_decay: 0.00001
|
||||
base_lr: 0.001
|
||||
beta1: 0.9
|
||||
beta2: 0.999
|
||||
decay:
|
||||
function: cosine_decay
|
||||
step_each_epoch: 254
|
||||
total_epoch: 500
|
|
@ -0,0 +1,13 @@
|
|||
TrainReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
num_workers: 8
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/french_train.txt
|
||||
|
||||
EvalReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/french_eval.txt
|
||||
|
||||
TestReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
|
@ -0,0 +1,52 @@
|
|||
Global:
|
||||
algorithm: CRNN
|
||||
use_gpu: true
|
||||
epoch_num: 500
|
||||
log_smooth_window: 20
|
||||
print_batch_step: 10
|
||||
save_model_dir: ./output/rec_german
|
||||
save_epoch_step: 1
|
||||
eval_batch_step: 2000
|
||||
train_batch_size_per_card: 256
|
||||
test_batch_size_per_card: 256
|
||||
image_shape: [3, 32, 320]
|
||||
max_text_length: 25
|
||||
character_type: german
|
||||
character_dict_path: ./ppocr/utils/german_dict.txt
|
||||
loss_type: ctc
|
||||
distort: true
|
||||
use_space_char: false
|
||||
reader_yml: ./configs/rec/multi_languages/rec_ger_reader.yml
|
||||
pretrain_weights:
|
||||
checkpoints:
|
||||
save_inference_dir:
|
||||
infer_img:
|
||||
|
||||
Architecture:
|
||||
function: ppocr.modeling.architectures.rec_model,RecModel
|
||||
|
||||
Backbone:
|
||||
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
|
||||
scale: 0.5
|
||||
model_name: small
|
||||
small_stride: [1, 2, 2, 2]
|
||||
|
||||
Head:
|
||||
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
|
||||
encoder_type: rnn
|
||||
SeqRNN:
|
||||
hidden_size: 48
|
||||
|
||||
Loss:
|
||||
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
|
||||
|
||||
Optimizer:
|
||||
function: ppocr.optimizer,AdamDecay
|
||||
l2_decay: 0.00001
|
||||
base_lr: 0.001
|
||||
beta1: 0.9
|
||||
beta2: 0.999
|
||||
decay:
|
||||
function: cosine_decay
|
||||
step_each_epoch: 254
|
||||
total_epoch: 500
|
|
@ -0,0 +1,13 @@
|
|||
TrainReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
num_workers: 8
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/de_train.txt
|
||||
|
||||
EvalReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/de_eval.txt
|
||||
|
||||
TestReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
|
@ -0,0 +1,52 @@
|
|||
Global:
|
||||
algorithm: CRNN
|
||||
use_gpu: true
|
||||
epoch_num: 500
|
||||
log_smooth_window: 20
|
||||
print_batch_step: 10
|
||||
save_model_dir: ./output/rec_japan
|
||||
save_epoch_step: 1
|
||||
eval_batch_step: 2000
|
||||
train_batch_size_per_card: 256
|
||||
test_batch_size_per_card: 256
|
||||
image_shape: [3, 32, 320]
|
||||
max_text_length: 25
|
||||
character_type: japan
|
||||
character_dict_path: ./ppocr/utils/japan_dict.txt
|
||||
loss_type: ctc
|
||||
distort: true
|
||||
use_space_char: false
|
||||
reader_yml: ./configs/rec/multi_languages/rec_japan_reader.yml
|
||||
pretrain_weights:
|
||||
checkpoints:
|
||||
save_inference_dir:
|
||||
infer_img:
|
||||
|
||||
Architecture:
|
||||
function: ppocr.modeling.architectures.rec_model,RecModel
|
||||
|
||||
Backbone:
|
||||
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
|
||||
scale: 0.5
|
||||
model_name: small
|
||||
small_stride: [1, 2, 2, 2]
|
||||
|
||||
Head:
|
||||
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
|
||||
encoder_type: rnn
|
||||
SeqRNN:
|
||||
hidden_size: 48
|
||||
|
||||
Loss:
|
||||
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
|
||||
|
||||
Optimizer:
|
||||
function: ppocr.optimizer,AdamDecay
|
||||
l2_decay: 0.00001
|
||||
base_lr: 0.001
|
||||
beta1: 0.9
|
||||
beta2: 0.999
|
||||
decay:
|
||||
function: cosine_decay
|
||||
step_each_epoch: 254
|
||||
total_epoch: 500
|
|
@ -0,0 +1,13 @@
|
|||
TrainReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
num_workers: 8
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/japan_train.txt
|
||||
|
||||
EvalReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/japan_eval.txt
|
||||
|
||||
TestReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
|
@ -0,0 +1,52 @@
|
|||
Global:
|
||||
algorithm: CRNN
|
||||
use_gpu: true
|
||||
epoch_num: 500
|
||||
log_smooth_window: 20
|
||||
print_batch_step: 10
|
||||
save_model_dir: ./output/rec_korean
|
||||
save_epoch_step: 1
|
||||
eval_batch_step: 2000
|
||||
train_batch_size_per_card: 256
|
||||
test_batch_size_per_card: 256
|
||||
image_shape: [3, 32, 320]
|
||||
max_text_length: 25
|
||||
character_type: korean
|
||||
character_dict_path: ./ppocr/utils/korean_dict.txt
|
||||
loss_type: ctc
|
||||
distort: true
|
||||
use_space_char: false
|
||||
reader_yml: ./configs/rec/multi_languages/rec_korean_reader.yml
|
||||
pretrain_weights:
|
||||
checkpoints:
|
||||
save_inference_dir:
|
||||
infer_img:
|
||||
|
||||
Architecture:
|
||||
function: ppocr.modeling.architectures.rec_model,RecModel
|
||||
|
||||
Backbone:
|
||||
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
|
||||
scale: 0.5
|
||||
model_name: small
|
||||
small_stride: [1, 2, 2, 2]
|
||||
|
||||
Head:
|
||||
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
|
||||
encoder_type: rnn
|
||||
SeqRNN:
|
||||
hidden_size: 48
|
||||
|
||||
Loss:
|
||||
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
|
||||
|
||||
Optimizer:
|
||||
function: ppocr.optimizer,AdamDecay
|
||||
l2_decay: 0.00001
|
||||
base_lr: 0.001
|
||||
beta1: 0.9
|
||||
beta2: 0.999
|
||||
decay:
|
||||
function: cosine_decay
|
||||
step_each_epoch: 254
|
||||
total_epoch: 500
|
|
@ -0,0 +1,13 @@
|
|||
TrainReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
num_workers: 8
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/korean_train.txt
|
||||
|
||||
EvalReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
||||
img_set_dir: ./train_data
|
||||
label_file_path: ./train_data/korean_eval.txt
|
||||
|
||||
TestReader:
|
||||
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
|
|
@ -4,29 +4,29 @@
|
|||
|
||||
#include "native.h"
|
||||
#include "ocr_ppredictor.h"
|
||||
#include <string>
|
||||
#include <algorithm>
|
||||
#include <paddle_api.h>
|
||||
#include <string>
|
||||
|
||||
static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode);
|
||||
|
||||
extern "C"
|
||||
JNIEXPORT jlong JNICALL
|
||||
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init(JNIEnv *env, jobject thiz,
|
||||
jstring j_det_model_path,
|
||||
jstring j_rec_model_path,
|
||||
jint j_thread_num,
|
||||
jstring j_cpu_mode) {
|
||||
std::string det_model_path = jstring_to_cpp_string(env, j_det_model_path);
|
||||
std::string rec_model_path = jstring_to_cpp_string(env, j_rec_model_path);
|
||||
int thread_num = j_thread_num;
|
||||
std::string cpu_mode = jstring_to_cpp_string(env, j_cpu_mode);
|
||||
ppredictor::OCR_Config conf;
|
||||
conf.thread_num = thread_num;
|
||||
conf.mode = str_to_cpu_mode(cpu_mode);
|
||||
ppredictor::OCR_PPredictor *orc_predictor = new ppredictor::OCR_PPredictor{conf};
|
||||
orc_predictor->init_from_file(det_model_path, rec_model_path);
|
||||
return reinterpret_cast<jlong>(orc_predictor);
|
||||
extern "C" JNIEXPORT jlong JNICALL
|
||||
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init(
|
||||
JNIEnv *env, jobject thiz, jstring j_det_model_path,
|
||||
jstring j_rec_model_path, jstring j_cls_model_path, jint j_thread_num,
|
||||
jstring j_cpu_mode) {
|
||||
std::string det_model_path = jstring_to_cpp_string(env, j_det_model_path);
|
||||
std::string rec_model_path = jstring_to_cpp_string(env, j_rec_model_path);
|
||||
std::string cls_model_path = jstring_to_cpp_string(env, j_cls_model_path);
|
||||
int thread_num = j_thread_num;
|
||||
std::string cpu_mode = jstring_to_cpp_string(env, j_cpu_mode);
|
||||
ppredictor::OCR_Config conf;
|
||||
conf.thread_num = thread_num;
|
||||
conf.mode = str_to_cpu_mode(cpu_mode);
|
||||
ppredictor::OCR_PPredictor *orc_predictor =
|
||||
new ppredictor::OCR_PPredictor{conf};
|
||||
orc_predictor->init_from_file(det_model_path, rec_model_path, cls_model_path);
|
||||
return reinterpret_cast<jlong>(orc_predictor);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -34,82 +34,81 @@ Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init(JNIEnv *env, jobject
|
|||
* @param cpu_mode
|
||||
* @return
|
||||
*/
|
||||
static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode) {
|
||||
static std::map<std::string, paddle::lite_api::PowerMode> cpu_mode_map{
|
||||
{"LITE_POWER_HIGH", paddle::lite_api::LITE_POWER_HIGH},
|
||||
{"LITE_POWER_LOW", paddle::lite_api::LITE_POWER_HIGH},
|
||||
{"LITE_POWER_FULL", paddle::lite_api::LITE_POWER_FULL},
|
||||
{"LITE_POWER_NO_BIND", paddle::lite_api::LITE_POWER_NO_BIND},
|
||||
{"LITE_POWER_RAND_HIGH", paddle::lite_api::LITE_POWER_RAND_HIGH},
|
||||
{"LITE_POWER_RAND_LOW", paddle::lite_api::LITE_POWER_RAND_LOW}
|
||||
};
|
||||
std::string upper_key;
|
||||
std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(), ::toupper);
|
||||
auto index = cpu_mode_map.find(upper_key);
|
||||
if (index == cpu_mode_map.end()) {
|
||||
LOGE("cpu_mode not found %s", upper_key.c_str());
|
||||
return paddle::lite_api::LITE_POWER_HIGH;
|
||||
} else {
|
||||
return index->second;
|
||||
}
|
||||
|
||||
static paddle::lite_api::PowerMode
|
||||
str_to_cpu_mode(const std::string &cpu_mode) {
|
||||
static std::map<std::string, paddle::lite_api::PowerMode> cpu_mode_map{
|
||||
{"LITE_POWER_HIGH", paddle::lite_api::LITE_POWER_HIGH},
|
||||
{"LITE_POWER_LOW", paddle::lite_api::LITE_POWER_LOW},
|
||||
{"LITE_POWER_FULL", paddle::lite_api::LITE_POWER_FULL},
|
||||
{"LITE_POWER_NO_BIND", paddle::lite_api::LITE_POWER_NO_BIND},
|
||||
{"LITE_POWER_RAND_HIGH", paddle::lite_api::LITE_POWER_RAND_HIGH},
|
||||
{"LITE_POWER_RAND_LOW", paddle::lite_api::LITE_POWER_RAND_LOW}};
|
||||
std::string upper_key(cpu_mode.size(), '\0'); // pre-size so std::transform writes into valid storage
|
||||
std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(),
|
||||
::toupper);
|
||||
auto index = cpu_mode_map.find(upper_key);
|
||||
if (index == cpu_mode_map.end()) {
|
||||
LOGE("cpu_mode not found %s", upper_key.c_str());
|
||||
return paddle::lite_api::LITE_POWER_HIGH;
|
||||
} else {
|
||||
return index->second;
|
||||
}
|
||||
}
|
||||
|
||||
extern "C"
|
||||
JNIEXPORT jfloatArray JNICALL
|
||||
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_forward(JNIEnv *env, jobject thiz,
|
||||
jlong java_pointer, jfloatArray buf,
|
||||
jfloatArray ddims,
|
||||
jobject original_image) {
|
||||
LOGI("begin to run native forward");
|
||||
if (java_pointer == 0) {
|
||||
LOGE("JAVA pointer is NULL");
|
||||
return cpp_array_to_jfloatarray(env, nullptr, 0);
|
||||
}
|
||||
cv::Mat origin = bitmap_to_cv_mat(env, original_image);
|
||||
if (origin.size == 0) {
|
||||
LOGE("origin bitmap cannot convert to CV Mat");
|
||||
return cpp_array_to_jfloatarray(env, nullptr, 0);
|
||||
}
|
||||
ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *) java_pointer;
|
||||
std::vector<float> dims_float_arr = jfloatarray_to_float_vector(env, ddims);
|
||||
std::vector<int64_t> dims_arr;
|
||||
dims_arr.resize(dims_float_arr.size());
|
||||
std::copy(dims_float_arr.cbegin(), dims_float_arr.cend(), dims_arr.begin());
|
||||
extern "C" JNIEXPORT jfloatArray JNICALL
|
||||
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_forward(
|
||||
JNIEnv *env, jobject thiz, jlong java_pointer, jfloatArray buf,
|
||||
jfloatArray ddims, jobject original_image) {
|
||||
LOGI("begin to run native forward");
|
||||
if (java_pointer == 0) {
|
||||
LOGE("JAVA pointer is NULL");
|
||||
return cpp_array_to_jfloatarray(env, nullptr, 0);
|
||||
}
|
||||
cv::Mat origin = bitmap_to_cv_mat(env, original_image);
|
||||
if (origin.size == 0) {
|
||||
LOGE("origin bitmap cannot convert to CV Mat");
|
||||
return cpp_array_to_jfloatarray(env, nullptr, 0);
|
||||
}
|
||||
ppredictor::OCR_PPredictor *ppredictor =
|
||||
(ppredictor::OCR_PPredictor *)java_pointer;
|
||||
std::vector<float> dims_float_arr = jfloatarray_to_float_vector(env, ddims);
|
||||
std::vector<int64_t> dims_arr;
|
||||
dims_arr.resize(dims_float_arr.size());
|
||||
std::copy(dims_float_arr.cbegin(), dims_float_arr.cend(), dims_arr.begin());
|
||||
|
||||
// The buffer is fairly large, so jfloatarray_to_float_vector is not called here
|
||||
int64_t buf_len = (int64_t) env->GetArrayLength(buf);
|
||||
jfloat *buf_data = env->GetFloatArrayElements(buf, JNI_FALSE);
|
||||
float *data = (jfloat *) buf_data;
|
||||
std::vector<ppredictor::OCRPredictResult> results = ppredictor->infer_ocr(dims_arr, data,
|
||||
buf_len,
|
||||
NET_OCR, origin);
|
||||
LOGI("infer_ocr finished with boxes %ld", results.size());
|
||||
// Serialize std::vector<ppredictor::OCRPredictResult> into a float array; it is passed to the Java layer and deserialized there
|
||||
std::vector<float> float_arr;
|
||||
for (const ppredictor::OCRPredictResult &r :results) {
|
||||
float_arr.push_back(r.points.size());
|
||||
float_arr.push_back(r.word_index.size());
|
||||
float_arr.push_back(r.score);
|
||||
for (const std::vector<int> &point : r.points) {
|
||||
float_arr.push_back(point.at(0));
|
||||
float_arr.push_back(point.at(1));
|
||||
}
|
||||
for (int index: r.word_index) {
|
||||
float_arr.push_back(index);
|
||||
}
|
||||
// The buffer is large, so read it directly instead of copying it with jfloatarray_to_float_vector
|
||||
int64_t buf_len = (int64_t)env->GetArrayLength(buf);
|
||||
jfloat *buf_data = env->GetFloatArrayElements(buf, JNI_FALSE);
|
||||
float *data = (jfloat *)buf_data;
|
||||
std::vector<ppredictor::OCRPredictResult> results =
|
||||
ppredictor->infer_ocr(dims_arr, data, buf_len, NET_OCR, origin);
|
||||
LOGI("infer_ocr finished with boxes %ld", results.size());
|
||||
// Serialize std::vector<ppredictor::OCRPredictResult> into a
|
||||
// float array; the Java layer deserializes it
|
||||
std::vector<float> float_arr;
|
||||
for (const ppredictor::OCRPredictResult &r : results) {
|
||||
float_arr.push_back(r.points.size());
|
||||
float_arr.push_back(r.word_index.size());
|
||||
float_arr.push_back(r.score);
|
||||
for (const std::vector<int> &point : r.points) {
|
||||
float_arr.push_back(point.at(0));
|
||||
float_arr.push_back(point.at(1));
|
||||
}
|
||||
return cpp_array_to_jfloatarray(env, float_arr.data(), float_arr.size());
|
||||
for (int index : r.word_index) {
|
||||
float_arr.push_back(index);
|
||||
}
|
||||
}
|
||||
return cpp_array_to_jfloatarray(env, float_arr.data(), float_arr.size());
|
||||
}
|
||||
|
||||
extern "C"
|
||||
JNIEXPORT void JNICALL
|
||||
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release(JNIEnv *env, jobject thiz,
|
||||
jlong java_pointer){
|
||||
if (java_pointer == 0) {
|
||||
LOGE("JAVA pointer is NULL");
|
||||
return;
|
||||
}
|
||||
ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *) java_pointer;
|
||||
delete ppredictor;
|
||||
extern "C" JNIEXPORT void JNICALL
|
||||
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release(
|
||||
JNIEnv *env, jobject thiz, jlong java_pointer) {
|
||||
if (java_pointer == 0) {
|
||||
LOGE("JAVA pointer is NULL");
|
||||
return;
|
||||
}
|
||||
ppredictor::OCR_PPredictor *ppredictor =
|
||||
(ppredictor::OCR_PPredictor *)java_pointer;
|
||||
delete ppredictor;
|
||||
}
|
|
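The `forward` function in the hunk above flattens each result into a float array as `[point count, word count, score, x0, y0, ..., word indices...]`. The following is a minimal sketch of that layout and its inverse, assuming a simplified stand-in struct; `OcrResult`, `serialize`, and `deserialize` are hypothetical names for illustration, not part of the demo, and the real decoding happens on the Java side.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for ppredictor::OCRPredictResult (illustration only).
struct OcrResult {
  std::vector<std::vector<int>> points; // each entry is {x, y}
  std::vector<int> word_index;
  float score = 0.f;
};

// Same per-result layout as the loop in forward():
// [n_points, n_words, score, x0, y0, ..., w0, w1, ...]
std::vector<float> serialize(const std::vector<OcrResult> &results) {
  std::vector<float> out;
  for (const OcrResult &r : results) {
    out.push_back(static_cast<float>(r.points.size()));
    out.push_back(static_cast<float>(r.word_index.size()));
    out.push_back(r.score);
    for (const std::vector<int> &p : r.points) {
      out.push_back(static_cast<float>(p.at(0)));
      out.push_back(static_cast<float>(p.at(1)));
    }
    for (int idx : r.word_index) {
      out.push_back(static_cast<float>(idx));
    }
  }
  return out;
}

// Inverse of serialize(); a sketch of what the receiving side has to do.
std::vector<OcrResult> deserialize(const std::vector<float> &buf) {
  std::vector<OcrResult> results;
  std::size_t pos = 0;
  while (pos + 3 <= buf.size()) {
    const std::size_t n_points = static_cast<std::size_t>(buf[pos]);
    const std::size_t n_words = static_cast<std::size_t>(buf[pos + 1]);
    OcrResult r;
    r.score = buf[pos + 2];
    pos += 3;
    // The header must not promise more values than the buffer holds.
    assert(pos + 2 * n_points + n_words <= buf.size());
    for (std::size_t i = 0; i < n_points; ++i) {
      r.points.push_back(
          {static_cast<int>(buf[pos]), static_cast<int>(buf[pos + 1])});
      pos += 2;
    }
    for (std::size_t i = 0; i < n_words; ++i) {
      r.word_index.push_back(static_cast<int>(buf[pos++]));
    }
    results.push_back(std::move(r));
  }
  return results;
}
```

One property of this encoding worth keeping in mind: a 32-bit float represents integers exactly only up to 2^24, so coordinates and word indices must stay below that for the round trip to be lossless.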
@ -0,0 +1,46 @@
|
|||
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
//
|
||||
// Licensed under the Apache License, Version 2.0 (the "License");
|
||||
// you may not use this file except in compliance with the License.
|
||||
// You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing, software
|
||||
// distributed under the License is distributed on an "AS IS" BASIS,
|
||||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
// See the License for the specific language governing permissions and
|
||||
// limitations under the License.
|
||||
|
||||
#include "ocr_cls_process.h"
|
||||
#include <cmath>
|
||||
#include <cstring>
|
||||
#include <fstream>
|
||||
#include <iostream>
|
||||
#include <iostream>
|
||||
#include <vector>
|
||||
|
||||
const std::vector<int> CLS_IMAGE_SHAPE = {3, 32, 100};
|
||||
|
||||
cv::Mat cls_resize_img(const cv::Mat &img) {
|
||||
int imgC = CLS_IMAGE_SHAPE[0];
|
||||
int imgW = CLS_IMAGE_SHAPE[2];
|
||||
int imgH = CLS_IMAGE_SHAPE[1];
|
||||
|
||||
float ratio = float(img.cols) / float(img.rows);
|
||||
int resize_w = 0;
|
||||
if (ceilf(imgH * ratio) > imgW)
|
||||
resize_w = imgW;
|
||||
else
|
||||
resize_w = int(ceilf(imgH * ratio));
|
||||
|
||||
cv::Mat resize_img;
|
||||
cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
|
||||
cv::INTER_CUBIC);
|
||||
|
||||
if (resize_w < imgW) {
|
||||
cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, int(imgW - resize_w),
|
||||
cv::BORDER_CONSTANT, {0, 0, 0});
|
||||
}
|
||||
return resize_img;
|
||||
}
|
|
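The width logic in `cls_resize_img` can be checked without OpenCV. The sketch below reproduces only the arithmetic: scale to the target height, clamp the width to the target width, and pad the remainder on the right. `ClsResizePlan` and `plan_cls_resize` are hypothetical names; the defaults 32 and 100 come from `CLS_IMAGE_SHAPE` above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: the width to resize to and the right-side padding.
struct ClsResizePlan {
  int resize_w;
  int pad_right;
};

// Mirrors the arithmetic of cls_resize_img: keep the aspect ratio at the
// target height, clamp to the target width, pad whatever is left.
ClsResizePlan plan_cls_resize(int src_w, int src_h, int target_h = 32,
                              int target_w = 100) {
  assert(src_w > 0 && src_h > 0);
  const float ratio = static_cast<float>(src_w) / static_cast<float>(src_h);
  const int scaled_w = static_cast<int>(std::ceil(target_h * ratio));
  const int resize_w = scaled_w > target_w ? target_w : scaled_w;
  return {resize_w, target_w - resize_w};
}
```

For example, a 200x50 crop scales to a width of 128, which is clamped to 100 with no padding, while a 50x50 crop is resized to 32x32 and padded with 68 columns.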
@ -0,0 +1,23 @@
|
|||
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
//
|
||||
// Licensed under the Apache License, Version 2.0 (the "License");
|
||||
// you may not use this file except in compliance with the License.
|
||||
// You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing, software
|
||||
// distributed under the License is distributed on an "AS IS" BASIS,
|
||||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
// See the License for the specific language governing permissions and
|
||||
// limitations under the License.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "common.h"
|
||||
#include <opencv2/opencv.hpp>
|
||||
#include <vector>
|
||||
|
||||
extern const std::vector<int> CLS_IMAGE_SHAPE;
|
||||
|
||||
cv::Mat cls_resize_img(const cv::Mat &img);
|
|
@ -3,38 +3,48 @@
|
|||
//
|
||||
|
||||
#include "ocr_ppredictor.h"
|
||||
#include "preprocess.h"
|
||||
#include "common.h"
|
||||
#include "ocr_db_post_process.h"
|
||||
#include "ocr_cls_process.h"
|
||||
#include "ocr_crnn_process.h"
|
||||
#include "ocr_db_post_process.h"
|
||||
#include "preprocess.h"
|
||||
|
||||
namespace ppredictor {
|
||||
|
||||
OCR_PPredictor::OCR_PPredictor(const OCR_Config &config) : _config(config) {
|
||||
OCR_PPredictor::OCR_PPredictor(const OCR_Config &config) : _config(config) {}
|
||||
|
||||
int OCR_PPredictor::init(const std::string &det_model_content,
|
||||
const std::string &rec_model_content,
|
||||
const std::string &cls_model_content) {
|
||||
_det_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
|
||||
_det_predictor->init_nb(det_model_content);
|
||||
|
||||
_rec_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
|
||||
_rec_predictor->init_nb(rec_model_content);
|
||||
|
||||
_cls_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
|
||||
_cls_predictor->init_nb(cls_model_content);
|
||||
return RETURN_OK;
|
||||
}
|
||||
|
||||
int
|
||||
OCR_PPredictor::init(const std::string &det_model_content, const std::string &rec_model_content) {
|
||||
_det_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
|
||||
_det_predictor->init_nb(det_model_content);
|
||||
int OCR_PPredictor::init_from_file(const std::string &det_model_path,
|
||||
const std::string &rec_model_path,
|
||||
const std::string &cls_model_path) {
|
||||
_det_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
|
||||
_det_predictor->init_from_file(det_model_path);
|
||||
|
||||
_rec_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
|
||||
_rec_predictor->init_nb(rec_model_content);
|
||||
return RETURN_OK;
|
||||
}
|
||||
_rec_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
|
||||
_rec_predictor->init_from_file(rec_model_path);
|
||||
|
||||
int OCR_PPredictor::init_from_file(const std::string &det_model_path, const std::string &rec_model_path){
|
||||
_det_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
|
||||
_det_predictor->init_from_file(det_model_path);
|
||||
|
||||
_rec_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
|
||||
_rec_predictor->init_from_file(rec_model_path);
|
||||
return RETURN_OK;
|
||||
_cls_predictor = std::unique_ptr<PPredictor>(
|
||||
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
|
||||
_cls_predictor->init_from_file(cls_model_path);
|
||||
return RETURN_OK;
|
||||
}
|
||||
/**
|
||||
* for debug use, show result of First Step
|
||||
|
@ -42,145 +52,188 @@ int OCR_PPredictor::init_from_file(const std::string &det_model_path, const std:
|
|||
* @param boxes
|
||||
* @param srcimg
|
||||
*/
|
||||
static void visual_img(const std::vector<std::vector<std::vector<int>>> &filter_boxes,
|
||||
const std::vector<std::vector<std::vector<int>>> &boxes,
|
||||
const cv::Mat &srcimg) {
|
||||
// visualization
|
||||
cv::Point rook_points[filter_boxes.size()][4];
|
||||
for (int n = 0; n < filter_boxes.size(); n++) {
|
||||
for (int m = 0; m < filter_boxes[0].size(); m++) {
|
||||
rook_points[n][m] = cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1]));
|
||||
}
|
||||
static void
|
||||
visual_img(const std::vector<std::vector<std::vector<int>>> &filter_boxes,
|
||||
const std::vector<std::vector<std::vector<int>>> &boxes,
|
||||
const cv::Mat &srcimg) {
|
||||
// visualization
|
||||
cv::Point rook_points[filter_boxes.size()][4];
|
||||
for (int n = 0; n < filter_boxes.size(); n++) {
|
||||
for (int m = 0; m < filter_boxes[0].size(); m++) {
|
||||
rook_points[n][m] =
|
||||
cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1]));
|
||||
}
|
||||
}
|
||||
|
||||
cv::Mat img_vis;
|
||||
srcimg.copyTo(img_vis);
|
||||
for (int n = 0; n < boxes.size(); n++) {
|
||||
const cv::Point *ppt[1] = {rook_points[n]};
|
||||
int npt[] = {4};
|
||||
cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0);
|
||||
}
|
||||
// For debugging only; replace with your own output path
|
||||
cv::imwrite("/sdcard/1/vis.png", img_vis);
|
||||
cv::Mat img_vis;
|
||||
srcimg.copyTo(img_vis);
|
||||
for (int n = 0; n < boxes.size(); n++) {
|
||||
const cv::Point *ppt[1] = {rook_points[n]};
|
||||
int npt[] = {4};
|
||||
cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0);
|
||||
}
|
||||
// For debugging only; replace with your own output path
|
||||
cv::imwrite("/sdcard/1/vis.png", img_vis);
|
||||
}
|
||||
|
||||
std::vector<OCRPredictResult>
|
||||
OCR_PPredictor::infer_ocr(const std::vector<int64_t> &dims, const float *input_data, int input_len,
|
||||
int net_flag, cv::Mat &origin) {
|
||||
OCR_PPredictor::infer_ocr(const std::vector<int64_t> &dims,
|
||||
const float *input_data, int input_len, int net_flag,
|
||||
cv::Mat &origin) {
|
||||
PredictorInput input = _det_predictor->get_first_input();
|
||||
input.set_dims(dims);
|
||||
input.set_data(input_data, input_len);
|
||||
std::vector<PredictorOutput> results = _det_predictor->infer();
|
||||
PredictorOutput &res = results.at(0);
|
||||
std::vector<std::vector<std::vector<int>>> filtered_box = calc_filtered_boxes(
|
||||
res.get_float_data(), res.get_size(), (int)dims[2], (int)dims[3], origin);
|
||||
LOGI("Filter_box size %ld", filtered_box.size());
|
||||
return infer_rec(filtered_box, origin);
|
||||
}
|
||||
|
||||
PredictorInput input = _det_predictor->get_first_input();
|
||||
std::vector<OCRPredictResult> OCR_PPredictor::infer_rec(
|
||||
const std::vector<std::vector<std::vector<int>>> &boxes,
|
||||
const cv::Mat &origin_img) {
|
||||
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
|
||||
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
|
||||
std::vector<int64_t> dims = {1, 3, 0, 0};
|
||||
std::vector<OCRPredictResult> ocr_results;
|
||||
|
||||
PredictorInput input = _rec_predictor->get_first_input();
|
||||
for (auto bp = boxes.crbegin(); bp != boxes.crend(); ++bp) {
|
||||
const std::vector<std::vector<int>> &box = *bp;
|
||||
cv::Mat crop_img = get_rotate_crop_image(origin_img, box);
|
||||
crop_img = infer_cls(crop_img);
|
||||
|
||||
float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
|
||||
cv::Mat input_image = crnn_resize_img(crop_img, wh_ratio);
|
||||
input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
|
||||
const float *dimg = reinterpret_cast<const float *>(input_image.data);
|
||||
int input_size = input_image.rows * input_image.cols;
|
||||
|
||||
dims[2] = input_image.rows;
|
||||
dims[3] = input_image.cols;
|
||||
input.set_dims(dims);
|
||||
input.set_data(input_data, input_len);
|
||||
std::vector<PredictorOutput> results = _det_predictor->infer();
|
||||
PredictorOutput &res = results.at(0);
|
||||
std::vector<std::vector<std::vector<int>>> filtered_box
|
||||
= calc_filtered_boxes(res.get_float_data(), res.get_size(), (int) dims[2], (int) dims[3],
|
||||
origin);
|
||||
LOGI("Filter_box size %ld", filtered_box.size());
|
||||
return infer_rec(filtered_box, origin);
|
||||
|
||||
neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean,
|
||||
scale);
|
||||
|
||||
std::vector<PredictorOutput> results = _rec_predictor->infer();
|
||||
|
||||
OCRPredictResult res;
|
||||
res.word_index = postprocess_rec_word_index(results.at(0));
|
||||
if (res.word_index.empty()) {
|
||||
continue;
|
||||
}
|
||||
res.score = postprocess_rec_score(results.at(1));
|
||||
res.points = box;
|
||||
ocr_results.emplace_back(std::move(res));
|
||||
}
|
||||
LOGI("ocr_results finished %lu", ocr_results.size());
|
||||
return ocr_results;
|
||||
}
|
||||
|
||||
std::vector<OCRPredictResult>
|
||||
OCR_PPredictor::infer_rec(const std::vector<std::vector<std::vector<int>>> &boxes,
|
||||
const cv::Mat &origin_img) {
|
||||
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
|
||||
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
|
||||
std::vector<int64_t> dims = {1, 3, 0, 0};
|
||||
std::vector<OCRPredictResult> ocr_results;
|
||||
cv::Mat OCR_PPredictor::infer_cls(const cv::Mat &img, float thresh) {
|
||||
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
|
||||
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
|
||||
std::vector<int64_t> dims = {1, 3, 0, 0};
|
||||
std::vector<OCRPredictResult> ocr_results;
|
||||
|
||||
PredictorInput input = _rec_predictor->get_first_input();
|
||||
for (auto bp = boxes.crbegin(); bp != boxes.crend(); ++bp) {
|
||||
const std::vector<std::vector<int>> &box = *bp;
|
||||
cv::Mat crop_img = get_rotate_crop_image(origin_img, box);
|
||||
float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
|
||||
cv::Mat input_image = crnn_resize_img(crop_img, wh_ratio);
|
||||
input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
|
||||
const float *dimg = reinterpret_cast<const float *>(input_image.data);
|
||||
int input_size = input_image.rows * input_image.cols;
|
||||
PredictorInput input = _cls_predictor->get_first_input();
|
||||
|
||||
dims[2] = input_image.rows;
|
||||
dims[3] = input_image.cols;
|
||||
input.set_dims(dims);
|
||||
cv::Mat input_image = cls_resize_img(img);
|
||||
input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
|
||||
const float *dimg = reinterpret_cast<const float *>(input_image.data);
|
||||
int input_size = input_image.rows * input_image.cols;
|
||||
|
||||
neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean, scale);
|
||||
dims[2] = input_image.rows;
|
||||
dims[3] = input_image.cols;
|
||||
input.set_dims(dims);
|
||||
|
||||
std::vector<PredictorOutput> results = _rec_predictor->infer();
|
||||
neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean,
|
||||
scale);
|
||||
|
||||
OCRPredictResult res;
|
||||
res.word_index = postprocess_rec_word_index(results.at(0));
|
||||
if (res.word_index.empty()) {
|
||||
continue;
|
||||
}
|
||||
res.score = postprocess_rec_score(results.at(1));
|
||||
res.points = box;
|
||||
ocr_results.emplace_back(std::move(res));
|
||||
}
|
||||
LOGI("ocr_results finished %lu", ocr_results.size());
|
||||
return ocr_results;
|
||||
std::vector<PredictorOutput> results = _cls_predictor->infer();
|
||||
|
||||
const float *scores = results.at(0).get_float_data();
|
||||
const int *labels = results.at(1).get_int_data();
|
||||
for (int64_t i = 0; i < results.at(0).get_size(); i++) {
|
||||
LOGI("output scores [%f]", scores[i]);
|
||||
}
|
||||
for (int64_t i = 0; i < results.at(1).get_size(); i++) {
|
||||
LOGI("output label [%d]", labels[i]);
|
||||
}
|
||||
int label_idx = labels[0];
|
||||
float score = scores[label_idx];
|
||||
|
||||
cv::Mat srcimg;
|
||||
img.copyTo(srcimg);
|
||||
if (label_idx % 2 == 1 && score > thresh) {
|
||||
cv::rotate(srcimg, srcimg, 1);
|
||||
}
|
||||
return srcimg;
|
||||
}
|
||||
|
||||
std::vector<std::vector<std::vector<int>>>
|
||||
OCR_PPredictor::calc_filtered_boxes(const float *pred, int pred_size, int output_height,
|
||||
int output_width, const cv::Mat &origin) {
|
||||
const double threshold = 0.3;
|
||||
const double maxvalue = 1;
|
||||
OCR_PPredictor::calc_filtered_boxes(const float *pred, int pred_size,
|
||||
int output_height, int output_width,
|
||||
const cv::Mat &origin) {
|
||||
const double threshold = 0.3;
|
||||
const double maxvalue = 1;
|
||||
|
||||
cv::Mat pred_map = cv::Mat::zeros(output_height, output_width, CV_32F);
|
||||
memcpy(pred_map.data, pred, pred_size * sizeof(float));
|
||||
cv::Mat cbuf_map;
|
||||
pred_map.convertTo(cbuf_map, CV_8UC1);
|
||||
cv::Mat pred_map = cv::Mat::zeros(output_height, output_width, CV_32F);
|
||||
memcpy(pred_map.data, pred, pred_size * sizeof(float));
|
||||
cv::Mat cbuf_map;
|
||||
pred_map.convertTo(cbuf_map, CV_8UC1);
|
||||
|
||||
cv::Mat bit_map;
|
||||
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
|
||||
cv::Mat bit_map;
|
||||
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
|
||||
|
||||
std::vector<std::vector<std::vector<int>>> boxes = boxes_from_bitmap(pred_map, bit_map);
|
||||
float ratio_h = output_height * 1.0f / origin.rows;
|
||||
float ratio_w = output_width * 1.0f / origin.cols;
|
||||
std::vector<std::vector<std::vector<int>>> filter_boxes = filter_tag_det_res(boxes, ratio_h,
|
||||
ratio_w, origin);
|
||||
return filter_boxes;
|
||||
std::vector<std::vector<std::vector<int>>> boxes =
|
||||
boxes_from_bitmap(pred_map, bit_map);
|
||||
float ratio_h = output_height * 1.0f / origin.rows;
|
||||
float ratio_w = output_width * 1.0f / origin.cols;
|
||||
std::vector<std::vector<std::vector<int>>> filter_boxes =
|
||||
filter_tag_det_res(boxes, ratio_h, ratio_w, origin);
|
||||
return filter_boxes;
|
||||
}
|
||||
|
||||
std::vector<int> OCR_PPredictor::postprocess_rec_word_index(const PredictorOutput &res) {
|
||||
const int *rec_idx = res.get_int_data();
|
||||
const std::vector<std::vector<uint64_t>> rec_idx_lod = res.get_lod();
|
||||
std::vector<int>
|
||||
OCR_PPredictor::postprocess_rec_word_index(const PredictorOutput &res) {
|
||||
const int *rec_idx = res.get_int_data();
|
||||
const std::vector<std::vector<uint64_t>> rec_idx_lod = res.get_lod();
|
||||
|
||||
std::vector<int> pred_idx;
|
||||
for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) {
|
||||
pred_idx.emplace_back(rec_idx[n]);
|
||||
}
|
||||
return pred_idx;
|
||||
std::vector<int> pred_idx;
|
||||
for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) {
|
||||
pred_idx.emplace_back(rec_idx[n]);
|
||||
}
|
||||
return pred_idx;
|
||||
}
|
||||
|
||||
float OCR_PPredictor::postprocess_rec_score(const PredictorOutput &res) {
|
||||
const float *predict_batch = res.get_float_data();
|
||||
const std::vector<int64_t> predict_shape = res.get_shape();
|
||||
const std::vector<std::vector<uint64_t>> predict_lod = res.get_lod();
|
||||
int blank = predict_shape[1];
|
||||
float score = 0.f;
|
||||
int count = 0;
|
||||
for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) {
|
||||
int argmax_idx = argmax(predict_batch + n * predict_shape[1],
|
||||
predict_batch + (n + 1) * predict_shape[1]);
|
||||
float max_value = predict_batch[n * predict_shape[1] + argmax_idx];
|
||||
if (blank - 1 - argmax_idx > 1e-5) {
|
||||
score += max_value;
|
||||
count += 1;
|
||||
}
|
||||
|
||||
const float *predict_batch = res.get_float_data();
|
||||
const std::vector<int64_t> predict_shape = res.get_shape();
|
||||
const std::vector<std::vector<uint64_t>> predict_lod = res.get_lod();
|
||||
int blank = predict_shape[1];
|
||||
float score = 0.f;
|
||||
int count = 0;
|
||||
for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) {
|
||||
int argmax_idx = argmax(predict_batch + n * predict_shape[1],
|
||||
predict_batch + (n + 1) * predict_shape[1]);
|
||||
float max_value = predict_batch[n * predict_shape[1] + argmax_idx];
|
||||
if (blank - 1 - argmax_idx > 1e-5) {
|
||||
score += max_value;
|
||||
count += 1;
|
||||
}
|
||||
if (count == 0) {
|
||||
LOGE("calc score count 0");
|
||||
} else {
|
||||
score /= count;
|
||||
}
|
||||
LOGI("calc score: %f", score);
|
||||
return score;
|
||||
|
||||
}
|
||||
if (count == 0) {
|
||||
LOGE("calc score count 0");
|
||||
} else {
|
||||
score /= count;
|
||||
}
|
||||
LOGI("calc score: %f", score);
|
||||
return score;
|
||||
}
|
||||
|
||||
|
||||
NET_TYPE OCR_PPredictor::get_net_flag() const {
|
||||
return NET_OCR;
|
||||
}
|
||||
NET_TYPE OCR_PPredictor::get_net_flag() const { return NET_OCR; }
|
||||
}
|
|
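`postprocess_rec_score` in the hunk above averages the top probability of every time step whose best class is not the blank (last) class. The sketch below shows that idea on a plain row-major buffer. It is a simplification under stated assumptions: `mean_non_blank_confidence` is a hypothetical name, the LoD bookkeeping is omitted, and every row is visited, whereas the original loop stops one step before `predict_lod[0][1]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// probs is row-major [steps x num_classes]; the last class is treated as the
// CTC blank, matching the `blank - 1 - argmax_idx` check in the original.
float mean_non_blank_confidence(const std::vector<float> &probs,
                                std::size_t num_classes) {
  assert(num_classes > 0 && probs.size() % num_classes == 0);
  const std::size_t steps = probs.size() / num_classes;
  float sum = 0.f;
  int count = 0;
  for (std::size_t t = 0; t < steps; ++t) {
    const auto row_begin =
        probs.begin() + static_cast<std::ptrdiff_t>(t * num_classes);
    const auto row_end = row_begin + static_cast<std::ptrdiff_t>(num_classes);
    const auto best = std::max_element(row_begin, row_end);
    const std::size_t argmax = static_cast<std::size_t>(best - row_begin);
    if (argmax != num_classes - 1) { // skip steps predicted as blank
      sum += *best;
      ++count;
    }
  }
  // Like the original, leave the score at 0 when no step qualifies.
  return count == 0 ? 0.f : sum / static_cast<float>(count);
}
```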
@ -4,10 +4,10 @@
|
|||
|
||||
#pragma once
|
||||
|
||||
#include <string>
|
||||
#include "ppredictor.h"
|
||||
#include <opencv2/opencv.hpp>
|
||||
#include <paddle_api.h>
|
||||
#include "ppredictor.h"
|
||||
#include <string>
|
||||
|
||||
namespace ppredictor {
|
||||
|
||||
|
@ -15,17 +15,18 @@ namespace ppredictor {
|
|||
* Config
|
||||
*/
|
||||
struct OCR_Config {
|
||||
int thread_num = 4; // Thread num
|
||||
paddle::lite_api::PowerMode mode = paddle::lite_api::LITE_POWER_HIGH; // PaddleLite Mode
|
||||
int thread_num = 4; // Thread num
|
||||
paddle::lite_api::PowerMode mode =
|
||||
paddle::lite_api::LITE_POWER_HIGH; // PaddleLite Mode
|
||||
};
|
||||
|
||||
/**
|
||||
* Polygon result
|
||||
*/
|
||||
struct OCRPredictResult {
|
||||
std::vector<int> word_index;
|
||||
std::vector<std::vector<int>> points;
|
||||
float score;
|
||||
std::vector<int> word_index;
|
||||
std::vector<std::vector<int>> points;
|
||||
float score;
|
||||
};
|
||||
|
||||
/**
|
||||
|
@ -35,78 +36,87 @@ struct OCRPredictResult {
|
|||
*/
|
||||
class OCR_PPredictor : public PPredictor_Interface {
|
||||
public:
|
||||
OCR_PPredictor(const OCR_Config &config);
|
||||
OCR_PPredictor(const OCR_Config &config);
|
||||
|
||||
virtual ~OCR_PPredictor() {
|
||||
virtual ~OCR_PPredictor() {}
|
||||
|
||||
}
|
||||
|
||||
/**
|
||||
* Initialize the Predictors of the two models
|
||||
* @param det_model_content
|
||||
* @param rec_model_content
|
||||
* @return
|
||||
*/
|
||||
int init(const std::string &det_model_content, const std::string &rec_model_content);
|
||||
int init_from_file(const std::string &det_model_path, const std::string &rec_model_path);
|
||||
/**
|
||||
* Return OCR result
|
||||
* @param dims
|
||||
* @param input_data
|
||||
* @param input_len
|
||||
* @param net_flag
|
||||
* @param origin
|
||||
* @return
|
||||
*/
|
||||
virtual std::vector<OCRPredictResult>
|
||||
infer_ocr(const std::vector<int64_t> &dims, const float *input_data, int input_len,
|
||||
int net_flag, cv::Mat &origin);
|
||||
|
||||
|
||||
virtual NET_TYPE get_net_flag() const;
|
||||
/**
|
||||
* Initialize the Predictors of the two models
|
||||
* @param det_model_content
|
||||
* @param rec_model_content
|
||||
* @return
|
||||
*/
|
||||
int init(const std::string &det_model_content,
|
||||
const std::string &rec_model_content,
|
||||
const std::string &cls_model_content);
|
||||
int init_from_file(const std::string &det_model_path,
|
||||
const std::string &rec_model_path,
|
||||
const std::string &cls_model_path);
|
||||
/**
|
||||
* Return OCR result
|
||||
* @param dims
|
||||
* @param input_data
|
||||
* @param input_len
|
||||
* @param net_flag
|
||||
* @param origin
|
||||
* @return
|
||||
*/
|
||||
virtual std::vector<OCRPredictResult>
|
||||
infer_ocr(const std::vector<int64_t> &dims, const float *input_data,
|
||||
int input_len, int net_flag, cv::Mat &origin);
|
||||
|
||||
virtual NET_TYPE get_net_flag() const;
|
||||
|
||||
private:
|
||||
/**
|
||||
* calculate polygons from the output image of the first model
|
||||
* @param pred
|
||||
* @param output_height
|
||||
* @param output_width
|
||||
* @param origin
|
||||
* @return
|
||||
*/
|
||||
std::vector<std::vector<std::vector<int>>>
|
||||
calc_filtered_boxes(const float *pred, int pred_size, int output_height,
|
||||
int output_width, const cv::Mat &origin);
|
||||
|
||||
/**
|
||||
* calculate polygons from the output image of the first model
|
||||
* @param pred
|
||||
* @param output_height
|
||||
* @param output_width
|
||||
* @param origin
|
||||
* @return
|
||||
*/
|
||||
std::vector<std::vector<std::vector<int>>>
|
||||
calc_filtered_boxes(const float *pred, int pred_size, int output_height, int output_width,
|
||||
const cv::Mat &origin);
|
||||
/**
|
||||
* infer for second model
|
||||
*
|
||||
* @param boxes
|
||||
* @param origin
|
||||
* @return
|
||||
*/
|
||||
std::vector<OCRPredictResult>
|
||||
infer_rec(const std::vector<std::vector<std::vector<int>>> &boxes,
|
||||
const cv::Mat &origin);
|
||||
|
||||
/**
|
||||
* infer for second model
|
||||
*
|
||||
* @param boxes
|
||||
* @param origin
|
||||
* @return
|
||||
*/
|
||||
std::vector<OCRPredictResult>
|
||||
infer_rec(const std::vector<std::vector<std::vector<int>>> &boxes, const cv::Mat &origin);
|
||||
/**
|
||||
* infer for cls model
|
||||
*
|
||||
* @param origin
|
||||
* @param thresh
|
||||
* @return
|
||||
*/
|
||||
cv::Mat infer_cls(const cv::Mat &origin, float thresh = 0.5);
|
||||
|
||||
/**
|
||||
* Postprocess of the second model to extract text
|
||||
* @param res
|
||||
* @return
|
||||
*/
|
||||
std::vector<int> postprocess_rec_word_index(const PredictorOutput &res);
|
||||
/**
|
||||
* Postprocess of the second model to extract text
|
||||
* @param res
|
||||
* @return
|
||||
*/
|
||||
std::vector<int> postprocess_rec_word_index(const PredictorOutput &res);
|
||||
|
||||
/**
|
||||
* calculate confidence of second model text result
|
||||
* @param res
|
||||
* @return
|
||||
*/
|
||||
float postprocess_rec_score(const PredictorOutput &res);
|
||||
|
||||
std::unique_ptr<PPredictor> _det_predictor;
|
||||
std::unique_ptr<PPredictor> _rec_predictor;
|
||||
OCR_Config _config;
|
||||
/**
|
||||
* calculate confidence of second model text result
|
||||
* @param res
|
||||
* @return
|
||||
*/
|
||||
float postprocess_rec_score(const PredictorOutput &res);
|
||||
|
||||
std::unique_ptr<PPredictor> _det_predictor;
|
||||
std::unique_ptr<PPredictor> _rec_predictor;
|
||||
std::unique_ptr<PPredictor> _cls_predictor;
|
||||
OCR_Config _config;
|
||||
};
|
||||
}
|
||||
|
|
|
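The `infer_cls` method declared in this header rotates a crop only when the predicted label is odd and its score exceeds `thresh` (default 0.5); in the implementation this is `cv::rotate(srcimg, srcimg, 1)`, where rotate code 1 is `cv::ROTATE_180`. A minimal sketch of that decision follows, together with a 180 degree rotation on a flat single-channel buffer. `should_flip` and `rotate180` are hypothetical names, and the buffer-based rotation is a stand-in for the OpenCV call, not the demo's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Same condition as infer_cls: odd label and score strictly above thresh.
bool should_flip(int label_idx, float score, float thresh = 0.5f) {
  return label_idx % 2 == 1 && score > thresh;
}

// For a row-major single-channel image, rotating by 180 degrees maps pixel
// index i to (size - 1 - i), which is a plain reversal of the buffer.
std::vector<int> rotate180(std::vector<int> pixels) {
  std::reverse(pixels.begin(), pixels.end());
  return pixels;
}
```

Because the comparison is strict, a score exactly equal to the threshold leaves the crop unrotated.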
@ -29,7 +29,7 @@ public class OCRPredictorNative {
|
|||
public OCRPredictorNative(Config config) {
|
||||
this.config = config;
|
||||
loadLibrary();
|
||||
nativePointer = init(config.detModelFilename, config.recModelFilename,
|
||||
nativePointer = init(config.detModelFilename, config.recModelFilename,config.clsModelFilename,
|
||||
config.cpuThreadNum, config.cpuPower);
|
||||
Log.i("OCRPredictorNative", "load success " + nativePointer);
|
||||
|
||||
|
@ -38,7 +38,7 @@ public class OCRPredictorNative {
|
|||
public void release() {
|
||||
if (nativePointer != 0) {
|
||||
nativePointer = 0;
|
||||
destory(nativePointer);
|
||||
// destory(nativePointer);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -55,10 +55,11 @@ public class OCRPredictorNative {
|
|||
public String cpuPower;
|
||||
public String detModelFilename;
|
||||
public String recModelFilename;
|
||||
public String clsModelFilename;
|
||||
|
||||
}
|
||||
|
||||
protected native long init(String detModelPath, String recModelPath, int threadNum, String cpuMode);
|
||||
protected native long init(String detModelPath, String recModelPath,String clsModelPath, int threadNum, String cpuMode);
|
||||
|
||||
protected native float[] forward(long pointer, float[] buf, float[] ddims, Bitmap originalImage);
|
||||
|
||||
|
|
|
@ -121,7 +121,8 @@ public class Predictor {
|
|||
config.cpuThreadNum = cpuThreadNum;
|
||||
config.detModelFilename = realPath + File.separator + "ch_det_mv3_db_opt.nb";
|
||||
config.recModelFilename = realPath + File.separator + "ch_rec_mv3_crnn_opt.nb";
|
||||
Log.e("Predictor", "model path" + config.detModelFilename + " ; " + config.recModelFilename);
|
||||
config.clsModelFilename = realPath + File.separator + "cls_opt_arm.nb";
|
||||
Log.e("Predictor", "model path" + config.detModelFilename + " ; " + config.recModelFilename + ";" + config.clsModelFilename);
|
||||
config.cpuPower = cpuPowerMode;
|
||||
paddlePredictor = new OCRPredictorNative(config);
|
||||
|
||||
|
|
|
@ -57,6 +57,12 @@ public:
|
|||
|
||||
this->char_list_file.assign(config_map_["char_list_file"]);
|
||||
|
||||
this->use_angle_cls = bool(stoi(config_map_["use_angle_cls"]));
|
||||
|
||||
this->cls_model_dir.assign(config_map_["cls_model_dir"]);
|
||||
|
||||
this->cls_thresh = stod(config_map_["cls_thresh"]);
|
||||
|
||||
this->visualize = bool(stoi(config_map_["visualize"]));
|
||||
}
|
||||
|
||||
|
@ -84,8 +90,14 @@ public:
|
|||
|
||||
std::string rec_model_dir;
|
||||
|
||||
bool use_angle_cls;
|
||||
|
||||
std::string char_list_file;
|
||||
|
||||
std::string cls_model_dir;
|
||||
|
||||
double cls_thresh;
|
||||
|
||||
bool visualize = true;
|
||||
|
||||
void PrintConfigInfo();
|
||||
|
|
|
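The config hunks above read `use_angle_cls`, `cls_model_dir`, and `cls_thresh` from `config_map_` with `stoi` and `stod`. Note that `std::stoi` throws `std::invalid_argument` on an empty string, which is what a missing key yields through `operator[]`. The sketch below shows one defensive way to read the same three keys. The whitespace-separated `key value` file format, the `#` comment rule, the fallback defaults, and the names `parse_config`, `ClsOptions`, and `read_cls_options` are all assumptions for illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Assumed format: one "key value" pair per line, '#' starts a comment line.
std::map<std::string, std::string> parse_config(const std::string &text) {
  std::map<std::string, std::string> config;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') {
      continue;
    }
    std::istringstream fields(line);
    std::string key, value;
    if (fields >> key >> value) {
      config[key] = value;
    }
  }
  return config;
}

// Hypothetical container for the three classifier settings.
struct ClsOptions {
  bool use_angle_cls = false; // assumed default: classifier disabled
  std::string cls_model_dir;
  double cls_thresh = 0.5; // assumed default, same as Classifier::cls_thresh
};

// Look keys up with find() so a missing key keeps the default instead of
// feeding an empty string to stoi/stod.
ClsOptions read_cls_options(const std::map<std::string, std::string> &config) {
  ClsOptions opts;
  auto it = config.find("use_angle_cls");
  if (it != config.end()) {
    opts.use_angle_cls = std::stoi(it->second) != 0;
  }
  it = config.find("cls_model_dir");
  if (it != config.end()) {
    opts.cls_model_dir = it->second;
  }
  it = config.find("cls_thresh");
  if (it != config.end()) {
    opts.cls_thresh = std::stod(it->second);
  }
  return opts;
}
```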
@ -0,0 +1,81 @@
|
|||
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
//
|
||||
// Licensed under the Apache License, Version 2.0 (the "License");
|
||||
// you may not use this file except in compliance with the License.
|
||||
// You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing, software
|
||||
// distributed under the License is distributed on an "AS IS" BASIS,
|
||||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
// See the License for the specific language governing permissions and
|
||||
// limitations under the License.
|
||||
|
||||
#include "opencv2/core.hpp"
|
||||
#include "opencv2/imgcodecs.hpp"
|
||||
#include "opencv2/imgproc.hpp"
|
||||
#include "paddle_api.h"
|
||||
#include "paddle_inference_api.h"
|
||||
#include <chrono>
|
||||
#include <iomanip>
|
||||
#include <iostream>
|
||||
#include <ostream>
|
||||
#include <vector>
|
||||
|
||||
#include <cstring>
|
||||
#include <fstream>
|
||||
#include <numeric>
|
||||
|
||||
#include <include/preprocess_op.h>
|
||||
#include <include/utility.h>
|
||||
|
||||
namespace PaddleOCR {
|
||||
|
||||
class Classifier {
|
||||
public:
|
||||
explicit Classifier(const std::string &model_dir, const bool &use_gpu,
|
||||
const int &gpu_id, const int &gpu_mem,
|
||||
const int &cpu_math_library_num_threads,
|
||||
const bool &use_mkldnn, const bool &use_zero_copy_run,
|
||||
const double &cls_thresh) {
|
||||
this->use_gpu_ = use_gpu;
|
||||
this->gpu_id_ = gpu_id;
|
||||
this->gpu_mem_ = gpu_mem;
|
||||
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
|
||||
this->use_mkldnn_ = use_mkldnn;
|
||||
this->use_zero_copy_run_ = use_zero_copy_run;
|
||||
|
||||
this->cls_thresh = cls_thresh;
|
||||
|
||||
LoadModel(model_dir);
|
||||
}
|
||||
|
||||
// Load Paddle inference model
|
||||
void LoadModel(const std::string &model_dir);
|
||||
|
||||
cv::Mat Run(cv::Mat &img);
|
||||
|
||||
private:
|
||||
std::shared_ptr<PaddlePredictor> predictor_;
|
||||
|
||||
bool use_gpu_ = false;
|
||||
int gpu_id_ = 0;
|
||||
int gpu_mem_ = 4000;
|
||||
int cpu_math_library_num_threads_ = 4;
|
||||
bool use_mkldnn_ = false;
|
||||
bool use_zero_copy_run_ = false;
|
||||
double cls_thresh = 0.5;
|
||||
|
||||
std::vector<float> mean_ = {0.5f, 0.5f, 0.5f};
|
||||
std::vector<float> scale_ = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
|
||||
bool is_scale_ = true;
|
||||
|
||||
// pre-process
|
||||
ClsResizeImg resize_op_;
|
||||
Normalize normalize_op_;
|
||||
Permute permute_op_;
|
||||
|
||||
}; // class Classifier
|
||||
|
||||
} // namespace PaddleOCR
|
|
@ -27,6 +27,7 @@
|
|||
#include <fstream>
|
||||
#include <numeric>
|
||||
|
||||
#include <include/ocr_cls.h>
|
||||
#include <include/postprocess_op.h>
|
||||
#include <include/preprocess_op.h>
|
||||
#include <include/utility.h>
|
||||
|
@ -56,7 +57,8 @@ public:
|
|||
// Load Paddle inference model
|
||||
void LoadModel(const std::string &model_dir);
|
||||
|
||||
void Run(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat &img);
|
||||
void Run(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat &img,
|
||||
Classifier *cls);
|
||||
|
||||
private:
|
||||
std::shared_ptr<PaddlePredictor> predictor_;
|
||||
|
|
|
@ -56,4 +56,10 @@ public:
|
|||
const std::vector<int> &rec_image_shape = {3, 32, 320});
|
||||
};
|
||||
|
||||
class ClsResizeImg {
|
||||
public:
|
||||
virtual void Run(const cv::Mat &img, cv::Mat &resize_img,
|
||||
const std::vector<int> &rec_image_shape = {3, 32, 320});
|
||||
};
|
||||
|
||||
} // namespace PaddleOCR
|
|
@ -193,6 +193,9 @@ make -j
|
|||
sh tools/run.sh
|
||||
```
|
||||
|
||||
* To use the orientation classifier, set the `use_angle_cls` parameter in `tools/config.txt` to 1, which enables orientation classifier prediction.
|
||||
|
||||
|
||||
The detection results are then printed to the screen, as shown below.
|
||||
|
||||
<div align="center">
|
||||
|
|
|
@ -162,7 +162,7 @@ inference/
|
|||
sh tools/build.sh
|
||||
```
|
||||
|
||||
具体地,`tools/build.sh`中内容如下。
|
||||
Specifically, the content in `tools/build.sh` is as follows.
|
||||
|
||||
```shell
|
||||
OPENCV_DIR=your_opencv_dir
|
||||
|
@ -201,6 +201,8 @@ make -j
|
|||
sh tools/run.sh
|
||||
```
|
||||
|
||||
* If you want to use the orientation classifier to correct the detected boxes, set `use_angle_cls` in `tools/config.txt` to 1 to enable it.
|
||||
|
||||
The detection results will be printed to the screen, as shown below.
|
||||
|
||||
<div align="center">
|
||||
|
|
|
@ -53,6 +53,15 @@ int main(int argc, char **argv) {
|
|||
config.cpu_math_library_num_threads, config.use_mkldnn,
|
||||
config.use_zero_copy_run, config.max_side_len, config.det_db_thresh,
|
||||
config.det_db_box_thresh, config.det_db_unclip_ratio, config.visualize);
|
||||
|
||||
Classifier *cls = nullptr;
|
||||
if (config.use_angle_cls == true) {
|
||||
cls = new Classifier(config.cls_model_dir, config.use_gpu, config.gpu_id,
|
||||
config.gpu_mem, config.cpu_math_library_num_threads,
|
||||
config.use_mkldnn, config.use_zero_copy_run,
|
||||
config.cls_thresh);
|
||||
}
|
||||
|
||||
CRNNRecognizer rec(config.rec_model_dir, config.use_gpu, config.gpu_id,
|
||||
config.gpu_mem, config.cpu_math_library_num_threads,
|
||||
config.use_mkldnn, config.use_zero_copy_run,
|
||||
|
@ -62,7 +71,7 @@ int main(int argc, char **argv) {
|
|||
std::vector<std::vector<std::vector<int>>> boxes;
|
||||
det.Run(srcimg, boxes);
|
||||
|
||||
rec.Run(boxes, srcimg);
|
||||
rec.Run(boxes, srcimg, cls);
|
||||
|
||||
auto end = std::chrono::system_clock::now();
|
||||
auto duration =
|
||||
|
|
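
Review note: the hunk above allocates the classifier with `new` and passes a raw pointer that may be `nullptr`; no matching `delete` appears in this diff. One possible alternative, sketched under that assumption, is to own the optional stage with `std::unique_ptr` and pass the raw pointer only for the call. `Stage`, `MakeStage`, and `Process` are hypothetical stand-ins, not names from the repository.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for an optional pipeline stage such as Classifier.
struct Stage {
  std::string Run(const std::string &crop) const { return crop + ":rotated"; }
};

// Creates the stage only when it is enabled; otherwise returns an empty pointer.
std::unique_ptr<Stage> MakeStage(bool enabled) {
  if (!enabled) {
    return nullptr;
  }
  return std::unique_ptr<Stage>(new Stage());
}

// Applies the stage when present and passes the input through otherwise,
// mirroring the `if (cls != nullptr)` check in CRNNRecognizer::Run.
std::string Process(const std::string &crop, const Stage *stage) {
  return stage != nullptr ? stage->Run(crop) : crop;
}
```

The caller keeps the `std::unique_ptr` alive for the duration of the run and passes `stage.get()`, so the object is released automatically when `main` returns.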
|
@ -0,0 +1,110 @@
|
|||
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
//
|
||||
// Licensed under the Apache License, Version 2.0 (the "License");
|
||||
// you may not use this file except in compliance with the License.
|
||||
// You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing, software
|
||||
// distributed under the License is distributed on an "AS IS" BASIS,
|
||||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
// See the License for the specific language governing permissions and
|
||||
// limitations under the License.
|
||||
|
||||
#include <include/ocr_cls.h>
|
||||
|
||||
namespace PaddleOCR {
|
||||
|
||||
cv::Mat Classifier::Run(cv::Mat &img) {
|
||||
cv::Mat src_img;
|
||||
img.copyTo(src_img);
|
||||
cv::Mat resize_img;
|
||||
|
||||
std::vector<int> rec_image_shape = {3, 32, 100};
|
||||
int index = 0;
|
||||
float wh_ratio = float(img.cols) / float(img.rows);
|
||||
|
||||
this->resize_op_.Run(img, resize_img, rec_image_shape);
|
||||
|
||||
this->normalize_op_.Run(&resize_img, this->mean_, this->scale_,
|
||||
this->is_scale_);
|
||||
|
||||
std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
|
||||
|
||||
this->permute_op_.Run(&resize_img, input.data());
|
||||
|
||||
// Inference.
|
||||
if (this->use_zero_copy_run_) {
|
||||
auto input_names = this->predictor_->GetInputNames();
|
||||
auto input_t = this->predictor_->GetInputTensor(input_names[0]);
|
||||
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
|
||||
input_t->copy_from_cpu(input.data());
|
||||
this->predictor_->ZeroCopyRun();
|
||||
} else {
|
||||
paddle::PaddleTensor input_t;
|
||||
input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
|
||||
input_t.data =
|
||||
paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
|
||||
input_t.dtype = PaddleDType::FLOAT32;
|
||||
std::vector<paddle::PaddleTensor> outputs;
|
||||
this->predictor_->Run({input_t}, &outputs, 1);
|
||||
}
|
||||
|
||||
std::vector<float> softmax_out;
|
||||
std::vector<int64_t> label_out;
|
||||
auto output_names = this->predictor_->GetOutputNames();
|
||||
auto softmax_out_t = this->predictor_->GetOutputTensor(output_names[0]);
|
||||
auto label_out_t = this->predictor_->GetOutputTensor(output_names[1]);
|
||||
auto softmax_shape_out = softmax_out_t->shape();
|
||||
auto label_shape_out = label_out_t->shape();
|
||||
|
||||
int softmax_out_num =
|
||||
std::accumulate(softmax_shape_out.begin(), softmax_shape_out.end(), 1,
|
||||
std::multiplies<int>());
|
||||
|
||||
int label_out_num =
|
||||
std::accumulate(label_shape_out.begin(), label_shape_out.end(), 1,
|
||||
std::multiplies<int>());
|
||||
softmax_out.resize(softmax_out_num);
|
||||
label_out.resize(label_out_num);
|
||||
|
||||
softmax_out_t->copy_to_cpu(softmax_out.data());
|
||||
label_out_t->copy_to_cpu(label_out.data());
|
||||
|
||||
int label = label_out[0];
|
||||
float score = softmax_out[label];
|
||||
// std::cout << "\nlabel "<<label<<" score: "<<score;
|
||||
if (label % 2 == 1 && score > this->cls_thresh) {
|
||||
cv::rotate(src_img, src_img, cv::ROTATE_180); // flag value 1: rotate by 180 degrees
|
||||
}
|
||||
return src_img;
|
||||
}
|
||||
|
||||
void Classifier::LoadModel(const std::string &model_dir) {
|
||||
AnalysisConfig config;
|
||||
config.SetModel(model_dir + "/model", model_dir + "/params");
|
||||
|
||||
if (this->use_gpu_) {
|
||||
config.EnableUseGpu(this->gpu_mem_, this->gpu_id_);
|
||||
} else {
|
||||
config.DisableGpu();
|
||||
if (this->use_mkldnn_) {
|
||||
config.EnableMKLDNN();
|
||||
}
|
||||
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
|
||||
}
|
||||
|
||||
// false for zero copy tensor
|
||||
config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
|
||||
// true for multiple input
|
||||
config.SwitchSpecifyInputNames(true);
|
||||
|
||||
config.SwitchIrOptim(true);
|
||||
|
||||
config.EnableMemoryOptim();
|
||||
config.DisableGlogInfo();
|
||||
|
||||
this->predictor_ = CreatePaddlePredictor(config);
|
||||
}
|
||||
} // namespace PaddleOCR
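
Review note: the decision at the end of `Classifier::Run` can be isolated and unit-tested without Paddle or OpenCV. The sketch below assumes an odd label means "rotated by 180 degrees", which matches `label_list = ['0', '180']` in the Python config elsewhere in this PR. `ShouldRotate180` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the check in Classifier::Run: rotate only when
// the predicted label is odd (assumed to mean 180 degrees) and its softmax
// score exceeds the threshold.
bool ShouldRotate180(const std::vector<float> &softmax_out, int64_t label,
                     double cls_thresh) {
  if (label < 0 || static_cast<std::size_t>(label) >= softmax_out.size()) {
    return false;  // defensive: ignore labels outside the score vector
  }
  return label % 2 == 1 && softmax_out[static_cast<std::size_t>(label)] > cls_thresh;
}
```

The bounds check is an addition of this sketch; the code in the diff indexes `softmax_out[label]` directly.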
|
|
@ -108,9 +108,11 @@ void DBDetector::Run(cv::Mat &img,
|
|||
const double maxvalue = 255;
|
||||
cv::Mat bit_map;
|
||||
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
|
||||
|
||||
cv::Mat dilation_map;
|
||||
cv::Mat dila_ele = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(2,2));
|
||||
cv::dilate(bit_map, dilation_map, dila_ele);
|
||||
boxes = post_processor_.BoxesFromBitmap(
|
||||
pred_map, bit_map, this->det_db_box_thresh_, this->det_db_unclip_ratio_);
|
||||
pred_map, dilation_map, this->det_db_box_thresh_, this->det_db_unclip_ratio_);
|
||||
|
||||
boxes = post_processor_.FilterTagDetRes(boxes, ratio_h, ratio_w, srcimg);
|
||||
|
||||
|
|
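
Review note: this hunk dilates the binary map with a 2x2 rectangular element before box extraction, which slightly grows each text region. The sketch below reproduces the idea on a plain 0/1 grid, assuming the element is anchored at (1, 1) (my understanding of the OpenCV default for a 2x2 kernel) and that out-of-range neighbours count as background. `Grid` and `Dilate2x2` are hypothetical names.

```cpp
#include <vector>

using Grid = std::vector<std::vector<int>>;  // rectangular 0/1 image

// Dilates a 0/1 grid with a 2x2 rectangular element anchored at (1, 1):
// out(y, x) is 1 if any of in(y-1..y, x-1..x) is 1.
Grid Dilate2x2(const Grid &in) {
  const int rows = static_cast<int>(in.size());
  const int cols = rows > 0 ? static_cast<int>(in[0].size()) : 0;
  Grid out(rows, std::vector<int>(cols, 0));
  for (int y = 0; y < rows; ++y) {
    for (int x = 0; x < cols; ++x) {
      const bool hit = in[y][x] || (x > 0 && in[y][x - 1]) ||
                       (y > 0 && in[y - 1][x]) ||
                       (x > 0 && y > 0 && in[y - 1][x - 1]);
      out[y][x] = hit ? 1 : 0;
    }
  }
  return out;
}
```

Under these assumptions a single foreground pixel grows into a 2x2 block toward the bottom-right, so thin strokes that were split by thresholding are more likely to merge into one contour.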
|
@ -17,7 +17,7 @@
|
|||
namespace PaddleOCR {
|
||||
|
||||
void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
|
||||
cv::Mat &img) {
|
||||
cv::Mat &img, Classifier *cls) {
|
||||
cv::Mat srcimg;
|
||||
img.copyTo(srcimg);
|
||||
cv::Mat crop_img;
|
||||
|
@ -27,6 +27,9 @@ void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
|
|||
int index = 0;
|
||||
for (int i = boxes.size() - 1; i >= 0; i--) {
|
||||
crop_img = GetRotateCropImage(srcimg, boxes[i]);
|
||||
if (cls != nullptr) {
|
||||
crop_img = cls->Run(crop_img);
|
||||
}
|
||||
|
||||
float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
|
||||
|
||||
|
|
|
@ -294,7 +294,7 @@ PostProcessor::FilterTagDetRes(std::vector<std::vector<std::vector<int>>> boxes,
|
|||
pow(boxes[n][0][1] - boxes[n][1][1], 2)));
|
||||
rect_height = int(sqrt(pow(boxes[n][0][0] - boxes[n][3][0], 2) +
|
||||
pow(boxes[n][0][1] - boxes[n][3][1], 2)));
|
||||
if (rect_width <= 10 || rect_height <= 10)
|
||||
if (rect_width <= 4 || rect_height <= 4)
|
||||
continue;
|
||||
root_points.push_back(boxes[n]);
|
||||
}
|
||||
|
|
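
Review note: the threshold change above (10 to 4 pixels) lets much smaller boxes through. A minimal sketch of the same filter, with the minimum side as a parameter, makes the effect easy to check. `Box` and `FilterSmallBoxes` are hypothetical names; the side lengths are computed from corner 0 to corners 1 and 3, as in the diff.

```cpp
#include <cmath>
#include <vector>

using Box = std::vector<std::vector<int>>;  // four {x, y} corner points

// Keeps only boxes whose p0-p1 edge (width) and p0-p3 edge (height) are both
// strictly longer than min_side pixels.
std::vector<Box> FilterSmallBoxes(const std::vector<Box> &boxes, int min_side) {
  std::vector<Box> kept;
  for (const Box &b : boxes) {
    const int w = static_cast<int>(
        std::hypot(b[0][0] - b[1][0], b[0][1] - b[1][1]));
    const int h = static_cast<int>(
        std::hypot(b[0][0] - b[3][0], b[0][1] - b[3][1]));
    if (w <= min_side || h <= min_side) {
      continue;
    }
    kept.push_back(b);
  }
  return kept;
}
```

An 8x8 box is dropped with `min_side = 10` but kept with `min_side = 4`, which is the practical difference introduced by this hunk.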
|
@ -116,4 +116,26 @@ void CrnnResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img, float wh_ratio,
|
|||
cv::INTER_LINEAR);
|
||||
}
|
||||
|
||||
void ClsResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
|
||||
const std::vector<int> &rec_image_shape) {
|
||||
int imgC, imgH, imgW;
|
||||
imgC = rec_image_shape[0];
|
||||
imgH = rec_image_shape[1];
|
||||
imgW = rec_image_shape[2];
|
||||
|
||||
float ratio = float(img.cols) / float(img.rows);
|
||||
int resize_w, resize_h;
|
||||
if (ceilf(imgH * ratio) > imgW)
|
||||
resize_w = imgW;
|
||||
else
|
||||
resize_w = int(ceilf(imgH * ratio));
|
||||
|
||||
cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
|
||||
cv::INTER_LINEAR);
|
||||
if (resize_w < imgW) {
|
||||
cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, imgW - resize_w,
|
||||
cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
|
||||
}
|
||||
}
|
||||
|
||||
} // namespace PaddleOCR
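
Review note: `ClsResizeImg::Run` keeps the aspect ratio, caps the width at `imgW`, and pads the right side with zeros. The width arithmetic can be sketched on its own as follows. `ClsResizeWidth` is a hypothetical helper that returns the resized width and the right padding.

```cpp
#include <cmath>
#include <utility>

// Returns {resize_w, pad_right} for an img_w x img_h crop that is resized to
// height target_h and then padded on the right up to target_w.
std::pair<int, int> ClsResizeWidth(int img_w, int img_h, int target_h,
                                   int target_w) {
  const float ratio = static_cast<float>(img_w) / static_cast<float>(img_h);
  const int scaled_w = static_cast<int>(std::ceil(target_h * ratio));
  const int resize_w = scaled_w > target_w ? target_w : scaled_w;
  return {resize_w, target_w - resize_w};
}
```

For the `{3, 32, 100}` shape used by the classifier, a 64x32 crop keeps width 64 and gets 36 columns of padding, while a 400x32 crop is squeezed to width 100 with no padding.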
|
|
@ -10,9 +10,14 @@ use_zero_copy_run 1
|
|||
max_side_len 960
|
||||
det_db_thresh 0.3
|
||||
det_db_box_thresh 0.5
|
||||
det_db_unclip_ratio 2.0
|
||||
det_db_unclip_ratio 1.6
|
||||
det_model_dir ./inference/det_db
|
||||
|
||||
# cls config
|
||||
use_angle_cls 0
|
||||
cls_model_dir ./inference/cls
|
||||
cls_thresh 0.9
|
||||
|
||||
# rec config
|
||||
rec_model_dir ./inference/rec_crnn
|
||||
char_list_file ../../ppocr/utils/ppocr_keys_v1.txt
|
||||
|
|
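
Review note: the new `use_angle_cls`, `cls_model_dir`, and `cls_thresh` keys follow the same whitespace-separated `key value` layout as the rest of `config.txt`. The repository's own config loader is not shown in this diff, so the parser below is only a hypothetical sketch of how such a file can be read, skipping blank lines and `#` comments.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser (not the repository's loader): reads "key value" lines
// and ignores blank lines and lines whose first token starts with '#'.
std::map<std::string, std::string> ParseConfigText(const std::string &text) {
  std::map<std::string, std::string> cfg;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key, value;
    if (!(fields >> key) || key[0] == '#') {
      continue;
    }
    if (fields >> value) {
      cfg[key] = value;
    }
  }
  return cfg;
}
```

Values stay as strings here; a caller would convert them as needed, for example `std::stoi(cfg["use_angle_cls"]) != 0` to decide whether to construct the classifier.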
|
@ -13,7 +13,7 @@ def read_params():
|
|||
|
||||
#params for text detector
|
||||
cfg.det_algorithm = "DB"
|
||||
cfg.det_model_dir = "./inference/ch_det_mv3_db/"
|
||||
cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
|
||||
cfg.det_max_side_len = 960
|
||||
|
||||
#DB params
|
||||
|
|
|
@ -28,7 +28,7 @@ def read_params():
|
|||
|
||||
#params for text recognizer
|
||||
cfg.rec_algorithm = "CRNN"
|
||||
cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/"
|
||||
cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/"
|
||||
|
||||
cfg.rec_image_shape = "3, 32, 320"
|
||||
cfg.rec_char_type = 'ch'
|
||||
|
|
|
@ -10,10 +10,10 @@ class Config(object):
|
|||
|
||||
def read_params():
|
||||
cfg = Config()
|
||||
|
||||
|
||||
#params for text detector
|
||||
cfg.det_algorithm = "DB"
|
||||
cfg.det_model_dir = "./inference/ch_det_mv3_db/"
|
||||
cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
|
||||
cfg.det_max_side_len = 960
|
||||
|
||||
#DB params
|
||||
|
@ -28,7 +28,7 @@ def read_params():
|
|||
|
||||
#params for text recognizer
|
||||
cfg.rec_algorithm = "CRNN"
|
||||
cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/"
|
||||
cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/"
|
||||
|
||||
cfg.rec_image_shape = "3, 32, 320"
|
||||
cfg.rec_char_type = 'ch'
|
||||
|
@ -38,6 +38,13 @@ def read_params():
|
|||
cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
|
||||
cfg.use_space_char = True
|
||||
|
||||
#params for text classifier
|
||||
cfg.use_angle_cls = False
|
||||
cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
|
||||
cfg.cls_image_shape = "3, 48, 192"
|
||||
cfg.label_list = ['0', '180']
|
||||
cfg.cls_batch_num = 30
|
||||
|
||||
cfg.use_zero_copy_run = False
|
||||
|
||||
return cfg
|
||||
|
|
|
@ -1,10 +1,12 @@
|
|||
# Service deployment
|
||||
[English](readme_en.md) | 简体中文
|
||||
|
||||
PaddleOCR provides 2 service deployment methods:
|
||||
- Deployment based on HubServing: already integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/hubserving)); follow this tutorial to use it.
|
||||
- Deployment based on PaddleServing: see the [demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr) on the PaddleServing official site; it will also be integrated into PaddleOCR later.
|
||||
- Deployment based on PaddleHub Serving: the code path is "`./deploy/hubserving`"; follow this tutorial to use it.
|
||||
- Deployment based on PaddleServing: the code path is "`./deploy/pdserving`"; see the [documentation](../pdserving/readme.md) for usage.
|
||||
|
||||
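The service deployment directory contains three service packages: detection, recognition, and the two-stage pipeline. Install and start the package that matches your needs. The directory is as follows: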
|
||||
# Service deployment based on PaddleHub Serving
|
||||
|
||||
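The hubserving service deployment directory contains three service packages: detection, recognition, and the two-stage pipeline. Please install and start the package that matches your needs. The directory structure is as follows: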
|
||||
```
|
||||
deploy/hubserving/
|
||||
└─ ocr_det detection module service package
|
||||
|
@ -30,11 +32,18 @@ pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
|
|||
|
||||
# Set environment variables on Linux
|
||||
export PYTHONPATH=.
|
||||
# Set environment variables on Windows
|
||||
|
||||
# Or, set environment variables on Windows
|
||||
SET PYTHONPATH=.
|
||||
```
|
||||
|
||||
### 2. Install the service module
|
||||
### 2. Download the inference model
|
||||
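Before installing the service module, prepare the inference model and place it in the correct path. The v1.1 ultra-lightweight model is used by default; the default detection model path is:
|
||||
`./inference/ch_ppocr_mobile_v1.1_det_infer/`, and the recognition model path is `./inference/ch_ppocr_mobile_v1.1_rec_infer/`.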
|
||||
|
||||
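**The model paths can be viewed and modified in `params.py`.** More models can be downloaded from the [model library](../../doc/doc_ch/models_list.md) provided by PaddleOCR, or you can substitute models that you have trained and converted yourself.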
|
||||
|
||||
### 3. Install the service module
|
||||
PaddleOCR provides 3 service modules; install the ones you need.
|
||||
|
||||
* On Linux, install them as in the following example:
|
||||
|
@ -61,15 +70,7 @@ hub install deploy\hubserving\ocr_rec\
|
|||
hub install deploy\hubserving\ocr_system\
|
||||
```
|
||||
|
||||
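#### Install the model
|
||||
Before installing the service module, place the trained models in the corresponding folders. The defaults are:
|
||||
./inference/ch_det_mv3_db/
|
||||
and
|
||||
./inference/ch_rec_mv3_crnn/
|
||||
Both models can be downloaded from https://github.com/PaddlePaddle/PaddleOCR
|
||||
To use your own models, change the paths in ./deploy/hubserving/ocr_system/params.py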
|
||||
|
||||
### 3. Start the service
|
||||
### 4. Start the service
|
||||
#### Way 1. Start from the command line (CPU only)
|
||||
**Start command:**
|
||||
```shell
|
||||
|
@ -172,7 +173,7 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
|
|||
```hub serving stop --port/-p XXXX```
|
||||
|
||||
- 2. Modify the code in the corresponding `module.py`, `params.py`, and related files as needed.
|
||||
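For example, to replace the model used by the deployed service, change the model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. Other related parameters may also need to change; adjust and debug them for your situation. After making changes, it is recommended to run `module.py` directly for debugging first, and to start the service for testing only once prediction runs correctly.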
|
||||
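For example, to replace the model used by the deployed service, change the model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. Other related parameters may also need to change; adjust and debug them for your situation. **After making changes, it is strongly recommended to run `module.py` directly for debugging first, and to start the service for testing only once prediction runs correctly.**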
|
||||
|
||||
- 3. Uninstall the old service package
|
||||
```hub uninstall ocr_system```
|
|
@ -1,10 +1,12 @@
|
|||
# Service deployment
|
||||
English | [简体中文](readme.md)
|
||||
|
||||
PaddleOCR provides 2 service deployment methods::
|
||||
- Based on **HubServing**:Has been integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/hubserving)). Please follow this tutorial.
|
||||
- Based on **PaddleServing**:See PaddleServing official website for details ([demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr)). Follow-up will also be integrated into PaddleOCR.
|
||||
PaddleOCR provides 2 service deployment methods:
|
||||
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial.
|
||||
- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../pdserving/readme_en.md) for usage.
|
||||
|
||||
The service deployment directory includes three service packages: detection, recognition, and two-stage series connection. Select the corresponding service package to install and start service according to your needs. The directory is as follows:
|
||||
# Service deployment based on PaddleHub Serving
|
||||
|
||||
The hubserving service deployment directory includes three service packages: detection, recognition, and two-stage series connection. Please select the corresponding service package to install and start service according to your needs. The directory is as follows:
|
||||
```
|
||||
deploy/hubserving/
|
||||
└─ ocr_det detection module service package
|
||||
|
@ -31,11 +33,17 @@ pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
|
|||
|
||||
# Set environment variables on Linux
|
||||
export PYTHONPATH=.
|
||||
|
||||
# Set environment variables on Windows
|
||||
SET PYTHONPATH=.
|
||||
```
|
||||
|
||||
### 2. Install Service Module
|
||||
### 2. Download inference model
|
||||
Before installing the service module, prepare the inference model and put it in the correct path. By default, the v1.1 ultra-lightweight model is used; the default detection model path is `./inference/ch_ppocr_mobile_v1.1_det_infer/`, and the default recognition model path is `./inference/ch_ppocr_mobile_v1.1_rec_infer/`.
|
||||
|
||||
**The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself.
|
||||
|
||||
### 3. Install Service Module
|
||||
PaddleOCR provides 3 kinds of service modules; install the ones you need.
|
||||
|
||||
* On Linux platform, the examples are as follows.
|
||||
|
@ -62,7 +70,7 @@ hub install deploy\hubserving\ocr_rec\
|
|||
hub install deploy\hubserving\ocr_system\
|
||||
```
|
||||
|
||||
### 3. Start service
|
||||
### 4. Start service
|
||||
#### Way 1. Start with command line parameters (CPU only)
|
||||
|
||||
**start command:**
|
|
@ -40,8 +40,8 @@ CXX_LIBS = ${OPENCV_LIBS} -L$(LITE_ROOT)/cxx/lib/ -lpaddle_light_api_shared $(SY
|
|||
|
||||
#CXX_LIBS = $(LITE_ROOT)/cxx/lib/libpaddle_api_light_bundled.a $(SYSTEM_LIBS)
|
||||
|
||||
ocr_db_crnn: fetch_opencv ocr_db_crnn.o crnn_process.o db_post_process.o clipper.o
|
||||
$(CC) $(SYSROOT_LINK) $(CXXFLAGS_LINK) ocr_db_crnn.o crnn_process.o db_post_process.o clipper.o -o ocr_db_crnn $(CXX_LIBS) $(LDFLAGS)
|
||||
ocr_db_crnn: fetch_opencv ocr_db_crnn.o crnn_process.o db_post_process.o clipper.o cls_process.o
|
||||
$(CC) $(SYSROOT_LINK) $(CXXFLAGS_LINK) ocr_db_crnn.o crnn_process.o db_post_process.o clipper.o cls_process.o -o ocr_db_crnn $(CXX_LIBS) $(LDFLAGS)
|
||||
|
||||
ocr_db_crnn.o: ocr_db_crnn.cc
|
||||
$(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o ocr_db_crnn.o -c ocr_db_crnn.cc
|
||||
|
@ -49,6 +49,9 @@ ocr_db_crnn.o: ocr_db_crnn.cc
|
|||
crnn_process.o: fetch_opencv crnn_process.cc
|
||||
$(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o crnn_process.o -c crnn_process.cc
|
||||
|
||||
cls_process.o: fetch_opencv cls_process.cc
|
||||
$(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o cls_process.o -c cls_process.cc
|
||||
|
||||
db_post_process.o: fetch_clipper fetch_opencv db_post_process.cc
|
||||
$(CC) $(SYSROOT_COMPLILE) $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -o db_post_process.o -c db_post_process.cc
|
||||
|
||||
|
@ -73,5 +76,5 @@ fetch_opencv:
|
|||
|
||||
.PHONY: clean
|
||||
clean:
|
||||
rm -f ocr_db_crnn.o clipper.o db_post_process.o crnn_process.o
|
||||
rm -f ocr_db_crnn.o clipper.o db_post_process.o crnn_process.o cls_process.o
|
||||
rm -f ocr_db_crnn
|
||||
|
|
|
@ -0,0 +1,43 @@
|
|||
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
//
|
||||
// Licensed under the Apache License, Version 2.0 (the "License");
|
||||
// you may not use this file except in compliance with the License.
|
||||
// You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing, software
|
||||
// distributed under the License is distributed on an "AS IS" BASIS,
|
||||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
// See the License for the specific language governing permissions and
|
||||
// limitations under the License.
|
||||
|
||||
#include "cls_process.h" //NOLINT
|
||||
#include <algorithm>
|
||||
#include <memory>
|
||||
#include <string>
|
||||
|
||||
const std::vector<int> rec_image_shape{3, 32, 100};
|
||||
|
||||
cv::Mat ClsResizeImg(cv::Mat img) {
|
||||
int imgC, imgH, imgW;
|
||||
imgC = rec_image_shape[0];
|
||||
imgH = rec_image_shape[1];
|
||||
imgW = rec_image_shape[2];
|
||||
|
||||
float ratio = static_cast<float>(img.cols) / static_cast<float>(img.rows);
|
||||
|
||||
int resize_w, resize_h;
|
||||
if (ceilf(imgH * ratio) > imgW)
|
||||
resize_w = imgW;
|
||||
else
|
||||
resize_w = int(ceilf(imgH * ratio));
|
||||
cv::Mat resize_img;
|
||||
cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
|
||||
cv::INTER_LINEAR);
|
||||
if (resize_w < imgW) {
|
||||
cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, imgW - resize_w,
|
||||
cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
|
||||
}
|
||||
return resize_img;
|
||||
}
|
|
@ -0,0 +1,29 @@
|
|||
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
//
|
||||
// Licensed under the Apache License, Version 2.0 (the "License");
|
||||
// you may not use this file except in compliance with the License.
|
||||
// You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing, software
|
||||
// distributed under the License is distributed on an "AS IS" BASIS,
|
||||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
// See the License for the specific language governing permissions and
|
||||
// limitations under the License.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <cstring>
|
||||
#include <fstream>
|
||||
#include <iostream>
|
||||
#include <memory>
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
#include "math.h" //NOLINT
|
||||
#include "opencv2/core.hpp"
|
||||
#include "opencv2/imgcodecs.hpp"
|
||||
#include "opencv2/imgproc.hpp"
|
||||
|
||||
cv::Mat ClsResizeImg(cv::Mat img);
|
|
@ -1,4 +1,4 @@
|
|||
max_side_len 960
|
||||
det_db_thresh 0.3
|
||||
det_db_box_thresh 0.5
|
||||
det_db_unclip_ratio 2.0
|
||||
det_db_unclip_ratio 1.6
|
|
@ -293,7 +293,7 @@ FilterTagDetRes(std::vector<std::vector<std::vector<int>>> boxes, float ratio_h,
|
|||
rect_height =
|
||||
static_cast<int>(sqrt(pow(boxes[n][0][0] - boxes[n][3][0], 2) +
|
||||
pow(boxes[n][0][1] - boxes[n][3][1], 2)));
|
||||
if (rect_width <= 10 || rect_height <= 10)
|
||||
if (rect_width <= 4 || rect_height <= 4)
|
||||
continue;
|
||||
root_points.push_back(boxes[n]);
|
||||
}
|
||||
|
|
|
@ -15,6 +15,7 @@
|
|||
#include "paddle_api.h" // NOLINT
|
||||
#include <chrono>
|
||||
|
||||
#include "cls_process.h"
|
||||
#include "crnn_process.h"
|
||||
#include "db_post_process.h"
|
||||
|
||||
|
@ -105,11 +106,55 @@ cv::Mat DetResizeImg(const cv::Mat img, int max_size_len,
|
|||
return resize_img;
|
||||
}
|
||||
|
||||
cv::Mat RunClsModel(cv::Mat img, std::shared_ptr<PaddlePredictor> predictor_cls,
|
||||
const float thresh = 0.5) {
|
||||
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
|
||||
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
|
||||
|
||||
cv::Mat srcimg;
|
||||
img.copyTo(srcimg);
|
||||
cv::Mat crop_img = img.clone(); // fix: crop_img was left empty before being measured and resized below
|
||||
cv::Mat resize_img;
|
||||
|
||||
int index = 0;
|
||||
float wh_ratio =
|
||||
static_cast<float>(crop_img.cols) / static_cast<float>(crop_img.rows);
|
||||
|
||||
resize_img = ClsResizeImg(crop_img);
|
||||
resize_img.convertTo(resize_img, CV_32FC3, 1 / 255.f);
|
||||
|
||||
const float *dimg = reinterpret_cast<const float *>(resize_img.data);
|
||||
|
||||
std::unique_ptr<Tensor> input_tensor0(std::move(predictor_cls->GetInput(0)));
|
||||
input_tensor0->Resize({1, 3, resize_img.rows, resize_img.cols});
|
||||
auto *data0 = input_tensor0->mutable_data<float>();
|
||||
|
||||
NeonMeanScale(dimg, data0, resize_img.rows * resize_img.cols, mean, scale);
|
||||
// Run CLS predictor
|
||||
predictor_cls->Run();
|
||||
|
||||
// Get output and run postprocess
|
||||
std::unique_ptr<const Tensor> softmax_out(
|
||||
std::move(predictor_cls->GetOutput(0)));
|
||||
std::unique_ptr<const Tensor> label_out(
|
||||
std::move(predictor_cls->GetOutput(1)));
|
||||
auto *softmax_scores = softmax_out->mutable_data<float>();
|
||||
auto *label_idxs = label_out->data<int64>();
|
||||
int label_idx = label_idxs[0];
|
||||
float score = softmax_scores[label_idx];
|
||||
|
||||
if (label_idx % 2 == 1 && score > thresh) {
|
||||
cv::rotate(srcimg, srcimg, cv::ROTATE_180); // flag value 1: rotate by 180 degrees
|
||||
}
|
||||
return srcimg;
|
||||
}
|
||||
|
||||
void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img,
|
||||
std::shared_ptr<PaddlePredictor> predictor_crnn,
|
||||
std::vector<std::string> &rec_text,
|
||||
std::vector<float> &rec_text_score,
|
||||
std::vector<std::string> charactor_dict) {
|
||||
std::vector<std::string> charactor_dict,
|
||||
std::shared_ptr<PaddlePredictor> predictor_cls) {
|
||||
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
|
||||
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
|
||||
|
||||
|
@ -121,6 +166,7 @@ void RunRecModel(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat img,
|
|||
int index = 0;
|
||||
for (int i = boxes.size() - 1; i >= 0; i--) {
|
||||
crop_img = GetRotateCropImage(srcimg, boxes[i]);
|
||||
crop_img = RunClsModel(crop_img, predictor_cls);
|
||||
float wh_ratio =
|
||||
static_cast<float>(crop_img.cols) / static_cast<float>(crop_img.rows);
|
||||
|
||||
|
@ -243,8 +289,10 @@ RunDetModel(std::shared_ptr<PaddlePredictor> predictor, cv::Mat img,
|
|||
const double maxvalue = 255;
|
||||
cv::Mat bit_map;
|
||||
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
|
||||
|
||||
auto boxes = BoxesFromBitmap(pred_map, bit_map, Config);
|
||||
cv::Mat dilation_map;
|
||||
cv::Mat dila_ele = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(2,2));
|
||||
cv::dilate(bit_map, dilation_map, dila_ele);
|
||||
auto boxes = BoxesFromBitmap(pred_map, dilation_map, Config);
|
||||
|
||||
std::vector<std::vector<std::vector<int>>> filter_boxes =
|
||||
FilterTagDetRes(boxes, ratio_hw[0], ratio_hw[1], srcimg);
|
||||
|
@ -323,8 +371,9 @@ int main(int argc, char **argv) {
|
|||
}
|
||||
std::string det_model_file = argv[1];
|
||||
std::string rec_model_file = argv[2];
|
||||
std::string img_path = argv[3];
|
||||
std::string dict_path = argv[4];
|
||||
std::string cls_model_file = argv[3];
|
||||
std::string img_path = argv[4];
|
||||
std::string dict_path = argv[5];
|
||||
|
||||
//// load config from txt file
|
||||
auto Config = LoadConfigTxt("./config.txt");
|
||||
|
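
Review note: this hunk inserts the classifier model as `argv[3]` and shifts the image and dictionary paths to `argv[4]` and `argv[5]`, so callers using the old four-argument order would pass the image path as the classifier model. A small sketch of explicit positional parsing with a count check is given below. `CliArgs` and `ParseArgs` are hypothetical names, and the usage string is illustrative only.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the five positional arguments read in main().
struct CliArgs {
  std::string det_model_file;
  std::string rec_model_file;
  std::string cls_model_file;
  std::string img_path;
  std::string dict_path;
};

// Parses argv in the new order: det, rec, cls, image, dictionary.
// Throws when fewer than five arguments follow the program name.
CliArgs ParseArgs(const std::vector<std::string> &argv) {
  if (argv.size() < 6) {
    throw std::invalid_argument(
        "expected: <det_model> <rec_model> <cls_model> <image> <dict>");
  }
  return CliArgs{argv[1], argv[2], argv[3], argv[4], argv[5]};
}
```

In `main`, the vector could be built with `std::vector<std::string>(argv, argv + argc)` before calling `ParseArgs`.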
@ -333,6 +382,7 @@ int main(int argc, char **argv) {
|
|||
|
||||
auto det_predictor = loadModel(det_model_file);
|
||||
auto rec_predictor = loadModel(rec_model_file);
|
||||
auto cls_predictor = loadModel(cls_model_file);
|
||||
|
||||
auto charactor_dict = ReadDict(dict_path);
|
||||
charactor_dict.push_back(" ");
|
||||
|
@ -343,7 +393,7 @@ int main(int argc, char **argv) {
|
|||
std::vector<std::string> rec_text;
|
||||
std::vector<float> rec_text_score;
|
||||
RunRecModel(boxes, srcimg, rec_predictor, rec_text, rec_text_score,
|
||||
charactor_dict);
|
||||
charactor_dict, cls_predictor);
|
||||
|
||||
auto end = std::chrono::system_clock::now();
|
||||
auto duration =
|
||||
|
|
|
@ -1,5 +1,10 @@
|
|||
# Paddle Serving service deployment (Beta)
|
||||
[English](readme_en.md) | 简体中文
|
||||
|
||||
PaddleOCR provides 2 service deployment methods:
|
||||
- Deployment based on PaddleHub Serving: the code path is "`./deploy/hubserving`"; see the [documentation](../hubserving/readme.md) for usage.
|
||||
- Deployment based on PaddleServing: the code path is "`./deploy/pdserving`"; follow this tutorial to use it.
|
||||
|
||||
# Paddle Serving service deployment
|
||||
This tutorial describes the detailed steps for deploying the PaddleOCR online prediction service with [Paddle Serving](https://github.com/PaddlePaddle/Serving).
|
||||
|
||||
## Quick start service
|
||||
|
@ -14,36 +19,19 @@
|
|||
|
||||
**Operating system version: CentOS 6 or later**
|
||||
|
||||
**Python3 instructions:**
|
||||
**Python version: 2.7/3.6/3.7**
|
||||
|
||||
**Python instructions:**
|
||||
```
|
||||
# Beta Paddle Serving whl packages are provided below for trial; the official release is scheduled for mid-August
|
||||
# GPU users: download the server package from this link
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server_gpu-0.3.2-py3-none-any.whl
|
||||
python -m pip install paddle_serving_server_gpu-0.3.2-py3-none-any.whl
|
||||
# CPU version: use this link
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server-0.3.2-py3-none-any.whl
|
||||
python -m pip install paddle_serving_server-0.3.2-py3-none-any.whl
|
||||
# Choose either the CPU or the GPU version
|
||||
# GPU server
|
||||
python -m pip install paddle_serving_server_gpu
|
||||
# CPU server
|
||||
python -m pip install paddle_serving_server
|
||||
# Client and App packages use the following links (shared by CPU and GPU)
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_client-0.3.2-cp36-none-any.whl
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_app-0.1.2-py3-none-any.whl
|
||||
python -m pip install paddle_serving_app-0.1.2-py3-none-any.whl paddle_serving_client-0.3.2-cp36-none-any.whl
|
||||
python -m pip install paddle_serving_app paddle_serving_client
|
||||
```
|
||||
|
||||
**Python2 instructions:**
|
||||
```
|
||||
# Beta Paddle Serving whl packages are provided below for trial; the official release is scheduled for mid-August
|
||||
# GPU users: download the server package from this link
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server_gpu-0.3.2-py2-none-any.whl
|
||||
python -m pip install paddle_serving_server_gpu-0.3.2-py2-none-any.whl
|
||||
# CPU version: use this link
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server-0.3.2-py2-none-any.whl
|
||||
python -m pip install paddle_serving_server-0.3.2-py2-none-any.whl
|
||||
|
||||
# Client and App packages use the following links (shared by CPU and GPU)
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_app-0.1.2-py2-none-any.whl
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_client-0.3.2-cp27-none-any.whl
|
||||
python -m pip install paddle_serving_app-0.1.2-py2-none-any.whl paddle_serving_client-0.3.2-cp27-none-any.whl
|
||||
```
|
||||
|
||||
### 2. Model conversion
|
||||
You can use the models provided by `paddle_serving_app` by running the following commands
|
||||
|
|
|
@ -0,0 +1,123 @@
|
|||
English | [简体中文](readme.md)
|
||||
|
||||
PaddleOCR provides 2 service deployment methods:
|
||||
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../hubserving/readme_en.md) for usage.
|
||||
- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please follow this tutorial.
|
||||
|
||||
# Service deployment based on Paddle Serving
|
||||
|
||||
This tutorial describes the detailed steps for deploying the PaddleOCR online prediction service with [Paddle Serving](https://github.com/PaddlePaddle/Serving).
|
||||
|
||||
## Quick start service
|
||||
|
||||
### 1. Prepare the environment
|
||||
Let's first install the relevant components of Paddle Serving. GPU is recommended for service deployment with Paddle Serving.
|
||||
|
||||
**Requirements:**
|
||||
- **CUDA version: 9.0**
|
||||
- **CUDNN version: 7.0**
|
||||
- **Operating system version: >= CentOS 6**
|
||||
- **Python version: 2.7/3.6/3.7**
|
||||
|
||||
**Installation:**
|
||||
```
|
||||
# install GPU server
|
||||
python -m pip install paddle_serving_server_gpu
|
||||
|
||||
# or, install CPU server
|
||||
python -m pip install paddle_serving_server
|
||||
|
||||
# install client and App package (CPU/GPU)
|
||||
python -m pip install paddle_serving_app paddle_serving_client
|
||||
```
|
||||
|
||||
### 2. Model transformation
|
||||
For convenience, you can directly use the converted models provided by `paddle_serving_app`. Run the following commands to obtain them:
|
||||
```
|
||||
python -m paddle_serving_app.package --get_model ocr_rec
|
||||
tar -xzvf ocr_rec.tar.gz
|
||||
python -m paddle_serving_app.package --get_model ocr_det
|
||||
tar -xzvf ocr_det.tar.gz
|
||||
```
|
||||
Executing the above commands downloads the `db_crnn_mobile` model, which is in a different format from the inference model. If you want to deploy other models, refer to the [tutorial](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md) to convert your inference model into one that Paddle Serving can deploy.
|
||||
|
||||
We take the `ch_rec_r34_vd_crnn` model as an example. Download the inference model by running the following commands:
|
||||
```
|
||||
wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
|
||||
tar xf ch_rec_r34_vd_crnn_infer.tar
|
||||
```
|
||||
|
||||
Convert the downloaded model by running the following Python script:
|
||||
```
from paddle_serving_client.io import inference_model_to_serving
inference_model_dir = "ch_rec_r34_vd_crnn"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
    inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params")
```
|
||||
|
||||
Finally, the client and server model configurations are generated in `serving_client_dir` and `serving_server_dir`.
|
||||
|
||||
### 3. Start service
|
||||
Start the standard version or the fast version of the service according to your needs. The two versions are compared in the table below:
|
||||
|
||||
|version|characteristics|recommended scenarios|
|-|-|-|
|standard version|High stability, suitable for distributed deployment|Large throughput and cross-region deployment|
|fast version|Easy to deploy and fast to predict|Scenarios that require high prediction speed and fast iteration|
|
||||
|
||||
#### Mode 1. Start the standard mode service
|
||||
|
||||
```
|
||||
# start with CPU
|
||||
python -m paddle_serving_server.serve --model ocr_det_model --port 9293
|
||||
python ocr_web_server.py cpu
|
||||
|
||||
# or, with GPU
|
||||
python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0
|
||||
python ocr_web_server.py gpu
|
||||
```
|
||||
|
||||
#### Mode 2. Start the fast mode service
|
||||
|
||||
```
|
||||
# start with CPU
|
||||
python ocr_local_server.py cpu
|
||||
|
||||
# or, with GPU
|
||||
python ocr_local_server.py gpu
|
||||
```
|
||||
|
||||
## Send prediction requests
|
||||
|
||||
```
|
||||
python ocr_web_client.py
|
||||
```
|
||||
|
||||
## Returned result format
|
||||
|
||||
The returned result is a JSON string, e.g.
|
||||
```
|
||||
{u'result': {u'res': [u'\u571f\u5730\u6574\u6cbb\u4e0e\u571f\u58e4\u4fee\u590d\u7814\u7a76\u4e2d\u5fc3', u'\u534e\u5357\u519c\u4e1a\u5927\u5b661\u7d20\u56fe']}}
|
||||
```
|
||||
|
||||
You can also print the readable result in `res`:
|
||||
```
|
||||
土地整治与土壤修复研究中心
|
||||
华南农业大学1素图
|
||||
```
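A minimal sketch of reading the recognized text out of a response, assuming the response body is valid JSON with the `result` / `res` layout shown above. The helper name `extract_texts` and the shortened sample string are made up for illustration and are not part of the PaddleOCR client.

```python
import json


def extract_texts(response_body):
    """Return the list of recognized strings from a JSON response body."""
    payload = json.loads(response_body)
    return payload["result"]["res"]


# Shortened sample body that uses the same \u escapes as the example above.
sample = ('{"result": {"res": ["\\u571f\\u5730\\u6574\\u6cbb", '
          '"\\u534e\\u5357\\u519c\\u4e1a\\u5927\\u5b66"]}}')

for text in extract_texts(sample):
    # json.loads has already decoded the escapes into readable characters.
    print(text)
```

Parsing with `json.loads` decodes the `\u` escapes, so printing each entry of `res` gives the readable form directly.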
|
||||
|
||||
## User defined service module modification
|
||||
|
||||
The pre-processing and post-processing steps are implemented in the `preprocess` and `postprocess` functions in `ocr_web_server.py` or `ocr_local_server.py`. These functions call the pre-processing/post-processing library for common CV models provided by `paddle_serving_app`.
You can modify the corresponding code as needed.
|
||||
|
||||
If you only want to start the detection service or the recognition service, run the corresponding script from the following table, and specify CPU or GPU in the start command arguments.
|
||||
|
||||
| task | standard | fast |
| ---- | ----------------- | ------------------- |
| detection | det_web_server.py | det_local_server.py |
| recognition | rec_web_server.py | rec_local_server.py |
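The table can be read as a lookup from task and mode to a script name. The sketch below is only an illustration: the script names come from the table, while the helper `start_command` is hypothetical and assumes these scripts take `cpu` or `gpu` as their single argument, in the same way as `ocr_web_server.py` and `ocr_local_server.py` above.

```python
# Script names are taken from the table above; the helper itself is illustrative.
SCRIPTS = {
    ("detection", "standard"): "det_web_server.py",
    ("detection", "fast"): "det_local_server.py",
    ("recognition", "standard"): "rec_web_server.py",
    ("recognition", "fast"): "rec_local_server.py",
}


def start_command(task, mode, device):
    """Build the start command for one task/mode pair on the given device."""
    if device not in ("cpu", "gpu"):
        raise ValueError("device must be 'cpu' or 'gpu'")
    # Assumption: the device is passed as the only positional argument.
    return "python {} {}".format(SCRIPTS[(task, mode)], device)


print(start_command("detection", "fast", "gpu"))
```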
|
||||
|
||||
More info can be found in [Paddle Serving](https://github.com/PaddlePaddle/Serving).
|
|
@ -0,0 +1,180 @@
|
|||
> Install the develop version of PaddleSlim before running this example.

# Model pruning and compression tutorial

Compression results:
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>ID</th>
<th>Task</th>
<th>Model</th>
<th>Compression strategy<sup><a href="#quant">[3]</a><a href="#prune">[4]</a></sup></th>
<th>Accuracy (in-house Chinese dataset)</th>
<th>Latency<sup><a href="#latency">[1]</a></sup> (ms)</th>
<th>Overall latency<sup><a href="#rec">[2]</a></sup> (ms)</th>
<th>Speedup</th>
<th>Overall model size (M)</th>
<th>Compression ratio</th>
<th>Download link</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td rowspan="2">0</td>
|
||||
<td>Detection</td>
<td>MobileNetV3_DB</td>
<td>None</td>
|
||||
<td>61.7</td>
|
||||
<td>224</td>
|
||||
<td rowspan="2">375</td>
|
||||
<td rowspan="2">-</td>
|
||||
<td rowspan="2">8.6</td>
|
||||
<td rowspan="2">-</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
<td>MobileNetV3_CRNN</td>
<td>None</td>
|
||||
<td>62.0</td>
|
||||
<td>9.52</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">1</td>
|
||||
<td>Detection</td>
<td>SlimTextDet</td>
<td>PACT quantization-aware training</td>
|
||||
<td>62.1</td>
|
||||
<td>195</td>
|
||||
<td rowspan="2">348</td>
|
||||
<td rowspan="2">8%</td>
|
||||
<td rowspan="2">2.8</td>
|
||||
<td rowspan="2">67.82%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT quantization-aware training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">2</td>
|
||||
<td>Detection</td>
<td>SlimTextDet_quat_pruning</td>
<td>Pruning + PACT quantization-aware training</td>
|
||||
<td>60.86</td>
|
||||
<td>142</td>
|
||||
<td rowspan="2">288</td>
|
||||
<td rowspan="2">30%</td>
|
||||
<td rowspan="2">2.8</td>
|
||||
<td rowspan="2">67.82%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT quantization-aware training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">3</td>
|
||||
<td>Detection</td>
<td>SlimTextDet_pruning</td>
<td>Pruning</td>
|
||||
<td>61.57</td>
|
||||
<td>138</td>
|
||||
<td rowspan="2">295</td>
|
||||
<td rowspan="2">27%</td>
|
||||
<td rowspan="2">2.9</td>
|
||||
<td rowspan="2">66.28%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT quantization-aware training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
|
||||
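## Overview

A more complex model tends to perform better, but it also carries some redundancy. Model pruning reduces this redundancy by removing sub-models from the network, which lowers the model's computational complexity and improves inference performance.

This example uses the [pruning API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) provided by PaddleSlim to compress the OCR model.

Before reading this example, we recommend that you get familiar with the following:

- [Standard training method for OCR models](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
- [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/)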
|
||||
|
||||
|
||||
|
||||
## Install PaddleSlim

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```
|
||||
|
||||
|
||||
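## Get the pre-trained model

[Download link for the detection pre-trained model]()

## Sensitivity analysis training

After the pre-trained model is loaded, sensitivity analysis is run on every network layer of the existing model to learn how redundant each layer is, which in turn determines each layer's pruning ratio. For details, see [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md).

Enter the PaddleOCR root directory and run the following command to perform sensitivity analysis on the model: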
|
||||
|
||||
```bash
|
||||
|
||||
python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
|
||||
|
||||
```
|
||||
|
||||
|
||||
|
||||
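## Model pruning and fine-tuning

During pruning, the sensitivity analysis file produced earlier determines the pruning ratio of each network layer. In the implementation, to keep as many of the low-level features extracted from the image as possible, we skip the 4 convolutional layers closest to the input in the backbone. Likewise, to limit the performance loss caused by pruning, we use the sensitivity table from the earlier analysis to pick out [network layers](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) that have little redundancy and are sensitive to pruning, and leave them out of the subsequent pruning. Fine-tuning after pruning follows the original training strategy of the OCR detection model.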
|
||||
|
||||
```bash
|
||||
|
||||
python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
|
||||
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## Export the model

After obtaining the model saved by pruning training, we can export it as an inference_model for prediction and deployment:
|
||||
|
||||
```bash
|
||||
|
||||
python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
|
||||
|
||||
```
|
|
@ -0,0 +1,183 @@
|
|||
> The develop version of PaddleSlim should be installed before running this example.

# Model compression tutorial (Pruning)

Compression results:
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>ID</th>
|
||||
<th>Task</th>
|
||||
<th>Model</th>
|
||||
<th>Compress Strategy<sup><a href="#quant">[3]</a><a href="#prune">[4]</a></sup></th>
|
||||
<th>Criterion(Chinese dataset)</th>
|
||||
<th>Inference Time<sup><a href="#latency">[1]</a></sup>(ms)</th>
|
||||
<th>Inference Time(Total model)<sup><a href="#rec">[2]</a></sup>(ms)</th>
|
||||
<th>Acceleration Ratio</th>
|
||||
<th>Model Size(MB)</th>
|
||||
<th>Compression Ratio</th>
|
||||
<th>Download Link</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td rowspan="2">0</td>
|
||||
<td>Detection</td>
|
||||
<td>MobileNetV3_DB</td>
|
||||
<td>None</td>
|
||||
<td>61.7</td>
|
||||
<td>224</td>
|
||||
<td rowspan="2">375</td>
|
||||
<td rowspan="2">-</td>
|
||||
<td rowspan="2">8.6</td>
|
||||
<td rowspan="2">-</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
|
||||
<td>MobileNetV3_CRNN</td>
|
||||
<td>None</td>
|
||||
<td>62.0</td>
|
||||
<td>9.52</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">1</td>
|
||||
<td>Detection</td>
|
||||
<td>SlimTextDet</td>
|
||||
<td>PACT Quant Aware Training</td>
|
||||
<td>62.1</td>
|
||||
<td>195</td>
|
||||
<td rowspan="2">348</td>
|
||||
<td rowspan="2">8%</td>
|
||||
<td rowspan="2">2.8</td>
|
||||
<td rowspan="2">67.82%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
|
||||
<td>SlimTextRec</td>
|
||||
<td>PACT Quant Aware Training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">2</td>
|
||||
<td>Detection</td>
|
||||
<td>SlimTextDet_quat_pruning</td>
|
||||
<td>Pruning+PACT Quant Aware Training</td>
|
||||
<td>60.86</td>
|
||||
<td>142</td>
|
||||
<td rowspan="2">288</td>
|
||||
<td rowspan="2">30%</td>
|
||||
<td rowspan="2">2.8</td>
|
||||
<td rowspan="2">67.82%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
|
||||
<td>SlimTextRec</td>
|
||||
<td>PACT Quant Aware Training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">3</td>
|
||||
<td>Detection</td>
|
||||
<td>SlimTextDet_pruning</td>
|
||||
<td>Pruning</td>
|
||||
<td>61.57</td>
|
||||
<td>138</td>
|
||||
<td rowspan="2">295</td>
|
||||
<td rowspan="2">27%</td>
|
||||
<td rowspan="2">2.9</td>
|
||||
<td rowspan="2">66.28%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
|
||||
<td>SlimTextRec</td>
|
||||
<td>PACT Quant Aware Training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
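The source does not state how the "Acceleration Ratio" and "Compress Ratio" columns are computed. The sketch below is one reading that reproduces most of the listed figures: speedup as baseline time divided by new time minus one, and compression as one minus new size divided by baseline size. Both formulas and the function names are assumptions made for illustration.

```python
def speedup(baseline_ms, new_ms):
    """Relative speedup of new_ms over baseline_ms (assumed formula)."""
    return baseline_ms / new_ms - 1.0


def size_reduction(baseline_mb, new_mb):
    """Fraction of the baseline model size that was removed (assumed formula)."""
    return 1.0 - new_mb / baseline_mb


# Total inference times (ms) and total model sizes (MB) copied from the table.
for row_id, total_ms, size_mb in [(1, 348, 2.8), (2, 288, 2.8), (3, 295, 2.9)]:
    print(row_id,
          "{:.0%}".format(speedup(375, total_ms)),
          "{:.2%}".format(size_reduction(8.6, size_mb)))
```

With the rounded sizes in the table, row 3 comes out at the listed 66.28%, while rows 1 and 2 come out slightly below the listed 67.82%. This may mean the table was computed from unrounded model sizes.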
|
||||
|
||||
|
||||
## Overview
|
||||
|
||||
Generally, a more complex model achieves better performance on a task, but it also carries some redundancy. Model pruning is a technique that reduces this redundancy by removing sub-models from the neural network, which lowers the model's computational complexity and improves inference performance.
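The scripts in this tutorial pass `criterion='geometry_median'` to PaddleSlim. As a rough illustration of that idea, and not PaddleSlim's actual implementation, the sketch below scores each convolution filter by the sum of its distances to all other filters and removes the filters with the smallest sums, since they are the ones most easily replaced by the rest. The function names and the toy filters are hypothetical.

```python
import math


def rank_filters_by_redundancy(filters):
    """Return filter indices ordered from most to least redundant.

    Each filter is a flat list of weights. A filter whose summed distance
    to all other filters is small lies close to the "centre" of the set.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scores = [sum(dist(f, g) for g in filters) for f in filters]
    return sorted(range(len(filters)), key=lambda i: scores[i])


def prune_filters(filters, ratio):
    """Drop the given fraction of filters, most redundant first."""
    n_remove = int(len(filters) * ratio)
    removed = set(rank_filters_by_redundancy(filters)[:n_remove])
    return [f for i, f in enumerate(filters) if i not in removed]


toy_filters = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(rank_filters_by_redundancy(toy_filters))
print(prune_filters(toy_filters, 0.5))
```

In the toy data the first two filters are nearly identical, so they rank as the most redundant and are the ones dropped at a ratio of 0.5.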
|
||||
|
||||
This example uses the [pruning APIs](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) provided by PaddleSlim to compress the OCR model.

We recommend reading the following pages before working through this example:

- [The training strategy of OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/)
|
||||
|
||||
|
||||
|
||||
## Install PaddleSlim
|
||||
|
||||
```bash
|
||||
|
||||
git clone https://github.com/PaddlePaddle/PaddleSlim.git
|
||||
|
||||
cd PaddleSlim
|
||||
|
||||
python setup.py install
|
||||
|
||||
```
|
||||
|
||||
|
||||
## Download Pretrain Model
|
||||
|
||||
[Download link of Detection pretrain model]()
|
||||
|
||||
|
||||
## Pruning sensitivity analysis
|
||||
|
||||
After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand how redundant each layer is, which determines the pruning ratio of each layer. For details, see [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md).

Enter the PaddleOCR root directory and perform sensitivity analysis on the model with the following command:
|
||||
|
||||
```bash
|
||||
|
||||
python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
|
||||
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Model pruning and Fine-tune
|
||||
|
||||
During pruning, the sensitivity analysis file produced earlier determines the pruning ratio of each network layer. In the implementation, to retain as many low-level image features as possible, we skip the 4 convolutional layers closest to the input in the backbone. Similarly, to reduce the performance loss caused by pruning, we use the sensitivity table from the earlier analysis to select the less redundant, more sensitive [network layers](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) and skip them in the subsequent pruning. After pruning, the model needs fine-tuning to recover its performance. The fine-tuning strategy is similar to the one used to train the original OCR detection model.
|
||||
|
||||
```bash
|
||||
|
||||
python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
|
||||
|
||||
```
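The layer selection described above can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions: the sensitivity table is assumed to map each parameter name to a `{ratio: loss}` dictionary, the layer names and numbers are invented, and `pick_ratios` takes the largest tested ratio under the loss threshold, which is a simplified stand-in for PaddleSlim's `get_ratios_by_loss` and may not match its exact behavior. The `0.03` threshold is the value used in `pruning_and_finetune.py`.

```python
def filter_sensitivities(sens, skip_names, skip_prefixes):
    """Drop layers that must not be pruned from the sensitivity table."""
    kept = {}
    for name, losses in sens.items():
        if name in skip_names:
            continue  # hand-picked sensitive layer
        if any(name.startswith(prefix) for prefix in skip_prefixes):
            continue  # early backbone layer close to the input
        kept[name] = losses
    return kept


def pick_ratios(sens, max_loss):
    """For each layer, pick the largest tested ratio whose loss is acceptable."""
    ratios = {}
    for name, losses in sens.items():
        acceptable = [r for r, loss in losses.items() if loss <= max_loss]
        if acceptable:
            ratios[name] = max(acceptable)
    return ratios


# Invented sensitivity table: {parameter name: {pruned ratio: accuracy loss}}.
toy_sens = {
    "conv1_weights": {0.1: 0.001, 0.2: 0.002},
    "conv5_expand_weights": {0.1: 0.05},
    "conv14_expand_weights": {0.1: 0.01, 0.2: 0.02, 0.3: 0.05},
    "conv15_expand_weights": {0.1: 0.04},
}
kept = filter_sensitivities(
    toy_sens,
    skip_names={"conv5_expand_weights"},
    skip_prefixes=["conv{}_".format(i) for i in range(1, 5)],
)
print(pick_ratios(kept, 0.03))
```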
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## Export inference model
|
||||
|
||||
After pruning and fine-tuning, we can export the model as an inference_model for deployment:
|
||||
|
||||
```bash
|
||||
|
||||
python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
|
||||
|
||||
```
|
|
@ -0,0 +1,67 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.join(__dir__, '..', '..', '..'))
sys.path.append(os.path.join(__dir__, '..', '..', '..', 'tools'))

import program
from paddle import fluid
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.utils.save_load import init_model
from paddleslim.prune import load_model


def main():
    startup_prog, eval_program, place, config, _ = program.preprocess()

    feeded_var_names, target_vars, fetches_var_name = program.build_export(
        config, eval_program, startup_prog)
    eval_program = eval_program.clone(for_test=True)
    exe = fluid.Executor(place)
    exe.run(startup_prog)

    if config['Global']['checkpoints'] is not None:
        path = config['Global']['checkpoints']
    else:
        path = config['Global']['pretrain_weights']

    load_model(exe, eval_program, path)

    save_inference_dir = config['Global']['save_inference_dir']
    if not os.path.exists(save_inference_dir):
        os.makedirs(save_inference_dir)
    fluid.io.save_inference_model(
        dirname=save_inference_dir,
        feeded_var_names=feeded_var_names,
        main_program=eval_program,
        target_vars=target_vars,
        executor=exe,
        model_filename='model',
        params_filename='params')
    print("inference model saved in {}/model and {}/params".format(
        save_inference_dir, save_inference_dir))
    print("save success, output_name_list:", fetches_var_name)


if __name__ == '__main__':
    main()
|
|
@ -0,0 +1,145 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import numpy as np
# Use an absolute path so the script also works when run from its own directory.
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.join(__dir__, '..', '..', '..'))
sys.path.append(os.path.join(__dir__, '..', '..', '..', 'tools'))

import tools.program as program
from paddle import fluid
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.data.reader_main import reader_main
from ppocr.utils.save_load import init_model
from ppocr.utils.character import CharacterOps
from paddleslim.prune import Pruner, save_model
from paddleslim.analysis import flops
from paddleslim.core.graph_wrapper import *
from paddleslim.prune import load_sensitivities, get_ratios_by_loss, merge_sensitive

# Layers found to be sensitive to pruning; they are excluded from pruning.
skip_list = [
    'conv10_linear_weights', 'conv11_linear_weights', 'conv12_expand_weights',
    'conv12_linear_weights', 'conv12_se_2_weights', 'conv13_linear_weights',
    'conv2_linear_weights', 'conv4_linear_weights', 'conv5_expand_weights',
    'conv5_linear_weights', 'conv5_se_2_weights', 'conv6_linear_weights',
    'conv7_linear_weights', 'conv8_expand_weights', 'conv8_linear_weights',
    'conv9_expand_weights', 'conv9_linear_weights'
]


def main():
    config = program.load_config(FLAGS.config)
    program.merge_config(FLAGS.opt)
    logger.info(config)

    # check if set use_gpu=True in paddlepaddle cpu version
    use_gpu = config['Global']['use_gpu']
    program.check_gpu(use_gpu)

    alg = config['Global']['algorithm']
    assert alg in ['EAST', 'DB', 'Rosetta', 'CRNN', 'STARNet', 'RARE']
    if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE']:
        config['Global']['char_ops'] = CharacterOps(config['Global'])

    place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
    startup_program = fluid.Program()
    train_program = fluid.Program()
    train_build_outputs = program.build(
        config, train_program, startup_program, mode='train')
    train_loader = train_build_outputs[0]
    train_fetch_name_list = train_build_outputs[1]
    train_fetch_varname_list = train_build_outputs[2]
    train_opt_loss_name = train_build_outputs[3]

    eval_program = fluid.Program()
    eval_build_outputs = program.build(
        config, eval_program, startup_program, mode='eval')
    eval_fetch_name_list = eval_build_outputs[1]
    eval_fetch_varname_list = eval_build_outputs[2]
    eval_program = eval_program.clone(for_test=True)

    train_reader = reader_main(config=config, mode="train")
    train_loader.set_sample_list_generator(train_reader, places=place)

    eval_reader = reader_main(config=config, mode="eval")

    exe = fluid.Executor(place)
    exe.run(startup_program)

    # load the pretrained weights or checkpoint
    init_model(config, train_program, exe)

    # drop the sensitive layers and the first 4 backbone conv layers
    sen = load_sensitivities("sensitivities_0.data")
    for i in skip_list:
        # tolerate names that are missing from the sensitivity file
        sen.pop(i, None)
    back_bone_list = ['conv' + str(x) for x in range(1, 5)]
    for i in back_bone_list:
        for key in list(sen.keys()):
            if i + '_' in key:
                sen.pop(key)
    ratios = get_ratios_by_loss(sen, 0.03)
    logger.info("FLOPs before pruning: {}".format(flops(eval_program)))
    pruner = Pruner(criterion='geometry_median')
    print("ratios: {}".format(ratios))
    pruned_val_program, _, _ = pruner.prune(
        eval_program,
        fluid.global_scope(),
        params=ratios.keys(),
        ratios=ratios.values(),
        place=place,
        only_graph=True)

    pruned_program, _, _ = pruner.prune(
        train_program,
        fluid.global_scope(),
        params=ratios.keys(),
        ratios=ratios.values(),
        place=place)
    logger.info("FLOPs after pruning: {}".format(flops(pruned_val_program)))
    # compile program for multi-devices
    train_compile_program = program.create_multi_devices_program(
        pruned_program, train_opt_loss_name)

    train_info_dict = {
        'compile_program': train_compile_program,
        'train_program': pruned_program,
        'reader': train_loader,
        'fetch_name_list': train_fetch_name_list,
        'fetch_varname_list': train_fetch_varname_list
    }

    eval_info_dict = {
        'program': pruned_val_program,
        'reader': eval_reader,
        'fetch_name_list': eval_fetch_name_list,
        'fetch_varname_list': eval_fetch_varname_list
    }

    if alg in ['EAST', 'DB']:
        program.train_eval_det_run(
            config, exe, train_info_dict, eval_info_dict, is_pruning=True)
    else:
        program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict)


if __name__ == '__main__':
    parser = program.ArgsParser()
    FLAGS = parser.parse_args()
    main()
|
|
@ -0,0 +1,115 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
# Use an absolute path so the script also works when run from its own directory.
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.join(__dir__, '..', '..', '..'))
sys.path.append(os.path.join(__dir__, '..', '..', '..', 'tools'))

import json
import cv2
from paddle import fluid
import paddleslim as slim
from copy import deepcopy
from tools.eval_utils.eval_det_utils import eval_det_run

from tools import program
from ppocr.utils.utility import initial_logger
from ppocr.data.reader_main import reader_main
from ppocr.utils.save_load import init_model
from ppocr.utils.character import CharacterOps
from ppocr.utils.utility import create_module

logger = initial_logger()


def get_pruned_params(program):
    # Collect 4-D conv weights, skipping depthwise and transposed convolutions.
    params = []
    for param in program.global_block().all_parameters():
        if len(
                param.shape
        ) == 4 and 'depthwise' not in param.name and 'transpose' not in param.name:
            params.append(param.name)
    return params


def eval_function(eval_args, mode='eval'):
    # Metric used by the sensitivity analysis: detection hmean.
    exe = eval_args['exe']
    config = eval_args['config']
    eval_info_dict = eval_args['eval_info_dict']
    metrics = eval_det_run(exe, config, eval_info_dict, mode=mode)
    return metrics['hmean']


def main():
    config = program.load_config(FLAGS.config)
    program.merge_config(FLAGS.opt)
    logger.info(config)

    # check if set use_gpu=True in paddlepaddle cpu version
    use_gpu = config['Global']['use_gpu']
    program.check_gpu(use_gpu)

    alg = config['Global']['algorithm']
    assert alg in ['EAST', 'DB', 'Rosetta', 'CRNN', 'STARNet', 'RARE']
    if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE']:
        config['Global']['char_ops'] = CharacterOps(config['Global'])

    place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
    startup_prog = fluid.Program()
    eval_program = fluid.Program()
    eval_build_outputs = program.build(
        config, eval_program, startup_prog, mode='test')
    eval_fetch_name_list = eval_build_outputs[1]
    eval_fetch_varname_list = eval_build_outputs[2]
    eval_program = eval_program.clone(for_test=True)
    exe = fluid.Executor(place)
    exe.run(startup_prog)

    init_model(config, eval_program, exe)

    eval_reader = reader_main(config=config, mode="eval")
    eval_info_dict = {
        'program': eval_program,
        'reader': eval_reader,
        'fetch_name_list': eval_fetch_name_list,
        'fetch_varname_list': eval_fetch_varname_list
    }
    eval_args = {'exe': exe, 'config': config, 'eval_info_dict': eval_info_dict}
    metrics = eval_function(eval_args)
    print("Baseline: {}".format(metrics))

    params = get_pruned_params(eval_program)
    print('Start to analyze')
    # Results are written to sensitivities_0.data, which
    # pruning_and_finetune.py loads later.
    sens_0 = slim.prune.sensitivity(
        eval_program,
        place,
        params,
        eval_function,
        sensitivities_file="sensitivities_0.data",
        pruned_ratios=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        eval_args=eval_args,
        criterion='geometry_median')


if __name__ == '__main__':
    parser = program.ArgsParser()
    FLAGS = parser.parse_args()
    main()
|
|
@ -1,21 +1,148 @@
|
|||
> Install PaddleSlim 1.2.0 or higher before running this example.

# Model quantization and compression tutorial

Compression results:
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>ID</th>
<th>Task</th>
<th>Model</th>
<th>Compression strategy</th>
<th>Accuracy (in-house Chinese dataset)</th>
<th>Latency (ms)</th>
<th>Overall latency (ms)</th>
<th>Speedup</th>
<th>Overall model size (M)</th>
<th>Compression ratio</th>
<th>Download link</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td rowspan="2">0</td>
|
||||
<td>Detection</td>
<td>MobileNetV3_DB</td>
<td>None</td>
|
||||
<td>61.7</td>
|
||||
<td>224</td>
|
||||
<td rowspan="2">375</td>
|
||||
<td rowspan="2">-</td>
|
||||
<td rowspan="2">8.6</td>
|
||||
<td rowspan="2">-</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
<td>MobileNetV3_CRNN</td>
<td>None</td>
|
||||
<td>62.0</td>
|
||||
<td>9.52</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">1</td>
|
||||
<td>Detection</td>
<td>SlimTextDet</td>
<td>PACT quantization-aware training</td>
|
||||
<td>62.1</td>
|
||||
<td>195</td>
|
||||
<td rowspan="2">348</td>
|
||||
<td rowspan="2">8%</td>
|
||||
<td rowspan="2">2.8</td>
|
||||
<td rowspan="2">67.82%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT quantization-aware training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">2</td>
|
||||
<td>Detection</td>
<td>SlimTextDet_quat_pruning</td>
<td>Pruning + PACT quantization-aware training</td>
|
||||
<td>60.86</td>
|
||||
<td>142</td>
|
||||
<td rowspan="2">288</td>
|
||||
<td rowspan="2">30%</td>
|
||||
<td rowspan="2">2.8</td>
|
||||
<td rowspan="2">67.82%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT quantization-aware training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">3</td>
|
||||
<td>Detection</td>
<td>SlimTextDet_pruning</td>
<td>Pruning</td>
|
||||
<td>61.57</td>
|
||||
<td>138</td>
|
||||
<td rowspan="2">295</td>
|
||||
<td rowspan="2">27%</td>
|
||||
<td rowspan="2">2.9</td>
|
||||
<td rowspan="2">66.28%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT quantization-aware training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
|
||||
|
||||
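## Overview

A more complex model tends to perform better, but it also carries some redundancy. Model quantization reduces this redundancy by converting full-precision values to fixed-point numbers, which lowers the model's computational complexity and improves inference performance.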
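As a rough illustration of mapping full-precision values to fixed-point numbers, the sketch below applies plain symmetric 8-bit quantization to a short list of weights. It is not PaddleSlim's implementation and leaves out the learned clipping that PACT adds; the function names and sample weights are made up for the example.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, int(round(v / scale)))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
print(q)
print(dequantize(q, scale))
```

Each recovered value differs from the original by at most half a quantization step, which is the precision traded for a smaller integer representation.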
|
||||
|
||||
This example uses the [quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress the OCR model.
Before reading this example, we recommend that you get familiar with the following:

- [Standard training method for OCR models](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
- [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/)
- [PaddleSlim documentation](https://paddleslim.readthedocs.io/zh_CN/latest/index.html)
|
||||
|
||||
|
||||
|
||||
## Install PaddleSlim
You can install PaddleSlim by following the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/).

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```
|
||||
|
||||
|
||||
|
||||
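## Get the pre-trained models

[Download link for the recognition pre-trained model]()

[Download link for the detection pre-trained model]()

## Quantization training
After the pre-trained model is loaded and the quantization strategy is defined, the model can be quantized. For details on the quantization features, see [Model quantization](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/quantization_api.html).

Enter the PaddleOCR root directory and quantize the model with the following command: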
|
||||
|
||||
|
@ -25,10 +152,11 @@ python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global
|
|||
|
||||
|
||||
|
||||
## Evaluate and export

## Export the model

After obtaining the model saved by quantization training, we can export it as an inference_model for prediction and deployment:
|
||||
|
||||
```bash
|
||||
python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_model
|
||||
python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
|
||||
```
|
||||
|
|
|
@ -0,0 +1,167 @@
|
|||
\> PaddleSlim 1.2.0 or higher version should be installed before runing this example.
|
||||
|
||||
|
||||
|
||||
# Model compression tutorial (Quantization)
|
||||
|
||||
Compression results:
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>ID</th>
|
||||
<th>Task</th>
|
||||
<th>Model</th>
|
||||
<th>Compress Strategy</th>
|
||||
<th>Criterion(Chinese dataset)</th>
|
||||
<th>Inference Time(ms)</th>
|
||||
<th>Inference Time(Total model)(ms)</th>
|
||||
<th>Acceleration Ratio</th>
|
||||
<th>Model Size(MB)</th>
|
||||
<th>Compression Ratio</th>
|
||||
<th>Download Link</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td rowspan="2">0</td>
|
||||
<td>Detection</td>
|
||||
<td>MobileNetV3_DB</td>
|
||||
<td>None</td>
|
||||
<td>61.7</td>
|
||||
<td>224</td>
|
||||
<td rowspan="2">375</td>
|
||||
<td rowspan="2">-</td>
|
||||
<td rowspan="2">8.6</td>
|
||||
<td rowspan="2">-</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
|
||||
<td>MobileNetV3_CRNN</td>
|
||||
<td>None</td>
|
||||
<td>62.0</td>
|
||||
<td>9.52</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">1</td>
|
||||
<td>Detection</td>
|
||||
<td>SlimTextDet</td>
|
||||
<td>PACT Quant Aware Training</td>
|
||||
<td>62.1</td>
|
||||
<td>195</td>
|
||||
<td rowspan="2">348</td>
|
||||
<td rowspan="2">8%</td>
|
||||
<td rowspan="2">2.8</td>
|
||||
<td rowspan="2">67.82%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
|
||||
<td>SlimTextRec</td>
|
||||
<td>PACT Quant Aware Training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">2</td>
|
||||
<td>Detection</td>
|
||||
<td>SlimTextDet_quat_pruning</td>
|
||||
<td>Pruning+PACT Quant Aware Training</td>
|
||||
<td>60.86</td>
|
||||
<td>142</td>
|
||||
<td rowspan="2">288</td>
|
||||
<td rowspan="2">30%</td>
|
||||
<td rowspan="2">2.8</td>
|
||||
<td rowspan="2">67.82%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
|
||||
<td>SlimTextRec</td>
|
||||
<td>PACT Quant Aware Training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2">3</td>
|
||||
<td>Detection</td>
|
||||
<td>SlimTextDet_pruning</td>
|
||||
<td>Pruning</td>
|
||||
<td>61.57</td>
|
||||
<td>138</td>
|
||||
<td rowspan="2">295</td>
|
||||
<td rowspan="2">27%</td>
|
||||
<td rowspan="2">2.9</td>
|
||||
<td rowspan="2">66.28%</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Recognition</td>
|
||||
<td>SlimTextRec</td>
|
||||
<td>PACT Quant Aware Training</td>
|
||||
<td>61.48</td>
|
||||
<td>8.6</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
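
The table does not define its two ratio columns. The sketch below assumes that the acceleration ratio is `baseline_time / new_time - 1` and the compression ratio is `1 - new_size / baseline_size`. These definitions are inferred because they reproduce the 8%, 30%, 27% and 66.28% entries; the 67.82% entries do not follow exactly from the rounded sizes shown, so they presumably come from unrounded values. The function names are illustrative only.

```python
def acceleration_ratio(baseline_ms, new_ms):
    """Relative speed-up of the compressed pipeline (assumed definition)."""
    return baseline_ms / new_ms - 1


def compression_ratio(baseline_mb, new_mb):
    """Fraction of the model size removed by compression (assumed definition)."""
    return 1 - new_mb / baseline_mb


# Total inference times (ms) and total model sizes (MB) taken from the table above.
speedups = {
    "quantization": acceleration_ratio(375, 348),
    "pruning + quantization": acceleration_ratio(375, 288),
    "pruning": acceleration_ratio(375, 295),
}
pruning_compression = compression_ratio(8.6, 2.9)

for name, value in speedups.items():
    print(f"{name}: {value:.0%}")
print(f"pruning compression: {pruning_compression:.2%}")
```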
|
||||
|
||||
|
||||
|
||||
## Overview
|
||||
|
||||
Generally, a more complex model achieves better performance on a task, but it also introduces redundancy. Quantization reduces this redundancy by converting full-precision data to fixed-point numbers, which lowers the model's computational complexity and improves inference performance.
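
To make the idea concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is an illustration of the general technique only, not PaddleSlim's implementation; the function names and the per-tensor scale are assumptions chosen for the example.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


weights = [0.52, -1.27, 0.003, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(codes, scale, restored)
```

Each restored value differs from the original by at most half a quantization step, which is the precision traded for the smaller integer representation.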
|
||||
|
||||
This example uses the [quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress the OCR model.
|
||||
|
||||
Before reading this example, it is recommended that you go through the following pages:
|
||||
|
||||
|
||||
|
||||
- [The training strategy of OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
|
||||
|
||||
- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)
|
||||
|
||||
|
||||
|
||||
## Install PaddleSlim
|
||||
|
||||
```bash
|
||||
git clone https://github.com/PaddlePaddle/PaddleSlim.git
|
||||
|
||||
cd PaddleSlim
|
||||
|
||||
python setup.py install
|
||||
|
||||
```
|
||||
|
||||
|
||||
## Download Pretrained Model
|
||||
|
||||
[Download link of detection pretrained model]()
|
||||
|
||||
[Download link of recognition pretrained model]()
|
||||
|
||||
|
||||
## Quant-Aware Training
|
||||
|
||||
After loading the pretrained model and defining the quantization strategy, the model can be quantized. For details of the quantization methods, see: [Model Quantization](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/quantization_api.html)
|
||||
|
||||
Enter the PaddleOCR root directory and quantize the model with the following command:
|
||||
|
||||
```bash
|
||||
# The pretrained-weights path is a placeholder; point it at your downloaded model
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights={path/to/pretrained_model}/best_accuracy Global.save_model_dir=./output/quant_model
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Export inference model
|
||||
|
||||
After obtaining the model saved by quantization-aware training, we can export it as an inference model for deployment:
|
||||
|
||||
```bash
|
||||
python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
|
||||
```
|
|
@ -0,0 +1,78 @@
|
|||
<a name="算法介绍"></a>
|
||||
## 算法介绍
|
||||
- [1.文本检测算法](#文本检测算法)
|
||||
- [2.文本识别算法](#文本识别算法)
|
||||
|
||||
<a name="文本检测算法"></a>
|
||||
### 1.文本检测算法
|
||||
|
||||
PaddleOCR开源的文本检测算法列表:
|
||||
- [x] DB([paper](https://arxiv.org/abs/1911.08947))(ppocr推荐)
|
||||
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
|
||||
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))
|
||||
|
||||
On the ICDAR2015 public text detection dataset, the algorithms perform as follows:

|Model|Backbone|precision|recall|Hmean|Download link|
|-|-|-|-|-|-|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
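
The Hmean column is the harmonic mean of precision and recall. As a quick check, the following sketch recomputes it for two rows of the table; the function name `hmean` is chosen for illustration only.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


east_r50 = hmean(88.18, 85.51)
sast_r50 = hmean(92.18, 82.96)
print(f"EAST ResNet50_vd: {east_r50:.2f}%")
print(f"SAST ResNet50_vd: {sast_r50:.2f}%")
```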
|
||||
|
||||
在Total-text文本检测公开数据集上,算法效果如下:
|
||||
|
||||
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
||||
|-|-|-|-|-|-|
|
||||
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|
|
||||
|
||||
**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi)
|
||||
|
||||
|
||||
使用[LSVT](./datasets.md#1icdar2019-lsvt)街景数据集共3w张数据,训练中文检测模型的相关配置和预训练文件如下:
|
||||
|
||||
|模型|骨干网络|配置文件|预训练模型|
|
||||
|-|-|-|-|
|
||||
|超轻量中文模型|MobileNetV3|det_mv3_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|
||||
|通用中文OCR模型|ResNet50_vd|det_r50_vd_db.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
|
||||
|
||||
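* Note: training and evaluating the DB models above requires the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with a different dataset or a different model, tune these two parameters.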
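
As a rough illustration of what these two parameters control, the sketch below assumes the offset formula from the DB paper, `distance = area * unclip_ratio / perimeter`, and treats `box_thresh` as the minimum score for keeping a detected box. The function names are hypothetical. This is not PaddleOCR's post-processing code, which expands polygons with a clipping library.

```python
import math


def polygon_area(points):
    """Shoelace formula; points is an ordered list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def unclip_distance(points, unclip_ratio=1.5):
    """How far the shrunken text region is expanded outward (assumed formula)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(score, box_thresh=0.6):
    """Keep a box only if its score reaches the threshold."""
    return score >= box_thresh


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
distance = unclip_distance(square)
print(distance, keep_box(0.7), keep_box(0.5))
```

A larger `unclip_ratio` produces looser boxes, while a larger `box_thresh` discards more low-confidence boxes.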
|
||||
|
||||
PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./detection.md)。
|
||||
|
||||
<a name="文本识别算法"></a>
|
||||
### 2.文本识别算法
|
||||
|
||||
PaddleOCR开源的文本识别算法列表:
|
||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))(ppocr推荐)
|
||||
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
||||
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
||||
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
|
||||
- [x] SRN([paper](https://arxiv.org/abs/2003.12294))
|
||||
|
||||
参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:
|
||||
|
||||
|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接|
|
||||
|-|-|-|-|-|
|
||||
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|
||||
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|
||||
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|
||||
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|
||||
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|
||||
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|
||||
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|
||||
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[下载链接](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
|
||||
|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
|
||||
|
||||
**说明:** SRN模型使用了数据扰动方法对上述提到对两个训练集进行增广,增广后的数据可以在[百度网盘](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA)上下载,提取码: y3ry。
|
||||
原始论文使用两阶段训练平均精度为89.74%,PaddleOCR中使用one-stage训练,平均精度为88.33%。两种预训练权重均在[下载链接](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)中。
|
||||
|
||||
使用[LSVT](./datasets.md#1icdar2019-lsvt)街景数据集根据真值将图crop出来30w数据,进行位置校准。此外基于LSVT语料生成500w合成数据训练中文模型,相关配置和预训练文件如下:
|
||||
|
||||
|模型|骨干网络|配置文件|预训练模型|
|
||||
|-|-|-|-|
|
||||
|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|
||||
|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
|
||||
|
||||
PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。
|
|
@ -0,0 +1,127 @@
|
|||
## 文字角度分类
|
||||
|
||||
### 数据准备
|
||||
|
||||
请按如下步骤设置数据集:
|
||||
|
||||
训练数据的默认存储路径是 `PaddleOCR/train_data/cls`,如果您的磁盘上已有数据集,只需创建软链接至数据集目录:
|
||||
|
||||
```
|
||||
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset
|
||||
```
|
||||
|
||||
请参考下文组织您的数据。
|
||||
- 训练集
|
||||
|
||||
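First put the training images in a single folder (train_images) and record the image paths and labels in a txt file (cls_gt_train.txt).

**Note:** by default the image path and the image label must be separated by `\t`; any other separator will cause a training error.

The labels 0 and 180 indicate that the image is rotated by 0 degrees and 180 degrees respectively.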
|
||||
|
||||
```
|
||||
" 图像文件名 图像标注信息 "
|
||||
|
||||
train_data/cls/word_001.jpg 0
|
||||
train_data/cls/word_002.jpg 180
|
||||
```
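
Because a wrong separator only shows up as an error during training, it can help to validate the label file beforehand. The following is a minimal sketch under the format described above (one `path<TAB>label` pair per line, labels `0` or `180`); `parse_cls_labels` is a hypothetical helper, not part of PaddleOCR.

```python
def parse_cls_labels(lines):
    """Parse 'image_path<TAB>label' lines and reject malformed entries."""
    samples = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'path<TAB>label', got {line!r}")
        path, label = parts
        if label not in ("0", "180"):
            raise ValueError(f"line {lineno}: unsupported label {label!r}")
        samples.append((path, int(label)))
    return samples


example = [
    "train_data/cls/word_001.jpg\t0\n",
    "train_data/cls/word_002.jpg\t180\n",
]
samples = parse_cls_labels(example)
print(samples)
```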
|
||||
|
||||
最终训练集应有如下文件结构:
|
||||
```
|
||||
|-train_data
|
||||
|-cls
|
||||
|- cls_gt_train.txt
|
||||
|- train
|
||||
|- word_001.png
|
||||
|- word_002.jpg
|
||||
|- word_003.jpg
|
||||
| ...
|
||||
```
|
||||
|
||||
- 测试集
|
||||
|
||||
同训练集类似,测试集也需要提供一个包含所有图片的文件夹(test)和一个cls_gt_test.txt,测试集的结构如下所示:
|
||||
|
||||
```
|
||||
|-train_data
|
||||
|-cls
|
||||
|- cls_gt_test.txt
|
||||
|- test
|
||||
|- word_001.jpg
|
||||
|- word_002.jpg
|
||||
|- word_003.jpg
|
||||
| ...
|
||||
```
|
||||
|
||||
### 启动训练
|
||||
|
||||
PaddleOCR提供了训练脚本、评估脚本和预测脚本。
|
||||
|
||||
开始训练:
|
||||
|
||||
*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
|
||||
|
||||
```
|
||||
# 设置PYTHONPATH路径
|
||||
export PYTHONPATH=$PYTHONPATH:.
|
||||
# GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
# 启动训练
|
||||
python3 tools/train.py -c configs/cls/cls_mv3.yml
|
||||
```
|
||||
|
||||
- 数据增强
|
||||
|
||||
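PaddleOCR provides several data augmentation methods. To add distortions during training, set `distort: true` in the configuration file.

The default distortions are: color space conversion (cvtColor), blur, jitter, Gaussian noise, random crop, perspective, color reverse, and random data augmentation (RandAugment).

During training, every distortion except RandAugment is selected with a probability of 50%. For the implementation, see: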
|
||||
[randaugment.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/cls/randaugment.py)
|
||||
[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
|
||||
|
||||
*Because of OpenCV compatibility issues, distortion is currently supported only on Linux.*
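
The 50% selection rule can be sketched as follows. This is a toy illustration on a list of pixel values, not the code in `img_tools.py`; the distortion functions and the helper name are invented for the example.

```python
import random


def apply_random_distortions(sample, distortions, prob=0.5, rng=None):
    """Apply each distortion independently with probability `prob`."""
    rng = rng or random.Random()
    applied = []
    for name, fn in distortions:
        if rng.random() < prob:
            sample = fn(sample)
            applied.append(name)
    return sample, applied


# Toy stand-ins for real image operations.
distortions = [
    ("reverse", lambda px: [255 - p for p in px]),  # color inversion
    ("jitter", lambda px: px[1:] + px[:1]),         # shift pixels by one
]

pixels = [0, 128, 255]
always, always_names = apply_random_distortions(pixels, distortions, prob=1.0)
never, never_names = apply_random_distortions(pixels, distortions, prob=0.0)
mixed, mixed_names = apply_random_distortions(pixels, distortions, rng=random.Random(0))
print(always, always_names, never, never_names, mixed_names)
```

Passing a seeded `random.Random` makes a run reproducible, which is useful when debugging an augmentation pipeline.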
|
||||
|
||||
### 训练
|
||||
|
||||
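PaddleOCR supports alternating training and evaluation. Set `eval_batch_step` in `configs/cls/cls_mv3.yml` to control the evaluation frequency; by default evaluation runs every 500 iterations. During evaluation, the model with the best accuracy is saved as `output/cls_mv3/best_accuracy` by default.

If the validation set is large, evaluation takes a long time; consider evaluating less often, or only after training finishes.

**Note: the configuration file used for prediction/evaluation must be the same as the one used for training.**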
|
||||
|
||||
### 评估
|
||||
|
||||
评估数据集可以通过`configs/cls/cls_reader.yml` 修改EvalReader中的 `label_file_path` 设置。
|
||||
|
||||
*注意* 评估时必须确保配置文件中 infer_img 字段为空
|
||||
```
|
||||
export CUDA_VISIBLE_DEVICES=0
|
||||
# GPU 评估, Global.checkpoints 为待测权重
|
||||
python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy
|
||||
```
|
||||
|
||||
### 预测
|
||||
|
||||
* 训练引擎的预测
|
||||
|
||||
With a model trained by PaddleOCR, the following script gives a quick prediction.

The image to predict is set in `infer_img` by default, and the weights are specified with `-o Global.checkpoints`:
|
||||
|
||||
```
|
||||
# 预测分类结果
|
||||
python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
|
||||
```
|
||||
|
||||
预测图片:
|
||||
|
||||

|
||||
|
||||
The prediction result for the input image:
|
||||
|
||||
```
|
||||
infer_img: doc/imgs_words/en/word_1.png
|
||||
scores: [[0.93161047 0.06838956]]
|
||||
label: [0]
|
||||
```
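
The printed `label` is the index of the highest score. A minimal sketch of that decoding step is shown below; it assumes the class order is `("0", "180")`, which matches the two angles described in this document but is not confirmed by the output itself, and `decode_cls` is a hypothetical helper.

```python
def decode_cls(scores, label_list=("0", "180")):
    """Return the angle label with the highest score and that score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return label_list[best], scores[best]


angle, confidence = decode_cls([0.93161047, 0.06838956])
print(angle, confidence)
```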
|
|
@ -1,29 +1,51 @@
|
|||
# Benchmark
|
||||
|
||||
本文给出了PaddleOCR超轻量中文模型(8.6M)在各平台的预测耗时benchmark。
|
||||
本文给出了中英文OCR系列模型精度指标和在各平台预测耗时的benchmark。
|
||||
|
||||
## 测试数据
|
||||
- 从中文公开数据集[ICDAR2017-RCTW](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#ICDAR2017-RCTW-17)中随机采样**500**张图像。
|
||||
该集合大部分图片是通过手机摄像头在野外采集的。有些是截图。这些图片展示了各种各样的场景,包括街景、海报、菜单、室内场景和手机应用程序的截图。
|
||||
针对OCR实际应用场景,包括合同,车牌,铭牌,火车票,化验单,表格,证书,街景文字,名片,数码显示屏等,收集的300张图像,每张图平均有17个文本框,下图给出了一些图像示例。
|
||||
|
||||
<div align="center">
|
||||
<img src="../datasets/doc.jpg" width = "1000" height = "500" />
|
||||
</div>
|
||||
|
||||
## 评估指标
|
||||
在四种平台上的预测耗时指标如下:
|
||||
|
||||
|长边尺寸(px)|T4(s)|V100(s)|Intel至强6148(s)|骁龙855(s)|
|
||||
|-|-|-|-|-|
|
||||
|960|0.092|0.057|0.319|0.354|
|
||||
|640|0.067|0.045|0.198|0.236|
|
||||
|480|0.057|0.043|0.151|0.175|
|
||||
|
||||
说明:
|
||||
说明:
|
||||
- v1.0是未添加优化策略的DB+CRNN模型,v1.1是添加多种优化策略和方向分类器的PP-OCR模型。slim_v1.1是使用裁剪或量化的模型。
|
||||
- 检测输入图像的的长边尺寸是960。
|
||||
- 评估耗时阶段为图像输入到结果输出的完整阶段,包括了图像的预处理和后处理。
|
||||
- `Intel至强6148`为服务器端CPU型号,测试中使用Intel MKL-DNN 加速CPU预测速度,使用该操作需要:
|
||||
- 更新到飞桨latest版本:https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev ,请根据自己环境的CUDA版本和Python版本选择相应的mkl版wheel包,如,CUDA10、Python3.7环境,应操作:
|
||||
```shell
|
||||
# 获取安装包
|
||||
wget https://paddle-wheel.bj.bcebos.com/0.0.0-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
|
||||
# 安装
|
||||
pip3.7 install paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
|
||||
```
|
||||
- 预测时使用参数打开加速开关: `--enable_mkldnn True`
|
||||
- `Intel至强6148`为服务器端CPU型号,测试中使用Intel MKL-DNN 加速。
|
||||
- `骁龙855`为移动端处理平台型号。
|
||||
|
||||
不同预测模型大小和整体识别精度对比
|
||||
|
||||
| 模型名称 | 整体模型<br>大小\(M\) | 检测模型<br>大小\(M\) | 方向分类器<br>模型大小\(M\) | 识别模型<br>大小\(M\) | 整体识别<br>F\-score |
|
||||
|:-:|:-:|:-:|:-:|:-:|:-:|
|
||||
| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 |
|
||||
| ch\_ppocr\_server\_v1\.1 | 155\.1 | 47\.2 | 0\.9 | 107 | 0\.5414 |
|
||||
| ch\_ppocr\_mobile\_v1\.0 | 8\.6 | 4\.1 | \- | 4\.5 | 0\.393 |
|
||||
| ch\_ppocr\_server\_v1\.0 | 203\.8 | 98\.5 | \- | 105\.3 | 0\.4436 |
|
||||
|
||||
不同预测模型在T4 GPU上预测速度对比,单位ms
|
||||
|
||||
| 模型名称 | 整体 | 检测 | 方向分类器 | 识别 |
|
||||
|:-:|:-:|:-:|:-:|:-:|
|
||||
| ch\_ppocr\_mobile\_v1\.1 | 137 | 35 | 24 | 78 |
|
||||
| ch\_ppocr\_server\_v1\.1 | 204 | 39 | 25 | 140 |
|
||||
| ch\_ppocr\_mobile\_v1\.0 | 117 | 41 | \- | 76 |
|
||||
| ch\_ppocr\_server\_v1\.0 | 199 | 52 | \- | 147 |
|
||||
|
||||
不同预测模型在CPU上预测速度对比,单位ms
|
||||
|
||||
| 模型名称 | 整体 | 检测 | 方向分类器 | 识别 |
|
||||
|:-:|:-:|:-:|:-:|:-:|
|
||||
| ch\_ppocr\_mobile\_v1\.1 | 421 | 164 | 51 | 206 |
|
||||
| ch\_ppocr\_mobile\_v1\.0 | 398 | 219 | \- | 179 |
|
||||
|
||||
裁剪量化模型和原始模型模型大小,整体识别精度和在SD 855上预测速度对比
|
||||
|
||||
| 模型名称 | 整体模型<br>大小\(M\) | 检测模型<br>大小\(M\) | 方向分类器<br>模型大小\(M\) | 识别模型<br>大小\(M\) | 整体识别<br>F\-score | SD 855<br>\(ms\) |
|
||||
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|
||||
| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 | 306 |
|
||||
| ch\_ppocr\_mobile\_slim\_v1\.1 | 3\.5 | 1\.4 | 0\.5 | 1\.6 | 0\.521 | 268 |
|
||||
|
|
|
@ -6,7 +6,7 @@
|
|||
|
||||
PaddleOCR提供了EAST、DB两种文本检测算法,均支持MobileNetV3、ResNet50_vd两种骨干网络,根据需要选择相应的配置文件,启动训练。例如,训练使用MobileNetV3作为骨干网络的DB检测模型(即超轻量模型使用的配置):
|
||||
```
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml 2>&1 | tee det_db.log
|
||||
```
|
||||
更详细的数据准备和训练教程参考文档教程中[文本检测模型训练/评估/预测](./detection.md)。
|
||||
|
||||
|
@ -14,7 +14,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml
|
|||
|
||||
PaddleOCR提供了CRNN、Rosetta、STAR-Net、RARE四种文本识别算法,均支持MobileNetV3、ResNet34_vd两种骨干网络,根据需要选择相应的配置文件,启动训练。例如,训练使用MobileNetV3作为骨干网络的CRNN识别模型(即超轻量模型使用的配置):
|
||||
```
|
||||
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
|
||||
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml 2>&1 | tee rec_ch_lite.log
|
||||
```
|
||||
更详细的数据准备和训练教程参考文档教程中[文本识别模型训练/评估/预测](./recognition.md)。
|
||||
|
||||
|
|
|
@ -14,6 +14,15 @@ wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_l
|
|||
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
|
||||
```
|
||||
|
||||
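PaddleOCR also provides a data format conversion script that converts the official labels into the supported format. The conversion tool is `train_data/gen_label.py`; the training set is used as an example here: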
|
||||
|
||||
```
|
||||
# 将官网下载的标签文件转换为 train_icdar2015_label.txt
|
||||
python gen_label.py --mode="det" --root_path="icdar_c4_train_imgs/" \
|
||||
--input_path="ch4_training_localization_transcription_gt" \
|
||||
--output_label="train_icdar2015_label.txt"
|
||||
```
|
||||
|
||||
解压数据集和下载标注文件后,PaddleOCR/train_data/ 有两个文件夹和两个文件,分别是:
|
||||
```
|
||||
/PaddleOCR/train_data/icdar2015/text_localization/
|
||||
|
@ -62,7 +71,10 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
|
|||
*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
|
||||
|
||||
```shell
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
|
||||
# Train the mv3_db model and save the training log as train_det.log
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml \
|
||||
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ \
|
||||
2>&1 | tee train_det.log
|
||||
```
|
||||
|
||||
In the command above, `-c` selects the configs/det/det_mv3_db.yml configuration file for training.
|
||||
|
|
|
@ -11,24 +11,29 @@ inference 模型(`fluid.io.save_inference_model`保存的模型)
|
|||
- [一、训练模型转inference模型](#训练模型转inference模型)
|
||||
- [检测模型转inference模型](#检测模型转inference模型)
|
||||
- [识别模型转inference模型](#识别模型转inference模型)
|
||||
|
||||
- [方向分类模型转inference模型](#方向分类模型转inference模型)
|
||||
|
||||
- [二、文本检测模型推理](#文本检测模型推理)
|
||||
- [1. 超轻量中文检测模型推理](#超轻量中文检测模型推理)
|
||||
- [2. DB文本检测模型推理](#DB文本检测模型推理)
|
||||
- [3. EAST文本检测模型推理](#EAST文本检测模型推理)
|
||||
- [4. SAST文本检测模型推理](#SAST文本检测模型推理)
|
||||
|
||||
|
||||
- [三、文本识别模型推理](#文本识别模型推理)
|
||||
- [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理)
|
||||
- [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理)
|
||||
- [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理)
|
||||
- [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
|
||||
|
||||
- [四、文本检测、识别串联推理](#文本检测、识别串联推理)
|
||||
- [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
|
||||
- [5. 多语言模型的推理](#多语言模型的推理)
|
||||
|
||||
- [四、方向分类模型推理](#方向分类模型推理)
|
||||
- [1. 方向分类模型推理](#方向分类模型推理)
|
||||
|
||||
- [五、文本检测、方向分类和文字识别串联推理](#文本检测、方向分类和文字识别串联推理)
|
||||
- [1. 超轻量中文OCR模型推理](#超轻量中文OCR模型推理)
|
||||
- [2. 其他模型推理](#其他模型推理)
|
||||
|
||||
|
||||
|
||||
|
||||
<a name="训练模型转inference模型"></a>
|
||||
## 一、训练模型转inference模型
|
||||
<a name="检测模型转inference模型"></a>
|
||||
|
@ -84,6 +89,32 @@ python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Globa
|
|||
└─ params 识别inference模型的参数文件
|
||||
```
|
||||
|
||||
<a name="方向分类模型转inference模型"></a>
|
||||
### 方向分类模型转inference模型
|
||||
|
||||
下载方向分类模型:
|
||||
```
|
||||
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_pre.tar && tar xf ./ch_lite/ch_ppocr_mobile-v1.1.cls_pre.tar -C ./ch_lite/
|
||||
```
|
||||
|
||||
方向分类模型转inference模型与检测的方式相同,如下:
|
||||
```
|
||||
# -c后面设置训练算法的yml配置文件
|
||||
# -o配置可选参数
|
||||
# Global.checkpoints参数设置待转换的训练模型地址,不用添加文件后缀.pdmodel,.pdopt或.pdparams。
|
||||
# Global.save_inference_dir参数设置转换的模型将保存的地址。
|
||||
|
||||
python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/cls_model/best_accuracy \
|
||||
Global.save_inference_dir=./inference/cls/
|
||||
```
|
||||
|
||||
After a successful conversion, the directory contains two files:

```
/inference/cls/
  └─ model     program file of the direction classification inference model
  └─ params    parameter file of the direction classification inference model
```
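
A quick way to confirm that an export produced both files is sketched below. The helper `missing_inference_files` is hypothetical and only checks for the two file names listed above; the temporary directory stands in for a real export directory such as `./inference/cls/`.

```python
import tempfile
from pathlib import Path


def missing_inference_files(model_dir, required=("model", "params")):
    """Return the required file names that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in required if not (directory / name).is_file()]


# Demonstration with a throwaway directory that contains only `model`.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model").write_bytes(b"")
    missing = missing_inference_files(tmp)

print(missing)
```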
|
||||
|
||||
<a name="文本检测模型推理"></a>
|
||||
## 二、文本检测模型推理
|
||||
|
||||
|
@ -275,15 +306,52 @@ dict_character = list(self.character_str)
|
|||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
|
||||
```
|
||||
|
||||
<a name="文本检测、识别串联推理"></a>
|
||||
## 四、文本检测、识别串联推理
|
||||
<a name="多语言模型的推理"></a>
|
||||
### 5. 多语言模型的推理
|
||||
如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果,
|
||||
需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别:
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/korean_dict.txt" --vis_font_path="doc/korean.ttf"
|
||||
```
|
||||

|
||||
|
||||
执行命令后,上图的预测结果为:
|
||||
``` text
|
||||
2020-09-19 16:15:05,076-INFO: index: [205 206 38 39]
|
||||
2020-09-19 16:15:05,077-INFO: word : 바탕으로
|
||||
2020-09-19 16:15:05,077-INFO: score: 0.9171358942985535
|
||||
```
|
||||
|
||||
<a name="方向分类模型推理"></a>
|
||||
## 四、方向分类模型推理
|
||||
|
||||
下面将介绍方向分类模型推理。
|
||||
|
||||
<a name="方向分类模型推理"></a>
|
||||
### 1. 方向分类模型推理
|
||||
|
||||
方向分类模型推理,可以执行如下命令:
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --cls_model_dir="./inference/cls/"
|
||||
```
|
||||
|
||||

|
||||
|
||||
执行命令后,上面图像的预测结果(分类的方向和得分)会打印到屏幕上,示例如下:
|
||||
|
||||
Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963]
|
||||
|
||||
<a name="文本检测、方向分类和文字识别串联推理"></a>
|
||||
## 五、文本检测、方向分类和文字识别串联推理
|
||||
<a name="超轻量中文OCR模型推理"></a>
|
||||
### 1. 超轻量中文OCR模型推理
|
||||
|
||||
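When running prediction, use the parameter image_dir to specify the path of a single image or a collection of images, det_model_dir to specify the path of the detection inference model, and rec_model_dir to specify the path of the recognition inference model. Visualized recognition results are saved to the ./inference_results folder by default.

When running prediction, use the parameter `image_dir` to specify the path of a single image or a collection of images, and use `det_model_dir`, `cls_model_dir`, and `rec_model_dir` to specify the paths of the detection, direction classification, and recognition inference models respectively. The parameter `use_angle_cls` controls whether the direction classification model is enabled. Visualized recognition results are saved to the ./inference_results folder by default.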
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/"
|
||||
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls true
|
||||
```
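
Conceptually, the chained command runs detection, optional direction classification, and recognition in sequence. The sketch below shows that control flow with toy callables; every name in it is hypothetical, strings stand in for image crops, and it does not reflect the internals of `predict_system.py`.

```python
def run_ocr(image, detect, classify, recognize, use_angle_cls=True):
    """Chain detection, optional direction classification, and recognition."""
    results = []
    for box, crop in detect(image):
        if use_angle_cls and classify(crop) == "180":
            crop = crop[::-1]  # stand-in for rotating the crop by 180 degrees
        results.append((box, recognize(crop)))
    return results


# Toy stages: a reversed string plays the role of upside-down text.
detect = lambda image: list(enumerate(image))
classify = lambda crop: "180" if crop[0].islower() and crop[-1].isupper() else "0"
recognize = lambda crop: crop

page = ["olleH", "World"]
with_cls = run_ocr(page, detect, classify, recognize, use_angle_cls=True)
without_cls = run_ocr(page, detect, classify, recognize, use_angle_cls=False)
print(with_cls, without_cls)
```

Disabling the classifier skips the correction step, so reversed text reaches the recognizer unchanged.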
|
||||
|
||||
执行命令后,识别结果图像如下:
|
||||
|
|
|
@ -7,7 +7,7 @@ PaddleOCR 工作环境
|
|||
- glibc 2.23
|
||||
- cuDNN 7.6+ (GPU)
|
||||
|
||||
建议使用我们提供的docker运行PaddleOCR,有关docker、nvidia-docker使用请参考[链接](https://docs.docker.com/get-started/)。
|
||||
建议使用我们提供的docker运行PaddleOCR,有关docker、nvidia-docker使用请参考[链接](https://www.runoob.com/docker/docker-tutorial.html/)。
|
||||
|
||||
*如您希望使用 mac 或 windows直接运行预测代码,可以从第2步开始执行。*
|
||||
|
||||
|
|
|
@ -0,0 +1,71 @@
|
|||
## OCR模型列表(V1.1,9月22日更新)
|
||||
|
||||
- [一、文本检测模型](#文本检测模型)
|
||||
- [二、文本识别模型](#文本识别模型)
|
||||
- [1. 中文识别模型](#中文识别模型)
|
||||
- [2. 英文识别模型](#英文识别模型)
|
||||
- [3. 多语言识别模型](#多语言识别模型)
|
||||
- [三、文本方向分类模型](#文本方向分类模型)
|
||||
|
||||
PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训练模型`、`slim模型`,模型区别说明如下:
|
||||
|
||||
|模型类型|模型格式|简介|
|
||||
|-|-|-|
|
||||
|推理模型|model、params|用于python预测引擎推理,[详情](./inference.md)|
|
||||
|训练模型、预训练模型|\*.pdmodel、\*.pdopt、\*.pdparams|训练过程中保存的checkpoints模型,保存的是模型的参数,多用于模型指标评估和恢复训练|
|
||||
|slim模型|\*.nb|用于lite部署|
|
||||
|
||||
|
||||
<a name="文本检测模型"></a>
|
||||
### 一、文本检测模型
|
||||
|模型名称|模型简介|推理模型大小|下载地址|
|
||||
|-|-|-|-|
|
||||
|ch_ppocr_mobile_slim_v1.1_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|1.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|
|
||||
|ch_ppocr_mobile_v1.1_det|原始超轻量模型,支持中英文、多语种文本检测|2.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|
|
||||
|ch_ppocr_server_v1.1_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|47.2M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)|
|
||||
|
||||
|
||||
<a name="文本识别模型"></a>
|
||||
### 二、文本识别模型
|
||||
|
||||
<a name="中文识别模型"></a>
|
||||
#### 1. 中文识别模型
|
||||
|模型名称|模型简介|推理模型大小|下载地址|
|
||||
|-|-|-|-|
|
||||
|ch_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|1.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)|
|
||||
|ch_ppocr_mobile_v1.1_rec|原始超轻量模型,支持中英文、数字识别|4.6M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar)|
|
||||
|ch_ppocr_server_v1.1_rec|通用模型,支持中英文、数字识别|105M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar)|
|
||||
|
||||
**说明:** `训练模型`是基于预训练模型在真实数据与竖排合成文本数据上finetune得到的模型,在真实应用场景中有着更好的表现,`预训练模型`则是直接基于全量真实数据与合成数据训练得到,更适合用于在自己的数据集上finetune。
|
||||
|
||||
<a name="英文识别模型"></a>
|
||||
#### 2. 英文识别模型
|
||||
|模型名称|模型简介|推理模型大小|下载地址|
|
||||
|-|-|-|-|
|
||||
|en_ppocr_mobile_slim_v1.1_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|0.9M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb)|
|
||||
|en_ppocr_mobile_v1.1_rec|原始超轻量模型,支持英文、数字识别|2.0M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
|
||||
<a name="多语言识别模型"></a>
|
||||
#### 3. 多语言识别模型(更多语言持续更新中...)
|
||||
|模型名称|模型简介|推理模型大小|下载地址|
|
||||
|-|-|-|-|
|
||||
| french_ppocr_mobile_v1.1_rec |法文识别|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
| german_ppocr_mobile_v1.1_rec |德文识别|2.1M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
| korean_ppocr_mobile_v1.1_rec |韩文识别|3.4M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
| japan_ppocr_mobile_v1.1_rec |日文识别|3.7M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
|
||||
|
||||
<a name="文本方向分类模型"></a>
|
||||
### 三、文本方向分类模型
|
||||
|模型名称|模型简介|推理模型大小|下载地址|
|
||||
|-|-|-|-|
|
||||
|ch_ppocr_mobile_v1.1_cls_quant|slim量化版模型|0.5M|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim模型]()|
|
||||
|ch_ppocr_mobile_v1.1_cls|原始模型|850kb|[推理模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar)|
|
||||
|
||||
|
||||
## OCR模型列表(V1.0,7月16日更新)
|
||||
|
||||
|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址|
|
||||
|-|-|-|-|-|
|
||||
|chinese_db_crnn_mobile|8.6M超轻量级中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|
||||
|chinese_db_crnn_server|通用中文OCR模型|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[推理模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
|
|
@ -44,6 +44,13 @@ wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_t
|
|||
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
|
||||
```
|
||||
|
||||
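PaddleOCR also provides a data format conversion script that converts the official labels into the supported format. The conversion tool is `train_data/gen_label.py`; the training set is used as an example here: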
|
||||
|
||||
```
|
||||
# 将官网下载的标签文件转换为 rec_gt_label.txt
|
||||
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
|
||||
```
|
||||
|
||||
最终训练集应有如下文件结构:
|
||||
```
|
||||
|-train_data
|
||||
|
@ -128,8 +135,8 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
|
|||
export PYTHONPATH=$PYTHONPATH:.
|
||||
# GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
# 训练icdar15英文数据
|
||||
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
|
||||
# Train on the ICDAR15 English data and save the training log as train_rec.log
|
||||
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
|
||||
```
|
||||
|
||||
- 数据增强
|
||||
|
@ -201,7 +208,19 @@ Optimizer:
|
|||
```
|
||||
**注意,预测/评估时的配置文件请务必与训练一致。**
|
||||
|
||||
- 小语种
|
||||
|
||||
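PaddleOCR also supports multiple languages. Configuration files for the multilingual models are provided under `configs/rec/multi_languages`. The multilingual algorithms currently supported are: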
|
||||
|
||||
| 配置文件 | 算法名称 | backbone | trans | seq | pred | language |
|
||||
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
|
||||
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语 |
|
||||
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 |
|
||||
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 |
|
||||
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 |
|
||||
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 韩语 |
|
||||
|
||||
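Multilingual models are trained in the same way as the Chinese model. Each training set consists of 1 million synthetic images; a small number of fonts and test data can be downloaded from [Baidu Netdisk]().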
|
||||
|
||||
### 评估
|
||||
|
||||
|
|
|
@ -0,0 +1,208 @@
|
|||
# Overall directory structure

The overall directory structure of PaddleOCR is as follows:

```
PaddleOCR
├── configs  // Configuration files; choose the model structure and modify hyperparameters via yml files
│   ├── cls  // Angle classifier configuration files
│   │   ├── cls_mv3.yml  // Training configuration: backbone, head, loss, optimizer
│   │   └── cls_reader.yml  // Data reading: reading method, data storage path
│   ├── det  // Detection configuration files
│   │   ├── det_db_icdar15_reader.yml  // Data reading
│   │   ├── det_mv3_db.yml  // Training configuration
│   │   ...
│   └── rec  // Recognition configuration files
│       ├── rec_benchmark_reader.yml  // LMDB-format data reading
│       ├── rec_chinese_common_train.yml  // General Chinese training configuration
│       ├── rec_icdar15_reader.yml  // Simple data reading: reader function, data path, label file
│       ...
├── deploy  // Deployment
│   ├── android_demo  // android_demo
│   │   ...
│   ├── cpp_infer  // C++ infer
│   │   ├── CMakeLists.txt  // CMake file
│   │   ├── docs  // Documentation
│   │   │   └── windows_vs2019_build.md
│   │   ├── include  // Header files
│   │   │   ├── clipper.h  // clipper library
│   │   │   ├── config.h  // Inference configuration
│   │   │   ├── ocr_cls.h  // Angle classifier
│   │   │   ├── ocr_det.h  // Text detection
│   │   │   ├── ocr_rec.h  // Text recognition
│   │   │   ├── postprocess_op.h  // Detection post-processing
│   │   │   ├── preprocess_op.h  // Detection pre-processing
│   │   │   └── utility.h  // Utilities
│   │   ├── readme.md  // Documentation
│   │   ├── ...
│   │   ├── src  // Source files
│   │   │   ├── clipper.cpp
│   │   │   ├── config.cpp
│   │   │   ├── main.cpp
│   │   │   ├── ocr_cls.cpp
│   │   │   ├── ocr_det.cpp
│   │   │   ├── ocr_rec.cpp
│   │   │   ├── postprocess_op.cpp
│   │   │   ├── preprocess_op.cpp
│   │   │   └── utility.cpp
│   │   └── tools  // Build and run scripts
│   │       ├── build.sh  // Build script
│   │       ├── config.txt  // Configuration file
│   │       └── run.sh  // Test launch script
│   ├── docker
│   │   └── hubserving
│   │       ├── cpu
│   │       │   └── Dockerfile
│   │       ├── gpu
│   │       │   └── Dockerfile
│   │       ├── README_cn.md
│   │       ├── README.md
│   │       └── sample_request.txt
│   ├── hubserving  // hubserving
│   │   ├── ocr_det  // Text detection
│   │   │   ├── config.json  // serving configuration
│   │   │   ├── __init__.py
│   │   │   ├── module.py  // Prediction model
│   │   │   └── params.py  // Prediction parameters
│   │   ├── ocr_rec  // Text recognition
│   │   │   ├── config.json
│   │   │   ├── __init__.py
│   │   │   ├── module.py
│   │   │   └── params.py
│   │   └── ocr_system  // End-to-end system prediction
│   │       ├── config.json
│   │       ├── __init__.py
│   │       ├── module.py
│   │       └── params.py
│   ├── imgs  // Prediction images
│   │   ├── cpp_infer_pred_12.png
│   │   └── demo.png
│   ├── ios_demo  // ios demo
│   │   ...
│   ├── lite  // lite deployment
│   │   ├── cls_process.cc  // Angle classifier data processing
│   │   ├── cls_process.h
│   │   ├── config.txt  // Detection configuration parameters
│   │   ├── crnn_process.cc  // crnn data processing
│   │   ├── crnn_process.h
│   │   ├── db_post_process.cc  // db data processing
│   │   ├── db_post_process.h
│   │   ├── Makefile  // Build file
│   │   ├── ocr_db_crnn.cc  // Chained (detection + recognition) prediction
│   │   ├── prepare.sh  // Data preparation
│   │   ├── readme.md  // Documentation
│   │   ...
│   ├── pdserving  // pdserving deployment
│   │   ├── det_local_server.py  // Detection, fast version: easy to deploy, fast prediction
│   │   ├── det_web_server.py  // Detection, full version: high stability, distributed deployment
│   │   ├── ocr_local_server.py  // Detection + recognition, fast version
│   │   ├── ocr_web_client.py  // Client
│   │   ├── ocr_web_server.py  // Detection + recognition, full version
│   │   ├── readme.md  // Documentation
│   │   ├── rec_local_server.py  // Recognition, fast version
│   │   └── rec_web_server.py  // Recognition, full version
│   └── slim
│       └── quantization  // Quantization
│           ├── export_model.py  // Export model
│           ├── quant.py  // Quantization
│           └── README.md  // Documentation
├── doc  // Documentation and tutorials
│   ...
├── paddleocr.py
├── ppocr  // Core network code
│   ├── data  // Data processing
│   │   ├── cls  // Angle classifier
│   │   │   ├── dataset_traversal.py  // Data feeding: defines the data reader, reads data and assembles batches
│   │   │   └── randaugment.py  // Random data augmentation operations
│   │   ├── det  // Detection
│   │   │   ├── data_augment.py  // Data augmentation operations
│   │   │   ├── dataset_traversal.py  // Data feeding: defines the data reader, reads data and assembles batches
│   │   │   ├── db_process.py  // db data processing
│   │   │   ├── east_process.py  // east data processing
│   │   │   ├── make_border_map.py  // Generate the border map
│   │   │   ├── make_shrink_map.py  // Generate the shrink map
│   │   │   ├── random_crop_data.py  // Random cropping
│   │   │   └── sast_process.py  // sast data processing
│   │   ├── reader_main.py  // Main function of the data reader
│   │   └── rec  // Recognition
│   │       ├── dataset_traversal.py  // Data feeding: defines the data readers, including LMDB_Reader and Simple_Reader
│   │       └── img_tools.py  // Data processing, including normalization and distortion
│   ├── __init__.py
│   ├── modeling  // Network construction
│   │   ├── architectures  // Model architectures; defines the modules each model needs
│   │   │   ├── cls_model.py  // Angle classifier
│   │   │   ├── det_model.py  // Detection
│   │   │   └── rec_model.py  // Recognition
│   │   ├── backbones  // Backbone networks
│   │   │   ├── det_mobilenet_v3.py  // Detection mobilenet_v3
│   │   │   ├── det_resnet_vd.py
│   │   │   ├── det_resnet_vd_sast.py
│   │   │   ├── rec_mobilenet_v3.py  // Recognition mobilenet_v3
│   │   │   ├── rec_resnet_fpn.py
│   │   │   └── rec_resnet_vd.py
│   │   ├── common_functions.py  // Common functions
│   │   ├── heads  // Heads
│   │   │   ├── cls_head.py  // Classification head
│   │   │   ├── det_db_head.py  // db detection head
│   │   │   ├── det_east_head.py  // east detection head
│   │   │   ├── det_sast_head.py  // sast detection head
│   │   │   ├── rec_attention_head.py  // Recognition attention
│   │   │   ├── rec_ctc_head.py  // Recognition ctc
│   │   │   ├── rec_seq_encoder.py  // Recognition sequence encoder
│   │   │   ├── rec_srn_all_head.py  // Recognition srn
│   │   │   └── self_attention  // srn attention
│   │   │       └── model.py
│   │   ├── losses  // Loss functions
│   │   │   ├── cls_loss.py  // Angle classifier loss
│   │   │   ├── det_basic_loss.py  // Basic detection loss
│   │   │   ├── det_db_loss.py  // DB loss
│   │   │   ├── det_east_loss.py  // EAST loss
│   │   │   ├── det_sast_loss.py  // SAST loss
│   │   │   ├── rec_attention_loss.py  // attention loss
│   │   │   ├── rec_ctc_loss.py  // ctc loss
│   │   │   └── rec_srn_loss.py  // srn loss
│   │   └── stns  // Spatial transformer networks
│   │       └── tps.py  // TPS transformation
│   ├── optimizer.py  // Optimizer
│   ├── postprocess  // Post-processing
│   │   ├── db_postprocess.py  // DB post-processing
│   │   ├── east_postprocess.py  // East post-processing
│   │   ├── lanms  // lanms
│   │   │   ...
│   │   ├── locality_aware_nms.py  // nms
│   │   └── sast_postprocess.py  // sast post-processing
│   └── utils  // Utilities
│       ├── character.py  // Character processing: text encoding and decoding, prediction accuracy calculation
│       ├── check.py  // Parameter loading checks
│       ├── ic15_dict.txt  // English and digit dictionary, case-sensitive
│       ├── ppocr_keys_v1.txt  // Chinese dictionary, used to train Chinese models
│       ├── save_load.py  // Model saving and loading functions
│       ├── stats.py  // Statistics
│       └── utility.py  // Utility functions, including checks such as input parameter validation
├── README_en.md  // Documentation
├── README.md
├── requirments.txt  // Installation dependencies
├── setup.py  // whl packaging script
└── tools  // Entry-point scripts
    ├── eval.py  // Evaluation
    ├── eval_utils  // Evaluation utilities
    │   ├── eval_cls_utils.py  // Classification
    │   ├── eval_det_iou.py  // Detection iou
    │   ├── eval_det_utils.py  // Detection
    │   ├── eval_rec_utils.py  // Recognition
    │   └── __init__.py
    ├── export_model.py  // Export the infer model
    ├── infer  // Prediction with the inference engine
    │   ├── predict_cls.py
    │   ├── predict_det.py
    │   ├── predict_rec.py
    │   ├── predict_system.py
    │   └── utility.py
    ├── infer_cls.py  // Classification prediction with the training engine
    ├── infer_det.py  // Detection prediction with the training engine
    ├── infer_rec.py  // Recognition prediction with the training engine
    ├── program.py  // Overall pipeline
    ├── test_hubserving.py
    └── train.py  // Start training

```
|
|
@ -1,68 +0,0 @@
|
|||
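## Chinese OCR training and prediction tips
Here are some tips for training and predicting with Chinese OCR models. The list is updated continuously, and contributions of your own OCR tricks are welcome.
- [Replacing the backbone network](#更换骨干网络)
- [Long Chinese text recognition](#中文长文本识别)
- [Space recognition](#空格识别)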
|
||||
|
||||
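<a name="更换骨干网络"></a>
#### 1. Replacing the backbone network
- **Problem description**

    The backbones currently used in PaddleOCR are the ResNet_vd series and the MobileNetV3 series. Does replacing the backbone help improve results? What should you pay attention to when replacing it?

- **Suggestions**

    - For both text detection and text recognition, the choice of backbone is a trade-off between accuracy and speed. In general, a larger backbone such as ResNet101_vd gives more accurate detection or recognition but increases prediction time, while a smaller backbone such as MobileNetV3_small_x0_35 predicts faster at a substantial cost in accuracy. Fortunately, the detection or recognition performance of different backbones is positively correlated with their performance on the ImageNet 1000-class image classification task. [**PaddleClas, the PaddlePaddle image classification suite**](https://github.com/PaddlePaddle/PaddleClas) summarizes 23 series of classification network structures, including ResNet_vd, Res2Net, HRNet, MobileNetV3 and GhostNet, with their top-1 accuracy on that task, their prediction time on GPU (V100 and T4) and CPU (Snapdragon 855), and the [**download links for 117 pre-trained models**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html).
    - When replacing the text detection backbone, the main task is to identify four stages similar to those of ResNet, so that a subsequent FPN-like detection head can be attached easily. In addition, for text detection, starting from a classification model pre-trained on ImageNet speeds up convergence and improves results.
    - When replacing the text recognition backbone, pay attention to where the network reduces the width and height strides. Because text images usually have a large width-to-height ratio, the height should be downsampled less often and the width more often. See the changes made to the [MobileNetV3 backbone](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py) in PaddleOCR.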
|
||||
|
||||
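<a name="中文长文本识别"></a>
#### 2. Long Chinese text recognition
- **Problem description**

    During Chinese recognition model training, the resolution is at most [3,32,320]. How should the model be adapted if the text image to recognize is too long, as in the figure below?

<div align="center">
<img src="../tricks/long_text_examples.jpg" width="600">
</div>

- **Suggestions**

    When training the Chinese recognition model, training samples are not scaled directly to [3,32,320]. Instead, each image is first scaled proportionally so that its height is 32. If the resulting width is less than 320, the remainder is zero-padded, and samples with a width-to-height ratio greater than 10 are discarded. At prediction time, a single image is scaled in the same way but without the 320 width limit. Multiple images are predicted in batches, and the width of each batch changes dynamically to the largest width in that batch. [Reference code](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/predict_rec.py):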
|
||||
|
||||
```
import math

import cv2
import numpy as np


def resize_norm_img(self, img, max_wh_ratio):
    imgC, imgH, imgW = self.rec_image_shape
    assert imgC == img.shape[2]
    if self.character_type == "ch":
        # For Chinese models, widen the target to fit the longest image in the batch
        imgW = int((32 * max_wh_ratio))
    h, w = img.shape[:2]
    ratio = w / float(h)
    # Scale to height imgH while keeping the aspect ratio, capped at imgW
    if math.ceil(imgH * ratio) > imgW:
        resized_w = imgW
    else:
        resized_w = int(math.ceil(imgH * ratio))
    resized_image = cv2.resize(img, (resized_w, imgH))
    resized_image = resized_image.astype('float32')
    # HWC -> CHW, then normalize to [-1, 1]
    resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5
    # Zero-pad on the right up to the target width
    padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    return padding_im
```
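The per-batch width rule described above can be sketched without any image library. The helper names `batch_target_width` and `resized_width` are hypothetical, and the sketch assumes `max_wh_ratio` is simply the largest width-to-height ratio in the batch, as the text describes.

```python
import math


def batch_target_width(shapes, img_h=32):
    """Return the padded width shared by one batch.

    shapes is a list of (height, width) pairs for the images in the batch.
    """
    max_wh_ratio = 0.0
    for h, w in shapes:
        max_wh_ratio = max(max_wh_ratio, w / float(h))
    return int(img_h * max_wh_ratio)


def resized_width(h, w, img_h, img_w):
    """Width of one image after scaling it to height img_h, capped at img_w."""
    ratio = w / float(h)
    return min(img_w, int(math.ceil(img_h * ratio)))


if __name__ == "__main__":
    shapes = [(32, 100), (64, 640), (48, 240)]
    target = batch_target_width(shapes)
    # Each image is scaled to height 32 and zero-padded on the right to `target`
    print(target, [resized_width(h, w, 32, target) for h, w in shapes])
```

Batching images of similar aspect ratio keeps the amount of zero padding small, which is one reason a sorted or bucketed batch order can help in practice.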
|
||||
|
||||
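<a name="空格识别"></a>
#### 3. Space recognition
- **Problem description**

    As shown in the figure below, in mixed Chinese and English scenes the spaces between words often need to be recognized so that the result is easy to read and use. How can this be supported?

<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/en_paper.jpg" width="600">
</div>

- **Suggestions**

    Two approaches can be considered for space recognition. (1) Optimize the text detection algorithm so that the detection result splits the text at spaces. With this approach, text lines containing spaces must be split into many segments when the detection data is annotated. (2) Optimize the text recognition algorithm: add a space character to the recognition dictionary and annotate the spaces where they occur in the recognition training data. In addition, when synthesizing data, text containing spaces can be generated by concatenating training samples. PaddleOCR currently uses the second approach.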
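As a rough illustration of the second approach, the sketch below appends a space to a character dictionary and encodes a label with it. The helpers `build_char_map` and `encode` are hypothetical, and reserving index 0 for a CTC blank is an assumption made for this example. Only the flag name `use_space_char` is borrowed from the PaddleOCR parameter list.

```python
def build_char_map(dict_chars, use_space_char=True):
    """Map each dictionary character to an index, optionally adding a space."""
    chars = list(dict_chars)
    if use_space_char and " " not in chars:
        chars.append(" ")
    # Index 0 is left free for the CTC blank (an assumption of this sketch)
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text, char_map):
    """Encode a label, silently dropping characters missing from the dictionary."""
    return [char_map[ch] for ch in text if ch in char_map]


if __name__ == "__main__":
    with_space = build_char_map("abc", use_space_char=True)
    without_space = build_char_map("abc", use_space_char=False)
    print(encode("a b", with_space))     # the space gets its own index
    print(encode("a b", without_space))  # the space is dropped from the label
```

Without the space entry, the label loses the word boundary, so the model never receives a training signal for it.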
|
||||
|
|
@ -1,4 +1,7 @@
|
|||
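# Updates
- 2020.9.19 Released the ultra-lightweight compressed ppocr_mobile_slim series models. The overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR)), suitable for mobile deployment. [Model downloads](#模型下载)
- 2020.9.17 Released the ultra-lightweight ppocr_mobile series and the general ppocr_server series Chinese and English OCR models, comparable to commercial products. [Model downloads](#模型下载)
- 2020.8.26 Added 84 frequently asked questions and answers about OCR; see the [FAQ](./doc/doc_ch/FAQ.md)
- 2020.8.24 Added support for installing and using PaddleOCR via the whl package; see the [PaddleOCR Package instructions](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md)
- 2020.8.21 Published the replay and slides of the August 18 Bilibili live lesson (lesson 2: an easy-to-learn, easy-to-use OCR toolkit). [Get them here](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- 2020.8.16 Open-sourced the text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and the text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)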
|
||||
|
|
|
@ -1,10 +1,27 @@
|
|||
# Visualization
- [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR)
- [General Chinese OCR results](#通用中文OCR)
- [Chinese OCR with space support results](#支持空格的中文OCR)
- PP-OCR 1.1 series model results
    - [General ppocr_server_1.1 results](#通用ppocr_server_1.1效果展示)
    - [General ppocr_mobile_1.1 results (to be added)]()
- PP-OCR 1.0 series model results
    - [Ultra-lightweight ppocr_mobile_1.0 results](#超轻量ppocr_mobile_1.0效果展示)
    - [General ppocr_server_1.0 results](#通用ppocr_server_1.0效果展示)
|
||||
|
||||
<a name="超轻量级中文OCR"></a>
|
||||
## Ultra-lightweight Chinese OCR results
|
||||
<a name="通用ppocr_server_1.1效果展示"></a>
|
||||
## General ppocr_server_1.1 results
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/1102.jpg" width="800">
|
||||
<img src="../imgs_results/1103.jpg" width="800">
|
||||
<img src="../imgs_results/1104.jpg" width="800">
|
||||
<img src="../imgs_results/1105.jpg" width="800">
|
||||
<img src="../imgs_results/1110.jpg" width="800">
|
||||
<img src="../imgs_results/1112.jpg" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<a name="超轻量ppocr_mobile_1.0效果展示"></a>
|
||||
## Ultra-lightweight ppocr_mobile_1.0 results
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/1.jpg" width="800">
|
||||
|
@ -14,32 +31,17 @@
|
|||
<img src="../imgs_results/7.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/12.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/4.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/6.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/9.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/16.png" width="800">
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/22.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<a name="通用中文OCR"></a>
|
||||
## General Chinese OCR results
|
||||
<a name="通用ppocr_server_1.0效果展示"></a>
|
||||
## General ppocr_server_1.0 results
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800">
|
||||
|
@ -52,16 +54,3 @@
|
|||
<div align="center">
|
||||
<img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<a name="支持空格的中文OCR"></a>
|
||||
## Chinese OCR with space support results
|
||||
|
||||
### Lightweight model
|
||||
<div align="center">
|
||||
<img src="../imgs_results/img_11.jpg" width="800">
|
||||
</div>
|
||||
|
||||
### General model
|
||||
<div align="center">
|
||||
<img src="../imgs_results/chinese_db_crnn_server/en_paper.jpg" width="800">
|
||||
</div>
|
||||
|
|
|
@ -12,11 +12,46 @@ pip install paddleocr
|
|||
Build and install locally
|
||||
```bash
|
||||
python setup.py bdist_wheel
|
||||
pip install dist/paddleocr-0.0.3-py3-none-any.whl
|
||||
pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the paddleocr version number
|
||||
```
|
||||
### 1. Use in code
|
||||
|
||||
* Detection + recognition pipeline
|
||||
* Detection + classification + recognition pipeline
|
||||
```python
from paddleocr import PaddleOCR, draw_ocr
# PaddleOCR currently supports Chinese and English, English, French, German, Korean and Japanese.
# Switch between them with the `lang` parameter: `ch`, `en`, `french`, `german`, `korean`, `japan`.
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# Visualize the result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
|
||||
The result is a list. Each item contains the text box, the text and the recognition confidence.
|
||||
```bash
|
||||
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
|
||||
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
|
||||
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
|
||||
......
|
||||
```
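Given that structure, a common follow-up is to keep only confident lines. The sketch below works on plain Python data shaped like the output above. `filter_by_score` and the 0.97 threshold are illustrative choices and are not part of the paddleocr package.

```python
def filter_by_score(result, min_score=0.97):
    """Keep (text, score, box) for every item whose confidence is high enough."""
    kept = []
    for box, (text, score) in result:
        if score >= min_score:
            kept.append((text, score, box))
    return kept


# Two items copied from the sample output above
result = [
    [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]],
    [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]],
]

if __name__ == "__main__":
    for text, score, box in filter_by_score(result):
        print(text, score, box[0])
```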
|
||||
Visualization of the result
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/whl/11_det_rec.jpg" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
* Detection + recognition
|
||||
```python
|
||||
from paddleocr import PaddleOCR, draw_ocr
|
||||
ocr = PaddleOCR() # need to run only once to download and load model into memory
|
||||
|
@ -48,12 +83,27 @@ im_show.save('result.jpg')
|
|||
<img src="../imgs_results/whl/11_det_rec.jpg" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
* Classification + recognition
|
||||
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False, cls=True)
for line in result:
    print(line)
```
|
||||
The result is a list. Each item contains only the recognition result and the recognition confidence.
|
||||
```bash
|
||||
['韩国小馆', 0.9907421]
|
||||
```
|
||||
|
||||
* Detection only
|
||||
```python
|
||||
from paddleocr import PaddleOCR, draw_ocr
|
||||
ocr = PaddleOCR() # need to run only once to download and load model into memory
|
||||
img_path = 'PaddleOCR/doc/imgs/11.jpg'
|
||||
result = ocr.ocr(img_path,rec=False)
|
||||
result = ocr.ocr(img_path, rec=False)
|
||||
for line in result:
    print(line)
|
||||
|
||||
|
@ -84,7 +134,7 @@ im_show.save('result.jpg')
|
|||
from paddleocr import PaddleOCR
|
||||
ocr = PaddleOCR() # need to run only once to download and load model into memory
|
||||
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
|
||||
result = ocr.ocr(img_path,det=False)
|
||||
result = ocr.ocr(img_path, det=False)
|
||||
for line in result:
    print(line)
|
||||
```
|
||||
|
@ -93,6 +143,20 @@ for line in result:
|
|||
['韩国小馆', 0.9907421]
|
||||
```
|
||||
|
||||
* Classification only
|
||||
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False, rec=False, cls=True)
for line in result:
    print(line)
```
|
||||
The result is a list. Each item contains only the classification result and the classification confidence.
|
||||
```bash
|
||||
['0', 0.9999924]
|
||||
```
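One way to act on a classification result like this is to flip the image when the classifier is confident that it is upside down. The sketch below uses a nested list as a stand-in for an image. `rotate_if_needed` and the 0.9 threshold are assumptions for illustration and do not describe the package's internal behavior.

```python
def rotate_if_needed(image_rows, cls_result, thresh=0.9):
    """Rotate a row-major image by 180 degrees when labeled '180' with enough confidence."""
    label, score = cls_result
    if label == '180' and score > thresh:
        # Reversing the row order and each row's pixels is a 180 degree rotation
        return [row[::-1] for row in image_rows[::-1]]
    return image_rows


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    print(rotate_if_needed(image, ['0', 0.9999924]))
    print(rotate_if_needed(image, ['180', 0.99]))
```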
|
||||
|
||||
### Use from the command line
|
||||
|
||||
Show help information
|
||||
|
@ -100,7 +164,19 @@ for line in result:
|
|||
paddleocr -h
|
||||
```
|
||||
|
||||
* Detection + recognition pipeline
|
||||
* Detection + classification + recognition pipeline
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true --cls true
|
||||
```
|
||||
The result is a list. Each item contains the text box, the text and the recognition confidence.
|
||||
```bash
|
||||
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
|
||||
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
|
||||
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
|
||||
......
|
||||
```
|
||||
|
||||
* Detection + recognition
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
|
||||
```
|
||||
|
@ -112,6 +188,16 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
|
|||
......
|
||||
```
|
||||
|
||||
* Classification + recognition
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false
|
||||
```
|
||||
|
||||
The result is a list. Each item contains only the recognition result and the recognition confidence.
|
||||
```bash
|
||||
['韩国小馆', 0.9907421]
|
||||
```
|
||||
|
||||
* Detection only
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false
|
||||
|
@ -134,17 +220,27 @@ paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false
|
|||
['韩国小馆', 0.9907421]
|
||||
```
|
||||
|
||||
* Classification only
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false --rec false
|
||||
```
|
||||
|
||||
The result is a list. Each item contains only the classification result and the classification confidence.
|
||||
```bash
|
||||
['0', 0.9999924]
|
||||
```
|
||||
|
||||
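## Custom models
When the built-in models do not meet your needs, you can use models you trained yourself.
First, follow the first section of [inference.md](./inference.md) to convert the detection and recognition models into inference models, then use them as follows
First, follow the first section of [inference.md](./inference.md) to convert the detection, classification and recognition models into inference models, then use them as follows

### Use in code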
|
||||
```python
|
||||
from paddleocr import PaddleOCR, draw_ocr
|
||||
# The detection and recognition model directories must contain the model and params files
|
||||
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}',rec_model_dir='{your_rec_model_dir}')
|
||||
# The model directories must contain the model and params files
|
||||
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}', rec_char_dict_path='{your_rec_char_dict_path}', cls_model_dir='{your_cls_model_dir}', use_angle_cls=True)
|
||||
img_path = 'PaddleOCR/doc/imgs/11.jpg'
|
||||
result = ocr.ocr(img_path)
|
||||
result = ocr.ocr(img_path, cls=True)
|
||||
for line in result:
    print(line)
|
||||
|
||||
|
@ -162,7 +258,7 @@ im_show.save('result.jpg')
|
|||
### Use from the command line
|
||||
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir}
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
|
||||
```
|
||||
|
||||
## Parameter description
|
||||
|
@ -182,13 +278,21 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
|
|||
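| det_east_cover_thresh | Threshold for EAST output boxes; predicted boxes scoring below this value are discarded | 0.1 |
| det_east_nms_thresh | NMS threshold for EAST output boxes | 0.2 |
| rec_algorithm | Recognition algorithm to use | CRNN |
| rec_model_dir | Directory containing the recognition model. There are two ways to set it: 1. None: automatically download the built-in model to `~/.paddleocr/rec`; 2. the path to an inference model you converted yourself, which must contain the model and params files | None |
| rec_model_dir | Directory containing the recognition model. There are two ways to set it: 1. None: automatically download the built-in model to `~/.paddleocr/rec`; 2. the path to an inference model you converted yourself, which must contain the model and params files | None |
| rec_image_shape | Input image shape for the recognition algorithm | "3,32,320" |
| rec_char_type | Character type for the recognition algorithm: Chinese (ch) or English (en) | ch |
| rec_batch_num | Number of images forwarded together during recognition | 30 |
| max_text_length | Maximum text length the recognition algorithm can recognize | 25 |
| rec_char_dict_path | Path to the recognition model dictionary; when rec_model_dir is set using option 2, change this to your own dictionary path | ./ppocr/utils/ppocr_keys_v1.txt |
| use_space_char | Whether to recognize spaces | TRUE |
| use_angle_cls | Whether to load the classification model | FALSE |
| cls_model_dir | Directory containing the classification model. There are two ways to set it: 1. None: automatically download the built-in model to `~/.paddleocr/cls`; 2. the path to an inference model you converted yourself, which must contain the model and params files | None |
| cls_image_shape | Input image shape for the classification algorithm | "3, 48, 192" |
| label_list | Label list for the classification algorithm | ['0', '180'] |
| cls_batch_num | Number of images forwarded together during classification | 30 |
| enable_mkldnn | Whether to enable mkldnn | FALSE |
| use_zero_copy_run | Whether to run the forward pass via zero_copy_run | FALSE |
| lang | Model language; Chinese (ch) and English (en) are currently supported | ch |
| det | Whether to run detection in the forward pass | TRUE |
| rec | Whether to run recognition in the forward pass | TRUE |
| cls | Whether to run classification in the forward pass | FALSE |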
|
||||
|
|
|
@ -0,0 +1,77 @@
|
|||
## Algorithm introduction
|
||||
|
||||
[TOC]
|
||||
|
||||
<a name="TEXTDETECTIONALGORITHM"></a>
|
||||
|
||||
### 1. Text Detection Algorithm
|
||||
|
||||
PaddleOCR's open-source text detection algorithms:
|
||||
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
|
||||
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
|
||||
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research)
|
||||
|
||||
On the ICDAR2015 dataset, the text detection results are as follows:
|
||||
|
||||
|Model|Backbone|precision|recall|Hmean|Download link|
|
||||
|-|-|-|-|-|-|
|
||||
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|
||||
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|
||||
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|
||||
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
|
||||
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
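The Hmean column is the harmonic mean of precision and recall. As a quick check, the small sketch below recomputes it for two rows of the table. The function name `hmean` is chosen here for illustration.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (both in percent or both in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall taken from the EAST and SAST ResNet50_vd rows above
    print(round(hmean(88.18, 85.51), 2))
    print(round(hmean(92.18, 82.96), 2))
```

Because the table's precision and recall are themselves rounded, a recomputed Hmean can differ from the published value in the last digit.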
|
||||
|
||||
On the Total-Text dataset, the text detection results are as follows:
|
||||
|
||||
|Model|Backbone|precision|recall|Hmean|Download link|
|
||||
|-|-|-|-|-|-|
|
||||
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|
|
||||
|
||||
**Note:** Additional data, such as icdar2013, icdar2017, COCO-Text and ArT, was added when training the SAST model. The English public datasets, organized in the format used by PaddleOCR, can be downloaded from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
|
||||
|
||||
Using the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset, which contains 30,000 (3w) training images, the related configurations and pre-trained models for the text detection task are as follows:
|
||||
|Model|Backbone|Configuration file|Pre-trained model|
|
||||
|-|-|-|-|
|
||||
|ultra-lightweight OCR model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|
||||
|General OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
|
||||
|
||||
* Note: To train and evaluate the DB models above, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or models, these two parameters can be adjusted for better results.
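To give a feel for what these two parameters control, the sketch below follows the DB paper's post-processing idea: boxes scoring below `box_thresh` are dropped, and each kept box is expanded by an offset equal to area × `unclip_ratio` / perimeter. The helper names are hypothetical and the polygon offsetting itself is left out. This is an illustration under those assumptions and is not PaddleOCR's implementation.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the polygon."""
    n = len(points)
    return sum(
        math.hypot(points[(i + 1) % n][0] - points[i][0],
                   points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    )


def unclip_distance(points, unclip_ratio=1.5):
    """Offset by which a shrunk text region is expanded back."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(score, box_thresh=0.6):
    """A candidate box is kept only if its mean score reaches box_thresh."""
    return score >= box_thresh


if __name__ == "__main__":
    box = [(0, 0), (100, 0), (100, 20), (0, 20)]
    print(unclip_distance(box), keep_box(0.72), keep_box(0.41))
```

A larger `unclip_ratio` yields looser boxes around the text, while a higher `box_thresh` trades recall for precision.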
|
||||
|
||||
For the training guide and usage of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md).
|
||||
|
||||
<a name="TEXTRECOGNITIONALGORITHM"></a>
|
||||
### 2. Text Recognition Algorithm
|
||||
|
||||
PaddleOCR's open-source text recognition algorithms:
|
||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
|
||||
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
||||
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
||||
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
|
||||
- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research)
|
||||
|
||||
Following [DTRB](https://arxiv.org/abs/1904.01906), the text recognition models above were trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP and CUTE. The results are as follows:
|
||||
|
||||
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|
||||
|-|-|-|-|-|
|
||||
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|
||||
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|
||||
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|
||||
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|
||||
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|
||||
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|
||||
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|
||||
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
|
||||
|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
|
||||
|
||||
**Note:** The SRN model uses data augmentation to expand the two training sets mentioned above. The expanded data can be downloaded from [Baidu Drive](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (download code: y3ry).
|
||||
|
||||
The average accuracy of the two-stage training in the original paper is 89.74%, and that of one stage training in paddleocr is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).
|
||||
|
||||
We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and crop 300,000 (30w) training images from the original photos using the ground-truth positions, with some calibration as needed. In addition, 5 million (500w) synthetic images generated from the LSVT corpus are used to train the model. The related configurations and pre-trained models are as follows:
|
||||
|
||||
|Model|Backbone|Configuration file|Pre-trained model|
|
||||
|-|-|-|-|
|
||||
|ultra-lightweight OCR model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
|
||||
|General OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
|
||||
|
||||
For the training guide and usage of PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md).
|
|
@ -0,0 +1,126 @@
|
|||
## TEXT ANGLE CLASSIFICATION
|
||||
|
||||
### DATA PREPARATION
|
||||
|
||||
Please organize the dataset as follows:
|
||||
|
||||
The default storage path for training data is `PaddleOCR/train_data/cls`. If you already have a dataset on your disk, just create a soft link to the dataset directory:
|
||||
|
||||
```
|
||||
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset
|
||||
```
|
||||
|
||||
Please organize your data as follows.
|
||||
|
||||
- Training set
|
||||
|
||||
First put the training images in the same folder (train_images), and use a txt file (cls_gt_train.txt) to store the image path and label.
|
||||
|
||||
* Note: by default, the image path and image label are separated by `\t`; using any other separator will cause a training error.
|
||||
|
||||
0 and 180 indicate that the angle of the image is 0 degrees and 180 degrees, respectively.
|
||||
|
||||
```
|
||||
" Image file name Image annotation "
|
||||
|
||||
train_data/word_001.jpg 0
|
||||
train_data/word_002.jpg 180
|
||||
```
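Since a wrong separator only shows up later as a training error, it can be worth validating the label file first. The sketch below checks one line at a time. `parse_cls_label` is a hypothetical helper, and the valid labels are assumed to be the two angles described above.

```python
def parse_cls_label(line, valid_labels=("0", "180")):
    """Split one `path<TAB>label` line and validate the label."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        raise ValueError("expected exactly one tab separator: %r" % line)
    path, label = parts
    if label not in valid_labels:
        raise ValueError("unexpected label %r in line %r" % (label, line))
    return path, label


if __name__ == "__main__":
    print(parse_cls_label("train_data/word_001.jpg\t0"))
    try:
        parse_cls_label("train_data/word_002.jpg 180")  # space instead of tab
    except ValueError as err:
        print("rejected:", err)
```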
|
||||
|
||||
The final training set should have the following file structure:
|
||||
|
||||
```
|-train_data
    |-cls
        |- cls_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
```
|
||||
|
||||
- Test set
|
||||
|
||||
Like the training set, the test set needs a folder containing all images (test) and a label file, cls_gt_test.txt. The structure of the test set is as follows:
|
||||
|
||||
```
|-train_data
    |-cls
        |- cls_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```
|
||||
|
||||
### TRAINING
|
||||
|
||||
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts.
|
||||
|
||||
Start training:
|
||||
|
||||
```
|
||||
# Set PYTHONPATH path
|
||||
export PYTHONPATH=$PYTHONPATH:.
|
||||
# GPU training supports single-card and multi-card training; specify the card IDs via CUDA_VISIBLE_DEVICES
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
# Train the angle classifier
|
||||
python3 tools/train.py -c configs/cls/cls_mv3.yml
|
||||
```
|
||||
|
||||
- Data Augmentation
|
||||
|
||||
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
|
||||
|
||||
The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse, RandAugment.
|
||||
|
||||
Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to:
|
||||
[randaugment.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/cls/randaugment.py)
|
||||
[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
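The "each method is selected with a 50% probability" rule can be sketched as follows. The operations are stand-ins that work on a plain list rather than real image transforms, and `apply_random_ops` is a hypothetical helper, unrelated to the code in `img_tools.py`.

```python
import random


def apply_random_ops(sample, ops, prob=0.5, rng=random):
    """Apply each operation independently with probability `prob`."""
    for op in ops:
        if rng.random() < prob:
            sample = op(sample)
    return sample


def reverse(values):
    """Stand-in for a perturbation such as color reverse."""
    return values[::-1]


def add_one(values):
    """Stand-in for a perturbation such as jitter."""
    return [v + 1 for v in values]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sequence
    for _ in range(3):
        print(apply_random_ops([1, 2, 3], [reverse, add_one], prob=0.5, rng=rng))
```

Because every operation is drawn independently, a sample may receive several perturbations at once or none at all.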
|
||||
|
||||
|
||||
- Training
|
||||
|
||||
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency. By default, the model is evaluated every 500 iterations, and the model with the best accuracy is saved under `output/cls_mv3/best_accuracy` during evaluation.
|
||||
|
||||
If the evaluation set is large, the test will be time-consuming. It is recommended to reduce the number of evaluations, or evaluate after training.
|
||||
|
||||
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
|
||||
|
||||
### EVALUATION
|
||||
|
||||
The evaluation dataset can be changed by setting `label_file_path` under `EvalReader` in `configs/cls/cls_reader.yml`.
|
||||
|
||||
```
|
||||
export CUDA_VISIBLE_DEVICES=0
|
||||
# GPU evaluation, Global.checkpoints is the weight to be tested
|
||||
python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy
|
||||
```
|
||||
|
||||
### PREDICTION
|
||||
|
||||
* Training engine prediction
|
||||
|
||||
Using a model trained with PaddleOCR, you can quickly run prediction with the following script.
|
||||
|
||||
The image to predict is set through `infer_img`, and the weights are specified via `-o Global.checkpoints`:
|
||||
|
||||
```
|
||||
# Predict English results
|
||||
python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
|
||||
```
|
||||
|
||||
Input image:
|
||||
|
||||

|
||||
|
||||
Get the prediction result of the input image:
|
||||
|
||||
```
|
||||
infer_img: doc/imgs_words/en/word_1.png
|
||||
scores: [[0.93161047 0.06838956]]
|
||||
label: [0]
|
||||
```
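The relation between `scores` and `label` in the output above can be sketched as follows, assuming that the label is simply the index of the highest score. The sample output is consistent with that reading, but the source does not state it, and the helper `pick_label` is a hypothetical name used only for this example.

```python
def pick_label(scores):
    """Return (index, score) of the highest-scoring class for one image."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Scores copied from the sample output above.
scores = [0.93161047, 0.06838956]
label, confidence = pick_label(scores)
print("label:", label, "score:", confidence)
```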
|
|
@ -1,36 +1,56 @@
|
|||
# BENCHMARK
|
||||
|
||||
This document gives the prediction time-consuming benchmark of PaddleOCR Ultra Lightweight Chinese Model (8.6M) on each platform.
|
||||
This document gives the performance of the series models for Chinese and English recognition.
|
||||
|
||||
## TEST DATA
|
||||
* 500 images were randomly sampled from the Chinese public data set [ICDAR2017-RCTW](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#ICDAR2017-RCTW-17).
|
||||
Most of the pictures in the set were collected in the wild through mobile phone cameras.
|
||||
Some are screenshots.
|
||||
These pictures show various scenes, including street scenes, posters, menus, indoor scenes and screenshots of mobile applications.
|
||||
|
||||
We collected 300 images from different real application scenarios to evaluate the overall OCR system, including contract samples, license plates, nameplates, train tickets, test sheets, forms, certificates, street view images, business cards, digital meters, etc. The following figure shows some images of the test set.
|
||||
|
||||
<div align="center">
|
||||
<img src="../datasets/doc.jpg" width = "1000" height = "500" />
|
||||
</div>
|
||||
|
||||
## MEASUREMENT
|
||||
The predicted time-consuming indicators on the four platforms are as follows:
|
||||
|
||||
| Long size(px) | T4(s) | V100(s) | Intel Xeon 6148(s) | Snapdragon 855(s) |
|
||||
| :---------: | :-----: | :-------: | :------------------: | :-----------------: |
|
||||
| 960 | 0.092 | 0.057 | 0.319 | 0.354 |
|
||||
| 640 | 0.067 | 0.045 | 0.198 | 0.236 |
|
||||
| 480 | 0.057 | 0.043 | 0.151 | 0.175 |
|
||||
|
||||
Explanation:
|
||||
* The evaluation time-consuming stage is the complete stage from image input to result output, including image
|
||||
pre-processing and post-processing.
|
||||
* ```Intel Xeon 6148``` is the server-side CPU model. Intel MKL-DNN is used in the test to accelerate the CPU prediction speed.
|
||||
To use this operation, you need to:
|
||||
* Update to the latest version of PaddlePaddle: https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev
|
||||
Please select the corresponding mkl version wheel package according to the CUDA version and Python version of your environment,
|
||||
for example, CUDA10, Python3.7 environment, you should:
|
||||
- v1.0 indicates the DB+CRNN models without the strategies. v1.1 indicates the PP-OCR models with the strategies and the direction classifier. slim_v1.1 indicates the PP-OCR models with pruning or quantization.
|
||||
|
||||
```
|
||||
# Obtain the installation package
|
||||
wget https://paddle-wheel.bj.bcebos.com/0.0.0-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
|
||||
# Installation
|
||||
pip3.7 install paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
|
||||
```
|
||||
* Use parameters ```--enable_mkldnn True``` to turn on the acceleration switch when making predictions
|
||||
* ```Snapdragon 855``` is a mobile processing platform model.
|
||||
- The long size of the input for the text detector is 960.
|
||||
|
||||
- The evaluation time-consuming stage is the complete stage from image input to result output, including image pre-processing and post-processing.
|
||||
|
||||
- ```Intel Xeon 6148``` is the server-side CPU model. Intel MKL-DNN is used in the test to accelerate the CPU prediction speed.
|
||||
|
||||
- ```Snapdragon 855``` is a mobile processing platform model.
|
||||
|
||||
Comparison of model size and F-score:
|
||||
|
||||
| Model Name | Model Size <br> of the <br> Whole System\(M\) | Model Size <br>of the Text <br> Detector\(M\) | Model Size <br> of the Direction <br> Classifier\(M\) | Model Size<br>of the Text <br> Recognizer \(M\) | F\-score |
|
||||
|:-:|:-:|:-:|:-:|:-:|:-:|
|
||||
| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 |
|
||||
| ch\_ppocr\_server\_v1\.1 | 155\.1 | 47\.2 | 0\.9 | 107 | 0\.5414 |
|
||||
| ch\_ppocr\_mobile\_v1\.0 | 8\.6 | 4\.1 | \- | 4\.5 | 0\.393 |
|
||||
| ch\_ppocr\_server\_v1\.0 | 203\.8 | 98\.5 | \- | 105\.3 | 0\.4436 |
|
||||
|
||||
Comparison of inference time on T4 GPU (ms):
|
||||
|
||||
| Model Name | Overall | Text Detector | Direction Classifier | Text Recognizer |
|
||||
|:-:|:-:|:-:|:-:|:-:|
|
||||
| ch\_ppocr\_mobile\_v1\.1 | 137 | 35 | 24 | 78 |
|
||||
| ch\_ppocr\_server\_v1\.1 | 204 | 39 | 25 | 140 |
|
||||
| ch\_ppocr\_mobile\_v1\.0 | 117 | 41 | \- | 76 |
|
||||
| ch\_ppocr\_server\_v1\.0 | 199 | 52 | \- | 147 |
|
||||
|
||||
Comparison of inference time on CPU (ms):
|
||||
|
||||
| Model Name | Overall | Text Detector | Direction Classifier | Text Recognizer |
|
||||
|:-:|:-:|:-:|:-:|:-:|
|
||||
| ch\_ppocr\_mobile\_v1\.1 | 421 | 164 | 51 | 206 |
|
||||
| ch\_ppocr\_mobile\_v1\.0 | 398 | 219 | \- | 179 |
|
||||
|
||||
Comparison of model size, F-score, and inference time on SD 855 between the slim models and the original models:
|
||||
|
||||
| Model Name | Model Size <br> of the <br> Whole System\(M\) | Model Size <br>of the Text <br> Detector\(M\) | Model Size <br> of the Direction <br> Classifier\(M\) | Model Size<br>of the Text <br> Recognizer \(M\) | F\-score | SD 855<br>\(ms\) |
|
||||
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|
||||
| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 | 306 |
|
||||
| ch\_ppocr\_mobile\_slim\_v1\.1 | 3\.5 | 1\.4 | 0\.5 | 1\.6 | 0\.521 | 268 |
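The figures in the table above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers from the two rows; the dictionary layout and the helper `reduction` are assumptions made for illustration.

```python
# Values copied from the two rows of the table above (sizes in M, time in ms).
mobile = {"det": 2.6, "cls": 0.9, "rec": 4.6, "total": 8.1, "sd855_ms": 306}
slim = {"det": 1.4, "cls": 0.5, "rec": 1.6, "total": 3.5, "sd855_ms": 268}


def reduction(before, after):
    """Relative reduction in percent, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)


# The component sizes should add up to the size of the whole system.
for name, row in (("mobile", mobile), ("slim", slim)):
    parts = round(row["det"] + row["cls"] + row["rec"], 1)
    print(name, "components:", parts, "M, whole system:", row["total"], "M")

print("size reduction (%):", reduction(mobile["total"], slim["total"]))
print("SD 855 time reduction (%):", reduction(mobile["sd855_ms"], slim["sd855_ms"]))
```

With these inputs the slim model is a little under 57% smaller and about 12% faster on SD 855, while the reported F-score stays at roughly the same level (0.521 versus 0.5193).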
|
||||
|
|
|
@ -6,7 +6,7 @@ The process of making a customized ultra-lightweight OCR models can be divided i
|
|||
|
||||
PaddleOCR provides two text detection algorithms: EAST and DB. Both support the MobileNetV3 and ResNet50_vd backbone networks. Select the corresponding configuration file as needed and start training. For example, to train a DB detection model with MobileNetV3 as the backbone network:
|
||||
```
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml 2>&1 | tee det_db.log
|
||||
```
|
||||
For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md)
|
||||
|
||||
|
@ -14,7 +14,7 @@ For more details about data preparation and training tutorials, refer to the doc
|
|||
|
||||
PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. All of them support two backbone networks, MobileNetV3 and ResNet34_vd. Select the corresponding configuration file as needed and start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
|
||||
```
|
||||
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
|
||||
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml 2>&1 | tee rec_ch_lite.log
|
||||
```
|
||||
For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md)
|
||||
|
||||
|
|
|
@ -62,7 +62,7 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
|
|||
#### START TRAINING
|
||||
*If the CPU version is installed, set the parameter `use_gpu` to `false` in the configuration.*
|
||||
```shell
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml 2>&1 | tee train_det.log
|
||||
```
|
||||
|
||||
In the above command, `-c` selects the `configs/det/det_mv3_db.yml` configuration file for training.
|
||||
|
@ -73,7 +73,7 @@ You can also use `-o` to change the training parameters without modifying the ym
|
|||
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
|
||||
```
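The effect of an `-o Section.key=value` override can be pictured as updating one entry in a nested configuration dictionary. The sketch below is an assumption about the general idea, not the parsing code used by `tools/train.py`; the names `apply_override` and `parse_value` are hypothetical.

```python
import copy


def parse_value(text):
    """Convert an override string to int, float, or bool when possible."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text


def apply_override(config, option):
    """Return a copy of `config` with one `Section.key=value` override applied."""
    dotted_key, raw_value = option.split("=", 1)
    keys = dotted_key.split(".")
    result = copy.deepcopy(config)
    node = result
    for key in keys[:-1]:
        # Create intermediate sections that the yml file does not define.
        node = node.setdefault(key, {})
    node[keys[-1]] = parse_value(raw_value)
    return result


# Stand-in for values loaded from a yml file (illustrative numbers only).
base = {"Optimizer": {"base_lr": 0.001}}
print(apply_override(base, "Optimizer.base_lr=0.0001"))
```

Working on a deep copy keeps the values loaded from the yml file unchanged, which mirrors the point that the file itself is not modified.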
|
||||
|
||||
#### load trained model and conntinue training
|
||||
#### load trained model and continue training
|
||||
To load a trained model and resume training, set the parameter `Global.checkpoints` to the path of the model to be loaded.
|
||||
|
||||
For example:
|
||||
|
|
|
@ -12,25 +12,29 @@ Next, we first introduce how to convert a trained model into an inference model,
|
|||
- [CONVERT TRAINING MODEL TO INFERENCE MODEL](#CONVERT)
|
||||
- [Convert detection model to inference model](#Convert_detection_model)
|
||||
- [Convert recognition model to inference model](#Convert_recognition_model)
|
||||
|
||||
|
||||
- [Convert angle classification model to inference model](#Convert_angle_class_model)
|
||||
|
||||
|
||||
- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE)
|
||||
- [1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE](#LIGHTWEIGHT_DETECTION)
|
||||
- [2. DB TEXT DETECTION MODEL INFERENCE](#DB_DETECTION)
|
||||
- [3. EAST TEXT DETECTION MODEL INFERENCE](#EAST_DETECTION)
|
||||
- [4. SAST TEXT DETECTION MODEL INFERENCE](#SAST_DETECTION)
|
||||
|
||||
- [5. Multilingual model inference](#Multilingual model inference)
|
||||
|
||||
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
|
||||
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
|
||||
- [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
|
||||
- [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
|
||||
- [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
|
||||
|
||||
|
||||
- [TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
|
||||
|
||||
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
||||
- [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
||||
|
||||
- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
|
||||
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_CHINESE_MODEL)
|
||||
- [2. OTHER MODELS](#OTHER_MODELS)
|
||||
|
||||
|
||||
<a name="CONVERT"></a>
|
||||
## CONVERT TRAINING MODEL TO INFERENCE MODEL
|
||||
<a name="Convert_detection_model"></a>
|
||||
|
@ -87,6 +91,33 @@ After the conversion is successful, there are two files in the directory:
|
|||
└─ params Identify the parameter files of the inference model
|
||||
```
|
||||
|
||||
<a name="Convert_angle_class_model"></a>
|
||||
### Convert angle classification model to inference model
|
||||
|
||||
Download the angle classification model:
|
||||
```
|
||||
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile-v1.1.cls_pre.tar && tar xf ./ch_lite/ch_ppocr_mobile-v1.1.cls_pre.tar -C ./ch_lite/
|
||||
```
|
||||
|
||||
The angle classification model is converted to an inference model in the same way as the detection model:
|
||||
```
|
||||
# -c Set the training algorithm yml configuration file
|
||||
# -o Set optional parameters
|
||||
# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
|
||||
# Global.save_inference_dir Set the address where the converted model will be saved.
|
||||
|
||||
python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/cls_model/best_accuracy \
|
||||
Global.save_inference_dir=./inference/cls/
|
||||
```
|
||||
|
||||
After the conversion is successful, there are two files in the directory:
|
||||
```
|
||||
/inference/cls/
|
||||
└─ model Identify the saved model files
|
||||
└─ params Identify the parameter files of the inference model
|
||||
```
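A quick way to confirm that the export produced both files is to check the output directory. This is a minimal sketch, assuming only the two file names listed above; the helper `missing_inference_files` is a hypothetical name and is not part of PaddleOCR.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("model", "params")


def missing_inference_files(inference_dir):
    """Return the expected file names that are absent from `inference_dir`."""
    directory = Path(inference_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


missing = missing_inference_files("./inference/cls/")
if missing:
    print("conversion incomplete, missing:", missing)
else:
    print("inference model is ready")
```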
|
||||
|
||||
|
||||
<a name="DETECTION_MODEL_INFERENCE"></a>
|
||||
## TEXT DETECTION MODEL INFERENCE
|
||||
|
||||
|
@ -276,16 +307,57 @@ If the chars dictionary is modified during training, you need to specify the new
|
|||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
|
||||
```
|
||||
|
||||
<a name="Multilingual model inference"></a>
|
||||
|
||||
### 5. Multilingual Model Inference
|
||||
To run prediction with a model for another language, specify the dictionary path with `--rec_char_dict_path` when using the inference model. To get correct visualization results,
|
||||
you need to specify the font path for visualization through `--vis_font_path`. Fonts for less common languages are provided by default under the `doc/` path. For example, for Korean recognition:
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/korean_dict.txt" --vis_font_path="doc/korean.ttf"
|
||||
```
|
||||

|
||||
|
||||
After executing the command, the prediction result of the above figure is:
|
||||
|
||||
``` text
|
||||
2020-09-19 16:15:05,076-INFO: index: [205 206 38 39]
|
||||
2020-09-19 16:15:05,077-INFO: word : 바탕으로
|
||||
2020-09-19 16:15:05,077-INFO: score: 0.9171358942985535
|
||||
```
|
||||
|
||||
<a name="ANGLE_CLASSIFICATION_MODEL_INFERENCE"></a>
|
||||
## ANGLE CLASSIFICATION MODEL INFERENCE
|
||||
|
||||
The following will introduce the angle classification model inference.
|
||||
|
||||
|
||||
<a name="ANGLE_CLASS_MODEL_INFERENCE"></a>
|
||||
### 1. ANGLE CLASSIFICATION MODEL INFERENCE
|
||||
|
||||
For angle classification model inference, you can execute the following commands:
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --cls_model_dir="./inference/cls/"
|
||||
```
|
||||
|
||||

|
||||
|
||||
After executing the command, the prediction results (classification angle and score) of the above image will be printed on the screen.
|
||||
|
||||
Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963]
|
||||
|
||||
|
||||
<a name="CONCATENATION"></a>
|
||||
## TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION
|
||||
## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION
|
||||
|
||||
<a name="LIGHTWEIGHT_CHINESE_MODEL"></a>
|
||||
### 1. LIGHTWEIGHT CHINESE MODEL
|
||||
|
||||
When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
|
||||
When performing prediction, specify the path of a single image or a folder of images through the parameter `image_dir`. The parameters `det_model_dir`, `cls_model_dir`, and `rec_model_dir` specify the paths to the detection, angle classification, and recognition inference models, respectively. The parameter `use_angle_cls` controls whether the angle classification model is enabled. The visualized recognition results are saved to the `./inference_results` folder by default.
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/"
|
||||
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls true
|
||||
```
|
||||
|
||||
After executing the command, the recognition result image is as follows:
|
||||
|
|
|
@ -7,7 +7,7 @@ PaddleOCR working environment:
|
|||
- python3.7
|
||||
- glibc 2.23
|
||||
|
||||
It is recommended to use the docker provided by us to run PaddleOCR, please refer to the use of docker [link](https://docs.docker.com/get-started/).
|
||||
It is recommended to run PaddleOCR in the Docker image we provide. For Docker usage, refer to this [link](https://www.runoob.com/docker/docker-tutorial.html/).
|
||||
|
||||
*If you want to directly run the prediction code on mac or windows, you can start from step 2.*
|
||||
|
||||
|
|
|
@ -0,0 +1,70 @@
|
|||
## OCR model list(V1.1, updated on 9.22)
|
||||
|
||||
- [1. Text Detection Model](#Detection)
|
||||
- [2. Text Recognition Model](#Recognition)
|
||||
- [Chinese Recognition Model](#Chinese)
|
||||
- [English Recognition Model](#English)
|
||||
- [Multilingual Recognition Model](#Multilingual)
|
||||
- [3. Text Angle Classification Model](#Angle)
|
||||
|
||||
The downloadable models provided by PaddleOCR include `inference model`, `trained model`, `pre-trained model` and `slim model`. The differences between the models are as follows:
|
||||
|
||||
|model type|model format|description|
|
||||
|-|-|-|
|
||||
|inference model|model, params|Used for inference with the Python prediction engine. [detail](./inference_en.md)|
|
||||
|trained model / pre-trained model|\*.pdmodel, \*.pdopt, \*.pdparams|Checkpoints saved during training, which store the model parameters; mostly used for model evaluation and for resuming training.|
|
||||
|slim model|\*.nb|Generally used for Lite deployment|
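The table above can be read as a lookup from file name to model type. The following sketch encodes just that mapping; the function `model_type` is a hypothetical helper for illustration and makes no claim about how PaddleOCR itself inspects files.

```python
from pathlib import PurePath

# Suffixes and file names taken from the "model format" column above.
CHECKPOINT_SUFFIXES = {".pdmodel", ".pdopt", ".pdparams"}
INFERENCE_NAMES = {"model", "params"}


def model_type(filename):
    """Guess the model type from a file name, following the table above."""
    path = PurePath(filename)
    if path.suffix in CHECKPOINT_SUFFIXES:
        return "trained model / pre-trained model"
    if path.suffix == ".nb":
        return "slim model"
    if path.name in INFERENCE_NAMES:
        return "inference model"
    return "unknown"


for example in ("best_accuracy.pdparams", "inference/cls/model", "det_opt.nb"):
    print(example, "->", model_type(example))
```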
|
||||
|
||||
|
||||
<a name="Detection"></a>
|
||||
### 1. Text Detection Model
|
||||
|model name|description|model size|download|
|
||||
|-|-|-|-|
|
||||
|ch_ppocr_mobile_slim_v1.1_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|1.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|
|
||||
|ch_ppocr_mobile_v1.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|2.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|
|
||||
|ch_ppocr_server_v1.1_det|General model, which is larger than the lightweight model, but achieved better performance|47.2M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)|
|
||||
|
||||
|
||||
<a name="Recognition"></a>
|
||||
### 2. Text Recognition Model
|
||||
|
||||
<a name="Chinese"></a>
|
||||
#### Chinese Recognition Model
|
||||
|model name|description|model size|download|
|
||||
|-|-|-|-|
|
||||
|ch_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|1.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb)|
|
||||
|ch_ppocr_mobile_v1.1_rec|Original lightweight model, supporting Chinese, English and number recognition|4.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar)|
|
||||
|ch_ppocr_server_v1.1_rec|General model, supporting Chinese, English and number recognition|105M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar)|
|
||||
|
||||
**Note:** The `trained model` is fine-tuned from the `pre-trained model` with real data and synthesized vertical text data, and achieves better performance in real scenes. The `pre-trained model` is trained directly on the full set of real and synthesized data, and is more suitable for fine-tuning on your own dataset.
|
||||
|
||||
<a name="English"></a>
|
||||
#### English Recognition Model
|
||||
|model name|description|model size|download|
|
||||
|-|-|-|-|
|
||||
|en_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|0.9M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb)|
|
||||
|en_ppocr_mobile_v1.1_rec|Original lightweight model, supporting English and number recognition|2.0M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
|
||||
<a name="Multilingual"></a>
|
||||
#### Multilingual Recognition Model(Updating...)
|
||||
|model name|description|model size|download|
|
||||
|-|-|-|-|
|
||||
| french_ppocr_mobile_v1.1_rec |Lightweight model for French recognition|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
| german_ppocr_mobile_v1.1_rec |Lightweight model for German recognition|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
| korean_ppocr_mobile_v1.1_rec |Lightweight model for Korean recognition|3.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
| japan_ppocr_mobile_v1.1_rec |Lightweight model for Japanese recognition|3.7M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar)|
|
||||
|
||||
|
||||
<a name="Angle"></a>
|
||||
### 3. Text Angle Classification Model
|
||||
|model name|description|model size|download|
|
||||
|-|-|-|-|
|
||||
|ch_ppocr_mobile_v1.1_cls_quant|Slim quantized model|0.5M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim model]()|
|
||||
|ch_ppocr_mobile_v1.1_cls|Original model|850kb|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar)|
|
||||
|
||||
|
||||
## OCR model list(V1.0, updated on 7.16)
|
||||
|model name|description|detection model|recognition model|recognition model supporting space recognition|
|
||||
|-|-|-|-|-|
|
||||
|chinese_db_crnn_mobile|8.6M lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|
||||
|chinese_db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
|
|
@ -130,8 +130,8 @@ Start training:
|
|||
export PYTHONPATH=$PYTHONPATH:.
|
||||
# GPU training supports single-card and multi-card training; specify the card IDs through CUDA_VISIBLE_DEVICES
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
# Training icdar15 English data
|
||||
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
|
||||
# Training icdar15 English data and saving the log as train_rec.log
|
||||
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
|
||||
```
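The `2>&1 | tee train_rec.log` suffix merges standard error into standard output and writes the combined stream both to the console and to a log file. The same behavior can be sketched in Python as below; `run_and_tee` is a hypothetical helper written for this example, not a PaddleOCR utility.

```python
import subprocess
import sys


def run_and_tee(command, log_path):
    """Run `command`, echoing merged stdout/stderr to the console and to `log_path`.

    Returns the exit code of the command.
    """
    with open(log_path, "w", encoding="utf-8") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as `2>&1`
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)  # console copy
            log_file.write(line)    # log file copy, like `tee`
        return process.wait()
```

A call such as `run_and_tee(["python3", "tools/train.py", "-c", "configs/rec/rec_icdar15_train.yml"], "train_rec.log")` would then correspond to the shell command above.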
|
||||
|
||||
- Data Augmentation
|
||||
|
@ -201,7 +201,19 @@ Optimizer:
|
|||
```
|
||||
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
|
||||
|
||||
- Minor language
|
||||
|
||||
PaddleOCR also supports multiple languages. The corresponding configuration files are in `configs/rec/multi_languages`. Currently, the multi-language algorithms supported by PaddleOCR are:
|
||||
|
||||
| Configuration file | Algorithm name | backbone | trans | seq | pred | language |
|
||||
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
|
||||
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English |
|
||||
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
|
||||
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
|
||||
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
|
||||
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
|
||||
|
||||
The multi-language models are trained in the same way as the Chinese model. The training dataset consists of 1 million synthetic samples. A small number of fonts and test data can be downloaded from [Baidu Netdisk]().
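Since training follows the same command as the other recognition models, choosing a language comes down to choosing a configuration file. The sketch below builds the training command from the table above; the dictionary `CONFIG_BY_LANGUAGE` and the function `train_command` are hypothetical names introduced for illustration.

```python
import posixpath

# Directory and file names taken from the text and table above.
MULTI_LANG_DIR = "configs/rec/multi_languages"
CONFIG_BY_LANGUAGE = {
    "English": "rec_en_lite_train.yml",
    "French": "rec_french_lite_train.yml",
    "German": "rec_ger_lite_train.yml",
    "Japanese": "rec_japan_lite_train.yml",
    "Korean": "rec_korean_lite_train.yml",
}


def train_command(language):
    """Build the training command for one of the listed languages."""
    config = posixpath.join(MULTI_LANG_DIR, CONFIG_BY_LANGUAGE[language])
    return ["python3", "tools/train.py", "-c", config]


print(" ".join(train_command("Korean")))
```

A language that is not in the table raises `KeyError`, which makes an unsupported choice visible immediately.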
|
||||
|
||||
### EVALUATION
|
||||
|
||||
|
|
|
@ -0,0 +1,208 @@
|
|||
# Overall directory structure
|
||||
|
||||
The overall directory structure of PaddleOCR is introduced as follows:
|
||||
|
||||
```
|
||||
PaddleOCR
|
||||
├── configs // configuration file, you can select model structure and modify hyperparameters through yml file
|
||||
│ ├── cls // Related configuration files of direction classifier
|
||||
│ │ ├── cls_mv3.yml // training configuration related, including backbone network, head, loss, optimizer
|
||||
│ │ └── cls_reader.yml // Data reading related, data reading method, data storage path
|
||||
│ ├── det // Detection related configuration files
|
||||
│ │ ├── det_db_icdar15_reader.yml // data read
|
||||
│ │ ├── det_mv3_db.yml // training configuration
|
||||
│ │ ...
|
||||
│ └── rec // Recognition related configuration files
|
||||
│ ├── rec_benchmark_reader.yml // LMDB format data reading related
|
||||
│ ├── rec_chinese_common_train.yml // General Chinese training configuration
|
||||
│ ├── rec_icdar15_reader.yml // simple data reading related, including data reading function, data path, label file
|
||||
│ ...
|
||||
├── deploy // deployment related
|
||||
│ ├── android_demo // android_demo
|
||||
│ │ ...
|
||||
│ ├── cpp_infer // C++ infer
|
||||
│ │ ├── CMakeLists.txt // Cmake file
|
||||
│ │ ├── docs // documentation
|
||||
│ │ │ └── windows_vs2019_build.md
|
||||
│ │ ├── include
|
||||
│ │ │ ├── clipper.h // clipper library
|
||||
│ │ │ ├── config.h // infer configuration
|
||||
│ │ │ ├── ocr_cls.h // direction classifier
|
||||
│ │ │ ├── ocr_det.h // text detection
|
||||
│ │ │ ├── ocr_rec.h // text recognition
|
||||
│ │ │ ├── postprocess_op.h // postprocess after detection
|
||||
│ │ │ ├── preprocess_op.h // preprocess detection
|
||||
│ │ │ └── utility.h // tools
|
||||
│ │ ├── readme.md // documentation
|
||||
│ │ ├── ...
|
||||
│ │ ├── src // source file
|
||||
│ │ │ ├── clipper.cpp
|
||||
│ │ │ ├── config.cpp
|
||||
│ │ │ ├── main.cpp
|
||||
│ │ │ ├── ocr_cls.cpp
|
||||
│ │ │ ├── ocr_det.cpp
|
||||
│ │ │ ├── ocr_rec.cpp
|
||||
│ │ │ ├── postprocess_op.cpp
|
||||
│ │ │ ├── preprocess_op.cpp
|
||||
│ │ │ └── utility.cpp
|
||||
│ │ └── tools // compile and execute script
|
||||
│ │ ├── build.sh // compile script
|
||||
│ │ ├── config.txt // configuration file
|
||||
│ │ └── run.sh // Test startup script
|
||||
│ ├── docker
|
||||
│ │ └── hubserving
|
||||
│ │ ├── cpu
|
||||
│ │ │ └── Dockerfile
|
||||
│ │ ├── gpu
|
||||
│ │ │ └── Dockerfile
|
||||
│ │ ├── README_cn.md
|
||||
│ │ ├── README.md
|
||||
│ │ └── sample_request.txt
|
||||
│ ├── hubserving // hubserving
|
||||
│ │ ├── ocr_det // text detection
|
||||
│ │ │ ├── config.json // serving configuration
|
||||
│ │ │ ├── __init__.py
|
||||
│ │ │ ├── module.py // prediction model
|
||||
│ │ │ └── params.py // prediction parameters
|
||||
│ │ ├── ocr_rec // text recognition
|
||||
│ │ │ ├── config.json
|
||||
│ │ │ ├── __init__.py
|
||||
│ │ │ ├── module.py
|
||||
│ │ │ └── params.py
|
||||
│ │ └── ocr_system // system forecast
|
||||
│ │ ├── config.json
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── module.py
|
||||
│ │ └── params.py
|
||||
│ ├── imgs // prediction picture
|
||||
│ │ ├── cpp_infer_pred_12.png
|
||||
│ │ └── demo.png
|
||||
│ ├── ios_demo // ios demo
|
||||
│ │ ...
|
||||
│ ├── lite // lite deployment
|
||||
│ │ ├── cls_process.cc // direction classifier data processing
|
||||
│ │ ├── cls_process.h
|
||||
│ │ ├── config.txt // check configuration parameters
|
||||
│ │ ├── crnn_process.cc // crnn data processing
|
||||
│ │ ├── crnn_process.h
|
||||
│ │ ├── db_post_process.cc // db data processing
|
||||
│ │ ├── db_post_process.h
|
||||
│ │ ├── Makefile // compile file
|
||||
│ │ ├── ocr_db_crnn.cc // series prediction
|
||||
│ │ ├── prepare.sh // data preparation
|
||||
│ │ ├── readme.md // documentation
|
||||
│ │ ...
|
||||
│ ├── pdserving // pdserving deployment
|
||||
│ │ ├── det_local_server.py // fast detection version, easy deployment and fast prediction
|
||||
│ │ ├── det_web_server.py // Full version of detection, high stability and distributed deployment
|
||||
│ │ ├── ocr_local_server.py // detection + identification quick version
|
||||
│ │ ├── ocr_web_client.py // client
|
||||
│ │ ├── ocr_web_server.py // detection + identification full version
|
||||
│ │ ├── readme.md // documentation
|
||||
│ │ ├── rec_local_server.py // recognize quick version
|
||||
│ │ └── rec_web_server.py // recognition full version
|
||||
│ └── slim
|
||||
│ └── quantization // quantization related
|
||||
│ ├── export_model.py // export model
|
||||
│ ├── quant.py // quantization
|
||||
│ └── README.md // Documentation
|
||||
├── doc // Documentation tutorial
|
||||
│ ...
|
||||
├── paddleocr.py
|
||||
├── ppocr // network core code
|
||||
│ ├── data // data processing
|
||||
│ │ ├── cls // direction classifier
|
||||
│ │ │ ├── dataset_traversal.py // Data transmission, define data reader, read data and form batch
|
||||
│ │ │ └── randaugment.py // Random data augmentation operation
|
||||
│ │ ├── det // detection
|
||||
│ │ │ ├── data_augment.py // data augmentation operation
|
||||
│ │ │ ├── dataset_traversal.py // Data transmission, define data reader, read data and form batch
|
||||
│ │ │ ├── db_process.py // db data processing
|
||||
│ │ │ ├── east_process.py // east data processing
|
||||
│ │ │ ├── make_border_map.py // Generate boundary map
|
||||
│ │ │ ├── make_shrink_map.py // Generate shrink map
|
||||
│ │ │ ├── random_crop_data.py // random crop
|
||||
│ │ │ └── sast_process.py // sast data processing
|
||||
│ │ ├── reader_main.py // main function of data reader
|
||||
│ │ └── rec // recognition
|
||||
│ │ ├── dataset_traversal.py // Data transmission, define data reader, including LMDB_Reader and Simple_Reader
|
||||
│ │ └── img_tools.py // Data processing related, including data normalization and disturbance
|
||||
│ ├── __init__.py
|
||||
│ ├── modeling // networking related
|
||||
│ │ ├── architectures // Model architecture, which defines the various modules required by the model
|
||||
│ │ │ ├── cls_model.py // direction classifier
|
||||
│ │ │ ├── det_model.py // detection
|
||||
│ │ │ └── rec_model.py // recognition
|
||||
│ │ ├── backbones // backbone network
|
||||
│ │ │ ├── det_mobilenet_v3.py // detect mobilenet_v3
|
||||
│ │ │ ├── det_resnet_vd.py
|
||||
│ │ │ ├── det_resnet_vd_sast.py
|
||||
│ │ │ ├── rec_mobilenet_v3.py // recognize mobilenet_v3
|
||||
│ │ │ ├── rec_resnet_fpn.py
|
||||
│ │ │ └── rec_resnet_vd.py
|
||||
│ │ ├── common_functions.py // common functions
|
||||
│ │ ├── heads
|
||||
│ │ │ ├── cls_head.py // classification head
|
||||
│ │ │ ├── det_db_head.py // db detection head
|
||||
│ │ │ ├── det_east_head.py // east detection head
|
||||
│ │ │ ├── det_sast_head.py // sast detection head
|
||||
│ │ │ ├── rec_attention_head.py // recognition attention
|
||||
│ │ │ ├── rec_ctc_head.py // recognition ctc
|
||||
│ │ │ ├── rec_seq_encoder.py // recognition sequence encoder
|
||||
│ │ │ ├── rec_srn_all_head.py // srn related
|
||||
│ │ │ └── self_attention // srn attention
|
||||
│ │ │ └── model.py
|
||||
│ │ ├── losses // loss function
|
||||
│ │ │ ├── cls_loss.py // Directional classifier loss function
|
||||
│ │ │ ├── det_basic_loss.py // detect basic loss
|
||||
│ │ │ ├── det_db_loss.py // DB loss
|
||||
│ │ │ ├── det_east_loss.py // EAST loss
|
||||
│ │ │ ├── det_sast_loss.py // SAST loss
|
||||
│ │ │ ├── rec_attention_loss.py // attention loss
|
||||
│ │ │ ├── rec_ctc_loss.py // ctc loss
|
||||
│ │ │ └── rec_srn_loss.py // srn loss
|
||||
│ │ └── stns // Spatial transformation network
|
||||
│ │ └── tps.py // TPS conversion
|
||||
│ ├── optimizer.py // optimizer
|
||||
│ ├── postprocess // post-processing
|
||||
│ │ ├── db_postprocess.py // DB postprocess
|
||||
│ │ ├── east_postprocess.py // East postprocess
|
||||
│ │ ├── lanms // lanms related
|
||||
│ │ │ ...
|
||||
│ │ ├── locality_aware_nms.py // nms
|
||||
│ │ └── sast_postprocess.py // sast post-processing
|
||||
│ └── utils // tools
|
||||
│ ├── character.py // Character processing, including text encoding and decoding, and calculation of prediction accuracy
|
||||
│ ├── check.py // parameter loading check
|
||||
│ ├── ic15_dict.txt // English number dictionary, case sensitive
|
||||
│ ├── ppocr_keys_v1.txt // Chinese dictionary, used to train Chinese models
|
||||
│ ├── save_load.py // model save and load function
|
||||
│ ├── stats.py // Statistics
|
||||
│ └── utility.py // Tool functions, including related check tools such as whether the input parameters are legal
|
||||
├── README_en.md // documentation
|
||||
├── README.md
|
||||
├── requirments.txt // installation dependencies
|
||||
├── setup.py // whl package packaging script
|
||||
└── tools // start tool
|
||||
├── eval.py // evaluation function
|
||||
├── eval_utils // evaluation tools
|
||||
│ ├── eval_cls_utils.py // category related
|
||||
│ ├── eval_det_iou.py // detect iou related
|
||||
│ ├── eval_det_utils.py // detection related
|
||||
│ ├── eval_rec_utils.py // recognition related
|
||||
│ └── __init__.py
|
||||
├── export_model.py // export infer model
|
||||
├── infer // Forecast based on prediction engine
|
||||
│ ├── predict_cls.py
|
||||
│ ├── predict_det.py
|
||||
│ ├── predict_rec.py
|
||||
│ ├── predict_system.py
|
||||
│ └── utility.py
|
||||
├── infer_cls.py // Predict classification based on training engine
|
||||
├── infer_det.py // Predictive detection based on training engine
|
||||
├── infer_rec.py // Predictive recognition based on training engine
|
||||
├── program.py // overall process
|
||||
├── test_hubserving.py
|
||||
└── train.py // start training
|
||||
|
||||
```
|
|
@ -10,14 +10,52 @@ pip install paddleocr
|
|||
Build your own whl package and install it
|
||||
```bash
|
||||
python setup.py bdist_wheel
|
||||
pip install dist/paddleocr-0.0.3-py3-none-any.whl
|
||||
pip install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
|
||||
```
|
||||
### 1. Use by code
|
||||
|
||||
* detection, classification and recognition
|
||||
```python
|
||||
from paddleocr import PaddleOCR,draw_ocr
|
||||
# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
|
||||
# You can set the parameter `lang` to `ch`, `en`, `french`, `german`, `korean` or `japan`
|
||||
# to switch to the corresponding language model.
|
||||
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
|
||||
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
|
||||
result = ocr.ocr(img_path, cls=True)
|
||||
for line in result:
|
||||
print(line)
|
||||
|
||||
|
||||
# draw result
|
||||
from PIL import Image
|
||||
image = Image.open(img_path).convert('RGB')
|
||||
boxes = [line[0] for line in result]
|
||||
txts = [line[1][0] for line in result]
|
||||
scores = [line[1][1] for line in result]
|
||||
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
|
||||
im_show = Image.fromarray(im_show)
|
||||
im_show.save('result.jpg')
|
||||
```
|
||||
|
||||
Output will be a list, each item contains bounding box, text and recognition confidence
|
||||
```bash
|
||||
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
|
||||
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
|
||||
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
|
||||
......
|
||||
```
|
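Since each item in the output above is a `[box, [text, confidence]]` pair, post-processing is a matter of plain list handling. The following is a minimal sketch, not part of PaddleOCR: the helper names `filter_by_confidence` and `to_rect` and the `0.95` threshold are hypothetical, and the sample rows are copied from the output shown above rather than produced by a live call.

```python
# Sample rows copied from the output above; in practice `result`
# would be the value returned by ocr.ocr(img_path, cls=True).
result = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]],
     ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]],
     ['We would like to thank all the designers and', 0.9357758]],
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]],
     ['contributors whohave been involved in the', 0.9592447]],
]


def filter_by_confidence(lines, min_score):
    """Keep only items whose recognition confidence is at least min_score."""
    return [line for line in lines if line[1][1] >= min_score]


def to_rect(box):
    """Convert a 4-point box into an axis-aligned (x_min, y_min, x_max, y_max) tuple."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical threshold chosen for illustration only.
kept = filter_by_confidence(result, 0.95)
for box, (text, score) in kept:
    print(to_rect(box), text, round(score, 3))
```

With the sample rows, the second line (confidence 0.9357758) falls below the illustrative threshold and is dropped; the other two are kept.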
||||
|
||||
Visualization of results
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/whl/12_det_rec.jpg" width="800">
|
||||
</div>
|
||||
|
||||
* detection and recognition
|
||||
```python
|
||||
from paddleocr import PaddleOCR,draw_ocr
|
||||
ocr = PaddleOCR() # need to run only once to download and load model into memory
|
||||
ocr = PaddleOCR(lang='en') # need to run only once to download and load model into memory
|
||||
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
|
||||
result = ocr.ocr(img_path)
|
||||
for line in result:
|
||||
|
@ -48,6 +86,21 @@ Visualization of results
|
|||
<img src="../imgs_results/whl/12_det_rec.jpg" width="800">
|
||||
</div>
|
||||
|
||||
* classification and recognition
|
||||
```python
|
||||
from paddleocr import PaddleOCR
|
||||
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to load model into memory
|
||||
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
|
||||
result = ocr.ocr(img_path, det=False, cls=True)
|
||||
for line in result:
|
||||
print(line)
|
||||
```
|
||||
|
||||
Output will be a list, each item contains recognition text and confidence
|
||||
```bash
|
||||
['PAIN', 0.990372]
|
||||
```
|
||||
|
||||
* only detection
|
||||
```python
|
||||
from paddleocr import PaddleOCR,draw_ocr
|
||||
|
@ -83,18 +136,33 @@ Visualization of results
|
|||
* only recognition
|
||||
```python
|
||||
from paddleocr import PaddleOCR
|
||||
ocr = PaddleOCR() # need to run only once to load model into memory
|
||||
ocr = PaddleOCR(lang='en') # need to run only once to load model into memory
|
||||
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
|
||||
result = ocr.ocr(img_path,det=False)
|
||||
result = ocr.ocr(img_path, det=False, cls=False)
|
||||
for line in result:
|
||||
print(line)
|
||||
```
|
||||
|
||||
Output will be a list, each item contains text and recognition confidence
|
||||
Output will be a list, each item contains recognition text and confidence
|
||||
```bash
|
||||
['PAIN', 0.990372]
|
||||
```
|
||||
|
||||
* only classification
|
||||
```python
|
||||
from paddleocr import PaddleOCR
|
||||
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to load model into memory
|
||||
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
|
||||
result = ocr.ocr(img_path, det=False, rec=False, cls=True)
|
||||
for line in result:
|
||||
print(line)
|
||||
```
|
||||
|
||||
Output will be a list, each item contains classification result and confidence
|
||||
```bash
|
||||
['0', 0.99999964]
|
||||
```
|
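The classification result is a `[label, confidence]` pair, where the label is one of the entries in `label_list` (`'0'` or `'180'` by default, per the parameter table). One possible way to act on it is sketched below. The helper `rotation_to_apply` and its `min_score` default of `0.9` are assumptions made for illustration, not PaddleOCR API.

```python
def rotation_to_apply(cls_result, min_score=0.9):
    """Return the rotation in degrees needed to make the text upright.

    cls_result is a [label, confidence] pair such as ['0', 0.99999964].
    min_score is a hypothetical cut-off: a '180' label below it is ignored,
    so a low-confidence prediction never flips the image.
    """
    label, score = cls_result
    if label == '180' and score >= min_score:
        return 180
    return 0


print(rotation_to_apply(['0', 0.99999964]))   # upright text, nothing to do
print(rotation_to_apply(['180', 0.95]))       # confident upside-down prediction
print(rotation_to_apply(['180', 0.5]))        # too uncertain to act on
```

Keeping the decision in a small pure function makes the threshold easy to tune without touching the OCR call itself.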
||||
|
||||
### Use by command line
|
||||
|
||||
show help information
|
||||
|
@ -102,9 +170,9 @@ show help information
|
|||
paddleocr -h
|
||||
```
|
||||
|
||||
* detection and recognition
|
||||
* detection, classification and recognition
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true --cls true --lang en
|
||||
```
|
||||
|
||||
Output will be a list, each item contains bounding box, text and recognition confidence
|
||||
|
@ -115,6 +183,29 @@ Output will be a list, each item contains bounding box, text and recognition con
|
|||
......
|
||||
```
|
||||
|
||||
* detection and recognition
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --lang en
|
||||
```
|
||||
|
||||
Output will be a list, each item contains bounding box, text and recognition confidence
|
||||
```bash
|
||||
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
|
||||
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
|
||||
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
|
||||
......
|
||||
```
|
||||
|
||||
* classification and recognition
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --cls true --det false --lang en
|
||||
```
|
||||
|
||||
Output will be a list, each item contains text and recognition confidence
|
||||
```bash
|
||||
['PAIN', 0.990372]
|
||||
```
|
||||
|
||||
* only detection
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false
|
||||
|
@ -130,7 +221,7 @@ Output will be a list, each item only contains bounding box
|
|||
|
||||
* only recognition
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --cls false --lang en
|
||||
```
|
||||
|
||||
Output will be a list, each item contains text and recognition confidence
|
||||
|
@ -138,6 +229,16 @@ Output will be a list, each item contains text and recognition confidence
|
|||
['PAIN', 0.990372]
|
||||
```
|
||||
|
||||
* only classification
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --cls true --det false --rec false
|
||||
```
|
||||
|
||||
Output will be a list, each item contains classification result and confidence
|
||||
```bash
|
||||
['0', 0.99999964]
|
||||
```
|
||||
|
||||
## Use custom model
|
||||
When the built-in model cannot meet the needs, you need to use your own trained model.
|
||||
First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec model to inference model, and then use it as follows
|
||||
|
@ -147,9 +248,9 @@ First, refer to the first section of [inference_en.md](./inference_en.md) to con
|
|||
```python
|
||||
from paddleocr import PaddleOCR,draw_ocr
|
||||
# The path of detection and recognition model must contain model and params files
|
||||
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}',rec_model_dir='{your_rec_model_dir}å')
|
||||
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}', rec_char_dict_path='{your_rec_char_dict_path}', cls_model_dir='{your_cls_model_dir}', use_angle_cls=True)
|
||||
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
|
||||
result = ocr.ocr(img_path)
|
||||
result = ocr.ocr(img_path, cls=True)
|
||||
for line in result:
|
||||
print(line)
|
||||
|
||||
|
@ -167,7 +268,7 @@ im_show.save('result.jpg')
|
|||
### Use by command line
|
||||
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir}
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
|
||||
```
|
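Because each custom model directory must contain the model and params files, it can help to check the directories before passing them to `PaddleOCR` or the command line. The sketch below assumes the files are literally named `model` and `params`; adjust `REQUIRED_FILES` if your exported model uses different names. `missing_model_files` is a hypothetical helper, not part of the package.

```python
from pathlib import Path

# Assumed file names inside an exported inference model directory.
REQUIRED_FILES = ("model", "params")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    # Placeholder paths; replace with your own det/rec/cls model directories.
    for path in ("./inference/det", "./inference/rec", "./inference/cls"):
        missing = missing_model_files(path)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{path}: {status}")
```

An empty list means the directory passes the check under the stated assumption.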
||||
|
||||
## Parameter Description
|
||||
|
@ -194,6 +295,14 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
|
|||
| max_text_length | The maximum text length that the recognition algorithm can recognize | 25 |
|
||||
| rec_char_dict_path | the alphabet path which needs to be modified to your own path when `rec_model_Name` use mode 2 | ./ppocr/utils/ppocr_keys_v1.txt |
|
||||
| use_space_char | Whether to recognize spaces | TRUE |
|
||||
| use_angle_cls | Whether to load classification model | FALSE |
|
||||
| cls_model_dir | the classification inference model folder. There are two ways to transfer parameters, 1. None: Automatically download the built-in model to `~/.paddleocr/cls`; 2. The path of the inference model converted by yourself, the model and params files must be included in the model path | None |
|
||||
| cls_image_shape | image shape of classification algorithm | "3,48,192" |
|
||||
| label_list | label list of classification algorithm | ['0','180'] |
|
||||
| cls_batch_num | When performing classification, the batchsize of forward images | 30 |
|
||||
| enable_mkldnn | Whether to enable mkldnn | FALSE |
|
||||
| use_zero_copy_run | Whether to forward by zero_copy_run | FALSE |
|
||||
| lang | The language of the model to load, e.g. `ch` (Chinese) or `en` (English) | ch |
|
||||
| det | Enable detection when the `ppocr.ocr` function is executed | TRUE |
|
||||
| rec | Enable detction when `ppocr.ocr` func exec | TRUE |
|
||||
| rec | Enable recognition when the `ppocr.ocr` function is executed | TRUE |
|
||||
| cls | Enable classification when the `ppocr.ocr` function is executed | FALSE |
|
||||
|
|