Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleOCR into develop-nblib

This commit is contained in:
nblib 2020-09-23 13:26:21 +08:00
commit f7ae9a5fb3
286 changed files with 36530 additions and 1662 deletions

4
.gitignore vendored
View File

@ -21,3 +21,7 @@ output/
*.log
.clang-format
.clang_format.hook
build/
dist/
paddleocr.egg-info/

8
MANIFEST.in Normal file
View File

@ -0,0 +1,8 @@
include LICENSE.txt
include README.md
recursive-include ppocr/utils *.txt utility.py character.py check.py
recursive-include ppocr/data/det *.py
recursive-include ppocr/postprocess *.py
recursive-include ppocr/postprocess/lanms *.*
recursive-include tools/infer *.py
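MANIFEST.in only declares which extra, non-Python files travel with the package sources. For orientation, a minimal, hypothetical setuptools script that would pair with it could look like the sketch below; the actual setup.py added by this commit may differ, and the name, version, and dependencies here are placeholders.

```python
# Hypothetical companion setup.py (placeholders only; not the file from this commit).
# MANIFEST.in lists extra files for the source distribution; include_package_data=True
# tells setuptools to also ship those files (when inside packages) in built wheels.
from setuptools import setup, find_packages

setup(
    name='paddleocr',                 # assumed package name
    version='0.0.0',                  # placeholder version
    packages=find_packages(),
    include_package_data=True,        # pick up data files declared via MANIFEST.in
    install_requires=[],              # real dependencies omitted
)
```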

282
README.md
View File

@ -1,209 +1,139 @@
English | [简体中文](README_ch.md)
## Introduction
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.
**Recent updates**
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra-lightweight compressed ppocr_mobile_slim series models; the overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR-Pipline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list)
- 2020.9.17 Update the ultra-lightweight ppocr_mobile series and general ppocr_server series Chinese and English OCR models, with effects comparable to commercial products. [Model Downloads](#Supported-Chinese-model-list)
- 2020.8.24 Support the use of PaddleOCR through whl package installation; please refer to [PaddleOCR Package](./doc/doc_en/whl_en.md)
- 2020.8.21 Update the replay and PPT of the August 18 live lesson on Bilibili (lesson 2: an easy-to-learn and easy-to-use OCR tool spree). [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- [more](./doc/doc_en/update_en.md)
## Features
- PPOCR series of high-quality pre-trained models, with effects comparable to commercial products:
    - Ultra-lightweight ppocr_mobile series models: detection (2.6M) + direction classifier (0.9M) + recognition (4.6M) = 8.1M
    - General ppocr_server series models: detection (47.2M) + direction classifier (0.9M) + recognition (107M) = 155.1M
    - Ultra-lightweight compressed ppocr_mobile_slim series models: detection (1.4M) + direction classifier (0.5M) + recognition (1.6M) = 3.5M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Support user-defined training and provide rich inference deployment solutions
- Support PIP installation, easy to use
- Support Linux, Windows, MacOS and other systems
## Visualization
<div align="center">
    <img src="doc/imgs_results/1101.jpg" width="800">
    <img src="doc/imgs_results/1103.jpg" width="800">
</div>
The above pictures are visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md).
## Quick Experience
You can quickly experience the ultra-lightweight OCR online: [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)
Also, you can scan the QR code below to install the App (**Android support only**)
<div align="center">
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
</div>
- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md)
<a name="Supported-Chinese-model-list"></a>
## PP-OCR 1.1 series model list (updated on Sep 17)
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
| Chinese and English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
| Chinese and English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) |
For more model downloads (including multiple languages), please refer to [PP-OCR v1.1 series model downloads](./doc/doc_en/models_list_en.md)
## Tutorials
- [Installation](./doc/doc_en/installation_en.md)
- [Quick Start](./doc/doc_en/quickstart_en.md)
- [Code Structure](./doc/doc_en/tree_en.md)
- Algorithm introduction
    - [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md)
    - [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md)
    - [PP-OCR Pipeline](#PP-OCR-Pipline)
- Model training/evaluation
    - [Text Detection](./doc/doc_en/detection_en.md)
    - [Text Recognition](./doc/doc_en/recognition_en.md)
    - [Direction Classification](./doc/doc_en/angle_class_en.md)
    - [Yml Configuration](./doc/doc_en/config_en.md)
- Inference and Deployment
    - [Quick inference based on pip](./doc/doc_en/whl_en.md)
    - [Python Inference](./doc/doc_en/inference_en.md)
    - [C++ Inference](./deploy/cpp_infer/readme_en.md)
    - [Serving](./deploy/hubserving/readme_en.md)
    - [Mobile](./deploy/lite/readme_en.md)
    - [Model Quantization](./deploy/slim/quantization/README_en.md)
    - [Model Compression](./deploy/slim/prune/README_en.md)
    - [Benchmark](./doc/doc_en/benchmark_en.md)
- Datasets
    - [General OCR Datasets (Chinese/English)](./doc/doc_en/datasets_en.md)
    - [Handwritten OCR Datasets (Chinese)](./doc/doc_en/handwritten_datasets_en.md)
    - [Various OCR Datasets (multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md)
    - [Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
    - [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
- [Visualization](#Visualization)
- [FAQ](./doc/doc_en/FAQ_en.md)
- [Community](#Community)
- [References](./doc/doc_en/reference_en.md)
- [License](#LICENSE)
- [Contribution](#CONTRIBUTION)
<a name="PP-OCR-Pipline"></a>
## PP-OCR Pipeline
<div align="center">
    <img src="./doc/ppocr_framework.png" width="800">
</div>
PP-OCR is a practical ultra-lightweight OCR system, mainly composed of three parts: DB text detection, detection frame correction, and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-trained model use, and automatic model pruning and quantization, to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
## Visualization [more](./doc/doc_en/visualization_en.md)
<div align="center">
    <img src="./doc/imgs_results/1102.jpg" width="800">
    <img src="./doc/imgs_results/1104.jpg" width="800">
    <img src="./doc/imgs_results/1106.jpg" width="800">
    <img src="./doc/imgs_results/1105.jpg" width="800">
    <img src="./doc/imgs_results/1110.jpg" width="800">
    <img src="./doc/imgs_results/1112.jpg" width="800">
</div>
<a name="Community"></a>
## Community
Scan the QR code below with your WeChat and complete the questionnaire to get access to the official technical exchange group.
<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>
<a name="LICENSE"></a>
## License
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
<a name="CONTRIBUTION"></a>
## Contribution
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation.
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing a new visualization function, adding .gitignore, and discarding the manual setting of PYTHONPATH.
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure.
- Thanks to [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR dataset.
- Thanks to [authorfu](https://github.com/authorfu) for contributing the Android demo and [xiadeye](https://github.com/xiadeye) for contributing the iOS demo.
- Thanks to [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
- Thanks to [tangmq](https://gitee.com/tangmq) for adding dockerized deployment services to PaddleOCR and supporting the rapid release of callable RESTful API services.

141
README_ch.md Normal file
View File

@ -0,0 +1,141 @@
[English](README.md) | 简体中文
## Introduction
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.
**Recent updates**
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra-lightweight compressed ppocr_mobile_slim series models; the overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR)), suitable for mobile deployment. [Model Downloads](#模型下载)
- 2020.9.17 Update the ultra-lightweight ppocr_mobile series and general ppocr_server series Chinese and English OCR models, with effects comparable to commercial products. [Model Downloads](#模型下载)
- 2020.8.26 Update 84 common OCR questions and answers; for details, see the [FAQ](./doc/doc_ch/FAQ.md)
- 2020.8.24 Support installing and using PaddleOCR via the whl package; for details, see the [PaddleOCR package instructions](./doc/doc_ch/whl.md)
- 2020.8.21 Update the replay and PPT of the August 18 live lesson on Bilibili (lesson 2: an easy-to-learn and easy-to-use OCR tool spree). [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- [More](./doc/doc_ch/update.md)
## Features
- PPOCR series of high-quality pre-trained models with accurate recognition effects:
    - Ultra-lightweight ppocr_mobile series: detection (2.6M) + direction classifier (0.9M) + recognition (4.6M) = 8.1M
    - General ppocr_server series: detection (47.2M) + direction classifier (0.9M) + recognition (107M) = 155.1M
    - Ultra-lightweight compressed ppocr_mobile_slim series: detection (1.4M) + direction classifier (0.5M) + recognition (1.6M) = 3.5M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Support user-defined training and provide rich inference deployment solutions
- Support quick installation and use via pip
- Runs on Linux, Windows, MacOS and other systems
## Visualization
<div align="center">
    <img src="doc/imgs_results/1101.jpg" width="800">
    <img src="doc/imgs_results/1103.jpg" width="800">
</div>
The above pictures are visualizations of the general ppocr_server model. For more effect pictures, please see the [visualization page](./doc/doc_ch/visualization.md).
## Quick Experience
- PC: online experience of the ultra-lightweight Chinese OCR: https://www.paddlepaddle.org.cn/hub/scene/ocr
- Mobile: [DEMO installer download](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) (based on EasyEdge and Paddle-Lite, supports iOS and Android). Android phones can also scan the QR code below to install and try it.
<div align="center">
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
</div>
- Code experience: start from [Quick installation](./doc/doc_ch/installation.md)
<a name="模型下载"></a>
## PP-OCR 1.1 series model list (updated on Sep 17)
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------ | --------------- | ---------------- | ---- | ---------- | -------- |
| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
| Chinese and English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
| Chinese and English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_cls_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) |
For more model downloads (including multiple languages), please refer to [PP-OCR v1.1 series model downloads](./doc/doc_ch/models_list.md)
## Tutorials
- [Quick installation](./doc/doc_ch/installation.md)
- [Quick start with Chinese OCR models](./doc/doc_ch/quickstart.md)
- [Code structure](./doc/doc_ch/tree.md)
- Algorithm introduction
    - [Text detection](./doc/doc_ch/algorithm_overview.md)
    - [Text recognition](./doc/doc_ch/algorithm_overview.md)
    - [PP-OCR Pipeline](#PP-OCR)
- Model training/evaluation
    - [Text detection](./doc/doc_ch/detection.md)
    - [Text recognition](./doc/doc_ch/recognition.md)
    - [Direction classifier](./doc/doc_ch/angle_class.md)
    - [Yml configuration](./doc/doc_ch/config.md)
- Inference and deployment
    - [Quick inference based on the pip whl package](./doc/doc_ch/whl.md)
    - [Python inference](./doc/doc_ch/inference.md)
    - [C++ inference](./deploy/cpp_infer/readme.md)
    - [Serving](./deploy/hubserving/readme.md)
    - [Mobile](./deploy/lite/readme.md)
    - [Model quantization](./deploy/slim/quantization/README.md)
    - [Model pruning](./deploy/slim/prune/README.md)
    - [Benchmark](./doc/doc_ch/benchmark.md)
- Datasets
    - [General Chinese/English OCR datasets](./doc/doc_ch/datasets.md)
    - [Handwritten Chinese OCR datasets](./doc/doc_ch/handwritten_datasets.md)
    - [Vertical and multilingual OCR datasets](./doc/doc_ch/vertical_and_multilingual_datasets.md)
    - [Data annotation tools](./doc/doc_ch/data_annotation.md)
    - [Data synthesis tools](./doc/doc_ch/data_synthesis.md)
- [Visualization](#效果展示)
- FAQ
    - [Featured: 10 selected OCR questions](./doc/doc_ch/FAQ.md)
    - [Theory: 21 general OCR questions](./doc/doc_ch/FAQ.md)
    - [Practice: 53 PaddleOCR questions](./doc/doc_ch/FAQ.md)
- [Community](#欢迎加入PaddleOCR技术交流群)
- [References](./doc/doc_ch/reference.md)
- [License](#许可证书)
- [Contribution](#贡献代码)
<a name="PP-OCR"></a>
## PP-OCR Pipeline
<div align="center">
    <img src="./doc/ppocr_framework.png" width="800">
</div>
PP-OCR is a practical ultra-lightweight OCR system, mainly composed of three parts: DB text detection, detection frame correction, and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-trained model use, and automatic model pruning and quantization, to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article: https://arxiv.org/abs/2009.09941 .
<a name="效果展示"></a>
## Visualization [more](./doc/doc_ch/visualization.md)
<div align="center">
    <img src="./doc/imgs_results/1102.jpg" width="800">
    <img src="./doc/imgs_results/1104.jpg" width="800">
    <img src="./doc/imgs_results/1106.jpg" width="800">
    <img src="./doc/imgs_results/1105.jpg" width="800">
    <img src="./doc/imgs_results/1110.jpg" width="800">
    <img src="./doc/imgs_results/1112.jpg" width="800">
</div>
<a name="欢迎加入PaddleOCR技术交流群"></a>
## Join the PaddleOCR technical exchange group
Scan the QR code below and complete the questionnaire to get the group QR code and a collection of OCR training tips:
<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>
<a name="许可证书"></a>
## License
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
<a name="贡献代码"></a>
## Contribution
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) ([Blog](https://blog.csdn.net/sdlypyzq)) for contributing a new visualization method, adding .gitignore, and handling the issue of manually setting the PYTHONPATH environment variable
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure
- Many thanks to [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR dataset
- Many thanks to [authorfu](https://github.com/authorfu) for contributing the Android demo and [xiadeye](https://github.com/xiadeye) for contributing the iOS demo
- Many thanks to [BeyondYourself](https://github.com/BeyondYourself) for giving PaddleOCR many great suggestions and simplifying part of the code style
- Many thanks to [tangmq](https://gitee.com/tangmq) for adding dockerized deployment services to PaddleOCR, supporting the rapid release of callable RESTful API services

View File

@ -1,302 +0,0 @@
English | [简体中文](README.md)
## INTRODUCTION
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.
**Recent updates**
- 2020.7.9 Add a recognition model that supports spaces; see [recognition result](#space Chinese OCR results). For more information: [Recognition](./doc/doc_ch/recognition.md) and [quickstart](./doc/doc_ch/quickstart.md)
- 2020.7.9 Add data augmentation and learning rate decay strategies; please read [config](./doc/doc_en/config_en.md)
- 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- [more](./doc/doc_en/update_en.md)
## FEATURES
- Lightweight Chinese OCR model, total model size is only 8.6M
- A single model supports recognition of combined Chinese, English and digits, vertical text, and long text
- Detection model DB (4.1M) + recognition model CRNN (4.5M)
- Various text detection algorithms: EAST, DB
- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE
<a name="Supported-Chinese-model-list"></a>
### Supported Chinese models list:
|Model Name|Description |Detection Model link|Recognition Model link| Support for space Recognition Model link|
|-|-|-|-|-|
|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
For testing our Chinese OCR online: https://www.paddlepaddle.org.cn/hub/scene/ocr
**You can also quickly experience the lightweight Chinese OCR and General Chinese OCR models as follows:**
## **LIGHTWEIGHT CHINESE OCR AND GENERAL CHINESE OCR INFERENCE**
![](doc/imgs_results/11.jpg)
The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results) , [General Chinese OCR results](#General-Chinese-OCR-results) and [Support for space Recognition Model](#Space-Chinese-OCR-results).
#### 1. ENVIRONMENT CONFIGURATION
Please see [Quick installation](./doc/doc_en/installation_en.md)
#### 2. DOWNLOAD INFERENCE MODELS
#### (1) Download lightweight Chinese OCR models
*If wget is not installed on Windows, you can copy the link to your browser to download the model. After the model is downloaded, unzip it and place it in the corresponding directory.*
Copy the detection and recognition 'inference model' address in [Chinese model List](#Supported-Chinese-model-list), download and unpack:
```
mkdir inference && cd inference
# Download the detection part of the Chinese OCR and decompress it
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition part of the Chinese OCR and decompress it
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
cd ..
```
Take lightweight Chinese OCR model as an example:
```
mkdir inference && cd inference
# Download the detection part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
# Download the space-recognized part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar && tar xf ch_rec_mv3_crnn_enhance_infer.tar
cd ..
```
After the decompression is completed, the file structure should be as follows:
```
|-inference
|-ch_rec_mv3_crnn
|- model
|- params
|-ch_det_mv3_db
|- model
|- params
...
```
#### 3. SINGLE IMAGE AND BATCH PREDICTION
The following code implements text detection and recognition inference in tandem. When performing prediction, you need to specify the path of a single image or an image folder through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to the detection model, and the parameter `rec_model_dir` specifies the path to the recognition model. The visual prediction results are saved to the `./inference_results` folder by default.
```bash
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# Prediction on a batch of images by specifying image folder path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# If you want to use CPU for prediction, you need to set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
To run inference of the General Chinese OCR model, follow the steps above to download the corresponding models and update the relevant parameters. Examples are as follows:
```
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
To run inference of the General Chinese OCR model with space recognition support, follow the steps above to download the corresponding models and update the relevant parameters. Examples are as follows:
```
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/"
```
For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md)
## DOCUMENTATION
- [Quick installation](./doc/doc_en/installation_en.md)
- [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
- [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
- [Inference](./doc/doc_en/inference_en.md)
- [Introduction of yml file](./doc/doc_en/config_en.md)
- [Dataset](./doc/doc_en/datasets_en.md)
- [FAQ](#FAQ)
## TEXT DETECTION ALGORITHM
PaddleOCR open source text detection algorithms list:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
- [ ] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research, coming soon)
On the ICDAR2015 dataset, the text detection result is as follows:
|Model|Backbone|precision|recall|Hmean|Download link|
|-|-|-|-|-|-|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
Using the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset with a total of 30,000 training images, the related configuration and pre-trained models for the Chinese detection task are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
* Note: For the training and evaluation of the above DB model, the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better results.
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
## TEXT RECOGNITION ALGORITHM
PaddleOCR open-source text recognition algorithms list:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research, coming soon)
Referring to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation results of the above text recognition algorithms (trained on MJSynth and SynthText, evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) are as follows:
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|-|-|-|-|-|
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and crop out 300,000 training images from the original photos using the position ground truth, with some calibration applied where needed. In addition, based on the LSVT corpus, 5 million synthetic samples are generated to train the Chinese model. The related configuration and pre-trained models are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
## END-TO-END OCR ALGORITHM
- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, comming soon)
<a name="lightweight-Chinese-OCR-results"></a>
## LIGHTWEIGHT CHINESE OCR RESULTS
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
![](doc/imgs_results/4.jpg)
![](doc/imgs_results/6.jpg)
![](doc/imgs_results/9.jpg)
![](doc/imgs_results/16.png)
![](doc/imgs_results/22.jpg)
<a name="General-Chinese-OCR-results"></a>
## General Chinese OCR results
![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
![](doc/imgs_results/chinese_db_crnn_server/8.jpg)
<a name="Space-Chinese-OCR-results"></a>
## space Chinese OCR results
### LIGHTWEIGHT CHINESE OCR RESULTS
![](doc/imgs_results/img_11.jpg)
### General Chinese OCR results
![](doc/imgs_results/chinese_db_crnn_server/en_paper.jpg)
<a name="FAQ"></a>
## FAQ
1. Error when using attention-based recognition model: KeyError: 'predict'
The inference of recognition model based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, it is also found that the recognition model based on attention loss is not as effective as the one based on CTC loss.
2. About inference speed
When there is a lot of text in the image, the prediction time increases. You can use `--rec_batch_num` to set a smaller prediction batch size; the default value is 30, which can be changed to 10 or another value.
3. Service deployment and mobile deployment
It is expected that the service deployment based on Serving and the mobile deployment based on Paddle Lite will be released successively in mid-to-late June. Stay tuned for more updates.
4. Release time of self-developed algorithm
Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient.
[more](./doc/doc_en/FAQ_en.md)
## WELCOME TO THE PaddleOCR TECHNICAL EXCHANGE GROUP
WeChat: paddlehelp. Add the note "OCR" and our assistant will invite you into the group.
<img src="./doc/paddlehelp.jpg" width = "200" height = "200" />
## REFERENCES
```
1. EAST:
@inproceedings{zhou2017east,
title={EAST: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={5551--5560},
year={2017}
}
2. DB:
@article{liao2019real,
title={Real-time Scene Text Detection with Differentiable Binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
journal={arXiv preprint arXiv:1911.08947},
year={2019}
}
3. DTRB:
@inproceedings{baek2019wrong,
title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={4715--4723},
year={2019}
}
4. SAST:
@inproceedings{wang2019single,
title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
pages={1277--1285},
year={2019}
}
5. SRN:
@article{yu2020towards,
title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
journal={arXiv preprint arXiv:2003.12294},
year={2020}
}
6. end2end-psl:
@inproceedings{sun2019chinese,
title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9086--9095},
year={2019}
}
```
## LICENSE
This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>
## CONTRIBUTION
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) for contributing the English documentation.
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing a new visualization function, adding .gitignore, and discarding the manual setting of PYTHONPATH.
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure.

17
__init__.py Normal file
View File

@ -0,0 +1,17 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
__all__ = ['PaddleOCR', 'draw_ocr']
from .paddleocr import PaddleOCR
from .tools.infer.utility import draw_ocr
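The exports above (`PaddleOCR`, `draw_ocr`) are what the new whl package exposes. As a rough orientation, a minimal usage sketch might look like the following; the constructor flag `use_angle_cls`, the `cls` argument, and the result layout `[box, (text, score)]` are assumptions based on the whl documentation of this period, not something this diff shows.

```python
# Hedged usage sketch for the paddleocr whl package (assumes `pip install paddleocr`).
# Only the two symbols exported above are relied upon: PaddleOCR and draw_ocr.
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

ocr = PaddleOCR(use_angle_cls=True)      # assumed flag enabling the direction classifier
img_path = 'doc/imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)     # assumed: each entry is [box, (text, score)]

for box, (text, score) in result:
    print(f'{text} ({score:.3f})')

# Optional visualization with the exported draw_ocr helper; its default font path
# points into the repository (doc/simfang.ttf), so run this from the repo root.
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
texts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
vis = draw_ocr(image, boxes, texts, scores)
Image.fromarray(vis).save('ocr_result.jpg')
```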

44
configs/cls/cls_mv3.yml Executable file
View File

@ -0,0 +1,44 @@
Global:
algorithm: CLS
use_gpu: False
epoch_num: 100
log_smooth_window: 20
print_batch_step: 100
save_model_dir: output/cls_mv3
save_epoch_step: 3
eval_batch_step: 500
train_batch_size_per_card: 512
test_batch_size_per_card: 512
image_shape: [3, 48, 192]
label_list: ['0','180']
distort: True
reader_yml: ./configs/cls/cls_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.cls_model,ClsModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.35
model_name: small
Head:
function: ppocr.modeling.heads.cls_head,ClsHead
class_dim: 2
Loss:
function: ppocr.modeling.losses.cls_loss,ClsLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 1169
total_epoch: 100

13
configs/cls/cls_reader.yml Executable file
View File

@ -0,0 +1,13 @@
TrainReader:
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data/cls
label_file_path: ./train_data/cls/train.txt
EvalReader:
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader
img_set_dir: ./train_data/cls
label_file_path: ./train_data/cls/test.txt
TestReader:
reader_function: ppocr.data.cls.dataset_traversal,SimpleReader

View File

@ -49,6 +49,6 @@ Optimizer:
PostProcess:
function: ppocr.postprocess.db_postprocess,DBPostProcess
thresh: 0.3
box_thresh: 0.7 box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 2.0 unclip_ratio: 1.5

59
configs/det/det_mv3_db_v1.1.yml Executable file
View File

@ -0,0 +1,59 @@
Global:
algorithm: DB
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_db/
save_epoch_step: 200
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [4000, 5000]
train_batch_size_per_card: 16
test_batch_size_per_card: 16
image_shape: [3, 640, 640]
reader_yml: ./configs/det/det_db_icdar15_reader.yml
pretrain_weights: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
checkpoints:
save_res_path: ./output/det_db/predicts_db.txt
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: large
disable_se: true
Head:
function: ppocr.modeling.heads.det_db_head,DBHead
model_name: large
k: 50
inner_channels: 96
out_channels: 2
Loss:
function: ppocr.modeling.losses.det_db_loss,DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 16
total_epoch: 1200
PostProcess:
function: ppocr.postprocess.db_postprocess,DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5

View File

@ -0,0 +1,57 @@
Global:
algorithm: DB
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_r_18_vd_db/
save_epoch_step: 200
eval_batch_step: [3000, 2000]
train_batch_size_per_card: 8
test_batch_size_per_card: 1
image_shape: [3, 640, 640]
reader_yml: ./configs/det/det_db_icdar15_reader.yml
pretrain_weights: ./pretrain_models/ResNet18_vd_pretrained/
save_res_path: ./output/det_r18_vd_db/predicts_db.txt
checkpoints:
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_resnet_vd,ResNet
layers: 18
Head:
function: ppocr.modeling.heads.det_db_head,DBHead
model_name: large
k: 50
inner_channels: 256
out_channels: 2
Loss:
function: ppocr.modeling.losses.det_db_loss,DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 32
total_epoch: 1200
PostProcess:
function: ppocr.postprocess.db_postprocess,DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5

View File

@ -0,0 +1,50 @@
Global:
algorithm: SAST
use_gpu: true
epoch_num: 2000
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_sast/
save_epoch_step: 20
eval_batch_step: 5000
train_batch_size_per_card: 8
test_batch_size_per_card: 8
image_shape: [3, 512, 512]
reader_yml: ./configs/det/det_sast_icdar15_reader.yml
pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/
save_res_path: ./output/det_sast/predicts_sast.txt
checkpoints:
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet
layers: 50
Head:
function: ppocr.modeling.heads.det_sast_head,SASTHead
model_name: large
only_fpn_up: False
# with_cab: False
with_cab: True
Loss:
function: ppocr.modeling.losses.det_sast_loss,SASTLoss
Optimizer:
function: ppocr.optimizer,RMSProp
base_lr: 0.001
decay:
function: piecewise_decay
boundaries: [30000, 50000, 80000, 100000, 150000]
decay_rate: 0.3
PostProcess:
function: ppocr.postprocess.sast_postprocess,SASTPostProcess
score_thresh: 0.5
sample_pts_num: 2
nms_thresh: 0.2
expand_scale: 1.0
shrink_ratio_of_width: 0.3

View File

@ -0,0 +1,50 @@
Global:
algorithm: SAST
use_gpu: true
epoch_num: 2000
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/det_sast/
save_epoch_step: 20
eval_batch_step: 5000
train_batch_size_per_card: 8
test_batch_size_per_card: 1
image_shape: [3, 512, 512]
reader_yml: ./configs/det/det_sast_totaltext_reader.yml
pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/
save_res_path: ./output/det_sast/predicts_sast.txt
checkpoints:
save_inference_dir:
Architecture:
function: ppocr.modeling.architectures.det_model,DetModel
Backbone:
function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet
layers: 50
Head:
function: ppocr.modeling.heads.det_sast_head,SASTHead
model_name: large
only_fpn_up: False
# with_cab: False
with_cab: True
Loss:
function: ppocr.modeling.losses.det_sast_loss,SASTLoss
Optimizer:
function: ppocr.optimizer,RMSProp
base_lr: 0.001
decay:
function: piecewise_decay
boundaries: [30000, 50000, 80000, 100000, 150000]
decay_rate: 0.3
PostProcess:
function: ppocr.postprocess.sast_postprocess,SASTPostProcess
score_thresh: 0.5
sample_pts_num: 6
nms_thresh: 0.2
expand_scale: 1.2
shrink_ratio_of_width: 0.2

View File

@ -0,0 +1,24 @@
TrainReader:
reader_function: ppocr.data.det.dataset_traversal,TrainReader
process_function: ppocr.data.det.sast_process,SASTProcessTrain
num_workers: 8
img_set_dir: ./train_data/
label_file_path: [./train_data/icdar2013/train_label_json.txt, ./train_data/icdar2015/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt]
data_ratio_list: [0.1, 0.45, 0.3, 0.15]
min_crop_side_ratio: 0.3
min_crop_size: 24
min_text_size: 4
max_text_size: 512
EvalReader:
reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
process_function: ppocr.data.det.sast_process,SASTProcessTest
img_set_dir: ./train_data/icdar2015/text_localization/
label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
max_side_len: 1536
TestReader:
reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
process_function: ppocr.data.det.sast_process,SASTProcessTest
infer_img: ./train_data/icdar2015/text_localization/ch4_test_images/img_11.jpg
max_side_len: 1536

View File

@ -0,0 +1,24 @@
TrainReader:
reader_function: ppocr.data.det.dataset_traversal,TrainReader
process_function: ppocr.data.det.sast_process,SASTProcessTrain
num_workers: 8
img_set_dir: ./train_data/
label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
data_ratio_list: [0.5, 0.5]
min_crop_side_ratio: 0.3
min_crop_size: 24
min_text_size: 4
max_text_size: 512
EvalReader:
reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
process_function: ppocr.data.det.sast_process,SASTProcessTest
img_set_dir: ./train_data/
label_file_path: ./train_data/total_text_icdar_14pt/test_label_json.txt
max_side_len: 768
TestReader:
reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
process_function: ppocr.data.det.sast_process,SASTProcessTest
infer_img: ./train_data/afs/total_text/Images/Test/img623.jpg
max_side_len: 768

View File

@ -0,0 +1,52 @@
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_CRNN
save_epoch_step: 3
eval_batch_step: 2000
train_batch_size_per_card: 128
test_batch_size_per_card: 128
image_shape: [3, 32, 320]
max_text_length: 25
character_type: ch
character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
loss_type: ctc
distort: true
use_space_char: true
reader_yml: ./configs/rec/rec_chinese_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_resnet_vd,ResNet
layers: 34
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
fc_decay: 0.00004
SeqRNN:
hidden_size: 256
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.0005
l2_decay: 0.00004
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 254
total_epoch: 500
warmup_minibatch: 1000

View File

@ -0,0 +1,54 @@
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_CRNN
save_epoch_step: 3
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: ch
character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
loss_type: ctc
distort: true
use_space_char: true
reader_yml: ./configs/rec/rec_chinese_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
fc_decay: 0.00001
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.0005
l2_decay: 0.00001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
step_each_epoch: 254
total_epoch: 500
warmup_minibatch: 1000

View File

@ -0,0 +1,53 @@
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/en_number
save_epoch_step: 3
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 30
character_type: ch
character_dict_path: ./ppocr/utils/ic15_dict.txt
loss_type: ctc
distort: false
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_en_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay_warmup
warmup_minibatch: 1000
step_each_epoch: 6530
total_epoch: 500

View File

@ -0,0 +1,13 @@
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/en_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/en_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader

View File

@ -0,0 +1,52 @@
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_french
save_epoch_step: 1
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: french
character_dict_path: ./ppocr/utils/french_dict.txt
loss_type: ctc
distort: true
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 254
total_epoch: 500
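
Like the other CRNN recipes, this config fixes the input at `image_shape: [3, 32, 320]`: crops are scaled to height 32 with aspect ratio preserved, then right-padded with zeros up to width 320. A sketch of that preprocessing (the [-1, 1] normalization with mean/std 0.5 is an assumption based on the usual CRNN pipeline):

    import cv2
    import numpy as np

    def resize_norm_img(img, image_shape=(3, 32, 320)):
        c, h, w = image_shape
        ratio = img.shape[1] / float(img.shape[0])
        resized_w = min(w, max(1, int(np.ceil(h * ratio))))
        resized = cv2.resize(img, (resized_w, h)).astype("float32") / 255.0
        resized = (resized - 0.5) / 0.5           # normalize to [-1, 1]
        padded = np.zeros((h, w, c), dtype="float32")
        padded[:, :resized_w, :] = resized        # right-pad to the fixed width
        return padded.transpose(2, 0, 1)          # HWC -> CHW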

View File

@ -0,0 +1,13 @@
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/french_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/french_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader

View File

@ -0,0 +1,52 @@
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_german
save_epoch_step: 1
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: german
character_dict_path: ./ppocr/utils/german_dict.txt
loss_type: ctc
distort: true
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_ger_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 254
total_epoch: 500

View File

@ -0,0 +1,13 @@
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/de_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/de_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader

View File

@ -0,0 +1,52 @@
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_japan
save_epoch_step: 1
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: japan
character_dict_path: ./ppocr/utils/japan_dict.txt
loss_type: ctc
distort: true
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_japan_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 254
total_epoch: 500

View File

@ -0,0 +1,13 @@
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/japan_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/japan_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader

View File

@ -0,0 +1,52 @@
Global:
algorithm: CRNN
use_gpu: true
epoch_num: 500
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_korean
save_epoch_step: 1
eval_batch_step: 2000
train_batch_size_per_card: 256
test_batch_size_per_card: 256
image_shape: [3, 32, 320]
max_text_length: 25
character_type: korean
character_dict_path: ./ppocr/utils/korean_dict.txt
loss_type: ctc
distort: true
use_space_char: false
reader_yml: ./configs/rec/multi_languages/rec_korean_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3
scale: 0.5
model_name: small
small_stride: [1, 2, 2, 2]
Head:
function: ppocr.modeling.heads.rec_ctc_head,CTCPredict
encoder_type: rnn
SeqRNN:
hidden_size: 48
Loss:
function: ppocr.modeling.losses.rec_ctc_loss,CTCLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
l2_decay: 0.00001
base_lr: 0.001
beta1: 0.9
beta2: 0.999
decay:
function: cosine_decay
step_each_epoch: 254
total_epoch: 500

View File

@ -0,0 +1,13 @@
TrainReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
num_workers: 8
img_set_dir: ./train_data
label_file_path: ./train_data/korean_train.txt
EvalReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
img_set_dir: ./train_data
label_file_path: ./train_data/korean_eval.txt
TestReader:
reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
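
The French, German, Japanese and Korean recipes above differ only in `character_type`, the dictionary file, the reader's label files and the output directory. A small sketch that walks those per-language assets and reports their sizes (paths copied from the configs; adjust to your data layout):

    LANGS = {
        "french": ("./ppocr/utils/french_dict.txt", "./train_data/french_train.txt"),
        "german": ("./ppocr/utils/german_dict.txt", "./train_data/de_train.txt"),
        "japan":  ("./ppocr/utils/japan_dict.txt",  "./train_data/japan_train.txt"),
        "korean": ("./ppocr/utils/korean_dict.txt", "./train_data/korean_train.txt"),
    }

    for lang, (dict_path, label_path) in LANGS.items():
        with open(dict_path, encoding="utf-8") as f:
            n_chars = sum(1 for _ in f)
        with open(label_path, encoding="utf-8") as f:
            n_samples = sum(1 for _ in f)
        print(f"{lang}: {n_chars} characters, {n_samples} training samples")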

View File

@ -0,0 +1,49 @@
Global:
algorithm: SRN
use_gpu: true
epoch_num: 72
log_smooth_window: 20
print_batch_step: 10
save_model_dir: output/rec_pvam_withrotate
save_epoch_step: 1
eval_batch_step: 8000
train_batch_size_per_card: 64
test_batch_size_per_card: 1
image_shape: [1, 64, 256]
max_text_length: 25
character_type: en
loss_type: srn
num_heads: 8
average_window: 0.15
max_average_window: 15625
min_average_window: 10000
reader_yml: ./configs/rec/rec_benchmark_reader.yml
pretrain_weights:
checkpoints:
save_inference_dir:
infer_img:
Architecture:
function: ppocr.modeling.architectures.rec_model,RecModel
Backbone:
function: ppocr.modeling.backbones.rec_resnet_fpn,ResNet
layers: 50
Head:
function: ppocr.modeling.heads.rec_srn_all_head,SRNPredict
encoder_type: rnn
num_encoder_TUs: 2
num_decoder_TUs: 4
hidden_dims: 512
SeqRNN:
hidden_size: 256
Loss:
function: ppocr.modeling.losses.rec_srn_loss,SRNLoss
Optimizer:
function: ppocr.optimizer,AdamDecay
base_lr: 0.0001
beta1: 0.9
beta2: 0.999
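
Unlike the CRNN configs, this SRN recipe carries `average_window`, `min_average_window` and `max_average_window`, which drive weight averaging over recent training steps; in Paddle's 1.x static-graph API this maps to `fluid.optimizer.ModelAverage`. A hedged sketch (treat the exact call and usage as an assumption):

    import paddle.fluid as fluid

    # Maintain a moving average of parameters during training, using the
    # window sizes from the SRN config above.
    model_average = fluid.optimizer.ModelAverage(
        0.15,                       # average_window
        min_average_window=10000,
        max_average_window=15625)

    # At evaluation time the averaged weights are applied temporarily, e.g.:
    # with model_average.apply(exe):
    #     run_evaluation()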

View File

@ -1,6 +1,6 @@
# How to test quickly
### 1. Install the latest version of Android Studio
It can be downloaded from https://developer.android.com/studio. This demo was written with Android Studio 4.0.
### 2. Install NDK 20 or above
The demo was tested with NDK 20b; any NDK version from 20 up compiles successfully.

View File

@ -3,11 +3,11 @@ import java.security.MessageDigest
apply plugin: 'com.android.application'
android {
    compileSdkVersion 29
    defaultConfig {
        applicationId "com.baidu.paddle.lite.demo.ocr"
        minSdkVersion 23
        targetSdkVersion 29
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
@ -39,9 +39,8 @@ android {
dependencies {
    implementation fileTree(include: ['*.jar'], dir: 'libs')
    implementation 'androidx.appcompat:appcompat:1.1.0'
    implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
    testImplementation 'junit:junit:4.12'
    androidTestImplementation 'com.android.support.test:runner:1.0.2'
    androidTestImplementation 'com.android.support.test.espresso:espresso-core:3.0.2'

View File

@ -14,10 +14,10 @@
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/AppTheme">
        <!-- to test MiniActivity, change this to com.baidu.paddle.lite.demo.ocr.MiniActivity -->
        <activity android:name="com.baidu.paddle.lite.demo.ocr.MainActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
                <category android:name="android.intent.category.LAUNCHER"/>
            </intent-filter>
        </activity>
@ -25,6 +25,15 @@
            android:name="com.baidu.paddle.lite.demo.ocr.SettingsActivity"
            android:label="Settings">
        </activity>
        <provider
            android:name="androidx.core.content.FileProvider"
            android:authorities="com.baidu.paddle.lite.demo.ocr.fileprovider"
            android:exported="false"
            android:grantUriPermissions="true">
            <meta-data
                android:name="android.support.FILE_PROVIDER_PATHS"
                android:resource="@xml/file_paths"></meta-data>
        </provider>
    </application>
</manifest>

Three binary image files added (198 KiB, 171 KiB and 61 KiB); previews not shown.

View File

@ -4,112 +4,111 @@
#include "native.h" #include "native.h"
#include "ocr_ppredictor.h" #include "ocr_ppredictor.h"
#include <string>
#include <algorithm> #include <algorithm>
#include <paddle_api.h> #include <paddle_api.h>
#include <string>
static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode); static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode);
extern "C" extern "C" JNIEXPORT jlong JNICALL
JNIEXPORT jlong JNICALL Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init(
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_init(JNIEnv *env, jobject thiz, JNIEnv *env, jobject thiz, jstring j_det_model_path,
jstring j_det_model_path, jstring j_rec_model_path, jstring j_cls_model_path, jint j_thread_num,
jstring j_rec_model_path, jstring j_cpu_mode) {
jint j_thread_num, std::string det_model_path = jstring_to_cpp_string(env, j_det_model_path);
jstring j_cpu_mode) { std::string rec_model_path = jstring_to_cpp_string(env, j_rec_model_path);
std::string det_model_path = jstring_to_cpp_string(env, j_det_model_path); std::string cls_model_path = jstring_to_cpp_string(env, j_cls_model_path);
std::string rec_model_path = jstring_to_cpp_string(env, j_rec_model_path); int thread_num = j_thread_num;
int thread_num = j_thread_num; std::string cpu_mode = jstring_to_cpp_string(env, j_cpu_mode);
std::string cpu_mode = jstring_to_cpp_string(env, j_cpu_mode); ppredictor::OCR_Config conf;
ppredictor::OCR_Config conf; conf.thread_num = thread_num;
conf.thread_num = thread_num; conf.mode = str_to_cpu_mode(cpu_mode);
conf.mode = str_to_cpu_mode(cpu_mode); ppredictor::OCR_PPredictor *orc_predictor =
ppredictor::OCR_PPredictor *orc_predictor = new ppredictor::OCR_PPredictor{conf}; new ppredictor::OCR_PPredictor{conf};
orc_predictor->init_from_file(det_model_path, rec_model_path); orc_predictor->init_from_file(det_model_path, rec_model_path, cls_model_path);
return reinterpret_cast<jlong>(orc_predictor); return reinterpret_cast<jlong>(orc_predictor);
} }
/** /**
* "LITE_POWER_HIGH" paddle::lite_api::LITE_POWER_HIGH * "LITE_POWER_HIGH" convert to paddle::lite_api::LITE_POWER_HIGH
* @param cpu_mode * @param cpu_mode
* @return * @return
*/ */
static paddle::lite_api::PowerMode str_to_cpu_mode(const std::string &cpu_mode) { static paddle::lite_api::PowerMode
static std::map<std::string, paddle::lite_api::PowerMode> cpu_mode_map{ str_to_cpu_mode(const std::string &cpu_mode) {
{"LITE_POWER_HIGH", paddle::lite_api::LITE_POWER_HIGH}, static std::map<std::string, paddle::lite_api::PowerMode> cpu_mode_map{
{"LITE_POWER_LOW", paddle::lite_api::LITE_POWER_HIGH}, {"LITE_POWER_HIGH", paddle::lite_api::LITE_POWER_HIGH},
{"LITE_POWER_FULL", paddle::lite_api::LITE_POWER_FULL}, {"LITE_POWER_LOW", paddle::lite_api::LITE_POWER_HIGH},
{"LITE_POWER_NO_BIND", paddle::lite_api::LITE_POWER_NO_BIND}, {"LITE_POWER_FULL", paddle::lite_api::LITE_POWER_FULL},
{"LITE_POWER_RAND_HIGH", paddle::lite_api::LITE_POWER_RAND_HIGH}, {"LITE_POWER_NO_BIND", paddle::lite_api::LITE_POWER_NO_BIND},
{"LITE_POWER_RAND_LOW", paddle::lite_api::LITE_POWER_RAND_LOW} {"LITE_POWER_RAND_HIGH", paddle::lite_api::LITE_POWER_RAND_HIGH},
}; {"LITE_POWER_RAND_LOW", paddle::lite_api::LITE_POWER_RAND_LOW}};
std::string upper_key; std::string upper_key;
std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(), ::toupper); std::transform(cpu_mode.cbegin(), cpu_mode.cend(), upper_key.begin(),
auto index = cpu_mode_map.find(upper_key); ::toupper);
if (index == cpu_mode_map.end()) { auto index = cpu_mode_map.find(upper_key);
LOGE("cpu_mode not found %s", upper_key.c_str()); if (index == cpu_mode_map.end()) {
return paddle::lite_api::LITE_POWER_HIGH; LOGE("cpu_mode not found %s", upper_key.c_str());
} else { return paddle::lite_api::LITE_POWER_HIGH;
return index->second; } else {
} return index->second;
}
} }
extern "C" extern "C" JNIEXPORT jfloatArray JNICALL
JNIEXPORT jfloatArray JNICALL Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_forward(
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_forward(JNIEnv *env, jobject thiz, JNIEnv *env, jobject thiz, jlong java_pointer, jfloatArray buf,
jlong java_pointer, jfloatArray buf, jfloatArray ddims, jobject original_image) {
jfloatArray ddims, LOGI("begin to run native forward");
jobject original_image) { if (java_pointer == 0) {
LOGI("begin to run native forward"); LOGE("JAVA pointer is NULL");
if (java_pointer == 0) { return cpp_array_to_jfloatarray(env, nullptr, 0);
LOGE("JAVA pointer is NULL"); }
return cpp_array_to_jfloatarray(env, nullptr, 0); cv::Mat origin = bitmap_to_cv_mat(env, original_image);
} if (origin.size == 0) {
cv::Mat origin = bitmap_to_cv_mat(env, original_image); LOGE("origin bitmap cannot convert to CV Mat");
if (origin.size == 0) { return cpp_array_to_jfloatarray(env, nullptr, 0);
LOGE("origin bitmap cannot convert to CV Mat"); }
return cpp_array_to_jfloatarray(env, nullptr, 0); ppredictor::OCR_PPredictor *ppredictor =
} (ppredictor::OCR_PPredictor *)java_pointer;
ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *) java_pointer; std::vector<float> dims_float_arr = jfloatarray_to_float_vector(env, ddims);
std::vector<float> dims_float_arr = jfloatarray_to_float_vector(env, ddims); std::vector<int64_t> dims_arr;
std::vector<int64_t> dims_arr; dims_arr.resize(dims_float_arr.size());
dims_arr.resize(dims_float_arr.size()); std::copy(dims_float_arr.cbegin(), dims_float_arr.cend(), dims_arr.begin());
std::copy(dims_float_arr.cbegin(), dims_float_arr.cend(), dims_arr.begin());
// 这里值有点大就不调用jfloatarray_to_float_vector了 // 这里值有点大就不调用jfloatarray_to_float_vector了
int64_t buf_len = (int64_t) env->GetArrayLength(buf); int64_t buf_len = (int64_t)env->GetArrayLength(buf);
jfloat *buf_data = env->GetFloatArrayElements(buf, JNI_FALSE); jfloat *buf_data = env->GetFloatArrayElements(buf, JNI_FALSE);
float *data = (jfloat *) buf_data; float *data = (jfloat *)buf_data;
std::vector<ppredictor::OCRPredictResult> results = ppredictor->infer_ocr(dims_arr, data, std::vector<ppredictor::OCRPredictResult> results =
buf_len, ppredictor->infer_ocr(dims_arr, data, buf_len, NET_OCR, origin);
NET_OCR, origin); LOGI("infer_ocr finished with boxes %ld", results.size());
LOGI("infer_ocr finished with boxes %ld", results.size()); // 这里将std::vector<ppredictor::OCRPredictResult> 序列化成
// 这里将std::vector<ppredictor::OCRPredictResult> 序列化成 float数组传输到java层再反序列化 // float数组传输到java层再反序列化
std::vector<float> float_arr; std::vector<float> float_arr;
for (const ppredictor::OCRPredictResult &r :results) { for (const ppredictor::OCRPredictResult &r : results) {
float_arr.push_back(r.points.size()); float_arr.push_back(r.points.size());
float_arr.push_back(r.word_index.size()); float_arr.push_back(r.word_index.size());
float_arr.push_back(r.score); float_arr.push_back(r.score);
for (const std::vector<int> &point : r.points) { for (const std::vector<int> &point : r.points) {
float_arr.push_back(point.at(0)); float_arr.push_back(point.at(0));
float_arr.push_back(point.at(1)); float_arr.push_back(point.at(1));
}
for (int index: r.word_index) {
float_arr.push_back(index);
}
} }
return cpp_array_to_jfloatarray(env, float_arr.data(), float_arr.size()); for (int index : r.word_index) {
float_arr.push_back(index);
}
}
return cpp_array_to_jfloatarray(env, float_arr.data(), float_arr.size());
} }
extern "C" extern "C" JNIEXPORT void JNICALL
JNIEXPORT void JNICALL Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release(
Java_com_baidu_paddle_lite_demo_ocr_OCRPredictorNative_release(JNIEnv *env, jobject thiz, JNIEnv *env, jobject thiz, jlong java_pointer) {
jlong java_pointer){ if (java_pointer == 0) {
if (java_pointer == 0) { LOGE("JAVA pointer is NULL");
LOGE("JAVA pointer is NULL"); return;
return; }
} ppredictor::OCR_PPredictor *ppredictor =
ppredictor::OCR_PPredictor *ppredictor = (ppredictor::OCR_PPredictor *) java_pointer; (ppredictor::OCR_PPredictor *)java_pointer;
delete ppredictor; delete ppredictor;
} }

View File

@ -0,0 +1,46 @@
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "ocr_cls_process.h"
#include <cmath>
#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>
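// Input shape expected by the text-direction classifier: {channels, height, width}.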
const std::vector<int> CLS_IMAGE_SHAPE = {3, 32, 100};
cv::Mat cls_resize_img(const cv::Mat &img) {
int imgC = CLS_IMAGE_SHAPE[0];
int imgW = CLS_IMAGE_SHAPE[2];
int imgH = CLS_IMAGE_SHAPE[1];
float ratio = float(img.cols) / float(img.rows);
int resize_w = 0;
if (ceilf(imgH * ratio) > imgW)
resize_w = imgW;
else
resize_w = int(ceilf(imgH * ratio));
cv::Mat resize_img;
cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
cv::INTER_CUBIC);
if (resize_w < imgW) {
cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, int(imgW - resize_w),
cv::BORDER_CONSTANT, {0, 0, 0});
}
return resize_img;
}

View File

@ -0,0 +1,23 @@
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "common.h"
#include <opencv2/opencv.hpp>
#include <vector>
extern const std::vector<int> CLS_IMAGE_SHAPE;
cv::Mat cls_resize_img(const cv::Mat &img);

View File

@ -3,184 +3,237 @@
// //
#include "ocr_ppredictor.h" #include "ocr_ppredictor.h"
#include "preprocess.h"
#include "common.h" #include "common.h"
#include "ocr_db_post_process.h" #include "ocr_cls_process.h"
#include "ocr_crnn_process.h" #include "ocr_crnn_process.h"
#include "ocr_db_post_process.h"
#include "preprocess.h"
namespace ppredictor { namespace ppredictor {
OCR_PPredictor::OCR_PPredictor(const OCR_Config &config) : _config(config) { OCR_PPredictor::OCR_PPredictor(const OCR_Config &config) : _config(config) {}
int OCR_PPredictor::init(const std::string &det_model_content,
const std::string &rec_model_content,
const std::string &cls_model_content) {
_det_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR, _config.mode});
_det_predictor->init_nb(det_model_content);
_rec_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_rec_predictor->init_nb(rec_model_content);
_cls_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_cls_predictor->init_nb(cls_model_content);
return RETURN_OK;
} }
int int OCR_PPredictor::init_from_file(const std::string &det_model_path,
OCR_PPredictor::init(const std::string &det_model_content, const std::string &rec_model_content) { const std::string &rec_model_path,
_det_predictor = std::unique_ptr<PPredictor>( const std::string &cls_model_path) {
new PPredictor{_config.thread_num, NET_OCR, _config.mode}); _det_predictor = std::unique_ptr<PPredictor>(
_det_predictor->init_nb(det_model_content); new PPredictor{_config.thread_num, NET_OCR, _config.mode});
_det_predictor->init_from_file(det_model_path);
_rec_predictor = std::unique_ptr<PPredictor>( _rec_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode}); new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_rec_predictor->init_nb(rec_model_content); _rec_predictor->init_from_file(rec_model_path);
return RETURN_OK;
}
int OCR_PPredictor::init_from_file(const std::string &det_model_path, const std::string &rec_model_path){ _cls_predictor = std::unique_ptr<PPredictor>(
_det_predictor = std::unique_ptr<PPredictor>( new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
new PPredictor{_config.thread_num, NET_OCR, _config.mode}); _cls_predictor->init_from_file(cls_model_path);
_det_predictor->init_from_file(det_model_path); return RETURN_OK;
_rec_predictor = std::unique_ptr<PPredictor>(
new PPredictor{_config.thread_num, NET_OCR_INTERNAL, _config.mode});
_rec_predictor->init_from_file(rec_model_path);
return RETURN_OK;
} }
/** /**
* * for debug use, show result of First Step
* @param filter_boxes * @param filter_boxes
* @param boxes * @param boxes
* @param srcimg * @param srcimg
*/ */
static void visual_img(const std::vector<std::vector<std::vector<int>>> &filter_boxes, static void
const std::vector<std::vector<std::vector<int>>> &boxes, visual_img(const std::vector<std::vector<std::vector<int>>> &filter_boxes,
const cv::Mat &srcimg) { const std::vector<std::vector<std::vector<int>>> &boxes,
// visualization const cv::Mat &srcimg) {
cv::Point rook_points[filter_boxes.size()][4]; // visualization
for (int n = 0; n < filter_boxes.size(); n++) { cv::Point rook_points[filter_boxes.size()][4];
for (int m = 0; m < filter_boxes[0].size(); m++) { for (int n = 0; n < filter_boxes.size(); n++) {
rook_points[n][m] = cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1])); for (int m = 0; m < filter_boxes[0].size(); m++) {
} rook_points[n][m] =
cv::Point(int(filter_boxes[n][m][0]), int(filter_boxes[n][m][1]));
} }
}
cv::Mat img_vis; cv::Mat img_vis;
srcimg.copyTo(img_vis); srcimg.copyTo(img_vis);
for (int n = 0; n < boxes.size(); n++) { for (int n = 0; n < boxes.size(); n++) {
const cv::Point *ppt[1] = {rook_points[n]}; const cv::Point *ppt[1] = {rook_points[n]};
int npt[] = {4}; int npt[] = {4};
cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0); cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0);
} }
// 调试用,自行替换需要修改的路径 // 调试用,自行替换需要修改的路径
cv::imwrite("/sdcard/1/vis.png", img_vis); cv::imwrite("/sdcard/1/vis.png", img_vis);
} }
std::vector<OCRPredictResult> std::vector<OCRPredictResult>
OCR_PPredictor::infer_ocr(const std::vector<int64_t> &dims, const float *input_data, int input_len, OCR_PPredictor::infer_ocr(const std::vector<int64_t> &dims,
int net_flag, cv::Mat &origin) { const float *input_data, int input_len, int net_flag,
cv::Mat &origin) {
PredictorInput input = _det_predictor->get_first_input();
input.set_dims(dims);
input.set_data(input_data, input_len);
std::vector<PredictorOutput> results = _det_predictor->infer();
PredictorOutput &res = results.at(0);
std::vector<std::vector<std::vector<int>>> filtered_box = calc_filtered_boxes(
res.get_float_data(), res.get_size(), (int)dims[2], (int)dims[3], origin);
LOGI("Filter_box size %ld", filtered_box.size());
return infer_rec(filtered_box, origin);
}
PredictorInput input = _det_predictor->get_first_input(); std::vector<OCRPredictResult> OCR_PPredictor::infer_rec(
const std::vector<std::vector<std::vector<int>>> &boxes,
const cv::Mat &origin_img) {
std::vector<float> mean = {0.5f, 0.5f, 0.5f};
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
std::vector<int64_t> dims = {1, 3, 0, 0};
std::vector<OCRPredictResult> ocr_results;
PredictorInput input = _rec_predictor->get_first_input();
for (auto bp = boxes.crbegin(); bp != boxes.crend(); ++bp) {
const std::vector<std::vector<int>> &box = *bp;
cv::Mat crop_img = get_rotate_crop_image(origin_img, box);
crop_img = infer_cls(crop_img);
float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
cv::Mat input_image = crnn_resize_img(crop_img, wh_ratio);
input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
const float *dimg = reinterpret_cast<const float *>(input_image.data);
int input_size = input_image.rows * input_image.cols;
dims[2] = input_image.rows;
dims[3] = input_image.cols;
input.set_dims(dims); input.set_dims(dims);
input.set_data(input_data, input_len);
std::vector<PredictorOutput> results = _det_predictor->infer(); neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean,
PredictorOutput &res = results.at(0); scale);
std::vector<std::vector<std::vector<int>>> filtered_box
= calc_filtered_boxes(res.get_float_data(), res.get_size(), (int) dims[2], (int) dims[3], std::vector<PredictorOutput> results = _rec_predictor->infer();
origin);
LOGI("Filter_box size %ld", filtered_box.size()); OCRPredictResult res;
return infer_rec(filtered_box, origin); res.word_index = postprocess_rec_word_index(results.at(0));
if (res.word_index.empty()) {
continue;
}
res.score = postprocess_rec_score(results.at(1));
res.points = box;
ocr_results.emplace_back(std::move(res));
}
LOGI("ocr_results finished %lu", ocr_results.size());
return ocr_results;
} }
std::vector<OCRPredictResult> cv::Mat OCR_PPredictor::infer_cls(const cv::Mat &img, float thresh) {
OCR_PPredictor::infer_rec(const std::vector<std::vector<std::vector<int>>> &boxes, std::vector<float> mean = {0.5f, 0.5f, 0.5f};
const cv::Mat &origin_img) { std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
std::vector<float> mean = {0.5f, 0.5f, 0.5f}; std::vector<int64_t> dims = {1, 3, 0, 0};
std::vector<float> scale = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f}; std::vector<OCRPredictResult> ocr_results;
std::vector<int64_t> dims = {1, 3, 0, 0};
std::vector<OCRPredictResult> ocr_results;
PredictorInput input = _rec_predictor->get_first_input(); PredictorInput input = _cls_predictor->get_first_input();
for (auto bp = boxes.crbegin(); bp != boxes.crend(); ++bp) {
const std::vector<std::vector<int>> &box = *bp;
cv::Mat crop_img = get_rotate_crop_image(origin_img, box);
float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
cv::Mat input_image = crnn_resize_img(crop_img, wh_ratio);
input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
const float *dimg = reinterpret_cast<const float *>(input_image.data);
int input_size = input_image.rows * input_image.cols;
dims[2] = input_image.rows; cv::Mat input_image = cls_resize_img(img);
dims[3] = input_image.cols; input_image.convertTo(input_image, CV_32FC3, 1 / 255.0f);
input.set_dims(dims); const float *dimg = reinterpret_cast<const float *>(input_image.data);
int input_size = input_image.rows * input_image.cols;
neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean, scale); dims[2] = input_image.rows;
dims[3] = input_image.cols;
input.set_dims(dims);
std::vector<PredictorOutput> results = _rec_predictor->infer(); neon_mean_scale(dimg, input.get_mutable_float_data(), input_size, mean,
scale);
OCRPredictResult res; std::vector<PredictorOutput> results = _cls_predictor->infer();
res.word_index = postprocess_rec_word_index(results.at(0));
if (res.word_index.empty()) { const float *scores = results.at(0).get_float_data();
continue; const int *labels = results.at(1).get_int_data();
} for (int64_t i = 0; i < results.at(0).get_size(); i++) {
res.score = postprocess_rec_score(results.at(1)); LOGI("output scores [%f]", scores[i]);
res.points = box; }
ocr_results.emplace_back(std::move(res)); for (int64_t i = 0; i < results.at(1).get_size(); i++) {
} LOGI("output label [%d]", labels[i]);
LOGI("ocr_results finished %lu", ocr_results.size()); }
return ocr_results; int label_idx = labels[0];
float score = scores[label_idx];
cv::Mat srcimg;
img.copyTo(srcimg);
if (label_idx % 2 == 1 && score > thresh) {
cv::rotate(srcimg, srcimg, 1);
}
return srcimg;
} }
std::vector<std::vector<std::vector<int>>> std::vector<std::vector<std::vector<int>>>
OCR_PPredictor::calc_filtered_boxes(const float *pred, int pred_size, int output_height, OCR_PPredictor::calc_filtered_boxes(const float *pred, int pred_size,
int output_width, const cv::Mat &origin) { int output_height, int output_width,
const double threshold = 0.3; const cv::Mat &origin) {
const double maxvalue = 1; const double threshold = 0.3;
const double maxvalue = 1;
cv::Mat pred_map = cv::Mat::zeros(output_height, output_width, CV_32F); cv::Mat pred_map = cv::Mat::zeros(output_height, output_width, CV_32F);
memcpy(pred_map.data, pred, pred_size * sizeof(float)); memcpy(pred_map.data, pred, pred_size * sizeof(float));
cv::Mat cbuf_map; cv::Mat cbuf_map;
pred_map.convertTo(cbuf_map, CV_8UC1); pred_map.convertTo(cbuf_map, CV_8UC1);
cv::Mat bit_map; cv::Mat bit_map;
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY); cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
std::vector<std::vector<std::vector<int>>> boxes = boxes_from_bitmap(pred_map, bit_map); std::vector<std::vector<std::vector<int>>> boxes =
float ratio_h = output_height * 1.0f / origin.rows; boxes_from_bitmap(pred_map, bit_map);
float ratio_w = output_width * 1.0f / origin.cols; float ratio_h = output_height * 1.0f / origin.rows;
std::vector<std::vector<std::vector<int>>> filter_boxes = filter_tag_det_res(boxes, ratio_h, float ratio_w = output_width * 1.0f / origin.cols;
ratio_w, origin); std::vector<std::vector<std::vector<int>>> filter_boxes =
return filter_boxes; filter_tag_det_res(boxes, ratio_h, ratio_w, origin);
return filter_boxes;
} }
std::vector<int> OCR_PPredictor::postprocess_rec_word_index(const PredictorOutput &res) { std::vector<int>
const int *rec_idx = res.get_int_data(); OCR_PPredictor::postprocess_rec_word_index(const PredictorOutput &res) {
const std::vector<std::vector<uint64_t>> rec_idx_lod = res.get_lod(); const int *rec_idx = res.get_int_data();
const std::vector<std::vector<uint64_t>> rec_idx_lod = res.get_lod();
std::vector<int> pred_idx; std::vector<int> pred_idx;
for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) { for (int n = int(rec_idx_lod[0][0]); n < int(rec_idx_lod[0][1] * 2); n += 2) {
pred_idx.emplace_back(rec_idx[n]); pred_idx.emplace_back(rec_idx[n]);
} }
return pred_idx; return pred_idx;
} }
float OCR_PPredictor::postprocess_rec_score(const PredictorOutput &res) { float OCR_PPredictor::postprocess_rec_score(const PredictorOutput &res) {
const float *predict_batch = res.get_float_data(); const float *predict_batch = res.get_float_data();
const std::vector<int64_t> predict_shape = res.get_shape(); const std::vector<int64_t> predict_shape = res.get_shape();
const std::vector<std::vector<uint64_t>> predict_lod = res.get_lod(); const std::vector<std::vector<uint64_t>> predict_lod = res.get_lod();
int blank = predict_shape[1]; int blank = predict_shape[1];
float score = 0.f; float score = 0.f;
int count = 0; int count = 0;
for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) { for (int n = predict_lod[0][0]; n < predict_lod[0][1] - 1; n++) {
int argmax_idx = argmax(predict_batch + n * predict_shape[1], int argmax_idx = argmax(predict_batch + n * predict_shape[1],
predict_batch + (n + 1) * predict_shape[1]); predict_batch + (n + 1) * predict_shape[1]);
float max_value = predict_batch[n * predict_shape[1] + argmax_idx]; float max_value = predict_batch[n * predict_shape[1] + argmax_idx];
if (blank - 1 - argmax_idx > 1e-5) { if (blank - 1 - argmax_idx > 1e-5) {
score += max_value; score += max_value;
count += 1; count += 1;
}
} }
if (count == 0) { }
LOGE("calc score count 0"); if (count == 0) {
} else { LOGE("calc score count 0");
score /= count; } else {
} score /= count;
LOGI("calc score: %f", score); }
return score; LOGI("calc score: %f", score);
return score;
} }
NET_TYPE OCR_PPredictor::get_net_flag() const { return NET_OCR; }
NET_TYPE OCR_PPredictor::get_net_flag() const {
return NET_OCR;
}
} }

View File

@ -4,109 +4,119 @@
#pragma once
#include "ppredictor.h"
#include <opencv2/opencv.hpp>
#include <paddle_api.h>
#include <string>
namespace ppredictor {
/**
 * Config
 */
struct OCR_Config {
  int thread_num = 4; // Thread num
  paddle::lite_api::PowerMode mode =
      paddle::lite_api::LITE_POWER_HIGH; // PaddleLite Mode
};
/**
 * Polygon result
 */
struct OCRPredictResult {
  std::vector<int> word_index;
  std::vector<std::vector<int>> points;
  float score;
};
/**
 * The OCR pipeline contains two models:
 * 1. The first model (det) finds polygons marking where the text is
 * 2. Crops cut from the original image along those polygons are fed to the
 *    second model (rec) for recognition
 */
class OCR_PPredictor : public PPredictor_Interface {
public:
  OCR_PPredictor(const OCR_Config &config);
  virtual ~OCR_PPredictor() {}
  /**
   * Initialize the det/rec/cls predictors from in-memory model buffers
   * @param det_model_content
   * @param rec_model_content
   * @param cls_model_content
   * @return
   */
  int init(const std::string &det_model_content,
           const std::string &rec_model_content,
           const std::string &cls_model_content);
  int init_from_file(const std::string &det_model_path,
                     const std::string &rec_model_path,
                     const std::string &cls_model_path);
  /**
   * Return the OCR result
   * @param dims
   * @param input_data
   * @param input_len
   * @param net_flag
   * @param origin
   * @return
   */
  virtual std::vector<OCRPredictResult>
  infer_ocr(const std::vector<int64_t> &dims, const float *input_data,
            int input_len, int net_flag, cv::Mat &origin);

  virtual NET_TYPE get_net_flag() const;

private:
  /**
   * Calculate polygons from the output map of the first (detection) model
   * @param pred
   * @param output_height
   * @param output_width
   * @param origin
   * @return
   */
  std::vector<std::vector<std::vector<int>>>
  calc_filtered_boxes(const float *pred, int pred_size, int output_height,
                      int output_width, const cv::Mat &origin);
  /**
   * Run the second (recognition) model on the detected boxes
   * @param boxes
   * @param origin
   * @return
   */
  std::vector<OCRPredictResult>
  infer_rec(const std::vector<std::vector<std::vector<int>>> &boxes,
            const cv::Mat &origin);
  /**
   * Run the direction classifier (cls) model on a crop
   * @param origin
   * @param thresh
   * @return
   */
  cv::Mat infer_cls(const cv::Mat &origin, float thresh = 0.5);
  /**
   * Postprocess the output of the second model to extract the text indices
   * @param res
   * @return
   */
  std::vector<int> postprocess_rec_word_index(const PredictorOutput &res);
  /**
   * Calculate the confidence of the second model's text result
   * @param res
   * @return
   */
  float postprocess_rec_score(const PredictorOutput &res);

  std::unique_ptr<PPredictor> _det_predictor;
  std::unique_ptr<PPredictor> _rec_predictor;
  std::unique_ptr<PPredictor> _cls_predictor;
  OCR_Config _config;
};
}

View File

@ -7,7 +7,7 @@
namespace ppredictor {
/**
 * PaddleLite Predictor Common Interface
 */
class PPredictor_Interface {
public:
@ -21,7 +21,7 @@ public:
};
/**
 * Common Predictor
 */
class PPredictor : public PPredictor_Interface {
public:
@ -33,9 +33,9 @@ public:
  }
  /**
   * Init from a PaddleLite opt model (nb format); use either this or init_paddle
   * @param model_content
   * @return 0 on success
   */
  virtual int init_nb(const std::string &model_content);

View File

@ -21,10 +21,10 @@ public:
  const std::vector<std::vector<uint64_t>> get_lod() const;
  const std::vector<int64_t> get_shape() const;

  std::vector<float> data;      // most layers return float; otherwise data_int is used
  std::vector<int> data_int;    // a few layers return int instead of float
  std::vector<int64_t> shape;   // shape of the PaddleLite output tensor
  std::vector<std::vector<uint64_t>> lod; // lod of the PaddleLite output tensor

private:
  std::unique_ptr<const paddle::lite_api::Tensor> _tensor;

View File

@ -19,15 +19,16 @@ package com.baidu.paddle.lite.demo.ocr;
import android.content.res.Configuration;
import android.os.Bundle;
import android.preference.PreferenceActivity;
import android.view.MenuInflater;
import android.view.View;
import android.view.ViewGroup;

import androidx.annotation.LayoutRes;
import androidx.annotation.Nullable;
import androidx.appcompat.app.ActionBar;
import androidx.appcompat.app.AppCompatDelegate;
import androidx.appcompat.widget.Toolbar;

/**
 * A {@link PreferenceActivity} which implements and proxies the necessary calls
 * to be used with AppCompat.

View File

@ -3,23 +3,22 @@ package com.baidu.paddle.lite.demo.ocr;
import android.Manifest; import android.Manifest;
import android.app.ProgressDialog; import android.app.ProgressDialog;
import android.content.ContentResolver; import android.content.ContentResolver;
import android.content.Context;
import android.content.Intent; import android.content.Intent;
import android.content.SharedPreferences; import android.content.SharedPreferences;
import android.content.pm.PackageManager; import android.content.pm.PackageManager;
import android.database.Cursor; import android.database.Cursor;
import android.graphics.Bitmap; import android.graphics.Bitmap;
import android.graphics.BitmapFactory; import android.graphics.BitmapFactory;
import android.media.ExifInterface;
import android.net.Uri; import android.net.Uri;
import android.os.Bundle; import android.os.Bundle;
import android.os.Environment;
import android.os.Handler; import android.os.Handler;
import android.os.HandlerThread; import android.os.HandlerThread;
import android.os.Message; import android.os.Message;
import android.preference.PreferenceManager; import android.preference.PreferenceManager;
import android.provider.MediaStore; import android.provider.MediaStore;
import android.support.annotation.NonNull;
import android.support.v4.app.ActivityCompat;
import android.support.v4.content.ContextCompat;
import android.support.v7.app.AppCompatActivity;
import android.text.method.ScrollingMovementMethod; import android.text.method.ScrollingMovementMethod;
import android.util.Log; import android.util.Log;
import android.view.Menu; import android.view.Menu;
@ -29,9 +28,17 @@ import android.widget.ImageView;
import android.widget.TextView; import android.widget.TextView;
import android.widget.Toast; import android.widget.Toast;
import androidx.annotation.NonNull;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;
import androidx.core.content.FileProvider;
import java.io.File; import java.io.File;
import java.io.IOException; import java.io.IOException;
import java.io.InputStream; import java.io.InputStream;
import java.text.SimpleDateFormat;
import java.util.Date;
public class MainActivity extends AppCompatActivity { public class MainActivity extends AppCompatActivity {
private static final String TAG = MainActivity.class.getSimpleName(); private static final String TAG = MainActivity.class.getSimpleName();
@ -69,6 +76,7 @@ public class MainActivity extends AppCompatActivity {
protected float[] inputMean = new float[]{}; protected float[] inputMean = new float[]{};
protected float[] inputStd = new float[]{}; protected float[] inputStd = new float[]{};
protected float scoreThreshold = 0.1f; protected float scoreThreshold = 0.1f;
private String currentPhotoPath;
protected Predictor predictor = new Predictor(); protected Predictor predictor = new Predictor();
@ -368,18 +376,56 @@ public class MainActivity extends AppCompatActivity {
} }
private void takePhoto() { private void takePhoto() {
Intent takePhotoIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE); Intent takePictureIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
if (takePhotoIntent.resolveActivity(getPackageManager()) != null) { // Ensure that there's a camera activity to handle the intent
startActivityForResult(takePhotoIntent, TAKE_PHOTO_REQUEST_CODE); if (takePictureIntent.resolveActivity(getPackageManager()) != null) {
// Create the File where the photo should go
File photoFile = null;
try {
photoFile = createImageFile();
} catch (IOException ex) {
Log.e("MainActitity", ex.getMessage(), ex);
Toast.makeText(MainActivity.this,
"Create Camera temp file failed: " + ex.getMessage(), Toast.LENGTH_SHORT).show();
}
// Continue only if the File was successfully created
if (photoFile != null) {
Log.i(TAG, "FILEPATH " + getExternalFilesDir("Pictures").getAbsolutePath());
Uri photoURI = FileProvider.getUriForFile(this,
"com.baidu.paddle.lite.demo.ocr.fileprovider",
photoFile);
currentPhotoPath = photoFile.getAbsolutePath();
takePictureIntent.putExtra(MediaStore.EXTRA_OUTPUT, photoURI);
startActivityForResult(takePictureIntent, TAKE_PHOTO_REQUEST_CODE);
Log.i(TAG, "startActivityForResult finished");
}
} }
}
private File createImageFile() throws IOException {
// Create an image file name
String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
String imageFileName = "JPEG_" + timeStamp + "_";
File storageDir = getExternalFilesDir(Environment.DIRECTORY_PICTURES);
File image = File.createTempFile(
imageFileName, /* prefix */
".bmp", /* suffix */
storageDir /* directory */
);
return image;
} }
@Override @Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) { protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data); super.onActivityResult(requestCode, resultCode, data);
if (resultCode == RESULT_OK && data != null) { if (resultCode == RESULT_OK) {
switch (requestCode) { switch (requestCode) {
case OPEN_GALLERY_REQUEST_CODE: case OPEN_GALLERY_REQUEST_CODE:
if (data == null) {
break;
}
try { try {
ContentResolver resolver = getContentResolver(); ContentResolver resolver = getContentResolver();
Uri uri = data.getData(); Uri uri = data.getData();
@ -393,9 +439,22 @@ public class MainActivity extends AppCompatActivity {
} }
break; break;
case TAKE_PHOTO_REQUEST_CODE: case TAKE_PHOTO_REQUEST_CODE:
Bundle extras = data.getExtras(); if (currentPhotoPath != null) {
Bitmap image = (Bitmap) extras.get("data"); ExifInterface exif = null;
onImageChanged(image); try {
exif = new ExifInterface(currentPhotoPath);
} catch (IOException e) {
e.printStackTrace();
}
int orientation = exif.getAttributeInt(ExifInterface.TAG_ORIENTATION,
ExifInterface.ORIENTATION_UNDEFINED);
Log.i(TAG, "rotation " + orientation);
Bitmap image = BitmapFactory.decodeFile(currentPhotoPath);
image = Utils.rotateBitmap(image, orientation);
onImageChanged(image);
} else {
Log.e(TAG, "currentPhotoPath is null");
}
break; break;
default: default:
break; break;

View File

@ -0,0 +1,157 @@
package com.baidu.paddle.lite.demo.ocr;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Build;
import android.os.Bundle;
import android.os.Handler;
import android.os.HandlerThread;
import android.os.Message;
import android.util.Log;
import android.view.View;
import android.widget.Button;
import android.widget.ImageView;
import android.widget.TextView;
import android.widget.Toast;
import androidx.appcompat.app.AppCompatActivity;
import java.io.IOException;
import java.io.InputStream;
public class MiniActivity extends AppCompatActivity {
public static final int REQUEST_LOAD_MODEL = 0;
public static final int REQUEST_RUN_MODEL = 1;
public static final int REQUEST_UNLOAD_MODEL = 2;
public static final int RESPONSE_LOAD_MODEL_SUCCESSED = 0;
public static final int RESPONSE_LOAD_MODEL_FAILED = 1;
public static final int RESPONSE_RUN_MODEL_SUCCESSED = 2;
public static final int RESPONSE_RUN_MODEL_FAILED = 3;
private static final String TAG = "MiniActivity";
protected Handler receiver = null; // Receive messages from worker thread
protected Handler sender = null; // Send command to worker thread
protected HandlerThread worker = null; // Worker thread to load&run model
protected volatile Predictor predictor = null;
private String assetModelDirPath = "models/ocr_v1_for_cpu";
private String assetlabelFilePath = "labels/ppocr_keys_v1.txt";
private Button button;
private ImageView imageView; // image result
private TextView textView; // text result
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_mini);
Log.i(TAG, "SHOW in Logcat");
// Prepare the worker thread for model loading and inference
worker = new HandlerThread("Predictor Worker");
worker.start();
sender = new Handler(worker.getLooper()) {
public void handleMessage(Message msg) {
switch (msg.what) {
case REQUEST_LOAD_MODEL:
// Load model and reload test image
if (!onLoadModel()) {
runOnUiThread(new Runnable() {
@Override
public void run() {
Toast.makeText(MiniActivity.this, "Load model failed!", Toast.LENGTH_SHORT).show();
}
});
}
break;
case REQUEST_RUN_MODEL:
// Run model if model is loaded
final boolean isSuccessed = onRunModel();
runOnUiThread(new Runnable() {
@Override
public void run() {
if (isSuccessed){
onRunModelSuccessed();
}else{
Toast.makeText(MiniActivity.this, "Run model failed!", Toast.LENGTH_SHORT).show();
}
}
});
break;
}
}
};
sender.sendEmptyMessage(REQUEST_LOAD_MODEL); // corresponding to REQUEST_LOAD_MODEL to call onLoadModel()
imageView = findViewById(R.id.imageView);
textView = findViewById(R.id.sample_text);
button = findViewById(R.id.button);
button.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View v) {
sender.sendEmptyMessage(REQUEST_RUN_MODEL);
}
});
}
@Override
protected void onDestroy() {
onUnloadModel();
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.JELLY_BEAN_MR2) {
worker.quitSafely();
} else {
worker.quit();
}
super.onDestroy();
}
/**
* call in onCreate, model init
*
* @return
*/
private boolean onLoadModel() {
if (predictor == null) {
predictor = new Predictor();
}
return predictor.init(this, assetModelDirPath, assetlabelFilePath);
}
/**
 * Run inference on the bundled test image; triggered by the button click
 *
 * @return true if the model is loaded and the run succeeds
 */
private boolean onRunModel() {
try {
String assetImagePath = "images/5.jpg";
InputStream imageStream = getAssets().open(assetImagePath);
Bitmap image = BitmapFactory.decodeStream(imageStream);
// Input is Bitmap
predictor.setInputImage(image);
return predictor.isLoaded() && predictor.runModel();
} catch (IOException e) {
e.printStackTrace();
return false;
}
}
private void onRunModelSuccessed() {
Log.i(TAG, "onRunModelSuccessed");
textView.setText(predictor.outputResult);
imageView.setImageBitmap(predictor.outputImage);
}
private void onUnloadModel() {
if (predictor != null) {
predictor.releaseModel();
}
}
}

View File

@ -29,16 +29,16 @@ public class OCRPredictorNative {
    public OCRPredictorNative(Config config) {
        this.config = config;
        loadLibrary();
        nativePointer = init(config.detModelFilename, config.recModelFilename, config.clsModelFilename,
                config.cpuThreadNum, config.cpuPower);
        Log.i("OCRPredictorNative", "load success " + nativePointer);
    }

    public void release() {
        if (nativePointer != 0) {
            nativePointer = 0;
            // destory(nativePointer);
        }
    }
@ -55,10 +55,11 @@ public class OCRPredictorNative {
        public String cpuPower;
        public String detModelFilename;
        public String recModelFilename;
        public String clsModelFilename;
    }

    protected native long init(String detModelPath, String recModelPath, String clsModelPath, int threadNum, String cpuMode);

    protected native float[] forward(long pointer, float[] buf, float[] ddims, Bitmap originalImage);

View File

@ -38,7 +38,7 @@ public class Predictor {
protected float scoreThreshold = 0.1f; protected float scoreThreshold = 0.1f;
protected Bitmap inputImage = null; protected Bitmap inputImage = null;
protected Bitmap outputImage = null; protected Bitmap outputImage = null;
protected String outputResult = ""; protected volatile String outputResult = "";
protected float preprocessTime = 0; protected float preprocessTime = 0;
protected float postprocessTime = 0; protected float postprocessTime = 0;
@ -46,6 +46,16 @@ public class Predictor {
public Predictor() { public Predictor() {
} }
public boolean init(Context appCtx, String modelPath, String labelPath) {
isLoaded = loadModel(appCtx, modelPath, cpuThreadNum, cpuPowerMode);
if (!isLoaded) {
return false;
}
isLoaded = loadLabel(appCtx, labelPath);
return isLoaded;
}
public boolean init(Context appCtx, String modelPath, String labelPath, int cpuThreadNum, String cpuPowerMode, public boolean init(Context appCtx, String modelPath, String labelPath, int cpuThreadNum, String cpuPowerMode,
String inputColorFormat, String inputColorFormat,
long[] inputShape, float[] inputMean, long[] inputShape, float[] inputMean,
@ -76,11 +86,7 @@ public class Predictor {
Log.e(TAG, "Only BGR color format is supported."); Log.e(TAG, "Only BGR color format is supported.");
return false; return false;
} }
isLoaded = loadModel(appCtx, modelPath, cpuThreadNum, cpuPowerMode); boolean isLoaded = init(appCtx, modelPath, labelPath);
if (!isLoaded) {
return false;
}
isLoaded = loadLabel(appCtx, labelPath);
if (!isLoaded) { if (!isLoaded) {
return false; return false;
} }
@ -115,7 +121,8 @@ public class Predictor {
config.cpuThreadNum = cpuThreadNum; config.cpuThreadNum = cpuThreadNum;
config.detModelFilename = realPath + File.separator + "ch_det_mv3_db_opt.nb"; config.detModelFilename = realPath + File.separator + "ch_det_mv3_db_opt.nb";
config.recModelFilename = realPath + File.separator + "ch_rec_mv3_crnn_opt.nb"; config.recModelFilename = realPath + File.separator + "ch_rec_mv3_crnn_opt.nb";
Log.e("Predictor", "model path" + config.detModelFilename + " ; " + config.recModelFilename); config.clsModelFilename = realPath + File.separator + "cls_opt_arm.nb";
Log.e("Predictor", "model path" + config.detModelFilename + " ; " + config.recModelFilename + ";" + config.clsModelFilename);
config.cpuPower = cpuPowerMode; config.cpuPower = cpuPowerMode;
paddlePredictor = new OCRPredictorNative(config); paddlePredictor = new OCRPredictorNative(config);
@ -127,12 +134,12 @@ public class Predictor {
} }
public void releaseModel() { public void releaseModel() {
if (paddlePredictor != null){ if (paddlePredictor != null) {
paddlePredictor.release(); paddlePredictor.release();
paddlePredictor = null; paddlePredictor = null;
} }
isLoaded = false; isLoaded = false;
cpuThreadNum = 4; cpuThreadNum = 1;
cpuPowerMode = "LITE_POWER_HIGH"; cpuPowerMode = "LITE_POWER_HIGH";
modelPath = ""; modelPath = "";
modelName = ""; modelName = "";
@ -222,7 +229,7 @@ public class Predictor {
for (int i = 0; i < warmupIterNum; i++) { for (int i = 0; i < warmupIterNum; i++) {
paddlePredictor.runImage(inputData, width, height, channels, inputImage); paddlePredictor.runImage(inputData, width, height, channels, inputImage);
} }
warmupIterNum = 0; // 之后不要再warm了 warmupIterNum = 0; // do not need warm
// Run inference // Run inference
start = new Date(); start = new Date();
ArrayList<OcrResultModel> results = paddlePredictor.runImage(inputData, width, height, channels, inputImage); ArrayList<OcrResultModel> results = paddlePredictor.runImage(inputData, width, height, channels, inputImage);
@ -287,9 +294,7 @@ public class Predictor {
if (image == null) { if (image == null) {
return; return;
} }
// Scale image to the size of input tensor this.inputImage = image.copy(Bitmap.Config.ARGB_8888, true);
Bitmap rgbaImage = image.copy(Bitmap.Config.ARGB_8888, true);
this.inputImage = rgbaImage;
} }
private ArrayList<OcrResultModel> postprocess(ArrayList<OcrResultModel> results) { private ArrayList<OcrResultModel> postprocess(ArrayList<OcrResultModel> results) {
@ -310,7 +315,7 @@ public class Predictor {
private void drawResults(ArrayList<OcrResultModel> results) { private void drawResults(ArrayList<OcrResultModel> results) {
StringBuffer outputResultSb = new StringBuffer(""); StringBuffer outputResultSb = new StringBuffer("");
for (int i=0;i<results.size();i++) { for (int i = 0; i < results.size(); i++) {
OcrResultModel result = results.get(i); OcrResultModel result = results.get(i);
StringBuilder sb = new StringBuilder(""); StringBuilder sb = new StringBuilder("");
sb.append(result.getLabel()); sb.append(result.getLabel());
@ -319,8 +324,8 @@ public class Predictor {
for (Point p : result.getPoints()) { for (Point p : result.getPoints()) {
sb.append("(").append(p.x).append(",").append(p.y).append(") "); sb.append("(").append(p.x).append(",").append(p.y).append(") ");
} }
Log.i(TAG, sb.toString()); Log.i(TAG, sb.toString()); // show LOG in Logcat panel
outputResultSb.append(i+1).append(": ").append(result.getLabel()).append("\n"); outputResultSb.append(i + 1).append(": ").append(result.getLabel()).append("\n");
} }
outputResult = outputResultSb.toString(); outputResult = outputResultSb.toString();
outputImage = inputImage; outputImage = inputImage;

View File

@ -5,7 +5,8 @@ import android.os.Bundle;
import android.preference.CheckBoxPreference; import android.preference.CheckBoxPreference;
import android.preference.EditTextPreference; import android.preference.EditTextPreference;
import android.preference.ListPreference; import android.preference.ListPreference;
import android.support.v7.app.ActionBar;
import androidx.appcompat.app.ActionBar;
import java.util.ArrayList; import java.util.ArrayList;
import java.util.List; import java.util.List;

View File

@ -2,6 +2,8 @@ package com.baidu.paddle.lite.demo.ocr;
import android.content.Context; import android.content.Context;
import android.graphics.Bitmap; import android.graphics.Bitmap;
import android.graphics.Matrix;
import android.media.ExifInterface;
import android.os.Environment; import android.os.Environment;
import java.io.*; import java.io.*;
@ -110,4 +112,48 @@ public class Utils {
} }
return Bitmap.createScaledBitmap(bitmap, newWidth, newHeight, true); return Bitmap.createScaledBitmap(bitmap, newWidth, newHeight, true);
} }
public static Bitmap rotateBitmap(Bitmap bitmap, int orientation) {
Matrix matrix = new Matrix();
switch (orientation) {
case ExifInterface.ORIENTATION_NORMAL:
return bitmap;
case ExifInterface.ORIENTATION_FLIP_HORIZONTAL:
matrix.setScale(-1, 1);
break;
case ExifInterface.ORIENTATION_ROTATE_180:
matrix.setRotate(180);
break;
case ExifInterface.ORIENTATION_FLIP_VERTICAL:
matrix.setRotate(180);
matrix.postScale(-1, 1);
break;
case ExifInterface.ORIENTATION_TRANSPOSE:
matrix.setRotate(90);
matrix.postScale(-1, 1);
break;
case ExifInterface.ORIENTATION_ROTATE_90:
matrix.setRotate(90);
break;
case ExifInterface.ORIENTATION_TRANSVERSE:
matrix.setRotate(-90);
matrix.postScale(-1, 1);
break;
case ExifInterface.ORIENTATION_ROTATE_270:
matrix.setRotate(-90);
break;
default:
return bitmap;
}
try {
Bitmap bmRotated = Bitmap.createBitmap(bitmap, 0, 0, bitmap.getWidth(), bitmap.getHeight(), matrix, true);
bitmap.recycle();
return bmRotated;
}
catch (OutOfMemoryError e) {
e.printStackTrace();
return null;
}
}
} }

View File

@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android" <androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto" xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools" xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent" android:layout_width="match_parent"
@ -96,4 +96,4 @@
</RelativeLayout> </RelativeLayout>
</android.support.constraint.ConstraintLayout> </androidx.constraintlayout.widget.ConstraintLayout>

View File

@ -0,0 +1,46 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- for MiniActivity Use Only -->
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintLeft_toRightOf="parent"
tools:context=".MainActivity">
<TextView
android:id="@+id/sample_text"
android:layout_width="0dp"
android:layout_height="wrap_content"
android:text="Hello World!"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
app:layout_constraintTop_toBottomOf="@id/imageView"
android:scrollbars="vertical"
/>
<ImageView
android:id="@+id/imageView"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:paddingTop="20dp"
android:paddingBottom="20dp"
app:layout_constraintBottom_toTopOf="@id/imageView"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
app:layout_constraintTop_toTopOf="parent"
tools:srcCompat="@tools:sample/avatars" />
<Button
android:id="@+id/button"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_marginBottom="4dp"
android:text="Button"
app:layout_constraintBottom_toBottomOf="parent"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
tools:layout_editor_absoluteX="161dp" />
</androidx.constraintlayout.widget.ConstraintLayout>

View File

@ -0,0 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<paths xmlns:android="http://schemas.android.com/apk/res/android">
<external-files-path name="my_images" path="Pictures" />
</paths>

View File

@ -1,4 +1,4 @@
#Thu Aug 22 15:05:37 CST 2019 #Wed Jul 22 23:48:44 CST 2020
distributionBase=GRADLE_USER_HOME distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME zipStoreBase=GRADLE_USER_HOME

View File

@ -1,8 +1,17 @@
project(ocr_system CXX C) project(ocr_system CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON) option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF) option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON) option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF) option(WITH_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_LIB "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
SET(CUDNN_LIB "" CACHE PATH "Location of libraries")
SET(TENSORRT_DIR "" CACHE PATH "Compile demo with TensorRT")
set(DEMO_NAME "ocr_system")
macro(safe_set_static_flag) macro(safe_set_static_flag)
@ -15,24 +24,60 @@ macro(safe_set_static_flag)
endforeach(flag_var) endforeach(flag_var)
endmacro() endmacro()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -g -fpermissive") if (WITH_MKL)
set(CMAKE_STATIC_LIBRARY_PREFIX "") ADD_DEFINITIONS(-DUSE_MKL)
message("flags" ${CMAKE_CXX_FLAGS}) endif()
set(CMAKE_CXX_FLAGS_RELEASE "-O3")
if(NOT DEFINED PADDLE_LIB) if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib") message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif() endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name") if(NOT DEFINED OPENCV_DIR)
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif() endif()
set(OPENCV_DIR ${OPENCV_DIR}) if (WIN32)
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH) include_directories("${PADDLE_LIB}/paddle/fluid/inference")
include_directories("${PADDLE_LIB}/paddle/include")
link_directories("${PADDLE_LIB}/paddle/fluid/inference")
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/build/ NO_DEFAULT_PATH)
else ()
find_package(OpenCV REQUIRED PATHS ${OPENCV_DIR}/share/OpenCV NO_DEFAULT_PATH)
include_directories("${PADDLE_LIB}/paddle/include")
link_directories("${PADDLE_LIB}/paddle/lib")
endif ()
include_directories(${OpenCV_INCLUDE_DIRS}) include_directories(${OpenCV_INCLUDE_DIRS})
include_directories("${PADDLE_LIB}/paddle/include") if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -o3 -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
message("flags" ${CMAKE_CXX_FLAGS})
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
include_directories("${PADDLE_LIB}/third_party/install/protobuf/include") include_directories("${PADDLE_LIB}/third_party/install/protobuf/include")
include_directories("${PADDLE_LIB}/third_party/install/glog/include") include_directories("${PADDLE_LIB}/third_party/install/glog/include")
include_directories("${PADDLE_LIB}/third_party/install/gflags/include") include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
@ -43,10 +88,12 @@ include_directories("${PADDLE_LIB}/third_party/eigen3")
include_directories("${CMAKE_SOURCE_DIR}/") include_directories("${CMAKE_SOURCE_DIR}/")
if (USE_TENSORRT AND WITH_GPU) if (NOT WIN32)
include_directories("${TENSORRT_ROOT}/include") if (WITH_TENSORRT AND WITH_GPU)
link_directories("${TENSORRT_ROOT}/lib") include_directories("${TENSORRT_DIR}/include")
endif() link_directories("${TENSORRT_DIR}/lib")
endif()
endif(NOT WIN32)
link_directories("${PADDLE_LIB}/third_party/install/zlib/lib") link_directories("${PADDLE_LIB}/third_party/install/zlib/lib")
@ -57,17 +104,24 @@ link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib")
link_directories("${PADDLE_LIB}/paddle/lib") link_directories("${PADDLE_LIB}/paddle/lib")
AUX_SOURCE_DIRECTORY(./src SRCS)
add_executable(${DEMO_NAME} ${SRCS})
if(WITH_MKL) if(WITH_MKL)
include_directories("${PADDLE_LIB}/third_party/install/mklml/include") include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} if (WIN32)
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX}) set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.lib
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
execute_process(COMMAND cp -r ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX} /usr/lib)
endif ()
set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn") set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH}) if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include") include_directories("${MKLDNN_PATH}/include")
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0) if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif() endif()
else() else()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX}) set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
@ -82,24 +136,66 @@ else()
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX}) ${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif() endif()
set(EXTERNAL_LIB "-lrt -ldl -lpthread -lm") if (NOT WIN32)
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
)
if(EXISTS "${PADDLE_LIB}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
if (EXISTS "${PADDLE_LIB}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash)
set(DEPS ${DEPS} libcmt shlwapi)
if (EXISTS "${PADDLE_LIB}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
if(EXISTS "${PADDLE_LIB}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
endif(NOT WIN32)
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
${EXTERNAL_LIB} ${OpenCV_LIBS})
if(WITH_GPU) if(WITH_GPU)
if (USE_TENSORRT) if(NOT WIN32)
set(DEPS ${DEPS} if (WITH_TENSORRT)
${TENSORRT_ROOT}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}) set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} set(DEPS ${DEPS} ${TENSORRT_DIR}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
${TENSORRT_ROOT}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}) endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif() endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/libcublas${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX} )
endif() endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-ldl -lrt -lgomp -lz -lm -lpthread")
set(DEPS ${DEPS} ${EXTERNAL_LIB})
endif()
set(DEPS ${DEPS} ${OpenCV_LIBS})
AUX_SOURCE_DIRECTORY(./src SRCS)
add_executable(${DEMO_NAME} ${SRCS})
target_link_libraries(${DEMO_NAME} ${DEPS}) target_link_libraries(${DEMO_NAME} ${DEPS})
if (WIN32 AND WITH_MKL)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_LIB}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
)
endif()

View File

@ -0,0 +1,95 @@
# Visual Studio 2019 Community CMake Compilation Guide
PaddleOCR has been tested on the Windows platform with `Visual Studio 2019 Community`. Microsoft has supported managing `CMake` cross-platform build projects directly since `Visual Studio 2017`, but stable and complete support only arrived with `2019`, so if you want to use CMake to manage the project build, we recommend building under `Visual Studio 2019`.
## Prerequisites
* Visual Studio 2019
* CUDA 9.0 / CUDA 10.0, cuDNN 7+ (required only when using the GPU version of the inference library)
* CMake 3.0+
Please make sure the above basic software is installed; we use the Community edition of `VS2019`.
**All the examples below use `D:\projects` as the working directory.**
### Step 1: Download the PaddlePaddle C++ inference library fluid_inference
The PaddlePaddle C++ inference library provides different precompiled packages for different `CPU` and `CUDA` versions. Please download the one that matches your environment: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/windows_cpp_inference.html)
After extraction, the `D:\projects\fluid_inference` directory contains:
```
fluid_inference
├── paddle # paddle core libraries and header files
|
├── third_party # third-party dependency libraries and header files
|
└── version.txt # version and build information
```
### Step 2: Install and configure OpenCV
1. Download version 3.4.6 for the Windows platform from the OpenCV official website: [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\projects\opencv`
3. Configure the environment variables as follows
- My Computer -> Properties -> Advanced system settings -> Environment Variables
- Find Path in the system variables (create it if it does not exist) and double-click to edit it
- Add a new entry with the opencv path and save it, e.g. `D:\projects\opencv\build\x64\vc14\bin`
### Step 3: Build the CMake project directly with Visual Studio 2019
1. Open Visual Studio 2019 Community and click `Continue without code`
![step2](https://paddleseg.bj.bcebos.com/inference/vs2019_step1.png)
2. Click `File` -> `Open` -> `CMake`
![step2.1](https://paddleseg.bj.bcebos.com/inference/vs2019_step2.png)
Select the path of the project code and open `CMakeList.txt`:
![step2.2](https://paddleseg.bj.bcebos.com/inference/vs2019_step3.png)
3. Click `Project` -> `CMake settings for cpp_inference_demo`
![step3](https://paddleseg.bj.bcebos.com/inference/vs2019_step4.png)
4. Click `Browse` and set the build options to specify the paths of `CUDA`, `CUDNN_LIB`, `OpenCV`, and the `Paddle inference library`
The four build parameters are explained below (parameters marked with `*` are only required when using the **GPU version** of the inference library; keep the CUDA library version consistent: **use CUDA 9.0 or 10.0, not 9.2, 10.1, etc.**)
| Parameter | Meaning |
| ---- | ---- |
| *CUDA_LIB | Path to the CUDA libraries |
| *CUDNN_LIB | Path to the cuDNN libraries |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_LIB | Path to the Paddle inference library |
**Note:**
1. If you use the `CPU` version of the inference library, uncheck `WITH_GPU`
2. If you use the `openblas` version, uncheck `WITH_MKL`
![step4](https://paddleseg.bj.bcebos.com/inference/vs2019_step5.png)
**After the settings are complete**, click `Save and generate CMake cache to load variables` in the figure above.
5. Click `Build` -> `Build All`
![step6](https://paddleseg.bj.bcebos.com/inference/vs2019_step6.png)
### Step 4: Inference and visualization
The executable produced by the `Visual Studio 2019` build above is located in the `out\build\x64-Release` directory. Open `cmd` and switch to that directory:
```
cd D:\projects\PaddleOCR\deploy\cpp_infer\out\build\x64-Release
```
The executable `ocr_system.exe` is the sample inference program. Its basic usage is as follows
```shell
# Run inference on the image `D:\projects\PaddleOCR\doc\imgs\10.jpg`
.\ocr_system.exe D:\projects\PaddleOCR\deploy\cpp_infer\tools\config.txt D:\projects\PaddleOCR\doc\imgs\10.jpg
```
The first parameter is the path of the configuration file, and the second parameter is the path of the image to be predicted.
### Note
* When running the exe in a Windows terminal, the output may appear garbled. In that case, type `CHCP 65001` in the terminal to switch its encoding from GBK (the default) to UTF-8. A more detailed explanation can be found in this blog post: [https://blog.csdn.net/qq_35038153/article/details/78430359](https://blog.csdn.net/qq_35038153/article/details/78430359).
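For example, a minimal session (reusing the paths from Step 4 above) might look like this:
```shell
# Switch the terminal code page from GBK to UTF-8 so recognized Chinese text is displayed correctly
CHCP 65001
cd D:\projects\PaddleOCR\deploy\cpp_infer\out\build\x64-Release
.\ocr_system.exe D:\projects\PaddleOCR\deploy\cpp_infer\tools\config.txt D:\projects\PaddleOCR\doc\imgs\10.jpg
```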

View File

@ -41,13 +41,15 @@ public:
this->use_mkldnn = bool(stoi(config_map_["use_mkldnn"])); this->use_mkldnn = bool(stoi(config_map_["use_mkldnn"]));
this->use_zero_copy_run = bool(stoi(config_map_["use_zero_copy_run"]));
this->max_side_len = stoi(config_map_["max_side_len"]); this->max_side_len = stoi(config_map_["max_side_len"]);
this->det_db_thresh = stod(config_map_["det_db_thresh"]); this->det_db_thresh = stod(config_map_["det_db_thresh"]);
this->det_db_box_thresh = stod(config_map_["det_db_box_thresh"]); this->det_db_box_thresh = stod(config_map_["det_db_box_thresh"]);
this->det_db_box_thresh = stod(config_map_["det_db_box_thresh"]); this->det_db_unclip_ratio = stod(config_map_["det_db_unclip_ratio"]);
this->det_model_dir.assign(config_map_["det_model_dir"]); this->det_model_dir.assign(config_map_["det_model_dir"]);
@ -55,6 +57,12 @@ public:
this->char_list_file.assign(config_map_["char_list_file"]); this->char_list_file.assign(config_map_["char_list_file"]);
this->use_angle_cls = bool(stoi(config_map_["use_angle_cls"]));
this->cls_model_dir.assign(config_map_["cls_model_dir"]);
this->cls_thresh = stod(config_map_["cls_thresh"]);
this->visualize = bool(stoi(config_map_["visualize"])); this->visualize = bool(stoi(config_map_["visualize"]));
} }
@ -68,6 +76,8 @@ public:
bool use_mkldnn = false; bool use_mkldnn = false;
bool use_zero_copy_run = false;
int max_side_len = 960; int max_side_len = 960;
double det_db_thresh = 0.3; double det_db_thresh = 0.3;
@ -80,8 +90,14 @@ public:
std::string rec_model_dir; std::string rec_model_dir;
bool use_angle_cls;
std::string char_list_file; std::string char_list_file;
std::string cls_model_dir;
double cls_thresh;
bool visualize = true; bool visualize = true;
void PrintConfigInfo(); void PrintConfigInfo();

View File

@ -0,0 +1,81 @@
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include "paddle_api.h"
#include "paddle_inference_api.h"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <numeric>
#include <include/preprocess_op.h>
#include <include/utility.h>
namespace PaddleOCR {
class Classifier {
public:
explicit Classifier(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const bool &use_zero_copy_run,
const double &cls_thresh) {
this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->cls_thresh = cls_thresh;
LoadModel(model_dir);
}
// Load Paddle inference model
void LoadModel(const std::string &model_dir);
cv::Mat Run(cv::Mat &img);
private:
std::shared_ptr<PaddlePredictor> predictor_;
bool use_gpu_ = false;
int gpu_id_ = 0;
int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
double cls_thresh = 0.5;
std::vector<float> mean_ = {0.5f, 0.5f, 0.5f};
std::vector<float> scale_ = {1 / 0.5f, 1 / 0.5f, 1 / 0.5f};
bool is_scale_ = true;
// pre-process
ClsResizeImg resize_op_;
Normalize normalize_op_;
Permute permute_op_;
}; // class Classifier
} // namespace PaddleOCR

View File

@ -39,8 +39,8 @@ public:
explicit DBDetector(const std::string &model_dir, const bool &use_gpu, explicit DBDetector(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem, const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads, const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const int &max_side_len, const bool &use_mkldnn, const bool &use_zero_copy_run,
const double &det_db_thresh, const int &max_side_len, const double &det_db_thresh,
const double &det_db_box_thresh, const double &det_db_box_thresh,
const double &det_db_unclip_ratio, const double &det_db_unclip_ratio,
const bool &visualize) { const bool &visualize) {
@ -49,6 +49,7 @@ public:
this->gpu_mem_ = gpu_mem; this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads; this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn; this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->max_side_len_ = max_side_len; this->max_side_len_ = max_side_len;
@ -75,6 +76,7 @@ private:
int gpu_mem_ = 4000; int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4; int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false; bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
int max_side_len_ = 960; int max_side_len_ = 960;

View File

@ -27,6 +27,7 @@
#include <fstream> #include <fstream>
#include <numeric> #include <numeric>
#include <include/ocr_cls.h>
#include <include/postprocess_op.h> #include <include/postprocess_op.h>
#include <include/preprocess_op.h> #include <include/preprocess_op.h>
#include <include/utility.h> #include <include/utility.h>
@ -38,14 +39,17 @@ public:
explicit CRNNRecognizer(const std::string &model_dir, const bool &use_gpu, explicit CRNNRecognizer(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem, const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads, const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const string &label_path) { const bool &use_mkldnn, const bool &use_zero_copy_run,
const string &label_path) {
this->use_gpu_ = use_gpu; this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id; this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem; this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads; this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn; this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->label_list_ = Utility::ReadDict(label_path); this->label_list_ = Utility::ReadDict(label_path);
this->label_list_.push_back(" ");
LoadModel(model_dir); LoadModel(model_dir);
} }
@ -53,7 +57,8 @@ public:
// Load Paddle inference model // Load Paddle inference model
void LoadModel(const std::string &model_dir); void LoadModel(const std::string &model_dir);
void Run(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat &img); void Run(std::vector<std::vector<std::vector<int>>> boxes, cv::Mat &img,
Classifier *cls);
private: private:
std::shared_ptr<PaddlePredictor> predictor_; std::shared_ptr<PaddlePredictor> predictor_;
@ -63,6 +68,7 @@ private:
int gpu_mem_ = 4000; int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4; int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false; bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
std::vector<std::string> label_list_; std::vector<std::string> label_list_;
@ -83,4 +89,4 @@ private:
}; // class CrnnRecognizer }; // class CrnnRecognizer
} // namespace PaddleOCR } // namespace PaddleOCR

View File

@ -56,4 +56,10 @@ public:
const std::vector<int> &rec_image_shape = {3, 32, 320}); const std::vector<int> &rec_image_shape = {3, 32, 320});
}; };
class ClsResizeImg {
public:
virtual void Run(const cv::Mat &img, cv::Mat &resize_img,
const std::vector<int> &rec_image_shape = {3, 32, 320});
};
} // namespace PaddleOCR } // namespace PaddleOCR

View File

@ -7,6 +7,9 @@
### Preparation ### Preparation
- Linux environment, docker is recommended. - Linux environment, docker is recommended.
- The Windows environment currently supports compilation based on `Visual Studio 2019 Community`.
* This document mainly describes the PaddleOCR C++ inference workflow in a Linux environment. If you need to run C++ inference based on the inference library on Windows, please refer to the [Windows compilation tutorial](./docs/windows_vs2019_build.md) for the specific build steps.
### 1.1 Compile the opencv library ### 1.1 Compile the opencv library
@ -184,12 +187,15 @@ make -j
### Run the demo ### Run the demo
* Run the following command to perform OCR detection and recognition on an image; the final output * Run the following command to perform OCR detection and recognition on an image
```shell ```shell
sh tools/run.sh sh tools/run.sh
``` ```
* If you need to use the direction classifier, set the `use_angle_cls` parameter in `tools/config.txt` to 1 to enable direction classification during inference.
The detection results will then be printed on the screen as follows. The detection results will then be printed on the screen as follows.
<div align="center"> <div align="center">

View File

@ -0,0 +1,215 @@
# Server-side C++ inference
In this tutorial, we will introduce the detailed steps of deploying PaddleOCR ultra-lightweight Chinese detection and recognition models on the server side.
## 1. Prepare the environment
### Environment
- Linux, docker is recommended.
### 1.1 Compile opencv
* First of all, you need to download the opencv source code package for the Linux environment from the opencv official website. Taking opencv3.4.7 as an example, the download command is as follows.
```
wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz
tar -xf 3.4.7.tar.gz
```
Finally, you can see the folder of `opencv-3.4.7/` in the current directory.
* Compile opencv. The opencv source path (`root_path`) and installation path (`install_path`) should be set by yourself. Enter the opencv source code path and compile it in the following way.
```shell
root_path=your_opencv_root_path
install_path=${root_path}/opencv3
rm -rf build
mkdir build
cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_IPP=OFF \
-DBUILD_IPP_IW=OFF \
-DWITH_LAPACK=OFF \
-DWITH_EIGEN=OFF \
-DCMAKE_INSTALL_LIBDIR=lib64 \
-DWITH_ZLIB=ON \
-DBUILD_ZLIB=ON \
-DWITH_JPEG=ON \
-DBUILD_JPEG=ON \
-DWITH_PNG=ON \
-DBUILD_PNG=ON \
-DWITH_TIFF=ON \
-DBUILD_TIFF=ON
make -j
make install
```
Among them, `root_path` is the downloaded opencv source code path, and `install_path` is the installation path of opencv. After `make install` is completed, the opencv header file and library file will be generated in this folder for later OCR source code compilation.
The final file structure under the opencv installation path is as follows.
```
opencv3/
|-- bin
|-- include
|-- lib
|-- lib64
|-- share
```
### 1.2 Compile or download the Paddle inference library
* There are 2 ways to obtain the Paddle inference library, described in detail below.
#### 1.2.1 Compile from the source code
* If you want to get the latest Paddle inference library features, you can download the latest code from Paddle github repository and compile the inference library from the source code.
* You can refer to the [Paddle inference library](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html) documentation to get the Paddle source code from github, and then compile it to generate the latest inference library. The method of using git to access the code is as follows.
```shell
git clone https://github.com/PaddlePaddle/Paddle.git
```
* After entering the Paddle directory, the compilation method is as follows.
```shell
rm -rf build
mkdir build
cd build
cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \
-DWITH_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_INFERENCE_API_TEST=OFF \
-DON_INFER=ON \
-DWITH_PYTHON=ON
make -j
make inference_lib_dist
```
For more compilation parameter options, please refer to the official website of the Paddle C++ inference library:[https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html).
* After the compilation process, you can see the following files in the folder of `build/fluid_inference_install_dir/`.
```
build/fluid_inference_install_dir/
|-- CMakeCache.txt
|-- paddle
|-- third_party
|-- version.txt
```
Among them, `paddle` is the Paddle library required for C++ prediction later, and `version.txt` contains the version information of the current inference library.
#### 1.2.2 Direct download and installation
* Different cuda versions of the Linux inference library (based on GCC 4.8.2) are provided on the
[Paddle inference library official website](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html). You can view and select the appropriate version of the inference library on the official website.
* After downloading, use the following method to uncompress.
```
tar -xf fluid_inference.tgz
```
Finally you can see the following files in the folder of `fluid_inference/`.
## 2. Compile and run the demo
### 2.1 Export the inference model
* You can refer to [Model inference](../../doc/doc_ch/inference.md) to export the inference model. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows.
```
inference/
|-- det_db
| |--model
| |--params
|-- rec_rcnn
| |--model
| |--params
```
### 2.2 Compile PaddleOCR C++ inference demo
* The compilation commands are as follows. The paths of the Paddle C++ inference library, opencv, and other dependencies need to be replaced with the actual paths on your own machine.
```shell
sh tools/build.sh
```
Specifically, the content in `tools/build.sh` is as follows.
```shell
OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=your_cudnn_lib_dir
BUILD_DIR=build
rm -rf ${BUILD_DIR}
mkdir ${BUILD_DIR}
cd ${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \
-DDEMO_NAME=ocr_system \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=OFF \
-DOPENCV_DIR=${OPENCV_DIR} \
-DCUDNN_LIB=${CUDNN_LIB_DIR} \
-DCUDA_LIB=${CUDA_LIB_DIR} \
make -j
```
`OPENCV_DIR` is the opencv installation path; `LIB_DIR` is the downloaded Paddle inference library path (`fluid_inference` folder) or the compiled one (`build/fluid_inference_install_dir` folder); `CUDA_LIB_DIR` is the cuda library file path, which in docker is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cudnn library file path, which in docker is `/usr/lib/x86_64-linux-gnu/`.
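For reference, a possible set of values is sketched below (the opencv and inference-library locations are assumptions that must match your own machine; the cuda/cudnn paths are the docker paths mentioned above):
```shell
OPENCV_DIR=/paddle/opencv3                  # opencv install path from section 1.1 (assumed location)
LIB_DIR=/paddle/fluid_inference             # downloaded or compiled Paddle inference library (assumed location)
CUDA_LIB_DIR=/usr/local/cuda/lib64          # cuda library path inside docker (from the note above)
CUDNN_LIB_DIR=/usr/lib/x86_64-linux-gnu/    # cudnn library path inside docker (from the note above)
```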
* After the compilation is completed, an executable file named `ocr_system` will be generated in the `build` folder.
### Run the demo
* Execute the following command to complete the OCR recognition and detection of an image.
```shell
sh tools/run.sh
```
* If you want to use the angle classifier to correct the detected boxes, set `use_angle_cls` in the file `tools/config.txt` to 1 to enable the function.
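As a minimal sketch (assuming the default `tools/config.txt` shipped with this demo and running from the `deploy/cpp_infer` directory):
```shell
# Enable the direction classifier, then rerun the demo
sed -i 's/^use_angle_cls 0/use_angle_cls 1/' tools/config.txt
sh tools/run.sh
```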
The detection results will be shown on the screen, which is as follows.
<div align="center">
<img src="../imgs/cpp_infer_pred_12.png" width="600">
</div>
### 2.3 Note
* `MKLDNN` is disabled by default for C++ inference (`use_mkldnn` in `tools/config.txt` is set to 0). If you need MKLDNN to accelerate inference, set `use_mkldnn` to 1 and compile the inference library from the latest Paddle source code. When using MKLDNN for CPU inference, predicting several images in one run currently causes a memory leak (the problem does not occur when MKLDNN is disabled). The issue is being fixed; as a temporary workaround, re-initialize the recognition class (`CRNNRecognizer`) and the detection class (`DBDetector`) every 30 images or so when predicting multiple images.

View File

@ -44,7 +44,7 @@ Config::LoadConfig(const std::string &config_path) {
std::map<std::string, std::string> dict; std::map<std::string, std::string> dict;
for (int i = 0; i < config.size(); i++) { for (int i = 0; i < config.size(); i++) {
// pass for empty line or comment // pass for empty line or comment
if (config[i].size() <= 1 or config[i][0] == '#') { if (config[i].size() <= 1 || config[i][0] == '#') {
continue; continue;
} }
std::vector<std::string> res = split(config[i], " "); std::vector<std::string> res = split(config[i], " ");

View File

@ -48,20 +48,30 @@ int main(int argc, char **argv) {
cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR); cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR);
DBDetector det(config.det_model_dir, config.use_gpu, config.gpu_id, DBDetector det(
config.gpu_mem, config.cpu_math_library_num_threads, config.det_model_dir, config.use_gpu, config.gpu_id, config.gpu_mem,
config.use_mkldnn, config.max_side_len, config.det_db_thresh, config.cpu_math_library_num_threads, config.use_mkldnn,
config.det_db_box_thresh, config.det_db_unclip_ratio, config.use_zero_copy_run, config.max_side_len, config.det_db_thresh,
config.visualize); config.det_db_box_thresh, config.det_db_unclip_ratio, config.visualize);
Classifier *cls = nullptr;
if (config.use_angle_cls == true) {
cls = new Classifier(config.cls_model_dir, config.use_gpu, config.gpu_id,
config.gpu_mem, config.cpu_math_library_num_threads,
config.use_mkldnn, config.use_zero_copy_run,
config.cls_thresh);
}
CRNNRecognizer rec(config.rec_model_dir, config.use_gpu, config.gpu_id, CRNNRecognizer rec(config.rec_model_dir, config.use_gpu, config.gpu_id,
config.gpu_mem, config.cpu_math_library_num_threads, config.gpu_mem, config.cpu_math_library_num_threads,
config.use_mkldnn, config.char_list_file); config.use_mkldnn, config.use_zero_copy_run,
config.char_list_file);
auto start = std::chrono::system_clock::now(); auto start = std::chrono::system_clock::now();
std::vector<std::vector<std::vector<int>>> boxes; std::vector<std::vector<std::vector<int>>> boxes;
det.Run(srcimg, boxes); det.Run(srcimg, boxes);
rec.Run(boxes, srcimg); rec.Run(boxes, srcimg, cls);
auto end = std::chrono::system_clock::now(); auto end = std::chrono::system_clock::now();
auto duration = auto duration =

View File

@ -0,0 +1,110 @@
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <include/ocr_cls.h>
namespace PaddleOCR {
cv::Mat Classifier::Run(cv::Mat &img) {
cv::Mat src_img;
img.copyTo(src_img);
cv::Mat resize_img;
std::vector<int> rec_image_shape = {3, 32, 100};
int index = 0;
float wh_ratio = float(img.cols) / float(img.rows);
this->resize_op_.Run(img, resize_img, rec_image_shape);
this->normalize_op_.Run(&resize_img, this->mean_, this->scale_,
this->is_scale_);
std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
this->permute_op_.Run(&resize_img, input.data());
// Inference.
if (this->use_zero_copy_run_) {
auto input_names = this->predictor_->GetInputNames();
auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
} else {
paddle::PaddleTensor input_t;
input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
input_t.data =
paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
input_t.dtype = PaddleDType::FLOAT32;
std::vector<paddle::PaddleTensor> outputs;
this->predictor_->Run({input_t}, &outputs, 1);
}
std::vector<float> softmax_out;
std::vector<int64_t> label_out;
auto output_names = this->predictor_->GetOutputNames();
auto softmax_out_t = this->predictor_->GetOutputTensor(output_names[0]);
auto label_out_t = this->predictor_->GetOutputTensor(output_names[1]);
auto softmax_shape_out = softmax_out_t->shape();
auto label_shape_out = label_out_t->shape();
int softmax_out_num =
std::accumulate(softmax_shape_out.begin(), softmax_shape_out.end(), 1,
std::multiplies<int>());
int label_out_num =
std::accumulate(label_shape_out.begin(), label_shape_out.end(), 1,
std::multiplies<int>());
softmax_out.resize(softmax_out_num);
label_out.resize(label_out_num);
softmax_out_t->copy_to_cpu(softmax_out.data());
label_out_t->copy_to_cpu(label_out.data());
int label = label_out[0];
float score = softmax_out[label];
// std::cout << "\nlabel "<<label<<" score: "<<score;
if (label % 2 == 1 && score > this->cls_thresh) {
cv::rotate(src_img, src_img, 1);
}
return src_img;
}
void Classifier::LoadModel(const std::string &model_dir) {
AnalysisConfig config;
config.SetModel(model_dir + "/model", model_dir + "/params");
if (this->use_gpu_) {
config.EnableUseGpu(this->gpu_mem_, this->gpu_id_);
} else {
config.DisableGpu();
if (this->use_mkldnn_) {
config.EnableMKLDNN();
}
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
}
// false for zero copy tensor
config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
// true for multiple input
config.SwitchSpecifyInputNames(true);
config.SwitchIrOptim(true);
config.EnableMemoryOptim();
config.DisableGlogInfo();
this->predictor_ = CreatePaddlePredictor(config);
}
} // namespace PaddleOCR

View File

@ -26,12 +26,15 @@ void DBDetector::LoadModel(const std::string &model_dir) {
config.DisableGpu(); config.DisableGpu();
if (this->use_mkldnn_) { if (this->use_mkldnn_) {
config.EnableMKLDNN(); config.EnableMKLDNN();
// cache 10 different shapes for mkldnn to avoid memory leak
config.SetMkldnnCacheCapacity(10);
} }
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_); config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
} }
// false for zero copy tensor // false for zero copy tensor
config.SwitchUseFeedFetchOps(false); // true for commom tensor
config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
// true for multiple input // true for multiple input
config.SwitchSpecifyInputNames(true); config.SwitchSpecifyInputNames(true);
@ -59,12 +62,22 @@ void DBDetector::Run(cv::Mat &img,
std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f); std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
this->permute_op_.Run(&resize_img, input.data()); this->permute_op_.Run(&resize_img, input.data());
auto input_names = this->predictor_->GetInputNames(); // Inference.
auto input_t = this->predictor_->GetInputTensor(input_names[0]); if (this->use_zero_copy_run_) {
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols}); auto input_names = this->predictor_->GetInputNames();
input_t->copy_from_cpu(input.data()); auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
this->predictor_->ZeroCopyRun(); input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
} else {
paddle::PaddleTensor input_t;
input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
input_t.data =
paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
input_t.dtype = PaddleDType::FLOAT32;
std::vector<paddle::PaddleTensor> outputs;
this->predictor_->Run({input_t}, &outputs, 1);
}
std::vector<float> out_data; std::vector<float> out_data;
auto output_names = this->predictor_->GetOutputNames(); auto output_names = this->predictor_->GetOutputNames();
@ -95,9 +108,11 @@ void DBDetector::Run(cv::Mat &img,
const double maxvalue = 255; const double maxvalue = 255;
cv::Mat bit_map; cv::Mat bit_map;
cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY); cv::threshold(cbuf_map, bit_map, threshold, maxvalue, cv::THRESH_BINARY);
cv::Mat dilation_map;
cv::Mat dila_ele = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(2,2));
cv::dilate(bit_map, dilation_map, dila_ele);
boxes = post_processor_.BoxesFromBitmap( boxes = post_processor_.BoxesFromBitmap(
pred_map, bit_map, this->det_db_box_thresh_, this->det_db_unclip_ratio_); pred_map, dilation_map, this->det_db_box_thresh_, this->det_db_unclip_ratio_);
boxes = post_processor_.FilterTagDetRes(boxes, ratio_h, ratio_w, srcimg); boxes = post_processor_.FilterTagDetRes(boxes, ratio_h, ratio_w, srcimg);

View File

@ -17,7 +17,7 @@
namespace PaddleOCR { namespace PaddleOCR {
void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes, void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
cv::Mat &img) { cv::Mat &img, Classifier *cls) {
cv::Mat srcimg; cv::Mat srcimg;
img.copyTo(srcimg); img.copyTo(srcimg);
cv::Mat crop_img; cv::Mat crop_img;
@ -27,6 +27,9 @@ void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
int index = 0; int index = 0;
for (int i = boxes.size() - 1; i >= 0; i--) { for (int i = boxes.size() - 1; i >= 0; i--) {
crop_img = GetRotateCropImage(srcimg, boxes[i]); crop_img = GetRotateCropImage(srcimg, boxes[i]);
if (cls != nullptr) {
crop_img = cls->Run(crop_img);
}
float wh_ratio = float(crop_img.cols) / float(crop_img.rows); float wh_ratio = float(crop_img.cols) / float(crop_img.rows);
@ -39,18 +42,29 @@ void CRNNRecognizer::Run(std::vector<std::vector<std::vector<int>>> boxes,
this->permute_op_.Run(&resize_img, input.data()); this->permute_op_.Run(&resize_img, input.data());
auto input_names = this->predictor_->GetInputNames(); // Inference.
auto input_t = this->predictor_->GetInputTensor(input_names[0]); if (this->use_zero_copy_run_) {
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols}); auto input_names = this->predictor_->GetInputNames();
input_t->copy_from_cpu(input.data()); auto input_t = this->predictor_->GetInputTensor(input_names[0]);
input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
this->predictor_->ZeroCopyRun(); input_t->copy_from_cpu(input.data());
this->predictor_->ZeroCopyRun();
} else {
paddle::PaddleTensor input_t;
input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
input_t.data =
paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
input_t.dtype = PaddleDType::FLOAT32;
std::vector<paddle::PaddleTensor> outputs;
this->predictor_->Run({input_t}, &outputs, 1);
}
std::vector<int64_t> rec_idx; std::vector<int64_t> rec_idx;
auto output_names = this->predictor_->GetOutputNames(); auto output_names = this->predictor_->GetOutputNames();
auto output_t = this->predictor_->GetOutputTensor(output_names[0]); auto output_t = this->predictor_->GetOutputTensor(output_names[0]);
auto rec_idx_lod = output_t->lod(); auto rec_idx_lod = output_t->lod();
auto shape_out = output_t->shape(); auto shape_out = output_t->shape();
int out_num = std::accumulate(shape_out.begin(), shape_out.end(), 1, int out_num = std::accumulate(shape_out.begin(), shape_out.end(), 1,
std::multiplies<int>()); std::multiplies<int>());
@ -115,12 +129,15 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) {
config.DisableGpu(); config.DisableGpu();
if (this->use_mkldnn_) { if (this->use_mkldnn_) {
config.EnableMKLDNN(); config.EnableMKLDNN();
// cache 10 different shapes for mkldnn to avoid memory leak
config.SetMkldnnCacheCapacity(10);
} }
config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_); config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
} }
// false for zero copy tensor // false for zero copy tensor
config.SwitchUseFeedFetchOps(false); // true for commom tensor
config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
// true for multiple input // true for multiple input
config.SwitchSpecifyInputNames(true); config.SwitchSpecifyInputNames(true);

View File

@ -219,7 +219,7 @@ PostProcessor::BoxesFromBitmap(const cv::Mat pred, const cv::Mat bitmap,
std::vector<std::vector<std::vector<int>>> boxes; std::vector<std::vector<std::vector<int>>> boxes;
for (int _i = 0; _i < num_contours; _i++) { for (int _i = 0; _i < num_contours; _i++) {
if (contours[_i].size() <= 0) { if (contours[_i].size() <= 2) {
continue; continue;
} }
float ssid; float ssid;
@ -294,7 +294,7 @@ PostProcessor::FilterTagDetRes(std::vector<std::vector<std::vector<int>>> boxes,
pow(boxes[n][0][1] - boxes[n][1][1], 2))); pow(boxes[n][0][1] - boxes[n][1][1], 2)));
rect_height = int(sqrt(pow(boxes[n][0][0] - boxes[n][3][0], 2) + rect_height = int(sqrt(pow(boxes[n][0][0] - boxes[n][3][0], 2) +
pow(boxes[n][0][1] - boxes[n][3][1], 2))); pow(boxes[n][0][1] - boxes[n][3][1], 2)));
if (rect_width <= 10 || rect_height <= 10) if (rect_width <= 4 || rect_height <= 4)
continue; continue;
root_points.push_back(boxes[n]); root_points.push_back(boxes[n]);
} }

View File

@ -116,4 +116,26 @@ void CrnnResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img, float wh_ratio,
cv::INTER_LINEAR); cv::INTER_LINEAR);
} }
void ClsResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
const std::vector<int> &rec_image_shape) {
int imgC, imgH, imgW;
imgC = rec_image_shape[0];
imgH = rec_image_shape[1];
imgW = rec_image_shape[2];
float ratio = float(img.cols) / float(img.rows);
int resize_w, resize_h;
if (ceilf(imgH * ratio) > imgW)
resize_w = imgW;
else
resize_w = int(ceilf(imgH * ratio));
cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
cv::INTER_LINEAR);
if (resize_w < imgW) {
cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0, imgW - resize_w,
cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
}
}
} // namespace PaddleOCR } // namespace PaddleOCR

View File

@ -39,22 +39,21 @@ std::vector<std::string> Utility::ReadDict(const std::string &path) {
void Utility::VisualizeBboxes( void Utility::VisualizeBboxes(
const cv::Mat &srcimg, const cv::Mat &srcimg,
const std::vector<std::vector<std::vector<int>>> &boxes) { const std::vector<std::vector<std::vector<int>>> &boxes) {
cv::Point rook_points[boxes.size()][4];
for (int n = 0; n < boxes.size(); n++) {
for (int m = 0; m < boxes[0].size(); m++) {
rook_points[n][m] = cv::Point(int(boxes[n][m][0]), int(boxes[n][m][1]));
}
}
cv::Mat img_vis; cv::Mat img_vis;
srcimg.copyTo(img_vis); srcimg.copyTo(img_vis);
for (int n = 0; n < boxes.size(); n++) { for (int n = 0; n < boxes.size(); n++) {
const cv::Point *ppt[1] = {rook_points[n]}; cv::Point rook_points[4];
for (int m = 0; m < boxes[n].size(); m++) {
rook_points[m] = cv::Point(int(boxes[n][m][0]), int(boxes[n][m][1]));
}
const cv::Point *ppt[1] = {rook_points};
int npt[] = {4}; int npt[] = {4};
cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0); cv::polylines(img_vis, ppt, npt, 1, 1, CV_RGB(0, 255, 0), 2, 8, 0);
} }
cv::imwrite("./ocr_vis.png", img_vis); cv::imwrite("./ocr_vis.png", img_vis);
std::cout << "The detection visualized image saved in ./ocr_vis.png.pn" std::cout << "The detection visualized image saved in ./ocr_vis.png"
<< std::endl; << std::endl;
} }

View File

@ -1,8 +1,7 @@
OPENCV_DIR=your_opencv_dir OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=/your_cudnn_lib_dir CUDNN_LIB_DIR=your_cudnn_lib_dir
BUILD_DIR=build BUILD_DIR=build
rm -rf ${BUILD_DIR} rm -rf ${BUILD_DIR}
@ -11,7 +10,6 @@ cd ${BUILD_DIR}
cmake .. \ cmake .. \
-DPADDLE_LIB=${LIB_DIR} \ -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \ -DWITH_MKL=ON \
-DDEMO_NAME=ocr_system \
-DWITH_GPU=OFF \ -DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \ -DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=OFF \ -DUSE_TENSORRT=OFF \

View File

@ -3,20 +3,25 @@ use_gpu 0
gpu_id 0 gpu_id 0
gpu_mem 4000 gpu_mem 4000
cpu_math_library_num_threads 10 cpu_math_library_num_threads 10
use_mkldnn 0 use_mkldnn 1
use_zero_copy_run 1
# det config # det config
max_side_len 960 max_side_len 960
det_db_thresh 0.3 det_db_thresh 0.3
det_db_box_thresh 0.5 det_db_box_thresh 0.5
det_db_unclip_ratio 2.0 det_db_unclip_ratio 1.6
det_model_dir ./inference/det_db det_model_dir ./inference/det_db
# cls config
use_angle_cls 0
cls_model_dir ./inference/cls
cls_thresh 0.9
# rec config # rec config
rec_model_dir ./inference/rec_crnn rec_model_dir ./inference/rec_crnn
char_list_file ../../ppocr/utils/ppocr_keys_v1.txt char_list_file ../../ppocr/utils/ppocr_keys_v1.txt
img_path ../../doc/imgs/11.jpg
# show the detection results # show the detection results
visualize 0 visualize 1

View File

@ -0,0 +1,58 @@
English | [简体中文](README_cn.md)
## Introduction
Many users hope to package the PaddleOCR service into a Docker image so that it can be quickly released and used in a Docker or k8s environment.
This page provides some standardized code to achieve this goal. Through the following steps, you can quickly publish the PaddleOCR project as a callable Restful API service. (At present, only deployment based on the HubServing mode is implemented; the author plans to add deployment based on the PaddleServing mode in the future.)
## 1. Prerequisites
You need to install the following basic components first:
a. Docker
b. Graphics driver and CUDA 10.0+ (GPU)
c. NVIDIA Container Toolkit (GPU; Docker 19.03+ can skip this)
d. cuDNN 7.6+ (GPU)
## 2. Build Image
a. Download the PaddleOCR source code
```
git clone https://github.com/PaddlePaddle/PaddleOCR.git
```
b. Go to the Dockerfile directory (note: the cpu and gpu versions are separate; the following takes cpu as an example, for the gpu version simply replace the keyword)
```
cd deploy/docker/cpu
```
c. Build image
```
docker build -t paddleocr:cpu .
```
## 3. Start container
a. CPU version
```
sudo docker run -dp 8866:8866 --name paddle_ocr paddleocr:cpu
```
b. GPU version (based on NVIDIA Container Toolkit)
```
sudo nvidia-docker run -dp 8866:8866 --name paddle_ocr paddleocr:gpu
```
c. GPU version (Docker 19.03+)
```
sudo docker run -dp 8866:8866 --gpus all --name paddle_ocr paddleocr:gpu
```
d. Check the service status (if you can see messages such as Successfully installed ocr_system and Running on http://0.0.0.0:8866/, the service has started successfully)
```
docker logs -f paddle_ocr
```
## 4. Test
a. Calculate the Base64 encoding of the picture to be recognized (for a quick test, you can use a free online tool such as https://freeonlinetools24.com/base64-image/), or use the shell sketch after this list.
b. Post a service request (a sample request can be found in sample_request.txt)
```
curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"Input image Base64 encode(need to delete the code 'data:image/jpg;base64,'\"]}" http://localhost:8866/predict/ocr_system
```
c. Get the response (if the call is successful, the following result will be returned)
```
{"msg":"","results":[[{"confidence":0.8403433561325073,"text":"约定","text_region":[[345,377],[641,390],[634,540],[339,528]]},{"confidence":0.8131805658340454,"text":"最终相遇","text_region":[[356,532],[624,530],[624,596],[356,598]]}]],"status":"0"}
```
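As a minimal sketch of steps a and b combined (assuming a local image named `test.jpg` and the container started in step 3 listening on port 8866):
```shell
# Encode the image as a single-line Base64 string (base64 itself adds no 'data:image/jpg;base64,' prefix)
IMG_B64=$(base64 -w 0 test.jpg)
# Post the request to the ocr_system service
curl -H "Content-Type:application/json" -X POST \
     --data "{\"images\": [\"${IMG_B64}\"]}" \
     http://localhost:8866/predict/ocr_system
```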

View File

@ -0,0 +1,57 @@
[English](README.md) | 简体中文
## Dockerized Deployment Service
In everyday projects, you will generally want to use Docker to package the PaddleOCR service into an image so that it can be quickly released and used in a Docker or k8s environment.
This document provides some standardized code to achieve that goal. With the following steps you can quickly publish the PaddleOCR project as a callable Restful API service. (At present, only deployment based on the HubServing mode is implemented; the author plans to add deployment based on the PaddleServing mode later.)
## 1. Prerequisites
You need to install the following basic components first:
a. Docker
b. GPU driver and CUDA 10.0+ (GPU)
c. NVIDIA Container Toolkit (GPU; Docker 19.03+ can skip this step)
d. cuDNN 7.6+ (GPU)
## 2. Build the image
a. Download the PaddleOCR source code
```
git clone https://github.com/PaddlePaddle/PaddleOCR.git
```
b. Go to the Dockerfile directory (note: the cpu and gpu versions are separate; the following uses cpu as an example, for the gpu version simply replace the keyword)
```
cd deploy/docker/cpu
```
c. Build the image
```
docker build -t paddleocr:cpu .
```
## 3. Start the Docker container
a. CPU version
```
sudo docker run -dp 8866:8866 --name paddle_ocr paddleocr:cpu
```
b. GPU version (via NVIDIA Container Toolkit)
```
sudo nvidia-docker run -dp 8866:8866 --name paddle_ocr paddleocr:gpu
```
c. GPU version (Docker 19.03+, you can use the following command directly)
```
sudo docker run -dp 8866:8866 --gpus all --name paddle_ocr paddleocr:gpu
```
d. Check the service status (messages such as Successfully installed ocr_system and Running on http://0.0.0.0:8866/ indicate that the service started successfully)
```
docker logs -f paddle_ocr
```
## 4. Test the service
a. Compute the Base64 encoding of the image to be recognized (for a quick test you can use a free online tool such as http://tool.chinaz.com/tools/imgtobase/)
b. Send a service request (see the sample values in sample_request.txt)
```
curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"Image Base64 encoding (the prefix 'data:image/jpg;base64,' must be removed)\"]}" http://localhost:8866/predict/ocr_system
```
c. Returned result (if the call succeeds, the following result is returned)
```
{"msg":"","results":[[{"confidence":0.8403433561325073,"text":"约定","text_region":[[345,377],[641,390],[634,540],[339,528]]},{"confidence":0.8131805658340454,"text":"最终相遇","text_region":[[356,532],[624,530],[624,596],[356,598]]}]],"status":"0"}
```

View File

@ -0,0 +1,28 @@
# Version: 1.0.0
FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev
# PaddleOCR based on Python 3.7
RUN pip3.7 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN python3.7 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip3.7 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN git clone https://gitee.com/PaddlePaddle/PaddleOCR
WORKDIR /PaddleOCR
RUN pip3.7 install -r requirments.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN mkdir -p /PaddleOCR/inference
# Download the ocr detection model (light version). If you want to use the normal version, change ch_det_mv3_db_infer to ch_det_r50_vd_db_infer, and also remember to change det_model_dir in deploy/hubserving/ocr_system/params.py
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_det_mv3_db_infer.tar -C /PaddleOCR/inference
# Download the ocr recognition model (light version). If you want to use the normal version, change ch_rec_mv3_crnn_infer to ch_rec_r34_vd_crnn_enhance_infer, and also remember to change rec_model_dir in deploy/hubserving/ocr_system/params.py
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_rec_mv3_crnn_infer.tar -C /PaddleOCR/inference
EXPOSE 8866
CMD ["/bin/bash","-c","export PYTHONPATH=. && hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]

View File

@ -0,0 +1,28 @@
# Version: 1.0.0
FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev
# PaddleOCR based on Python 3.7
RUN pip3.7 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN python3.7 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip3.7 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN git clone https://gitee.com/PaddlePaddle/PaddleOCR
WORKDIR /home/PaddleOCR
RUN pip3.7 install -r requirments.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN mkdir -p /PaddleOCR/inference
# Download the ocr detection model (light version). If you want to use the normal version, change ch_det_mv3_db_infer to ch_det_r50_vd_db_infer, and also remember to change det_model_dir in deploy/hubserving/ocr_system/params.py
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_det_mv3_db_infer.tar -C /PaddleOCR/inference
# Download the ocr recognition model (light version). If you want to use the normal version, change ch_rec_mv3_crnn_infer to ch_rec_r34_vd_crnn_enhance_infer, and also remember to change rec_model_dir in deploy/hubserving/ocr_system/params.py
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_rec_mv3_crnn_infer.tar -C /PaddleOCR/inference
EXPOSE 8866
CMD ["/bin/bash","-c","export PYTHONPATH=. && hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]

File diff suppressed because one or more lines are too long

View File

@ -31,7 +31,7 @@ from tools.infer.predict_det import TextDetector
     author_email="paddle-dev@baidu.com",
     type="cv/text_recognition")
 class OCRDet(hub.Module):
-    def _initialize(self, use_gpu=False):
+    def _initialize(self, use_gpu=False, enable_mkldnn=False):
         """
         initialize with the necessary elements
         """
@ -51,6 +51,7 @@ class OCRDet(hub.Module):
                 "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
             )
         cfg.ir_optim = True
+        cfg.enable_mkldnn = enable_mkldnn
         self.text_detector = TextDetector(cfg)

View File

@ -13,7 +13,7 @@ def read_params():
     #params for text detector
     cfg.det_algorithm = "DB"
-    cfg.det_model_dir = "./inference/ch_det_mv3_db/"
+    cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
     cfg.det_max_side_len = 960

     #DB parmas
@ -36,4 +36,6 @@ def read_params():
     # cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
     # cfg.use_space_char = True

-    return cfg
+    cfg.use_zero_copy_run = False
+
+    return cfg

View File

@ -31,7 +31,7 @@ from tools.infer.predict_rec import TextRecognizer
     author_email="paddle-dev@baidu.com",
     type="cv/text_recognition")
 class OCRRec(hub.Module):
-    def _initialize(self, use_gpu=False):
+    def _initialize(self, use_gpu=False, enable_mkldnn=False):
         """
         initialize with the necessary elements
         """
@ -51,6 +51,7 @@ class OCRRec(hub.Module):
                 "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
             )
         cfg.ir_optim = True
+        cfg.enable_mkldnn = enable_mkldnn
         self.text_recognizer = TextRecognizer(cfg)

View File

@ -28,12 +28,24 @@ def read_params():
     #params for text recognizer
     cfg.rec_algorithm = "CRNN"
-    cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/"
+    cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/"
     cfg.rec_image_shape = "3, 32, 320"
     cfg.rec_char_type = 'ch'
     cfg.rec_batch_num = 30
+    cfg.max_text_length = 25
     cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
     cfg.use_space_char = True

-    return cfg
+    #params for text classifier
+    cfg.use_angle_cls = True
+    cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
+    cfg.cls_image_shape = "3, 48, 192"
+    cfg.label_list = ['0', '180']
+    cfg.cls_batch_num = 30
+    cfg.cls_thresh = 0.9
+    cfg.use_zero_copy_run = False
+
+    return cfg

View File

@ -31,7 +31,7 @@ from tools.infer.predict_system import TextSystem
     author_email="paddle-dev@baidu.com",
     type="cv/text_recognition")
 class OCRSystem(hub.Module):
-    def _initialize(self, use_gpu=False):
+    def _initialize(self, use_gpu=False, enable_mkldnn=False):
         """
         initialize with the necessary elements
         """
@ -51,7 +51,8 @@ class OCRSystem(hub.Module):
                 "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
             )
         cfg.ir_optim = True
+        cfg.enable_mkldnn = enable_mkldnn
         self.text_sys = TextSystem(cfg)

     def read_images(self, paths=[]):
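As a side note on the new `enable_mkldnn` flag added above: the hubserving readme in this same commit states that the `init_args` section of `config.json` is passed to the module's `_initialize`, so a config like the one sketched below (port and paths are illustrative, not prescriptive) should be able to switch MKL-DNN on for CPU serving.
```python
# Sketch: write a hubserving config.json whose init_args enables the new
# enable_mkldnn flag (assumption: init_args is forwarded to _initialize(),
# as described in the hubserving readme; port and output path are illustrative).
import json

config = {
    "modules_info": {
        "ocr_system": {
            "init_args": {
                "version": "1.0.0",
                "use_gpu": False,        # MKL-DNN is a CPU-side optimization
                "enable_mkldnn": True,   # flag added by this change
            },
            "predict_args": {},
        }
    },
    "port": 8868,
    "use_multiprocess": False,
    "workers": 2,
}

with open("deploy/hubserving/ocr_system/config.json", "w") as f:
    json.dump(config, f, indent=4)
```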

View File

@ -10,10 +10,10 @@ class Config(object):
 def read_params():
     cfg = Config()

     #params for text detector
     cfg.det_algorithm = "DB"
-    cfg.det_model_dir = "./inference/ch_det_mv3_db/"
+    cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
     cfg.det_max_side_len = 960

     #DB parmas
@ -28,12 +28,24 @@ def read_params():
     #params for text recognizer
     cfg.rec_algorithm = "CRNN"
-    cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/"
+    cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/"
     cfg.rec_image_shape = "3, 32, 320"
     cfg.rec_char_type = 'ch'
     cfg.rec_batch_num = 30
+    cfg.max_text_length = 25
     cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
     cfg.use_space_char = True

-    return cfg
+    #params for text classifier
+    cfg.use_angle_cls = True
+    cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
+    cfg.cls_image_shape = "3, 48, 192"
+    cfg.label_list = ['0', '180']
+    cfg.cls_batch_num = 30
+    cfg.cls_thresh = 0.9
+    cfg.use_zero_copy_run = False
+
+    return cfg
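These fields are the ones the readme tells you to edit when swapping in your own models or disabling the direction classifier. A minimal sketch of such a customization is shown below; the model directories are placeholders rather than files shipped with PaddleOCR, and a real `params.py` keeps all of the remaining fields from the diff above.
```python
# Sketch of a customized read_params() (placeholder paths; keep the remaining
# fields from the original params.py unchanged in a real deployment).
class Config(object):
    pass


def read_params():
    cfg = Config()
    # point detection / recognition at your own exported inference models
    cfg.det_model_dir = "./inference/my_det_infer/"   # placeholder
    cfg.rec_model_dir = "./inference/my_rec_infer/"   # placeholder
    # turn the text direction classifier off if your images are always upright
    cfg.use_angle_cls = False
    return cfg
```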

View File

@ -1,10 +1,12 @@
-# Service deployment
+[English](readme_en.md) | 简体中文
 PaddleOCR provides 2 service deployment methods:
-- Deployment based on HubServing: already integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/hubserving)); use it by following this tutorial.
-- Deployment based on PaddleServing: see the PaddleServing official [demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr) for details; it will also be integrated into PaddleOCR later.
+- Deployment based on PaddleHub Serving: the code path is "`./deploy/hubserving`"; use it by following this tutorial.
+- Deployment based on PaddleServing: the code path is "`./deploy/pdserving`"; refer to the [documentation](../pdserving/readme.md) for usage.

-The service deployment directory contains three service packages: detection, recognition, and 2-stage series connection. Select, install, and start the package you need. The directory is as follows:
+# Service deployment based on PaddleHub Serving
+The hubserving service deployment directory contains three service packages: detection, recognition, and 2-stage series connection. Please select, install, and start the package you need. The directory structure is as follows:
 ```
 deploy/hubserving/
   └─  ocr_det   detection module service package
@ -28,23 +30,51 @@ deploy/hubserving/ocr_system/
 # Install paddlehub
 pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple

-# Set environment variables
+# Set environment variables on Linux
 export PYTHONPATH=.
+
+# Or, set environment variables on Windows
+SET PYTHONPATH=.
 ```

-### 2. Install the service module
-PaddleOCR provides 3 kinds of service modules; install the module you need. For example:
-
-Install the detection service module:
-```hub install deploy/hubserving/ocr_det/```
-
-Or, install the recognition service module:
-```hub install deploy/hubserving/ocr_rec/```
-
-Or, install the detection+recognition series service module:
-```hub install deploy/hubserving/ocr_system/```
-
-### 3. Start the service
+### 2. Download the inference models
+Before installing the service module, you need to prepare the inference models and put them in the correct path. The ultra-lightweight v1.1 models are used by default; the default model paths are:
+```
+detection model: ./inference/ch_ppocr_mobile_v1.1_det_infer/
+recognition model: ./inference/ch_ppocr_mobile_v1.1_rec_infer/
+direction classifier: ./inference/ch_ppocr_mobile_v1.1_cls_infer/
+```
+
+**The model paths can be viewed and modified in `params.py`.** More models can be downloaded from the [model list](../../doc/doc_ch/models_list.md) provided by PaddleOCR, or replaced with models you have trained and converted yourself.
+
+### 3. Install the service module
+PaddleOCR provides 3 kinds of service modules; install the module you need.
+
+* On Linux, the installation commands are as follows:
+```shell
+# Install the detection service module:
+hub install deploy/hubserving/ocr_det/
+
+# Or, install the recognition service module:
+hub install deploy/hubserving/ocr_rec/
+
+# Or, install the detection+recognition series service module:
+hub install deploy/hubserving/ocr_system/
+```
+
+* On Windows (where the folder separator is `\`), the installation commands are as follows:
+```shell
+# Install the detection service module:
+hub install deploy\hubserving\ocr_det\
+
+# Or, install the recognition service module:
+hub install deploy\hubserving\ocr_rec\
+
+# Or, install the detection+recognition series service module:
+hub install deploy\hubserving\ocr_system\
+```
+
+### 4. Start the service
 #### Method 1. Start with command-line parameters (CPU only)
 **Start command:**
 ```shell
@ -69,9 +99,9 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
 #### Method 2. Start with a configuration file (supports CPU and GPU)
 **Start command:**
-```hub serving start --config/-c config.json```
+```hub serving start -c config.json```

 Where the format of `config.json` is as follows:
 ```python
 {
     "modules_info": {
@ -96,6 +126,7 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
 **Note:**
 - When the service is started from a configuration file, the other command-line parameters are ignored.
 - If you use GPU prediction (i.e. `use_gpu` is set to `true`), the CUDA_VISIBLE_DEVICES environment variable must be set before starting the service, e.g. ```export CUDA_VISIBLE_DEVICES=0```; otherwise it does not need to be set.
+- **`use_gpu` and `use_multiprocess` cannot both be `true` at the same time.**

 Start the series service with GPU card 3:
 ```shell
@ -120,6 +151,25 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
 Access example:
 ```python tools/test_hubserving.py http://127.0.0.1:8868/predict/ocr_system ./doc/imgs/```

+## Returned result format
+The returned result is a list; each item in the list is a dict, and each dict may contain up to 3 fields, as follows:
+
+|field name|data type|description|
+|-|-|-|
+|text|str|text content|
+|confidence|float|text recognition confidence|
+|text_region|list|text location coordinates|
+
+Different modules return different fields. For example, the result returned by the text recognition service module does not contain `text_region`. Details:
+
+|field name / module name|ocr_det|ocr_rec|ocr_system|
+|-|-|-|-|
+|text||✔|✔|
+|confidence||✔|✔|
+|text_region|✔||✔|
+
+**Note:** If you need to add, delete, or modify the returned fields, modify the `module.py` file of the corresponding module; for the complete process, see the section on customizing the service module below.

 ## Customizing the service module
 If you need to modify the service logic, you generally need to go through the following steps (taking the modification of `ocr_system` as an example):
@ -127,7 +177,7 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
     ```hub serving stop --port/-p XXXX```
 - 2. Go to the corresponding files such as `module.py` and `params.py` and modify the code according to your actual needs.
-  For example, to replace the models used by the deployed service, modify the model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. Other related parameters may also need to be modified; please adjust and debug according to your situation. It is recommended to run `module.py` directly for debugging after modification, and only start the service test once prediction runs correctly.
+  For example, to replace the models used by the deployed service, modify the model path parameters `det_model_dir` and `rec_model_dir` in `params.py`; to turn off the text direction classifier, set the parameter `use_angle_cls` to `False`. Other related parameters may also need to be modified; please adjust and debug according to your situation. **It is strongly recommended to run `module.py` directly for debugging after modification, and only start the service test once prediction runs correctly.**
 - 3. Uninstall the old service package
     ```hub uninstall ocr_system```
@ -137,4 +187,3 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
 - 5. Restart the service
     ```hub serving start -m ocr_system```

View File

@ -0,0 +1,200 @@
English | [简体中文](readme.md)
PaddleOCR provides 2 service deployment methods:
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial.
- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../pdserving/readme_en.md) for usage.
# Service deployment based on PaddleHub Serving
The hubserving service deployment directory includes three service packages: detection, recognition, and two-stage series connection. Please select the corresponding service package to install and start service according to your needs. The directory is as follows:
```
deploy/hubserving/
└─ ocr_det detection module service package
└─ ocr_rec recognition module service package
└─ ocr_system two-stage series connection service package
```
Each service package contains 3 required files and an optional configuration file. Taking the 2-stage series connection service package as an example, the directory is as follows:
```
deploy/hubserving/ocr_system/
└─ __init__.py Empty file, required
└─ config.json Configuration file, optional, passed in as a parameter when using configuration to start the service
└─ module.py Main module file, required, contains the complete logic of the service
└─ params.py Parameter file, required, including parameters such as model path, pre- and post-processing parameters
```
## Quick start service
The following steps take the 2-stage series service as an example. If only the detection service or recognition service is needed, replace the corresponding file path.
### 1. Prepare the environment
```shell
# Install paddlehub
pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
# Set environment variables on Linux
export PYTHONPATH=.
# Set environment variables on Windows
SET PYTHONPATH=.
```
### 2. Download inference model
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra lightweight model of v1.1 is used, and the default model path is:
```
detection model: ./inference/ch_ppocr_mobile_v1.1_det_infer/
recognition model: ./inference/ch_ppocr_mobile_v1.1_rec_infer/
text direction classifier: ./inference/ch_ppocr_mobile_v1.1_cls_infer/
```
**The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself.
### 3. Install Service Module
PaddleOCR provides 3 kinds of service modules, install the required modules according to your needs.
* On Linux platform, the examples are as follows.
```shell
# Install the detection service module:
hub install deploy/hubserving/ocr_det/
# Or, install the recognition service module:
hub install deploy/hubserving/ocr_rec/
# Or, install the 2-stage series service module:
hub install deploy/hubserving/ocr_system/
```
* On Windows platform, the examples are as follows.
```shell
# Install the detection service module:
hub install deploy\hubserving\ocr_det\
# Or, install the recognition service module:
hub install deploy\hubserving\ocr_rec\
# Or, install the 2-stage series service module:
hub install deploy\hubserving\ocr_system\
```
### 4. Start service
#### Way 1. Start with command line parameters (CPU only)
**start command**
```shell
$ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
--port XXXX \
--use_multiprocess \
--workers \
```
**parameters**
|parameters|usage|
|-|-|
|--modules/-m|PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs<br>*`When Version is not specified, the latest version is selected by default`*|
|--port/-p|Service port, default is 8866|
|--use_multiprocess|Enable concurrent mode, the default is single-process mode, this mode is recommended for multi-core CPU machines<br>*`Windows operating system only supports single-process mode`*|
|--workers|The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores|
For example, start the 2-stage series service:
```shell
hub serving start -m ocr_system
```
This completes the deployment of a service API, using the default port number 8866.
#### Way 2. Start with a configuration file (CPU and GPU supported)
**start command**
```shell
hub serving start --config/-c config.json
```
Wherein, the format of `config.json` is as follows:
```python
{
"modules_info": {
"ocr_system": {
"init_args": {
"version": "1.0.0",
"use_gpu": true
},
"predict_args": {
}
}
},
"port": 8868,
"use_multiprocess": false,
"workers": 2
}
```
- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them, **when `use_gpu` is `true`, it means that the GPU is used to start the service**.
- The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.
**Note:**
- When using the configuration file to start the service, other parameters will be ignored.
- If you use GPU prediction (that is, `use_gpu` is set to `true`), you need to set the environment variable CUDA_VISIBLE_DEVICES before starting the service, such as: ```export CUDA_VISIBLE_DEVICES=0```, otherwise you do not need to set it.
- **`use_gpu` and `use_multiprocess` cannot be `true` at the same time.**
For example, use GPU card No. 3 to start the 2-stage series service:
```shell
export CUDA_VISIBLE_DEVICES=3
hub serving start -c deploy/hubserving/ocr_system/config.json
```
## Send prediction requests
After the service starts, you can use the following command to send a prediction request to obtain the prediction result:
```shell
python tools/test_hubserving.py server_url image_path
```
Two parameters need to be passed to the script:
- **server_url**: the service address, in the format
`http://[ip_address]:[port]/predict/[module_name]`
For example, if the detection, recognition and 2-stage serial services are started with provided configuration files, the respective `server_url` would be:
`http://127.0.0.1:8866/predict/ocr_det`
`http://127.0.0.1:8867/predict/ocr_rec`
`http://127.0.0.1:8868/predict/ocr_system`
- **image_path**: the test image path, which can be a single image path or an image directory path
**Example:**
```shell
python tools/test_hubserving.py http://127.0.0.1:8868/predict/ocr_system ./doc/imgs/
```
## Returned result format
The returned result is a list. Each item in the list is a dict. The dict may contain three fields. The information is as follows:
|field name|data type|description|
|-|-|-|
|text|str|text content|
|confidence|float|text recognition confidence|
|text_region|list|text location coordinates|
The fields returned by different modules are different. For example, the results returned by the text recognition service module do not contain `text_region`. The details are as follows:
|field name/module name|ocr_det|ocr_rec|ocr_system|
|-|-|-|-|
|text||✔|✔|
|confidence||✔|✔|
|text_region|✔||✔|
**Note** If you need to add, delete or modify the returned fields, you can modify the file `module.py` of the corresponding module. For the complete process, refer to the user-defined modification service module in the next section.
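As a hedged illustration of consuming these fields, the sketch below walks one image's results and tolerates the fields that a given module omits; it assumes the same `msg`/`results`/`status` wrapper shown in the curl example earlier in this commit.
```python
# Sketch: walk the parsed JSON response for one image and handle optional fields
# (assumes the msg/results/status wrapper shown earlier in this commit).
def print_results(response):
    for item in response["results"][0]:
        text = item.get("text")              # absent for ocr_det
        confidence = item.get("confidence")  # absent for ocr_det
        box = item.get("text_region")        # absent for ocr_rec
        print(text, confidence, box)
```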
## User defined service module modification
If you need to modify the service logic, the following steps are generally required (take the modification of `ocr_system` for example):
- 1. Stop service
```shell
hub serving stop --port/-p XXXX
```
- 2. Modify the code in the corresponding files, like `module.py` and `params.py`, according to the actual needs.
For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. If you want to turn off the text direction classifier, set the parameter `use_angle_cls` to `False`. Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation. It is suggested to run `module.py` directly for debugging after modification before starting the service test.
- 3. Uninstall old service module
```shell
hub uninstall ocr_system
```
- 4. Install modified service module
```shell
hub install deploy/hubserving/ocr_system/
```
- 5. Restart service
```shell
hub serving start -m ocr_system
```

View File

@ -0,0 +1,32 @@
#!/bin/bash
set -e
OCR_MODEL_URL="https://paddleocr.bj.bcebos.com/deploy/lite/ocr_v1_for_cpu.tar.gz"
PADDLE_LITE_LIB_URL="https://paddlelite-demo.bj.bcebos.com/libs/ios/paddle_lite_libs_v2_6_0.tar.gz"
OPENCV3_FRAMEWORK_URL="https://paddlelite-demo.bj.bcebos.com/libs/ios/opencv3.framework.tar.gz"
download_and_extract() {
local url="$1"
local dst_dir="$2"
local tempdir=$(mktemp -d)
echo "Downloading ${url} ..."
curl -L ${url} > ${tempdir}/temp.tar.gz
echo "Download ${url} done "
if [ ! -d ${dst_dir} ];then
mkdir -p ${dst_dir}
fi
echo "Extracting ..."
tar -zxvf ${tempdir}/temp.tar.gz -C ${dst_dir}
echo "Extract done "
rm -rf ${tempdir}
}
echo -e "[Download ios ocr demo denpendancy]\n"
download_and_extract "${OCR_MODEL_URL}" "./ocr_demo/models"
download_and_extract "${PADDLE_LITE_LIB_URL}" "./ocr_demo"
download_and_extract "${OPENCV3_FRAMEWORK_URL}" "./ocr_demo"
echo -e "[done]\n"

View File

@ -0,0 +1,462 @@
// !$*UTF8*$!
{
archiveVersion = 1;
classes = {
};
objectVersion = 50;
objects = {
/* Begin PBXBuildFile section */
A98EA8B624CD8E85B9EFADA1 /* OcrData.m in Sources */ = {isa = PBXBuildFile; fileRef = A98EAD74A71FE136D084F392 /* OcrData.m */; };
E0A53559219A832A005A6056 /* AppDelegate.m in Sources */ = {isa = PBXBuildFile; fileRef = E0A53558219A832A005A6056 /* AppDelegate.m */; };
E0A5355C219A832A005A6056 /* ViewController.mm in Sources */ = {isa = PBXBuildFile; fileRef = E0A5355B219A832A005A6056 /* ViewController.mm */; };
E0A5355F219A832A005A6056 /* Main.storyboard in Resources */ = {isa = PBXBuildFile; fileRef = E0A5355D219A832A005A6056 /* Main.storyboard */; };
E0A53561219A832E005A6056 /* Assets.xcassets in Resources */ = {isa = PBXBuildFile; fileRef = E0A53560219A832E005A6056 /* Assets.xcassets */; };
E0A53564219A832E005A6056 /* LaunchScreen.storyboard in Resources */ = {isa = PBXBuildFile; fileRef = E0A53562219A832E005A6056 /* LaunchScreen.storyboard */; };
E0A53567219A832E005A6056 /* main.m in Sources */ = {isa = PBXBuildFile; fileRef = E0A53566219A832E005A6056 /* main.m */; };
E0A53576219A89FF005A6056 /* lib in Resources */ = {isa = PBXBuildFile; fileRef = E0A53574219A89FF005A6056 /* lib */; };
E0A5357B219AA3D1005A6056 /* AVFoundation.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = E0A5357A219AA3D1005A6056 /* AVFoundation.framework */; };
E0A5357D219AA3DF005A6056 /* AssetsLibrary.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = E0A5357C219AA3DE005A6056 /* AssetsLibrary.framework */; };
E0A5357F219AA3E7005A6056 /* CoreMedia.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = E0A5357E219AA3E7005A6056 /* CoreMedia.framework */; };
ED37528F24B88737008DEBDA /* opencv2.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = ED37528E24B88737008DEBDA /* opencv2.framework */; };
ED3752A424B9AD5C008DEBDA /* ocr_clipper.cpp in Sources */ = {isa = PBXBuildFile; fileRef = ED37529E24B9AD5C008DEBDA /* ocr_clipper.cpp */; };
ED3752A524B9AD5C008DEBDA /* ocr_db_post_process.cpp in Sources */ = {isa = PBXBuildFile; fileRef = ED3752A024B9AD5C008DEBDA /* ocr_db_post_process.cpp */; };
ED3752A624B9AD5C008DEBDA /* ocr_crnn_process.cpp in Sources */ = {isa = PBXBuildFile; fileRef = ED3752A124B9AD5C008DEBDA /* ocr_crnn_process.cpp */; };
ED3752A924B9ADB4008DEBDA /* BoxLayer.m in Sources */ = {isa = PBXBuildFile; fileRef = ED3752A824B9ADB4008DEBDA /* BoxLayer.m */; };
ED3752AC24B9B118008DEBDA /* Helpers.m in Sources */ = {isa = PBXBuildFile; fileRef = ED3752AA24B9B118008DEBDA /* Helpers.m */; };
ED3752B324B9D4C6008DEBDA /* ch_det_mv3_db_opt.nb in Resources */ = {isa = PBXBuildFile; fileRef = ED3752B124B9D4C6008DEBDA /* ch_det_mv3_db_opt.nb */; };
ED3752B424B9D4C6008DEBDA /* ch_rec_mv3_crnn_opt.nb in Resources */ = {isa = PBXBuildFile; fileRef = ED3752B224B9D4C6008DEBDA /* ch_rec_mv3_crnn_opt.nb */; };
ED3752BC24B9DAD7008DEBDA /* ocr.png in Resources */ = {isa = PBXBuildFile; fileRef = ED3752BA24B9DAD6008DEBDA /* ocr.png */; };
ED3752BD24B9DAD7008DEBDA /* label_list.txt in Resources */ = {isa = PBXBuildFile; fileRef = ED3752BB24B9DAD7008DEBDA /* label_list.txt */; };
/* End PBXBuildFile section */
/* Begin PBXFileReference section */
A98EAD74A71FE136D084F392 /* OcrData.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = OcrData.m; sourceTree = "<group>"; };
A98EAD76E0669B2BFB628B48 /* OcrData.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = OcrData.h; sourceTree = "<group>"; };
E0846DF2230BC93900031405 /* timer.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = timer.h; sourceTree = "<group>"; };
E0A18EDD219C03000015DC15 /* face.jpg */ = {isa = PBXFileReference; lastKnownFileType = image.jpeg; path = face.jpg; sourceTree = "<group>"; };
E0A53554219A8329005A6056 /* paddle-lite-ocr.app */ = {isa = PBXFileReference; explicitFileType = wrapper.application; includeInIndex = 0; path = "paddle-lite-ocr.app"; sourceTree = BUILT_PRODUCTS_DIR; };
E0A53557219A832A005A6056 /* AppDelegate.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = AppDelegate.h; sourceTree = "<group>"; };
E0A53558219A832A005A6056 /* AppDelegate.m */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.objc; path = AppDelegate.m; sourceTree = "<group>"; };
E0A5355A219A832A005A6056 /* ViewController.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = ViewController.h; sourceTree = "<group>"; };
E0A5355B219A832A005A6056 /* ViewController.mm */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.objcpp; path = ViewController.mm; sourceTree = "<group>"; };
E0A5355E219A832A005A6056 /* Base */ = {isa = PBXFileReference; lastKnownFileType = file.storyboard; name = Base; path = Base.lproj/Main.storyboard; sourceTree = "<group>"; };
E0A53560219A832E005A6056 /* Assets.xcassets */ = {isa = PBXFileReference; lastKnownFileType = folder.assetcatalog; path = Assets.xcassets; sourceTree = "<group>"; };
E0A53563219A832E005A6056 /* Base */ = {isa = PBXFileReference; lastKnownFileType = file.storyboard; name = Base; path = Base.lproj/LaunchScreen.storyboard; sourceTree = "<group>"; };
E0A53565219A832E005A6056 /* Info.plist */ = {isa = PBXFileReference; lastKnownFileType = text.plist.xml; path = Info.plist; sourceTree = "<group>"; };
E0A53566219A832E005A6056 /* main.m */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.objc; path = main.m; sourceTree = "<group>"; };
E0A53574219A89FF005A6056 /* lib */ = {isa = PBXFileReference; lastKnownFileType = folder; path = lib; sourceTree = "<group>"; };
E0A5357A219AA3D1005A6056 /* AVFoundation.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = AVFoundation.framework; path = System/Library/Frameworks/AVFoundation.framework; sourceTree = SDKROOT; };
E0A5357C219AA3DE005A6056 /* AssetsLibrary.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = AssetsLibrary.framework; path = System/Library/Frameworks/AssetsLibrary.framework; sourceTree = SDKROOT; };
E0A5357E219AA3E7005A6056 /* CoreMedia.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = CoreMedia.framework; path = System/Library/Frameworks/CoreMedia.framework; sourceTree = SDKROOT; };
ED37528E24B88737008DEBDA /* opencv2.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = opencv2.framework; path = ocr_demo/opencv2.framework; sourceTree = "<group>"; };
ED37529E24B9AD5C008DEBDA /* ocr_clipper.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ocr_clipper.cpp; sourceTree = "<group>"; };
ED37529F24B9AD5C008DEBDA /* ocr_db_post_process.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = ocr_db_post_process.h; sourceTree = "<group>"; };
ED3752A024B9AD5C008DEBDA /* ocr_db_post_process.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ocr_db_post_process.cpp; sourceTree = "<group>"; };
ED3752A124B9AD5C008DEBDA /* ocr_crnn_process.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ocr_crnn_process.cpp; sourceTree = "<group>"; };
ED3752A224B9AD5C008DEBDA /* ocr_clipper.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; path = ocr_clipper.hpp; sourceTree = "<group>"; };
ED3752A324B9AD5C008DEBDA /* ocr_crnn_process.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = ocr_crnn_process.h; sourceTree = "<group>"; };
ED3752A724B9ADB4008DEBDA /* BoxLayer.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = BoxLayer.h; sourceTree = "<group>"; };
ED3752A824B9ADB4008DEBDA /* BoxLayer.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = BoxLayer.m; sourceTree = "<group>"; };
ED3752AA24B9B118008DEBDA /* Helpers.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = Helpers.m; sourceTree = "<group>"; };
ED3752AB24B9B118008DEBDA /* Helpers.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = Helpers.h; sourceTree = "<group>"; };
ED3752B124B9D4C6008DEBDA /* ch_det_mv3_db_opt.nb */ = {isa = PBXFileReference; lastKnownFileType = file; path = ch_det_mv3_db_opt.nb; sourceTree = "<group>"; };
ED3752B224B9D4C6008DEBDA /* ch_rec_mv3_crnn_opt.nb */ = {isa = PBXFileReference; lastKnownFileType = file; path = ch_rec_mv3_crnn_opt.nb; sourceTree = "<group>"; };
ED3752BA24B9DAD6008DEBDA /* ocr.png */ = {isa = PBXFileReference; lastKnownFileType = image.png; path = ocr.png; sourceTree = "<group>"; };
ED3752BB24B9DAD7008DEBDA /* label_list.txt */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text; path = label_list.txt; sourceTree = "<group>"; };
/* End PBXFileReference section */
/* Begin PBXFrameworksBuildPhase section */
E0A53551219A8329005A6056 /* Frameworks */ = {
isa = PBXFrameworksBuildPhase;
buildActionMask = 2147483647;
files = (
ED37528F24B88737008DEBDA /* opencv2.framework in Frameworks */,
E0A5357F219AA3E7005A6056 /* CoreMedia.framework in Frameworks */,
E0A5357D219AA3DF005A6056 /* AssetsLibrary.framework in Frameworks */,
E0A5357B219AA3D1005A6056 /* AVFoundation.framework in Frameworks */,
);
runOnlyForDeploymentPostprocessing = 0;
};
/* End PBXFrameworksBuildPhase section */
/* Begin PBXGroup section */
E0846DE8230658DC00031405 /* models */ = {
isa = PBXGroup;
children = (
ED3752B124B9D4C6008DEBDA /* ch_det_mv3_db_opt.nb */,
ED3752B224B9D4C6008DEBDA /* ch_rec_mv3_crnn_opt.nb */,
);
path = models;
sourceTree = "<group>";
};
E0A5354B219A8329005A6056 = {
isa = PBXGroup;
children = (
E0A53556219A8329005A6056 /* ocr_demo */,
E0A53555219A8329005A6056 /* Products */,
E0A53570219A8945005A6056 /* Frameworks */,
);
sourceTree = "<group>";
};
E0A53555219A8329005A6056 /* Products */ = {
isa = PBXGroup;
children = (
E0A53554219A8329005A6056 /* paddle-lite-ocr.app */,
);
name = Products;
sourceTree = "<group>";
};
E0A53556219A8329005A6056 /* ocr_demo */ = {
isa = PBXGroup;
children = (
ED3752BB24B9DAD7008DEBDA /* label_list.txt */,
ED3752BA24B9DAD6008DEBDA /* ocr.png */,
ED3752AB24B9B118008DEBDA /* Helpers.h */,
ED3752AA24B9B118008DEBDA /* Helpers.m */,
ED3752A724B9ADB4008DEBDA /* BoxLayer.h */,
ED3752A824B9ADB4008DEBDA /* BoxLayer.m */,
ED37529D24B9AD5C008DEBDA /* pdocr */,
E0846DE8230658DC00031405 /* models */,
E0A18EDD219C03000015DC15 /* face.jpg */,
E0A53574219A89FF005A6056 /* lib */,
E0A53557219A832A005A6056 /* AppDelegate.h */,
E0A53558219A832A005A6056 /* AppDelegate.m */,
E0846DF2230BC93900031405 /* timer.h */,
E0A5355A219A832A005A6056 /* ViewController.h */,
E0A5355B219A832A005A6056 /* ViewController.mm */,
E0A5355D219A832A005A6056 /* Main.storyboard */,
E0A53560219A832E005A6056 /* Assets.xcassets */,
E0A53562219A832E005A6056 /* LaunchScreen.storyboard */,
E0A53565219A832E005A6056 /* Info.plist */,
E0A53566219A832E005A6056 /* main.m */,
A98EAD74A71FE136D084F392 /* OcrData.m */,
A98EAD76E0669B2BFB628B48 /* OcrData.h */,
);
path = ocr_demo;
sourceTree = "<group>";
};
E0A53570219A8945005A6056 /* Frameworks */ = {
isa = PBXGroup;
children = (
ED37528E24B88737008DEBDA /* opencv2.framework */,
E0A5357E219AA3E7005A6056 /* CoreMedia.framework */,
E0A5357C219AA3DE005A6056 /* AssetsLibrary.framework */,
E0A5357A219AA3D1005A6056 /* AVFoundation.framework */,
);
name = Frameworks;
sourceTree = "<group>";
};
ED37529D24B9AD5C008DEBDA /* pdocr */ = {
isa = PBXGroup;
children = (
ED37529E24B9AD5C008DEBDA /* ocr_clipper.cpp */,
ED37529F24B9AD5C008DEBDA /* ocr_db_post_process.h */,
ED3752A024B9AD5C008DEBDA /* ocr_db_post_process.cpp */,
ED3752A124B9AD5C008DEBDA /* ocr_crnn_process.cpp */,
ED3752A224B9AD5C008DEBDA /* ocr_clipper.hpp */,
ED3752A324B9AD5C008DEBDA /* ocr_crnn_process.h */,
);
path = pdocr;
sourceTree = "<group>";
};
/* End PBXGroup section */
/* Begin PBXNativeTarget section */
E0A53553219A8329005A6056 /* ocr_demo */ = {
isa = PBXNativeTarget;
buildConfigurationList = E0A5356A219A832E005A6056 /* Build configuration list for PBXNativeTarget "ocr_demo" */;
buildPhases = (
E0A53550219A8329005A6056 /* Sources */,
E0A53551219A8329005A6056 /* Frameworks */,
E0A53552219A8329005A6056 /* Resources */,
);
buildRules = (
);
dependencies = (
);
name = ocr_demo;
productName = seg_demo;
productReference = E0A53554219A8329005A6056 /* paddle-lite-ocr.app */;
productType = "com.apple.product-type.application";
};
/* End PBXNativeTarget section */
/* Begin PBXProject section */
E0A5354C219A8329005A6056 /* Project object */ = {
isa = PBXProject;
attributes = {
LastUpgradeCheck = 1010;
ORGANIZATIONNAME = "Li,Xiaoyang(SYS)";
TargetAttributes = {
E0A53553219A8329005A6056 = {
CreatedOnToolsVersion = 10.1;
};
};
};
buildConfigurationList = E0A5354F219A8329005A6056 /* Build configuration list for PBXProject "ocr_demo" */;
compatibilityVersion = "Xcode 9.3";
developmentRegion = en;
hasScannedForEncodings = 0;
knownRegions = (
en,
Base,
);
mainGroup = E0A5354B219A8329005A6056;
productRefGroup = E0A53555219A8329005A6056 /* Products */;
projectDirPath = "";
projectRoot = "";
targets = (
E0A53553219A8329005A6056 /* ocr_demo */,
);
};
/* End PBXProject section */
/* Begin PBXResourcesBuildPhase section */
E0A53552219A8329005A6056 /* Resources */ = {
isa = PBXResourcesBuildPhase;
buildActionMask = 2147483647;
files = (
E0A53564219A832E005A6056 /* LaunchScreen.storyboard in Resources */,
ED3752B324B9D4C6008DEBDA /* ch_det_mv3_db_opt.nb in Resources */,
E0A53576219A89FF005A6056 /* lib in Resources */,
ED3752B424B9D4C6008DEBDA /* ch_rec_mv3_crnn_opt.nb in Resources */,
E0A53561219A832E005A6056 /* Assets.xcassets in Resources */,
ED3752BC24B9DAD7008DEBDA /* ocr.png in Resources */,
E0A5355F219A832A005A6056 /* Main.storyboard in Resources */,
ED3752BD24B9DAD7008DEBDA /* label_list.txt in Resources */,
);
runOnlyForDeploymentPostprocessing = 0;
};
/* End PBXResourcesBuildPhase section */
/* Begin PBXSourcesBuildPhase section */
E0A53550219A8329005A6056 /* Sources */ = {
isa = PBXSourcesBuildPhase;
buildActionMask = 2147483647;
files = (
E0A5355C219A832A005A6056 /* ViewController.mm in Sources */,
ED3752A924B9ADB4008DEBDA /* BoxLayer.m in Sources */,
E0A53567219A832E005A6056 /* main.m in Sources */,
ED3752A424B9AD5C008DEBDA /* ocr_clipper.cpp in Sources */,
E0A53559219A832A005A6056 /* AppDelegate.m in Sources */,
ED3752A524B9AD5C008DEBDA /* ocr_db_post_process.cpp in Sources */,
ED3752AC24B9B118008DEBDA /* Helpers.m in Sources */,
ED3752A624B9AD5C008DEBDA /* ocr_crnn_process.cpp in Sources */,
A98EA8B624CD8E85B9EFADA1 /* OcrData.m in Sources */,
);
runOnlyForDeploymentPostprocessing = 0;
};
/* End PBXSourcesBuildPhase section */
/* Begin PBXVariantGroup section */
E0A5355D219A832A005A6056 /* Main.storyboard */ = {
isa = PBXVariantGroup;
children = (
E0A5355E219A832A005A6056 /* Base */,
);
name = Main.storyboard;
sourceTree = "<group>";
};
E0A53562219A832E005A6056 /* LaunchScreen.storyboard */ = {
isa = PBXVariantGroup;
children = (
E0A53563219A832E005A6056 /* Base */,
);
name = LaunchScreen.storyboard;
sourceTree = "<group>";
};
/* End PBXVariantGroup section */
/* Begin XCBuildConfiguration section */
E0A53568219A832E005A6056 /* Debug */ = {
isa = XCBuildConfiguration;
buildSettings = {
ALWAYS_SEARCH_USER_PATHS = NO;
CLANG_ANALYZER_NONNULL = YES;
CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE;
CLANG_CXX_LANGUAGE_STANDARD = "gnu++14";
CLANG_CXX_LIBRARY = "libc++";
CLANG_ENABLE_MODULES = YES;
CLANG_ENABLE_OBJC_ARC = YES;
CLANG_ENABLE_OBJC_WEAK = YES;
CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES;
CLANG_WARN_BOOL_CONVERSION = YES;
CLANG_WARN_COMMA = YES;
CLANG_WARN_CONSTANT_CONVERSION = YES;
CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES;
CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR;
CLANG_WARN_DOCUMENTATION_COMMENTS = YES;
CLANG_WARN_EMPTY_BODY = YES;
CLANG_WARN_ENUM_CONVERSION = YES;
CLANG_WARN_INFINITE_RECURSION = YES;
CLANG_WARN_INT_CONVERSION = YES;
CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES;
CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES;
CLANG_WARN_OBJC_LITERAL_CONVERSION = YES;
CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR;
CLANG_WARN_RANGE_LOOP_ANALYSIS = YES;
CLANG_WARN_STRICT_PROTOTYPES = YES;
CLANG_WARN_SUSPICIOUS_MOVE = YES;
CLANG_WARN_UNGUARDED_AVAILABILITY = YES_AGGRESSIVE;
CLANG_WARN_UNREACHABLE_CODE = YES;
CLANG_WARN__DUPLICATE_METHOD_MATCH = YES;
CODE_SIGN_IDENTITY = "iPhone Developer";
COPY_PHASE_STRIP = NO;
DEBUG_INFORMATION_FORMAT = dwarf;
ENABLE_STRICT_OBJC_MSGSEND = YES;
ENABLE_TESTABILITY = YES;
GCC_C_LANGUAGE_STANDARD = gnu11;
GCC_DYNAMIC_NO_PIC = NO;
GCC_NO_COMMON_BLOCKS = YES;
GCC_OPTIMIZATION_LEVEL = 0;
GCC_PREPROCESSOR_DEFINITIONS = (
"DEBUG=1",
"$(inherited)",
);
GCC_WARN_64_TO_32_BIT_CONVERSION = YES;
GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR;
GCC_WARN_UNDECLARED_SELECTOR = YES;
GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE;
GCC_WARN_UNUSED_FUNCTION = YES;
GCC_WARN_UNUSED_VARIABLE = YES;
IPHONEOS_DEPLOYMENT_TARGET = 12.1;
MTL_ENABLE_DEBUG_INFO = INCLUDE_SOURCE;
MTL_FAST_MATH = YES;
ONLY_ACTIVE_ARCH = YES;
SDKROOT = iphoneos;
};
name = Debug;
};
E0A53569219A832E005A6056 /* Release */ = {
isa = XCBuildConfiguration;
buildSettings = {
ALWAYS_SEARCH_USER_PATHS = NO;
CLANG_ANALYZER_NONNULL = YES;
CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE;
CLANG_CXX_LANGUAGE_STANDARD = "gnu++14";
CLANG_CXX_LIBRARY = "libc++";
CLANG_ENABLE_MODULES = YES;
CLANG_ENABLE_OBJC_ARC = YES;
CLANG_ENABLE_OBJC_WEAK = YES;
CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES;
CLANG_WARN_BOOL_CONVERSION = YES;
CLANG_WARN_COMMA = YES;
CLANG_WARN_CONSTANT_CONVERSION = YES;
CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES;
CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR;
CLANG_WARN_DOCUMENTATION_COMMENTS = YES;
CLANG_WARN_EMPTY_BODY = YES;
CLANG_WARN_ENUM_CONVERSION = YES;
CLANG_WARN_INFINITE_RECURSION = YES;
CLANG_WARN_INT_CONVERSION = YES;
CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES;
CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES;
CLANG_WARN_OBJC_LITERAL_CONVERSION = YES;
CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR;
CLANG_WARN_RANGE_LOOP_ANALYSIS = YES;
CLANG_WARN_STRICT_PROTOTYPES = YES;
CLANG_WARN_SUSPICIOUS_MOVE = YES;
CLANG_WARN_UNGUARDED_AVAILABILITY = YES_AGGRESSIVE;
CLANG_WARN_UNREACHABLE_CODE = YES;
CLANG_WARN__DUPLICATE_METHOD_MATCH = YES;
CODE_SIGN_IDENTITY = "iPhone Developer";
COPY_PHASE_STRIP = NO;
DEBUG_INFORMATION_FORMAT = "dwarf-with-dsym";
ENABLE_NS_ASSERTIONS = NO;
ENABLE_STRICT_OBJC_MSGSEND = YES;
GCC_C_LANGUAGE_STANDARD = gnu11;
GCC_NO_COMMON_BLOCKS = YES;
GCC_WARN_64_TO_32_BIT_CONVERSION = YES;
GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR;
GCC_WARN_UNDECLARED_SELECTOR = YES;
GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE;
GCC_WARN_UNUSED_FUNCTION = YES;
GCC_WARN_UNUSED_VARIABLE = YES;
IPHONEOS_DEPLOYMENT_TARGET = 12.1;
MTL_ENABLE_DEBUG_INFO = NO;
MTL_FAST_MATH = YES;
SDKROOT = iphoneos;
VALIDATE_PRODUCT = YES;
};
name = Release;
};
E0A5356B219A832E005A6056 /* Debug */ = {
isa = XCBuildConfiguration;
buildSettings = {
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
CODE_SIGN_IDENTITY = "Apple Development";
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = "";
FRAMEWORK_SEARCH_PATHS = (
"$(inherited)",
"$(PROJECT_DIR)/ocr_demo",
);
INFOPLIST_FILE = "$(SRCROOT)/ocr_demo/Info.plist";
IPHONEOS_DEPLOYMENT_TARGET = 12.0;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"@executable_path/Frameworks",
);
LIBRARY_SEARCH_PATHS = "$(PROJECT_DIR)/ocr_demo/lib/";
OTHER_LDFLAGS = "$(PROJECT_DIR)/ocr_demo/lib/libpaddle_api_light_bundled.a";
PRODUCT_BUNDLE_IDENTIFIER = "com.baidu.paddlelite-ocr-demo";
PRODUCT_NAME = "paddle-lite-ocr";
PROVISIONING_PROFILE_SPECIFIER = "";
TARGETED_DEVICE_FAMILY = "1,2";
USER_HEADER_SEARCH_PATHS = "\"$(SRCROOT)/ocr_demo/\"";
};
name = Debug;
};
E0A5356C219A832E005A6056 /* Release */ = {
isa = XCBuildConfiguration;
buildSettings = {
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
CODE_SIGN_IDENTITY = "Apple Development";
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = "";
FRAMEWORK_SEARCH_PATHS = (
"$(inherited)",
"$(PROJECT_DIR)/ocr_demo",
);
INFOPLIST_FILE = "$(SRCROOT)/ocr_demo/Info.plist";
IPHONEOS_DEPLOYMENT_TARGET = 12.0;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"@executable_path/Frameworks",
);
LIBRARY_SEARCH_PATHS = "$(PROJECT_DIR)/ocr_demo/lib/";
OTHER_LDFLAGS = "$(PROJECT_DIR)/ocr_demo/lib/libpaddle_api_light_bundled.a";
PRODUCT_BUNDLE_IDENTIFIER = "com.baidu.paddlelite-ocr-demo";
PRODUCT_NAME = "paddle-lite-ocr";
PROVISIONING_PROFILE_SPECIFIER = "";
TARGETED_DEVICE_FAMILY = "1,2";
USER_HEADER_SEARCH_PATHS = "\"$(SRCROOT)/ocr_demo/\"";
};
name = Release;
};
/* End XCBuildConfiguration section */
/* Begin XCConfigurationList section */
E0A5354F219A8329005A6056 /* Build configuration list for PBXProject "ocr_demo" */ = {
isa = XCConfigurationList;
buildConfigurations = (
E0A53568219A832E005A6056 /* Debug */,
E0A53569219A832E005A6056 /* Release */,
);
defaultConfigurationIsVisible = 0;
defaultConfigurationName = Release;
};
E0A5356A219A832E005A6056 /* Build configuration list for PBXNativeTarget "ocr_demo" */ = {
isa = XCConfigurationList;
buildConfigurations = (
E0A5356B219A832E005A6056 /* Debug */,
E0A5356C219A832E005A6056 /* Release */,
);
defaultConfigurationIsVisible = 0;
defaultConfigurationName = Release;
};
/* End XCConfigurationList section */
};
rootObject = E0A5354C219A8329005A6056 /* Project object */;
}

View File

@ -0,0 +1,17 @@
//
// AppDelegate.h
// seg_demo
//
// Created by Li,Xiaoyang(SYS) on 2018/11/13.
//  Copyright © 2018 Li,Xiaoyang(SYS). All rights reserved.
//
#import <UIKit/UIKit.h>
@interface AppDelegate : UIResponder <UIApplicationDelegate>
@property (strong, nonatomic) UIWindow *window;
@end

View File

@ -0,0 +1,51 @@
//
// AppDelegate.m
// seg_demo
//
// Created by Li,Xiaoyang(SYS) on 2018/11/13.
// Copyright © 2018 Li,Xiaoyang(SYS). All rights reserved.
//
#import "AppDelegate.h"
@interface AppDelegate ()
@end
@implementation AppDelegate
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
// Override point for customization after application launch.
return YES;
}
- (void)applicationWillResignActive:(UIApplication *)application {
// Sent when the application is about to move from active to inactive state. This can occur for certain types of temporary interruptions (such as an incoming phone call or SMS message) or when the user quits the application and it begins the transition to the background state.
// Use this method to pause ongoing tasks, disable timers, and invalidate graphics rendering callbacks. Games should use this method to pause the game.
}
- (void)applicationDidEnterBackground:(UIApplication *)application {
// Use this method to release shared resources, save user data, invalidate timers, and store enough application state information to restore your application to its current state in case it is terminated later.
// If your application supports background execution, this method is called instead of applicationWillTerminate: when the user quits.
}
- (void)applicationWillEnterForeground:(UIApplication *)application {
// Called as part of the transition from the background to the active state; here you can undo many of the changes made on entering the background.
}
- (void)applicationDidBecomeActive:(UIApplication *)application {
// Restart any tasks that were paused (or not yet started) while the application was inactive. If the application was previously in the background, optionally refresh the user interface.
}
- (void)applicationWillTerminate:(UIApplication *)application {
// Called when the application is about to terminate. Save data if appropriate. See also applicationDidEnterBackground:.
}
@end

View File

@ -0,0 +1,98 @@
{
"images" : [
{
"idiom" : "iphone",
"size" : "20x20",
"scale" : "2x"
},
{
"idiom" : "iphone",
"size" : "20x20",
"scale" : "3x"
},
{
"idiom" : "iphone",
"size" : "29x29",
"scale" : "2x"
},
{
"idiom" : "iphone",
"size" : "29x29",
"scale" : "3x"
},
{
"idiom" : "iphone",
"size" : "40x40",
"scale" : "2x"
},
{
"idiom" : "iphone",
"size" : "40x40",
"scale" : "3x"
},
{
"idiom" : "iphone",
"size" : "60x60",
"scale" : "2x"
},
{
"idiom" : "iphone",
"size" : "60x60",
"scale" : "3x"
},
{
"idiom" : "ipad",
"size" : "20x20",
"scale" : "1x"
},
{
"idiom" : "ipad",
"size" : "20x20",
"scale" : "2x"
},
{
"idiom" : "ipad",
"size" : "29x29",
"scale" : "1x"
},
{
"idiom" : "ipad",
"size" : "29x29",
"scale" : "2x"
},
{
"idiom" : "ipad",
"size" : "40x40",
"scale" : "1x"
},
{
"idiom" : "ipad",
"size" : "40x40",
"scale" : "2x"
},
{
"idiom" : "ipad",
"size" : "76x76",
"scale" : "1x"
},
{
"idiom" : "ipad",
"size" : "76x76",
"scale" : "2x"
},
{
"idiom" : "ipad",
"size" : "83.5x83.5",
"scale" : "2x"
},
{
"idiom" : "ios-marketing",
"size" : "1024x1024",
"scale" : "1x"
}
],
"info" : {
"version" : 1,
"author" : "xcode"
}
}

View File

@ -0,0 +1,6 @@
{
"info" : {
"version" : 1,
"author" : "xcode"
}
}

View File

@ -0,0 +1,25 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<document type="com.apple.InterfaceBuilder3.CocoaTouch.Storyboard.XIB" version="3.0" toolsVersion="13122.16" targetRuntime="iOS.CocoaTouch" propertyAccessControl="none" useAutolayout="YES" launchScreen="YES" useTraitCollections="YES" useSafeAreas="YES" colorMatched="YES" initialViewController="01J-lp-oVM">
<dependencies>
<plugIn identifier="com.apple.InterfaceBuilder.IBCocoaTouchPlugin" version="13104.12"/>
<capability name="Safe area layout guides" minToolsVersion="9.0"/>
<capability name="documents saved in the Xcode 8 format" minToolsVersion="8.0"/>
</dependencies>
<scenes>
<!--View Controller-->
<scene sceneID="EHf-IW-A2E">
<objects>
<viewController id="01J-lp-oVM" sceneMemberID="viewController">
<view key="view" contentMode="scaleToFill" id="Ze5-6b-2t3">
<rect key="frame" x="0.0" y="0.0" width="375" height="667"/>
<autoresizingMask key="autoresizingMask" widthSizable="YES" heightSizable="YES"/>
<color key="backgroundColor" red="1" green="1" blue="1" alpha="1" colorSpace="custom" customColorSpace="sRGB"/>
<viewLayoutGuide key="safeArea" id="6Tk-OE-BBY"/>
</view>
</viewController>
<placeholder placeholderIdentifier="IBFirstResponder" id="iYj-Kq-Ea1" userLabel="First Responder" sceneMemberID="firstResponder"/>
</objects>
<point key="canvasLocation" x="53" y="375"/>
</scene>
</scenes>
</document>

View File

@ -0,0 +1,111 @@
<?xml version="1.0" encoding="UTF-8"?>
<document type="com.apple.InterfaceBuilder3.CocoaTouch.Storyboard.XIB" version="3.0" toolsVersion="16097" targetRuntime="iOS.CocoaTouch" propertyAccessControl="none" useAutolayout="YES" useTraitCollections="YES" useSafeAreas="YES" colorMatched="YES" initialViewController="BYZ-38-t0r">
<device id="retina4_7" orientation="portrait" appearance="light"/>
<dependencies>
<deployment identifier="iOS"/>
<plugIn identifier="com.apple.InterfaceBuilder.IBCocoaTouchPlugin" version="16087"/>
<capability name="Safe area layout guides" minToolsVersion="9.0"/>
<capability name="documents saved in the Xcode 8 format" minToolsVersion="8.0"/>
</dependencies>
<scenes>
<!--View Controller-->
<scene sceneID="tne-QT-ifu">
<objects>
<viewController id="BYZ-38-t0r" customClass="ViewController" sceneMemberID="viewController">
<view key="view" contentMode="scaleToFill" id="8bC-Xf-vdC">
<rect key="frame" x="0.0" y="0.0" width="375" height="667"/>
<autoresizingMask key="autoresizingMask" widthSizable="YES" heightSizable="YES"/>
<subviews>
<switch opaque="NO" contentMode="scaleToFill" horizontalHuggingPriority="750" verticalHuggingPriority="750" contentHorizontalAlignment="center" contentVerticalAlignment="center" translatesAutoresizingMaskIntoConstraints="NO" id="yZw-YR-x44">
<rect key="frame" x="114.5" y="624" width="51" height="31"/>
</switch>
<switch opaque="NO" contentMode="scaleToFill" horizontalHuggingPriority="750" verticalHuggingPriority="750" contentHorizontalAlignment="center" contentVerticalAlignment="center" translatesAutoresizingMaskIntoConstraints="NO" id="wN7-2M-FdP">
<rect key="frame" x="16" y="624" width="56" height="31"/>
</switch>
<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" text="前/后摄像头" textAlignment="natural" lineBreakMode="tailTruncation" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="lu6-Fq-OIg">
<rect key="frame" x="93" y="594" width="92" height="21"/>
<fontDescription key="fontDescription" type="system" pointSize="17"/>
<nil key="textColor"/>
<nil key="highlightedColor"/>
</label>
<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" horizontalCompressionResistancePriority="751" text="开启相机" textAlignment="natural" lineBreakMode="tailTruncation" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="VfD-z1-Okj">
<rect key="frame" x="8" y="594" width="70" height="21"/>
<fontDescription key="fontDescription" type="system" pointSize="17"/>
<nil key="textColor"/>
<nil key="highlightedColor"/>
</label>
<imageView userInteractionEnabled="NO" contentMode="scaleToFill" horizontalHuggingPriority="251" verticalHuggingPriority="251" translatesAutoresizingMaskIntoConstraints="NO" id="ptx-ND-Ywq">
<rect key="frame" x="0.0" y="0.0" width="375" height="564"/>
</imageView>
<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" text="" textAlignment="natural" lineBreakMode="tailTruncation" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="T9y-OT-OQS">
<rect key="frame" x="33" y="574" width="309" height="10"/>
<constraints>
<constraint firstAttribute="height" constant="10" id="pMg-XK-d3N"/>
</constraints>
<fontDescription key="fontDescription" type="system" pointSize="17"/>
<nil key="textColor"/>
<nil key="highlightedColor"/>
</label>
<button opaque="NO" contentMode="scaleToFill" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="HJ5-UE-PrR">
<rect key="frame" x="302" y="620.5" width="43" height="38"/>
<fontDescription key="fontDescription" type="system" pointSize="21"/>
<state key="normal" title="拍照"/>
<connections>
<action selector="cap_photo:" destination="BYZ-38-t0r" eventType="touchUpInside" id="PbV-pB-BRY"/>
</connections>
</button>
<switch opaque="NO" contentMode="scaleToFill" horizontalHuggingPriority="750" verticalHuggingPriority="750" contentHorizontalAlignment="center" contentVerticalAlignment="center" translatesAutoresizingMaskIntoConstraints="NO" id="rc6-ZX-igF">
<rect key="frame" x="208" y="624" width="51" height="31"/>
<connections>
<action selector="swith_video_photo:" destination="BYZ-38-t0r" eventType="valueChanged" id="I05-92-4FW"/>
</connections>
</switch>
<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" text="视频/拍照" textAlignment="natural" lineBreakMode="tailTruncation" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="0tm-fo-hjF">
<rect key="frame" x="195" y="595" width="75" height="21"/>
<fontDescription key="fontDescription" type="system" pointSize="17"/>
<nil key="textColor"/>
<nil key="highlightedColor"/>
</label>
</subviews>
<color key="backgroundColor" red="1" green="1" blue="1" alpha="1" colorSpace="custom" customColorSpace="sRGB"/>
<constraints>
<constraint firstItem="VfD-z1-Okj" firstAttribute="top" secondItem="T9y-OT-OQS" secondAttribute="bottom" constant="10" id="BZ1-F0-re1"/>
<constraint firstItem="wN7-2M-FdP" firstAttribute="top" secondItem="rc6-ZX-igF" secondAttribute="top" id="Gzf-hC-X6O"/>
<constraint firstItem="wN7-2M-FdP" firstAttribute="leading" secondItem="8bC-Xf-vdC" secondAttribute="leadingMargin" id="JB8-MT-bdB"/>
<constraint firstItem="lu6-Fq-OIg" firstAttribute="leading" secondItem="VfD-z1-Okj" secondAttribute="trailing" constant="15" id="JbA-wd-hE8"/>
<constraint firstItem="wN7-2M-FdP" firstAttribute="centerX" secondItem="VfD-z1-Okj" secondAttribute="centerX" id="LW4-R4-nh2"/>
<constraint firstItem="0tm-fo-hjF" firstAttribute="leading" secondItem="lu6-Fq-OIg" secondAttribute="trailing" constant="10" id="NDI-W8-717"/>
<constraint firstItem="6Tk-OE-BBY" firstAttribute="trailing" secondItem="ptx-ND-Ywq" secondAttribute="trailing" id="V5z-FH-SFs"/>
<constraint firstItem="ptx-ND-Ywq" firstAttribute="leading" secondItem="6Tk-OE-BBY" secondAttribute="leading" id="dET-yr-Mon"/>
<constraint firstItem="ptx-ND-Ywq" firstAttribute="top" secondItem="6Tk-OE-BBY" secondAttribute="top" id="fn8-6Z-tv4"/>
<constraint firstItem="T9y-OT-OQS" firstAttribute="leading" secondItem="6Tk-OE-BBY" secondAttribute="leading" constant="33" id="iMB-Zg-Hsa"/>
<constraint firstItem="VfD-z1-Okj" firstAttribute="leading" secondItem="6Tk-OE-BBY" secondAttribute="leading" constant="8" id="izE-la-Fhu"/>
<constraint firstItem="wN7-2M-FdP" firstAttribute="top" secondItem="VfD-z1-Okj" secondAttribute="bottom" constant="9" id="jcU-7c-FNS"/>
<constraint firstItem="HJ5-UE-PrR" firstAttribute="centerY" secondItem="rc6-ZX-igF" secondAttribute="centerY" id="lpA-wq-cXI"/>
<constraint firstItem="6Tk-OE-BBY" firstAttribute="trailing" secondItem="T9y-OT-OQS" secondAttribute="trailing" constant="33" id="mD1-P0-mgB"/>
<constraint firstItem="rc6-ZX-igF" firstAttribute="centerX" secondItem="0tm-fo-hjF" secondAttribute="centerX" id="p5w-6o-OqW"/>
<constraint firstItem="rc6-ZX-igF" firstAttribute="top" secondItem="0tm-fo-hjF" secondAttribute="bottom" constant="8" id="rzr-oM-f7f"/>
<constraint firstItem="6Tk-OE-BBY" firstAttribute="trailing" secondItem="HJ5-UE-PrR" secondAttribute="trailing" constant="30" id="tYA-x1-MRj"/>
<constraint firstItem="T9y-OT-OQS" firstAttribute="top" secondItem="ptx-ND-Ywq" secondAttribute="bottom" constant="10" id="vNp-h8-QF9"/>
<constraint firstItem="VfD-z1-Okj" firstAttribute="baseline" secondItem="lu6-Fq-OIg" secondAttribute="baseline" id="wcZ-9g-OTX"/>
<constraint firstItem="6Tk-OE-BBY" firstAttribute="bottom" secondItem="wN7-2M-FdP" secondAttribute="bottom" constant="12" id="xm2-Eb-dxp"/>
<constraint firstItem="wN7-2M-FdP" firstAttribute="top" secondItem="yZw-YR-x44" secondAttribute="top" id="yHi-Fb-V4o"/>
<constraint firstItem="yZw-YR-x44" firstAttribute="centerX" secondItem="lu6-Fq-OIg" secondAttribute="centerX" id="yXW-Ap-sa7"/>
<constraint firstItem="VfD-z1-Okj" firstAttribute="centerY" secondItem="lu6-Fq-OIg" secondAttribute="centerY" id="zQ1-gg-Rnh"/>
</constraints>
<viewLayoutGuide key="safeArea" id="6Tk-OE-BBY"/>
</view>
<connections>
<outlet property="flag_back_cam" destination="yZw-YR-x44" id="z5O-BW-sm7"/>
<outlet property="flag_process" destination="wN7-2M-FdP" id="i8h-CM-ida"/>
<outlet property="flag_video" destination="rc6-ZX-igF" id="Uch-KB-gwF"/>
<outlet property="imageView" destination="ptx-ND-Ywq" id="XjA-C2-hvm"/>
<outlet property="result" destination="T9y-OT-OQS" id="6kB-Ha-dfo"/>
</connections>
</viewController>
<placeholder placeholderIdentifier="IBFirstResponder" id="dkx-z0-nzr" sceneMemberID="firstResponder"/>
</objects>
<point key="canvasLocation" x="53.600000000000001" y="62.518740629685162"/>
</scene>
</scenes>
</document>

View File

@ -0,0 +1,20 @@
//
// Created by chenxiaoyu on 2018/5/5.
// Copyright (c) 2018 baidu. All rights reserved.
//
#import <Foundation/Foundation.h>
#import <UIKit/UIKit.h>
#import "OcrData.h"
@interface BoxLayer : CAShapeLayer
/**
* Render one OCR result (detection polygon plus optional text label) onto this layer.
*/
-(void) renderOcrPolygon: (OcrData *)data withHeight:(CGFloat)originHeight withWidth:(CGFloat)originWidth withLabel:(bool) withLabel;
@end

View File

@ -0,0 +1,80 @@
//
// Created by chenxiaoyu on 2018/5/5.
// Copyright (c) 2018 baidu. All rights reserved.
//
#include "BoxLayer.h"
#import "Helpers.h"
@implementation BoxLayer {
}
#define MAIN_COLOR UIColorFromRGB(0x3B85F5)
- (void)renderOcrPolygon:(OcrData *)d withHeight:(CGFloat)originHeight withWidth:(CGFloat)originWidth withLabel:(bool)withLabel {
if ([d.polygonPoints count] != 4) {
NSLog(@"poloygonPoints size is not 4");
return;
}
CGPoint startPoint = [d.polygonPoints[0] CGPointValue];
NSString *text = d.label;
CGFloat x = startPoint.x * originWidth;
CGFloat y = startPoint.y * originHeight;
CGFloat width = originWidth - x;
CGFloat height = originHeight - y;
UIFont *font = [UIFont systemFontOfSize:16];
NSDictionary *attrs = @{
// NSStrokeColorAttributeName: [UIColor blackColor],
NSForegroundColorAttributeName: [UIColor whiteColor],
// NSStrokeWidthAttributeName : @((float) -6.0),
NSFontAttributeName: font
};
if (withLabel) {
NSAttributedString *displayStr = [[NSAttributedString alloc] initWithString:text attributes:attrs];
CATextLayer *textLayer = [[CATextLayer alloc] init];
textLayer.wrapped = YES;
textLayer.string = displayStr;
textLayer.frame = CGRectMake(x + 2, y + 2, width, height);
textLayer.contentsScale = [[UIScreen mainScreen] scale];
//
// textLayer.shadowColor = [MAIN_COLOR CGColor];
// textLayer.shadowOffset = CGSizeMake(2.0, 2.0);
// textLayer.shadowOpacity = 0.8;
// textLayer.shadowRadius = 0.0;
[self addSublayer:textLayer];
}
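// Outline the detected region by connecting the four corner points.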
UIBezierPath *path = [UIBezierPath new];
[path moveToPoint:CGPointMake(startPoint.x * originWidth, startPoint.y * originHeight)];
for (NSValue *val in d.polygonPoints) {
CGPoint p = [val CGPointValue];
[path addLineToPoint:CGPointMake(p.x * originWidth, p.y * originHeight)];
}
[path closePath];
self.path = path.CGPath;
self.strokeColor = MAIN_COLOR.CGColor;
self.lineWidth = 2.0;
self.fillColor = [MAIN_COLOR colorWithAlphaComponent:0.2].CGColor;
self.lineJoin = kCALineJoinBevel;
}
- (void)renderSingleBox:(OcrData *)data withHeight:(CGFloat)originHeight withWidth:(CGFloat)originWidth {
[self renderOcrPolygon:data withHeight:originHeight withWidth:originWidth withLabel:YES];
}
@end

View File

@ -0,0 +1,31 @@
//
// Helpers.h
// EasyDLDemo
//
// Created by chenxiaoyu on 2018/5/14.
// Copyright © 2018 baidu. All rights reserved.
//
#import <Foundation/Foundation.h>
#import <UIKit/UIImage.h>
#define UIColorFromRGB(rgbValue) \
[UIColor colorWithRed:((float)((rgbValue & 0xFF0000) >> 16))/255.0 \
green:((float)((rgbValue & 0x00FF00) >> 8))/255.0 \
blue:((float)((rgbValue & 0x0000FF) >> 0))/255.0 \
alpha:1.0]
#define SCREEN_HEIGHT [UIScreen mainScreen].bounds.size.height
#define SCREEN_WIDTH [UIScreen mainScreen].bounds.size.width
#define HIGHLIGHT_COLOR UIColorFromRGB(0xF5A623)
//#define BTN_HIGHTLIGH_TEXT_COLOR UIColorFromRGB(0xF5A623)
@interface Helpers : NSObject {
}
@end
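A short illustration of how the macros above are typically used (hypothetical snippet; the function and variable names are assumptions, and 0x3B85F5 is the same value BoxLayer.m uses for MAIN_COLOR):
#import <UIKit/UIKit.h>
#import "Helpers.h"
// Hypothetical example: a full-screen view tinted with the demo's main blue.
static UIView *makeAccentBackdrop(void) {
    UIView *backdrop = [[UIView alloc] initWithFrame:CGRectMake(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT)];
    backdrop.backgroundColor = UIColorFromRGB(0x3B85F5);
    return backdrop;
}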

View File

@ -0,0 +1,17 @@
//
// Helpers.m
// EasyDLDemo
//
// Created by chenxiaoyu on 2018/5/14.
// Copyright © 2018 baidu. All rights reserved.
//
#import "Helpers.h"
@implementation Helpers
@end

View File

@ -0,0 +1,49 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleDevelopmentRegion</key>
<string>$(DEVELOPMENT_LANGUAGE)</string>
<key>CFBundleDisplayName</key>
<string>$(PRODUCT_NAME)</string>
<key>CFBundleExecutable</key>
<string>$(EXECUTABLE_NAME)</string>
<key>CFBundleIdentifier</key>
<string>$(PRODUCT_BUNDLE_IDENTIFIER)</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
<string>$(PRODUCT_NAME)</string>
<key>CFBundlePackageType</key>
<string>APPL</string>
<key>CFBundleShortVersionString</key>
<string>0.1</string>
<key>CFBundleVersion</key>
<string>1</string>
<key>LSRequiresIPhoneOS</key>
<true/>
<key>NSCameraUsageDescription</key>
<string>Camera access is used to capture images for text detection and recognition in this demo.</string>
<key>UILaunchStoryboardName</key>
<string>LaunchScreen</string>
<key>UIMainStoryboardFile</key>
<string>Main</string>
<key>UIRequiredDeviceCapabilities</key>
<array>
<string>armv7</string>
</array>
<key>UISupportedInterfaceOrientations</key>
<array>
<string>UIInterfaceOrientationPortrait</string>
<string>UIInterfaceOrientationLandscapeLeft</string>
<string>UIInterfaceOrientationLandscapeRight</string>
</array>
<key>UISupportedInterfaceOrientations~ipad</key>
<array>
<string>UIInterfaceOrientationPortrait</string>
<string>UIInterfaceOrientationPortraitUpsideDown</string>
<string>UIInterfaceOrientationLandscapeLeft</string>
<string>UIInterfaceOrientationLandscapeRight</string>
</array>
</dict>
</plist>

View File

@ -0,0 +1,15 @@
//
// Created by Lv,Xiangxiang on 2020/7/11.
// Copyright (c) 2020 Li,Xiaoyang(SYS). All rights reserved.
//
#import <Foundation/Foundation.h>
@interface OcrData : NSObject
@property(nonatomic, copy) NSString *label;
@property(nonatomic) int category;
@property(nonatomic) float accuracy;
@property(nonatomic) NSArray *polygonPoints;
@end
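A sketch of how one result object might be populated, inferred from how BoxLayer.m consumes it (four corner points wrapped in NSValue, normalized to [0, 1]); the literal values are illustrative only:
#import <UIKit/UIKit.h>
#import "OcrData.h"
// Hypothetical sample: one recognized text line and its quadrilateral,
// expressed in image-relative coordinates (both axes normalized to [0, 1]).
static OcrData *sampleOcrData(void) {
    OcrData *d = [[OcrData alloc] init];
    d.label = @"PaddleOCR";
    d.category = 0;       // class index, if the model reports one
    d.accuracy = 0.98f;   // recognition confidence
    d.polygonPoints = @[[NSValue valueWithCGPoint:CGPointMake(0.10f, 0.20f)],
                        [NSValue valueWithCGPoint:CGPointMake(0.60f, 0.20f)],
                        [NSValue valueWithCGPoint:CGPointMake(0.60f, 0.30f)],
                        [NSValue valueWithCGPoint:CGPointMake(0.10f, 0.30f)]];
    return d;
}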

View File

@ -0,0 +1,12 @@
//
// Created by Lv,Xiangxiang on 2020/7/11.
// Copyright (c) 2020 Li,Xiaoyang(SYS). All rights reserved.
//
#import "OcrData.h"
@implementation OcrData {
}
@end

View File

@ -0,0 +1,16 @@
//
// ViewController.h
// seg_demo
//
// Created by Li,Xiaoyang(SYS) on 2018/11/13.
// Copyright © 2018 Li,Xiaoyang(SYS). All rights reserved.
//
#import <UIKit/UIKit.h>
@interface ViewController : UIViewController
@end

Some files were not shown because too many files have changed in this diff.