2020-06-04 18:25:23 +08:00
## Introduction
PaddleOCR aims to create a rich, leading, and practical OCR tool library to help users train better models and apply them.
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
**Recent updates**
- 2020.5.30, Model prediction and training support Windows systems, and the display of recognition results is optimized
- 2020.5.30, Open source general Chinese OCR model
- 2020.5.30, Provide Ultra-lightweight Chinese OCR model inference
2020-05-30 11:57:49 +08:00
2020-06-04 18:25:23 +08:00
## Features
- Ultra-lightweight Chinese OCR model, total model size is only 8.6M
- Single model supports Chinese and English numbers combination recognition, vertical text recognition, long text recognition
- Detection model DB (4.1M) + recognition model CRNN (4.5M)
- Various text detection algorithms: EAST, DB
- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE
2020-05-14 11:05:33 +08:00
2020-06-04 18:25:23 +08:00
### Supported Chinese models list:
2020-05-30 11:57:49 +08:00
2020-06-04 18:25:23 +08:00
|Model Name|Description |Detection Model link|Recognition Model link|
2020-05-30 11:57:49 +08:00
|-|-|-|-|
2020-06-04 18:25:23 +08:00
|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model ](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar )|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model ](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar )|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model ](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar )|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model ](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar )|
2020-05-29 17:34:56 +08:00
2020-06-04 18:25:23 +08:00
For testing our Chinese OCR online: https://www.paddlepaddle.org.cn/hub/scene/ocr
2020-05-29 17:34:56 +08:00
2020-06-04 18:25:23 +08:00
**You can also quickly experience the Ultra-lightweight Chinese OCR and general Chinese OCR models as follows:**
## **Ultra-lightweight Chinese OCR and General Chinese OCR inference**
2020-05-14 00:01:00 +08:00
2020-05-14 12:08:11 +08:00
![](doc/imgs_results/11.jpg)
2020-05-14 01:51:17 +08:00
2020-06-04 18:25:23 +08:00
The picture above is the result of our Ultra-lightweight Chinese OCR model. For more testing results, please see the end of the article [Ultra-lightweight Chinese OCR results ](#超轻量级中文OCR效果展示 ) and [General Chinese OCR results ](#通用中文OCR效果展示 ).
2020-05-14 11:05:33 +08:00
2020-06-04 18:25:23 +08:00
#### 1. Environment configuration
2020-05-14 01:42:35 +08:00
2020-06-04 18:25:23 +08:00
Please see [Quick installation ](./doc/installation.md )
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
#### 2. Download inference models
2020-05-14 01:51:17 +08:00
2020-06-04 18:25:23 +08:00
#### (1) Download Ultra-lightweight Chinese OCR models
2020-05-14 00:01:00 +08:00
```
2020-05-29 17:34:56 +08:00
mkdir inference & & cd inference
2020-06-04 18:25:23 +08:00
# Download the detection part of the Ultra-lightweight Chinese OCR and decompress it
2020-05-29 17:34:56 +08:00
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar & & tar xf ch_det_mv3_db_infer.tar
2020-06-04 18:25:23 +08:00
# Download the recognition part of the Ultra-lightweight Chinese OCR and decompress it
2020-05-29 17:34:56 +08:00
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar & & tar xf ch_rec_mv3_crnn_infer.tar
2020-05-30 11:57:49 +08:00
cd ..
```
2020-06-04 18:25:23 +08:00
#### (2) Download General Chinese OCR models
2020-05-30 11:57:49 +08:00
```
mkdir inference & & cd inference
2020-06-04 18:25:23 +08:00
# Download the detection part of the general Chinese OCR model and decompress it
2020-05-30 11:57:49 +08:00
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar & & tar xf ch_det_r50_vd_db_infer.tar
2020-06-04 18:25:23 +08:00
# Download the recognition part of the generic Chinese OCR model and decompress it
2020-05-30 11:57:49 +08:00
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar & & tar xf ch_rec_r34_vd_crnn_infer.tar
cd ..
2020-05-14 00:01:00 +08:00
```
2020-06-04 18:25:23 +08:00
#### 3. Single image and batch image prediction
2020-05-14 11:05:33 +08:00
2020-06-04 18:25:23 +08:00
The following code implements text detection and recognition inference tandemly. When performing prediction, you need to specify the path of a single image or image folder through the parameter `image_dir` , the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visual recognition results are saved to the `./inference_results` folder by default.
2020-05-14 11:05:33 +08:00
2020-05-14 00:01:00 +08:00
```
2020-06-04 18:25:23 +08:00
# Set PYTHONPATH environment variable
2020-05-14 00:01:00 +08:00
export PYTHONPATH=.
2020-06-04 18:25:23 +08:00
# Predict a single image by specifying image path to image_dir
2020-05-30 11:57:49 +08:00
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
2020-05-14 11:05:33 +08:00
2020-06-04 18:25:23 +08:00
# Predict a batch of images by specifying image folder path to image_dir
2020-05-30 11:57:49 +08:00
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
2020-05-15 22:07:18 +08:00
2020-06-04 18:25:23 +08:00
# If you want to use the CPU for prediction, you need to set the use_gpu parameter to False
2020-05-30 11:57:49 +08:00
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
2020-06-04 18:25:23 +08:00
To run inference of the Generic Chinese OCR model, follow these steps above to download the corresponding models and update the relevant parameters. Examples are as follows:
2020-05-30 11:57:49 +08:00
```
2020-06-04 18:25:23 +08:00
# Predict a single image by specifying image path to image_dir
2020-05-30 11:57:49 +08:00
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
2020-05-14 00:01:00 +08:00
```
2020-05-29 17:34:56 +08:00
2020-06-04 18:25:23 +08:00
For more text detection and recognition models, please refer to the document [Inference ](./doc/inference.md )
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
## Documentation
- [Quick installation ](./doc/installation.md )
- [Text detection model training/evaluation/prediction ](./doc/detection.md )
- [Text recognition model training/evaluation/prediction ](./doc/recognition.md )
- [Inference ](./doc/inference.md )
2020-05-14 01:40:20 +08:00
2020-06-04 18:25:23 +08:00
## Text detection algorithm
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
PaddleOCR open source text detection algorithm list:
2020-05-14 12:04:43 +08:00
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
2020-05-14 12:07:22 +08:00
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
2020-06-04 18:25:23 +08:00
- [ ] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research, comming soon)
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
On the ICDAR2015 text detection public dataset, the detection result is as follows:
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
|Model|Backbone|precision|recall|Hmean|Download link|
2020-05-25 18:14:13 +08:00
|-|-|-|-|-|-|
2020-06-04 18:25:23 +08:00
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
2020-05-25 16:29:20 +08:00
2020-06-04 18:25:23 +08:00
* Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result.
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
For the training guide and use of PaddleOCR text detection algorithm, please refer to the document [Text detection model training/evaluation/prediction ](./doc/detection.md )
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
## Text recognition algorithm
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
PaddleOCR open-source text recognition algorithm list:
2020-05-14 12:04:43 +08:00
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
2020-06-04 18:25:23 +08:00
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research, comming soon)
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
Refer to [DTRB ](https://arxiv.org/abs/1904.01906 ), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
|Model|Backbone|Avg Accuracy|Module combination|Download link|
2020-05-15 19:51:49 +08:00
|-|-|-|-|-|
2020-06-04 18:25:23 +08:00
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
Please refer to the document for training guide and use of PaddleOCR text recognition algorithm [Text recognition model training/evaluation/prediction ](./doc/recognition.md )
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
## End-to-end OCR algorithm
- [ ] [End2End-PSL ](https://arxiv.org/abs/1909.07808 )(Baidu Self-Research, comming soon)
2020-05-14 00:01:00 +08:00
2020-06-02 20:42:16 +08:00
< a name = "超轻量级中文OCR效果展示" > < / a >
2020-06-04 18:25:23 +08:00
## Ultra-lightweight Chinese OCR result
2020-05-14 12:08:11 +08:00
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
![](doc/imgs_results/4.jpg)
![](doc/imgs_results/6.jpg)
![](doc/imgs_results/9.jpg)
![](doc/imgs_results/16.png)
![](doc/imgs_results/22.jpg)
2020-05-14 00:01:00 +08:00
2020-06-02 20:42:16 +08:00
< a name = "通用中文OCR效果展示" > < / a >
2020-06-02 16:47:04 +08:00
## 通用中文OCR效果展示
![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
![](doc/imgs_results/chinese_db_crnn_server/8.jpg)
2020-06-02 20:33:26 +08:00
## FAQ
1. 预测报错: got an unexpected keyword argument 'gradient_clip'
2020-06-03 11:23:55 +08:00
2020-06-04 18:25:23 +08:00
The installed paddle version is not correct. At present, this project only supports paddle1.7, which will be adapted to 1.8 in the near future.。
2020-06-02 20:33:26 +08:00
2. 转换attention识别模型时报错: KeyError: 'predict'
2020-06-03 11:23:55 +08:00
2020-06-02 20:33:26 +08:00
基于Attention损失的识别模型推理还在调试中。对于中文文本识别, 建议优先选择基于CTC损失的识别模型, 实践中也发现基于Attention损失的效果不如基于CTC损失的识别模型。
3. 关于推理速度
2020-06-03 11:23:55 +08:00
2020-06-02 20:33:26 +08:00
图片中的文字较多时,预测时间会增,可以使用--rec_batch_num设置更小预测batch num, 默认值为30, 可以改为10或其他数值。
4. 服务部署与移动端部署
2020-06-03 11:23:55 +08:00
2020-06-02 20:33:26 +08:00
预计6月中下旬会先后发布基于Serving的服务部署方案和基于Paddle Lite的移动端部署方案, 欢迎持续关注。
5. 自研算法发布时间
2020-06-03 11:23:55 +08:00
2020-06-02 20:33:26 +08:00
自研算法SAST、SRN、End2End-PSL都将在6-7月陆续发布, 敬请期待。
2020-06-04 18:25:23 +08:00
## Welcome to the PaddleOCR technical exchange group
2020-06-03 11:23:55 +08:00
加微信: paddlehelp, 备注OCR, 小助手拉你进群~
2020-06-02 20:33:26 +08:00
2020-05-14 00:01:00 +08:00
2020-06-04 18:25:23 +08:00
## References
2020-05-14 00:01:00 +08:00
```
1. EAST:
@inproceedings {zhou2017east,
title={EAST: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={5551--5560},
year={2017}
}
2. DB:
@article {liao2019real,
title={Real-time Scene Text Detection with Differentiable Binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
journal={arXiv preprint arXiv:1911.08947},
year={2019}
}
3. DTRB:
@inproceedings {baek2019wrong,
title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={4715--4723},
year={2019}
}
4. SAST:
@inproceedings {wang2019single,
title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
pages={1277--1285},
year={2019}
}
5. SRN:
@article {yu2020towards,
title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
journal={arXiv preprint arXiv:2003.12294},
year={2020}
}
6. end2end-psl:
@inproceedings {sun2019chinese,
title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9086--9095},
year={2019}
}
```
2020-05-14 01:58:00 +08:00
2020-06-04 18:25:23 +08:00
## License
This project is released under < a href = "https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE" > Apache 2.0 license< / a >
2020-05-14 01:58:00 +08:00
2020-06-04 18:25:23 +08:00
## Contribution
We welcome your contribution to PaddleOCR and thank you for your feedback.