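diff --git a/README_en.md b/README_en.md
new file mode 100644
index 00000000..55e33c12
--- /dev/null
+++ b/README_en.md
@@ -0,0 +1,256 @@
+## Introduction
+PaddleOCR aims to build a rich, leading, and practical OCR toolkit that helps users train better models and put them into production.
+
+**Recent updates**
+- 2020.6.8 Added [datasets](./doc/datasets.md), which will be updated continuously
+- 2020.6.5 Support exporting `attention` models as `inference_model`
+- 2020.6.5 Support outputting the result score when running recognition prediction on its own
+- 2020.5.30 Online demo of the ultra-lightweight Chinese OCR model available
+- 2020.5.30 Model prediction and training supported on Windows
+- [more](./doc/update.md)
+
+## Features
+- Ultra-lightweight Chinese OCR; the total model size is only 8.6M
+    - A single model supports recognition of mixed Chinese, English, and digits, vertical text, and long text
+    - Detection model DB (4.1M) + recognition model CRNN (4.5M)
+- Multiple text detection training algorithms: EAST, DB
+- Multiple text recognition training algorithms: Rosetta, CRNN, STAR-Net, RARE
+
+### Supported Chinese models:
+
+|Model name|Description|Detection model|Recognition model|
+|-|-|-|-|
+|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
+|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
+
+Online demo of the ultra-lightweight Chinese OCR model: https://www.paddlepaddle.org.cn/hub/scene/ocr
+
+**You can also follow the tutorial below to quickly try the ultra-lightweight and general Chinese OCR models.**
+
+## **Quick start with the ultra-lightweight and general Chinese OCR models**
+
+![](doc/imgs_results/11.jpg)
+
+The image above shows the result of the ultra-lightweight Chinese OCR model. For more results, see [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR效果展示) and [General Chinese OCR results](#通用中文OCR效果展示) at the end of this document.
+
+#### 1. Environment setup
+
+Please follow [Quick installation](./doc/installation.md) to set up the PaddleOCR runtime environment first.
+
+#### 2. Download the inference models
+
+*On Windows, if wget is not installed, you can copy the link into a browser to download the model, then extract it and place it in the corresponding directory.*
+
+
+#### (1) Download the ultra-lightweight Chinese OCR model
+```
+mkdir inference && cd inference
+# Download and extract the detection model of the ultra-lightweight Chinese OCR model
+wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
+# Download and extract the recognition model of the ultra-lightweight Chinese OCR model
+wget 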
https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
+cd ..
+```
+#### (2) Download the general Chinese OCR model
+```
+mkdir inference && cd inference
+# Download and extract the detection model of the general Chinese OCR model
+wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar
+# Download and extract the recognition model of the general Chinese OCR model
+wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar
+cd ..
+```
+
+#### 3. Predict on a single image or a set of images
+
+The following commands run text detection and recognition in sequence. When running prediction, use the parameter `image_dir` to specify the path of a single image or an image folder, `det_model_dir` to specify the path of the detection inference model, and `rec_model_dir` to specify the path of the recognition inference model. The visualized results are saved to the `./inference_results` folder by default.
+
+```
+# Set the PYTHONPATH environment variable
+export PYTHONPATH=.
+
+# Set the environment variable on Windows
+SET PYTHONPATH=.
+
+# Predict the single image specified by image_dir
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
+
+# Predict the set of images specified by image_dir
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
+
+# To predict on CPU, set the use_gpu parameter to False
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
+```
+
+To try the general Chinese OCR model, download the corresponding models as described above and update the relevant parameters, for example:
+```
+# Predict the single image specified by image_dir
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
+```
+
+For more ways to run text detection and recognition in sequence, see [Inference based on the prediction engine](./doc/inference.md) in the tutorials.
+
+## Tutorials
+- [Quick installation](./doc/installation.md)
+- [Text detection model training/evaluation/prediction](./doc/detection.md)
+- [Text recognition model training/evaluation/prediction](./doc/recognition.md)
+- [Inference based on the prediction engine](./doc/inference.md)
+- [Datasets](./doc/datasets.md)
+
+## Text detection algorithms
+
+Text detection algorithms open-sourced in PaddleOCR:
+- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
+- [x] DB([paper](https://arxiv.org/abs/1911.08947))
+- [ ] 
SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu self-developed, coming soon)
+
+Results on the ICDAR2015 public text detection dataset:
+
+|Model|Backbone|precision|recall|Hmean|Download link|
+|-|-|-|-|-|-|
+|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
+|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
+|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
+|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
+
+Using the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/datasets.md#1icdar2019-lsvt) street view dataset, 30,000 images in total, the configurations and pre-trained models for the Chinese detection models are as follows:
+|Model|Backbone|Configuration file|Pre-trained model|
+|-|-|-|-|
+|Ultra-lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
+|General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
+
+* Note: For training and evaluating the DB models above, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or models, these two parameters can be adjusted for better results.
+
+For training and using the PaddleOCR text detection algorithms, see [Text detection model training/evaluation/prediction](./doc/detection.md) in the tutorials.
+
+## Text recognition algorithms
+
+Text recognition algorithms open-sourced in PaddleOCR:
+- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
+- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
+- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
+- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
+- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu self-developed, coming soon)
+
+Following the [DTRB](https://arxiv.org/abs/1904.01906) training and evaluation protocol for text recognition, the models are trained on the MJSynth and SynthText recognition datasets and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
+
+|Model|Backbone|Avg Accuracy|Model storage name|Download link|
+|-|-|-|-|-|
+|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
+|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
+|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
+|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
+|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
+|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
+|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
+|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
+
+Using the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/datasets.md#1icdar2019-lsvt) street view dataset, 300,000 text images were cropped out according to the ground truth and position-calibrated. In addition, 5 million synthetic images were generated from the LSVT corpus to train the Chinese models. The configurations and pre-trained models are as follows:
+|Model|Backbone|Configuration file|Pre-trained model|
+|-|-|-|-|
+|Ultra-lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
+|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
+
+For training and using the PaddleOCR text recognition algorithms, see [Text recognition model training/evaluation/prediction](./doc/recognition.md) in the tutorials.
+
+## End-to-end OCR algorithms
+- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu self-developed, coming soon)
+
+
+## Ultra-lightweight Chinese OCR results
+![](doc/imgs_results/1.jpg)
+![](doc/imgs_results/7.jpg)
+![](doc/imgs_results/12.jpg)
+![](doc/imgs_results/4.jpg)
+![](doc/imgs_results/6.jpg)
+![](doc/imgs_results/9.jpg)
+![](doc/imgs_results/16.png)
+![](doc/imgs_results/22.jpg)
+
+
+## General Chinese OCR results
+![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
+![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
+![](doc/imgs_results/chinese_db_crnn_server/8.jpg)
+
+## FAQ
+1. **Prediction error: got an unexpected keyword argument 'gradient_clip'**
+The installed version of paddle is incorrect. Currently this project only supports paddle1.7; support for 1.8 will be added soon.
+
+2. 
**Error when converting the attention recognition model: KeyError: 'predict'**
+Inference for recognition models based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose a recognition model based on CTC loss first. In practice, attention-based models have also been found to be less effective than CTC-based ones.
+
+3. **About inference speed**
+When an image contains a lot of text, prediction takes longer. You can use --rec_batch_num to set a smaller prediction batch size. The default value is 30, which can be changed to 10 or another value.
+
+4. **Service deployment and mobile deployment**
+A service deployment solution based on Serving and a mobile deployment solution based on Paddle Lite are expected to be released one after the other in mid-to-late June. Stay tuned.
+
+5. **Release time of self-developed algorithms**
+The self-developed algorithms SAST, SRN, and End2End-PSL will be released successively in June and July. Stay tuned.
+
+[more](./doc/FAQ.md)
+
+## Join the PaddleOCR technical discussion group
+Add the WeChat account paddlehelp with the note "OCR", and the assistant will invite you to the group.
+
+## References
+```
+1. EAST:
+@inproceedings{zhou2017east,
+  title={EAST: an efficient and accurate scene text detector},
+  author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
+  booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
+  pages={5551--5560},
+  year={2017}
+}
+
+2. DB:
+@article{liao2019real,
+  title={Real-time Scene Text Detection with Differentiable Binarization},
+  author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
+  journal={arXiv preprint arXiv:1911.08947},
+  year={2019}
+}
+
+3. DTRB:
+@inproceedings{baek2019wrong,
+  title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
+  author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
+  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
+  pages={4715--4723},
+  year={2019}
+}
+
+4. SAST:
+@inproceedings{wang2019single,
+  title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
+  author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
+  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
+  pages={1277--1285},
+  year={2019}
+}
+
+5. 
SRN:
+@article{yu2020towards,
+  title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
+  author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
+  journal={arXiv preprint arXiv:2003.12294},
+  year={2020}
+}
+
+6. end2end-psl:
+@inproceedings{sun2019chinese,
+  title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
+  author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
+  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
+  pages={9086--9095},
+  year={2019}
+}
+```
+
+## License
+This project is released under the Apache 2.0 license.
+
+## Contributing
+We welcome your code contributions to PaddleOCR and greatly appreciate your feedback.
diff --git a/doc/FAQ_en.md b/doc/FAQ_en.md
new file mode 100644
index 00000000..3a70a40d
--- /dev/null
+++ b/doc/FAQ_en.md
@@ -0,0 +1,45 @@
+## FAQ
+
+1. **Prediction error: got an unexpected keyword argument 'gradient_clip'**
+The installed version of paddle is incorrect. Currently, this project only supports paddle1.7; support for 1.8 will be added in the near future.
+
+2. **Error when converting attention recognition model: KeyError: 'predict'**
+Inference for recognition models based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose a recognition model based on CTC loss first. In practice, recognition models based on attention loss have also been found to be less effective than those based on CTC loss.
+
+3. **About inference speed**
+When there is a lot of text in the picture, the prediction time will increase. You can use `--rec_batch_num` to set a smaller prediction batch size. The default value is 30, which can be changed to 10 or other values.
+
+4. **Service deployment and mobile deployment**
+It is expected that the service deployment based on Serving and the mobile deployment based on Paddle Lite will be released successively in mid-to-late June. 
Stay tuned for more updates.
+
+5. **Release time of self-developed algorithm**
+Baidu self-developed algorithms such as SAST, SRN and End2End-PSL will be released in June or July. Please be patient.
+
+6. **How to run on Windows or Mac?**
+PaddleOCR has been adapted to Windows and Mac systems. Two points should be noted when running it:
+    1. In [Quick installation](installation.md), if you do not want to install docker, you can skip the first step and start with the second step.
+    2. When downloading the inference model, if wget is not installed, you can directly click the model link or copy the link address to the browser to download, then extract and place it in the corresponding directory.
+
+7. **The difference between ultra-lightweight model and General OCR model**
+At present, PaddleOCR has open-sourced two Chinese models, namely the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. The comparison between the two is as follows:
+    - Similarities: Both use the same **algorithm** and **training data**;
+    - Differences: The difference lies in the **backbone network** and **channel parameters**. The ultra-lightweight model uses MobileNetV3 as the backbone network, while the general model uses Resnet50_vd as the detection model backbone and Resnet34_vd as the recognition model backbone. You can compare the training configuration files of the two models to see the differences in parameters.
+
+|Model|Backbone|Detection configuration file|Recognition configuration file|
+|-|-|-|-|
+|8.6M ultra-lightweight Chinese OCR model|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml|
+|General Chinese OCR model|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml|
+
+8. **Is there a plan to opensource a model that only recognizes numbers or only English + numbers?**
+There is no plan to open-source numbers-only, numbers + English only, or other domain-specific models. 
PaddleOCR has open-sourced a variety of detection and recognition algorithms for customized training. The two Chinese models are also trained with this open-source algorithm library. You can prepare the data according to the tutorial, choose the appropriate configuration file, and train a model yourself; we believe you can get good results. If you have any questions during training, you are welcome to open an issue or ask in the discussion group, and we will answer promptly.
+
+9. **What is the training data used by the open-source model? Can it be opensourced?**
+At present, the datasets used by the open-source models and their sizes are as follows:
+    - Detection:
+        English dataset: ICDAR2015
+        Chinese dataset: LSVT street view dataset with 30,000 pictures
+    - Recognition:
+        English dataset: MJSynth and SynthText synthetic datasets, with tens of millions of samples.
+        Chinese dataset: LSVT street view dataset with cropped text areas, a total of 300,000 images. In addition, 5 million images were synthesized based on the LSVT corpus.
+
+    Among them, the public datasets are open-source; users can search for and download them themselves, or refer to [Chinese data set](datasets.md). The synthetic data is not open-sourced, but users can synthesize data themselves with open-source synthesis tools. Currently available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.
\ No newline at end of file
diff --git a/doc/config_en.md b/doc/config_en.md
new file mode 100644
index 00000000..c9e45035
--- /dev/null
+++ b/doc/config_en.md
@@ -0,0 +1,49 @@
+# Optional parameters list
+
+The following list can be viewed via `--help`
+
+| FLAG | Supported script | Use | Defaults | Note |
+| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
+| -c | ALL | Specify configuration file to use | None | **Please refer to the parameter introduction for configuration file usage** |
+| -o | ALL | set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: `-o Global.use_gpu=false` |
+
+
+## Introduction to Global Parameters of Configuration File
+
+Take `rec_chinese_lite_train.yml` as an example
+
+
+| Parameter | Use | Default | Note |
+| :----------------------: | :---------------------: | :--------------: | :--------------------: |
+| algorithm | Select algorithm to use | Synchronize with configuration file | For selecting model, please refer to the supported model [list](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README_en.md) |
+| use_gpu | Set using GPU or not | true | \ |
+| epoch_num | Maximum training epoch number | 3000 | \ |
+| log_smooth_window | Sliding window size | 20 | \ |
+| print_batch_step | Set print log interval | 10 | \ |
+| save_model_dir | Set model save path | output/{model_name} | \ |
+| save_epoch_step | Set model save interval | 3 | \ |
+| eval_batch_step | Set the model evaluation interval | 2000 | \ |
+|train_batch_size_per_card | Set the batch size during training | 256 | \ |
+| test_batch_size_per_card | Set the batch size during testing | 256 | \ |
+| image_shape | Set input image size | [3, 32, 100] | \ |
+| max_text_length | Set the maximum text length | 25 | \ |
+| character_type | Set character type | ch | en/ch, the default dict will be used for en, and the custom dict will be used for 
ch|
+| character_dict_path | Set dictionary path | ./ppocr/utils/ic15_dict.txt | \ |
+| loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention |
+| reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
+| pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ |
+| checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption |
+| save_inference_dir | path to save model for inference | None | Use to save inference model |
+
+## Introduction to Reader parameters of Configuration file
+
+Take `rec_chinese_reader.yml` as an example:
+
+| Parameter | Use | Default | Note |
+| :----------------------: | :---------------------: | :--------------: | :--------------------: |
+| reader_function | Select data reading method | ppocr.data.rec.dataset_traversal,SimpleReader | Support two data reading methods: SimpleReader / LMDBReader |
+| num_workers | Set the number of data reading threads | 8 | \ |
+| img_set_dir | Image folder path | ./train_data | \ |
+| label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ |
+| infer_img | Result folder path | ./infer_img | \|
+
diff --git a/doc/customize_en.md b/doc/customize_en.md
new file mode 100644
index 00000000..99665329
--- /dev/null
+++ b/doc/customize_en.md
@@ -0,0 +1,30 @@
+# How to make your own ultra-lightweight OCR models?
+
+The process of making a customized ultra-lightweight OCR model can be divided into three steps: training a text detection model, training a text recognition model, and concatenating the predictions from the previous steps.
+
+## step1: Train text detection model
+
+PaddleOCR provides two text detection algorithms: EAST and DB. Both support the MobileNetV3 and ResNet50_vd backbone networks. Select the corresponding configuration file as needed and start training. 
For example, to train a DB detection model with MobileNetV3 as the backbone network:
+```
+python3 tools/train.py -c configs/det/det_mv3_db.yml
+```
+For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection.md)
+
+## step2: Train text recognition model
+
+PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks: MobileNetV3 and ResNet34_vd. Select the corresponding configuration file as needed and start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
+```
+python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
+```
+For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition.md)
+
+## step3: Concatenate predictions
+
+PaddleOCR provides a concatenation tool for detection and recognition models, which can connect any trained detection model and any recognition model into a two-stage text recognition system. The input image goes through four main stages (text detection, text rectification, text recognition, and score filtering) to output the text position and recognition results. You can also choose to visualize the results.
+
+When performing prediction, you need to specify the path of a single image or an image folder through the parameter `image_dir`; the parameter `det_model_dir` specifies the path of the detection model, and the parameter `rec_model_dir` specifies the path of the recognition model. The visualized results are saved to the `./inference_results` folder by default.
+
+```
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
+```
+For more details about text detection and recognition concatenation, please refer to the document [Inference](./inference.md)
diff --git a/doc/datasets_en.md b/doc/datasets_en.md
new file mode 100644
index 00000000..b0514ccc
--- /dev/null
+++ b/doc/datasets_en.md
@@ -0,0 +1,59 @@
+## Dataset
+This is a collection of commonly used Chinese datasets, which is being updated continuously. You are welcome to contribute to this list~
+- [ICDAR2019-LSVT](#ICDAR2019-LSVT)
+- [ICDAR2017-RCTW-17](#ICDAR2017-RCTW-17)
+- [Chinese Street View Text Recognition](#中文街景文字识别)
+- [Chinese Document Text Recognition](#中文文档文字识别)
+- [ICDAR2019-ArT](#ICDAR2019-ArT)
+
+In addition to open-source data, users can also use synthesis tools to synthesize data themselves. Currently available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.
+
+
+#### 1. ICDAR2019-LSVT
+- **Data sources**: https://ai.baidu.com/broad/introduction?dataset=lsvt
+- **Introduction**: A total of 450,000 Chinese street view images, including 50,000 (20,000 test + 30,000 training) fully labeled images (text coordinates + text content) and 400,000 weakly labeled images (text content only), as shown in the following figure:
+  ![](datasets/LSVT_1.jpg)
+  (a) Fully labeled data
+  ![](datasets/LSVT_2.jpg)
+  (b) Weakly labeled data
+- **Download link**: https://ai.baidu.com/broad/download?dataset=lsvt
+
+
+#### 2. ICDAR2017-RCTW-17
+- **Data sources**: https://rctw.vlrlab.net/
+- **Introduction**: It contains 12,000+ images, most of which were collected in the wild with mobile phone cameras. Some are screenshots. 
These images show a variety of scenes, including street views, posters, menus, indoor scenes and screenshots of mobile applications.
+  ![](datasets/rctw.jpg)
+- **Download link**: https://rctw.vlrlab.net/dataset/
+
+
+#### 3. Chinese Street View Text Recognition
+- **Data sources**: https://aistudio.baidu.com/aistudio/competition/detail/8
+- **Introduction**: A total of 290,000 pictures are included, of which 210,000 are used as the training set (with labels) and 80,000 as the test set (without labels). The dataset is collected from Chinese street views and is formed by cutting out the text line areas (such as shop signs, landmarks, etc.) in the street view pictures. All the images are preprocessed: using an affine transform, the text area is proportionally mapped to a picture with a height of 48 pixels, as shown in the figure:
+
+  ![](datasets/ch_street_rec_1.png)
+  (a) Label: 魅派集成吊顶
+  ![](datasets/ch_street_rec_2.png)
+  (b) Label: 母婴用品连锁
+- **Download link**
+https://aistudio.baidu.com/aistudio/datasetdetail/8429
+
+
+#### 4. Chinese Document Text Recognition
+- **Data sources**: https://github.com/YCG09/chinese_ocr
+- **Introduction**:
+  - A total of 3.64 million pictures, divided into a training set and a validation set at a ratio of 99:1.
+  - Using a Chinese corpus (news + classical Chinese), the data is randomly generated through changes in font, size, grayscale, blur, perspective, stretching, etc.
+  - 5990 characters, including Chinese characters, English letters, numbers and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt )
+  - Each sample contains exactly 10 characters, which are randomly cut from sentences in the corpus
+  - Image resolution is 280x32
+  ![](datasets/ch_doc1.jpg)
+  ![](datasets/ch_doc2.jpg)
+  ![](datasets/ch_doc3.jpg)
+- **Download link**: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (Password: lu7m)
+
+
+#### 5. ICDAR2019-ArT
+- **Data source**: https://ai.baidu.com/broad/introduction?dataset=art
+- **Introduction**: It includes 10166 images, 5603 in the training set and 4563 in the test set. It is composed of three parts: Total-Text, SCUT-CTW1500 and Baidu Curved Scene Text, including text with various shapes such as horizontal, multi-directional and curved.
+  ![](datasets/ArT.jpg)
+- **Download link**: https://ai.baidu.com/broad/download?dataset=art
\ No newline at end of file
diff --git a/doc/detection_en.md b/doc/detection_en.md
new file mode 100644
index 00000000..5acba219
--- /dev/null
+++ b/doc/detection_en.md
@@ -0,0 +1,96 @@
+# Text detection
+
+This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
+
+## Data preparation
+The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.
+
+Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. 
In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget:
+```
+# Under the PaddleOCR path
+cd PaddleOCR/
+wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
+wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
+```
+
+After decompressing the data set and downloading the annotation file, PaddleOCR/train_data/ has two folders and two files, which are:
+```
+/PaddleOCR/train_data/icdar2015/text_localization/
+  └─ icdar_c4_train_imgs/         Training data of icdar dataset
+  └─ ch4_test_images/             Testing data of icdar dataset
+  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
+  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
+```
+
+The provided annotation file format is as follows:
+```
+" Image file name             Image annotation information encoded by json.dumps"
+ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
+```
+The image annotation information before json.dumps encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
+
+`transcription` represents the text of the current text box, and this information is not needed in the text detection task.
+If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format.
+
+
+## Quickstart training
+
+First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace the backbone according to your needs.
+```
+cd PaddleOCR/
+# Download the pre-trained model of MobileNetV3
+wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
+# Download the pre-trained model of ResNet50
+wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
+```
+
+**Start training**
+```
+python3 tools/train.py -c configs/det/det_mv3_db.yml
+```
+
+In the above command, `-c` selects the `configs/det/det_mv3_db.yml` configuration file for training.
+For a detailed explanation of the configuration file, please refer to [link](./config_en.md).
+
+You can also use the `-o` parameter to change the training parameters without modifying the yml file. For example, to adjust the training learning rate to 0.0001:
+```
+python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
+```
+
+## Evaluation Indicator
+
+PaddleOCR calculates three indicators for evaluating the performance of the OCR detection task: Precision, Recall, and Hmean.
+
+Run the following command to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db.yml`
+
+When evaluating, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. If you train with different datasets or different models, these two parameters should be adjusted for better results.
+
+```
+python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
+```
+The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set Global.checkpoints to point to the saved parameter file.
+
+For example:
+```
+python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
+```
+
+* Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.
+
+## Test detection result
+
+Test the detection result on a single image:
+```
+python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
+```
+
+When testing the DB model, adjust the post-processing threshold:
+```
+python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
+```
+
+
+Test the detection result on all images in the folder:
+```
+python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
+```
diff --git a/doc/inference_en.md b/doc/inference_en.md
new file mode 100644
index 00000000..521654db
--- /dev/null
+++ b/doc/inference_en.md
@@ -0,0 +1,209 @@
+
+# Prediction from inference model
+
+The inference model (the model saved by fluid.io.save_inference_model) is generally a solidified model saved after the model training is completed, and is mostly used for prediction in deployment.
+
+The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
+
+Compared with the checkpoints model, the inference model additionally saves the structural information of the model. It performs better in deployment and accelerated inference, is flexible and convenient, and is suitable for integration with real systems. 
For more details, please refer to the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).
+
+Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model.
+
+## Training model to inference model
+### Detection model to inference model
+
+Download the ultra-lightweight Chinese detection model:
+```
+wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/
+```
+The above model is a DB algorithm trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, just run the following command:
+```
+python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy Global.save_inference_dir=./inference/det_db/
+```
+When converting to an inference model, the configuration file used is the same as the configuration file used during training. In addition, you also need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters in the configuration file.
+`Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model is saved.
+After the conversion is successful, there are two files in the `save_inference_dir` directory: +``` +inference/det_db/ + └─ model Program file of the detection inference model + └─ params Parameter file of the detection inference model +``` + +### Recognition model to inference model + +Download the ultra-lightweight Chinese recognition model: +``` +wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/ +``` + +The recognition model is converted to an inference model in the same way as the detection model, as follows: +``` +python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \ + Global.save_inference_dir=./inference/rec_crnn/ +``` + +If you have a model trained on your own dataset with a different dictionary file, please make sure that you change `character_dict_path` in the configuration file to your dictionary file path. + +After the conversion is successful, there are two files in the directory: +``` +/inference/rec_crnn/ + └─ model Program file of the recognition inference model + └─ params Parameter file of the recognition inference model +``` + +## Text detection model inference + +The following sections introduce ultra-lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference settings of the DB text detection model. Because the EAST and DB algorithms are very different, the EAST text detection algorithm must be selected during inference by passing in the corresponding parameters.
+ +### 1.Ultra-lightweight Chinese detection model inference + +For ultra-lightweight Chinese detection model inference, you can execute the following command: + +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" +``` + +The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: + +![](imgs_results/det_res_2.jpg) + +The parameter `det_max_side_len` sets the maximum side length to which images are normalized in the detection algorithm. When both the height and width of the image are less than `det_max_side_len`, the original image is used for prediction; otherwise the image is scaled to that maximum before prediction. This parameter is set to `det_max_side_len=960` by default. If the resolution of the input image is relatively large and you want to use a larger resolution for prediction, you can execute the following command: + +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_max_side_len=1200 +``` + +If you want to use the CPU for prediction, execute the following command: +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False +``` + +### 2.DB text detection model inference + +First, convert the model saved in the DB text detection training process into an inference model.
Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert: + +``` +# Set the yml configuration file of the training algorithm after -c +# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. +# The Global.save_inference_dir parameter sets the address where the converted model will be saved. + +python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.checkpoints="./models/det_r50_vd_db/best_accuracy" Global.save_inference_dir="./inference/det_db" +``` + +DB text detection model inference, you can execute the following command: + +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/" +``` + +The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: + +![](imgs_results/det_res_img_10_db.jpg) + +**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images. + +### 3.EAST text detection model inference + +First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert: + +``` +# Set the yml configuration file of the training algorithm after -c +# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. 
+# The Global.save_inference_dir parameter sets the address where the converted model will be saved. + +python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east" +``` + +For EAST text detection model inference, set the parameter `det_algorithm` to "EAST" to specify the detection algorithm type, then run the following command: + +``` +python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" +``` +The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: + +![](imgs_results/det_res_img_10_east.jpg) + +**Note**: This codebase uses the Python version of NMS in EAST post-processing, so prediction is quite slow. Using the C++ version would give a significant speedup. + + +## Text recognition model inference + +The following sections introduce ultra-lightweight Chinese recognition model inference and CTC loss-based recognition model inference. **The recognition model inference based on Attention loss is still being debugged**. For Chinese text recognition, it is recommended to choose a recognition model based on CTC loss. In practice, the model based on Attention loss has also been found to give worse results than the one based on CTC loss. + + +### 1. Ultra-lightweight Chinese recognition model inference + +For ultra-lightweight Chinese recognition model inference, you can execute the following command: + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/" +``` + +![](imgs_words/ch/word_4.jpg) + +After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen.
+ +Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695] + + +### 2. Recognition model inference based on CTC loss + +Taking STAR-Net as an example, we introduce the recognition model inference based on CTC loss. CRNN and Rosetta are used in a similar way, by setting the recognition algorithm parameter `rec_algorithm`. + +First, convert the model saved in the STAR-Net text recognition training process into an inference model. Taking the model based on the Resnet34_vd backbone network and trained on MJSynth and SynthText (two synthetic English text recognition datasets) as an example ([model download address](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)), you can convert it as follows: + +``` +# Set the yml configuration file of the training algorithm after -c +# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams. +# The Global.save_inference_dir parameter sets the address where the converted model will be saved.
+ +python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.checkpoints="./models/rec_r34_vd_tps_bilstm_ctc/best_accuracy" Global.save_inference_dir="./inference/starnet" +``` + +For STAR-Net text recognition model inference, execute the following commands: + +``` +python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" +``` +![](imgs_words_en/word_336.png) + +After executing the command, the recognition result of the above image is as follows: + +Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555] + +**Note**:Since the above model refers to [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it is different from the training of ultra-lightweight Chinese recognition model in two aspects: + +- The image resolution used in training is different: the image resolution used in training the above model is [3,32,100], while during our Chinese model training, in order to ensure the recognition effect of long text, the image resolution used in training is [3, 32, 320]. The default shape parameter of the inference stage is the image resolution used in training phase, that is [3, 32, 320]. Therefore, when running inference of the above English model here, you need to set the shape of the recognition image through the parameter `rec_image_shape`. + +- Character list: the experiment in the DTRB paper is only for 26 lowercase English characters and 10 numbers, a total of 36 characters. All upper and lower case characters are converted to lower case characters, and characters not in the above list are ignored and considered as spaces. Therefore, no characters dictionary file is used here, but a dictionary is generated by the below command. Therefore, the parameter `rec_char_type` needs to be set during inference, which is specified as "en" in English. 
+ +``` +self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" +dict_character = list(self.character_str) +``` + +## Text detection and recognition inference concatenation + +### 1. Ultra-lightweight Chinese OCR model inference + +When performing prediction, specify the path of a single image or a collection of images through the parameter `image_dir`, the path of the detection inference model through the parameter `det_model_dir`, and the path of the recognition inference model through the parameter `rec_model_dir`. The visualized recognition results are saved to the `./inference_results` folder by default. + +``` +python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" +``` + +After executing the command, the recognition result image is as follows: + +![](imgs_results/2.jpg) + +### 2. Other model inference + +If you want to try other detection or recognition algorithms, please refer to the text detection model inference and text recognition model inference sections above and update the corresponding configuration and model. The following command combines EAST text detection with STAR-Net text recognition: + +``` +python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" +``` + +After executing the command, the recognition result image is as follows: + +![](imgs_results/img_10.jpg) diff --git a/doc/installation_en.md b/doc/installation_en.md new file mode 100644 index 00000000..05471c0c --- /dev/null +++ b/doc/installation_en.md @@ -0,0 +1,79 @@ +## Quick installation + +PaddleOCR has been tested on glibc 2.23. You can try other glibc versions, or install glibc 2.23 for the best compatibility.
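To see which glibc version a machine reports before installing, a small check like the one below may help. It is only a sketch: it relies on Python's standard `platform.libc_ver()`, and the helper names `glibc_version` and `version_tuple` are made up for illustration and are not part of PaddleOCR.

```python
import platform


def glibc_version():
    """Return the glibc version string reported for this interpreter, or None."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return version


def version_tuple(version):
    """Turn a dotted version such as '2.23' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


if __name__ == "__main__":
    found = glibc_version()
    if found is None:
        print("glibc not detected (non-Linux system or a different C library)")
    elif version_tuple(found) == (2, 23):
        print(f"glibc {found} detected; this matches the tested version")
    else:
        print(f"glibc {found} detected; the documentation only reports testing on 2.23")
```

A different version does not necessarily mean PaddleOCR will fail; the check only tells you whether you are on the configuration the documentation reports as tested.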
+ +PaddleOCR working environment: +- PaddlePaddle1.7 +- python3 +- glibc 2.23 + +It is recommended to use the docker provided by us to run PaddleOCR, please refer to the use of docker [link](https://docs.docker.com/get-started/). + +1. (Recommended) Prepare a docker environment. The first time you use this image, it will be downloaded automatically. Please be patient. +``` +# Switch to the working directory +cd /home/Projects +# You need to create a docker container for the first run, and do not need to run the current command when you run it again +# Create a docker container named ppocr and map the current directory to the /paddle directory of the container + +#If you want to use docker in a CPU environment, use docker instead of nvidia-docker to create docker +sudo docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash +``` +If you have cuda9 installed on your machine, please run the following command to create a container: +``` +sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash +``` +If you have cuda10 installed on your machine, please run the following command to create a container: +``` +sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev /bin/bash +``` +You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that fits your machine. +``` +# ctrl+P+Q to exit docker, to re-enter docker using the following command: +sudo docker container exec -it ppocr /bin/bash +``` + +Note: If the docker pull is too slow, you can download and load the docker image manually according to the following steps. 
Take cuda9 docker for example, you only need to change cuda9 to cuda10 to use cuda10 docker: +``` +# Download the CUDA9 docker compressed file and unzip it +wget https://paddleocr.bj.bcebos.com/docker/docker_pdocr_cuda9.tar.gz +# To reduce download time, the uploaded docker image is compressed and needs to be decompressed +tar zxf docker_pdocr_cuda9.tar.gz +# Create image +docker load < docker_pdocr_cuda9.tar +# After completing the above steps, check whether the downloaded image is loaded through docker images +docker images +# If you have the following output after executing docker images, you can follow step 1 to create a docker environment. +hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829 +``` + +2. Install PaddlePaddle Fluid v1.7 (the higher version is not supported yet, the adaptation work is in progress) +``` +pip3 install --upgrade pip + +# If you have cuda9 installed on your machine, please run the following command to install +python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple + +# If you have cuda10 installed on your machine, please run the following command to install +python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple +``` +For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. + + +3. Clone PaddleOCR repo code +``` +# Recommend +git clone https://github.com/PaddlePaddle/PaddleOCR + +# If you cannot pull successfully due to network problems, you can also choose to use the code hosting on the cloud: + +git clone https://gitee.com/paddlepaddle/PaddleOCR + +# Note: The cloud-hosting code may not be able to synchronize the update with this GitHub project in real time. There might be a delay of 3-5 days. Please give priority to the recommended method. +``` + +4. 
Install third-party libraries +``` +cd PaddleOCR +pip3 install -r requirments.txt +``` diff --git a/doc/recognition_en.md b/doc/recognition_en.md new file mode 100644 index 00000000..a73aeec5 --- /dev/null +++ b/doc/recognition_en.md @@ -0,0 +1,221 @@ +## Text recognition + +### Data preparation + + +PaddleOCR supports two data formats: `LMDB` is used to train public data and evaluation algorithms; `general data` is used to train your own data: + +Please organize the dataset as follows: + +The default storage path for training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory: + +``` +ln -sf /train_data/dataset +``` + + +* Dataset download + +If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),download the lmdb format dataset required for benchmark + +* Use your own dataset: + +If you want to use your own data for training, please refer to the following to organize your data. + +- Training set + +First put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to store the image path and label. 
+ +* Note: by default, the image path and the image label are separated by \t. Using any other separator will cause a training error. + +``` +" Image file name Image annotation " + +train_data/train_0001.jpg 简单可依赖 +train_data/train_0002.jpg 用科技让复杂的世界更简单 +``` +PaddleOCR provides label files for training on the icdar2015 dataset, which can be downloaded as follows: + +``` +# Training set label +wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt +# Test Set Label +wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt +``` + +The final training set should have the following file structure: + +``` +|-train_data + |-ic15_data + |- rec_gt_train.txt + |- train + |- word_001.png + |- word_002.jpg + |- word_003.jpg + | ... +``` + +- Test set + +Similar to the training set, the test set also needs a folder containing all images (test) and a rec_gt_test.txt file. The structure of the test set is as follows: + +``` +|-train_data + |-ic15_data + |- rec_gt_test.txt + |- test + |- word_001.jpg + |- word_002.jpg + |- word_003.jpg + | ... +``` + +- Dictionary + +Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index. + +Therefore, the dictionary needs to contain all the characters that you want to be recognized correctly. {word_dict_name}.txt needs to be written in the following format and saved in the `utf-8` encoding format: + +``` +l +d +a +d +r +n +``` + +In `word_dict.txt`, there is a single character on each line, and each character is mapped to a numeric index, e.g. "and" will be mapped to [2 5 1]. + +`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters. + +`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters. + +You can use them if needed.
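The character-to-index mapping described for the dictionary file can be sketched as follows. This is a minimal illustration, not PaddleOCR's own loader: the function names `build_char_index` and `encode` are hypothetical, and it assumes zero-based line indices, that the first occurrence of a repeated character wins, and that unknown characters are skipped, which is what the "and" example above implies.

```python
def build_char_index(dict_lines):
    """Map each character to the zero-based index of its first line."""
    char_to_index = {}
    for index, line in enumerate(dict_lines):
        char = line.rstrip("\n")
        # Assumption: keep the first occurrence when a character repeats.
        if char and char not in char_to_index:
            char_to_index[char] = index
    return char_to_index


def encode(text, char_to_index):
    """Convert text to a list of dictionary indices, skipping unknown characters."""
    return [char_to_index[c] for c in text if c in char_to_index]


# The example dictionary from this section, one character per line.
example_dict = ["l", "d", "a", "d", "r", "n"]
mapping = build_char_index(example_dict)
print(encode("and", mapping))  # [2, 5, 1]
```

For a real dictionary file, the lines could be read with `open(path, encoding="utf-8")` and passed to `build_char_index` in the same way.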
+ +To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. + +### Start training + +PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example: + +First download the pretrain model, you can download the trained model to finetune on the icdar2015 data: + +``` +cd PaddleOCR/ +# Download the pre-trained model of MobileNetV3 +wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar +# Decompress model parameters +cd pretrain_models +tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar +``` + +Start training: + +``` +# Set PYTHONPATH path +export PYTHONPATH=$PYTHONPATH:. +# GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES +export CUDA_VISIBLE_DEVICES=0,1,2,3 +# Training icdar15 English data +python3 tools/train.py -c configs/rec/rec_icdar15_train.yml +``` + +PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process. + +If the evaluation set is large, the test will be time-consuming. It is recommended to reduce the number of evaluations, or evaluate after training. + +* Tip: You can use the `-c` parameter to select multiple model configurations under the `configs/rec/` path for training. 
The recognition algorithms supported by PaddleOCR are: + + +| Configuration file | Algorithm | backbone | trans | seq | pred | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | +| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | +| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | +| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | +| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | +| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc | +| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention | +| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc | +| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc | +| rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention | +| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | + +For training on Chinese data, it is recommended to use `rec_chinese_lite_train.yml`. If you want to try other algorithms on a Chinese dataset, please modify the configuration file as described below. + +Take `rec_mv3_none_none_ctc.yml` as an example: +``` +Global: + ... + # Modify image_shape to fit long text + image_shape: [3, 32, 320] + ... + # Modify character type + character_type: ch + # Add a custom dictionary; if you change the dictionary, point this path to the new dictionary + character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt + ... + # Modify reader type + reader_yml: ./configs/rec/rec_chinese_reader.yml + ... + +... +``` +**Note that the configuration file for prediction/evaluation must be consistent with the training.** + + + +### Evaluation + +The evaluation dataset can be changed by setting `label_file_path` under EvalReader in `configs/rec/rec_icdar15_reader.yml`.
+ +``` +export CUDA_VISIBLE_DEVICES=0 +# GPU evaluation, Global.checkpoints is the weight to be tested +python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy +``` + +### Prediction + +* Training engine prediction + +Using the model trained by paddleocr, you can quickly get prediction through the following script. + +The default prediction picture is stored in `infer_img`, and the weight is specified via `-o Global.checkpoints`: + +``` +# Predict English results +python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg +``` + +Input image: + +![](./imgs_words/en/word_1.png) + +Get the prediction result of the input image: + +``` +infer_img: doc/imgs_words/en/word_1.png + index: [19 24 18 23 29] + word : joint +``` + +The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to predict the Chinese model: + +``` +# Predict Chinese results +python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg +``` + +Input image: + +![](./imgs_words/ch/word_1.jpg) + +Get the prediction result of the input image: + +``` +infer_img: doc/imgs_words/ch/word_1.jpg + index: [2092 177 312 2503] + word : 韩国小馆 +``` diff --git a/doc/update_en.md b/doc/update_en.md new file mode 100644 index 00000000..e5b908f3 --- /dev/null +++ b/doc/update_en.md @@ -0,0 +1,10 @@ +# Recent updates + +- 2020.6.5 Support exporting `attention` model to `inference_model` +- 2020.6.5 Support separate prediction and recognition, output result score +- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience +- 2020.5.30 Model 
prediction and training support on Windows system +- 2020.5.30 Open source general Chinese OCR model +- 2020.5.14 Release [PaddleOCR Open Class](https://www.bilibili.com/video/BV1nf4y1U7RX?p=4) +- 2020.5.14 Release [PaddleOCR Practice Notebook](https://aistudio.baidu.com/aistudio/projectdetail/467229) +- 2020.5.14 Open source 8.6M ultra-lightweight Chinese OCR model