Distinguish between English and Chinese documents

LDOUBLEV 2020-06-09 20:03:49 +08:00
parent 7995a93e7a
commit ec257d2cbc
20 changed files with 81 additions and 79 deletions
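Most of the changes below are relative-link rewrites caused by moving documents one directory deeper. The following is a minimal sketch of that rewrite, not the tool the author used; the function name `relink` is hypothetical, and the document paths are inferred from the README links in this diff.

```python
import posixpath


def relink(old_doc, new_doc, link):
    """Recompute a relative link after a document moves from old_doc to new_doc.

    Assumes the link target itself does not move. Paths are repository-relative
    and use forward slashes.
    """
    # Resolve the link against the old document's directory.
    target = posixpath.normpath(
        posixpath.join("/", posixpath.dirname(old_doc), link)
    )
    # Express the same target relative to the new document's directory.
    return posixpath.relpath(target, start=posixpath.join("/", posixpath.dirname(new_doc)))


print(relink("doc/inference.md", "doc/doc_ch/inference.md", "imgs_results/2.jpg"))
```

Links between documents that moved together (for example `installation.md` referenced from `FAQ.md`) are a different case: both files land in the same new directory, so the link stays a sibling reference.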

View File

@ -1,13 +1,15 @@
[English](README_en.md) | 简体中文
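## Introduction ## Introduction
PaddleOCR aims to build a rich, leading, and practical OCR tool library that helps users train better models and put them into practice. PaddleOCR aims to build a rich, leading, and practical OCR tool library that helps users train better models and put them into practice.
**Recent updates** **Recent updates**
- 2020.6.8 Added [datasets](./doc/datasets.md), which will be updated continuously - 2020.6.8 Added [datasets](./doc/doc_ch/datasets.md), which will be updated continuously
- 2020.6.5 Support exporting the `attention` model to `inference_model` - 2020.6.5 Support exporting the `attention` model to `inference_model`
- 2020.6.5 Support outputting the result score when running recognition prediction on its own - 2020.6.5 Support outputting the result score when running recognition prediction on its own
- 2020.5.30 Provide an online demo of the ultra-lightweight Chinese OCR - 2020.5.30 Provide an online demo of the ultra-lightweight Chinese OCR
- 2020.5.30 Model prediction and training are supported on Windows - 2020.5.30 Model prediction and training are supported on Windows
- [more](./doc/update.md) - [more](./doc/doc_ch/update.md)
## Features ## Features
- Ultra-lightweight Chinese OCR; the total model size is only 8.6M - Ultra-lightweight Chinese OCR; the total model size is only 8.6M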
@ -35,7 +37,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库助力
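#### 1. Environment configuration #### 1. Environment configuration
Please refer to [Quick installation](./doc/installation.md) first to set up the PaddleOCR runtime environment. Please refer to [Quick installation](./doc/doc_ch/installation.md) first to set up the PaddleOCR runtime environment.
#### 2. Download inference models #### 2. Download inference models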
@ -88,14 +90,14 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_mode
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
``` ```
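For more ways to run text detection and recognition in series, please refer to [Inference with the prediction engine](./doc/inference.md) in the documentation tutorials. For more ways to run text detection and recognition in series, please refer to [Inference with the prediction engine](./doc/doc_ch/inference.md) in the documentation tutorials.
## Documentation ## Documentation
- [Quick installation](./doc/installation.md) - [Quick installation](./doc/doc_ch/installation.md)
- [Text detection model training/evaluation/prediction](./doc/detection.md) - [Text detection model training/evaluation/prediction](./doc/doc_ch/detection.md)
- [Text recognition model training/evaluation/prediction](./doc/recognition.md) - [Text recognition model training/evaluation/prediction](./doc/doc_ch/recognition.md)
- [Inference with the prediction engine](./doc/inference.md) - [Inference with the prediction engine](./doc/doc_ch/inference.md)
- [Datasets](./doc/datasets.md) - [Datasets](./doc/doc_ch/datasets.md)
## Text detection algorithms ## Text detection algorithms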
@ -121,7 +123,7 @@ PaddleOCR开源的文本检测算法列表
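* Note: Training and evaluating the DB models above requires the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or different models, these two parameters can be tuned for better results. * Note: Training and evaluating the DB models above requires the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or different models, these two parameters can be tuned for better results.
For training and using the PaddleOCR text detection algorithms, please refer to [Text detection model training/evaluation/prediction](./doc/detection.md) in the documentation tutorials. For training and using the PaddleOCR text detection algorithms, please refer to [Text detection model training/evaluation/prediction](./doc/doc_ch/detection.md) in the documentation tutorials.
## Text recognition algorithms ## Text recognition algorithms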
@ -151,7 +153,7 @@ PaddleOCR开源的文本识别算法列表
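|Ultra-lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| |Ultra-lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| |General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
For training and using the PaddleOCR text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./doc/recognition.md) in the documentation tutorials. For training and using the PaddleOCR text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./doc/doc_ch/recognition.md) in the documentation tutorials.
## End-to-end OCR algorithms ## End-to-end OCR algorithms
- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808) (developed in-house by Baidu, coming soon) - [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808) (developed in-house by Baidu, coming soon)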
@ -189,7 +191,7 @@ PaddleOCR文本识别算法的训练和使用请参考文档教程中[文本识
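5. **Release schedule for self-developed algorithms** 5. **Release schedule for self-developed algorithms**
The self-developed algorithms SAST, SRN, and End2End-PSL will be released successively in June and July. Stay tuned. The self-developed algorithms SAST, SRN, and End2End-PSL will be released successively in June and July. Stay tuned.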
[more](./doc/FAQ.md) [more](./doc/doc_ch/FAQ.md)
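## Welcome to the PaddleOCR technical exchange group ## Welcome to the PaddleOCR technical exchange group
Add paddlehelp on WeChat with the remark OCR, and the assistant will add you to the group Add paddlehelp on WeChat with the remark OCR, and the assistant will add you to the group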

View File

@ -1,13 +1,15 @@
English | [简体中文](README.md)
## Introduction ## Introduction
PaddleOCR aims to create a rich, leading, and practical set of OCR tools that helps users train better models and apply them in practice. PaddleOCR aims to create a rich, leading, and practical set of OCR tools that helps users train better models and apply them in practice.
**Recent updates** **Recent updates**
- 2020.6.8 Add [dataset](./doc/datasets_en.md) and keep updating - 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support outputting the result score when running recognition prediction on its own - 2020.6.5 Support outputting the result score when running recognition prediction on its own
- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience - 2020.5.30 Provide ultra-lightweight Chinese OCR online experience
- 2020.5.30 Model prediction and training supported on Windows system - 2020.5.30 Model prediction and training supported on Windows system
- [more](./doc/update_en.md) - [more](./doc/doc_en/update_en.md)
## Features ## Features
- Ultra-lightweight Chinese OCR model, total model size is only 8.6M - Ultra-lightweight Chinese OCR model, total model size is only 8.6M
@ -36,7 +38,7 @@ The picture above is the result of our Ultra-lightweight Chinese OCR model. For
#### 1. Environment configuration #### 1. Environment configuration
Please see [Quick installation](./doc/installation_en.md) Please see [Quick installation](./doc/doc_en/installation_en.md)
#### 2. Download inference models #### 2. Download inference models
@ -88,14 +90,14 @@ To run inference of the Generic Chinese OCR model, follow these steps above to d
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
``` ```
For more text detection and recognition models, please refer to the document [Inference](./doc/inference_en.md) For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md)
## Documentation ## Documentation
- [Quick installation](./doc/installation_en.md) - [Quick installation](./doc/doc_en/installation_en.md)
- [Text detection model training/evaluation/prediction](./doc/detection_en.md) - [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
- [Text recognition model training/evaluation/prediction](./doc/recognition_en.md) - [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
- [Inference](./doc/inference_en.md) - [Inference](./doc/doc_en/inference_en.md)
- [Dataset](./doc/datasets_en.md) - [Dataset](./doc/doc_en/datasets_en.md)
## Text detection algorithm ## Text detection algorithm
@ -121,7 +123,7 @@ For use of [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/dat
* Note: Training and evaluating the DB models above requires the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or different models, these two parameters can be adjusted for better results. * Note: Training and evaluating the DB models above requires the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or different models, these two parameters can be adjusted for better results.
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md) For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection.md)
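To give a feel for what the two DB post-processing parameters control, here is a minimal sketch rather than PaddleOCR's actual post-processing code. It assumes the offset rule from the DB paper (expansion distance = area × unclip_ratio / perimeter) and assumes that box_thresh is compared against a box's mean score; the function names are hypothetical.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    return sum(
        math.dist(p, q) for p, q in zip(points, points[1:] + points[:1])
    )


def unclip_distance(points, unclip_ratio=1.5):
    """Distance by which a shrunk text region is expanded back outwards."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(mean_score, box_thresh=0.6):
    """Assumed filter: drop boxes whose mean score is below box_thresh."""
    return mean_score >= box_thresh


rect = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(rect))  # area 2000, perimeter 240
```

A larger unclip_ratio grows every box further; a higher box_thresh discards more low-confidence boxes.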
## Text recognition algorithm ## Text recognition algorithm
@ -194,10 +196,10 @@ Please refer to the document for training guide and use of PaddleOCR text recogn
Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient. Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient.
[more](./doc/FAQ_en.md) [more](./doc/doc_en/FAQ_en.md)
## Welcome to the PaddleOCR technical exchange group ## Welcome to the PaddleOCR technical exchange group
Add Wechat: paddlehelp, remark OCR, small assistant will pull you into the group ~ WeChat: paddlehelp . remarks OCR, the assistant will invite you to join the group~
## References ## References

View File

@ -16,7 +16,7 @@
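The self-developed algorithms SAST, SRN, and End2End-PSL will be released successively in June and July. Stay tuned. The self-developed algorithms SAST, SRN, and End2End-PSL will be released successively in June and July. Stay tuned.
6. **How to run on Windows or Mac** 6. **How to run on Windows or Mac**
PaddleOCR has been adapted to Windows and Mac. Note two points when running it: 1. During [Quick installation](installation.md), if you do not want to install docker, you can skip the first step and start directly from the second step, installing paddle. 2. When downloading the inference models, if wget is not installed, you can click the model link directly or copy the link address into a browser to download, then extract the archive and place it in the corresponding directory. PaddleOCR has been adapted to Windows and Mac. Note two points when running it: 1. During [Quick installation](./installation.md), if you do not want to install docker, you can skip the first step and start directly from the second step, installing paddle. 2. When downloading the inference models, if wget is not installed, you can click the model link directly or copy the link address into a browser to download, then extract the archive and place it in the corresponding directory.
7. **Differences between the ultra-lightweight model and the general OCR model** 7. **Differences between the ultra-lightweight model and the general OCR model**
PaddleOCR has currently open-sourced two Chinese models: the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. They compare as follows: PaddleOCR has currently open-sourced two Chinese models: the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. They compare as follows: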
@ -40,4 +40,4 @@ PaddleOCR已完成Windows和Mac系统适配运行时注意两点1、在[
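English datasets: MJSynth and SynthText synthetic data, with tens of millions of samples. English datasets: MJSynth and SynthText synthetic data, with tens of millions of samples.
Chinese datasets: the LSVT street view dataset, with text regions cropped according to the ground truth and position-calibrated, 300,000 images in total. In addition, 5,000,000 synthetic samples generated from the LSVT corpus. Chinese datasets: the LSVT street view dataset, with text regions cropped according to the ground truth and position-calibrated, 300,000 images in total. In addition, 5,000,000 synthetic samples generated from the LSVT corpus.
The public datasets are all open source; users can search for and download them on their own, or refer to [Chinese datasets](datasets.md). The synthetic data is not open-sourced for now; users can generate their own with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator). The public datasets are all open source; users can search for and download them on their own, or refer to [Chinese datasets](./datasets.md). The synthetic data is not open-sourced for now; users can generate their own with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).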

View File

@ -46,4 +46,3 @@
| img_set_dir | Dataset path | ./train_data | \ | | img_set_dir | Dataset path | ./train_data | \ |
| label_file_path | Label file path | ./train_data/rec_gt_train.txt| \ | | label_file_path | Label file path | ./train_data/rec_gt_train.txt| \ |
| infer_img | Folder path of images to predict | ./infer_img | \| | infer_img | Folder path of images to predict | ./infer_img | \|

View File

@ -12,9 +12,9 @@
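#### 1. ICDAR2019-LSVT #### 1. ICDAR2019-LSVT
- **Data source**: https://ai.baidu.com/broad/introduction?dataset=lsvt - **Data source**: https://ai.baidu.com/broad/introduction?dataset=lsvt
- **Introduction**: 450,000 Chinese street view images in total, including 50,000 (20,000 test + 30,000 training) fully labeled images (text coordinates + text content) and 400,000 weakly labeled images (text content only), as shown in the figures below: - **Introduction**: 450,000 Chinese street view images in total, including 50,000 (20,000 test + 30,000 training) fully labeled images (text coordinates + text content) and 400,000 weakly labeled images (text content only), as shown in the figures below:
![](datasets/LSVT_1.jpg) ![](../datasets/LSVT_1.jpg)
(a) Fully labeled data (a) Fully labeled data
![](datasets/LSVT_2.jpg) ![](../datasets/LSVT_2.jpg)
(b) Weakly labeled data (b) Weakly labeled data
- **Download link**: https://ai.baidu.com/broad/download?dataset=lsvt - **Download link**: https://ai.baidu.com/broad/download?dataset=lsvt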
@ -22,16 +22,16 @@
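#### 2. ICDAR2017-RCTW-17 #### 2. ICDAR2017-RCTW-17
- **Data source**: https://rctw.vlrlab.net/ - **Data source**: https://rctw.vlrlab.net/
- **Introduction**: Contains more than 12,000 images, most of them captured in the wild with mobile phone cameras; some are screenshots. The images cover a wide variety of scenes, including street views, posters, menus, indoor scenes, and screenshots of mobile apps. - **Introduction**: Contains more than 12,000 images, most of them captured in the wild with mobile phone cameras; some are screenshots. The images cover a wide variety of scenes, including street views, posters, menus, indoor scenes, and screenshots of mobile apps.
![](datasets/rctw.jpg) ![](../datasets/rctw.jpg)
- **Download link**: https://rctw.vlrlab.net/dataset/ - **Download link**: https://rctw.vlrlab.net/dataset/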
<a name="中文街景文字识别"></a> <a name="中文街景文字识别"></a>
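#### 3. Chinese street view text recognition #### 3. Chinese street view text recognition
- **Data source**: https://aistudio.baidu.com/aistudio/competition/detail/8 - **Data source**: https://aistudio.baidu.com/aistudio/competition/detail/8
- **Introduction**: 290,000 images in total, of which 210,000 form the training set (with labels) and 80,000 form the test set (without labels). The dataset was collected from Chinese street views and built by cropping text-line regions (such as shop signs and landmarks) out of street view images. All images are preprocessed: each text region is mapped proportionally by an affine transform to an image 48 pixels high, as shown below: - **Introduction**: 290,000 images in total, of which 210,000 form the training set (with labels) and 80,000 form the test set (without labels). The dataset was collected from Chinese street views and built by cropping text-line regions (such as shop signs and landmarks) out of street view images. All images are preprocessed: each text region is mapped proportionally by an affine transform to an image 48 pixels high, as shown below:
![](datasets/ch_street_rec_1.png) ![](../datasets/ch_street_rec_1.png)
(a) Label: 魅派集成吊顶 (a) Label: 魅派集成吊顶
![](datasets/ch_street_rec_2.png) ![](../datasets/ch_street_rec_2.png)
(b) Label: 母婴用品连锁 (b) Label: 母婴用品连锁
- **Download link**: - **Download link**: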
https://aistudio.baidu.com/aistudio/datasetdetail/8429 https://aistudio.baidu.com/aistudio/datasetdetail/8429
@ -45,14 +45,14 @@ https://aistudio.baidu.com/aistudio/datasetdetail/8429
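- 5,990 characters in total, including Chinese characters, English letters, digits, and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt) - 5,990 characters in total, including Chinese characters, English letters, digits, and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt)
- Each sample has a fixed length of 10 characters, randomly cut from sentences in the corpus - Each sample has a fixed length of 10 characters, randomly cut from sentences in the corpus
- All images have a resolution of 280x32 - All images have a resolution of 280x32
![](datasets/ch_doc1.jpg) ![](../datasets/ch_doc1.jpg)
![](datasets/ch_doc2.jpg) ![](../datasets/ch_doc2.jpg)
![](datasets/ch_doc3.jpg) ![](../datasets/ch_doc3.jpg)
- **Download link**: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (password: lu7m) - **Download link**: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (password: lu7m)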
<a name="ICDAR2019-ArT"></a> <a name="ICDAR2019-ArT"></a>
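#### 5. ICDAR2019-ArT #### 5. ICDAR2019-ArT
- **Data source**: https://ai.baidu.com/broad/introduction?dataset=art - **Data source**: https://ai.baidu.com/broad/introduction?dataset=art
- **Introduction**: 10,166 images in total, with 5,603 in the training set and 4,563 in the test set. It is composed of three parts, Total-Text, SCUT-CTW1500, and Baidu Curved Scene Text, and contains text in a variety of shapes such as horizontal, multi-oriented, and curved. - **Introduction**: 10,166 images in total, with 5,603 in the training set and 4,563 in the test set. It is composed of three parts, Total-Text, SCUT-CTW1500, and Baidu Curved Scene Text, and contains text in a variety of shapes such as horizontal, multi-oriented, and curved.
![](datasets/ArT.jpg) ![](../datasets/ArT.jpg)
- **Download link**: https://ai.baidu.com/broad/download?dataset=art - **Download link**: https://ai.baidu.com/broad/download?dataset=art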

View File

@ -97,7 +97,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_
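The visualized text detection results are saved to the ./inference_results folder by default, and the result file names are prefixed with 'det_res'. An example result: The visualized text detection results are saved to the ./inference_results folder by default, and the result file names are prefixed with 'det_res'. An example result:
![](imgs_results/det_res_img_10_db.jpg) ![](../imgs_results/det_res_img_10_db.jpg)
**Note**: Because the ICDAR2015 dataset has only 1,000 training images and mainly targets English scenes, the model above performs very poorly on Chinese text images. **Note**: Because the ICDAR2015 dataset has only 1,000 training images and mainly targets English scenes, the model above performs very poorly on Chinese text images.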
@ -120,7 +120,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_
``` ```
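The visualized text detection results are saved to the ./inference_results folder by default, and the result file names are prefixed with 'det_res'. An example result: The visualized text detection results are saved to the ./inference_results folder by default, and the result file names are prefixed with 'det_res'. An example result:
![](imgs_results/det_res_img_10_east.jpg) ![](../imgs_results/det_res_img_10_east.jpg)
**Note**: The NMS step in EAST post-processing in this codebase uses a Python implementation, so prediction is relatively slow. Using a C++ implementation would give a noticeable speedup. **Note**: The NMS step in EAST post-processing in this codebase uses a Python implementation, so prediction is relatively slow. Using a C++ implementation would give a noticeable speedup.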
@ -138,7 +138,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/"
``` ```
![](imgs_words/ch/word_4.jpg) ![](../imgs_words/ch/word_4.jpg)
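After running the command, the prediction result for the image above (recognized text and score) is printed to the screen, for example: After running the command, the prediction result for the image above (recognized text and score) is printed to the screen, for example: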
@ -175,7 +175,7 @@ RARE 文本识别模型推理,可以执行如下命令:
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE"
``` ```
![](imgs_words_en/word_336.png) ![](../imgs_words_en/word_336.png)
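After running the command, the recognition result for the image above is as follows: After running the command, the recognition result for the image above is as follows: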
@ -204,7 +204,7 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model
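After running the command, the recognition result image is as follows: After running the command, the recognition result image is as follows:
![](imgs_results/2.jpg) ![](../imgs_results/2.jpg)
### 2. Inference with other models ### 2. Inference with other models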
@ -216,4 +216,4 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --d
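After running the command, the recognition result image is as follows: After running the command, the recognition result image is as follows: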
![](imgs_results/img_10.jpg) ![](../imgs_results/img_10.jpg)

View File

@ -194,7 +194,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkp
Input image: Input image:
![](./imgs_words/en/word_1.png) ![](../imgs_words/en/word_1.png)
The prediction result for the input image: The prediction result for the input image:
@ -214,7 +214,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.c
Input image: Input image:
![](./imgs_words/ch/word_1.jpg) ![](../imgs_words/ch/word_1.jpg)
The prediction result for the input image: The prediction result for the input image:

View File

@ -17,7 +17,7 @@ Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be releas
6. **How to run on Windows or Mac?** 6. **How to run on Windows or Mac?**
PaddleOCR has completed the adaptation to Windows and MAC systems. Two points should be noted during operation: PaddleOCR has completed the adaptation to Windows and MAC systems. Two points should be noted during operation:
1. In [Quick installation](installation.md), if you do not want to install docker, you can skip the first step and start with the second step. 1. In [Quick installation](./installation_en.md), if you do not want to install docker, you can skip the first step and start with the second step.
2. When downloading the inference model, if wget is not installed, you can directly click the model link or copy the link address to the browser to download, then extract and place it in the corresponding directory. 2. When downloading the inference model, if wget is not installed, you can directly click the model link or copy the link address to the browser to download, then extract and place it in the corresponding directory.
7. **The difference between ultra-lightweight model and General OCR model** 7. **The difference between ultra-lightweight model and General OCR model**
@ -42,4 +42,4 @@ At present, the open source model, dataset and magnitude are as follows:
English dataset: MJSynth and SynthText synthetic dataset, the amount of data is tens of millions. English dataset: MJSynth and SynthText synthetic dataset, the amount of data is tens of millions.
Chinese dataset: LSVT street view dataset with cropped text areas, 300,000 images in total. In addition, 5,000,000 synthetic samples were generated from the LSVT corpus. Chinese dataset: LSVT street view dataset with cropped text areas, 300,000 images in total. In addition, 5,000,000 synthetic samples were generated from the LSVT corpus.
Among them, the public datasets are opensourced, users can search and download by themselves, or refer to [Chinese data set](datasets.md), synthetic data is not opensourced, users can use open-source synthesis tools to synthesize data themselves. Current available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc. Among them, the public datasets are opensourced, users can search and download by themselves, or refer to [Chinese data set](./datasets_en.md), synthetic data is not opensourced, users can use open-source synthesis tools to synthesize data themselves. Current available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.

View File

@ -46,4 +46,3 @@ Take `rec_chinese_reader.yml` as an example:
| img_set_dir | Image folder path | ./train_data | \ | | img_set_dir | Image folder path | ./train_data | \ |
| label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ | | label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ |
| infer_img | Folder path of images to predict | ./infer_img | \| | infer_img | Folder path of images to predict | ./infer_img | \|

View File

@ -8,7 +8,7 @@ PaddleOCR provides two text detection algorithms: EAST and DB. Both support Mobi
``` ```
python3 tools/train.py -c configs/det/det_mv3_db.yml python3 tools/train.py -c configs/det/det_mv3_db.yml
``` ```
For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection.md) For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md)
## step2: Train text recognition model ## step2: Train text recognition model
@ -16,7 +16,7 @@ PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, an
``` ```
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
``` ```
For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition.md) For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md)
## step3: Concatenate predictions ## step3: Concatenate predictions
@ -27,4 +27,4 @@ When performing prediction, you need to specify the path of a single image or a
``` ```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
``` ```
For more details about text detection and recognition concatenation, please refer to the document [Inference](./inference.md) For more details about text detection and recognition concatenation, please refer to the document [Inference](./inference_en.md)
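When the concatenated prediction is driven from another script, the command above can be assembled programmatically. This is only a sketch: the helper name `build_predict_command` is hypothetical, and it uses just the three flags shown in the command above.

```python
import subprocess


def build_predict_command(image_dir, det_model_dir, rec_model_dir):
    """Build the argument list for tools/infer/predict_system.py."""
    return [
        "python3",
        "tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
    ]


cmd = build_predict_command("./doc/imgs/11.jpg", "./inference/det/", "./inference/rec/")
print(" ".join(cmd))
# To actually run it from the repository root (requires the models to be present):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.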

View File

@ -12,9 +12,9 @@ In addition to opensource data, users can also use synthesis tools to synthesize
#### 1. ICDAR2019-LSVT #### 1. ICDAR2019-LSVT
- **Data source**: https://ai.baidu.com/broad/introduction?dataset=lsvt - **Data source**: https://ai.baidu.com/broad/introduction?dataset=lsvt
- **Introduction**: A total of 450,000 Chinese street view images, including 50,000 (20,000 test + 30,000 training) fully labeled images (text coordinates + text content) and 400,000 weakly labeled images (text content only), as shown in the following figures: - **Introduction**: A total of 450,000 Chinese street view images, including 50,000 (20,000 test + 30,000 training) fully labeled images (text coordinates + text content) and 400,000 weakly labeled images (text content only), as shown in the following figures:
![](datasets/LSVT_1.jpg) ![](../datasets/LSVT_1.jpg)
(a) Fully labeled data (a) Fully labeled data
![](datasets/LSVT_2.jpg) ![](../datasets/LSVT_2.jpg)
(b) Weakly labeled data (b) Weakly labeled data
- **Download link**: https://ai.baidu.com/broad/download?dataset=lsvt - **Download link**: https://ai.baidu.com/broad/download?dataset=lsvt
@ -22,7 +22,7 @@ In addition to opensource data, users can also use synthesis tools to synthesize
#### 2. ICDAR2017-RCTW-17 #### 2. ICDAR2017-RCTW-17
- **Data source**: https://rctw.vlrlab.net/ - **Data source**: https://rctw.vlrlab.net/
- **Introduction**: It contains more than 12,000 images, most of them collected in the wild with mobile phone cameras. Some are screenshots. These images show a variety of scenes, including street views, posters, menus, indoor scenes, and screenshots of mobile applications. - **Introduction**: It contains more than 12,000 images, most of them collected in the wild with mobile phone cameras. Some are screenshots. These images show a variety of scenes, including street views, posters, menus, indoor scenes, and screenshots of mobile applications.
![](datasets/rctw.jpg) ![](../datasets/rctw.jpg)
- **Download link**: https://rctw.vlrlab.net/dataset/ - **Download link**: https://rctw.vlrlab.net/dataset/
<a name="中文街景文字识别"></a> <a name="中文街景文字识别"></a>
@ -30,9 +30,9 @@ In addition to opensource data, users can also use synthesis tools to synthesize
- **Data source**: https://aistudio.baidu.com/aistudio/competition/detail/8 - **Data source**: https://aistudio.baidu.com/aistudio/competition/detail/8
- **Introduction**: A total of 290,000 images are included, of which 210,000 form the training set (with labels) and 80,000 form the test set (without labels). The dataset is collected from Chinese street views and is formed by cutting out the text line areas (such as shop signs and landmarks) in street view pictures. All the images are preprocessed: using an affine transform, the text area is proportionally mapped to a picture with a height of 48 pixels, as shown in the figures: - **Introduction**: A total of 290,000 images are included, of which 210,000 form the training set (with labels) and 80,000 form the test set (without labels). The dataset is collected from Chinese street views and is formed by cutting out the text line areas (such as shop signs and landmarks) in street view pictures. All the images are preprocessed: using an affine transform, the text area is proportionally mapped to a picture with a height of 48 pixels, as shown in the figures:
![](datasets/ch_street_rec_1.png) ![](../datasets/ch_street_rec_1.png)
(a) Label: 魅派集成吊顶 (a) Label: 魅派集成吊顶
![](datasets/ch_street_rec_2.png) ![](../datasets/ch_street_rec_2.png)
(b) Label: 母婴用品连锁 (b) Label: 母婴用品连锁
- **Download link**: - **Download link**:
https://aistudio.baidu.com/aistudio/datasetdetail/8429 https://aistudio.baidu.com/aistudio/datasetdetail/8429
@ -46,14 +46,14 @@ https://aistudio.baidu.com/aistudio/datasetdetail/8429
- 5,990 characters including Chinese characters, English letters, digits, and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt) - 5,990 characters including Chinese characters, English letters, digits, and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt)
- Each sample has a fixed length of 10 characters, randomly cut from sentences in the corpus - Each sample has a fixed length of 10 characters, randomly cut from sentences in the corpus
- Image resolution is 280x32 - Image resolution is 280x32
![](datasets/ch_doc1.jpg) ![](../datasets/ch_doc1.jpg)
![](datasets/ch_doc2.jpg) ![](../datasets/ch_doc2.jpg)
![](datasets/ch_doc3.jpg) ![](../datasets/ch_doc3.jpg)
- **Download link**: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (Password: lu7m) - **Download link**: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (Password: lu7m)
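The fixed-length sampling described in the list above could look roughly like the sketch below. It is not the generator used to build this dataset; the function name and the choice to skip sentences shorter than 10 characters are assumptions.

```python
import random


def sample_fixed_length(sentence, length=10, rng=random):
    """Cut a random window of `length` characters out of a corpus sentence.

    Returns None when the sentence is too short (an assumed policy).
    """
    if len(sentence) < length:
        return None
    start = rng.randrange(len(sentence) - length + 1)
    return sentence[start:start + length]


rng = random.Random(0)  # fixed seed so runs are repeatable
label = sample_fixed_length("abcdefghijklmnopqrstuvwxyz", rng=rng)
print(label, len(label))
```

Each sampled string would then be rendered into a 280x32 image by a text synthesis tool.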
<a name="ICDAR2019-ArT"></a> <a name="ICDAR2019-ArT"></a>
#### 5. ICDAR2019-ArT #### 5. ICDAR2019-ArT
- **Data source**: https://ai.baidu.com/broad/introduction?dataset=art - **Data source**: https://ai.baidu.com/broad/introduction?dataset=art
- **Introduction**: It includes 10,166 images, 5,603 in the training set and 4,563 in the test set. It is composed of three parts: Total-Text, SCUT-CTW1500, and Baidu Curved Scene Text, including text with various shapes such as horizontal, multi-directional, and curved. - **Introduction**: It includes 10,166 images, 5,603 in the training set and 4,563 in the test set. It is composed of three parts: Total-Text, SCUT-CTW1500, and Baidu Curved Scene Text, including text with various shapes such as horizontal, multi-directional, and curved.
![](datasets/ArT.jpg) ![](../datasets/ArT.jpg)
- **Download link**: https://ai.baidu.com/broad/download?dataset=art - **Download link**: https://ai.baidu.com/broad/download?dataset=art

View File

@ -50,7 +50,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml
``` ```
In the command above, `-c` selects the configs/det/det_mv3_db.yml configuration file for training. In the command above, `-c` selects the configs/det/det_mv3_db.yml configuration file for training.
For a detailed explanation of the configuration file, please refer to [link](./doc/config-en.md). For a detailed explanation of the configuration file, please refer to [link](./config_en.md).
You can also use the `-o` parameter to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001 You can also use the `-o` parameter to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
``` ```

View File

@ -65,7 +65,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_di
The visualized text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: The visualized text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](imgs_results/det_res_2.jpg) ![](../imgs_results/det_res_2.jpg)
By setting the size of the parameter `det_max_side_len`, the maximum value of picture normalization in the detection algorithm is changed. When the length and width of the picture are less than det_max_side_len, the original picture is used for prediction, otherwise the picture is scaled to the maximum value for prediction. This parameter is set to det_max_side_len=960 by default. If the resolution of the input picture is relatively large and you want to use a larger resolution for prediction, you can execute the following command: By setting the size of the parameter `det_max_side_len`, the maximum value of picture normalization in the detection algorithm is changed. When the length and width of the picture are less than det_max_side_len, the original picture is used for prediction, otherwise the picture is scaled to the maximum value for prediction. This parameter is set to det_max_side_len=960 by default. If the resolution of the input picture is relatively large and you want to use a larger resolution for prediction, you can execute the following command:
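The resizing rule described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the detection pipeline: the function name is hypothetical, and any extra rounding the real network input may require is left out.

```python
def limit_side_len(height, width, det_max_side_len=960):
    """Return the (height, width) used for prediction.

    Images whose sides both fit within det_max_side_len are left unchanged;
    larger images are scaled so the longer side equals det_max_side_len.
    """
    longest = max(height, width)
    if longest <= det_max_side_len:
        return height, width
    scale = det_max_side_len / longest
    return max(1, round(height * scale)), max(1, round(width * scale))


print(limit_side_len(480, 640))    # small image, kept as is
print(limit_side_len(1920, 1080))  # large image, scaled down
```

Raising det_max_side_len keeps more detail in large inputs at the cost of more computation.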
@ -98,7 +98,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](imgs_results/det_res_img_10_db.jpg) ![](../imgs_results/det_res_img_10_db.jpg)
**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images. **Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images.
@ -121,7 +121,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_
``` ```
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](imgs_results/det_res_img_10_east.jpg) ![](../imgs_results/det_res_img_10_east.jpg)
**Note**: EAST post-processing in this codebase uses a Python implementation of NMS, so prediction is quite slow. Using a C++ implementation would give a significant speedup. **Note**: EAST post-processing in this codebase uses a Python implementation of NMS, so prediction is quite slow. Using a C++ implementation would give a significant speedup.
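To see why a pure-Python NMS is slow, consider the generic greedy version below. It is a simplified stand-in that works on axis-aligned boxes, not the NMS variant this codebase uses for EAST; the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Every candidate is compared against every kept box: O(n^2) in Python.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

The nested comparison runs in the interpreter for every pair of boxes, which is the kind of loop a compiled C++ implementation speeds up.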
@ -139,7 +139,7 @@ For ultra-lightweight Chinese recognition model inference, you can execute the f
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/"
``` ```
![](imgs_words/ch/word_4.jpg) ![](../imgs_words/ch/word_4.jpg)
After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen. After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen.
@ -165,7 +165,7 @@ For STAR-Net text recognition model inference, execute the following commands:
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
``` ```
![](imgs_words_en/word_336.png) ![](../imgs_words_en/word_336.png)
After executing the command, the recognition result of the above image is as follows: After executing the command, the recognition result of the above image is as follows:
@ -194,7 +194,7 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model
After executing the command, the recognition result image is as follows: After executing the command, the recognition result image is as follows:
![](imgs_results/2.jpg) ![](../imgs_results/2.jpg)
### 2. Other model inference ### 2. Other model inference
@ -206,4 +206,4 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --d
After executing the command, the recognition result image is as follows: After executing the command, the recognition result image is as follows:
![](imgs_results/img_10.jpg) ![](../imgs_results/img_10.jpg)

View File

@ -191,7 +191,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.c
Input image: Input image:
![](./imgs_words/en/word_1.png) ![](../imgs_words/en/word_1.png)
Get the prediction result of the input image: Get the prediction result of the input image:
@ -210,7 +210,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.c
Input image: Input image:
![](./imgs_words/ch/word_1.jpg) ![](../imgs_words/ch/word_1.jpg)
Get the prediction result of the input image: Get the prediction result of the input image: