diff --git a/README.md b/README.md index 55e33c12..037eb6e4 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,15 @@ +[English](README_en.md) | 简体中文 + ## 简介 PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。 **近期更新** -- 2020.6.8 添加[数据集](./doc/datasets.md),并保持持续更新 +- 2020.6.8 添加[数据集](./doc/doc_ch/datasets.md),并保持持续更新 - 2020.6.5 支持 `attetnion` 模型导出 `inference_model` - 2020.6.5 支持单独预测识别时,输出结果得分 - 2020.5.30 提供超轻量级中文OCR在线体验 - 2020.5.30 模型预测、训练支持Windows系统 -- [more](./doc/update.md) +- [more](./doc/doc_ch/update.md) ## 特性 - 超轻量级中文OCR,总模型仅8.6M @@ -35,7 +37,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 #### 1.环境配置 -请先参考[快速安装](./doc/installation.md)配置PaddleOCR运行环境。 +请先参考[快速安装](./doc/doc_ch/installation.md)配置PaddleOCR运行环境。 #### 2.inference模型下载 @@ -88,14 +90,14 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_mode python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" ``` -更多的文本检测、识别串联推理使用方式请参考文档教程中[基于预测引擎推理](./doc/inference.md)。 +更多的文本检测、识别串联推理使用方式请参考文档教程中[基于预测引擎推理](./doc/doc_ch/inference.md)。 ## 文档教程 -- [快速安装](./doc/installation.md) -- [文本检测模型训练/评估/预测](./doc/detection.md) -- [文本识别模型训练/评估/预测](./doc/recognition.md) -- [基于预测引擎推理](./doc/inference.md) -- [数据集](./doc/datasets.md) +- [快速安装](./doc/doc_ch/installation.md) +- [文本检测模型训练/评估/预测](./doc/doc_ch/detection.md) +- [文本识别模型训练/评估/预测](./doc/doc_ch/recognition.md) +- [基于预测引擎推理](./doc/doc_ch/inference.md) +- [数据集](./doc/doc_ch/datasets.md) ## 文本检测算法 @@ -121,7 +123,7 @@ PaddleOCR开源的文本检测算法列表: * 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化 -PaddleOCR文本检测算法的训练和使用请参考文档教程中[文本检测模型训练/评估/预测](./doc/detection.md)。 +PaddleOCR文本检测算法的训练和使用请参考文档教程中[文本检测模型训练/评估/预测](./doc/doc_ch/detection.md)。 ## 文本识别算法 @@ -151,7 +153,7 @@ PaddleOCR开源的文本识别算法列表: |超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| 
|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| -PaddleOCR文本识别算法的训练和使用请参考文档教程中[文本识别模型训练/评估/预测](./doc/recognition.md)。 +PaddleOCR文本识别算法的训练和使用请参考文档教程中[文本识别模型训练/评估/预测](./doc/doc_ch/recognition.md)。 ## 端到端OCR算法 - [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(百度自研, comming soon) @@ -189,7 +191,7 @@ PaddleOCR文本识别算法的训练和使用请参考文档教程中[文本识 5. **自研算法发布时间** 自研算法SAST、SRN、End2End-PSL都将在6-7月陆续发布,敬请期待。 -[more](./doc/FAQ.md) +[more](./doc/doc_ch/FAQ.md) ## 欢迎加入PaddleOCR技术交流群 加微信:paddlehelp,备注OCR,小助手拉你进群~ diff --git a/README_en.md b/README_en.md index 250f302a..0a4c243d 100644 --- a/README_en.md +++ b/README_en.md @@ -1,13 +1,15 @@ +English | [简体中文](README.md) + ## Introduction PaddleOCR aims to create a rich, leading, and practical OCR tools that help users train better models and apply them into practice. **Recent updates** -- 2020.6.8 Add [dataset](./doc/datasets_en.md) and keep updating +- 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating - 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support separate prediction and recognition, output result score - 2020.5.30 Provide ultra-lightweight Chinese OCR online experience - 2020.5.30 Model prediction and training supported on Windows system -- [more](./doc/update_en.md) +- [more](./doc/doc_en/update_en.md) ## Features - Ultra-lightweight Chinese OCR model, total model size is only 8.6M @@ -36,7 +38,7 @@ The picture above is the result of our Ultra-lightweight Chinese OCR model. For #### 1. Environment configuration -Please see [Quick installation](./doc/installation_en.md) +Please see [Quick installation](./doc/doc_en/installation_en.md) #### 2. 
Download inference models @@ -88,14 +90,14 @@ To run inference of the Generic Chinese OCR model, follow these steps above to d python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/" ``` -For more text detection and recognition models, please refer to the document [Inference](./doc/inference_en.md) +For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md) ## Documentation -- [Quick installation](./doc/installation_en.md) -- [Text detection model training/evaluation/prediction](./doc/detection_en.md) -- [Text recognition model training/evaluation/prediction](./doc/recognition_en.md) -- [Inference](./doc/inference_en.md) -- [Dataset](./doc/datasets_en.md) +- [Quick installation](./doc/doc_en/installation_en.md) +- [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) +- [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) +- [Inference](./doc/doc_en/inference_en.md) +- [Dataset](./doc/doc_en/datasets_en.md) ## Text detection algorithm @@ -121,7 +123,7 @@ For use of [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/dat * Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result. 
-For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md) +For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) ## Text recognition algorithm @@ -194,10 +196,10 @@ Please refer to the document for training guide and use of PaddleOCR text recogn Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient. -[more](./doc/FAQ_en.md) +[more](./doc/doc_en/FAQ_en.md) ## Welcome to the PaddleOCR technical exchange group -Add Wechat: paddlehelp, remark OCR, small assistant will pull you into the group ~ +WeChat: paddlehelp. Add the remark "OCR" and the assistant will invite you to join the group~ ## References diff --git a/doc/FAQ.md b/doc/doc_ch/FAQ.md similarity index 81% rename from doc/FAQ.md rename to doc/doc_ch/FAQ.md index 373e275d..f734e4df 100644 --- a/doc/FAQ.md +++ b/doc/doc_ch/FAQ.md @@ -14,15 +14,15 @@ 5. **自研算法发布时间** 自研算法SAST、SRN、End2End-PSL都将在6-7月陆续发布,敬请期待。 - + 6. **如何在Windows或Mac系统上运行** -PaddleOCR已完成Windows和Mac系统适配,运行时注意两点:1、在[快速安装](installation.md)时,如果不想安装docker,可跳过第一步,直接从第二步安装paddle开始。2、inference模型下载时,如果没有安装wget,可直接点击模型链接或将链接地址复制到浏览器进行下载,并解压放置到相应目录。 +PaddleOCR已完成Windows和Mac系统适配,运行时注意两点:1、在[快速安装](./installation.md)时,如果不想安装docker,可跳过第一步,直接从第二步安装paddle开始。2、inference模型下载时,如果没有安装wget,可直接点击模型链接或将链接地址复制到浏览器进行下载,并解压放置到相应目录。 7. **超轻量模型和通用OCR模型的区别** 目前PaddleOCR开源了2个中文模型,分别是8.6M超轻量中文模型和通用中文OCR模型。两者对比信息如下: - 相同点:两者使用相同的**算法**和**训练数据**; - 不同点:不同之处在于**骨干网络**和**通道参数**,超轻量模型使用MobileNetV3作为骨干网络,通用模型使用Resnet50_vd作为检测模型backbone,Resnet34_vd作为识别模型backbone,具体参数差异可对比两种模型训练的配置文件. 
- + |模型|骨干网络|检测训练配置|识别训练配置| |-|-|-|-| |8.6M超轻量中文OCR模型|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml| @@ -40,4 +40,4 @@ PaddleOCR已完成Windows和Mac系统适配,运行时注意两点:1、在[ 英文数据集,MJSynth和SynthText合成数据,数据量上千万。 中文数据集,LSVT街景数据集根据真值将图crop出来,并进行位置校准,总共30w张图像。此外基于LSVT的语料,合成数据500w。 - 其中,公开数据集都是开源的,用户可自行搜索下载,也可参考[中文数据集](datasets.md),合成数据暂不开源,用户可使用开源合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)、[SynthText](https://github.com/ankush-me/SynthText)、[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。 \ No newline at end of file + 其中,公开数据集都是开源的,用户可自行搜索下载,也可参考[中文数据集](./datasets.md),合成数据暂不开源,用户可使用开源合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)、[SynthText](https://github.com/ankush-me/SynthText)、[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。 diff --git a/doc/config.md b/doc/doc_ch/config.md similarity index 99% rename from doc/config.md rename to doc/doc_ch/config.md index 94186bda..644232b4 100644 --- a/doc/config.md +++ b/doc/doc_ch/config.md @@ -8,7 +8,7 @@ | -o | ALL | 设置配置文件里的参数内容 | None | 使用-o配置相较于-c选择的配置文件具有更高的优先级。例如:`-o Global.use_gpu=false` | -## 配置文件 Global 参数介绍 +## 配置文件 Global 参数介绍 以 `rec_chinese_lite_train.yml` 为例 @@ -46,4 +46,3 @@ | img_set_dir | 数据集路径 | ./train_data | \ | | label_file_path | 数据标签路径 | ./train_data/rec_gt_train.txt| \ | | infer_img | 预测图像文件夹路径 | ./infer_img | \| - diff --git a/doc/customize.md b/doc/doc_ch/customize.md similarity index 100% rename from doc/customize.md rename to doc/doc_ch/customize.md diff --git a/doc/datasets.md b/doc/doc_ch/datasets.md similarity index 91% rename from doc/datasets.md rename to doc/doc_ch/datasets.md index 1bca82c4..983599da 100644 --- a/doc/datasets.md +++ b/doc/doc_ch/datasets.md @@ -12,9 +12,9 @@ #### 1、ICDAR2019-LSVT - **数据来源**:https://ai.baidu.com/broad/introduction?dataset=lsvt - **数据简介**: 共45w中文街景图像,包含5w(2w测试+3w训练)全标注数据(文本坐标+文本内容),40w弱标注数据(仅文本内容),如下图所示: - 
![](datasets/LSVT_1.jpg) + ![](../datasets/LSVT_1.jpg) (a) 全标注数据 - ![](datasets/LSVT_2.jpg) + ![](../datasets/LSVT_2.jpg) (b) 弱标注数据 - **下载地址**:https://ai.baidu.com/broad/download?dataset=lsvt @@ -22,16 +22,16 @@ #### 2、ICDAR2017-RCTW-17 - **数据来源**:https://rctw.vlrlab.net/ - **数据简介**:共包含12,000+图像,大部分图片是通过手机摄像头在野外采集的。有些是截图。这些图片展示了各种各样的场景,包括街景、海报、菜单、室内场景和手机应用程序的截图。 - ![](datasets/rctw.jpg) + ![](../datasets/rctw.jpg) - **下载地址**:https://rctw.vlrlab.net/dataset/ #### 3、中文街景文字识别 - **数据来源**:https://aistudio.baidu.com/aistudio/competition/detail/8 - **数据简介**:共包括29万张图片,其中21万张图片作为训练集(带标注),8万张作为测试集(无标注)。数据集采自中国街景,并由街景图片中的文字行区域(例如店铺标牌、地标等等)截取出来而形成。所有图像都经过一些预处理,将文字区域利用仿射变化,等比映射为一张高为48像素的图片,如图所示: - ![](datasets/ch_street_rec_1.png) + ![](../datasets/ch_street_rec_1.png) (a) 标注:魅派集成吊顶 - ![](datasets/ch_street_rec_2.png) + ![](../datasets/ch_street_rec_2.png) (b) 标注:母婴用品连锁 - **下载地址** https://aistudio.baidu.com/aistudio/datasetdetail/8429 @@ -45,14 +45,14 @@ https://aistudio.baidu.com/aistudio/datasetdetail/8429 - 包含汉字、英文字母、数字和标点共5990个字符(字符集合:https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt ) - 每个样本固定10个字符,字符随机截取自语料库中的句子 - 图片分辨率统一为280x32 - ![](datasets/ch_doc1.jpg) - ![](datasets/ch_doc2.jpg) - ![](datasets/ch_doc3.jpg) + ![](../datasets/ch_doc1.jpg) + ![](../datasets/ch_doc2.jpg) + ![](../datasets/ch_doc3.jpg) - **下载地址**:https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (密码:lu7m) #### 5、ICDAR2019-ArT - **数据来源**:https://ai.baidu.com/broad/introduction?dataset=art - **数据简介**:共包含10,166张图像,训练集5603图,测试集4563图。由Total-Text、SCUT-CTW1500、Baidu Curved Scene Text三部分组成,包含水平、多方向和弯曲等多种形状的文本。 - ![](datasets/ArT.jpg) -- **下载地址**:https://ai.baidu.com/broad/download?dataset=art \ No newline at end of file + ![](../datasets/ArT.jpg) +- **下载地址**:https://ai.baidu.com/broad/download?dataset=art diff --git a/doc/detection.md b/doc/doc_ch/detection.md similarity index 100% rename from doc/detection.md rename to doc/doc_ch/detection.md diff --git a/doc/inference.md 
b/doc/doc_ch/inference.md similarity index 98% rename from doc/inference.md rename to doc/doc_ch/inference.md index be607c88..8e1fa9ff 100644 --- a/doc/inference.md +++ b/doc/doc_ch/inference.md @@ -97,7 +97,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_ 可视化文本检测结果默认保存到 ./inference_results 文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: -![](imgs_results/det_res_img_10_db.jpg) +![](../imgs_results/det_res_img_10_db.jpg) **注意**:由于ICDAR2015数据集只有1000张训练图像,主要针对英文场景,所以上述模型对中文文本图像检测效果非常差。 @@ -120,7 +120,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_ ``` 可视化文本检测结果默认保存到 ./inference_results 文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下: -![](imgs_results/det_res_img_10_east.jpg) +![](../imgs_results/det_res_img_10_east.jpg) **注意**:本代码库中EAST后处理中NMS采用的Python版本,所以预测速度比较耗时。如果采用C++版本,会有明显加速。 @@ -138,7 +138,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/" ``` -![](imgs_words/ch/word_4.jpg) +![](../imgs_words/ch/word_4.jpg) 执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下: @@ -175,7 +175,7 @@ RARE 文本识别模型推理,可以执行如下命令: python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE" ``` -![](imgs_words_en/word_336.png) +![](../imgs_words_en/word_336.png) 执行命令后,上面图像的识别结果如下: @@ -204,7 +204,7 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model 执行命令后,识别结果图像如下: -![](imgs_results/2.jpg) +![](../imgs_results/2.jpg) ### 2.其他模型推理 @@ -216,4 +216,4 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --d 执行命令后,识别结果图像如下: -![](imgs_results/img_10.jpg) +![](../imgs_results/img_10.jpg) diff --git a/doc/installation.md b/doc/doc_ch/installation.md similarity index 100% rename from doc/installation.md rename to 
doc/doc_ch/installation.md diff --git a/doc/recognition.md b/doc/doc_ch/recognition.md similarity index 99% rename from doc/recognition.md rename to doc/doc_ch/recognition.md index 45fa74ab..ffb99840 100644 --- a/doc/recognition.md +++ b/doc/doc_ch/recognition.md @@ -194,7 +194,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkp 预测图片: -![](./imgs_words/en/word_1.png) +![](../imgs_words/en/word_1.png) 得到输入图像的预测结果: @@ -214,7 +214,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.c 预测图片: -![](./imgs_words/ch/word_1.jpg) +![](../imgs_words/ch/word_1.jpg) 得到输入图像的预测结果: diff --git a/doc/update.md b/doc/doc_ch/update.md similarity index 100% rename from doc/update.md rename to doc/doc_ch/update.md diff --git a/doc/FAQ_en.md b/doc/doc_en/FAQ_en.md similarity index 86% rename from doc/FAQ_en.md rename to doc/doc_en/FAQ_en.md index 3a70a40d..9e426486 100644 --- a/doc/FAQ_en.md +++ b/doc/doc_en/FAQ_en.md @@ -14,17 +14,17 @@ It is expected that the service deployment based on Serving and the mobile deplo 5. **Release time of self-developed algorithm** Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient. - + 6. **How to run on Windows or Mac?** -PaddleOCR has completed the adaptation to Windows and MAC systems. Two points should be noted during operation: - 1. In [Quick installation](installation.md), if you do not want to install docker, you can skip the first step and start with the second step. +PaddleOCR has completed the adaptation to Windows and MAC systems. Two points should be noted during operation: + 1. In [Quick installation](./installation_en.md), if you do not want to install docker, you can skip the first step and start with the second step. 2. 
When downloading the inference model, if wget is not installed, you can directly click the model link or copy the link address to the browser to download, then extract and place it in the corresponding directory. 7. **The difference between ultra-lightweight model and General OCR model** At present, PaddleOCR has opensourced two Chinese models, namely 8.6M ultra-lightweight Chinese model and general Chinese OCR model. The comparison information between the two is as follows: - Similarities: Both use the same **algorithm** and **training data**; - Differences: The difference lies in **backbone network** and **channel parameters**, the ultra-lightweight model uses MobileNetV3 as the backbone network, the general model uses Resnet50_vd as the detection model backbone, and Resnet34_vd as the recognition model backbone. You can compare the two model training configuration files to see the differences in parameters. - + |Model|Backbone|Detection configuration file|Recognition configuration file| |-|-|-|-| |8.6M ultra-lightweight Chinese OCR model|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml| @@ -40,6 +40,6 @@ At present, the open source model, dataset and magnitude are as follows: Chinese dataset: LSVT street view dataset with 3w pictures - Recognition: English dataset: MJSynth and SynthText synthetic dataset, the amount of data is tens of millions. - Chinese dataset: LSVT street view dataset with cropped text area, a total of 30w images. In addition, the synthesized data based on LSVT corpus is 500w. + Chinese dataset: LSVT street view dataset with cropped text area, a total of 30w images. In addition, the synthesized data based on LSVT corpus is 500w. - Among them, the public datasets are opensourced, users can search and download by themselves, or refer to [Chinese data set](datasets.md), synthetic data is not opensourced, users can use open-source synthesis tools to synthesize data themselves. 
Current available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc. \ No newline at end of file + Among them, the public datasets are opensourced, users can search and download by themselves, or refer to [Chinese data set](./datasets_en.md), synthetic data is not opensourced, users can use open-source synthesis tools to synthesize data themselves. Current available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc. diff --git a/doc/config_en.md b/doc/doc_en/config_en.md similarity index 98% rename from doc/config_en.md rename to doc/doc_en/config_en.md index c9e45035..e995a629 100644 --- a/doc/config_en.md +++ b/doc/doc_en/config_en.md @@ -8,7 +8,7 @@ The following list can be viewed via `--help` | -o | ALL | set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: `-o Global.use_gpu=false` | -## Introduction to Global Parameters of Configuration File +## Introduction to Global Parameters of Configuration File Take `rec_chinese_lite_train.yml` as an example @@ -46,4 +46,3 @@ Take `rec_chinese_reader.yml` as an example: | img_set_dir | Image folder path | ./train_data | \ | | label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ | | infer_img | Result folder path | ./infer_img | \| - diff --git a/doc/customize_en.md b/doc/doc_en/customize_en.md similarity index 94% rename from doc/customize_en.md rename to doc/doc_en/customize_en.md index 99665329..d3a61ef2 100644 --- a/doc/customize_en.md +++ b/doc/doc_en/customize_en.md @@ -8,7 +8,7 @@ PaddleOCR provides two text detection algorithms: EAST and DB. 
Both support Mobi ``` python3 tools/train.py -c configs/det/det_mv3_db.yml ``` -For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection.md) +For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md) ## step2: Train text recognition model @@ -16,7 +16,7 @@ PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, an ``` python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml ``` -For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition.md) +For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md) ## step3: Concatenate predictions @@ -27,4 +27,4 @@ When performing prediction, you need to specify the path of a single image or a ``` python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/" ``` -For more details about text detection and recognition concatenation, please refer to the document [Inference](./inference.md) +For more details about text detection and recognition concatenation, please refer to the document [Inference](./inference_en.md) diff --git a/doc/datasets_en.md b/doc/doc_en/datasets_en.md similarity index 92% rename from doc/datasets_en.md rename to doc/doc_en/datasets_en.md index b0514ccc..ed858052 100644 --- a/doc/datasets_en.md +++ b/doc/doc_en/datasets_en.md @@ -12,9 +12,9 @@ In addition to opensource data, users can also use synthesis tools to synthesize #### 1. 
ICDAR2019-LSVT - **Data sources**:https://ai.baidu.com/broad/introduction?dataset=lsvt - **Introduction**: A total of 45w Chinese street view images, including 5w (2w test + 3w training) fully labeled data (text coordinates + text content), 40w weakly labeled data (text content only), as shown in the following figure: - ![](datasets/LSVT_1.jpg) + ![](../datasets/LSVT_1.jpg) (a) Fully labeled data - ![](datasets/LSVT_2.jpg) + ![](../datasets/LSVT_2.jpg) (b) Weakly labeled data - **Download link**:https://ai.baidu.com/broad/download?dataset=lsvt @@ -22,7 +22,7 @@ In addition to opensource data, users can also use synthesis tools to synthesize #### 2. ICDAR2017-RCTW-17 - **Data sources**:https://rctw.vlrlab.net/ - **Introduction**:It contains 12000 + images, most of them are collected in the wild through mobile camera. Some are screenshots. These images show a variety of scenes, including street views, posters, menus, indoor scenes and screenshots of mobile applications. - ![](datasets/rctw.jpg) + ![](../datasets/rctw.jpg) - **Download link**:https://rctw.vlrlab.net/dataset/ @@ -30,9 +30,9 @@ In addition to opensource data, users can also use synthesis tools to synthesize - **Data sources**:https://aistudio.baidu.com/aistudio/competition/detail/8 - **Introduction**:A total of 290000 pictures are included, of which 210000 are used as training sets (with labels) and 80000 are used as test sets (without labels). The dataset is collected from the Chinese street view, and is formed by by cutting out the text line area (such as shop signs, landmarks, etc.) in the street view picture. 
All the images are preprocessed: by using affine transform, the text area is proportionally mapped to a picture with a height of 48 pixels, as shown in the figure: - ![](datasets/ch_street_rec_1.png) + ![](../datasets/ch_street_rec_1.png) (a) Label: 魅派集成吊顶 - ![](datasets/ch_street_rec_2.png) + ![](../datasets/ch_street_rec_2.png) (b) Label: 母婴用品连锁 - **Download link** https://aistudio.baidu.com/aistudio/datasetdetail/8429 @@ -46,14 +46,14 @@ https://aistudio.baidu.com/aistudio/datasetdetail/8429 - 5990 characters including Chinese characters, English letters, numbers and punctuation(Characters set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt ) - Each sample is fixed with 10 characters, and the characters are randomly intercepted from the sentences in the corpus - Image resolution is 280x32 - ![](datasets/ch_doc1.jpg) - ![](datasets/ch_doc2.jpg) - ![](datasets/ch_doc3.jpg) + ![](../datasets/ch_doc1.jpg) + ![](../datasets/ch_doc2.jpg) + ![](../datasets/ch_doc3.jpg) - **Download link**:https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (Password: lu7m) #### 5、ICDAR2019-ArT - **Data source**:https://ai.baidu.com/broad/introduction?dataset=art - **Introduction**:It includes 10166 images, 5603 in training sets and 4563 in test sets. It is composed of three parts: total text, scut-ctw1500 and Baidu curved scene text, including text with various shapes such as horizontal, multi-directional and curved. 
- ![](datasets/ArT.jpg) -- **Download link**:https://ai.baidu.com/broad/download?dataset=art \ No newline at end of file + ![](../datasets/ArT.jpg) +- **Download link**:https://ai.baidu.com/broad/download?dataset=art diff --git a/doc/detection_en.md b/doc/doc_en/detection_en.md similarity index 99% rename from doc/detection_en.md rename to doc/doc_en/detection_en.md index 5acba219..eb500879 100644 --- a/doc/detection_en.md +++ b/doc/doc_en/detection_en.md @@ -50,7 +50,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml ``` In the above instruction, use `-c` to select the training to use the configs/det/det_db_mv3.yml configuration file. -For a detailed explanation of the configuration file, please refer to [link](./doc/config-en.md). +For a detailed explanation of the configuration file, please refer to [link](./config_en.md). You can also use the `-o` parameter to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001 ``` diff --git a/doc/inference_en.md b/doc/doc_en/inference_en.md similarity index 98% rename from doc/inference_en.md rename to doc/doc_en/inference_en.md index 521654db..3326bab8 100644 --- a/doc/inference_en.md +++ b/doc/doc_en/inference_en.md @@ -65,7 +65,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_di The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows: -![](imgs_results/det_res_2.jpg) +![](../imgs_results/det_res_2.jpg) By setting the size of the parameter `det_max_side_len`, the maximum value of picture normalization in the detection algorithm is changed. When the length and width of the picture are less than det_max_side_len, the original picture is used for prediction, otherwise the picture is scaled to the maximum value for prediction. This parameter is set to det_max_side_len=960 by default. 
If the resolution of the input picture is relatively large and you want to use a larger resolution for prediction, you can execute the following command: @@ -98,7 +98,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_ The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: -![](imgs_results/det_res_img_10_db.jpg) +![](../imgs_results/det_res_img_10_db.jpg) **Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images. @@ -121,7 +121,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_ ``` The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows: -![](imgs_results/det_res_img_10_east.jpg) +![](../imgs_results/det_res_img_10_east.jpg) **Note**: The Python version of NMS in EAST post-processing used in this codebase so the prediction speed is quite slow. If you use the C++ version, there will be a significant speedup. @@ -139,7 +139,7 @@ For ultra-lightweight Chinese recognition model inference, you can execute the f python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/" ``` -![](imgs_words/ch/word_4.jpg) +![](../imgs_words/ch/word_4.jpg) After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen. 
@@ -165,7 +165,7 @@ For STAR-Net text recognition model inference, execute the following commands: ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` -![](imgs_words_en/word_336.png) +![](../imgs_words_en/word_336.png) After executing the command, the recognition result of the above image is as follows: @@ -194,7 +194,7 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model After executing the command, the recognition result image is as follows: -![](imgs_results/2.jpg) +![](../imgs_results/2.jpg) ### 2. Other model inference @@ -206,4 +206,4 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --d After executing the command, the recognition result image is as follows: -![](imgs_results/img_10.jpg) +![](../imgs_results/img_10.jpg) diff --git a/doc/installation_en.md b/doc/doc_en/installation_en.md similarity index 100% rename from doc/installation_en.md rename to doc/doc_en/installation_en.md diff --git a/doc/recognition_en.md b/doc/doc_en/recognition_en.md similarity index 99% rename from doc/recognition_en.md rename to doc/doc_en/recognition_en.md index a73aeec5..097bcc6d 100644 --- a/doc/recognition_en.md +++ b/doc/doc_en/recognition_en.md @@ -191,7 +191,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.c Input image: -![](./imgs_words/en/word_1.png) +![](../imgs_words/en/word_1.png) Get the prediction result of the input image: @@ -210,7 +210,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.c Input image: -![](./imgs_words/ch/word_1.jpg) +![](../imgs_words/ch/word_1.jpg) Get the prediction result of the input image: diff --git a/doc/update_en.md b/doc/doc_en/update_en.md similarity index 100% rename from doc/update_en.md rename to doc/doc_en/update_en.md
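
Review note: this change moves every page one directory deeper (`doc/` to `doc/doc_ch/` and `doc/doc_en/`), so each relative link and image path has to be rewritten by hand. A small checker can catch paths that were missed, such as an image reference still pointing at `imgs_results/...` instead of `../imgs_results/...`, or a link that drops the `_en` suffix. The following is a minimal sketch and not part of the PaddleOCR repository: `find_broken_links` and `demo` are hypothetical helper names, the regex covers only inline `[text](target)` and `![alt](target)` forms, and the demo builds a throwaway directory tree that merely imitates the new layout.

```python
import os
import re
import tempfile

# Matches inline Markdown links and images: [text](target) or ![alt](target).
# Reference-style links and HTML <img> tags are deliberately not handled.
LINK_RE = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(md_path):
    """Return relative link targets in md_path that do not exist on disk."""
    base_dir = os.path.dirname(os.path.abspath(md_path))
    with open(md_path, encoding="utf-8") as f:
        text = f.read()
    broken = []
    for target in LINK_RE.findall(text):
        # Skip absolute URLs (anything with a scheme) and in-page anchors.
        if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*:", target) or target.startswith("#"):
            continue
        path = target.split("#", 1)[0]  # drop any fragment
        if not os.path.exists(os.path.normpath(os.path.join(base_dir, path))):
            broken.append(target)
    return broken


def demo():
    """Build a tiny doc tree that imitates the new layout and check one page."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "doc", "doc_en"))
        os.makedirs(os.path.join(root, "doc", "imgs_results"))
        open(os.path.join(root, "doc", "imgs_results", "2.jpg"), "wb").close()
        open(os.path.join(root, "doc", "doc_en", "detection_en.md"), "w").close()
        page = os.path.join(root, "doc", "doc_en", "inference_en.md")
        with open(page, "w", encoding="utf-8") as f:
            f.write("![](../imgs_results/2.jpg)\n")       # resolves after the move
            f.write("![](imgs_results/2.jpg)\n")          # old path, now broken
            f.write("[det](./detection_en.md)\n")         # resolves
            f.write("[det](./detection.md)\n")            # missing _en suffix
            f.write("[web](https://example.com/x.md)\n")  # absolute URL, skipped
        return find_broken_links(page)


if __name__ == "__main__":
    print(demo())
```

Running the same function over every `.md` file under `doc/` and the two top-level READMEs (for example with `os.walk`) would give a quick pre-merge check for a restructuring like this one.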