Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into dyg_db
This commit is contained in:
commit
6749a349c0
|
@ -1,6 +1,6 @@
|
||||||
<a name="算法介绍"></a>
|
<a name="算法介绍"></a>
|
||||||
## 算法介绍
|
## 算法介绍
|
||||||
本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v1.1 系列模型下载](./models_list.md)。
|
本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v2.0 系列模型下载](./models_list.md)。
|
||||||
|
|
||||||
- [1.文本检测算法](#文本检测算法)
|
- [1.文本检测算法](#文本检测算法)
|
||||||
- [2.文本识别算法](#文本识别算法)
|
- [2.文本识别算法](#文本识别算法)
|
||||||
|
@ -16,18 +16,18 @@ PaddleOCR开源的文本检测算法列表:
|
||||||
在ICDAR2015文本检测公开数据集上,算法效果如下:
|
在ICDAR2015文本检测公开数据集上,算法效果如下:
|
||||||
|
|
||||||
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
||||||
|-|-|-|-|-|-|
|
| --- | --- | --- | --- | --- | --- |
|
||||||
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](link)|
|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接 (coming soon)](link)|
|
||||||
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](link)|
|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接 (coming soon)](coming soon)|
|
||||||
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](link)|
|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|
||||||
|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](link)|
|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
|
||||||
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](link))|
|
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接 (coming soon)](link)|
|
||||||
|
|
||||||
在Total-text文本检测公开数据集上,算法效果如下:
|
在Total-text文本检测公开数据集上,算法效果如下:
|
||||||
|
|
||||||
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
||||||
|-|-|-|-|-|-|
|
| --- | --- | --- | --- | --- | --- |
|
||||||
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](link)|
|
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接 (coming soon)](link)|
|
||||||
|
|
||||||
**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi)
|
**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi)
|
||||||
|
|
||||||
|
@ -40,20 +40,20 @@ PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训
|
||||||
PaddleOCR基于动态图开源的文本识别算法列表:
|
PaddleOCR基于动态图开源的文本识别算法列表:
|
||||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717) )(ppocr推荐)
|
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717) )(ppocr推荐)
|
||||||
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
||||||
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
||||||
- [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
|
- [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
|
||||||
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294)) coming soon
|
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294)) coming soon
|
||||||
|
|
||||||
参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:
|
参考[DTRB](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:
|
||||||
|
|
||||||
|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接|
|
|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接|
|
||||||
|-|-|-|-|-|
|
| --- | --- | --- | --- | --- |
|
||||||
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](link)|
|
|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
|
||||||
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](link)|
|
|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
|
||||||
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](link)|
|
|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|
|
||||||
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](link)|
|
|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
|
||||||
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](link)|
|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接 (coming soon )]()|
|
||||||
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](link)|
|
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接 (coming soon )]()|
|
||||||
|
|
||||||
|
|
||||||
PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。
|
PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。
|
||||||
|
|
|
@ -62,9 +62,9 @@ PaddleOCR提供了训练脚本、评估脚本和预测脚本。
|
||||||
*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
|
*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
|
||||||
|
|
||||||
```
|
```
|
||||||
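# GPU training: single-GPU and multi-GPU training are supported; specify the GPU IDs with selected_gpus
|
# GPU training: single-GPU and multi-GPU training are supported; specify the GPU IDs with '--gpus'. If your paddle version is older than 2.0rc1, use '--selected_gpus' instead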
|
||||||
# 启动训练,下面的命令已经写入train.sh文件中,只需修改文件里的配置文件路径即可
|
# 启动训练,下面的命令已经写入train.sh文件中,只需修改文件里的配置文件路径即可
|
||||||
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
|
python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
|
||||||
```
|
```
|
||||||
|
|
||||||
- 数据增强
|
- 数据增强
|
||||||
|
|
|
@ -76,8 +76,8 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
|
||||||
# 单机单卡训练 mv3_db 模型
|
# 单机单卡训练 mv3_db 模型
|
||||||
python3 tools/train.py -c configs/det/det_mv3_db.yml \
|
python3 tools/train.py -c configs/det/det_mv3_db.yml \
|
||||||
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
|
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
|
||||||
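# Single-machine multi-GPU training; set the GPU IDs to use with the --selected_gpus parameter
|
# Single-machine multi-GPU training; set the GPU IDs to use with the --gpus parameter. If your paddle version is older than 2.0rc1, use '--selected_gpus' instead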
|
||||||
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
|
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
|
||||||
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
|
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
|
@ -22,9 +22,8 @@ inference 模型(`paddle.jit.save`保存的模型)
|
||||||
- [三、文本识别模型推理](#文本识别模型推理)
|
- [三、文本识别模型推理](#文本识别模型推理)
|
||||||
- [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理)
|
- [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理)
|
||||||
- [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理)
|
- [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理)
|
||||||
- [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理)
|
- [3. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
|
||||||
- [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
|
- [4. 多语言模型的推理](#多语言模型的推理)
|
||||||
- [5. 多语言模型的推理](#多语言模型的推理)
|
|
||||||
|
|
||||||
- [四、方向分类模型推理](#方向识别模型推理)
|
- [四、方向分类模型推理](#方向识别模型推理)
|
||||||
- [1. 方向分类模型推理](#方向分类模型推理)
|
- [1. 方向分类模型推理](#方向分类模型推理)
|
||||||
|
@ -268,16 +267,6 @@ CRNN 文本识别模型推理,可以执行如下命令:
|
||||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
|
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
|
||||||
```
|
```
|
||||||
|
|
||||||
<a name="基于Attention损失的识别模型推理"></a>
|
|
||||||
### 3. 基于Attention损失的识别模型推理
|
|
||||||
|
|
||||||
基于Attention损失的识别模型与ctc不同,需要额外设置识别算法参数 --rec_algorithm="RARE"
|
|
||||||
RARE 文本识别模型推理,可以执行如下命令:
|
|
||||||
```
|
|
||||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE"
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
![](../imgs_words_en/word_336.png)
|
![](../imgs_words_en/word_336.png)
|
||||||
|
|
||||||
执行命令后,上面图像的识别结果如下:
|
执行命令后,上面图像的识别结果如下:
|
||||||
|
@ -297,7 +286,7 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
|
||||||
dict_character = list(self.character_str)
|
dict_character = list(self.character_str)
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4. 自定义文本识别字典的推理
|
### 3. 自定义文本识别字典的推理
|
||||||
如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径,并且设置 `rec_char_type=ch`
|
如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径,并且设置 `rec_char_type=ch`
|
||||||
|
|
||||||
```
|
```
|
||||||
|
@ -305,7 +294,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
|
||||||
```
|
```
|
||||||
|
|
||||||
<a name="多语言模型的推理"></a>
|
<a name="多语言模型的推理"></a>
|
||||||
### 5. 多语言模型的推理
|
### 4. 多语言模型的推理
|
||||||
如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果,
|
如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果,
|
||||||
需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别:
|
需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别:
|
||||||
|
|
||||||
|
|
|
@ -2,7 +2,7 @@
|
||||||
|
|
||||||
PaddleOCR has been tested on glibc 2.23; you can also try other glibc versions or install glibc 2.23
|
PaddleOCR has been tested on glibc 2.23; you can also try other glibc versions or install glibc 2.23
|
||||||
PaddleOCR 工作环境
|
PaddleOCR 工作环境
|
||||||
- PaddlePaddle 2.0rc0+ ,推荐使用 PaddlePaddle 2.0rc0
|
- PaddlePaddle 1.8+ ,推荐使用 PaddlePaddle 2.0rc1
|
||||||
- python3.7
|
- python3.7
|
||||||
- glibc 2.23
|
- glibc 2.23
|
||||||
- cuDNN 7.6+ (GPU)
|
- cuDNN 7.6+ (GPU)
|
||||||
|
@ -35,11 +35,11 @@ sudo docker container exec -it ppocr /bin/bash
|
||||||
pip3 install --upgrade pip
|
pip3 install --upgrade pip
|
||||||
|
|
||||||
如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
|
如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
|
||||||
python3 -m pip install paddlepaddle-gpu==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
|
python3 -m pip install paddlepaddle-gpu==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple
|
||||||
|
|
||||||
如果您的机器是CPU,请运行以下命令安装
|
如果您的机器是CPU,请运行以下命令安装
|
||||||
|
|
||||||
python3 -m pip install paddlepaddle==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
|
python3 -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple
|
||||||
|
|
||||||
更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
|
更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
|
||||||
```
|
```
|
||||||
|
|
|
@ -200,11 +200,8 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
|
||||||
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
||||||
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
||||||
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
|
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
|
||||||
| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc |
|
|
||||||
| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention |
|
|
||||||
| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
|
| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
|
||||||
| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
|
| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
|
||||||
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
|
|
||||||
|
|
||||||
训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件:
|
训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件:
|
||||||
|
|
||||||
|
|
|
@ -18,18 +18,18 @@ PaddleOCR open source text detection algorithms list:
|
||||||
On the ICDAR2015 dataset, the text detection result is as follows:
|
On the ICDAR2015 dataset, the text detection result is as follows:
|
||||||
|
|
||||||
|Model|Backbone|precision|recall|Hmean|Download link|
|
|Model|Backbone|precision|recall|Hmean|Download link|
|
||||||
|-|-|-|-|-|-|
|
| --- | --- | --- | --- | --- | --- |
|
||||||
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](link)|
|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[download link (coming soon)](link)|
|
||||||
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](link)|
|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[download link (coming soon)](coming soon)|
|
||||||
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](link)|
|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|
||||||
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](link)|
|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
|
||||||
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](link)|
|
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[download link (coming soon)](link)|
|
||||||
|
|
||||||
On Total-Text dataset, the text detection result is as follows:
|
On Total-Text dataset, the text detection result is as follows:
|
||||||
|
|
||||||
|Model|Backbone|precision|recall|Hmean|Download link|
|
|Model|Backbone|precision|recall|Hmean|Download link|
|
||||||
|-|-|-|-|-|-|
|
| --- | --- | --- | --- | --- | --- |
|
||||||
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](link)|
|
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[download link (coming soon)](link)|
|
||||||
|
|
||||||
**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
|
**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
|
||||||
|
|
||||||
|
@ -41,20 +41,21 @@ For the training guide and use of PaddleOCR text detection algorithms, please re
|
||||||
PaddleOCR open-source text recognition algorithms list:
|
PaddleOCR open-source text recognition algorithms list:
|
||||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
|
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
|
||||||
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
||||||
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
||||||
- [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
|
- [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
|
||||||
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294) )(Baidu Self-Research) coming soon
|
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294) )(Baidu Self-Research) coming soon
|
||||||
|
|
||||||
Following the training and evaluation procedure of [DTRB](https://arxiv.org/abs/1904.01906), the text recognition models above were trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
|
Following the training and evaluation procedure of [DTRB](https://arxiv.org/abs/1904.01906), the text recognition models above were trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
|
||||||
|
|
||||||
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|
||||||
|-|-|-|-|-|
|
| --- | --- | --- | --- | --- |
|
||||||
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|
|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
|
||||||
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|
|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
|
||||||
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|
|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|
|
||||||
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|
|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
|
||||||
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[download link (coming soon )]()|
|
||||||
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[download link (coming soon )]()|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
For training and usage of the PaddleOCR text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md).
|
For training and usage of the PaddleOCR text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md).
|
||||||
|
|
|
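The download links added to the tables above share one visible pattern: a common prefix, the model storage name, and a `_v2.0_train.tar` suffix. The following is a minimal sketch of fetching and unpacking such an archive, assuming that pattern holds for every published model (it is inferred from the links in this diff, not from a documented API). The helper names `model_url` and `fetch_model` and the default destination directory are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: prefix and suffix are inferred from the links in the tables above.
BASE_URL = "https://paddleocr.bj.bcebos.com/dygraph_v2.0/en"


def model_url(model_name):
    """Build the archive URL for a model storage name such as 'det_mv3_db'."""
    return "{}/{}_v2.0_train.tar".format(BASE_URL, model_name)


def fetch_model(model_name, dest_dir="./pretrain_models"):
    """Download the archive (if not already present) and extract it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "{}_v2.0_train.tar".format(model_name)
    if not archive.exists():
        urllib.request.urlretrieve(model_url(model_name), str(archive))
    with tarfile.open(str(archive)) as tar:
        tar.extractall(str(dest))
    return dest
```

Calling `fetch_model("rec_mv3_none_bilstm_ctc")` would download the CRNN MobileNetV3 archive listed above. Models marked "coming soon" have no archive yet, so the request would fail for them.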
@ -65,9 +65,9 @@ Start training:
|
||||||
```
|
```
|
||||||
# Set PYTHONPATH path
|
# Set PYTHONPATH path
|
||||||
export PYTHONPATH=$PYTHONPATH:.
|
export PYTHONPATH=$PYTHONPATH:.
|
||||||
# GPU training: single-GPU and multi-GPU training are supported; specify the GPU IDs through selected_gpus
|
# GPU training: single-GPU and multi-GPU training are supported; specify the GPU IDs through --gpus. If your paddle version is older than 2.0rc1, use '--selected_gpus' instead
|
||||||
# Start training, the following command has been written into the train.sh file, just modify the configuration file path in the file
|
# Start training, the following command has been written into the train.sh file, just modify the configuration file path in the file
|
||||||
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
|
python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
|
||||||
```
|
```
|
||||||
|
|
||||||
- Data Augmentation
|
- Data Augmentation
|
||||||
|
|
|
@ -76,8 +76,10 @@ You can also use `-o` to change the training parameters without modifying the ym
|
||||||
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
|
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
|
||||||
|
|
||||||
# multi-GPU training
|
# multi-GPU training
|
||||||
# Set the GPU IDs to use with the '--selected_gpus' parameter
|
# Set the GPU IDs to use with the '--gpus' parameter. If your paddle version is older than 2.0rc1, use '--selected_gpus' instead
|
||||||
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
|
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
|
||||||
|
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
#### load trained model and continue training
|
#### load trained model and continue training
|
||||||
|
|
|
@ -25,9 +25,8 @@ Next, we first introduce how to convert a trained model into an inference model,
|
||||||
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
|
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
|
||||||
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
|
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
|
||||||
- [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
|
- [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
|
||||||
- [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
|
- [3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
|
||||||
- [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
|
- [4. MULTILINGUAL MODEL INFERENCE](#MULTILINGUAL_MODEL_INFERENCE)
|
||||||
- [5. MULTILINGUAL MODEL INFERENCE](#MULTILINGUAL_MODEL_INFERENCE)
|
|
||||||
|
|
||||||
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
||||||
- [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
- [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
|
||||||
|
@ -275,15 +274,6 @@ For CRNN text recognition model inference, execute the following commands:
|
||||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
|
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
|
||||||
```
|
```
|
||||||
|
|
||||||
<a name="ATTENTION-BASED_RECOGNITION"></a>
|
|
||||||
### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE
|
|
||||||
|
|
||||||
The recognition model based on Attention loss is different from ctc, and additional recognition algorithm parameters need to be set --rec_algorithm="RARE"
|
|
||||||
After executing the command, the recognition result of the above image is as follows:
|
|
||||||
```bash
|
|
||||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE"
|
|
||||||
```
|
|
||||||
|
|
||||||
![](../imgs_words_en/word_336.png)
|
![](../imgs_words_en/word_336.png)
|
||||||
|
|
||||||
After executing the command, the recognition result of the above image is as follows:
|
After executing the command, the recognition result of the above image is as follows:
|
||||||
|
@ -303,7 +293,7 @@ dict_character = list(self.character_str)
|
||||||
```
|
```
|
||||||
|
|
||||||
<a name="USING_CUSTOM_CHARACTERS"></a>
|
<a name="USING_CUSTOM_CHARACTERS"></a>
|
||||||
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
|
### 3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
|
||||||
If the text dictionary was modified during training, specify the dictionary path with `--rec_char_dict_path` and set `rec_char_type=ch` when predicting with the inference model.
|
If the text dictionary was modified during training, specify the dictionary path with `--rec_char_dict_path` and set `rec_char_type=ch` when predicting with the inference model.
|
||||||
|
|
||||||
```
|
```
|
||||||
|
@ -311,7 +301,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
|
||||||
```
|
```
|
||||||
|
|
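As a rough illustration of the custom-dictionary rule above, the sketch below assembles the `predict_rec.py` argument list and switches `rec_char_type` to `ch` whenever a dictionary path is supplied. Only flags that appear in this document are used. The function name `build_rec_command` and the dictionary path in the usage note are hypothetical.

```python
import shlex


def build_rec_command(image_dir, rec_model_dir, char_dict_path=None,
                      rec_char_type="en", rec_image_shape="3, 32, 100"):
    """Return the argv list for tools/infer/predict_rec.py.

    When a custom dictionary is given, rec_char_type is forced to 'ch',
    as the text above requires.
    """
    cmd = [
        "python3", "tools/infer/predict_rec.py",
        "--image_dir={}".format(image_dir),
        "--rec_model_dir={}".format(rec_model_dir),
        "--rec_image_shape={}".format(rec_image_shape),
    ]
    if char_dict_path is not None:
        cmd.append("--rec_char_type=ch")
        cmd.append("--rec_char_dict_path={}".format(char_dict_path))
    else:
        cmd.append("--rec_char_type={}".format(rec_char_type))
    return cmd


def to_shell(cmd):
    """Render the argv list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)
```

For example, `to_shell(build_rec_command("./doc/imgs_words_en/word_336.png", "./inference/rec_crnn/", char_dict_path="./your_dict.txt"))` yields a command line that could be run from the repository root, where `./your_dict.txt` stands in for your own dictionary file.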
||||||
<a name="MULTILINGUAL_MODEL_INFERENCE"></a>
|
<a name="MULTILINGUAL_MODEL_INFERENCE"></a>
|
||||||
### 5. MULTILINGUAL MODEL INFERENCE
|
### 4. MULTILINGUAL MODEL INFERENCE
|
||||||
To run prediction with a model for another language, specify the dictionary path with `--rec_char_dict_path` when using the inference model. To get correct visualization results,
|
To run prediction with a model for another language, specify the dictionary path with `--rec_char_dict_path` when using the inference model. To get correct visualization results,
|
||||||
you also need to specify the font path for visualization with `--vis_font_path`. Fonts for several less widely used languages are provided by default under the `doc/` path. For example, for Korean recognition:
|
you also need to specify the font path for visualization with `--vis_font_path`. Fonts for several less widely used languages are provided by default under the `doc/` path. For example, for Korean recognition:
|
||||||
|
|
||||||
|
|
|
@ -3,7 +3,7 @@
|
||||||
PaddleOCR has been tested on glibc 2.23. You can also try other glibc versions or install glibc 2.23 for the best compatibility.
|
PaddleOCR has been tested on glibc 2.23. You can also try other glibc versions or install glibc 2.23 for the best compatibility.
|
||||||
|
|
||||||
PaddleOCR working environment:
|
PaddleOCR working environment:
|
||||||
- PaddlePaddle1.8+, Recommend PaddlePaddle 2.0rc0
|
- PaddlePaddle 1.8+, Recommend PaddlePaddle 2.0rc1
|
||||||
- python3.7
|
- python3.7
|
||||||
- glibc 2.23
|
- glibc 2.23
|
||||||
|
|
||||||
|
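The working environment listed above (PaddlePaddle 1.8+, Python 3.7, glibc 2.23) can be inspected before installation with a few standard-library calls. This is only a sketch: the function name `check_environment` and the shape of the returned dictionary are illustrative choices, and the check reports what it finds rather than enforcing the listed versions.

```python
import importlib.util
import platform
import sys


def check_environment():
    """Collect the facts that the working-environment list refers to."""
    libc_name, libc_version = platform.libc_ver()  # empty strings on non-glibc systems
    return {
        "python": "{}.{}".format(sys.version_info[0], sys.version_info[1]),
        "python_is_3_7": tuple(sys.version_info[:2]) == (3, 7),
        "libc": "{} {}".format(libc_name, libc_version).strip(),
        # Only tests whether the package can be found; it does not import it.
        "paddle_installed": importlib.util.find_spec("paddle") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("{}: {}".format(key, value))
```

If `paddle_installed` is true, `paddle.__version__` can then be compared against the recommended release.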
@ -38,10 +38,10 @@ sudo docker container exec -it ppocr /bin/bash
|
||||||
pip3 install --upgrade pip
|
pip3 install --upgrade pip
|
||||||
|
|
||||||
# If you have cuda9 or cuda10 installed on your machine, please run the following command to install
|
# If you have cuda9 or cuda10 installed on your machine, please run the following command to install
|
||||||
python3 -m pip install paddlepaddle-gpu==2.0rc0 -i https://mirror.baidu.com/pypi/simple
|
python3 -m pip install paddlepaddle-gpu==2.0rc1 -i https://mirror.baidu.com/pypi/simple
|
||||||
|
|
||||||
# If you only have cpu on your machine, please run the following command to install
|
# If you only have cpu on your machine, please run the following command to install
|
||||||
python3 -m pip install paddlepaddle==2.0rc0 -i https://mirror.baidu.com/pypi/simple
|
python3 -m pip install paddlepaddle==2.0rc1 -i https://mirror.baidu.com/pypi/simple
|
||||||
```
|
```
|
||||||
For other software version requirements, please refer to the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
|
For other software version requirements, please refer to the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
|
||||||
|
|
||||||
|
|
|
@ -193,11 +193,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
|
||||||
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
||||||
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
||||||
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
|
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
|
||||||
| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc |
|
|
||||||
| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention |
|
|
||||||
| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
|
| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
|
||||||
| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
|
| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
|
||||||
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
|
|
||||||
|
|
||||||
For training Chinese data, it is recommended to use
|
For training Chinese data, it is recommended to use
|
||||||
[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
|
[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
|
||||||
|
|
|
@ -180,7 +180,6 @@ class GridGenerator(nn.Layer):
|
||||||
P = self.build_P_paddle(I_r_size)
|
P = self.build_P_paddle(I_r_size)
|
||||||
|
|
||||||
inv_delta_C_tensor = self.build_inv_delta_C_paddle(C).astype('float32')
|
inv_delta_C_tensor = self.build_inv_delta_C_paddle(C).astype('float32')
|
||||||
# inv_delta_C_tensor = paddle.zeros((23,23)).astype('float32')
|
|
||||||
P_hat_tensor = self.build_P_hat_paddle(
|
P_hat_tensor = self.build_P_hat_paddle(
|
||||||
C, paddle.to_tensor(P)).astype('float32')
|
C, paddle.to_tensor(P)).astype('float32')
|
||||||
|
|
||||||
|
|
6
train.sh
|
@ -1 +1,5 @@
|
||||||
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
|
# for paddle.__version__ >= 2.0rc1
|
||||||
|
python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
|
||||||
|
|
||||||
|
# for paddle.__version__ < 2.0rc1
|
||||||
|
# python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
|
||||||
|
|
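The two variants kept in train.sh differ only in the launcher flag, which depends on the installed paddle version. A small sketch of choosing the flag programmatically follows. It assumes version strings of the form `2.0.0rc1`, `2.0.0-rc1`, `2.0rc1`, or `1.8.5`; anything else is rejected. The helper names `parse_version`, `gpu_flag`, and `launch_command` are hypothetical and not part of PaddleOCR.

```python
import re

# Accepts "major.minor[.patch]" with an optional "rcN" or "-rcN" suffix.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-?rc(\d+))?$")


def parse_version(text):
    """Turn a version string into a tuple that sorts release candidates before finals."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError("unsupported version string: {!r}".format(text))
    major, minor, patch, rc = match.groups()
    release = (int(major), int(minor), int(patch or 0))
    stage = (0, int(rc)) if rc is not None else (1, 0)
    return release + stage


def gpu_flag(paddle_version):
    """Pick '--gpus' for paddle >= 2.0rc1 and '--selected_gpus' for older versions."""
    if parse_version(paddle_version) >= parse_version("2.0rc1"):
        return "--gpus"
    return "--selected_gpus"


def launch_command(paddle_version, gpu_ids, config):
    """Build the distributed launch argv for tools/train.py."""
    ids = ",".join(str(i) for i in gpu_ids)
    return ["python3", "-m", "paddle.distributed.launch",
            gpu_flag(paddle_version), ids, "tools/train.py", "-c", config]
```

With paddle installed, `launch_command(paddle.__version__, range(8), "configs/rec/rec_mv3_none_bilstm_ctc.yml")` would reproduce whichever of the two train.sh lines matches the local installation, provided the version string has one of the assumed forms.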