Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into dyg_db

commit 6749a349c0
@@ -1,6 +1,6 @@
 <a name="算法介绍"></a>
 ## Algorithm Overview
-This document lists the text detection and text recognition algorithms supported by PaddleOCR, together with each algorithm's models and metrics on **public English datasets**, mainly for algorithm introduction and performance comparison. For models on other datasets, including Chinese, please refer to the [PP-OCR v1.1 series model list](./models_list.md).
+This document lists the text detection and text recognition algorithms supported by PaddleOCR, together with each algorithm's models and metrics on **public English datasets**, mainly for algorithm introduction and performance comparison. For models on other datasets, including Chinese, please refer to the [PP-OCR v2.0 series model list](./models_list.md).

 - [1. Text Detection Algorithms](#文本检测算法)
 - [2. Text Recognition Algorithms](#文本识别算法)
@@ -9,25 +9,25 @@
 ### 1. Text Detection Algorithms

 Text detection algorithms open-sourced by PaddleOCR:
-- [x] DB([paper](https://arxiv.org/abs/1911.08947))(recommended by ppocr)
+- [x] DB([paper]( https://arxiv.org/abs/1911.08947) ) (recommended by ppocr)
 - [x] EAST([paper](https://arxiv.org/abs/1704.03155))
 - [x] SAST([paper](https://arxiv.org/abs/1908.05498))

 On the ICDAR2015 public text detection dataset, the results are as follows:

 |Model|Backbone|precision|recall|Hmean|Download link|
-|-|-|-|-|-|-|
-|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](link)|
-|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](link)|
-|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](link)|
-|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](link)|
-|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](link)|
+| --- | --- | --- | --- | --- | --- |
+|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link (coming soon)](link)|
+|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link (coming soon)](coming soon)|
+|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
+|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
+|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link (coming soon)](link)|

 On the Total-Text public text detection dataset, the results are as follows:

 |Model|Backbone|precision|recall|Hmean|Download link|
-|-|-|-|-|-|-|
-|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](link)|
+| --- | --- | --- | --- | --- | --- |
+|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link (coming soon)](link)|

 **Note:** The SAST model was additionally fine-tuned on public datasets such as ICDAR2013, ICDAR2017, COCO-Text and ArT. The English public datasets, organized in the format used by PaddleOCR, can be downloaded from [Baidu Cloud](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (extraction code: 2bpi).
@@ -38,22 +38,22 @@ For the training and use of PaddleOCR text detection algorithms, please refer to the documentation tutorial on model training
 ### 2. Text Recognition Algorithms

 Text recognition algorithms open-sourced by PaddleOCR on the dynamic graph:
-- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))(recommended by ppocr)
+- [x] CRNN([paper](https://arxiv.org/abs/1507.05717) ) (recommended by ppocr)
 - [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
-- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
+- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
 - [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
 - [ ] SRN([paper](https://arxiv.org/abs/2003.12294)) coming soon

 Following the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation protocol, the models are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP and CUTE. The results are as follows:

 |Model|Backbone|Avg Accuracy|Model file name|Download link|
-|-|-|-|-|-|
-|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](link)|
-|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](link)|
-|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](link)|
-|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](link)|
-|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](link)|
-|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](link)|
+| --- | --- | --- | --- | --- |
+|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
+|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
+|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|
+|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
+|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link (coming soon)]()|
+|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link (coming soon)]()|

 For the training and use of PaddleOCR text recognition algorithms, please refer to the text recognition section of [model training/evaluation](./recognition.md).
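The "Model file name" column above encodes the module combination: a task prefix, a backbone, a spatial transform, a sequence module, and a prediction head. A minimal sketch of how such a name could be decoded; the helper below is purely illustrative and not a PaddleOCR API:

```python
# Illustrative only: decode the module combination encoded in names
# such as "rec_r34_vd_tps_bilstm_ctc".
def parse_rec_model_name(name: str) -> dict:
    task, *rest = name.split("_")      # e.g. ["rec", "r34", "vd", "tps", "bilstm", "ctc"]
    head = rest[-1]                    # prediction head: "ctc" or "attn"
    neck = rest[-2]                    # sequence module: "bilstm" or "none"
    transform = rest[-3]               # spatial transform: "tps" or "none"
    backbone = "_".join(rest[:-3])     # remaining tokens name the backbone, e.g. "r34_vd", "mv3"
    return {"task": task, "backbone": backbone, "transform": transform,
            "neck": neck, "head": head}

print(parse_rec_model_name("rec_mv3_tps_bilstm_ctc"))
# {'task': 'rec', 'backbone': 'mv3', 'transform': 'tps', 'neck': 'bilstm', 'head': 'ctc'}
```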
@@ -62,9 +62,9 @@ PaddleOCR provides training, evaluation and prediction scripts.
 *If you installed the CPU version, please set the `use_gpu` field in the configuration file to false.*

 ```
-# GPU training: single-GPU and multi-GPU training are supported; specify the GPU IDs with --selected_gpus
+# GPU training: single-GPU and multi-GPU training are supported; specify the GPU IDs with '--gpus'. If your paddle version is below 2.0rc1, use the '--selected_gpus' argument instead.
 # Start training. The following command is already written in train.sh; just modify the configuration file path in that file.
-python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
+python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
 ```

 - Data augmentation
@@ -74,7 +74,7 @@ PaddleOCR provides a variety of data augmentation methods. If you want to add perturbations during training
 The default perturbations are: color space conversion (cvtColor), blur, jitter, Gaussian noise, random crop, perspective, color reverse, and random data augmentation (RandAugment).

 During training, each perturbation except RandAugment is applied with a 50% probability. For the implementation, please refer to:
 [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
 [randaugment.py](../../ppocr/data/imaug/randaugment.py)

 *Due to OpenCV compatibility issues, the perturbation operations are currently supported only on Linux.*
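A minimal sketch of the 50%-per-perturbation selection described above; the perturbation functions are placeholders standing in for the real implementations in rec_img_aug.py:

```python
import random

# Hypothetical stand-ins for the real perturbations implemented in rec_img_aug.py.
def blur(img): return img
def jitter(img): return img
def add_gaussian_noise(img): return img
def random_crop(img): return img

PERTURBATIONS = [blur, jitter, add_gaussian_noise, random_crop]

def augment(img, prob=0.5):
    """Apply each perturbation independently with probability `prob`."""
    for op in PERTURBATIONS:
        if random.random() < prob:
            img = op(img)
    return img
```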
@@ -76,8 +76,8 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
 # Single-machine, single-GPU training of the mv3_db model
 python3 tools/train.py -c configs/det/det_mv3_db.yml \
      -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
-# Single-machine, multi-GPU training: set the GPU IDs with the --selected_gpus argument
-python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
+# Single-machine, multi-GPU training: set the GPU IDs with the --gpus argument. If your paddle version is below 2.0rc1, use '--selected_gpus' instead.
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
      -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
 ```
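The `-o Global.pretrain_weights=...` override used above maps a dotted key onto the nested YAML configuration. A rough sketch of that mapping, assuming PyYAML is installed; this is a generic illustration, not PaddleOCR's actual option parser:

```python
import yaml

def apply_override(config: dict, assignment: str) -> None:
    """Apply a dotted override such as 'Global.pretrain_weights=./pretrain_models/...'.

    Values are kept as strings in this sketch; a real parser would also cast numbers.
    """
    dotted_key, value = assignment.split("=", 1)
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

with open("configs/det/det_mv3_db.yml") as f:
    cfg = yaml.safe_load(f)
apply_override(cfg, "Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/")
print(cfg["Global"]["pretrain_weights"])
```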
@@ -22,9 +22,8 @@ inference model (a model saved with `paddle.jit.save`)
 - [III. Text Recognition Model Inference](#文本识别模型推理)
     - [1. Lightweight Chinese Recognition Model Inference](#超轻量中文识别模型推理)
     - [2. CTC-Loss-Based Recognition Model Inference](#基于CTC损失的识别模型推理)
-    - [3. Attention-Loss-Based Recognition Model Inference](#基于Attention损失的识别模型推理)
-    - [4. Inference with a Custom Text Recognition Dictionary](#自定义文本识别字典的推理)
-    - [5. Multilingual Model Inference](#多语言模型的推理)
+    - [3. Inference with a Custom Text Recognition Dictionary](#自定义文本识别字典的推理)
+    - [4. Multilingual Model Inference](#多语言模型的推理)

 - [IV. Angle Classification Model Inference](#方向识别模型推理)
     - [1. Angle Classification Model Inference](#方向分类模型推理)
@@ -268,16 +267,6 @@ For CRNN text recognition model inference, run the following command:
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
 ```

-<a name="基于Attention损失的识别模型推理"></a>
-### 3. Attention-Loss-Based Recognition Model Inference
-
-Unlike CTC-based models, recognition models based on attention loss require the extra argument --rec_algorithm="RARE".
-For RARE text recognition model inference, run the following command:
-```
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE"
-
-```
-
 ![](../imgs_words_en/word_336.png)

 After running the command, the recognition result for the image above is:
@@ -297,7 +286,7 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
 dict_character = list(self.character_str)
 ```

-### 4. Inference with a Custom Text Recognition Dictionary
+### 3. Inference with a Custom Text Recognition Dictionary
 If the text dictionary was modified during training, you need to pass the dictionary path via `--rec_char_dict_path` when running inference with the inference model, and set `rec_char_type=ch`.

 ```
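A custom dictionary is a plain text file with one character per line, and it is turned into a character list much like the `dict_character = list(self.character_str)` snippet above. A minimal sketch, with an assumed example path:

```python
# Minimal sketch: build the character list from a custom dictionary file
# (one character per line). "./my_custom_dict.txt" is an assumed example path.
def load_char_dict(dict_path: str) -> list:
    characters = []
    with open(dict_path, "rb") as f:
        for line in f:
            characters.append(line.decode("utf-8").rstrip("\r\n"))
    return characters

chars = load_char_dict("./my_custom_dict.txt")
print(len(chars), chars[:10])
```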
@@ -305,7 +294,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
 ```

 <a name="多语言模型的推理"></a>
-### 5. Multilingual Model Inference
+### 4. Multilingual Model Inference
 To run inference with a model for another language, you also need to pass the dictionary path via `--rec_char_dict_path`. In addition, to get correct visualization results,
 you need to specify the visualization font path via `--vis_font_path`. Default fonts for several languages are provided under the `doc/` directory, for example for Korean recognition:
@@ -2,7 +2,7 @@

 PaddleOCR has been tested on glibc 2.23. You may also try other glibc versions or install glibc 2.23.
 PaddleOCR working environment:
-- PaddlePaddle 2.0rc0+, PaddlePaddle 2.0rc0 recommended
+- PaddlePaddle 1.8+, PaddlePaddle 2.0rc1 recommended
 - python3.7
 - glibc 2.23
 - cuDNN 7.6+ (GPU)
@@ -35,11 +35,11 @@ sudo docker container exec -it ppocr /bin/bash
 pip3 install --upgrade pip

 If your machine has CUDA 9 or CUDA 10 installed, run the following command to install
-python3 -m pip install paddlepaddle-gpu==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle-gpu==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple

 If your machine is CPU-only, run the following command to install

-python3 -m pip install paddlepaddle==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple

 For more version requirements, please follow the instructions in the [installation document](https://www.paddlepaddle.org.cn/install/quick).
 ```
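After installation you can sanity-check the build from Python; a minimal check, assuming your PaddlePaddle 2.0 release candidate provides `paddle.utils.run_check`:

```python
import paddle

# Print the installed version and run PaddlePaddle's built-in self check.
print(paddle.__version__)   # expected to report a 2.0.0rc1 build
paddle.utils.run_check()    # verifies that paddle (and the GPU, if present) is usable
```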
@@ -200,11 +200,8 @@ PaddleOCR supports alternating training and evaluation. You can modify `configs/rec/rec_icdar15_t
 | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
 | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
 | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
-| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc |
-| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention |
 | rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
 | rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
-| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |

 For training on Chinese data, [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) is recommended. If you want to try other algorithms on a Chinese dataset, please modify the configuration file as described below:
@@ -13,23 +13,23 @@ This tutorial lists the text detection algorithms and text recognition algorithm
 PaddleOCR open source text detection algorithms list:
 - [x] EAST([paper](https://arxiv.org/abs/1704.03155))
 - [x] DB([paper](https://arxiv.org/abs/1911.08947))
-- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research)
+- [x] SAST([paper](https://arxiv.org/abs/1908.05498) )(Baidu Self-Research)

 On the ICDAR2015 dataset, the text detection result is as follows:

 |Model|Backbone|precision|recall|Hmean|Download link|
-|-|-|-|-|-|-|
-|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](link)|
-|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](link)|
-|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](link)|
-|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](link)|
-|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](link)|
+| --- | --- | --- | --- | --- | --- |
+|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[download link (coming soon)](link)|
+|EAST|MobileNetV3|81.67%|79.83%|80.74%|[download link (coming soon)](coming soon)|
+|DB|ResNet50_vd|83.79%|80.65%|82.19%|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
+|DB|MobileNetV3|75.92%|73.18%|74.53%|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
+|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[download link (coming soon)](link)|

 On Total-Text dataset, the text detection result is as follows:

 |Model|Backbone|precision|recall|Hmean|Download link|
-|-|-|-|-|-|-|
-|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](link)|
+| --- | --- | --- | --- | --- | --- |
+|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[download link (coming soon)](link)|

 **Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
@@ -41,20 +41,21 @@ For the training guide and use of PaddleOCR text detection algorithms, please re
 PaddleOCR open-source text recognition algorithms list:
 - [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
 - [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
-- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
+- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
 - [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
-- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research) coming soon
+- [ ] SRN([paper](https://arxiv.org/abs/2003.12294) )(Baidu Self-Research) coming soon

 Following [DTRB](https://arxiv.org/abs/1904.01906), the text recognition models above are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP and CUTE. The results are as follows:

 |Model|Backbone|Avg Accuracy|Module combination|Download link|
-|-|-|-|-|-|
-|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
-|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
-|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
-|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
-|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
-|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
+| --- | --- | --- | --- | --- |
+|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
+|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
+|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|
+|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
+|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[download link (coming soon)]()|
+|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[download link (coming soon)]()|
+

 For the training guide and use of PaddleOCR text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md).
@@ -65,9 +65,9 @@ Start training:
 ```
 # Set PYTHONPATH path
 export PYTHONPATH=$PYTHONPATH:.
-# GPU training: single-GPU and multi-GPU training are supported; specify the GPU IDs with --selected_gpus
+# GPU training: single-GPU and multi-GPU training are supported; specify the GPU IDs with --gpus. If your paddle version is below 2.0rc1, please use '--selected_gpus'
 # Start training. The following command has been written into the train.sh file; just modify the configuration file path in the file.
-python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
+python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
 ```

 - Data Augmentation
@@ -77,7 +77,7 @@ PaddleOCR provides a variety of data augmentation methods. If you want to add di
 The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse, and RandAugment.

 Except for RandAugment, each perturbation method is applied with a 50% probability during training. For the specific implementation, please refer to:
 [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
 [randaugment.py](../../ppocr/data/imaug/randaugment.py)

@@ -76,8 +76,10 @@ You can also use `-o` to change the training parameters without modifying the ym
 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001

 # multi-GPU training
-# Set the GPU IDs with the '--selected_gpus' parameter;
-python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
+# Set the GPU IDs with the '--gpus' parameter; if your paddle version is below 2.0rc1, please use '--selected_gpus'
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
+
+
 ```

 #### load trained model and continue training
@@ -25,9 +25,8 @@ Next, we first introduce how to convert a trained model into an inference model,
 - [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
     - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
     - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
-    - [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
-    - [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
-    - [5. MULTILINGUAL MODEL INFERENCE](#MULTILINGUAL_MODEL_INFERENCE)
+    - [3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
+    - [4. MULTILINGUAL MODEL INFERENCE](#MULTILINGUAL_MODEL_INFERENCE)

 - [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
     - [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
@@ -275,15 +274,6 @@ For CRNN text recognition model inference, execute the following commands:
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
 ```

-<a name="ATTENTION-BASED_RECOGNITION"></a>
-### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE
-
-The recognition model based on attention loss differs from the CTC-based one; the additional parameter --rec_algorithm="RARE" needs to be set.
-For RARE text recognition model inference, execute the following command:
-```bash
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE"
-```
-
 ![](../imgs_words_en/word_336.png)

 After executing the command, the recognition result of the above image is as follows:
@@ -303,7 +293,7 @@ dict_character = list(self.character_str)
 ```

 <a name="USING_CUSTOM_CHARACTERS"></a>
-### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
+### 3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
 If the text dictionary was modified during training, you need to specify the dictionary path with `--rec_char_dict_path` when predicting with the inference model, and set `rec_char_type=ch`.

 ```
@@ -311,7 +301,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
 ```

 <a name="MULTILINGUAL_MODEL_INFERENCE"></a>
-### 5. MULTILINGUAL MODEL INFERENCE
+### 4. MULTILINGUAL MODEL INFERENCE
 If you need to predict with a model for another language, you need to specify the dictionary path with `--rec_char_dict_path` when running inference. To get correct visualization results,
 you also need to specify the visualization font path with `--vis_font_path`. Fonts for several languages are provided by default under the `doc/` path, for example for Korean recognition:
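As an illustration of the flags above, the following sketch assembles the Korean prediction command; the model directory is a placeholder, and the dictionary and font paths follow the usual repository layout but should be verified against your checkout:

```python
import subprocess

# Assumed paths for illustration; substitute your own model, dictionary, and font locations.
cmd = [
    "python3", "tools/infer/predict_rec.py",
    "--image_dir=./doc/imgs_words/korean/1.jpg",
    "--rec_model_dir=./inference/korean_rec/",
    "--rec_char_dict_path=./ppocr/utils/dict/korean_dict.txt",
    "--vis_font_path=./doc/fonts/korean.ttf",
]
subprocess.run(cmd, check=True)
```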
@@ -3,7 +3,7 @@
 After testing, PaddleOCR can run on glibc 2.23. You can also test other glibc versions or install glibc 2.23 for the best compatibility.

 PaddleOCR working environment:
-- PaddlePaddle 1.8+, PaddlePaddle 2.0rc0 recommended
+- PaddlePaddle 1.8+, PaddlePaddle 2.0rc1 recommended
 - python3.7
 - glibc 2.23
@@ -38,10 +38,10 @@ sudo docker container exec -it ppocr /bin/bash
 pip3 install --upgrade pip

 # If you have cuda9 or cuda10 installed on your machine, please run the following command to install
-python3 -m pip install paddlepaddle-gpu==2.0rc0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle-gpu==2.0rc1 -i https://mirror.baidu.com/pypi/simple

 # If you only have cpu on your machine, please run the following command to install
-python3 -m pip install paddlepaddle==2.0rc0 -i https://mirror.baidu.com/pypi/simple
+python3 -m pip install paddlepaddle==2.0rc1 -i https://mirror.baidu.com/pypi/simple
 ```
 For more software version requirements, please refer to the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
@@ -193,11 +193,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
 | rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
 | rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
 | rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
-| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc |
-| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention |
 | rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
 | rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
-| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |

 For training Chinese data, it is recommended to use
 [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
@@ -180,7 +180,6 @@ class GridGenerator(nn.Layer):
         P = self.build_P_paddle(I_r_size)

         inv_delta_C_tensor = self.build_inv_delta_C_paddle(C).astype('float32')
-        # inv_delta_C_tensor = paddle.zeros((23,23)).astype('float32')
         P_hat_tensor = self.build_P_hat_paddle(
             C, paddle.to_tensor(P)).astype('float32')

train.sh (6 changed lines)
@@ -1 +1,5 @@
-python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
+# for paddle.__version__ >= 2.0rc1
+python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
+
+# for paddle.__version__ < 2.0rc1
+# python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
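The flag switch encoded in train.sh can also be chosen programmatically. A minimal sketch that inspects the installed paddle version and builds the matching launch command; the version comparison is deliberately simplified and not part of the repository:

```python
import subprocess
import paddle

# Simplified check: Paddle >= 2.0rc1 uses --gpus, older versions use --selected_gpus.
# A robust script would parse paddle.__version__ with packaging.version instead of
# comparing strings.
gpu_flag = "--gpus" if paddle.__version__ >= "2.0.0" else "--selected_gpus"

cmd = [
    "python3", "-m", "paddle.distributed.launch",
    gpu_flag, "0,1,2,3,4,5,6,7",
    "tools/train.py", "-c", "configs/rec/rec_mv3_none_bilstm_ctc.yml",
]
subprocess.run(cmd, check=True)
```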