Merge remote-tracking branch 'upstream/dygraph' into dy3
|
@ -9,7 +9,7 @@ PaddleOCR同时支持动态图与静态图两种编程范式
|
|||
|
||||
**近期更新**
|
||||
- 2020.12.15 更新数据合成工具[Style-Text](./StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。
|
||||
- 2020.12.07 [FAQ](./doc/doc_ch/FAQ.md)新增5个高频问题,总数124个,并且计划以后每周一都会更新,欢迎大家持续关注。
|
||||
- 2020.12.14 [FAQ](./doc/doc_ch/FAQ.md)新增5个高频问题,总数127个,每周一都会更新,欢迎大家持续关注。
|
||||
- 2020.11.25 更新半自动标注工具[PPOCRLabel](./PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。
|
||||
- 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941
|
||||
- [More](./doc/doc_ch/update.md)
|
||||
|
@ -39,6 +39,14 @@ PaddleOCR同时支持动态图与静态图两种编程范式
|
|||
|
||||
上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)。
|
||||
|
||||
<a name="欢迎加入PaddleOCR技术交流群"></a>
|
||||
## 欢迎加入PaddleOCR技术交流群
|
||||
- 微信扫描二维码加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/joinus.PNG" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
## 快速体验
|
||||
- PC端:超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
|
||||
|
||||
|
@ -121,7 +129,7 @@ PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框
|
|||
|
||||
- 英文模型
|
||||
<div align="center">
|
||||
<img src="./doc/imgs_results/img_12.jpg" width="800">
|
||||
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
|
||||
</div>
|
||||
|
||||
- 其他语言模型
|
||||
|
@ -130,13 +138,6 @@ PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框
|
|||
<img src="./doc/imgs_results/korean.jpg" width="800">
|
||||
</div>
|
||||
|
||||
<a name="欢迎加入PaddleOCR技术交流群"></a>
|
||||
## 欢迎加入PaddleOCR技术交流群
|
||||
请扫描下面二维码,完成问卷填写,获取加群二维码和OCR方向的炼丹秘籍
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/joinus.PNG" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
<a name="许可证书"></a>
|
||||
## 许可证书
|
||||
|
|
|
@ -72,7 +72,10 @@ fusion_generator:
|
|||
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
|
||||
```
|
||||
|
||||
* Note: The language options is correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
|
||||
* Note 1: The language option corresponds to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
|
||||
* Note 2: Style-Text is mainly used to generate images for OCR recognition models.
|
||||
So the height of style images should be around 32 pixels; images of other sizes may perform poorly.
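Since style images should be about 32 pixels tall, it can help to pre-scale inputs before running `tools.synth_image`. A minimal sketch of the aspect-preserving size computation (a hypothetical helper, not part of Style-Text itself):

```python
# Compute the target size for a style image so its height is ~32 px,
# preserving aspect ratio. (Assumption: any image library, e.g. OpenCV's
# cv2.resize, can then be used with the returned size.)
def scaled_size(width, height, target_height=32):
    """Return (new_width, new_height) preserving aspect ratio."""
    scale = target_height / float(height)
    return max(1, round(width * scale)), target_height

# e.g. a 120x64 style image scales to 60x32
print(scaled_size(120, 64))  # (60, 32)
```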
|
||||
|
||||
|
||||
For example, enter the following image and corpus `PaddleOCR`.
|
||||
|
||||
|
@ -116,9 +119,17 @@ In actual application scenarios, it is often necessary to synthesize pictures in
|
|||
* `CorpusGenerator`:
|
||||
* `method`: Method of CorpusGenerator; supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used, no other configuration is needed; otherwise you need to set `corpus_file` and `language`.
|
||||
* `language`: Language of the corpus.
|
||||
* `corpus_file`: Filepath of the corpus.
|
||||
* `corpus_file`: Filepath of the corpus. The corpus file should be a plain text file, which will be split on line endings ('\n'); the corpus generator samples one line at a time.
|
||||
|
||||
|
||||
Example of corpus file:
|
||||
```
|
||||
PaddleOCR
|
||||
飞桨文字识别
|
||||
StyleText
|
||||
风格文本图像数据合成
|
||||
```
|
||||
|
||||
We provide a general dataset containing Chinese, English and Korean (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)); some examples are given below:
|
||||
|
||||
<div align="center">
|
||||
|
@ -130,7 +141,18 @@ We provide a general dataset containing Chinese, English and Korean (50,000 imag
|
|||
``` bash
|
||||
python -m tools.synth_dataset -c configs/dataset_config.yml
|
||||
```
|
||||
|
||||
We also provide example corpus and images in the `examples` folder.
|
||||
<div align="center">
|
||||
<img src="examples/style_images/1.jpg" width="300">
|
||||
<img src="examples/style_images/2.jpg" width="300">
|
||||
</div>
|
||||
If you run the code above directly, you will get example output data in the `output_data` folder.
|
||||
You will get synthesized images and labels as below:
|
||||
<div align="center">
|
||||
<img src="doc/images/12.png" width="800">
|
||||
</div>
|
||||
There will be some cache files under the `label` folder. If the program exits unexpectedly, you can find the cached labels there.
|
||||
When the program finishes normally, you will find all the labels in `label.txt`, which gives the final results.
|
||||
|
||||
<a name="Applications"></a>
|
||||
### Applications
|
||||
|
|
|
@ -63,7 +63,10 @@ fusion_generator:
|
|||
```python
|
||||
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
|
||||
```
|
||||
* 注意:语言选项和语料相对应,目前该工具只支持英文、简体中文和韩语。
|
||||
* 注1:语言选项和语料相对应,目前该工具只支持英文、简体中文和韩语。
|
||||
* 注2:Style-Text生成的数据主要应用于OCR识别场景。基于当前PaddleOCR识别模型的设计,我们主要支持高度在32左右的风格图像。
|
||||
如果输入图像尺寸相差过多,效果可能不佳。
|
||||
|
||||
|
||||
例如,输入如下图片和语料"PaddleOCR":
|
||||
|
||||
|
@ -102,7 +105,16 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_
|
|||
* `CorpusGenerator`:
|
||||
* `method`:语料生成方法,目前有`FileCorpus`和`EnNumCorpus`可选。如果使用`EnNumCorpus`,则不需要填写其他配置,否则需要修改`corpus_file`和`language`;
|
||||
* `language`:语料的语种;
|
||||
* `corpus_file`: 语料文件路径。
|
||||
* `corpus_file`: 语料文件路径。语料文件应为纯文本文件。语料生成器首先会将语料按行切分,之后每次随机选取一行。
|
||||
|
||||
语料文件格式示例:
|
||||
```
|
||||
PaddleOCR
|
||||
飞桨文字识别
|
||||
StyleText
|
||||
风格文本图像数据合成
|
||||
...
|
||||
```
|
||||
|
||||
Style-Text也提供了一批中英韩5万张通用场景数据用作文本风格图像,便于合成场景丰富的文本图像,下图给出了一些示例。
|
||||
|
||||
|
@ -117,6 +129,19 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_
|
|||
``` bash
|
||||
python -m tools.synth_dataset -c configs/dataset_config.yml
|
||||
```
|
||||
我们在examples目录下提供了样例图片和语料。
|
||||
<div align="center">
|
||||
<img src="examples/style_images/1.jpg" width="300">
|
||||
<img src="examples/style_images/2.jpg" width="300">
|
||||
</div>
|
||||
|
||||
直接运行上述命令,可以在output_data中产生样例输出,包括图片和用于训练识别模型的标注文件:
|
||||
<div align="center">
|
||||
<img src="doc/images/12.png" width="800">
|
||||
</div>
|
||||
|
||||
其中label目录下的标注文件为程序运行过程中产生的缓存,如果程序在中途异常终止,可以使用缓存的标注文件。
|
||||
如果程序正常运行完毕,则会在output_data下生成label.txt,为最终的标注结果。
|
||||
|
||||
<a name="应用案例"></a>
|
||||
### 四、应用案例
|
||||
|
|
|
@ -33,7 +33,7 @@ Predictor:
|
|||
- 0.5
|
||||
expand_result: false
|
||||
bg_generator:
|
||||
pretrain: models/style_text_rec/bg_generator
|
||||
pretrain: style_text_models/bg_generator
|
||||
module_name: bg_generator
|
||||
generator_type: BgGeneratorWithMask
|
||||
encode_dim: 64
|
||||
|
@ -43,7 +43,7 @@ Predictor:
|
|||
conv_block_dilation: true
|
||||
output_factor: 1.05
|
||||
text_generator:
|
||||
pretrain: models/style_text_rec/text_generator
|
||||
pretrain: style_text_models/text_generator
|
||||
module_name: text_generator
|
||||
generator_type: TextGenerator
|
||||
encode_dim: 64
|
||||
|
@ -52,7 +52,7 @@ Predictor:
|
|||
conv_block_dropout: false
|
||||
conv_block_dilation: true
|
||||
fusion_generator:
|
||||
pretrain: models/style_text_rec/fusion_generator
|
||||
pretrain: style_text_models/fusion_generator
|
||||
module_name: fusion_generator
|
||||
generator_type: FusionGeneratorSimple
|
||||
encode_dim: 64
|
||||
|
|
|
@ -1,2 +1,2 @@
|
|||
PaddleOCR
|
||||
Paddle
|
||||
飞桨文字识别
|
||||
|
|
|
@ -2,11 +2,11 @@ Global:
|
|||
use_gpu: true
|
||||
epoch_num: 1200
|
||||
log_smooth_window: 20
|
||||
print_batch_step: 2
|
||||
print_batch_step: 10
|
||||
save_model_dir: ./output/db_mv3/
|
||||
save_epoch_step: 1200
|
||||
# evaluation is run every 5000 iterations after the 4000th iteration
|
||||
eval_batch_step: [4000, 5000]
|
||||
# evaluation is run every 2000 iterations
|
||||
eval_batch_step: [0, 2000]
|
||||
# if pretrained_model is saved in static mode, load_static_weights must set to True
|
||||
load_static_weights: True
|
||||
cal_metric_during_train: False
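The comments in the hunk above describe `eval_batch_step` as a `[start_iter, interval]` pair. A minimal sketch of that scheduling rule (an assumption based on the comments, not PaddleOCR's exact trainer code):

```python
# Interpret eval_batch_step: [start, interval] — run evaluation every
# `interval` iterations once `start` has been reached.
def should_eval(global_step, eval_batch_step):
    start, interval = eval_batch_step
    return global_step >= start and (global_step - start) % interval == 0

print(should_eval(4000, [0, 2000]))     # True: new config evaluates here
print(should_eval(4000, [4000, 5000]))  # True: old config's first eval
print(should_eval(4100, [0, 2000]))     # False
```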
|
||||
|
@ -39,7 +39,7 @@ Loss:
|
|||
alpha: 5
|
||||
beta: 10
|
||||
ohem_ratio: 3
|
||||
|
||||
|
||||
Optimizer:
|
||||
name: Adam
|
||||
beta1: 0.9
|
||||
|
@ -100,7 +100,7 @@ Train:
|
|||
loader:
|
||||
shuffle: True
|
||||
drop_last: False
|
||||
batch_size_per_card: 4
|
||||
batch_size_per_card: 16
|
||||
num_workers: 8
|
||||
|
||||
Eval:
|
||||
|
@ -128,4 +128,4 @@ Eval:
|
|||
shuffle: False
|
||||
drop_last: False
|
||||
batch_size_per_card: 1 # must be 1
|
||||
num_workers: 2
|
||||
num_workers: 8
|
|
@ -5,8 +5,8 @@ Global:
|
|||
print_batch_step: 10
|
||||
save_model_dir: ./output/det_r50_vd/
|
||||
save_epoch_step: 1200
|
||||
# evaluation is run every 5000 iterations after the 4000th iteration
|
||||
eval_batch_step: [5000,4000]
|
||||
# evaluation is run every 2000 iterations
|
||||
eval_batch_step: [0,2000]
|
||||
# if pretrained_model is saved in static mode, load_static_weights must set to True
|
||||
load_static_weights: True
|
||||
cal_metric_during_train: False
|
||||
|
|
|
@ -60,7 +60,8 @@ Metric:
|
|||
Train:
|
||||
dataset:
|
||||
name: SimpleDataSet
|
||||
label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
|
||||
data_dir: ./train_data/
|
||||
label_file_list: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
|
||||
data_ratio_list: [0.5, 0.5]
|
||||
transforms:
|
||||
- DecodeImage: # load image
|
||||
|
|
|
@ -103,17 +103,17 @@ make inference_lib_dist
|
|||
更多编译参数选项可以参考Paddle C++预测库官网:[https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)。
|
||||
|
||||
|
||||
* 编译完成之后,可以在`build/fluid_inference_install_dir/`文件下看到生成了以下文件及文件夹。
|
||||
* 编译完成之后,可以在`build/paddle_inference_install_dir/`文件下看到生成了以下文件及文件夹。
|
||||
|
||||
```
|
||||
build/fluid_inference_install_dir/
|
||||
build/paddle_inference_install_dir/
|
||||
|-- CMakeCache.txt
|
||||
|-- paddle
|
||||
|-- third_party
|
||||
|-- version.txt
|
||||
```
|
||||
|
||||
其中`paddle`就是之后进行C++预测时所需的Paddle库,`version.txt`中包含当前预测库的版本信息。
|
||||
其中`paddle`就是C++预测所需的Paddle库,`version.txt`中包含当前预测库的版本信息。
|
||||
|
||||
#### 1.2.2 直接下载安装
|
||||
|
||||
|
|
|
@ -11,10 +11,15 @@ max_side_len 960
|
|||
det_db_thresh 0.3
|
||||
det_db_box_thresh 0.5
|
||||
det_db_unclip_ratio 2.0
|
||||
det_model_dir ./inference/det_db
|
||||
det_model_dir ./inference/ch_ppocr_mobile_v2.0_det_infer/
|
||||
|
||||
# cls config
|
||||
use_angle_cls 0
|
||||
cls_model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer/
|
||||
cls_thresh 0.9
|
||||
|
||||
# rec config
|
||||
rec_model_dir ./inference/rec_crnn
|
||||
rec_model_dir ./inference/ch_ppocr_mobile_v2.0_rec_infer/
|
||||
char_list_file ../../ppocr/utils/ppocr_keys_v1.txt
|
||||
|
||||
# show the detection results
|
||||
|
|
|
@ -9,44 +9,42 @@
|
|||
|
||||
## PaddleOCR常见问题汇总(持续更新)
|
||||
|
||||
* [近期更新(2020.12.07)](#近期更新)
|
||||
* [近期更新(2020.12.14)](#近期更新)
|
||||
* [【精选】OCR精选10个问题](#OCR精选10个问题)
|
||||
* [【理论篇】OCR通用30个问题](#OCR通用问题)
|
||||
* [基础知识7题](#基础知识)
|
||||
* [数据集7题](#数据集2)
|
||||
* [模型训练调优7题](#模型训练调优2)
|
||||
* [预测部署9题](#预测部署2)
|
||||
* [【实战篇】PaddleOCR实战84个问题](#PaddleOCR实战问题)
|
||||
* [使用咨询20题](#使用咨询)
|
||||
* [【实战篇】PaddleOCR实战87个问题](#PaddleOCR实战问题)
|
||||
* [使用咨询21题](#使用咨询)
|
||||
* [数据集17题](#数据集3)
|
||||
* [模型训练调优24题](#模型训练调优3)
|
||||
* [预测部署23题](#预测部署3)
|
||||
* [模型训练调优25题](#模型训练调优3)
|
||||
* [预测部署24题](#预测部署3)
|
||||
|
||||
|
||||
<a name="近期更新"></a>
|
||||
## 近期更新(2020.12.07)
|
||||
## 近期更新(2020.12.14)
|
||||
|
||||
#### Q2.4.9:弯曲文本有试过opencv的TPS进行弯曲校正吗?
|
||||
#### Q3.1.21:PaddleOCR支持动态图吗?
|
||||
|
||||
**A**:opencv的tps需要标出上下边界对应的点,这些点很难通过传统方法或者深度学习方法获取。PaddleOCR里StarNet网络中的tps模块实现了自动学点,自动校正,可以直接尝试这个。
|
||||
**A**:动态图版本正在紧锣密鼓开发中,将于2020年12月16日发布,敬请关注。
|
||||
|
||||
#### Q3.3.20: 文字检测时怎么模糊的数据增强?
|
||||
#### Q3.3.23:检测模型训练或预测时出现elementwise_add报错
|
||||
|
||||
**A**: 模糊的数据增强需要修改代码进行添加,以DB为例,参考[Normalize](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/operators.py#L60) ,添加模糊的增强就行
|
||||
**A**:设置的输入尺寸必须是32的倍数,否则在网络多次下采样和上采样后,feature map会产生1个像素的diff,从而导致elementwise_add时报shape不匹配的错误。
|
||||
|
||||
#### Q3.3.21: 文字检测时怎么更改图片旋转的角度,实现360度任意旋转?
|
||||
#### Q3.3.24: DB检测训练输入尺寸640,可以改大一些吗?
|
||||
|
||||
**A**: 将[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/iaa_augment.py#L64) 的(-10,10) 改为(-180,180)即可
|
||||
**A**: 不建议改大。检测模型训练输入尺寸是预处理中random crop后的尺寸,并非直接将原图进行resize,多数场景下这个尺寸并不小了,改大后可能反而并不合适,而且训练会变慢。另外,代码里可能有的地方参数按照预设输入尺寸适配的,改大后可能有隐藏风险。
|
||||
|
||||
#### Q3.3.22: 训练数据的长宽比过大怎么修改shape
|
||||
#### Q3.3.25: 识别模型训练时,loss能正常下降,但acc一直为0
|
||||
|
||||
**A**: 识别修改[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yaml#L75) ,
|
||||
检测修改[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml#L85)
|
||||
**A**: 识别模型训练初期acc为0是正常的,多训一段时间指标就上来了。
|
||||
|
||||
#### Q3.4.24:DB模型能正确推理预测,但换成EAST或SAST模型时报错或结果不正确
|
||||
|
||||
#### Q3.4.23:安装paddleocr后,提示没有paddle
|
||||
|
||||
**A**:这是因为paddlepaddle gpu版本和cpu版本的名称不一致,现在已经在[whl的文档](./whl.md)里做了安装说明。
|
||||
**A**:使用EAST或SAST模型进行推理预测时,需要在命令中指定参数--det_algorithm="EAST" 或 --det_algorithm="SAST",使用DB时不用指定是因为该参数默认值是"DB":https://github.com/PaddlePaddle/PaddleOCR/blob/e7a708e9fdaf413ed7a14da8e4a7b4ac0b211e42/tools/infer/utility.py#L43
|
||||
|
||||
<a name="OCR精选10个问题"></a>
|
||||
## 【精选】OCR精选10个问题
|
||||
|
@ -390,6 +388,10 @@
|
|||
**A**:PaddleOCR主要聚焦通用ocr,如果有垂类需求,您可以用PaddleOCR+垂类数据自己训练;
|
||||
如果缺少带标注的数据,或者不想投入研发成本,建议直接调用开放的API,开放的API覆盖了目前比较常见的一些垂类。
|
||||
|
||||
#### Q3.1.21:PaddleOCR支持动态图吗?
|
||||
|
||||
**A**:动态图版本正在紧锣密鼓开发中,将于2020年12月16日发布,敬请关注。
|
||||
|
||||
<a name="数据集3"></a>
|
||||
### 数据集
|
||||
|
||||
|
@ -603,6 +605,18 @@ ps -axu | grep train.py | awk '{print $2}' | xargs kill -9
|
|||
**A**: 识别修改[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yaml#L75) ,
|
||||
检测修改[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml#L85)
|
||||
|
||||
#### Q3.3.23:检测模型训练或预测时出现elementwise_add报错
|
||||
|
||||
**A**:设置的输入尺寸必须是32的倍数,否则在网络多次下采样和上采样后,feature map会产生1个像素的diff,从而导致elementwise_add时报shape不匹配的错误。
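The multiple-of-32 constraint above can be satisfied by padding the input before inference. A minimal sketch (hypothetical helper, not PaddleOCR code):

```python
# Pad height/width up to the next multiple of 32 so repeated down/upsampling
# reproduces the original feature-map size, avoiding the elementwise_add
# shape mismatch described above.
def pad_to_multiple_of_32(height, width):
    pad_h = (32 - height % 32) % 32
    pad_w = (32 - width % 32) % 32
    return height + pad_h, width + pad_w

print(pad_to_multiple_of_32(641, 480))  # (672, 480)
```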
|
||||
|
||||
#### Q3.3.24: DB检测训练输入尺寸640,可以改大一些吗?
|
||||
|
||||
**A**: 不建议改大。检测模型训练输入尺寸是预处理中random crop后的尺寸,并非直接将原图进行resize,多数场景下这个尺寸并不小了,改大后可能反而并不合适,而且训练会变慢。另外,代码里可能有的地方参数按照预设输入尺寸适配的,改大后可能有隐藏风险。
|
||||
|
||||
#### Q3.3.25: 识别模型训练时,loss能正常下降,但acc一直为0
|
||||
|
||||
**A**: 识别模型训练初期acc为0是正常的,多训一段时间指标就上来了。
|
||||
|
||||
<a name="预测部署3"></a>
|
||||
|
||||
### 预测部署
|
||||
|
@ -710,4 +724,8 @@ ps -axu | grep train.py | awk '{print $2}' | xargs kill -9
|
|||
|
||||
#### Q3.4.23:安装paddleocr后,提示没有paddle
|
||||
|
||||
**A**:这是因为paddlepaddle gpu版本和cpu版本的名称不一致,现在已经在[whl的文档](./whl.md)里做了安装说明。
|
||||
**A**:这是因为paddlepaddle gpu版本和cpu版本的名称不一致,现在已经在[whl的文档](./whl.md)里做了安装说明。
|
||||
|
||||
#### Q3.4.24:DB模型能正确推理预测,但换成EAST或SAST模型时报错或结果不正确
|
||||
|
||||
**A**:使用EAST或SAST模型进行推理预测时,需要在命令中指定参数--det_algorithm="EAST" 或 --det_algorithm="SAST",使用DB时不用指定是因为该参数默认值是"DB":https://github.com/PaddlePaddle/PaddleOCR/blob/e7a708e9fdaf413ed7a14da8e4a7b4ac0b211e42/tools/infer/utility.py#L43
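The default described in the answer above can be illustrated with a simplified argument parser (a sketch mirroring the linked `tools/infer/utility.py` behaviour, not the file itself):

```python
import argparse

# --det_algorithm defaults to "DB", so it only needs to be passed
# explicitly when running EAST or SAST inference.
parser = argparse.ArgumentParser()
parser.add_argument("--det_algorithm", type=str, default="DB")

args = parser.parse_args([])                           # no flag -> "DB"
east = parser.parse_args(["--det_algorithm", "EAST"])
print(args.det_algorithm, east.det_algorithm)          # DB EAST
```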
|
|
@ -131,12 +131,12 @@ python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.pretrained_mo
|
|||
# 下载超轻量中文检测模型:
|
||||
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
|
||||
tar xf ch_ppocr_mobile_v2.0_det_infer.tar
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/"
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/"
|
||||
```
|
||||
|
||||
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
|
||||
|
||||

|
||||

|
||||
|
||||
通过参数`limit_type`和`det_limit_side_len`来对图片的尺寸进行限制,
|
||||
`limit_type`可选参数为[`max`, `min`],
|
||||
|
|
|
@ -58,7 +58,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
|
|||
**4. 安装第三方库**
|
||||
```
|
||||
cd PaddleOCR
|
||||
pip3 install -r requirments.txt
|
||||
pip3 install -r requirements.txt
|
||||
```
|
||||
|
||||
注意,windows环境下,建议从[这里](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely)下载shapely安装包完成安装,
|
||||
|
|
|
@ -115,7 +115,7 @@ PaddleOCR
|
|||
│ │ │ ├── text_image_aug // 文本识别的 tia 数据扩充
|
||||
│ │ │ │ ├── __init__.py
|
||||
│ │ │ │ ├── augment.py // tia_distort,tia_stretch 和 tia_perspective 的代码
|
||||
│ │ │ │ ├── warp_mls.py
|
||||
│ │ │ │ ├── warp_mls.py
|
||||
│ │ │ ├── __init__.py
|
||||
│ │ │ ├── east_process.py // EAST 算法的数据处理步骤
|
||||
│ │ │ ├── make_border_map.py // 生成边界图
|
||||
|
@ -167,7 +167,7 @@ PaddleOCR
|
|||
│ │ │ ├── det_east_head.py // EAST 检测头
|
||||
│ │ │ ├── det_sast_head.py // SAST 检测头
|
||||
│ │ │ ├── rec_ctc_head.py // 识别 ctc
|
||||
│ │ │ ├── rec_att_head.py // 识别 attention
|
||||
│ │ │ ├── rec_att_head.py // 识别 attention
|
||||
│ │ ├── transforms // 图像变换
|
||||
│ │ │ ├── __init__.py // 构造 transform 相关代码
|
||||
│ │ │ └── tps.py // TPS 变换
|
||||
|
@ -185,7 +185,7 @@ PaddleOCR
|
|||
│ │ └── sast_postprocess.py // SAST 后处理
|
||||
│ └── utils // 工具
|
||||
│ ├── dict // 小语种字典
|
||||
│ ....
|
||||
│ ....
|
||||
│ ├── ic15_dict.txt // 英文数字字典,区分大小写
|
||||
│ ├── ppocr_keys_v1.txt // 中文字典,用于训练中文模型
|
||||
│ ├── logging.py // logger
|
||||
|
@ -207,10 +207,10 @@ PaddleOCR
|
|||
│ ├── program.py // 整体流程
|
||||
│ ├── test_hubserving.py
|
||||
│ └── train.py // 启动训练
|
||||
├── paddleocr.py
|
||||
├── paddleocr.py
|
||||
├── README_ch.md // 中文说明文档
|
||||
├── README_en.md // 英文说明文档
|
||||
├── README.md // 主页说明文档
|
||||
├── requirments.txt // 安装依赖
|
||||
├── requirements.txt // 安装依赖
|
||||
├── setup.py // whl包打包脚本
|
||||
├── train.sh // 启动训练脚本
|
||||
├── train.sh // 启动训练脚本
|
||||
|
|
|
@ -138,12 +138,12 @@ For lightweight Chinese detection model inference, you can execute the following
|
|||
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
|
||||
tar xf ch_ppocr_mobile_v2.0_det_infer.tar
|
||||
# predict
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/"
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/"
|
||||
```
|
||||
|
||||
The visual text detection results are saved to the `./inference_results` folder by default, and the names of the result files are prefixed with 'det_res'. Examples of results are as follows:
|
||||
|
||||

|
||||

|
||||
|
||||
You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image,
|
||||
The optional parameters of `limit_type` are [`max`, `min`], and
|
||||
|
|
|
@ -61,7 +61,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
|
|||
**4. Install third-party libraries**
|
||||
```
|
||||
cd PaddleOCR
|
||||
pip3 install -r requirments.txt
|
||||
pip3 install -r requirements.txt
|
||||
```
|
||||
|
||||
If you get the error `OSError: [WinError 126] The specified module could not be found` when installing shapely on Windows,
|
||||
|
|
|
@ -116,7 +116,7 @@ PaddleOCR
|
|||
│ │ │ ├── text_image_aug // Tia data augment for text recognition
|
||||
│ │ │ │ ├── __init__.py
|
||||
│ │ │ │ ├── augment.py // Tia_distort,tia_stretch and tia_perspective
|
||||
│ │ │ │ ├── warp_mls.py
|
||||
│ │ │ │ ├── warp_mls.py
|
||||
│ │ │ ├── __init__.py
|
||||
│ │ │ ├── east_process.py // Data processing steps of EAST algorithm
|
||||
│ │ │ ├── iaa_augment.py // Data augmentation operations
|
||||
|
@ -188,7 +188,7 @@ PaddleOCR
|
|||
│ │ └── sast_postprocess.py // SAST post-processing
|
||||
│ └── utils // utils
|
||||
│ ├── dict // Minor language dictionary
|
||||
│ ....
|
||||
│ ....
|
||||
│ ├── ic15_dict.txt // English number dictionary, case sensitive
|
||||
│ ├── ppocr_keys_v1.txt // Chinese dictionary for training Chinese models
|
||||
│ ├── logging.py // logger
|
||||
|
@ -210,10 +210,10 @@ PaddleOCR
|
|||
│ ├── program.py // Inference system
|
||||
│ ├── test_hubserving.py
|
||||
│ └── train.py // Start training script
|
||||
├── paddleocr.py
|
||||
├── paddleocr.py
|
||||
├── README_ch.md // Chinese documentation
|
||||
├── README_en.md // English documentation
|
||||
├── README.md // Home page documentation
|
||||
├── requirments.txt // Requirments
|
||||
├── requirements.txt // Requirements
|
||||
├── setup.py // Whl package packaging script
|
||||
├── train.sh // Start training bash script
|
||||
├── train.sh // Start training bash script
|
||||
|
|
(Binary changes: 10 images added under doc/.)
BIN images removed: doc/imgs/10.jpg, doc/imgs/13.png, doc/imgs/15.jpg, doc/imgs/16.png, doc/imgs/17.png, doc/imgs/2.jpg, doc/imgs/22.jpg, doc/imgs/3.jpg, doc/imgs/4.jpg, doc/imgs/5.jpg, doc/imgs/6.jpg, doc/imgs/7.jpg, doc/imgs/8.jpg, doc/imgs/9.jpg
(Further binary image changes without filenames in this excerpt.)
|
@ -47,11 +47,12 @@ class DBLoss(nn.Layer):
|
|||
negative_ratio=ohem_ratio)
|
||||
|
||||
def forward(self, predicts, labels):
|
||||
predict_maps = predicts['maps']
|
||||
label_threshold_map, label_threshold_mask, label_shrink_map, label_shrink_mask = labels[
|
||||
1:]
|
||||
shrink_maps = predicts[:, 0, :, :]
|
||||
threshold_maps = predicts[:, 1, :, :]
|
||||
binary_maps = predicts[:, 2, :, :]
|
||||
shrink_maps = predict_maps[:, 0, :, :]
|
||||
threshold_maps = predict_maps[:, 1, :, :]
|
||||
binary_maps = predict_maps[:, 2, :, :]
|
||||
|
||||
loss_shrink_maps = self.bce_loss(shrink_maps, label_shrink_map,
|
||||
label_shrink_mask)
|
||||
|
|
|
@ -120,9 +120,9 @@ class DBHead(nn.Layer):
|
|||
def forward(self, x):
|
||||
shrink_maps = self.binarize(x)
|
||||
if not self.training:
|
||||
return shrink_maps
|
||||
return {'maps': shrink_maps}
|
||||
|
||||
threshold_maps = self.thresh(x)
|
||||
binary_maps = self.step_function(shrink_maps, threshold_maps)
|
||||
y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
|
||||
return y
|
||||
return {'maps': y}
|
||||
|
|
|
@ -40,7 +40,8 @@ class DBPostProcess(object):
|
|||
self.max_candidates = max_candidates
|
||||
self.unclip_ratio = unclip_ratio
|
||||
self.min_size = 3
|
||||
self.dilation_kernel = None if not use_dilation else np.array([[1, 1], [1, 1]])
|
||||
self.dilation_kernel = None if not use_dilation else np.array(
|
||||
[[1, 1], [1, 1]])
|
||||
|
||||
def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
|
||||
'''
|
||||
|
@ -132,7 +133,8 @@ class DBPostProcess(object):
|
|||
cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
|
||||
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
|
||||
|
||||
def __call__(self, pred, shape_list):
|
||||
def __call__(self, outs_dict, shape_list):
|
||||
pred = outs_dict['maps']
|
||||
if isinstance(pred, paddle.Tensor):
|
||||
pred = pred.numpy()
|
||||
pred = pred[:, 0, :, :]
|
||||
|
|
|
@ -102,7 +102,6 @@ def init_model(config, model, logger, optimizer=None, lr_scheduler=None):
|
|||
best_model_dict = states_dict.get('best_model_dict', {})
|
||||
if 'epoch' in states_dict:
|
||||
best_model_dict['start_epoch'] = states_dict['epoch'] + 1
|
||||
best_model_dict['start_epoch'] = best_model_dict['best_epoch'] + 1
|
||||
|
||||
logger.info("resume from {}".format(checkpoints))
|
||||
elif pretrained_model:
|
||||
|
|
|
@ -65,12 +65,12 @@ class TextDetector(object):
|
|||
postprocess_params["unclip_ratio"] = args.det_db_unclip_ratio
|
||||
postprocess_params["use_dilation"] = True
|
||||
elif self.det_algorithm == "EAST":
|
||||
postprocess_params['name'] = 'EASTPostProcess'
|
||||
postprocess_params['name'] = 'EASTPostProcess'
|
||||
postprocess_params["score_thresh"] = args.det_east_score_thresh
|
||||
postprocess_params["cover_thresh"] = args.det_east_cover_thresh
|
||||
postprocess_params["nms_thresh"] = args.det_east_nms_thresh
|
||||
elif self.det_algorithm == "SAST":
|
||||
postprocess_params['name'] = 'SASTPostProcess'
|
||||
postprocess_params['name'] = 'SASTPostProcess'
|
||||
postprocess_params["score_thresh"] = args.det_sast_score_thresh
|
||||
postprocess_params["nms_thresh"] = args.det_sast_nms_thresh
|
||||
self.det_sast_polygon = args.det_sast_polygon
|
||||
|
@ -177,8 +177,10 @@ class TextDetector(object):
|
|||
preds['f_score'] = outputs[1]
|
||||
preds['f_tco'] = outputs[2]
|
||||
preds['f_tvo'] = outputs[3]
|
||||
elif self.det_algorithm == 'DB':
|
||||
preds['maps'] = outputs[0]
|
||||
else:
|
||||
preds = outputs[0]
|
||||
raise NotImplementedError
|
||||
|
||||
post_result = self.postprocess_op(preds, shape_list)
|
||||
dt_boxes = post_result[0]['points']
|
||||
|
|