Merge remote-tracking branch 'upstream/dygraph' into dy3

This commit is contained in:
Leif 2020-12-17 15:18:20 +08:00
commit 246a0bce7d
51 changed files with 155 additions and 79 deletions

README_ch.md Normal file → Executable file

@ -9,7 +9,7 @@ PaddleOCR supports both dynamic graph and static graph programming paradigms
**Recent updates**
- 2020.12.15 Added the data synthesis tool [Style-Text](./StyleText/README_ch.md), which can batch-synthesize large numbers of images similar to a target scene; validated in multiple scenarios with clear accuracy gains.
- 2020.12.07 [FAQ](./doc/doc_ch/FAQ.md) added 5 frequently asked questions (124 in total), with updates planned every Monday; stay tuned.
- 2020.12.14 [FAQ](./doc/doc_ch/FAQ.md) added 5 frequently asked questions (127 in total), updated every Monday; stay tuned.
- 2020.11.25 Updated the semi-automatic annotation tool [PPOCRLabel](./PPOCRLabel/README_ch.md), which helps developers complete annotation efficiently; its output format connects directly to PP-OCR training.
- 2020.9.22 Published the PP-OCR technical report: https://arxiv.org/abs/2009.09941
- [More](./doc/doc_ch/update.md)
@ -39,6 +39,14 @@ PaddleOCR supports both dynamic graph and static graph programming paradigms
The image above shows the output of the general ppocr_server model; more examples are on the [visualization page](./doc/doc_ch/visualization.md).
<a name="欢迎加入PaddleOCR技术交流群"></a>
## Join the PaddleOCR technical discussion group
- Scan the QR code with WeChat to join the official group for faster answers to your questions and discussion with developers from many industries. We look forward to having you.
<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>
## Quick start
- Online demo of the ultra-lightweight Chinese OCR model (PC): https://www.paddlepaddle.org.cn/hub/scene/ocr
@ -121,7 +129,7 @@ PP-OCR is a practical ultra-lightweight OCR system, mainly composed of DB text detection, detection box
- English model
<div align="center">
<img src="./doc/imgs_results/img_12.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
</div>
- Models for other languages
@ -130,13 +138,6 @@ PP-OCR is a practical ultra-lightweight OCR system, mainly composed of DB text detection, detection box
<img src="./doc/imgs_results/korean.jpg" width="800">
</div>
<a name="欢迎加入PaddleOCR技术交流群"></a>
## Join the PaddleOCR technical discussion group
Please scan the QR code below and fill out the questionnaire to get the group QR code and OCR tuning tips
<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>
<a name="许可证书"></a>
## License


@ -72,7 +72,10 @@ fusion_generator:
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
```
* Note: The language option corresponds to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
* Note 1: The language option corresponds to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean.
* Note 2: Style-Text is mainly used to generate images for OCR recognition models.
So the height of style images should be around 32 pixels. Images of other sizes may behave poorly.
For example, take the following image and the corpus `PaddleOCR` as input:
@ -116,9 +119,17 @@ In actual application scenarios, it is often necessary to synthesize pictures in
* `CorpusGenerator`
* `method`: Method of CorpusGenerator; supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used, no other configuration is needed; otherwise you need to set `corpus_file` and `language`.
* `language`: Language of the corpus.
* `corpus_file`: Filepath of the corpus.
* `corpus_file`: Filepath of the corpus. The corpus file should be a plain-text file that will be split on line endings ('\n'); the corpus generator samples one line each time.
Example of corpus file:
```
PaddleOCR
飞桨文字识别
StyleText
风格文本图像数据合成
```
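As a rough illustration, this line-based sampling can be sketched in Python (a hypothetical helper, not the actual StyleText code):

```python
import random

def sample_corpus_line(corpus_path):
    # Split the corpus file on '\n' and draw one non-empty line,
    # mirroring how a FileCorpus-style generator samples text each time.
    with open(corpus_path, encoding="utf-8") as f:
        lines = [line.strip() for line in f.read().split("\n") if line.strip()]
    return random.choice(lines)
```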
We provide a general dataset containing Chinese, English and Korean (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)), some examples are given below :
<div align="center">
@ -130,7 +141,18 @@ We provide a general dataset containing Chinese, English and Korean (50,000 imag
``` bash
python -m tools.synth_dataset -c configs/dataset_config.yml
```
We also provide example corpus and images in `examples` folder.
<div align="center">
<img src="examples/style_images/1.jpg" width="300">
<img src="examples/style_images/2.jpg" width="300">
</div>
If you run the code above directly, you will get example output data in the `output_data` folder.
You will get synthesis images and labels as below:
<div align="center">
<img src="doc/images/12.png" width="800">
</div>
There will be some cache files under the `label` folder. If the program exits unexpectedly, you can find cached labels there.
When the program finishes normally, you will find all the labels in `label.txt`, which gives the final results.
<a name="Applications"></a>
### Applications


@ -63,7 +63,10 @@ fusion_generator:
```bash
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
```
* Note: The language option corresponds to the corpus; currently the tool supports only English, Simplified Chinese, and Korean.
* Note 1: The language option corresponds to the corpus; currently the tool supports only English, Simplified Chinese, and Korean.
* Note 2: Data generated by Style-Text is mainly used for OCR recognition scenarios. Given the design of the current PaddleOCR recognition models, we mainly support style images with a height of around 32 pixels.
If the input image size differs too much from this, the results may be poor.
For example, given the following image and the corpus "PaddleOCR":
@ -102,7 +105,16 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_
* `CorpusGenerator`
* `method`: corpus generation method; currently `FileCorpus` and `EnNumCorpus` are available. If `EnNumCorpus` is used, no other fields are needed; otherwise `corpus_file` and `language` must be set;
* `language`: language of the corpus;
* `corpus_file`: path to the corpus file.
* `corpus_file`: path to the corpus file. The corpus file should be a plain-text file; the corpus generator first splits it by line and then randomly samples one line each time.
Example corpus file format:
```
PaddleOCR
飞桨文字识别
StyleText
风格文本图像数据合成
...
```
Style-Text also provides a general-purpose dataset of 50,000 Chinese, English, and Korean images for use as style images, making it easy to synthesize text images across rich scenes; some examples are shown below.
@ -117,6 +129,19 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_
``` bash
python -m tools.synth_dataset -c configs/dataset_config.yml
```
Sample images and corpora are provided in the `examples` directory.
<div align="center">
<img src="examples/style_images/1.jpg" width="300">
<img src="examples/style_images/2.jpg" width="300">
</div>
Running the command above directly produces sample output in `output_data`, including images and the label files used to train recognition models:
<div align="center">
<img src="doc/images/12.png" width="800">
</div>
The label files under the `label` directory are caches produced while the program runs; if the program terminates unexpectedly, the cached labels can be used.
If the program finishes normally, `label.txt` is generated under `output_data` as the final annotation result.
<a name="应用案例"></a>
### 4. Applications


@ -33,7 +33,7 @@ Predictor:
- 0.5
expand_result: false
bg_generator:
pretrain: models/style_text_rec/bg_generator
pretrain: style_text_models/bg_generator
module_name: bg_generator
generator_type: BgGeneratorWithMask
encode_dim: 64
@ -43,7 +43,7 @@ Predictor:
conv_block_dilation: true
output_factor: 1.05
text_generator:
pretrain: models/style_text_rec/text_generator
pretrain: style_text_models/text_generator
module_name: text_generator
generator_type: TextGenerator
encode_dim: 64
@ -52,7 +52,7 @@ Predictor:
conv_block_dropout: false
conv_block_dilation: true
fusion_generator:
pretrain: models/style_text_rec/fusion_generator
pretrain: style_text_models/fusion_generator
module_name: fusion_generator
generator_type: FusionGeneratorSimple
encode_dim: 64

BIN StyleText/doc/images/12.png Normal file (added, 148 KiB)


@ -1,2 +1,2 @@
PaddleOCR
Paddle
飞桨文字识别


@ -2,11 +2,11 @@ Global:
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
print_batch_step: 10
save_model_dir: ./output/db_mv3/
save_epoch_step: 1200
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [4000, 5000]
# evaluation is run every 2000 iterations
eval_batch_step: [0, 2000]
# if pretrained_model is saved in static mode, load_static_weights must set to True
load_static_weights: True
cal_metric_during_train: False
@ -39,7 +39,7 @@ Loss:
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
name: Adam
beta1: 0.9
@ -100,7 +100,7 @@ Train:
loader:
shuffle: True
drop_last: False
batch_size_per_card: 4
batch_size_per_card: 16
num_workers: 8
Eval:
@ -128,4 +128,4 @@ Eval:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 2
num_workers: 8


@ -5,8 +5,8 @@ Global:
print_batch_step: 10
save_model_dir: ./output/det_r50_vd/
save_epoch_step: 1200
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [5000,4000]
# evaluation is run every 2000 iterations
eval_batch_step: [0,2000]
# if pretrained_model is saved in static mode, load_static_weights must set to True
load_static_weights: True
cal_metric_during_train: False


@ -60,7 +60,8 @@ Metric:
Train:
dataset:
name: SimpleDataSet
label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
data_dir: ./train_data/
label_file_list: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
data_ratio_list: [0.5, 0.5]
transforms:
- DecodeImage: # load image


@ -103,17 +103,17 @@ make inference_lib_dist
For more compilation options, refer to the official documentation of the Paddle C++ inference library: [https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
* After compilation, the following files and folders are generated under `build/fluid_inference_install_dir/`.
* After compilation, the following files and folders are generated under `build/paddle_inference_install_dir/`.
```
build/fluid_inference_install_dir/
build/paddle_inference_install_dir/
|-- CMakeCache.txt
|-- paddle
|-- third_party
|-- version.txt
```
Here `paddle` is the Paddle library needed later for C++ inference, and `version.txt` contains version information of the inference library.
Here `paddle` is the Paddle library needed for C++ inference, and `version.txt` contains version information of the inference library.
#### 1.2.2 Direct download and installation


@ -11,10 +11,15 @@ max_side_len 960
det_db_thresh 0.3
det_db_box_thresh 0.5
det_db_unclip_ratio 2.0
det_model_dir ./inference/det_db
det_model_dir ./inference/ch_ppocr_mobile_v2.0_det_infer/
# cls config
use_angle_cls 0
cls_model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer/
cls_thresh 0.9
# rec config
rec_model_dir ./inference/rec_crnn
rec_model_dir ./inference/ch_ppocr_mobile_v2.0_rec_infer/
char_list_file ../../ppocr/utils/ppocr_keys_v1.txt
# show the detection results

doc/doc_ch/FAQ.md Normal file → Executable file

@ -9,44 +9,42 @@
## PaddleOCR FAQ (continuously updated)
* [Recent updates (2020.12.07)](#近期更新)
* [Recent updates (2020.12.14)](#近期更新)
* [[Selected] 10 selected OCR questions](#OCR精选10个问题)
* [[Theory] 30 general OCR questions](#OCR通用问题)
* [Basics: 7 questions](#基础知识)
* [Datasets: 7 questions](#数据集2)
* [Model training and tuning: 7 questions](#模型训练调优2)
* [Inference and deployment: 9 questions](#预测部署2)
* [[Practice] 84 practical PaddleOCR questions](#PaddleOCR实战问题)
* [Usage: 20 questions](#使用咨询)
* [[Practice] 87 practical PaddleOCR questions](#PaddleOCR实战问题)
* [Usage: 21 questions](#使用咨询)
* [Datasets: 17 questions](#数据集3)
* [Model training and tuning: 24 questions](#模型训练调优3)
* [Inference and deployment: 23 questions](#预测部署3)
* [Model training and tuning: 25 questions](#模型训练调优3)
* [Inference and deployment: 24 questions](#预测部署3)
<a name="近期更新"></a>
## Recent updates (2020.12.07)
## Recent updates (2020.12.14)
#### Q2.4.9: For curved text, have you tried OpenCV's TPS for rectification?
#### Q3.1.21: Does PaddleOCR support dynamic graph mode?
**A**: OpenCV's TPS requires annotated points on the upper and lower text boundaries, which are hard to obtain with either traditional or deep learning methods. The TPS module in PaddleOCR's STAR-Net learns the control points and rectifies automatically; you can try it directly!
**A**: The dynamic graph version is under intensive development and will be released on December 16, 2020. Stay tuned!
#### Q3.3.20: How can I add blur data augmentation for text detection?
#### Q3.3.23: An elementwise_add error occurs during detection model training or inference
**A**: Blur augmentation requires a code change. Taking DB as an example, refer to [Normalize](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/operators.py#L60) and add the blur augmentation there.
**A**: The configured input size must be a multiple of 32; otherwise, after the network's repeated down-sampling and up-sampling, the feature maps differ by one pixel, which triggers a shape-mismatch error in elementwise_add.
#### Q3.3.21: How can I change the image rotation angle in text detection to arbitrary 360-degree rotation?
#### Q3.3.24: Can the DB detection training input size (640) be increased?
**A**: Change (-10,10) [here](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/iaa_augment.py#L64) to (-180,180).
**A**: Not recommended. The detection training input size is the size after random crop in preprocessing, not a direct resize of the original image; in most scenarios it is not small. Enlarging it may not help and slows training. In addition, some parameters in the code may be adapted to the preset input size, so enlarging it carries hidden risks.
#### Q3.3.22: How do I modify the shape when the training data has a large aspect ratio?
#### Q3.3.25: During recognition model training, the loss decreases normally but acc stays at 0
**A**: For recognition, modify [here](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yaml#L75);
for detection, modify [here](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml#L85).
**A**: It is normal for acc to be 0 in the early stage of recognition training; the metric rises after more training.
#### Q3.4.24: DB models run inference correctly, but EAST or SAST models error out or give wrong results
#### Q3.4.23: After installing paddleocr, it reports that paddle is missing
**A**: This is because the paddlepaddle GPU and CPU packages have different names; installation notes have been added to the [whl documentation](./whl.md).
**A**: When running inference with an EAST or SAST model, you need to specify --det_algorithm="EAST" or --det_algorithm="SAST" on the command line; DB does not need it because "DB" is the parameter's default value: https://github.com/PaddlePaddle/PaddleOCR/blob/e7a708e9fdaf413ed7a14da8e4a7b4ac0b211e42/tools/infer/utility.py#L43
<a name="OCR精选10个问题"></a>
## [Selected] 10 selected OCR questions
@ -390,6 +388,10 @@
**A**: PaddleOCR focuses on general OCR. For vertical-domain needs, you can train your own model with PaddleOCR plus domain data;
if you lack labeled data or do not want to invest in development, consider calling open APIs, which cover the more common vertical domains.
#### Q3.1.21: Does PaddleOCR support dynamic graph mode?
**A**: The dynamic graph version is under intensive development and will be released on December 16, 2020. Stay tuned!
<a name="数据集3"></a>
### Datasets
@ -603,6 +605,18 @@ ps -axu | grep train.py | awk '{print $2}' | xargs kill -9
**A**: For recognition, modify [here](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yaml#L75);
for detection, modify [here](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml#L85).
#### Q3.3.23: An elementwise_add error occurs during detection model training or inference
**A**: The configured input size must be a multiple of 32; otherwise, after the network's repeated down-sampling and up-sampling, the feature maps differ by one pixel, which triggers a shape-mismatch error in elementwise_add.
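One way to satisfy this multiple-of-32 constraint is to zero-pad the input image; a minimal NumPy sketch (illustrative only, not PaddleOCR's actual preprocessing):

```python
import numpy as np

def pad_to_multiple_of_32(img):
    # DB-style detectors repeatedly down- and up-sample, so H and W must be
    # multiples of 32, or skip-connection feature maps end up one pixel off,
    # producing the elementwise_add shape-mismatch error.
    h, w = img.shape[:2]
    new_h = ((h + 31) // 32) * 32
    new_w = ((w + 31) // 32) * 32
    padded = np.zeros((new_h, new_w) + img.shape[2:], dtype=img.dtype)
    padded[:h, :w] = img
    return padded
```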
#### Q3.3.24: Can the DB detection training input size (640) be increased?
**A**: Not recommended. The detection training input size is the size after random crop in preprocessing, not a direct resize of the original image; in most scenarios it is not small. Enlarging it may not help and slows training. In addition, some parameters in the code may be adapted to the preset input size, so enlarging it carries hidden risks.
#### Q3.3.25: During recognition model training, the loss decreases normally but acc stays at 0
**A**: It is normal for acc to be 0 in the early stage of recognition training; the metric rises after more training.
<a name="预测部署3"></a>
### Inference and deployment
@ -710,4 +724,8 @@ ps -axu | grep train.py | awk '{print $2}' | xargs kill -9
#### Q3.4.23: After installing paddleocr, it reports that paddle is missing
**A**: This is because the paddlepaddle GPU and CPU packages have different names; installation notes have been added to the [whl documentation](./whl.md).
**A**: This is because the paddlepaddle GPU and CPU packages have different names; installation notes have been added to the [whl documentation](./whl.md).
#### Q3.4.24: DB models run inference correctly, but EAST or SAST models error out or give wrong results
**A**: When running inference with an EAST or SAST model, you need to specify --det_algorithm="EAST" or --det_algorithm="SAST" on the command line; DB does not need it because "DB" is the parameter's default value: https://github.com/PaddlePaddle/PaddleOCR/blob/e7a708e9fdaf413ed7a14da8e4a7b4ac0b211e42/tools/infer/utility.py#L43
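The default described here can be illustrated with a minimal argparse sketch (a simplified stand-in for the flag definition in tools/infer/utility.py, not the actual code):

```python
import argparse

# --det_algorithm defaults to "DB", so DB inference can omit the flag;
# EAST and SAST must be selected explicitly on the command line.
parser = argparse.ArgumentParser()
parser.add_argument("--det_algorithm", type=str, default="DB")

default_args = parser.parse_args([])                        # no flag given
east_args = parser.parse_args(["--det_algorithm", "EAST"])  # explicit EAST
```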


@ -131,12 +131,12 @@ python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.pretrained_mo
# Download the ultra-lightweight Chinese detection model:
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
tar xf ch_ppocr_mobile_v2.0_det_infer.tar
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/"
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/"
```
The visualized text detection results are saved to the `./inference_results` folder by default, and result file names are prefixed with 'det_res'. Example result:
![](../imgs_results/det_res_22.jpg)
![](../imgs_results/det_res_00018069.jpg)
The parameters `limit_type` and `det_limit_side_len` limit the size of the input image;
the optional values of `limit_type` are [`max`, `min`].


@ -58,7 +58,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
**4. Install third-party libraries**
```
cd PaddleOCR
pip3 install -r requirments.txt
pip3 install -r requirements.txt
```
Note: On Windows, it is recommended to download the shapely package from [here](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely) to complete the installation.


@ -115,7 +115,7 @@ PaddleOCR
│ │ │ ├── text_image_aug // tia data augmentation for text recognition
│ │ │ │ ├── __init__.py
│ │ │ │ ├── augment.py // code for tia_distort, tia_stretch, and tia_perspective
│ │ │ │ ├── warp_mls.py
│ │ │ │ ├── warp_mls.py
│ │ │ ├── __init__.py
│ │ │ ├── east_process.py // data processing steps of the EAST algorithm
│ │ │ ├── make_border_map.py // generate border maps
@ -167,7 +167,7 @@ PaddleOCR
│ │ │ ├── det_east_head.py // EAST detection head
│ │ │ ├── det_sast_head.py // SAST detection head
│ │ │ ├── rec_ctc_head.py // CTC recognition head
│ │ │ ├── rec_att_head.py // attention recognition head
│ │ │ ├── rec_att_head.py // attention recognition head
│ │ ├── transforms // image transforms
│ │ │ ├── __init__.py // transform construction code
│ │ │ └── tps.py // TPS transform
@ -185,7 +185,7 @@ PaddleOCR
│ │ └── sast_postprocess.py // SAST post-processing
│ └── utils // utilities
│ ├── dict // minor-language dictionaries
│ ....
│ ....
│ ├── ic15_dict.txt // English and digit dictionary, case-sensitive
│ ├── ppocr_keys_v1.txt // Chinese dictionary, used for training Chinese models
│ ├── logging.py // logger
@ -207,10 +207,10 @@ PaddleOCR
│ ├── program.py // overall pipeline
│ ├── test_hubserving.py
│ └── train.py // start training
├── paddleocr.py
├── paddleocr.py
├── README_ch.md // Chinese documentation
├── README_en.md // English documentation
├── README.md // home page documentation
├── requirments.txt // dependencies
├── requirements.txt // dependencies
├── setup.py // whl packaging script
├── train.sh // training launch script
├── train.sh // training launch script


@ -138,12 +138,12 @@ For lightweight Chinese detection model inference, you can execute the following
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# predict
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/"
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/"
```
The visual text detection results are saved to the `./inference_results` folder by default, and the names of the result files are prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_22.jpg)
![](../imgs_results/det_res_00018069.jpg)
You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image;
the optional values of `limit_type` are [`max`, `min`], and
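A simplified sketch of how such a side-length limit can be applied (an illustration of the rule only; PaddleOCR's actual preprocessing differs in details):

```python
def get_resize_shape(h, w, limit_type="max", limit_side_len=960):
    # "max": shrink so the longer side does not exceed limit_side_len;
    # "min": enlarge so the shorter side reaches at least limit_side_len.
    if limit_type == "max":
        ratio = min(1.0, limit_side_len / max(h, w))
    else:
        ratio = max(1.0, limit_side_len / min(h, w))
    # round each side to a multiple of 32 for the detection backbone
    new_h = max(32, int(round(h * ratio / 32)) * 32)
    new_w = max(32, int(round(w * ratio / 32)) * 32)
    return new_h, new_w
```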


@ -61,7 +61,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
**4. Install third-party libraries**
```
cd PaddleOCR
pip3 install -r requirments.txt
pip3 install -r requirements.txt
```
If you get the error `OSError: [WinError 126] The specified module could not be found` when you install shapely on Windows.


@ -116,7 +116,7 @@ PaddleOCR
│ │ │ ├── text_image_aug // Tia data augment for text recognition
│ │ │ │ ├── __init__.py
│ │ │ │ ├── augment.py // Tia_distort,tia_stretch and tia_perspective
│ │ │ │ ├── warp_mls.py
│ │ │ │ ├── warp_mls.py
│ │ │ ├── __init__.py
│ │ │ ├── east_process.py // Data processing steps of EAST algorithm
│ │ │ ├── iaa_augment.py // Data augmentation operations
@ -188,7 +188,7 @@ PaddleOCR
│ │ └── sast_postprocess.py // SAST post-processing
│ └── utils // utils
│ ├── dict // Minor language dictionary
│ ....
│ ....
│ ├── ic15_dict.txt // English number dictionary, case sensitive
│ ├── ppocr_keys_v1.txt // Chinese dictionary for training Chinese models
│ ├── logging.py // logger
@ -210,10 +210,10 @@ PaddleOCR
│ ├── program.py // Inference system
│ ├── test_hubserving.py
│ └── train.py // Start training script
├── paddleocr.py
├── paddleocr.py
├── README_ch.md // Chinese documentation
├── README_en.md // English documentation
├── README.md // Home page documentation
├── requirments.txt // Requirments
├── requirements.txt // Requirements
├── setup.py // Whl package packaging script
├── train.sh // Start training bash script
├── train.sh // Start training bash script

BIN doc/imgs/00006737.jpg Executable file (added, 126 KiB)
BIN doc/imgs/00009282.jpg Executable file (added, 42 KiB)
BIN doc/imgs/00015504.jpg Executable file (added, 89 KiB)
BIN doc/imgs/00018069.jpg Executable file (added, 67 KiB)
BIN doc/imgs/00056221.jpg Executable file (added, 100 KiB)
BIN doc/imgs/00057937.jpg Executable file (added, 150 KiB)
BIN doc/imgs/00059985.jpg Executable file (added, 54 KiB)
BIN doc/imgs/00077949.jpg Executable file (added, 122 KiB)
BIN doc/imgs/00111002.jpg Executable file (added, 98 KiB)
BIN doc/imgs/00207393.jpg Executable file (added, 49 KiB)


@ -47,11 +47,12 @@ class DBLoss(nn.Layer):
negative_ratio=ohem_ratio)
def forward(self, predicts, labels):
predict_maps = predicts['maps']
label_threshold_map, label_threshold_mask, label_shrink_map, label_shrink_mask = labels[
1:]
shrink_maps = predicts[:, 0, :, :]
threshold_maps = predicts[:, 1, :, :]
binary_maps = predicts[:, 2, :, :]
shrink_maps = predict_maps[:, 0, :, :]
threshold_maps = predict_maps[:, 1, :, :]
binary_maps = predict_maps[:, 2, :, :]
loss_shrink_maps = self.bce_loss(shrink_maps, label_shrink_map,
label_shrink_mask)
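The new contract in the diff above (the head emits a dict keyed by 'maps'; the loss slices its channels) can be sketched with a NumPy stand-in (illustrative only, not the Paddle code):

```python
import numpy as np

def split_db_maps(predicts):
    # The DB head now returns {'maps': tensor}; during training the tensor
    # stacks shrink, threshold, and binary maps on channels 0, 1, and 2.
    maps = predicts['maps']                  # shape: (N, 3, H, W)
    shrink_maps = maps[:, 0, :, :]
    threshold_maps = maps[:, 1, :, :]
    binary_maps = maps[:, 2, :, :]
    return shrink_maps, threshold_maps, binary_maps
```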


@ -120,9 +120,9 @@ class DBHead(nn.Layer):
def forward(self, x):
shrink_maps = self.binarize(x)
if not self.training:
return shrink_maps
return {'maps': shrink_maps}
threshold_maps = self.thresh(x)
binary_maps = self.step_function(shrink_maps, threshold_maps)
y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
return y
return {'maps': y}


@ -40,7 +40,8 @@ class DBPostProcess(object):
self.max_candidates = max_candidates
self.unclip_ratio = unclip_ratio
self.min_size = 3
self.dilation_kernel = None if not use_dilation else np.array([[1, 1], [1, 1]])
self.dilation_kernel = None if not use_dilation else np.array(
[[1, 1], [1, 1]])
def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
'''
@ -132,7 +133,8 @@ class DBPostProcess(object):
cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
def __call__(self, pred, shape_list):
def __call__(self, outs_dict, shape_list):
pred = outs_dict['maps']
if isinstance(pred, paddle.Tensor):
pred = pred.numpy()
pred = pred[:, 0, :, :]


@ -102,7 +102,6 @@ def init_model(config, model, logger, optimizer=None, lr_scheduler=None):
best_model_dict = states_dict.get('best_model_dict', {})
if 'epoch' in states_dict:
best_model_dict['start_epoch'] = states_dict['epoch'] + 1
best_model_dict['start_epoch'] = best_model_dict['best_epoch'] + 1
logger.info("resume from {}".format(checkpoints))
elif pretrained_model:


@ -65,12 +65,12 @@ class TextDetector(object):
postprocess_params["unclip_ratio"] = args.det_db_unclip_ratio
postprocess_params["use_dilation"] = True
elif self.det_algorithm == "EAST":
postprocess_params['name'] = 'EASTPostProcess'
postprocess_params['name'] = 'EASTPostProcess'
postprocess_params["score_thresh"] = args.det_east_score_thresh
postprocess_params["cover_thresh"] = args.det_east_cover_thresh
postprocess_params["nms_thresh"] = args.det_east_nms_thresh
elif self.det_algorithm == "SAST":
postprocess_params['name'] = 'SASTPostProcess'
postprocess_params['name'] = 'SASTPostProcess'
postprocess_params["score_thresh"] = args.det_sast_score_thresh
postprocess_params["nms_thresh"] = args.det_sast_nms_thresh
self.det_sast_polygon = args.det_sast_polygon
@ -177,8 +177,10 @@ class TextDetector(object):
preds['f_score'] = outputs[1]
preds['f_tco'] = outputs[2]
preds['f_tvo'] = outputs[3]
elif self.det_algorithm == 'DB':
preds['maps'] = outputs[0]
else:
preds = outputs[0]
raise NotImplementedError
post_result = self.postprocess_op(preds, shape_list)
dt_boxes = post_result[0]['points']