update latest docs 09.06.20

This commit is contained in:
parent 406463efb6
commit dc66d9a0ec
@ -3,7 +3,7 @@ PaddleOCR aims to create rich, leading, and practical OCR tools that help user

**Recent updates**

- 2020.6.8 Add [dataset](./doc/datasets.md) and keep updating
- 2020.6.5 Add `attention` model in `inference_model`
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience
- 2020.5.30 Support model prediction and training on Windows
@ -0,0 +1,45 @@

## FAQ

1. **Prediction error: got an unexpected keyword argument 'gradient_clip'**

   The installed version of PaddlePaddle is incorrect. Currently, this project only supports Paddle 1.7; adaptation to 1.8 is planned for the near future.

2. **Error when converting the attention recognition model: KeyError: 'predict'**

   Inference for the recognition model based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, the attention-based recognition model has also proved less effective than the CTC-based one.

3. **About inference speed**

   When a picture contains a lot of text, the prediction time increases. You can use `--rec_batch_num` to set a smaller prediction batch size. The default value is 30, which can be lowered to 10 or another value.
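   As a rough illustration of why `--rec_batch_num` matters: the detected text boxes are recognized in batches, so the number of recognition passes is the ceiling of the box count divided by the batch size. The helper below is a hypothetical sketch of that arithmetic, not part of PaddleOCR:

   ```python
   def num_rec_batches(num_boxes: int, rec_batch_num: int = 30) -> int:
       """Number of recognition forward passes needed when num_boxes detected
       text boxes are grouped into batches of rec_batch_num."""
       # Ceiling division without importing math.
       return -(-num_boxes // rec_batch_num)

   # A page with 45 detected boxes: 2 passes at the default batch size,
   # 5 passes when --rec_batch_num is lowered to 10.
   print(num_rec_batches(45))      # 2
   print(num_rec_batches(45, 10))  # 5
   ```

   A smaller batch reduces peak memory per pass at the cost of more passes.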
4. **Service deployment and mobile deployment**

   Service deployment based on Serving and mobile deployment based on Paddle Lite are expected to be released successively in mid-to-late June. Stay tuned for more updates.

5. **Release time of self-developed algorithms**

   Baidu self-developed algorithms such as SAST, SRN, and end2end PSL will be released in June or July. Please be patient.

6. **How to run on Windows or Mac?**

   PaddleOCR has completed its adaptation to Windows and macOS. Two points should be noted:

   1. In [Quick installation](installation.md), if you do not want to install Docker, you can skip the first step and start with the second.

   2. When downloading the inference models, if `wget` is not installed, you can click the model link directly or copy the link address into a browser to download, then extract the model and place it in the corresponding directory.

7. **The difference between the ultra-lightweight model and the general OCR model**

   At present, PaddleOCR has open-sourced two Chinese models: the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. A comparison of the two:

   - Similarities: both use the same **algorithm** and **training data**.
   - Differences: they differ in **backbone network** and **channel parameters**. The ultra-lightweight model uses MobileNetV3 as the backbone network, while the general model uses ResNet50_vd as the detection backbone and ResNet34_vd as the recognition backbone. You can compare the two training configuration files to see the parameter differences.

   | Model | Backbone | Detection configuration file | Recognition configuration file |
   |-|-|-|-|
   | 8.6M ultra-lightweight Chinese OCR model | MobileNetV3 + MobileNetV3 | det_mv3_db.yml | rec_chinese_lite_train.yml |
   | General Chinese OCR model | ResNet50_vd + ResNet34_vd | det_r50_vd_db.yml | rec_chinese_common_train.yml |

8. **Is there a plan to open-source a model that recognizes only numbers, or only English + numbers?**

   There is no plan to open-source number-only, number + English, or other vertical text models. PaddleOCR has open-sourced a variety of detection and recognition algorithms for customized training, and the two Chinese models are themselves training outputs of this open-source algorithm library. You can prepare data according to the tutorial, choose an appropriate configuration file, and train your own model; we believe you can get good results. If you have any questions during training, you are welcome to open issues or ask in the communication group, and we will answer them in time.

9. **What training data is used by the open-source models? Can it be open-sourced?**

   At present, the open-source models, datasets, and data magnitudes are as follows:

   - Detection:

     English dataset: ICDAR2015

     Chinese dataset: LSVT street view dataset with 30k images

   - Recognition:

     English dataset: MJSynth and SynthText synthetic datasets, with tens of millions of samples.

     Chinese dataset: the LSVT street view dataset with cropped text areas, 300k images in total. In addition, 5 million samples were synthesized based on the LSVT corpus.

   Among them, the public datasets are open-sourced; users can search for and download them themselves, or refer to [Chinese datasets](datasets.md). The synthetic data is not open-sourced; users can synthesize data themselves with open-source synthesis tools. Currently available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.
@ -0,0 +1,49 @@

# Optional parameter list

The following list can be viewed via `--help`:

| FLAG | Supported scripts | Purpose | Default | Note |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | Specify the configuration file | None | **For the configuration modules, see the parameter introduction below** |
| -o | ALL | Override parameters in the configuration file | None | Options set with -o take higher priority than the configuration file selected with -c. Example: `-o Global.use_gpu=false` |
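To illustrate the precedence rule, the hypothetical helper below shows how a single `-o Section.key=value` override replaces the value loaded from the `-c` file. The function name and dict layout are illustrative only, not PaddleOCR's actual implementation:

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one '-o Section.key=value' override onto a loaded config dict."""
    key, _, value = override.partition("=")
    node = config
    parts = key.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    # Mimic minimal YAML-like scalar parsing for booleans.
    node[parts[-1]] = {"true": True, "false": False}.get(value, value)
    return config

# Values loaded from the -c file...
config = {"Global": {"use_gpu": True, "epoch_num": 3000}}
# ...are overridden by -o Global.use_gpu=false; untouched keys keep their values.
apply_override(config, "Global.use_gpu=false")
print(config["Global"]["use_gpu"])  # False
```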
## Global parameters in the configuration file

Take `rec_chinese_lite_train.yml` as an example:

| Field | Purpose | Default | Note |
| :----------------------: | :---------------------: | :--------------: | :--------------------: |
| algorithm | Select the algorithm | Synchronized with the configuration file | Choose a model; for supported models see the [introduction](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README.md) |
| use_gpu | Set the device the code runs on | true | \ |
| epoch_num | Maximum number of training epochs | 3000 | \ |
| log_smooth_window | Sliding window size for log smoothing | 20 | \ |
| print_batch_step | Interval (in steps) for printing logs | 10 | \ |
| save_model_dir | Path for saving models | output/{algorithm name} | \ |
| save_epoch_step | Interval (in epochs) for saving models | 3 | \ |
| eval_batch_step | Interval (in steps) for model evaluation | 2000 | \ |
| train_batch_size_per_card | Per-card batch size during training | 256 | \ |
| test_batch_size_per_card | Per-card batch size during evaluation | 256 | \ |
| image_shape | Input image shape | [3, 32, 100] | \ |
| max_text_length | Maximum text length | 25 | \ |
| character_type | Character type | ch | en/ch; en uses the default dict, ch uses a custom dict |
| character_dict_path | Path to the character dictionary | ./ppocr/utils/ic15_dict.txt | \ |
| loss_type | Loss type | ctc | Two losses are supported: ctc / attention |
| reader_yml | Reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | Path to pretrained weights to load | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Path to model parameters to load | None | Used to resume training after an interruption |
| save_inference_dir | Path for saving the inference model | None | Used to save the inference model |
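Assembled from the defaults listed in the table above, a `Global` block might look as follows. This is a hedged sketch only; the actual `rec_chinese_lite_train.yml` in the repository is authoritative and differs in some values (e.g. the Chinese model uses its own dictionary and reader file):

```yaml
Global:
  algorithm: CRNN
  use_gpu: true
  epoch_num: 3000
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: output/rec_CRNN
  save_epoch_step: 3
  eval_batch_step: 2000
  train_batch_size_per_card: 256
  test_batch_size_per_card: 256
  image_shape: [3, 32, 100]
  max_text_length: 25
  character_type: ch
  character_dict_path: ./ppocr/utils/ic15_dict.txt
  loss_type: ctc
  reader_yml: ./configs/rec/rec_icdar15_reader.yml
  pretrain_weights: ./pretrain_models/CRNN/best_accuracy
  checkpoints:
  save_inference_dir:
```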
## Reader parameters in the configuration file

Take `rec_chinese_reader.yml` as an example:

| Field | Purpose | Default | Note |
| :----------------------: | :---------------------: | :--------------: | :--------------------: |
| reader_function | Data reading method | ppocr.data.rec.dataset_traversal,SimpleReader | Two readers are supported: SimpleReader / LMDBReader |
| num_workers | Number of data-reading threads | 8 | \ |
| img_set_dir | Dataset path | ./train_data | \ |
| label_file_path | Data label path | ./train_data/rec_gt_train.txt | \ |
| infer_img | Prediction image folder path | ./infer_img | \ |
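Putting the table's defaults together, a reader configuration fragment might look like the sketch below. The `TrainReader` section name is an assumption for illustration; check the actual `rec_chinese_reader.yml` in the repository for the exact layout:

```yaml
TrainReader:
  reader_function: ppocr.data.rec.dataset_traversal,SimpleReader
  num_workers: 8
  img_set_dir: ./train_data
  label_file_path: ./train_data/rec_gt_train.txt
```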
@ -0,0 +1,30 @@

# How to build a custom ultra-lightweight model?

Building a custom ultra-lightweight model takes three steps: train a text detection model, train a text recognition model, and chain the models for prediction.

## Step 1: train a text detection model

PaddleOCR provides two text detection algorithms, EAST and DB, both supporting the MobileNetV3 and ResNet50_vd backbone networks. Choose the corresponding configuration file as needed and start training. For example, to train a DB detection model with MobileNetV3 as the backbone (the configuration used by the ultra-lightweight model):

```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```

For more detailed data preparation and training instructions, refer to [Text detection model training/evaluation/prediction](./detection.md) in the documentation tutorials.

## Step 2: train a text recognition model

PaddleOCR provides four text recognition algorithms, CRNN, Rosetta, STAR-Net, and RARE, all supporting the MobileNetV3 and ResNet34_vd backbone networks. Choose the corresponding configuration file as needed and start training. For example, to train a CRNN recognition model with MobileNetV3 as the backbone (the configuration used by the ultra-lightweight model):

```
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
```

For more detailed data preparation and training instructions, refer to [Text recognition model training/evaluation/prediction](./recognition.md) in the documentation tutorials.

## Step 3: chain the models for prediction

PaddleOCR provides a tool for chaining detection and recognition models, which combines any trained detection model and any trained recognition model into a two-stage text recognition system. An input image passes through four main stages, text detection, detection box rectification, text recognition, and score filtering, to output text positions and recognition results, which can optionally be visualized.

When running prediction, specify the path of a single image or an image folder via the parameter image_dir, the path of the detection inference model via det_model_dir, and the path of the recognition inference model via rec_model_dir. Visualized results are saved to the ./inference_results folder by default.

```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
```

For more usage of chained detection and recognition inference, refer to [Inference based on the prediction engine](./inference.md) in the documentation tutorials.
@ -0,0 +1,59 @@

## Dataset

This is a collection of commonly used Chinese datasets, which is being updated continuously. You are welcome to contribute to this list~

- [ICDAR2019-LSVT](#ICDAR2019-LSVT)
- [ICDAR2017-RCTW-17](#ICDAR2017-RCTW-17)
- [Chinese Street View Text Recognition](#中文街景文字识别)
- [Chinese Document Text Recognition](#中文文档文字识别)
- [ICDAR2019-ArT](#ICDAR2019-ArT)

In addition to open-source data, users can also synthesize data themselves with synthesis tools. Currently available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.

<a name="ICDAR2019-LSVT"></a>
#### 1. ICDAR2019-LSVT
- **Data source**: https://ai.baidu.com/broad/introduction?dataset=lsvt
- **Introduction**: A total of 450k Chinese street view images, including 50k fully labeled images (20k test + 30k training; text coordinates + text content) and 400k weakly labeled images (text content only), as shown in the following figure:

  ![](datasets/LSVT_1.jpg)

  (a) Fully labeled data

  ![](datasets/LSVT_2.jpg)

  (b) Weakly labeled data
- **Download link**: https://ai.baidu.com/broad/download?dataset=lsvt

<a name="ICDAR2017-RCTW-17"></a>
#### 2. ICDAR2017-RCTW-17
- **Data source**: https://rctw.vlrlab.net/
- **Introduction**: It contains more than 12,000 images, most of them collected in the wild with mobile cameras; some are screenshots. The images cover a variety of scenes, including street views, posters, menus, indoor scenes, and screenshots of mobile applications.

  ![](datasets/rctw.jpg)
- **Download link**: https://rctw.vlrlab.net/dataset/

<a name="中文街景文字识别"></a>
#### 3. Chinese Street View Text Recognition
- **Data source**: https://aistudio.baidu.com/aistudio/competition/detail/8
- **Introduction**: A total of 290,000 images, of which 210,000 form the training set (with labels) and 80,000 form the test set (without labels). The dataset was collected from Chinese street views and built by cutting out text line areas (such as shop signs and landmarks) from street view pictures. All images are preprocessed: using an affine transform, each text area is proportionally mapped to a picture with a height of 48 pixels, as shown in the figure:

  ![](datasets/ch_street_rec_1.png)

  (a) Label: 魅派集成吊顶

  ![](datasets/ch_street_rec_2.png)

  (b) Label: 母婴用品连锁
- **Download link**: https://aistudio.baidu.com/aistudio/datasetdetail/8429

<a name="中文文档文字识别"></a>
#### 4. Chinese Document Text Recognition
- **Data source**: https://github.com/YCG09/chinese_ocr
- **Introduction**:
  - A total of 3.64 million images, divided into a training set and a validation set at a ratio of 99:1.
  - Using a Chinese corpus (news + classical Chinese), the data is randomly generated through variations in font, size, grayscale, blur, perspective, stretching, etc.
  - 5990 characters, including Chinese characters, English letters, numbers, and punctuation (character set: https://github.com/YCG09/chinese_ocr/blob/master/train/char_std_5990.txt )
  - Each sample is fixed at 10 characters, randomly cropped from sentences in the corpus.
  - Image resolution is 280x32.

  ![](datasets/ch_doc1.jpg)

  ![](datasets/ch_doc2.jpg)

  ![](datasets/ch_doc3.jpg)
- **Download link**: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (Password: lu7m)

<a name="ICDAR2019-ArT"></a>
#### 5. ICDAR2019-ArT
- **Data source**: https://ai.baidu.com/broad/introduction?dataset=art
- **Introduction**: It includes 10,166 images, with 5,603 in the training set and 4,563 in the test set. It is composed of three parts, Total-Text, SCUT-CTW1500, and Baidu curved scene text, and contains text of various shapes, including horizontal, multi-directional, and curved text.

  ![](datasets/ArT.jpg)
- **Download link**: https://ai.baidu.com/broad/download?dataset=art
@ -0,0 +1,10 @@

# Recent updates

- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience
- 2020.5.30 Support model prediction and training on Windows
- 2020.5.30 Open source the general Chinese OCR model
- 2020.5.14 Release [PaddleOCR Open Class](https://www.bilibili.com/video/BV1nf4y1U7RX?p=4)
- 2020.5.14 Release [PaddleOCR Practice Notebook](https://aistudio.baidu.com/aistudio/projectdetail/467229)
- 2020.5.14 Open source the 8.6M ultra-lightweight Chinese OCR model