parent
0d4f4d62b1
commit
066a8a70a2
|
@ -82,7 +82,7 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr
|
|||
<a name="Supported-Chinese-model-list"></a>
|
||||
|
||||
|
||||
## PP-OCR series model list(Update on September 8th)
|
||||
## PP-OCR Series Model List(Update on September 8th)
|
||||
|
||||
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
|
||||
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
|
||||
|
@ -174,7 +174,7 @@ For a new language request, please refer to [Guideline for new language_requests
|
|||
|
||||
|
||||
<a name="language_requests"></a>
|
||||
## Guideline for new language requests
|
||||
## Guideline for New Language Requests
|
||||
|
||||
To request support for a new language, submit a PR containing the following two files:
|
||||
|
||||
|
|
|
@ -1,8 +1,8 @@
|
|||
# BENCHMARK
|
||||
# Benchmark
|
||||
|
||||
This document reports the performance of the model series on Chinese and English recognition.
|
||||
|
||||
## TEST DATA
|
||||
## Test Data
|
||||
|
||||
We collected 300 images from different real application scenarios to evaluate the overall OCR system, including contract samples, license plates, nameplates, train tickets, test sheets, forms, certificates, street view images, business cards, digital meters, etc. The following figure shows some images from the test set.
|
||||
|
||||
|
@ -10,7 +10,7 @@ We collected 300 images for different real application scenarios to evaluate the
|
|||
<img src="../datasets/doc.jpg" width = "1000" height = "500" />
|
||||
</div>
|
||||
|
||||
## MEASUREMENT
|
||||
## Measurement
|
||||
|
||||
Explanation:
|
||||
|
||||
|
|
|
@ -1,23 +1,23 @@
|
|||
# TEXT DETECTION
|
||||
# Text Detection
|
||||
|
||||
This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
|
||||
|
||||
- [1. DATA AND WEIGHTS PREPARATIO](#1-data-and-weights-preparatio)
|
||||
* [1.1 DATA PREPARATION](#11-data-preparation)
|
||||
* [1.2 DOWNLOAD PRETRAINED MODEL](#12-download-pretrained-model)
|
||||
- [2. TRAINING](#2-training)
|
||||
* [2.1 START TRAINING](#21-start-training)
|
||||
* [2.2 LOAD TRAINED MODEL AND CONTINUE TRAINING](#22-load-trained-model-and-continue-training)
|
||||
* [2.3 TRAINING WITH NEW BACKBONE](#23-training-with-new-backbone)
|
||||
- [3. EVALUATION AND TEST](#3-evaluation-and-test)
|
||||
* [3.1 EVALUATION](#31-evaluation)
|
||||
* [3.2 TEST](#32-test)
|
||||
- [4. INFERENCE](#4-inference)
|
||||
- [2. FAQ](#2-faq)
|
||||
- [1. Data and Weights Preparation](#1-data-and-weights-preparation)
|
||||
* [1.1 Data Preparation](#11-data-preparation)
|
||||
* [1.2 Download Pretrained Model](#12-download-pretrained-model)
|
||||
- [2. Training](#2-training)
|
||||
* [2.1 Start Training](#21-start-training)
|
||||
* [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
|
||||
* [2.3 Training with New Backbone](#23-training-with-new-backbone)
|
||||
- [3. Evaluation and Test](#3-evaluation-and-test)
|
||||
* [3.1 Evaluation](#31-evaluation)
|
||||
* [3.2 Test](#32-test)
|
||||
- [4. Inference](#4-inference)
|
||||
- [5. FAQ](#5-faq)
|
||||
|
||||
# 1 DATA AND WEIGHTS PREPARATIO
|
||||
## 1. Data and Weights Preparation
|
||||
|
||||
## 1.1 DATA PREPARATION
|
||||
### 1.1 Data Preparation
|
||||
|
||||
The icdar2015 dataset contains a training set of 1000 images and a test set of 500 images, all captured with wearable cameras. It can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.
|
||||
|
||||
|
@ -59,7 +59,7 @@ The `points` in the dictionary represent the coordinates (x, y) of the four poin
|
|||
If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
|
||||
|
||||
|
||||
## 1.2 DOWNLOAD PRETRAINED MODEL
|
||||
### 1.2 Download Pretrained Model
|
||||
|
||||
First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones: MobileNetV3, ResNet18_vd and ResNet50_vd. You can use a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) to replace the backbone according to your needs.
|
||||
The corresponding download links for the backbone pretrained weights can be found in (https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).
|
||||
|
@ -77,7 +77,7 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dyg
|
|||
|
||||
## 2. Training
|
||||
|
||||
## 2.1 START TRAINING
|
||||
### 2.1 Start Training
|
||||
|
||||
*If the CPU version is installed, set the parameter `use_gpu` to `false` in the configuration.*
|
||||
```shell
|
||||
|
@ -101,7 +101,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
|
|||
|
||||
```
|
||||
|
||||
## 2.2 LOAD TRAINED MODEL AND CONTINUE TRAINING
|
||||
### 2.2 Load Trained Model and Continue Training
|
||||
To load a trained model and resume training, set the parameter `Global.checkpoints` to the path of the model to be loaded.
|
||||
|
||||
For example:
|
||||
|
@ -112,7 +112,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
|
|||
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
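The precedence described in the note can be sketched as a small helper. This is only an illustration, not PaddleOCR's actual loading code: the function name `resolve_weights` and the `exists` callback are hypothetical, and how a model path is validated is an assumption. Only the two configuration keys come from the text above.

```python
import os
from typing import Callable, Optional


def resolve_weights(
    checkpoints: Optional[str],
    pretrain_weights: Optional[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Pick which weights to load, mirroring the precedence in the note.

    Hypothetical helper: `checkpoints` stands for Global.checkpoints and
    `pretrain_weights` for Global.pretrain_weights. The `exists` check is
    an assumed way of deciding whether a path is usable.
    """
    # Global.checkpoints wins whenever it points at something loadable.
    if checkpoints and exists(checkpoints):
        return checkpoints
    # Otherwise fall back to Global.pretrain_weights, if it is usable.
    if pretrain_weights and exists(pretrain_weights):
        return pretrain_weights
    # Neither path is usable, so there is nothing to load.
    return None
```

Passing `exists` as a parameter keeps the sketch testable without touching the file system.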
|
||||
|
||||
|
||||
## 2.3 TRAINING WITH NEW BACKBONE
|
||||
### 2.3 Training with New Backbone
|
||||
|
||||
PaddleOCR divides the network into four parts, all located under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms->backbones->
|
||||
necks->heads).
|
||||
|
@ -162,9 +162,9 @@ After adding the four-part modules of the network, you only need to configure th
|
|||
|
||||
**NOTE**: More details about replacing the backbone and other modules can be found in [doc](add_new_algorithm_en.md).
|
||||
|
||||
# 3. EVALUATION AND TEST
|
||||
## 3. Evaluation and Test
|
||||
|
||||
## 3.1 EVALUATION
|
||||
### 3.1 Evaluation
|
||||
|
||||
PaddleOCR calculates three indicators to evaluate performance on the OCR detection task: Precision, Recall, and Hmean (F-Score).
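As a rough illustration of how these three indicators relate, the sketch below computes them from raw counts. The function name `detection_metrics` and its arguments are hypothetical and are not part of PaddleOCR's evaluation code. Hmean is taken to be the harmonic mean of Precision and Recall, which is the usual F-Score definition.

```python
def detection_metrics(num_matched: int, num_detected: int, num_labeled: int) -> dict:
    """Compute Precision, Recall and Hmean from detection counts.

    num_matched:  detections judged correct (matched to a labeled box)
    num_detected: all boxes produced by the detector
    num_labeled:  all boxes in the ground-truth annotation
    """
    # Guard against empty inputs so the ratios never divide by zero.
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_labeled if num_labeled else 0.0
    denom = precision + recall
    # Harmonic mean of precision and recall (F-Score).
    hmean = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "hmean": hmean}
```

For example, 8 correct detections out of 10 predicted boxes and 16 labeled boxes give a precision of 0.8 and a recall of 0.5.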
|
||||
|
||||
|
@ -179,7 +179,7 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{pat
|
|||
|
||||
* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and do not need to be set when evaluating the EAST and SAST models.
|
||||
|
||||
## 3.2 TEST
|
||||
### 3.2 Test
|
||||
|
||||
Test the detection result on a single image:
|
||||
```shell
|
||||
|
@ -197,7 +197,7 @@ Test the detection result on all images in the folder:
|
|||
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
|
||||
```
|
||||
|
||||
# 4. INFERENCE
|
||||
## 4. Inference
|
||||
|
||||
The inference model (the model saved by `paddle.jit.save`) is generally a frozen model saved after training is completed, and is mostly used for prediction in deployment.
|
||||
|
||||
|
@ -220,7 +220,7 @@ If it is other detection algorithms, such as the EAST, the det_algorithm paramet
|
|||
python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
|
||||
```
|
||||
|
||||
# 2. FAQ
|
||||
## 5. FAQ
|
||||
|
||||
Q1: Why are the prediction results of the trained model and the inference model inconsistent?
|
||||
**A**: Most of these problems are caused by a mismatch between the pre-processing and post-processing parameters used when predicting with the trained model and those used when predicting with the inference model. Taking the model trained with the det_mv3_db.yml configuration file as an example, the inconsistency can be resolved as follows:
|
||||
|
|
|
@ -7,15 +7,13 @@ This section contains two parts. Firstly, [PP-OCR Model Download](./models_list_
|
|||
|
||||
Let's first understand some basic concepts.
|
||||
|
||||
- [INTRODUCTION ABOUT OCR](#introduction-about-ocr)
|
||||
* [Basic concepts of OCR detection model](#basic-concepts-of-ocr-detection-model)
|
||||
* [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model)
|
||||
* [PP-OCR model](#pp-ocr-model)
|
||||
* [And a table of contents](#and-a-table-of-contents)
|
||||
* [On the right](#on-the-right)
|
||||
- [Introduction about OCR](#introduction-about-ocr)
|
||||
* [Basic Concepts of OCR Detection Model](#basic-concepts-of-ocr-detection-model)
|
||||
* [Basic Concepts of OCR Recognition Model](#basic-concepts-of-ocr-recognition-model)
|
||||
* [PP-OCR Model](#pp-ocr-model)
|
||||
|
||||
|
||||
## 1. INTRODUCTION ABOUT OCR
|
||||
## 1. Introduction about OCR
|
||||
|
||||
This section briefly introduces the basic concepts of OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model.
|
||||
|
||||
|
@ -24,7 +22,7 @@ OCR (Optical Character Recognition, Optical Character Recognition) is currently
|
|||
OCR text recognition generally includes two parts: text detection and text recognition. The text detection module first uses a detection algorithm to locate text lines in the image. The recognition algorithm then identifies the specific text in each text line.
|
||||
|
||||
|
||||
### 1.1 Basic concepts of OCR detection model
|
||||
### 1.1 Basic Concepts of OCR Detection Model
|
||||
|
||||
Text detection locates the text areas in an image and usually marks each word or text line with a bounding box. Traditional text detection algorithms mostly rely on hand-crafted features; they are fast and work well in simple scenes, but their accuracy drops sharply in natural scenes. Currently, deep learning methods are mostly used.
|
||||
|
||||
|
@ -34,14 +32,14 @@ Text detection algorithms based on deep learning can be roughly divided into the
|
|||
3. Hybrid target detection and segmentation method.
|
||||
|
||||
|
||||
### 1.2 Basic concepts of OCR recognition model
|
||||
### 1.2 Basic Concepts of OCR Recognition Model
|
||||
|
||||
The input to an OCR recognition algorithm is generally a text-line image with little background information, in which the text occupies most of the area. Recognition algorithms can be divided into two types:
|
||||
1. CTC-based method. The text prediction module of the recognition algorithm is based on CTC, and the commonly used algorithm combination is CNN+RNN+CTC. There are also some algorithms that try to add transformer modules to the network and so on.
|
||||
2. Attention-based method. The text prediction module of the recognition algorithm is based on Attention, and the commonly used algorithm combination is CNN+RNN+Attention.
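To make the CTC-based method in item 1 more concrete, here is a minimal sketch of greedy CTC decoding: take the most likely class per frame, merge consecutive repeats, then drop the blank symbol. The function name `ctc_greedy_decode`, the example charset, and the choice of index 0 for the blank are assumptions for illustration and may differ from PaddleOCR's own post-processing.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], charset: Sequence[str], blank: int = 0) -> str:
    """Collapse a per-frame class sequence into text using the CTC rule.

    frame_ids: argmax class index for each time step of the network output
    charset:   index -> character table; the entry at `blank` is unused
    blank:     index of the CTC blank symbol (assumed to be 0 here)
    """
    chars: List[str] = []
    previous = blank
    for idx in frame_ids:
        # Emit a character only when it is not blank and not a repeat
        # of the immediately preceding frame.
        if idx != blank and idx != previous:
            chars.append(charset[idx])
        previous = idx
    return "".join(chars)
```

A blank between two identical indices is what allows doubled letters: with the charset `["", "h", "e", "l", "o"]`, the frames `[1, 1, 0, 2, 3, 3, 0, 3, 4, 4]` decode to `hello`.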
|
||||
|
||||
|
||||
### 1.3 PP-OCR model
|
||||
### 1.3 PP-OCR Model
|
||||
|
||||
PaddleOCR integrates many OCR algorithms: text detection algorithms include DB, EAST, and SAST, and text recognition algorithms include CRNN, RARE, StarNet, Rosetta, and SRN, among others.
|
||||
|
||||
|
|
|
@ -36,4 +36,4 @@ If you getting this error `OSError: [WinError 126] The specified module could no
|
|||
|
||||
Please try to download Shapely whl file using [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).
|
||||
|
||||
Reference: [Solve shapely installation on windows](
|
||||
Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
|
|
@ -1,24 +1,23 @@
|
|||
# TEXT RECOGNITION
|
||||
# Text Recognition
|
||||
|
||||
- [1 DATA PREPARATION](#DATA_PREPARATION)
|
||||
- [1. Data Preparation](#DATA_PREPARATION)
|
||||
- [1.1 Custom Dataset](#Costom_Dataset)
|
||||
- [1.2 Dataset Download](#Dataset_download)
|
||||
- [1.3 Dictionary](#Dictionary)
|
||||
- [1.4 Add Space Category](#Add_space_category)
|
||||
|
||||
- [2 TRAINING](#TRAINING)
|
||||
- [2. Training](#TRAINING)
|
||||
- [2.1 Data Augmentation](#Data_Augmentation)
|
||||
- [2.2 General Training](#Training)
|
||||
- [2.3 Multi-language Training](#Multi_language)
|
||||
|
||||
- [3 EVALUATION](#EVALUATION)
|
||||
- [3. Evaluation](#EVALUATION)
|
||||
|
||||
- [4 PREDICTION](#PREDICTION)
|
||||
- [4.1 Training engine prediction](#Training_engine_prediction)
|
||||
- [5 CONVERT TO INFERENCE MODEL](#Inference)
|
||||
- [4. Prediction](#PREDICTION)
|
||||
- [5. Convert to Inference Model](#Inference)
|
||||
|
||||
<a name="DATA_PREPARATION"></a>
|
||||
## 1 DATA PREPARATION
|
||||
## 1. Data Preparation
|
||||
|
||||
|
||||
PaddleOCR supports two data formats:
|
||||
|
@ -37,7 +36,7 @@ mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
|
|||
```
|
||||
|
||||
<a name="Costom_Dataset"></a>
|
||||
### 1.1 Costom dataset
|
||||
### 1.1 Custom Dataset
|
||||
|
||||
If you want to use your own data for training, please refer to the following to organize your data.
|
||||
|
||||
|
@ -85,7 +84,7 @@ Similar to the training set, the test set also needs to be provided a folder con
|
|||
```
|
||||
|
||||
<a name="Dataset_download"></a>
|
||||
### 1.2 Dataset download
|
||||
### 1.2 Dataset Download
|
||||
|
||||
- ICDAR2015
|
||||
|
||||
|
@ -167,14 +166,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
|
|||
If you need to customize the dict file, add a `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` that points to your dictionary path, and set `character_type` to `ch`.
|
||||
|
||||
<a name="Add_space_category"></a>
|
||||
### 1.4 Add space category
|
||||
### 1.4 Add Space Category
|
||||
|
||||
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
|
||||
|
||||
**Note: use_space_char only takes effect when character_type=ch**
|
||||
|
||||
<a name="TRAINING"></a>
|
||||
## 2 TRAINING
|
||||
## 2. Training
|
||||
|
||||
<a name="Data_Augmentation"></a>
|
||||
### 2.1 Data Augmentation
|
||||
|
@ -363,7 +362,7 @@ Eval:
|
|||
|
||||
<a name="EVALUATION"></a>
|
||||
|
||||
## 3 EVALUATION
|
||||
## 3. Evaluation
|
||||
|
||||
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
|
||||
|
||||
|
@ -373,7 +372,7 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
|
|||
```
|
||||
|
||||
<a name="PREDICTION"></a>
|
||||
## 4 PREDICTION
|
||||
## 4. Prediction
|
||||
|
||||
|
||||
Using a model trained with PaddleOCR, you can quickly get predictions with the following script.
|
||||
|
@ -437,7 +436,7 @@ infer_img: doc/imgs_words/ch/word_1.jpg
|
|||
|
||||
<a name="Inference"></a>
|
||||
|
||||
## 5 CONVERT TO INFERENCE MODEL
|
||||
## 5. Convert to Inference Model
|
||||
|
||||
The recognition model is converted to an inference model in the same way as the detection model, as follows:
|
||||
|
||||
|
|
|
@ -1,14 +1,14 @@
|
|||
# MODEL TRAINING
|
||||
# Model Training
|
||||
|
||||
- [1. Yml Configuration](#1-Yml-Configuration)
|
||||
- [2. Basic concepts](#1-basic-concepts)
|
||||
* [2.1 Learning rate](#11-learning-rate)
|
||||
- [2. Basic Concepts](#1-basic-concepts)
|
||||
* [2.1 Learning Rate](#11-learning-rate)
|
||||
* [2.2 Regularization](#12-regularization)
|
||||
* [2.3 Evaluation indicators](#13-evaluation-indicators-)
|
||||
- [3. Data and vertical scenes](#2-data-and-vertical-scenes)
|
||||
* [3.1 Training data](#21-training-data)
|
||||
* [3.2 Vertical scene](#22-vertical-scene)
|
||||
* [3.3 Build your own data set](#23-build-your-own-data-set)
|
||||
* [2.3 Evaluation Indicators](#13-evaluation-indicators-)
|
||||
- [3. Data and Vertical Scenes](#2-data-and-vertical-scenes)
|
||||
* [3.1 Training Data](#21-training-data)
|
||||
* [3.2 Vertical Scene](#22-vertical-scene)
|
||||
* [3.3 Build Your Own Dataset](#23-build-your-own-data-set)
|
||||
* [4. FAQ](#3-faq)
|
||||
|
||||
|
||||
|
@ -18,7 +18,7 @@ At the same time, it will briefly introduce the components of the PaddleOCR mode
|
|||
|
||||
<a name="1-Yml-Configuration"></a>
|
||||
|
||||
## 1. Yml configuration
|
||||
## 1. Yml Configuration
|
||||
|
||||
The PaddleOCR model uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file and then builds a complete training process to train the model. When tuning, you only need to modify the parameters in the configuration file, which is simple to use and convenient to change.
|
||||
|
||||
|
@ -26,12 +26,12 @@ For the complete configuration file description, please refer to [Configuration
|
|||
|
||||
<a name="1-basic-concepts"></a>
|
||||
|
||||
## 2. Basic concepts
|
||||
## 2. Basic Concepts
|
||||
|
||||
In the process of model training, some hyperparameters need to be manually adjusted to help the model reach the best metrics at the lowest loss. Different data volumes may require different hyperparameters. When you want to fine-tune on your own data or tune model performance, the following parameter adjustment strategies can serve as a reference:
|
||||
|
||||
<a name="11-learning-rate"></a>
|
||||
### 2.1 Learning rate
|
||||
### 2.1 Learning Rate
|
||||
|
||||
The learning rate is one of the most important hyperparameters for training neural networks. It sets the step size taken along the gradient toward the optimum of the loss function in each iteration.
|
||||
A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through configuration files, for example:
|
||||
|
@ -68,7 +68,7 @@ Optimizer:
|
|||
factor: 2.0e-05
|
||||
```
|
||||
<a name="13-evaluation-indicators-"></a>
|
||||
### 2.3 Evaluation indicators
|
||||
### 2.3 Evaluation Indicators
|
||||
|
||||
(1) Detection stage: Evaluation is first based on the IOU between the detection box and the labeled box. If the IOU is greater than a certain threshold, the detection is judged to be correct. Unlike in general object detection, the detection box and the labeled box here are represented by polygons. Detection precision: the percentage of correct detection boxes among all detection boxes, which mainly measures false detections. Detection recall: the percentage of correct detection boxes among all labeled boxes, which mainly measures missed detections.
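The IOU-based judgment in the detection stage can be sketched as follows. To keep the example short, it uses axis-aligned rectangles rather than the polygons described above, and a threshold of 0.5; both are simplifying assumptions, and the names `box_iou` and `count_correct` are hypothetical rather than taken from PaddleOCR.

```python
from typing import List, Tuple

# Simplification: axis-aligned boxes as (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_correct(detections: List[Box], labels: List[Box], iou_thresh: float = 0.5) -> int:
    """Count detections that match a not-yet-used labeled box above the threshold."""
    used = set()
    correct = 0
    for det in detections:
        for i, lab in enumerate(labels):
            # Each labeled box may be matched by at most one detection.
            if i not in used and box_iou(det, lab) > iou_thresh:
                used.add(i)
                correct += 1
                break
    return correct
```

Dividing the returned count by the number of detections gives the precision, and dividing it by the number of labeled boxes gives the recall.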
|
||||
|
||||
|
@ -78,11 +78,11 @@ Optimizer:
|
|||
|
||||
<a name="2-data-and-vertical-scenes"></a>
|
||||
|
||||
## 3. Data and vertical scenes
|
||||
## 3. Data and Vertical Scenes
|
||||
|
||||
<a name="21-training-data"></a>
|
||||
|
||||
### 3.1 Training data
|
||||
### 3.1 Training Data
|
||||
|
||||
The current open source models, data sets and magnitudes are as follows:
|
||||
|
||||
|
@ -99,14 +99,14 @@ Among them, the public data sets are all open source, users can search and downl
|
|||
|
||||
<a name="22-vertical-scene"></a>
|
||||
|
||||
### 3.2 Vertical scene
|
||||
### 3.2 Vertical Scene
|
||||
|
||||
PaddleOCR mainly focuses on general OCR. If you have vertical (domain-specific) requirements, you can train your own model using PaddleOCR with vertical data.
|
||||
If there is a lack of labeled data, or if you do not want to invest in research and development costs, it is recommended to directly call the open API, which covers some of the more common vertical categories.
|
||||
|
||||
<a name="23-build-your-own-data-set"></a>
|
||||
|
||||
### 3.3 Build your own data set
|
||||
### 3.3 Build Your Own Dataset
|
||||
|
||||
Here are several lessons learned for reference when constructing a dataset:
|
||||
|
||||
|
|