# TEXT DETECTION

This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

## DATA PREPARATION
The icdar2015 dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required before downloading.

Decompress the downloaded dataset into the working directory; this guide assumes it is decompressed under PaddleOCR/train_data/. PaddleOCR also merges the many scattered annotation files into two label files, one for training and one for testing, which can be downloaded with wget:
```
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, the dataset directory under PaddleOCR/train_data/ should contain two folders and two files, laid out as follows:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```

The provided annotation files use the following format, with the two fields on each line separated by a tab ("\t"):
```
" Image file name             Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The image annotation is a list of dictionaries, one per text box, encoded with **json.dumps()**.

The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.

`transcription` represents the text of the current text box. **When its content is "###", the text box is invalid and will be skipped during training.**
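
As a minimal sketch of the format described above (not PaddleOCR's own reader), the following Python snippet splits one label line on the tab, decodes the JSON list, and drops boxes transcribed as "###". The function name `parse_annotation_line` and the second box in the sample are hypothetical, added only for illustration.

```python
import json


def parse_annotation_line(line):
    """Split one label line into (image_path, valid_boxes)."""
    # The image path and the JSON payload are separated by a single tab.
    image_path, encoded = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(encoded)
    # Boxes transcribed as "###" are invalid and are skipped.
    valid_boxes = [box for box in boxes if box["transcription"] != "###"]
    return image_path, valid_boxes


# The first box comes from the example above; the "###" box is made up.
sample = (
    "ch4_test_images/img_61.jpg\t"
    '[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, '
    '{"transcription": "###", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]'
)

path, boxes = parse_annotation_line(sample)
print(path, [box["transcription"] for box in boxes])
```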

If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
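
For a custom dataset, one label line could be produced along these lines. This is a sketch under stated assumptions: the helper `make_annotation_line`, the image path, and the box coordinates are all hypothetical, and the points are assumed to already be ordered clockwise from the upper-left corner.

```python
import json


def make_annotation_line(image_path, boxes):
    """Build one label line: image path, a tab, then the JSON-encoded boxes.

    `boxes` is a list of (transcription, points) pairs, where points holds
    four (x, y) corners ordered clockwise from the upper-left corner.
    """
    items = [
        {"transcription": text, "points": [[int(x), int(y)] for x, y in points]}
        for text, points in boxes
    ]
    return image_path + "\t" + json.dumps(items)


# Hypothetical image path and box, for illustration only.
line = make_annotation_line(
    "my_images/sample_001.jpg",
    [("HELLO", [(10, 20), (110, 20), (110, 60), (10, 60)])],
)
print(line)
```

Writing one such line per image to a text file gives an annotation file in the format described above.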


## TRAINING

First, download the pretrained model. The detection model of PaddleOCR currently supports two backbones, MobileNetV3 and ResNet50_vd. You can use a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace the backbone according to your needs.
```
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# Download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar

# Decompress the pre-trained model file into ./pretrain_models/, taking MobileNetV3 as an example
tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar -C ./pretrain_models/

# Note: after the backbone pre-trained weight file is decompressed correctly, the folder contains the following files:
./pretrain_models/MobileNetV3_large_x0_5_pretrained/
  └─ conv_last_bn_mean
  └─ conv_last_bn_offset
  └─ conv_last_bn_scale
  └─ conv_last_bn_variance
  └─ ......

```

#### START TRAINING
*If you installed the CPU version, set the parameter `use_gpu` to `false` in the configuration file.*
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```

In the command above, `-c` selects the configuration file used for training, here `configs/det/det_mv3_db.yml`.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

You can also use `-o` to change training parameters without modifying the yml file. For example, to set the learning rate to 0.0001:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
```
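
To make the dotted `-o` syntax concrete, here is a hedged sketch of how an override such as `Optimizer.base_lr=0.0001` could map onto a nested configuration dictionary. It is not the parser used by `tools/train.py`; the function `apply_override` and the starting value of `base_lr` are assumptions made for illustration.

```python
import ast


def apply_override(config, option):
    """Apply one 'Section.key=value' override to a nested dict in place."""
    key_path, raw_value = option.split("=", 1)
    keys = key_path.split(".")
    node = config
    # Walk (or create) the intermediate sections named before the last dot.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Interpret numbers and other literals; keep anything else as a string.
    try:
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value
    node[keys[-1]] = value
    return config


# Hypothetical starting configuration.
config = {"Optimizer": {"base_lr": 0.001}}
apply_override(config, "Optimizer.base_lr=0.0001")
apply_override(config, "Global.checkpoints=./your/trained/model")
print(config)
```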

#### LOAD TRAINED MODEL AND CONTINUE TRAINING
To load a trained model and continue training, set the parameter `Global.checkpoints` to the path of the model to be loaded.

For example:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` takes priority over `Global.pretrain_weights`. When both parameters are specified, the model specified by `Global.checkpoints` is loaded first; if that path is wrong, the one specified by `Global.pretrain_weights` is loaded instead.
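
The priority rule in the note can be restated as a small decision function. This is only a sketch of the rule as described, not PaddleOCR's loading code; `select_weights` and the `path_exists` predicate are hypothetical names, and what counts as a valid model path is left to the predicate.

```python
import os


def select_weights(checkpoints, pretrain_weights, path_exists=os.path.exists):
    """Return (source, path) following the priority described in the note."""
    # Global.checkpoints wins whenever it points at something usable.
    if checkpoints and path_exists(checkpoints):
        return "checkpoints", checkpoints
    # Otherwise fall back to Global.pretrain_weights.
    if pretrain_weights and path_exists(pretrain_weights):
        return "pretrain_weights", pretrain_weights
    return "none", None


# Simulate a file system where only the pretrained weights exist.
available = {"./pretrain_models/MobileNetV3_large_x0_5_pretrained/"}
choice = select_weights(
    "./your/trained/model",
    "./pretrain_models/MobileNetV3_large_x0_5_pretrained/",
    path_exists=lambda p: p in available,
)
print(choice)
```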


## EVALUATION

PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean.
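
For reference, the three indicators relate as sketched below: Hmean is the harmonic mean of Precision and Recall. The function name and the example counts are hypothetical, and how a detected box is matched to a ground-truth box is not covered here; the match counts are simply taken as inputs.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Compute (precision, recall, hmean) from box counts."""
    # Precision: share of detected boxes that match a ground-truth box.
    precision = num_matched / num_detected if num_detected else 0.0
    # Recall: share of ground-truth boxes that were detected.
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    # Hmean: harmonic mean of precision and recall.
    total = precision + recall
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


# Hypothetical counts: 80 matches out of 100 detections and 160 ground-truth boxes.
precision, recall, hmean = detection_metrics(80, 100, 160)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={hmean:.3f}")
```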

Run the following command to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db.yml`.

When evaluating, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. If you train with a different dataset or a different model, adjust these two parameters to get better results.

```
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
The model parameters produced during training are saved in the `Global.save_model_dir` directory by default. When evaluating, set `Global.checkpoints` to point to the saved parameter file.

For example:
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.
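
To give a feel for what the two DB parameters control, the sketch below assumes the expansion rule from the DB paper, in which a detected polygon is grown outward by a distance of area × `unclip_ratio` / perimeter, and assumes that boxes scoring below `box_thresh` are discarded. The helper names are hypothetical, the box is the example from the annotation section, and the real post-processing performs the polygon offset with a geometry library that is not reproduced here.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the polygon."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )


def unclip_distance(points, unclip_ratio=1.5):
    """Offset distance used to expand a shrunk text region (assumed DB rule)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(score, box_thresh=0.6):
    """A box is kept only if its score reaches box_thresh (assumed rule)."""
    return score >= box_thresh


box = [(310, 104), (416, 141), (418, 216), (312, 179)]
distance = unclip_distance(box, unclip_ratio=1.5)
print(f"expand by about {distance:.1f} px; keep score 0.55? {keep_box(0.55)}")
```

Under these assumptions, a larger `unclip_ratio` yields looser boxes around the text, and a higher `box_thresh` drops more low-confidence boxes.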

## TEST

Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
```

When testing the DB model, you can adjust the post-processing thresholds:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```


Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
```