From 4d22bf3af60af7317bdf89e90ff9111af06e25c7 Mon Sep 17 00:00:00 2001
From: Khanh Tran
Date: Mon, 8 Jun 2020 09:16:26 +0700
Subject: [PATCH] restore missing files

---
 doc/config_en.md       |  49 +++++++++
 doc/customize_en.md    |  30 ++++++
 doc/detection_en.md    |  96 ++++++++++++++++++
 doc/inference_en.md    | 209 ++++++++++++++++++++++++++++++++++++++
 doc/installation_en.md |  79 +++++++++++++++
 doc/recognition_en.md  | 221 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 684 insertions(+)
 create mode 100644 doc/config_en.md
 create mode 100644 doc/customize_en.md
 create mode 100644 doc/detection_en.md
 create mode 100644 doc/inference_en.md
 create mode 100644 doc/installation_en.md
 create mode 100644 doc/recognition_en.md

diff --git a/doc/config_en.md b/doc/config_en.md
new file mode 100644
index 00000000..c9e45035
--- /dev/null
+++ b/doc/config_en.md
@@ -0,0 +1,49 @@
# Optional parameters list

The following list can be viewed via `--help`:

| FLAG | Supported script | Use | Default | Note |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | Specify the configuration file to use | None | **Please refer to the parameter introduction for configuration file usage** |
| -o | ALL | Set configuration options | None | Options set with -o take priority over the configuration file selected with -c. For example: `-o Global.use_gpu=false` |


## Introduction to Global Parameters of Configuration File

Take `rec_chinese_lite_train.yml` as an example:


| Parameter | Use | Default | Note |
| :----------------------: | :---------------------: | :--------------: | :--------------------: |
| algorithm | Select the algorithm to use | Synchronized with the configuration file | For selecting a model, please refer to the supported model [list](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README_en.md) |
| use_gpu | Whether to use GPU | true | \ |
| epoch_num | Maximum number of training epochs | 3000 | \ |
| log_smooth_window | Sliding window size | 20 | \ |
| print_batch_step | Logging interval (batches) | 10 | \ |
| save_model_dir | Model save path | output/{model_name} | \ |
| save_epoch_step | Model save interval (epochs) | 3 | \ |
| eval_batch_step | Model evaluation interval (iterations) | 2000 | \ |
| train_batch_size_per_card | Batch size per card during training | 256 | \ |
| test_batch_size_per_card | Batch size per card during testing | 256 | \ |
| image_shape | Input image size | [3, 32, 100] | \ |
| max_text_length | Maximum text length | 25 | \ |
| character_type | Character type | ch | en/ch; the default dict is used for en, and the custom dict is used for ch |
| character_dict_path | Dictionary path | ./ppocr/utils/ic15_dict.txt | \ |
| loss_type | Loss type | ctc | Two loss types are supported: ctc / attention |
| reader_yml | Reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | Path of the pre-trained model to load | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Path of the saved model to load | None | Used to load saved parameters and resume training after an interruption |
| save_inference_dir | Path to save the inference model | None | Used to save the inference model |

## Introduction to Reader Parameters of Configuration File

Take `rec_chinese_reader.yml` as an example:

| Parameter | Use | Default | Note |
| :----------------------: | :---------------------: | :--------------: | :--------------------: |
| reader_function | Select the data reading method | ppocr.data.rec.dataset_traversal,SimpleReader | Two data reading methods are supported: SimpleReader / LMDBReader |
| num_workers | Number of data reading threads | 8 | \ |
| img_set_dir | Image folder path | ./train_data | \ |
| label_file_path | Ground-truth label file path | ./train_data/rec_gt_train.txt | \ |
| infer_img | Path of the image or image folder used for prediction | ./infer_img | \ |

diff --git a/doc/customize_en.md b/doc/customize_en.md
new file mode 100644
index 00000000..99665329
--- /dev/null
+++ b/doc/customize_en.md
@@ -0,0 +1,30 @@
# How to make your own ultra-lightweight OCR models?

The process of making a customized ultra-lightweight OCR model can be divided into three steps: training a text detection model, training a text recognition model, and concatenating the predictions from the previous steps.

## step1: Train text detection model

PaddleOCR provides two text detection algorithms: EAST and DB. Both support the MobileNetV3 and ResNet50_vd backbone networks; select the corresponding configuration file as needed and start training. For example, to train a DB detection model with MobileNetV3 as the backbone network:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```
For more details about data preparation and training, refer to the documentation [Text detection model training/evaluation/prediction](./detection.md)

## step2: Train text recognition model

PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks, MobileNetV3 and ResNet34_vd; select the corresponding configuration file as needed to start training.
For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
```
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
```
For more details about data preparation and training, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition.md)

## step3: Concatenate predictions

PaddleOCR provides a concatenation tool for detection and recognition models. It can connect any trained detection model with any recognition model to form a two-stage text recognition system. The input image goes through four main stages (text detection, text rectification, text recognition, and score filtering) to output the text positions and recognition results. Optionally, the results can also be visualized.

When performing prediction, use the parameter `image_dir` to specify the path of a single image or an image folder, `det_model_dir` to specify the path of the detection model, and `rec_model_dir` to specify the path of the recognition model. The visualized results are saved to the `./inference_results` folder by default.

```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/"  --rec_model_dir="./inference/rec/"
```
For more details about text detection and recognition concatenation, please refer to the document [Inference](./inference.md)

diff --git a/doc/detection_en.md b/doc/detection_en.md
new file mode 100644
index 00000000..5acba219
--- /dev/null
+++ b/doc/detection_en.md
@@ -0,0 +1,96 @@
# Text detection

This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

## Data preparation
The icdar2015 dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.

Decompress the downloaded dataset into the working directory. The rest of this section assumes it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR consolidates the many scattered annotation files into two annotation files, one for training and one for testing. Both can be downloaded with wget:
```
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, PaddleOCR/train_data/ contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```

The provided annotation file format is as follows:
```
" Image file name             Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
Before json.dumps encoding, the image annotation information is a list of dictionaries. The `points` entry in each dictionary holds the (x, y) coordinates of the four corners of the text box, listed clockwise starting from the upper-left corner.

`transcription` holds the text of the current text box. This information is not needed for the text detection task.
If you want to train PaddleOCR on other datasets, build the annotation file in the same format.


## Quickstart training

First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd.
You can replace the backbone with a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) according to your needs.
```
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# Download the pre-trained model of ResNet50
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
```

**Start training**
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```

In the above command, `-c` selects the configs/det/det_mv3_db.yml configuration file for training.
For a detailed explanation of the configuration file, please refer to [link](./config_en.md).

You can also use the `-o` parameter to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
```

## Evaluation Indicator

PaddleOCR calculates three indicators to evaluate performance on the OCR detection task: Precision, Recall, and Hmean.

Run the following command to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db.yml`.

During evaluation, the post-processing parameters are set to box_thresh=0.6 and unclip_ratio=1.5. If you train with a different dataset or model, adjust these two parameters for better results.

```
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
The model parameters are saved during training in the `Global.save_model_dir` directory by default. To evaluate the indicators, set Global.checkpoints to point to the saved parameter file.

For example:
```
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: box_thresh and unclip_ratio are parameters required for DB post-processing. They do not need to be set when evaluating the EAST model.

## Test detection result

Test the detection result on a single image:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
```

When testing the DB model, adjust the post-processing thresholds:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"  PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```


Test the detection result on all images in a folder:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
```

diff --git a/doc/inference_en.md b/doc/inference_en.md
new file mode 100644
index 00000000..521654db
--- /dev/null
+++ b/doc/inference_en.md
@@ -0,0 +1,209 @@

# Prediction from inference model

The inference model (the model saved by fluid.io.save_inference_model) is generally a frozen model saved after training is completed. It is mostly used for prediction in deployment.

The model saved during training is the checkpoints model. It saves the model parameters and is mostly used to resume training.

Compared with the checkpoints model, the inference model additionally saves the structure of the model. It performs better for deployment and accelerated inference, is flexible and convenient, and is suitable for integration into real-world systems.
For more details, please refer to the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).

Next, we first introduce how to convert a trained model into an inference model. We then introduce text detection, text recognition, and their concatenation based on the inference model.

## Training model to inference model
### Detection model to inference model

Download the ultra-lightweight Chinese detection model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/
```
The above model is a DB model trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, run the following command:
```
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy Global.save_inference_dir=./inference/det_db/
```
When converting to an inference model, use the same configuration file that was used during training. You also need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters (here via `-o`).
`Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model is saved.
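Options passed with `-o`, such as `Global.checkpoints=...` above, use dotted keys that address nested sections of the YAML configuration, and they take priority over the values loaded with `-c`. The sketch below illustrates that merge on a plain dictionary. It is a minimal sketch under stated assumptions and is not PaddleOCR's actual argument parser. The function names `parse_value` and `apply_overrides` are hypothetical. The value-conversion rules are also assumptions: case-insensitive `true`/`false`, then Python literals, otherwise a plain string.

```python
import ast
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Convert an override value string into a Python value (assumed rules)."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        # Numbers, lists such as [3, 32, 320], and quoted strings
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Anything else, e.g. a path like ./inference/det_db/, stays a string
        return text


def apply_overrides(config: Dict[str, Any], options: List[str]) -> Dict[str, Any]:
    """Merge dotted KEY=VALUE options into a nested config dict, in place."""
    for option in options:
        key, sep, raw = option.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {option!r}")
        *sections, leaf = key.split(".")
        node = config
        for name in sections:
            # Create missing sections so that new keys can also be added
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)  # the -o value replaces the one loaded with -c
    return config


if __name__ == "__main__":
    cfg = {"Global": {"use_gpu": True, "checkpoints": None, "save_inference_dir": None}}
    apply_overrides(cfg, [
        "Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy",
        "Global.save_inference_dir=./inference/det_db/",
        "Global.use_gpu=false",
    ])
    print(cfg["Global"])
```

Options are applied in order, so in this sketch the last occurrence of a repeated key wins.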

After a successful conversion, there are two files in the `save_inference_dir` directory:
```
inference/det_db/
  └─  model     Program file of the detection inference model
  └─  params    Parameter file of the detection inference model
```

### Recognition model to inference model

Download the ultra-lightweight Chinese recognition model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/
```

The recognition model is converted to an inference model in the same way as the detection model:
```
python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \
        Global.save_inference_dir=./inference/rec_crnn/
```

If you trained a model on your own dataset with a different dictionary file, make sure to set `character_dict_path` in the configuration file to the path of your dictionary file.

After a successful conversion, there are two files in the directory:
```
/inference/rec_crnn/
  └─  model     Program file of the recognition inference model
  └─  params    Parameter file of the recognition inference model
```

## Text detection model inference

The following sections introduce inference with the ultra-lightweight Chinese detection model, the DB text detection model, and the EAST text detection model. The default configuration uses the inference settings of the DB text detection model. Because the EAST and DB algorithms are very different, EAST inference requires passing the corresponding parameters.
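The first subsection below explains how `det_max_side_len` (default 960) bounds the size of the image fed to the detector. That rule can be sketched as a small function that computes the target image size. This is a minimal sketch of the rule as stated in this document. It assumes the aspect ratio is preserved when the image is scaled down. The function name `target_size` is hypothetical, and PaddleOCR's actual preprocessing may apply further steps that are not described here.

```python
from typing import Tuple


def target_size(height: int, width: int, max_side_len: int = 960) -> Tuple[int, int]:
    """Return the (height, width) fed to the detector under the det_max_side_len rule.

    Assumption: when the image is too large, both sides are scaled by the same
    ratio so that the longer side equals max_side_len.
    """
    longest = max(height, width)
    if longest <= max_side_len:
        # Small enough: the original image is used unchanged
        return height, width
    ratio = max_side_len / longest
    return round(height * ratio), round(width * ratio)


if __name__ == "__main__":
    for h, w in [(720, 1280), (500, 400), (2000, 1000)]:
        print((h, w), "->", target_size(h, w), "| det_max_side_len=1200:", target_size(h, w, 1200))
```

Raising `det_max_side_len` (as with `--det_max_side_len=1200` below) only changes the result for images whose longer side exceeds the default limit.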

### 1. Ultra-lightweight Chinese detection model inference

For ultra-lightweight Chinese detection model inference, you can execute the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
```

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of each result file is prefixed with 'det_res'. Examples of results are as follows:

![](imgs_results/det_res_2.jpg)

The parameter `det_max_side_len` sets the maximum side length to which the detection algorithm normalizes the input image. When both the height and the width of the image are less than det_max_side_len, the original image is used for prediction. Otherwise, the image is scaled down to this maximum before prediction. The default is det_max_side_len=960. If the input image has a relatively high resolution and you want to predict at a larger resolution, you can execute the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_max_side_len=1200
```

If you want to use the CPU for prediction, execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
```

### 2. DB text detection model inference

First, convert the model saved during DB text detection training into an inference model.
As an example, take the model that uses the Resnet50_vd backbone network and was trained on the ICDAR2015 English dataset ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)). You can convert it with the following command:

```
# Set the yml configuration file of the training algorithm after -c
# The Global.checkpoints parameter sets the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt or .pdparams.
# The Global.save_inference_dir parameter sets the path where the converted model will be saved.

python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.checkpoints="./models/det_r50_vd_db/best_accuracy" Global.save_inference_dir="./inference/det_db"
```

For DB text detection model inference, you can execute the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"
```

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of each result file is prefixed with 'det_res'. Examples of results are as follows:

![](imgs_results/det_res_img_10_db.jpg)

**Note**: The ICDAR2015 dataset has only 1,000 training images, mainly of English scenes, so the above model detects text in Chinese images very poorly.

### 3. EAST text detection model inference

First, convert the model saved during EAST text detection training into an inference model. As an example, take the model that uses the Resnet50_vd backbone network and was trained on the ICDAR2015 English dataset ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)). You can convert it with the following command:

```
# Set the yml configuration file of the training algorithm after -c
# The Global.checkpoints parameter sets the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt or .pdparams.
# The Global.save_inference_dir parameter sets the path where the converted model will be saved.

python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east"
```

For EAST text detection model inference, set the parameter det_algorithm to EAST to specify the detection algorithm, and run the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST"
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the name of each result file is prefixed with 'det_res'. Examples of results are as follows:

![](imgs_results/det_res_img_10_east.jpg)

**Note**: This codebase uses a Python implementation of NMS in EAST post-processing, so prediction is quite slow. A C++ implementation would give a significant speedup.


## Text recognition model inference

The following sections introduce inference with the ultra-lightweight Chinese recognition model and with recognition models based on CTC loss. **Inference for recognition models based on Attention loss is still being debugged**. For Chinese text recognition, we recommend a recognition model based on CTC loss. In practice, models based on Attention loss have also performed worse than those based on CTC loss.


### 1. Ultra-lightweight Chinese recognition model inference

For ultra-lightweight Chinese recognition model inference, you can execute the following command:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/"
```

![](imgs_words/ch/word_4.jpg)

After executing the command, the prediction result (recognized text and score) for the above image is printed on the screen.

Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695]


### 2. Recognition model inference based on CTC loss

Taking STAR-Net as an example, we introduce inference for recognition models based on CTC loss. CRNN and Rosetta are used in a similar way, by setting the recognition algorithm parameter `rec_algorithm`.

First, convert the model saved during STAR-Net text recognition training into an inference model. As an example, take the model that uses the Resnet34_vd backbone network and was trained on MJSynth and SynthText, two synthetic English text recognition datasets ([model download address](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)). You can convert it as follows:

```
# Set the yml configuration file of the training algorithm after -c
# The Global.checkpoints parameter sets the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt or .pdparams.
# The Global.save_inference_dir parameter sets the path where the converted model will be saved.

python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.checkpoints="./models/rec_r34_vd_tps_bilstm_ctc/best_accuracy" Global.save_inference_dir="./inference/starnet"
```

For STAR-Net text recognition model inference, execute the following command:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
![](imgs_words_en/word_336.png)

After executing the command, the recognition result of the above image is as follows:

Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555]

**Note**: The above model follows the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process. Its training therefore differs from that of the ultra-lightweight Chinese recognition model in two aspects:

- Image resolution: the above model is trained at [3, 32, 100]. Our Chinese model is trained at [3, 32, 320] to ensure good recognition of long text. By default, the inference stage uses the Chinese model's training resolution, i.e., [3, 32, 320]. Therefore, when running inference with the above English model, you need to set the shape of the recognition image through the parameter `rec_image_shape`.

- Character list: the experiments in the DTRB paper cover only the 26 lowercase English letters and the 10 digits, 36 characters in total. All uppercase characters are converted to lowercase, and characters not in this list are ignored and treated as spaces. Therefore, no character dictionary file is used here. Instead, the dictionary is generated by the code below, and the parameter `rec_char_type` must be set to "en" during inference.

```python
# Excerpt: how the 36-character English set is built when rec_char_type="en"
# (10 digits followed by 26 lowercase letters)
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
```

## Text detection and recognition inference concatenation

### 1. Ultra-lightweight Chinese OCR model inference

When performing prediction, use the parameter `image_dir` to specify the path of a single image or an image folder, `det_model_dir` to specify the path of the detection inference model, and `rec_model_dir` to specify the path of the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.

```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"  --rec_model_dir="./inference/rec_crnn/"
```

After executing the command, the recognition result image is as follows:

![](imgs_results/2.jpg)

### 2. Other model inference

To try other detection or recognition algorithms, refer to the text detection and text recognition model inference sections above and update the corresponding configuration and model. The following command combines EAST text detection with STAR-Net text recognition:

```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```

After executing the command, the recognition result image is as follows:

![](imgs_results/img_10.jpg)

diff --git a/doc/installation_en.md b/doc/installation_en.md
new file mode 100644
index 00000000..05471c0c
--- /dev/null
+++ b/doc/installation_en.md
@@ -0,0 +1,79 @@
## Quick installation

PaddleOCR has been tested on glibc 2.23. You can also test other glibc versions, or install glibc 2.23 for the best compatibility.
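Before choosing between the Docker image and a native installation, you may want to check which glibc version your Python interpreter reports. The following minimal sketch uses the standard-library `platform.libc_ver()`. On systems that do not use glibc (macOS, for example), no glibc is reported and the sketch prints a message instead. The helper names and the `TESTED_GLIBC` constant are hypothetical.

```python
import platform
from typing import Optional, Tuple

TESTED_GLIBC = (2, 23)  # the glibc version this document reports as tested


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '2.23' into a comparable tuple (2, 23)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_version() -> Optional[Tuple[int, ...]]:
    """Return the glibc version this interpreter reports, or None if it is not glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return parse_version(version)


if __name__ == "__main__":
    found = glibc_version()
    if found is None:
        print("glibc not detected; consider using the Docker image")
    elif found == TESTED_GLIBC:
        print("glibc 2.23 detected: matches the tested environment")
    else:
        print("glibc", ".".join(map(str, found)), "detected: differs from the tested 2.23")
```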

PaddleOCR working environment:
- PaddlePaddle 1.7
- python3
- glibc 2.23

We recommend running PaddleOCR in the Docker image we provide. For how to use Docker, please refer to this [link](https://docs.docker.com/get-started/).

1. (Recommended) Prepare a Docker environment. The image is downloaded automatically the first time you use it, which may take a while.
```
# Switch to the working directory
cd /home/Projects
# The container only needs to be created on the first run; skip this command when you run it again
# Create a docker container named ppocr and map the current directory to the /paddle directory of the container

# If you want to use docker in a CPU environment, create the container with docker instead of nvidia-docker
sudo docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash
```
If you have CUDA 9 installed on your machine, run the following command to create the container:
```
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash
```
If you have CUDA 10 installed on your machine, run the following command to create the container:
```
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev /bin/bash
```
You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that fits your machine.
```
# Press Ctrl+P+Q to detach from the container; re-enter it with the following command:
sudo docker container exec -it ppocr /bin/bash
```

Note: If docker pull is too slow, you can download and load the docker image manually with the following steps.
The steps below use the CUDA 9 image as an example. To use the CUDA 10 image instead, replace cuda9 with cuda10:
```
# Download the compressed CUDA 9 docker image
wget https://paddleocr.bj.bcebos.com/docker/docker_pdocr_cuda9.tar.gz
# To reduce download time, the uploaded docker image is compressed and needs to be decompressed
tar zxf docker_pdocr_cuda9.tar.gz
# Load the image
docker load < docker_pdocr_cuda9.tar
# After completing the above steps, use docker images to check that the downloaded image has been loaded
docker images
# If docker images shows the following output, you can follow step 1 to create the docker environment.
hub.baidubce.com/paddlepaddle/paddle   latest-gpu-cuda9.0-cudnn7-dev    f56310dcc829
```

2. Install PaddlePaddle Fluid v1.7 (higher versions are not supported yet; adaptation work is in progress)
```
pip3 install --upgrade pip

# If you have CUDA 9 installed on your machine, run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple

# If you have CUDA 10 installed on your machine, run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
For other software version requirements, please refer to the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).


3. Clone the PaddleOCR repository
```
# Recommended
git clone https://github.com/PaddlePaddle/PaddleOCR

# If you cannot clone successfully due to network problems, you can also use the mirror hosted on Gitee:

git clone https://gitee.com/paddlepaddle/PaddleOCR

# Note: The Gitee mirror may not stay in sync with this GitHub project in real time; there might be a delay of 3-5 days. Please use the recommended method first.
```

4. Install third-party libraries
```
cd PaddleOCR
pip3 install -r requirments.txt
```

diff --git a/doc/recognition_en.md b/doc/recognition_en.md
new file mode 100644
index 00000000..a73aeec5
--- /dev/null
+++ b/doc/recognition_en.md
@@ -0,0 +1,221 @@
## Text recognition

### Data preparation


PaddleOCR supports two data formats: `LMDB`, used to train on public datasets and evaluate algorithms, and `general data`, used to train on your own data.

Please organize the dataset as follows:

The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory:

```
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
```


* Dataset download

If you do not have a dataset locally, you can download it from the official website of [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format datasets required for the benchmark.

* Use your own dataset:

If you want to use your own data for training, please organize it as follows.

- Training set

First put the training images in a single folder (train_images). Then use a txt file (rec_gt_train.txt) to store the image paths and labels.
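Each line of the label file pairs an image path with its label, separated by a tab as the note below requires. Here is a minimal sketch of reading such a file. The function name `read_rec_labels` is hypothetical. Splitting on the first tab only, so that labels may contain spaces, is an assumption about how the format is meant to be read and is not PaddleOCR's own reader.

```python
from typing import Iterable, List, Tuple


def read_rec_labels(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'image_path<TAB>label' lines into (image_path, label) pairs."""
    samples = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        path, sep, label = line.partition("\t")
        if not sep:
            # A line without a tab separator would break training
            raise ValueError(f"line {lineno}: no tab between image path and label")
        samples.append((path, label))
    return samples


if __name__ == "__main__":
    # In practice, iterate over a UTF-8 file such as ./train_data/ic15_data/rec_gt_train.txt
    example = ["train_data/train_0001.jpg\t简单可依赖\n"]
    print(read_rec_labels(example))
```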

* Note: by default, the image path and the image label are separated by \t. Using any other separator will cause a training error.

```
" Image file name                 Image annotation "

train_data/train_0001.jpg   简单可依赖
train_data/train_0002.jpg   用科技让复杂的世界更简单
```
PaddleOCR provides label files for training on the icdar2015 dataset, which can be downloaded as follows:

```
# Training set label
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test set label
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```

The final training set should have the following file structure:

```
|-train_data
    |-ic15_data
        |- rec_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

Like the training set, the test set needs a folder containing all of its images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:

```
|-train_data
    |-ic15_data
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Dictionary

Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that, during training, every character that appears can be mapped to a dictionary index.

Therefore, the dictionary must contain every character that you want to be recognized correctly. {word_dict_name}.txt must be written in the following format and saved with `utf-8` encoding:

```
l
d
a
d
r
n
```

In `word_dict.txt`, each line holds a single character, and the line order maps characters to numeric indexes. For example, "and" will be mapped to [2 5 1].

`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.

`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.

You can use them if needed.
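The character-to-index mapping can be sketched as follows. The sketch assumes zero-based line numbers, which is what the example mapping "and" to [2 5 1] implies for the six-line dictionary above. That dictionary lists `d` twice, and the sketch keeps the first occurrence. This is an assumption chosen to reproduce the example; a real dictionary should list each character once. The helper names `build_char_map` and `encode` are hypothetical. The sketch also reports label characters that are missing from the dictionary, since such characters cannot be learned.

```python
from typing import Dict, Iterable, List


def build_char_map(dict_lines: Iterable[str]) -> Dict[str, int]:
    """Map each character to the zero-based index of the line it appears on."""
    char_to_index: Dict[str, int] = {}
    for index, line in enumerate(dict_lines):
        char = line.rstrip("\r\n")
        char_to_index.setdefault(char, index)  # keep the first occurrence
    return char_to_index


def encode(text: str, char_to_index: Dict[str, int]) -> List[int]:
    """Encode a label as dictionary indexes; fail on characters not in the dictionary."""
    missing = sorted({c for c in text if c not in char_to_index})
    if missing:
        raise ValueError(f"characters not in dictionary: {missing}")
    return [char_to_index[c] for c in text]


if __name__ == "__main__":
    # In practice, read the lines of {word_dict_name}.txt opened with encoding="utf-8"
    chars = build_char_map(["l\n", "d\n", "a\n", "d\n", "r\n", "n\n"])
    print(encode("and", chars))  # [2, 5, 1]
```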
+ +To use a custom dictionary, modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. + +### Start training + +PaddleOCR provides training, evaluation, and prediction scripts. This section uses the CRNN recognition model as an example. + +First, download the pre-trained model, which can be fine-tuned on the icdar2015 data: + +``` +cd PaddleOCR/ +# Download the pre-trained MobileNetV3 model +wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar +# Decompress the model parameters +cd pretrain_models +tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar +# Return to the PaddleOCR root directory before training +cd .. +``` + +Start training: + +``` +# Add the current directory to PYTHONPATH +export PYTHONPATH=$PYTHONPATH:. +# Both single-card and multi-card GPU training are supported; specify the card IDs with CUDA_VISIBLE_DEVICES +export CUDA_VISIBLE_DEVICES=0,1,2,3 +# Train on the icdar15 English data +python3 tools/train.py -c configs/rec/rec_icdar15_train.yml +``` + +PaddleOCR supports alternating training and evaluation. You can set the evaluation frequency with `eval_batch_step` in `configs/rec/rec_icdar15_train.yml`. By default, the model is evaluated every 500 iterations, and the model with the best accuracy is saved to `output/rec_CRNN/best_accuracy`. + +If the evaluation set is large, evaluation will be time-consuming. In that case, it is recommended to evaluate less often or only after training. + +* Tip: You can use the `-c` parameter to select any of the model configurations under `configs/rec/` for training.
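The alternating schedule described above can be pictured with the following sketch. `train_step`, `evaluate`, and `save` are hypothetical callables standing in for PaddleOCR's actual training, evaluation, and checkpointing code, which this sketch does not reproduce. Only the control flow follows the description: evaluate every `eval_batch_step` iterations and keep the most accurate model as `best_accuracy`.

```python
def train_with_periodic_eval(num_iters, eval_batch_step, train_step, evaluate, save):
    """Run num_iters training steps, evaluating every eval_batch_step iterations.

    Returns the best accuracy seen and a list of (iteration, accuracy) pairs.
    """
    if eval_batch_step <= 0:
        raise ValueError("eval_batch_step must be positive")
    best_acc = float("-inf")
    history = []
    for it in range(1, num_iters + 1):
        train_step(it)
        if it % eval_batch_step == 0:
            acc = evaluate()
            history.append((it, acc))
            if acc > best_acc:  # only a strictly better model replaces the saved one
                best_acc = acc
                save("best_accuracy")
    return best_acc, history
```

The cost trade-off follows directly: over 10000 iterations, `eval_batch_step=500` triggers 20 evaluations, while `eval_batch_step=2000` triggers only 5. That is why a larger interval helps when the evaluation set is big.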
+The recognition algorithms supported by PaddleOCR are: + + +| Configuration file | Algorithm | Backbone | Transformation | Sequence modeling | Prediction | +| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | +| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | +| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | +| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | +| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | +| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc | +| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention | +| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc | +| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc | +| rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention | +| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | + +For training on Chinese data, `rec_chinese_lite_train.yml` is recommended. To try other algorithms on a Chinese dataset, modify the configuration file as follows, taking `rec_mv3_none_none_ctc.yml` as an example: +``` +Global: + ... + # Increase the image width in image_shape to fit long text + image_shape: [3, 32, 320] + ... + # Set the character type + character_type: ch + # Path to the dictionary; to use a custom dictionary, point this to your own file + character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt + ... + # Use the Chinese reader configuration + reader_yml: ./configs/rec/rec_chinese_reader.yml + ... + +... +``` +**Note that the configuration file used for prediction/evaluation must be consistent with the one used for training.** + + + +### Evaluation + +The evaluation dataset can be changed by setting `label_file_path` under `EvalReader` in `configs/rec/rec_icdar15_reader.yml`.
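Edits such as the `Global` changes in the YAML snippet, and `-o` options such as `Global.checkpoints=...` in the commands in this document, both come down to setting keys in a nested configuration. The following is a minimal sketch of applying dotted `Section.key=value` overrides to a nested dict. `apply_overrides` and `parse_value` are hypothetical helpers, not PaddleOCR's parser. Treating `true`/`false` as booleans, other values as Python literals where possible, and everything else as a plain string is an assumption of this sketch.

```python
import ast
import copy


def parse_value(text):
    """Interpret an override value: booleans, Python literals, else a plain string."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"  # accept YAML-style lowercase booleans
    try:
        return ast.literal_eval(text)  # numbers, lists such as [3, 32, 320], quoted strings
    except (ValueError, SyntaxError):
        return text  # e.g. ch or ./ppocr/utils/ppocr_keys_v1.txt


def apply_overrides(config, overrides):
    """Return a copy of config with each 'Section.key=value' override applied."""
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override {item!r} must look like Section.key=value")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections on the fly
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} in {key!r} is not a section")
        node[leaf] = parse_value(value)
    return result


if __name__ == "__main__":
    base = {"Global": {"character_type": "en", "image_shape": [3, 32, 100]}}
    print(apply_overrides(base, ["Global.character_type=ch", "Global.image_shape=[3, 32, 320]"]))
```

Copying the config before applying overrides keeps the training configuration intact, so the same base file can be reused for evaluation and prediction. This matters because those steps must stay consistent with training.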
+ +``` +export CUDA_VISIBLE_DEVICES=0 +# GPU evaluation; Global.checkpoints points to the weights to be evaluated +python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy +``` + +### Prediction + +* Training engine prediction + +With a model trained by PaddleOCR, you can quickly get predictions using the following script. + +The image to predict is specified via `TestReader.infer_img`, and the weights via `-o Global.checkpoints`: + +``` +# Predict English results +python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.png +``` + +Input image: + +![](./imgs_words/en/word_1.png) + +Prediction result for the input image: + +``` +infer_img: doc/imgs_words/en/word_1.png + index: [19 24 18 23 29] + word : joint +``` + +The configuration file used for prediction must be consistent with the one used for training. For example, if you trained the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, use the following command to predict with it: + +``` +# Predict Chinese results +python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg +``` + +Input image: + +![](./imgs_words/ch/word_1.jpg) + +Prediction result for the input image: + +``` +infer_img: doc/imgs_words/ch/word_1.jpg + index: [2092 177 312 2503] + word : 韩国小馆 +```
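The `index` line in the English result can be reproduced by hand. The following is a sketch of greedy CTC decoding (merge consecutive repeats, then drop blanks) followed by dictionary lookup, for the `ctc` prediction head listed in the algorithm table. Two assumptions are not confirmed by this document: that `ic15_dict.txt` holds the 36 characters `0-9a-z` in that order, and that the CTC blank is the extra class with index `len(dictionary)`. The per-frame input sequence is also hypothetical.

```python
import string

IC15_CHARS = string.digits + string.ascii_lowercase  # assumed ic15_dict.txt order
BLANK = len(IC15_CHARS)  # assumed blank index (36)


def ctc_greedy_collapse(frame_ids, blank=BLANK):
    """Merge consecutive repeated indices, then drop blank indices."""
    out = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx  # a blank between two equal indices keeps both
    return out


def indices_to_text(indices, chars=IC15_CHARS):
    """Look up each index in the dictionary."""
    return "".join(chars[i] for i in indices)


if __name__ == "__main__":
    frames = [19, 19, 36, 24, 18, 18, 36, 23, 29, 29]  # hypothetical per-frame argmax
    idx = ctc_greedy_collapse(frames)
    print(idx, indices_to_text(idx))  # [19, 24, 18, 23, 29] joint
```

Under this ordering, `[19 24 18 23 29]` spells `joint`, which matches the English output shown above. This consistency is why the English example pairs with the icdar15 configuration rather than the Chinese dictionary.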