PaddleOCR/doc/doc_en/angle_class_en.md

4.0 KiB

TEXT ANGLE CLASSIFICATION

DATA PREPARATION

Please organize the dataset as follows:

The default storage path for training data is PaddleOCR/train_data/cls, if you already have a dataset on your disk, just create a soft link to the dataset directory:

ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset

please refer to the following to organize your data.

  • Training set

First put the training images in the same folder (train_images), and use a txt file (cls_gt_train.txt) to store the image path and label.

  • Note: by default, the image path and image label are split with \t, if you use other methods to split, it will cause training error

0 and 180 indicate that the angle of the image is 0 degrees and 180 degrees, respectively.

" Image file name           Image annotation "

train_data/word_001.jpg   0
train_data/word_002.jpg   180

The final training set should have the following file structure:

|-train_data
    |-cls
        |- cls_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
  • Test set

Similar to the training set, the test set also needs to be provided a folder containing all images (test) and a cls_gt_test.txt. The structure of the test set is as follows:

|-train_data
    |-cls
        |- cls_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...

TRAINING

PaddleOCR provides training scripts, evaluation scripts, and prediction scripts.

Start training:

# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
# GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Training icdar15 English data
python3 tools/train.py -c configs/cls/cls_mv3.yml
  • Data Augmentation

PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set distort: true in the configuration file.

The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, random crop, perspective, color reverse, RandAugment.

Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: randaugment.py img_tools.py

  • Training

PaddleOCR supports alternating training and evaluation. You can modify eval_batch_step in configs/cls/cls_mv3.yml to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under output/cls_mv3/best_accuracy during the evaluation process.

If the evaluation set is large, the test will be time-consuming. It is recommended to reduce the number of evaluations, or evaluate after training.

Note that the configuration file for prediction/evaluation must be consistent with the training.

EVALUATION

The evaluation data set can be modified via configs/cls/cls_reader.yml setting of label_file_path in EvalReader.

export CUDA_VISIBLE_DEVICES=0
# GPU evaluation, Global.checkpoints is the weight to be tested
python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy

PREDICTION

  • Training engine prediction

Using the model trained by paddleocr, you can quickly get prediction through the following script.

The default prediction picture is stored in infer_img, and the weight is specified via -o Global.checkpoints:

# Predict English results
python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png

Input image:

Get the prediction result of the input image:

infer_img: doc/imgs_words/en/word_1.png
    scores: [[0.93161047 0.06838956]]
    label: [0]