# PPStructure

PPStructure is an OCR toolkit for complex layout analysis. It divides document images into five types of regions (**text, table, title, picture and list**) and exports table regions as Excel files.

## 1. Quick start
### 1.1 Install

**Install paddleocr**

Refer to the [paddleocr whl doc](../doc/doc_en/whl_en.md).

**Install layoutparser**
```sh
pip3 install -U premailer https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```
### 1.2 Use
#### 1.2.1 Use by command line
```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```
#### 1.2.2 Use by code
```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)

# Save the Excel files and cropped figure regions under save_folder/<image name>
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Print each region without its cropped image array
for line in result:
    line.pop('img')
    print(line)

# Draw the layout and recognition results on the original image
font_path = '../doc/fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

#### 1.2.3 Return result description

PPStructure returns a list of dicts, one per detected region. An example is as follows:

```python
[
    {
        'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)]
        )
    }
]
```

The fields of each dict are described below:

| Field | Description |
| --------------- | -------------|
|type|Type of the image region|
|bbox|Coordinates of the region in the original image, as [top-left x, top-left y, bottom-right x, bottom-right y]|
|res|OCR or table recognition result of the region.<br> Table: HTML string of the table; <br> OCR: a tuple containing the detection coordinates and recognition results of each line of text|
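
As a minimal sketch of how these fields can be consumed, the helper below walks a result list shaped like the example above. `summarize_regions` is a hypothetical name, not part of paddleocr, and the sample data is made up for illustration. The sketch assumes that a table region has `type` equal to `'Table'` with its HTML string in `res`, and that every other region holds the `(coordinates, texts)` tuple.

```python
def summarize_regions(result):
    """Return one (type, bbox, text) tuple per region.

    Hypothetical helper, not part of paddleocr. Assumes table regions
    use type 'Table' with an HTML string in 'res', and other regions
    hold a (boxes, [(text, score), ...]) tuple in 'res'.
    """
    summary = []
    for region in result:
        res = region['res']
        if region['type'] == 'Table':
            text = res  # HTML string of the table
        else:
            _boxes, texts = res
            # Join the recognized lines, ignoring confidence scores
            text = ' '.join(t for t, _score in texts)
        summary.append((region['type'], region['bbox'], text))
    return summary


# Illustrative data only, shaped like the example result above
sample = [
    {
        'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0]],
            [('first line', 0.9), ('second line', 0.8)],
        ),
    },
    {'type': 'Table', 'bbox': [0, 0, 10, 10], 'res': '<table></table>'},
]

for region_type, bbox, text in summarize_regions(sample):
    print(region_type, bbox, text)
```

Note that the code example in 1.2.2 pops the `img` key before printing; this helper never reads that key, so it works either way.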
#### 1.2.4 Parameter Description
| Parameter | Description | Default value |
| --------------- | ---------------------------------------- | ------------------------------------------- |
| output | Path where the Excel files and recognition results are saved | ./output/table |
| table_max_len | Length to which the long side of the image is resized for the table structure model | 488 |
| table_model_dir | Inference model path of the table structure model | None |
| table_char_dict_path | Dict path of the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |

Most of the parameters are consistent with the paddleocr whl package; see the [doc of whl](../doc/doc_en/whl_en.md).

After running, each image gets a directory of the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file and each figure region is cropped and saved as an image; the files are named after the coordinates of the region in the image.
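
To check what was written for a given image, a small sketch like the following can list the saved files grouped by extension. `list_outputs` is a hypothetical helper, not part of paddleocr. It assumes only the directory layout described above and derives the directory name the same way as the code example in 1.2.2.

```python
import os


def list_outputs(output_dir, img_path):
    """Group the files saved for one image by file extension.

    Hypothetical helper. Assumes results live in
    <output_dir>/<image name without extension>/ as described above.
    """
    image_name = os.path.basename(img_path).split('.')[0]
    image_dir = os.path.join(output_dir, image_name)
    grouped = {}
    if not os.path.isdir(image_dir):
        return grouped
    for name in sorted(os.listdir(image_dir)):
        if os.path.isfile(os.path.join(image_dir, name)):
            ext = os.path.splitext(name)[1].lower()
            grouped.setdefault(ext, []).append(name)
    return grouped


# Returns an empty dict if nothing has been saved yet
print(list_outputs('./output/table', '../doc/table/1.png'))
```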
## 2. PPStructure Pipeline

The process is as follows:

![pipeline](../doc/table/pipeline_en.jpg)

In PPStructure, the image is first analyzed by layoutparser. Layout analysis classifies the regions of the image into five categories: **text, title, image, list and table**. For the first four categories, PP-OCR is used directly for text detection and recognition. Table regions are converted by Table OCR into Excel files with the same table style.
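
The routing described above can be sketched as follows. This is not the PPStructure implementation: `run_pipeline`, `ocr_fn` and `table_fn` are hypothetical stand-ins for the layout analysis output, PP-OCR and Table OCR, and the stub recognizers only exist so the sketch runs without any model.

```python
def run_pipeline(regions, ocr_fn, table_fn):
    """Route each layout region to OCR or table recognition.

    Sketch only: `regions`, `ocr_fn` and `table_fn` are hypothetical
    stand-ins for the layout analysis output, PP-OCR and Table OCR.
    """
    results = []
    for region in regions:
        # Table regions go to table recognition, everything else to OCR
        handler = table_fn if region['type'].lower() == 'table' else ocr_fn
        results.append({
            'type': region['type'],
            'bbox': region['bbox'],
            'res': handler(region['img']),
        })
    return results


# Stub inputs and recognizers, made up for illustration
regions = [
    {'type': 'Title', 'bbox': [0, 0, 100, 20], 'img': 'title-crop'},
    {'type': 'Table', 'bbox': [0, 30, 100, 90], 'img': 'table-crop'},
]
out = run_pipeline(
    regions,
    ocr_fn=lambda img: 'ocr(' + img + ')',
    table_fn=lambda img: 'table(' + img + ')',
)
for item in out:
    print(item['type'], item['res'])
```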
### 2.1 LayoutParser
Layout analysis divides the document into regions. The layout documentation covers using the layout analysis tools from Python scripts, extracting detection boxes of specific categories, performance indicators, and training custom layout analysis models. For details, please refer to the [document](layout/README_en.md).
### 2.2 Table Structure
Table OCR converts table images into Excel documents. This includes the detection and recognition of table text and the prediction of table structure and cell coordinates. For details, please refer to the [document](table/README.md).
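
Section 1.2.3 notes that the result of a table region is an HTML string. As a rough sketch of working with such a string outside Excel, the code below turns it into a list of rows using only the Python standard library. `TableRows` and `html_table_to_rows` are hypothetical names, the sample HTML is made up, and merged cells (`rowspan`, `colspan`) are ignored.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of an HTML table row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being parsed
        self._cell = None  # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up HTML for illustration; real output may differ in structure
html = ('<html><body><table><tr><td>Model</td><td>Size</td></tr>'
        '<tr><td>table_structure</td><td>18.6M</td></tr></table></body></html>')
print(html_table_to_rows(html))
```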

## 3. Prediction by inference engine

Use the following commands to complete the inference.

```bash
cd PaddleOCR/ppstructure
# download model
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the English table structure model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
python3 table/predict_system.py \
    --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer \
    --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer \
    --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --rec_char_type=ch \
    --det_limit_side_len=736 \
    --det_limit_type=min \
    --output=../output/table \
    --vis_font_path=../doc/fonts/simfang.ttf
```

After running, each image gets a directory of the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file and each figure region is cropped and saved as an image; the files are named after the coordinates of the region in the image.

**Model List**

|Model name|Description|Config|Model size|Download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |