English | [简体中文](README_ch.md)
# Getting Started
[1. Install whl package](#Install)
[2. Quick Start](#QuickStart)
[3. PostProcess](#PostProcess)
[4. Results](#Results)
[5. Training](#Training)
<a name="Install"></a>
## 1. Install whl package
```bash
wget https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install -U layoutparser-0.0.0-py3-none-any.whl
```
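
To check that the wheel installed correctly, a minimal sanity check (assuming the package is importable as `layoutparser`, as in the examples below) is:

```python
# minimal post-install check: the Paddle-backed layout model used below should be importable
import layoutparser as lp

print(hasattr(lp, "PaddleDetectionLayoutModel"))  # expected: True
```
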
<a name="QuickStart"></a>
## 2. Quick Start
Use LayoutParser to identify the layout of a document:
```python
import cv2
import layoutparser as lp

# read the image and convert it from BGR (OpenCV default) to RGB
image = cv2.imread("doc/table/layout.jpg")
image = image[..., ::-1]

# load model
model = lp.PaddleDetectionLayoutModel(config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
                                      threshold=0.5,
                                      label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
                                      enforce_cpu=False,
                                      enable_mkldnn=True)
# detect
layout = model.detect(image)

# show result
show_img = lp.draw_box(image, layout, box_width=3, show_element_type=True)
show_img.show()
```

The following figure shows the result. Detection boxes of different colors represent different categories, and with `show_element_type=True` the category name is displayed in the upper-left corner of each box.

<div align="center">
<img src="../../doc/table/result_all.jpg" width = "600" />
</div>
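
`lp.draw_box` returns a PIL-style image, so the visualization can also be written to disk instead of only displayed, and the raw detections can be inspected directly. A short sketch, reusing `layout` and `show_img` from the code above (the output filename is only an example):

```python
# save the visualization to disk (example path)
show_img.save("layout_result.jpg")

# inspect the raw detections: category, confidence score and bounding-box coordinates
for block in layout:
    print(block.type, block.score, block.coordinates)
```
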
The parameters of `PaddleDetectionLayoutModel` are described as follows:

| parameter | description | default | remark |
| :------------: | :------------------------------------------------------: | :---------: | :----------------------------------------------------------: |
| config_path | model config path | None | If config_path is specified, the model is downloaded automatically (only on the first run; after that the cached model is reused) |
| model_path | local model path | None | config_path and model_path cannot both be None; at least one of them must be set |
| threshold | threshold of the prediction score | 0.5 | \ |
| input_shape | image size after resizing | [3,640,640] | \ |
| batch_size | inference batch size | 1 | \ |
| label_map | category mapping table | None | If config_path is set, label_map can be None; the mapping is then derived automatically from the dataset name |
| enforce_cpu | whether to force CPU inference | False | False to use GPU, True to force CPU |
| enable_mkldnn | whether to enable MKL-DNN acceleration for CPU inference | True | \ |
| thread_num | the number of CPU threads | 10 | \ |

The following model configs and label maps are currently supported; you can detect different types of content by changing `config_path` and `label_map` (see the example after the table below):

| dataset | config_path | label_map |
| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------- |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) word | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | {0:"Table"} |
| TableBank latex | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_latex/config | {0:"Table"} |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |

* The TableBank word and TableBank latex models are trained on the Word-document and LaTeX-document subsets of TableBank respectively;
* The downloadable TableBank dataset contains both the Word and LaTeX parts.
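
For example, to detect only tables with the TableBank word model, you can switch to the config and label map listed in the table above. A minimal sketch, reusing the image-loading code from the Quick Start:

```python
import cv2
import layoutparser as lp

image = cv2.imread("doc/table/layout.jpg")
image = image[..., ::-1]

# TableBank word config: only the "Table" category is detected
table_model = lp.PaddleDetectionLayoutModel(
    config_path="lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config",
    threshold=0.5,
    label_map={0: "Table"},
    enforce_cpu=False,
    enable_mkldnn=True)

table_layout = table_model.detect(image)
show_img = lp.draw_box(image, table_layout, box_width=3, show_element_type=True)
show_img.show()
```
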
<a name="PostProcess"></a>
## 3. PostProcess
The layout result contains multiple categories. If you only want the detection boxes of a specific category (such as the "Text" category), you can use the following code:
```python
# continue from the code above
# keep only the blocks of a specific type ("Text")
text_blocks = lp.Layout([b for b in layout if b.type=='Text'])
figure_blocks = lp.Layout([b for b in layout if b.type=='Figure'])

# text regions may also be detected inside figure regions; drop those
text_blocks = lp.Layout([b for b in text_blocks
                         if not any(b.is_in(b_fig) for b_fig in figure_blocks)])

# sort the text regions and assign IDs
h, w = image.shape[:2]
left_interval = lp.Interval(0, w/2*1.05, axis='x').put_on_canvas(image)

left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key=lambda b: b.coordinates[1])

right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key=lambda b: b.coordinates[1])

# merge the two lists and number the blocks in reading order
text_blocks = lp.Layout([b.set(id=idx) for idx, b in enumerate(left_blocks + right_blocks)])

# display result
show_img = lp.draw_box(image, text_blocks,
                       box_width=3,
                       show_element_id=True)
show_img.show()
```

The following figure shows the result with only the "Text" category displayed:

<div align="center">
<img src="../../doc/table/result_text.jpg" width = "600" />
</div>
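
If the sorted text regions are to be passed to a downstream OCR step, one straightforward option is to crop each block from the original image using its coordinates. A minimal sketch, reusing `image` and `text_blocks` from the code above (the `text_crops` output directory is just an example):

```python
import os

import cv2
import numpy as np

# save each sorted "Text" block as a separate crop, named by its reading-order ID
os.makedirs("text_crops", exist_ok=True)
for block in text_blocks:
    x1, y1, x2, y2 = map(int, block.coordinates)
    # `image` was converted to RGB earlier; flip back to BGR and make the crop contiguous for cv2.imwrite
    crop_bgr = np.ascontiguousarray(image[y1:y2, x1:x2, ::-1])
    cv2.imwrite(os.path.join("text_crops", f"text_{block.id}.jpg"), crop_bgr)
```
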
<a name="Results"></a>
## 4. Results

| Dataset | mAP | CPU inference time | GPU inference time |
| --------- | ---- | ------------- | ------------- |
| PubLayNet | 93.6 | 1713.7ms | 66.6ms |
| TableBank | 96.2 | 1968.4ms | 65.1ms |

**Environment**

**CPU** Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores

**GPU** a single NVIDIA Tesla P40
<a name="Training"></a>
## 5. Training
The above models are based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). If you want to train your own layout parser model, please refer to [train_layoutparser_model](train_layoutparser_model.md).