English | [简体中文](README_ch.md)
# Getting Started
[1. Install whl package](#Install)
[2. Quick Start](#QuickStart)
[3. PostProcess](#PostProcess)
[4. Results](#Results)
[5. Training](#Training)
<a name="Install"></a>
## 1. Install whl package
```bash
wget https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install -U layoutparser-0.0.0-py3-none-any.whl
```
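
To check that the wheel installed correctly, a minimal sanity check (assuming the package is importable as `layoutparser`, as in the examples below) is:

```python
# minimal post-install check: the Paddle-backed layout model used below should be importable
import layoutparser as lp

print(hasattr(lp, "PaddleDetectionLayoutModel"))  # expected: True
```
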
<a name="QuickStart"></a>
## 2. Quick Start
Use LayoutParser to identify the layout of a document:
```python
import cv2
import layoutparser as lp

# read the image and convert it from BGR (OpenCV default) to RGB
image = cv2.imread("doc/table/layout.jpg")
image = image[..., ::-1]

# load model
model = lp.PaddleDetectionLayoutModel(config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
                                      threshold=0.5,
                                      label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
                                      enforce_cpu=False,
                                      enable_mkldnn=True)
# detect
layout = model.detect(image)

# show result
show_img = lp.draw_box(image, layout, box_width=3, show_element_type=True)
show_img.show()
```

The following figure shows the result. Detection boxes of different colors represent different categories, and with `show_element_type=True` the category name is displayed in the upper-left corner of each box.

<div align="center">
<img src="../../doc/table/result_all.jpg" width = "600" />
</div>
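
`lp.draw_box` returns a PIL-style image, so the visualization can also be written to disk instead of only displayed, and the raw detections can be inspected directly. A short sketch, reusing `layout` and `show_img` from the code above (the output filename is only an example):

```python
# save the visualization to disk (example path)
show_img.save("layout_result.jpg")

# inspect the raw detections: category, confidence score and bounding-box coordinates
for block in layout:
    print(block.type, block.score, block.coordinates)
```
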
The parameters of `PaddleDetectionLayoutModel` are described as follows:

| parameter | description | default | remark |
| :------------: | :------------------------------------------------------: | :---------: | :----------------------------------------------------------: |
| config_path | model config path | None | If config_path is specified, the model is downloaded automatically (only on the first run; after that the cached model is reused) |
| model_path | local model path | None | config_path and model_path cannot both be None; at least one of them must be set |
| threshold | threshold of the prediction score | 0.5 | \ |
| input_shape | image size after resizing | [3,640,640] | \ |
| batch_size | inference batch size | 1 | \ |
| label_map | category mapping table | None | If config_path is set, label_map can be None; the mapping is then derived automatically from the dataset name |
| enforce_cpu | whether to force CPU inference | False | False to use GPU, True to force CPU |
| enable_mkldnn | whether to enable MKL-DNN acceleration for CPU inference | True | \ |
| thread_num | the number of CPU threads | 10 | \ |

The following model configs and label maps are currently supported; you can detect different types of content by changing `config_path` and `label_map` (see the example after the table below):

| dataset | config_path | label_map |
| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------- |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) word | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | {0:"Table"} |
| TableBank latex | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_latex/config | {0:"Table"} |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |

* The TableBank word and TableBank latex models are trained on the Word-document and LaTeX-document subsets of TableBank respectively;
* The downloadable TableBank dataset contains both the Word and LaTeX parts.
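
For example, to detect only tables with the TableBank word model, you can switch to the config and label map listed in the table above. A minimal sketch, reusing the image-loading code from the Quick Start:

```python
import cv2
import layoutparser as lp

image = cv2.imread("doc/table/layout.jpg")
image = image[..., ::-1]

# TableBank word config: only the "Table" category is detected
table_model = lp.PaddleDetectionLayoutModel(
    config_path="lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config",
    threshold=0.5,
    label_map={0: "Table"},
    enforce_cpu=False,
    enable_mkldnn=True)

table_layout = table_model.detect(image)
show_img = lp.draw_box(image, table_layout, box_width=3, show_element_type=True)
show_img.show()
```
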
<a name="PostProcess"></a>
## 3. PostProcess
The layout result contains multiple categories. If you only want the detection boxes of a specific category (such as the "Text" category), you can use the following code:
```python
# continue from the code above
# keep only the blocks of a specific type ("Text")
text_blocks = lp.Layout([b for b in layout if b.type=='Text'])
figure_blocks = lp.Layout([b for b in layout if b.type=='Figure'])

# text regions may also be detected inside figure regions; drop those
text_blocks = lp.Layout([b for b in text_blocks
                         if not any(b.is_in(b_fig) for b_fig in figure_blocks)])

# sort the text regions and assign IDs
h, w = image.shape[:2]
left_interval = lp.Interval(0, w/2*1.05, axis='x').put_on_canvas(image)

left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key=lambda b: b.coordinates[1])

right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key=lambda b: b.coordinates[1])

# merge the two lists and number the blocks in reading order
text_blocks = lp.Layout([b.set(id=idx) for idx, b in enumerate(left_blocks + right_blocks)])

# display result
show_img = lp.draw_box(image, text_blocks,
                       box_width=3,
                       show_element_id=True)
show_img.show()
```

The following figure shows the result with only the "Text" category displayed:

<div align="center">
<img src="../../doc/table/result_text.jpg" width = "600" />
</div>
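
If the sorted text regions are to be passed to a downstream OCR step, one straightforward option is to crop each block from the original image using its coordinates. A minimal sketch, reusing `image` and `text_blocks` from the code above (the `text_crops` output directory is just an example):

```python
import os

import cv2
import numpy as np

# save each sorted "Text" block as a separate crop, named by its reading-order ID
os.makedirs("text_crops", exist_ok=True)
for block in text_blocks:
    x1, y1, x2, y2 = map(int, block.coordinates)
    # `image` was converted to RGB earlier; flip back to BGR and make the crop contiguous for cv2.imwrite
    crop_bgr = np.ascontiguousarray(image[y1:y2, x1:x2, ::-1])
    cv2.imwrite(os.path.join("text_crops", f"text_{block.id}.jpg"), crop_bgr)
```
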
<a name="Results"></a>
## 4. Results

| Dataset | mAP | CPU inference time | GPU inference time |
| --------- | ---- | ------------- | ------------- |
| PubLayNet | 93.6 | 1713.7ms | 66.6ms |
| TableBank | 96.2 | 1968.4ms | 65.1ms |

**Environment**

**CPU** Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores

**GPU** a single NVIDIA Tesla P40
<a name="Training"></a>
## 5. Training
The above models are based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). If you want to train your own layout parser model, please refer to [train_layoutparser_model](train_layoutparser_model.md).