Merge remote-tracking branch 'upstream/dygraph' into dy3

PPOCRLabel/README.md
@@ -1,21 +1,27 @@
English | [简体中文](README_ch.md)

# PPOCRLabel

PPOCRLabel is a semi-automatic graphic annotation tool suitable for the OCR field. It is written in Python 3 and PyQt5, supporting rectangular box annotation and four-point annotation modes. Annotations can be directly used for the training of PPOCR detection and recognition models.

<img src="./data/gif/steps_en.gif" width="100%"/>
## Installation

### 1. Install PaddleOCR

Refer to the [PaddleOCR installation document](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/installation.md) to prepare PaddleOCR.

### 2. Install PPOCRLabel

#### Windows + Anaconda

Download and install [Anaconda](https://www.anaconda.com/download/#download) (Python 3+).

```
pip install pyqt5
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python PPOCRLabel.py
```

#### Ubuntu Linux
@@ -23,78 +29,97 @@ python PPOCRLabel.py --lang ch

```
pip3 install pyqt5
pip3 install trash-cli
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python3 PPOCRLabel.py
```

#### macOS

```
pip3 install pyqt5
pip3 uninstall opencv-python # Uninstall opencv manually as it conflicts with pyqt
pip3 install opencv-contrib-python-headless # Install the headless version of opencv
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python3 PPOCRLabel.py
```
## Usage

### Steps

1. Build and launch using the instructions above.

2. Click 'Open Dir' in Menu/File to select the folder of the pictures.<sup>[1]</sup>

3. Click 'Auto recognition' to let the PPOCR model automatically annotate the images whose status<sup>[2]</sup> before the file name is 'X'.

4. Create Box:

   4.1 Click 'Create RectBox' or press 'W' in English keyboard mode to draw a new rectangle detection box. Click and release the left mouse button to select a region to annotate the text area.

   4.2 Press 'P' to enter four-point labeling mode, which enables you to create any four-point shape by clicking four points with the left mouse button in succession, DOUBLE CLICKING the left button to signal labeling completion.

5. After the marking frame is drawn, click "OK", and the detection frame will be pre-assigned a "TEMPORARY" label.

6. Click 're-Recognition': the model will rewrite ALL recognition results in ALL detection boxes<sup>[3]</sup>.

7. Double click the result in the 'recognition result' list to manually change inaccurate recognition results.

8. Click "Check": the image status switches to "√" and the program automatically jumps to the next image (the results will not be written directly to the file at this time).

9. Click "Delete Image" and the image will be deleted to the recycle bin.

10. Labeling result: you can save manually through the menu "File - Save Label", and the program also saves automatically after every 10 images confirmed by the user. Manually checked labels are stored in *Label.txt* under the opened picture folder. Clicking "PaddleOCR" - "Save Recognition Results" in the menu bar saves the recognition training data of such pictures in the *crop_img* folder, with the recognition labels saved in *rec_gt.txt*<sup>[4]</sup>.

### Note

[1] PPOCRLabel uses the opened folder as the project. After opening the image folder, the pictures are not displayed in a dialog; instead, the pictures under the folder are directly imported into the program after clicking "Open Dir".

[2] The image status indicates whether the user has saved the image manually: "X" if it has not been saved manually, "√" otherwise. PPOCRLabel will not relabel pictures with a status of "√".

[3] After clicking "Re-recognize", the model will overwrite ALL recognition results in the picture. Therefore, if the recognition results have been manually changed before, they may change after re-recognition.

[4] The files produced by PPOCRLabel are placed under the opened picture folder and include the following. Please do not change their contents manually, as doing so will cause the program to behave abnormally.
| File name | Description |
| :-----------: | :----------------------------------------------------------: |
| Label.txt | The detection label file, which can be directly used for PPOCR detection model training. The file is written automatically after the user saves 10 label results, and also when the user closes the application or changes the file folder. |
| fileState.txt | The picture status file, which saves the names of the images in the current folder that have been manually confirmed by the user. |
| Cache.cach | Cache file, which saves the results of automatic model recognition. |
| rec_gt.txt | The recognition label file, which can be directly used for PPOCR recognition model training. It is generated after the user clicks "File" - "Save recognition result" in the menu bar. |
| crop_img | The recognition data: images cropped according to the detection boxes, generated at the same time as *rec_gt.txt*. |
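
Both label files follow the PPOCR plain-text convention of one tab-separated record per line. The sketch below shows one way to read them back; it assumes the usual `<image path>\t<JSON box list>` layout for *Label.txt* and `<crop path>\t<text>` for *rec_gt.txt*, so double-check against the files your version actually writes.

```python
import json

def load_det_labels(label_path="Label.txt"):
    """Yield (image_path, boxes) pairs from a PPOCRLabel detection label file."""
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            image_path, annotation = line.rstrip("\n").split("\t", 1)
            # Each record is a JSON list of {"transcription": ..., "points": ...} dicts.
            yield image_path, json.loads(annotation)

def load_rec_labels(gt_path="rec_gt.txt"):
    """Yield (crop_image_path, text) pairs from a recognition label file."""
    with open(gt_path, encoding="utf-8") as f:
        for line in f:
            crop_path, text = line.rstrip("\n").split("\t", 1)
            yield crop_path, text
```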

## Explanation

### Built-in Model

- Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, which supports Chinese, English and number recognition as well as multi-language detection.

- Model language switching: the built-in model language can be changed by clicking "PaddleOCR" - "Choose OCR Model" in the menu bar. Currently supported languages include French, German, Korean, and Japanese. For specific model download links, please refer to the [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md#multilingual-recognition-modelupdating).

- Custom model: a model trained by the user can be swapped in by modifying the [PaddleOCR class instantiation](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110) in PPOCRLabel.py, referring to [Custom Model Code](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md#use-custom-model). A rough sketch follows.
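
As an illustration, the instantiation being modified looks roughly like the call below. The keyword arguments are from the `paddleocr` whl interface; the model directories are placeholder paths for your own exported inference models.

```python
from paddleocr import PaddleOCR

# Placeholder paths: point these at your own trained/exported models.
ocr = PaddleOCR(
    det_model_dir="./my_models/det_infer",
    rec_model_dir="./my_models/rec_infer",
    use_angle_cls=True,
    lang="ch",
)
```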

### Export partial recognition results

For data that are difficult to recognize, the recognition results will not be exported if the corresponding tags are **unchecked** in the recognition results checkbox.

*Note: The status of the checkboxes in the recognition results still needs to be saved manually by clicking the Save button.*

### Error message

- If paddleocr is also installed with whl, it takes priority over calling the PaddleOCR class through paddleocr.py, which may cause an exception if the whl package is not updated.

- PPOCRLabel **does not support automatic annotation** of images with Chinese file names.

- For Linux users: if you get an error starting with **objc[XXXXX]** when opening the software, your opencv version is too high. It is recommended to install version 4.2:

```
pip install opencv-python==4.2.0.32
```

- If you get an error starting with **Missing string id**, you need to recompile the resources:

```
pyrcc5 -o libs/resources.py resources.qrc
```

### Related

1. [Tzutalin. LabelImg. Git code (2015)](https://github.com/tzutalin/labelImg)

PPOCRLabel/README_ch.md
@@ -0,0 +1,102 @@
[English](README.md) | 简体中文

# PPOCRLabel

PPOCRLabel is a semi-automatic graphic annotation tool suitable for the OCR field. It is written in Python 3 and PyQt5, supporting rectangular box annotation and four-point annotation modes. The exported format can be directly used for the training of PPOCR detection and recognition models.

<img src="./data/gif/steps.gif" width="100%"/>

## Installation

### 1. Install PaddleOCR

Refer to the [PaddleOCR installation document](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/installation.md) to prepare PaddleOCR.

### 2. Install PPOCRLabel

#### Windows + Anaconda

```
pip install pyqt5
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python PPOCRLabel.py --lang ch
```

#### Ubuntu Linux

```
pip3 install pyqt5
pip3 install trash-cli
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python3 PPOCRLabel.py --lang ch
```

#### macOS

```
pip3 install pyqt5
pip3 uninstall opencv-python # The macOS build of opencv conflicts with pyqt, so uninstall opencv manually first
pip3 install opencv-contrib-python-headless # Install the headless version of opencv
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python3 PPOCRLabel.py --lang ch
```

## Usage

### Steps

1. Install and launch: install and run the program with the commands above.
2. Open folder: click "File" - "Open Dir" in the menu bar and select the folder of the pictures to be labeled<sup>[1]</sup>.
3. Auto annotation: click "Auto Annotation" to annotate, with the PPOCR ultra-lightweight model, the pictures whose status<sup>[2]</sup> before the file name is "X".
4. Manual annotation: click "Create RectBox" (or simply press "W" in English keyboard mode) to manually draw marking boxes for the parts of the current picture the model missed. Press "P" (or click "Edit" - "Four-point Annotation") to use four-point annotation mode: click 4 points in succession, then double-click the left mouse button to finish the annotation.
5. After the marking boxes are drawn, click "OK", and the detection boxes will first be pre-assigned a "To be recognized" label.
6. Re-recognition: after drawing/adjusting all detection boxes in the picture, click "Re-recognition", and the PPOCR model will re-recognize **all detection boxes** in the current picture<sup>[3]</sup>.
7. Content change: double-click a recognition result to manually correct inaccurate recognition results.
8. Confirm labeling: click "Check"; the picture status switches to "√" and the program jumps to the next picture (the results are not written directly to the file at this time).
9. Delete: click "Delete Image" and the picture will be deleted to the recycle bin.
10. Save results: you can save manually through "File - Save Label" in the menu, and the program also saves automatically after every 10 pictures confirmed by the user. Manually confirmed labels are stored in *Label.txt* under the opened picture folder. After clicking "File" - "Save Recognition Results" in the menu bar, the recognition training data of such pictures is saved in the *crop_img* folder, and the recognition labels are saved in *rec_gt.txt*<sup>[4]</sup>.

### Note

[1] PPOCRLabel uses the folder as the basic unit of labeling. After opening the folder of the pictures to be labeled, the pictures are not displayed in the window bar; instead, the pictures under the folder are directly imported into the program after clicking "Open Dir".

[2] The picture status indicates whether the user has manually saved this picture: "X" if not, "√" if so. After clicking "Auto Annotation", PPOCRLabel will not re-annotate pictures with a status of "√".

[3] After clicking "Re-recognition", the model overwrites the recognition results in the picture. Therefore, if the recognition results have been manually changed before, they may change after re-recognition.

[4] The files produced by PPOCRLabel are placed under the labeled picture folder and include the following. Please do not change their contents manually, as doing so will cause the program to behave abnormally.

| File name | Description |
| :-----------: | :----------------------------------------------------------: |
| Label.txt | Detection labels, which can be directly used for PPOCR detection model training. The program writes the file automatically after every 10 detection results are saved, and also when the user closes the application or changes the file path. |
| fileState.txt | Picture status file, which saves the names of the pictures in the current folder that have been manually confirmed by the user. |
| Cache.cach | Cache file, which saves the results of automatic model recognition. |
| rec_gt.txt | Recognition labels, which can be directly used for PPOCR recognition model training. Generated after the user clicks "File" - "Save Recognition Results" in the menu bar. |
| crop_img | Recognition data: pictures cropped according to the detection boxes, generated at the same time as rec_gt.txt. |

## Explanation

### Built-in Model

- Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, which supports Chinese, English and number recognition as well as multi-language detection.

- Model language switching: the built-in model language can be switched via "PaddleOCR" - "Choose OCR Model" in the menu bar. Currently supported languages include French, German, Korean, and Japanese. For specific model download links, refer to the [PaddleOCR model list](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md).

- Custom model: following [Custom Model Code](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%A8%A1%E5%9E%8B), you can replace the model with your own trained one by modifying the [PaddleOCR class instantiation](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110) in PPOCRLabel.py.

### Export partial recognition results

For data that are difficult to recognize, **uncheck** the corresponding tags in the recognition results checkbox, and those recognition results will not be exported.

*Note: The checkbox status in the recognition results is only preserved after the user clicks Save manually.*

### Error message

- If paddleocr is also installed via the whl package, it takes priority over calling the PaddleOCR class through paddleocr.py, and an outdated whl package will cause the program to fail.
- PPOCRLabel **does not support automatic annotation** of pictures with Chinese file names.
- For Linux users: if you get an error starting with **objc[XXXXX]** when opening the software, your opencv version is too high. It is recommended to install version 4.2:

```
pip install opencv-python==4.2.0.32
```

- If you get an error starting with **Missing string id**, you need to recompile the resources:

```
pyrcc5 -o libs/resources.py resources.qrc
```

### References

1. [Tzutalin. LabelImg. Git code (2015)](https://github.com/tzutalin/labelImg)

PPOCRLabel/README_en.md
@@ -1,123 +0,0 @@
# PPOCRLabel

PPOCRLabel is a semi-automatic graphic annotation tool suitable for the OCR field. It is written in Python 3 and PyQt5, supporting rectangular box annotation and four-point annotation modes. Annotations can be directly used for the training of PPOCR detection and recognition models.

<img src="./data/gif/steps_en.gif" width="100%"/>

## Installation

### 1. Install PaddleOCR

Refer to the [PaddleOCR installation document](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/installation.md) to prepare PaddleOCR.

### 2. Install PPOCRLabel

#### Windows + Anaconda

Download and install [Anaconda](https://www.anaconda.com/download/#download) (Python 3+).

```
pip install pyqt5
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python PPOCRLabel.py
```

#### Ubuntu Linux

```
pip3 install pyqt5
pip3 install trash-cli
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python3 PPOCRLabel.py
```

#### macOS
```
pip3 install pyqt5
pip3 uninstall opencv-python # Uninstall opencv manually as it conflicts with pyqt
pip3 install opencv-contrib-python-headless # Install the headless version of opencv
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python3 PPOCRLabel.py
```

## Usage

### Steps

1. Build and launch using the instructions above.

2. Click 'Open Dir' in Menu/File to select the folder of the pictures.<sup>[1]</sup>

3. Click 'Auto recognition' to let the PPOCR model automatically annotate the images whose status<sup>[2]</sup> before the file name is 'X'.

4. Create Box:

   4.1 Click 'Create RectBox' or press 'W' in English keyboard mode to draw a new rectangle detection box. Click and release the left mouse button to select a region to annotate the text area.

   4.2 Press 'P' to enter four-point labeling mode, which enables you to create any four-point shape by clicking four points with the left mouse button in succession, DOUBLE CLICKING the left button to signal labeling completion.

5. After the marking frame is drawn, click "OK", and the detection frame will be pre-assigned a "TEMPORARY" label.

6. Click 're-Recognition': the model will rewrite ALL recognition results in ALL detection boxes<sup>[3]</sup>.

7. Double click the result in the 'recognition result' list to manually change inaccurate recognition results.

8. Click "Check": the image status switches to "√" and the program automatically jumps to the next image (the results will not be written directly to the file at this time).

9. Click "Delete Image" and the image will be deleted to the recycle bin.

10. Labeling result: you can save manually through the menu "File - Save Label", and the program also saves automatically after every 10 images confirmed by the user. Manually checked labels are stored in *Label.txt* under the opened picture folder. Clicking "PaddleOCR" - "Save Recognition Results" in the menu bar saves the recognition training data of such pictures in the *crop_img* folder, with the recognition labels saved in *rec_gt.txt*<sup>[4]</sup>.

### Note

[1] PPOCRLabel uses the opened folder as the project. After opening the image folder, the pictures are not displayed in a dialog; instead, the pictures under the folder are directly imported into the program after clicking "Open Dir".

[2] The image status indicates whether the user has saved the image manually: "X" if it has not been saved manually, "√" otherwise. PPOCRLabel will not relabel pictures with a status of "√".

[3] After clicking "Re-recognize", the model will overwrite ALL recognition results in the picture. Therefore, if the recognition results have been manually changed before, they may change after re-recognition.

[4] The files produced by PPOCRLabel are placed under the opened picture folder and include the following. Please do not change their contents manually, as doing so will cause the program to behave abnormally.

| File name | Description |
| :-----------: | :----------------------------------------------------------: |
| Label.txt | The detection label file, which can be directly used for PPOCR detection model training. The file is written automatically after the user saves 10 label results, and also when the user closes the application or changes the file folder. |
| fileState.txt | The picture status file, which saves the names of the images in the current folder that have been manually confirmed by the user. |
| Cache.cach | Cache file, which saves the results of automatic model recognition. |
| rec_gt.txt | The recognition label file, which can be directly used for PPOCR recognition model training. It is generated after the user clicks "File" - "Save recognition result" in the menu bar. |
| crop_img | The recognition data, generated at the same time as *rec_gt.txt*. |

## Explanation

### Built-in Model

- Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, which supports Chinese, English and number recognition as well as multi-language detection.

- Model language switching: the built-in model language can be changed by clicking "PaddleOCR" - "Choose OCR Model" in the menu bar. Currently supported languages include French, German, Korean, and Japanese. For specific model download links, please refer to the [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md#multilingual-recognition-modelupdating).

- Custom model: a model trained by the user can be swapped in by modifying the [PaddleOCR class instantiation](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110) in PPOCRLabel.py, referring to [Custom Model Code](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md#use-custom-model).

### Export partial recognition results

For data that are difficult to recognize, the recognition results will not be exported if the corresponding tags are **unchecked** in the recognition results checkbox.

*Note: The status of the checkboxes in the recognition results still needs to be saved manually by clicking the Save button.*

### Error message

- If paddleocr is installed with whl, it takes priority over calling the PaddleOCR class through paddleocr.py, which may cause an exception if the whl package is not updated.

- For Linux users: if you get an error starting with **objc[XXXXX]** when opening the software, your opencv version is too high. It is recommended to install version 4.2:

```
pip install opencv-python==4.2.0.32
```

- If you get an error starting with **Missing string id**, you need to recompile the resources:

```
pyrcc5 -o libs/resources.py resources.qrc
```

### Related

1. [Tzutalin. LabelImg. Git code (2015)](https://github.com/tzutalin/labelImg)

README.md
@@ -1,24 +1,28 @@
English | [简体中文](README_ch.md)

## Introduction
PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.

## Notice
PaddleOCR supports both dynamic graph and static graph programming paradigms:
- Dynamic graph: dygraph branch (default), **supported by paddle 2.0rc1+ ([installation](./doc/doc_en/installation_en.md))**
- Static graph: develop branch

**Recent updates**
- 2020.12.15 Update the data synthesis tool, i.e., [Style-Text](./StyleText/README.md), which makes it easy to synthesize a large number of images similar to the target scene image.
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](./PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used directly in training of the PP-OCR system.
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models; the overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR-Pipeline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list)
- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English OCR models, which are comparable to commercial products. [Model Downloads](#Supported-Chinese-model-list)
- 2020.9.17 Update the [English recognition model](./doc/doc_en/models_list_en.md#english-recognition-model) and [Multilingual recognition model](doc/doc_en/models_list_en.md#english-recognition-model); `German`, `French`, `Japanese` and `Korean` are now supported. Models for more languages will continue to be updated.
- 2020.8.24 Support the use of PaddleOCR through whl package installation; please refer to [PaddleOCR Package](./doc/doc_en/whl_en.md)
- 2020.8.21 Update the replay and PPT of the August 18 live lesson at Bilibili, lesson 2: an easy-to-learn-and-use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- [more](./doc/doc_en/update_en.md)

## Features
- PPOCR series of high-quality pre-trained models, comparable to commercial products
  - Ultra lightweight ppocr_mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
  - General ppocr_server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
  - Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
  - Support multi-language recognition: Korean, Japanese, German, French
- Rich toolkits related to the OCR areas
  - Semi-automatic data annotation tool, i.e., PPOCRLabel: support fast and efficient data annotation
  - Data synthesis tool, i.e., Style-Text: easy to synthesize a large number of images similar to the target scene image
- Support user-defined training, provide rich predictive inference deployment solutions
- Support PIP installation, easy to use
- Support Linux, Windows, MacOS and other systems

@@ -26,12 +30,21 @@ PaddleOCR aims to create rich, leading, and practical OCR tools that help users
## Visualization

<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>

The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md).

<a name="Community"></a>
## Community
- Scan the QR code below with your Wechat to access the official technical exchange group. We look forward to your participation.

<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>

## Quick Experience

You can also quickly experience the ultra-lightweight OCR: [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
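
For a quick local experience, a minimal sketch using the whl package mentioned in the updates above (based on the `paddleocr` package interface of this era; the result layout may differ between versions, and the sample image path is from this repo):

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")  # downloads the models on first use
result = ocr.ocr("doc/imgs_en/img_12.jpg", cls=True)
# Each entry pairs a detection box with a (text, confidence) tuple.
for box, (text, confidence) in result:
    print(text, confidence)
```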

@@ -48,55 +61,62 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr

<a name="Supported-Chinese-model-list"></a>

## PP-OCR 2.0 series model list (updated on Dec 15)
**Note**: Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with the static graph programming paradigm, models 2.0 are the dynamic-graph trained versions and achieve close performance.

| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Chinese and English ultra-lightweight OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| Chinese and English general OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |

For more model downloads (including multiple languages), please refer to [PP-OCR v2.0 series model downloads](./doc/doc_en/models_list_en.md).
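
The inference models in the table are plain tar archives; a sketch of fetching and unpacking one of them with the standard library (URL copied from the table above, target directory arbitrary):

```python
import tarfile
import urllib.request

url = ("https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/"
       "ch_ppocr_mobile_v2.0_det_infer.tar")
filename, _ = urllib.request.urlretrieve(url, "ch_ppocr_mobile_v2.0_det_infer.tar")
with tarfile.open(filename) as tar:
    tar.extractall("inference/")  # unpacks to inference/ch_ppocr_mobile_v2.0_det_infer/
```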

For a new language request, please refer to the [guideline for new language requests](#language_requests).

## Tutorials
- [Installation](./doc/doc_en/installation_en.md)
- [Quick Start](./doc/doc_en/quickstart_en.md)
- [Code Structure](./doc/doc_en/tree_en.md)
- Algorithm Introduction
  - [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md)
  - [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md)
  - [PP-OCR Pipeline](#PP-OCR-Pipeline)
- Model Training/Evaluation
  - [Text Detection](./doc/doc_en/detection_en.md)
  - [Text Recognition](./doc/doc_en/recognition_en.md)
  - [Direction Classification](./doc/doc_en/angle_class_en.md)
  - [Yml Configuration](./doc/doc_en/config_en.md)
- Inference and Deployment
  - [Quick Inference Based on PIP](./doc/doc_en/whl_en.md)
  - [Python Inference](./doc/doc_en/inference_en.md)
  - [C++ Inference](./deploy/cpp_infer/readme_en.md)
  - [Serving](./deploy/hubserving/readme_en.md)
  - [Mobile](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme_en.md)
  - [Model Quantization](./deploy/slim/quantization/README_en.md)
  - [Model Compression](./deploy/slim/prune/README_en.md)
  - [Benchmark](./doc/doc_en/benchmark_en.md)
- Data Annotation and Synthesis
  - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
  - [Data Synthesis Tool: Style-Text](./StyleText/README.md)
  - [Other Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
  - [Other Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
- Datasets
  - [General OCR Datasets (Chinese/English)](./doc/doc_en/datasets_en.md)
  - [Handwritten OCR Datasets (Chinese)](./doc/doc_en/handwritten_datasets_en.md)
  - [Various OCR Datasets (multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md)
- [Visualization](#Visualization)
- [New language requests](#language_requests)
- [FAQ](./doc/doc_en/FAQ_en.md)
- [Community](#Community)
- [References](./doc/doc_en/reference_en.md)
- [License](#LICENSE)
- [Contribution](#CONTRIBUTION)

<a name="PP-OCR-Pipeline"></a>

## PP-OCR Pipeline

<div align="center">
<img src="./doc/ppocr_framework.png" width="800">

@@ -109,30 +129,41 @@ PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of thr
## Visualization [more](./doc/doc_en/visualization_en.md)
- Chinese OCR model
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
</div>

- English OCR model
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
</div>

- Multilingual OCR model
<div align="center">
<img src="./doc/imgs_results/french_0.jpg" width="800">
<img src="./doc/imgs_results/korean.jpg" width="800">
</div>

<a name="Community"></a>
## Community
Scan the QR code below with your Wechat and complete the questionnaire to access the official technical exchange group.

<div align="center">
<img src="./doc/joinus.PNG" width = "200" height = "200" />
</div>

<a name="language_requests"></a>
## Guideline for new language requests

If you want to request support for a new language, a PR with the following 2 files is needed:

1. In the folder [ppocr/utils/dict](./ppocr/utils/dict), submit the dict text to this path and name it `{language}_dict.txt`; it contains a list of all characters. Please see the format example in the other files in that folder, and the sketch after this list.

2. In the folder [ppocr/utils/corpus](./ppocr/utils/corpus), submit the corpus to this path and name it `{language}_corpus.txt`; it contains a list of words in your language. At least 50,000 words per language are probably needed; of course, the more, the better.
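
A sketch of assembling the dict file in the expected one-character-per-line format (mirroring the existing files in ppocr/utils/dict; the corpus and output file name below are placeholders):

```python
corpus_lines = ["replace this with sample text in your language"]  # placeholder corpus

# Collect the unique characters and write them one per line.
chars = sorted(set("".join(corpus_lines)) - {"\n"})
with open("xx_dict.txt", "w", encoding="utf-8") as f:  # "xx" stands for your language code
    f.write("\n".join(chars))
```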

If your language has unique elements, please tell us in advance in any way you like, such as useful links, Wikipedia pages, and so on.

For more details, please refer to the [Multilingual OCR Development Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).

<a name="LICENSE"></a>
## License

@@ -149,3 +180,7 @@ We welcome all the contributions to PaddleOCR and appreciate for your feedback v
- Thanks [authorfu](https://github.com/authorfu) and [xiadeye](https://github.com/xiadeye) for contributing the Android and iOS demos, respectively.
- Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
- Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable RESTful API services.
- Thanks [lijinhan](https://github.com/lijinhan) for contributing a new way, i.e., Java SpringBoot, to handle requests for the Hubserving deployment.
- Thanks [Mejans](https://github.com/Mejans) for contributing the Occitan corpus and character set.
- Thanks [LKKlein](https://github.com/LKKlein) for contributing a new deployment package in the Go programming language.
- Thanks [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself) and [1084667371](https://github.com/1084667371) for contributing a new data annotation tool, i.e., PPOCRLabel.

README_ch.md
@@ -2,16 +2,16 @@

## Introduction
PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps users train better models and put them into practice.

## Notice
PaddleOCR supports both dynamic graph and static graph programming paradigms:
- Dynamic graph: dygraph branch (default); requires upgrading paddle to 2.0rc1+ ([quick installation](./doc/doc_ch/installation.md))
- Static graph: develop branch

**Recent updates**
- 2020.12.15 Update the data synthesis tool [Style-Text](./StyleText/README_ch.md), which can batch-synthesize a large number of images similar to the target scene; validated in several scenarios with clearly improved results.
- 2020.12.07 [FAQ](./doc/doc_ch/FAQ.md) adds 5 frequently asked questions, 124 in total, with updates planned every Monday; your continued attention is welcome.
- 2020.11.25 Update the semi-automatic annotation tool [PPOCRLabel](./PPOCRLabel/README_ch.md), which helps developers finish labeling tasks efficiently; its output format connects seamlessly with PP-OCR training tasks.
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra-lightweight compressed ppocr_mobile_slim series models; the overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR)), suitable for mobile deployment. [Model Downloads](#模型下载)
- 2020.9.17 Update the ultra-lightweight ppocr_mobile series and general ppocr_server series Chinese and English OCR models, comparable to commercial products. [Model Downloads](#模型下载)
- 2020.9.17 Update the [English recognition model](./doc/doc_ch/models_list.md#英文识别模型) and [multilingual recognition models](doc/doc_ch/models_list.md#多语言识别模型); `German, French, Japanese and Korean` are now supported, and recognition models for more languages will continue to be updated.
- 2020.8.24 Support installing and using PaddleOCR through the whl package; see [PaddleOCR Package](./doc/doc_ch/whl.md)
- 2020.8.21 Update the replay and PPT of the August 18 Bilibili live lesson, lesson 2: an easy-to-learn-and-use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- [More](./doc/doc_ch/update.md)

@@ -19,11 +19,13 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit
## Features

- PPOCR series of high-quality pre-trained models with accurate recognition
  - Ultra-lightweight ppocr_mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
  - General ppocr_server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
  - Support recognition of mixed Chinese/English/digits, vertical text, and long text
  - Support multilingual recognition: Korean, Japanese, German, French
- Rich and easy-to-use OCR-related tool components
  - Semi-automatic annotation tool PPOCRLabel: supports fast and efficient data annotation
  - Data synthesis tool Style-Text: batch-synthesizes a large number of images similar to the target scene
- Support user-defined training and provide rich prediction/inference deployment solutions
- Support quick installation and use via PIP
- Runs on Linux, Windows, MacOS and other systems

@@ -31,8 +33,8 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit
## Visualization

<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>

The pictures above are visualizations of the general ppocr_server model. For more examples, see the [visualization page](./doc/doc_ch/visualization.md).

@@ -47,15 +49,15 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
</div>

- Code experience: start from [Quick Installation](./doc/doc_ch/quickstart.md)

<a name="模型下载"></a>
## PP-OCR 2.0 series model list (updating)

**Note**: The main difference between the 2.0 models and the [1.1 models](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md) is dynamic-graph vs. static-graph training; there is no significant gap in model performance.
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| Chinese and English ultra-lightweight OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| Chinese and English general OCR model (143.4M) |ch_ppocr_server_v2.0_xx| Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |

For more model downloads (including multilingual models), see [PP-OCR v2.0 series model downloads](./doc/doc_ch/models_list.md).

@@ -78,27 +80,26 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit
  - [C++ inference](./deploy/cpp_infer/readme.md)
  - [Serving deployment](./deploy/hubserving/readme.md)
  - [Mobile deployment](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme.md)
  - [Model quantization](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/quantization/README.md)
  - [Model pruning](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/README.md)
  - [Benchmark](./doc/doc_ch/benchmark.md)
- Datasets
  - [General Chinese/English OCR datasets](./doc/doc_ch/datasets.md)
  - [Handwritten Chinese OCR datasets](./doc/doc_ch/handwritten_datasets.md)
  - [Vertical and multilingual OCR datasets](./doc/doc_ch/vertical_and_multilingual_datasets.md)
- Data annotation and synthesis
  - [Semi-automatic annotation tool PPOCRLabel](./PPOCRLabel/README_ch.md)
  - [Data synthesis tool Style-Text](./StyleText/README_ch.md)
  - [Other data annotation tools](./doc/doc_ch/data_annotation.md)
  - [Other data synthesis tools](./doc/doc_ch/data_synthesis.md)
- [Visualization](#效果展示)
- FAQ
  - [Selected: 10 selected OCR questions](./doc/doc_ch/FAQ.md)
  - [Theory: 30 general OCR questions](./doc/doc_ch/FAQ.md)
  - [Practice: 84 PaddleOCR practical questions](./doc/doc_ch/FAQ.md)
- [Technical exchange group](#欢迎加入PaddleOCR技术交流群)
- [References](./doc/doc_ch/reference.md)
- [License](#许可证书)
- [Contribution](#贡献代码)

***Note: Mobile deployment for dynamic graphs is still under development. Currently only dynamic-graph training, Python-side prediction and C++ prediction are supported. If you need mobile deployment cases or quantization/pruning, please switch to the static graph branch.***

<a name="PP-OCR"></a>
## PP-OCR Pipeline

@@ -112,10 +113,10 @@ PP-OCR is a practical ultra-lightweight OCR system, mainly composed of DB text detection, detection box
## Visualization [more](./doc/doc_ch/visualization.md)
- Chinese model
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
</div>

- English model

@@ -125,8 +126,8 @@ PP-OCR is a practical ultra-lightweight OCR system, mainly composed of DB text detection, detection box
- Other language models
<div align="center">
<img src="./doc/imgs_results/french_0.jpg" width="800">
<img src="./doc/imgs_results/korean.jpg" width="800">
</div>

<a name="欢迎加入PaddleOCR技术交流群"></a>
README_en.md
|
@ -1,186 +0,0 @@
|
|||
English | [简体中文](README_ch.md)
|
||||
|
||||
## Introduction
|
||||
PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.
|
||||
|
||||
**Recent updates**
|
||||
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](./PPOCRLabel/README_en.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly.
|
||||
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
|
||||
- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR-Pipeline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list)
|
||||
- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](#Supported-Chinese-model-list)
|
||||
- 2020.9.17 update [English recognition model](./doc/doc_en/models_list_en.md#english-recognition-model) and [Multilingual recognition model](doc/doc_en/models_list_en.md#english-recognition-model), `English`, `Chinese`, `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
|
||||
- 2020.8.24 Support the use of PaddleOCR through whl package installation,please refer [PaddleOCR Package](./doc/doc_en/whl_en.md)
|
||||
- 2020.8.21 Update the replay and PPT of the live lesson at Bilibili on August 18, lesson 2, easy to learn and use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
|
||||
- [more](./doc/doc_en/update_en.md)
|
||||
|
||||
## Features
|
||||
- PPOCR series of high-quality pre-trained models, comparable to commercial effects
|
||||
- Ultra lightweight ppocr_mobile series models: detection (2.6M) + direction classifier (0.9M) + recognition (4.6M) = 8.1M
|
||||
- General ppocr_server series models: detection (47.2M) + direction classifier (0.9M) + recognition (107M) = 155.1M
|
||||
- Ultra lightweight compression ppocr_mobile_slim series models: detection (1.4M) + direction classifier (0.5M) + recognition (1.6M) = 3.5M
|
||||
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
|
||||
- Support multi-language recognition: Korean, Japanese, German, French
|
||||
- Support user-defined training, provides rich predictive inference deployment solutions
|
||||
- Support PIP installation, easy to use
|
||||
- Support Linux, Windows, MacOS and other systems
|
||||
|
||||
## Visualization
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/imgs_results/1101.jpg" width="800">
|
||||
<img src="doc/imgs_results/1103.jpg" width="800">
|
||||
</div>
|
||||
|
||||
The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md).
|
||||
|
||||
<a name="Community"></a>
|
||||
## Community
|
||||
- Scan the QR code below with your Wechat, you can access to official technical exchange group. Look forward to your participation.
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/joinus.PNG" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
|
||||
## Quick Experience
|
||||
|
||||
You can also quickly experience the ultra-lightweight OCR : [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
|
||||
|
||||
Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)
|
||||
|
||||
Also, you can scan the QR code below to install the App (**Android support only**)
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
|
||||
</div>
|
||||
|
||||
- [**OCR Quick Start**](./doc/doc_en/quickstart_en.md)
|
||||
|
||||
<a name="Supported-Chinese-model-list"></a>
|
||||
|
||||
## PP-OCR 2.0 series model list(Update on Sep 17)
|
||||
|
||||
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
|
||||
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
|
||||
| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
|
||||
| Chinese and English general OCR model (143M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_traingit.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
|
||||
|
||||
|
||||
For more model downloads (including multiple languages), please refer to [PP-OCR v2.0 series model downloads](./doc/doc_en/models_list_en.md).
|
||||
|
||||
For a new language request, please refer to [Guideline for new language_requests](#language_requests).
|
||||
|
||||
## Tutorials
|
||||
- [Installation](./doc/doc_en/installation_en.md)
|
||||
- [Quick Start](./doc/doc_en/quickstart_en.md)
|
||||
- [Code Structure](./doc/doc_en/tree_en.md)
|
||||
- Algorithm Introduction
|
||||
- [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md)
|
||||
- [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md)
|
||||
- [PP-OCR Pipeline](#PP-OCR-Pipeline)
|
||||
- Model Training/Evaluation
|
||||
- [Text Detection](./doc/doc_en/detection_en.md)
|
||||
- [Text Recognition](./doc/doc_en/recognition_en.md)
|
||||
- [Direction Classification](./doc/doc_en/angle_class_en.md)
|
||||
- [Yml Configuration](./doc/doc_en/config_en.md)
|
||||
- Inference and Deployment
|
||||
- [Quick Inference Based on PIP](./doc/doc_en/whl_en.md)
|
||||
- [Python Inference](./doc/doc_en/inference_en.md)
|
||||
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
|
||||
- [Serving](./deploy/hubserving/readme_en.md)
|
||||
- [Mobile](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme_en.md)
|
||||
- [Model Quantization](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/quantization/README_en.md)
|
||||
- [Model Compression](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/README_en.md)
|
||||
- [Benchmark](./doc/doc_en/benchmark_en.md)
|
||||
- Data Annotation and Synthesis
|
||||
- [Semi-automatic Annotation Tool](./PPOCRLabel/README_en.md)
|
||||
- [Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
|
||||
- [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
|
||||
- Datasets
|
||||
- [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md)
|
||||
- [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md)
|
||||
- [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md)
|
||||
- [Visualization](#Visualization)
|
||||
- [New language requests](#language_requests)
|
||||
- [FAQ](./doc/doc_en/FAQ_en.md)
|
||||
- [Community](#Community)
|
||||
- [References](./doc/doc_en/reference_en.md)
|
||||
- [License](#LICENSE)
|
||||
- [Contribution](#CONTRIBUTION)
|
||||
|
||||
***Note: The dynamic graph branch is still under development.
|
||||
Currently, only dynamic graph training, Python inference, and C++ inference are supported.
|
||||
If you need mobile deployment cases or the quantization demo,
|
||||
please use the static graph branch.***
|
||||
|
||||
|
||||
<a name="PP-OCR-Pipeline"></a>
|
||||
|
||||
## PP-OCR Pipeline
|
||||
|
||||
<div align="center">
|
||||
<img src="./doc/ppocr_framework.png" width="800">
|
||||
</div>
|
||||
|
||||
PP-OCR is a practical ultra-lightweight OCR system, mainly composed of three parts: DB text detection, detection box rectification, and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-trained model use, and automatic model tailoring and quantization, to optimize and slim down the models of each module. The final result is an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M, plus a 2.8M English digit OCR model. For more details, please refer to the PP-OCR technical report (https://arxiv.org/abs/2009.09941). Besides, the implementation of the FPGM pruner and PACT quantization is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
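Once the `paddleocr` whl package is installed via pip, the whole pipeline can be tried from Python (see [Quick Inference Based on PIP](./doc/doc_en/whl_en.md)). A minimal sketch, assuming a local test image path:

```python
from paddleocr import PaddleOCR

# Instantiating PaddleOCR downloads the detection, classification and
# recognition models on first use.
ocr = PaddleOCR(use_angle_cls=True, lang='en')

# Each line is a detected box plus the recognized text and its confidence.
result = ocr.ocr('doc/imgs_en/img_12.jpg', cls=True)
for line in result:
    print(line)
```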
|
||||
|
||||
|
||||
|
||||
## Visualization [more](./doc/doc_en/visualization_en.md)
|
||||
- Chinese OCR model
|
||||
<div align="center">
|
||||
<img src="./doc/imgs_results/1102.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1104.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1106.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1105.jpg" width="800">
|
||||
</div>
|
||||
|
||||
- English OCR model
|
||||
<div align="center">
|
||||
<img src="./doc/imgs_results/img_12.jpg" width="800">
|
||||
</div>
|
||||
|
||||
- Multilingual OCR model
|
||||
<div align="center">
|
||||
<img src="./doc/imgs_results/1110.jpg" width="800">
|
||||
<img src="./doc/imgs_results/1112.jpg" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
<a name="language_requests"></a>
|
||||
## Guideline for new language requests
|
||||
|
||||
If you want to request support for a new language, a PR with the 2 following files is needed:
|
||||
|
||||
1. In folder [ppocr/utils/dict](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/ppocr/utils/dict),
|
||||
it is necessary to submit a dict text file to this path, named `{language}_dict.txt`, that contains a list of all characters. Please see the format examples in the other files in that folder, and the sketch after this list.
|
||||
|
||||
2. In folder [ppocr/utils/corpus](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/ppocr/utils/corpus),
|
||||
it is necessary to submit a corpus file to this path, named `{language}_corpus.txt`, that contains a list of words in your language.
|
||||
At least 50,000 words per language are suggested.
|
||||
Of course, the more, the better.
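For illustration only (the file names and contents below are hypothetical), both files are plain text: the dict lists one character per line, and the corpus lists one word per line:

```
# xx_dict.txt (one character per line)
a
b
ç

# xx_corpus.txt (one word per line)
paddle
ocr
```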
|
||||
|
||||
If your language has unique elements, please let us know in advance in any way, such as useful links or Wikipedia articles.
|
||||
|
||||
For more details, please refer to the [Multilingual OCR Development Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
|
||||
|
||||
|
||||
<a name="LICENSE"></a>
|
||||
## License
|
||||
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
|
||||
|
||||
<a name="CONTRIBUTION"></a>
|
||||
## Contribution
|
||||
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
|
||||
|
||||
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for contributing and revising the English documentation.
|
||||
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) for contributing the new visualization function, adding .gitignore, and removing the need to set PYTHONPATH manually.
|
||||
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing the code for printing the network structure.
|
||||
- Thanks [xiangyubo](https://github.com/xiangyubo) for contributing the handwritten Chinese OCR datasets.
|
||||
- Thanks [authorfu](https://github.com/authorfu) for contributing the Android demo and [xiadeye](https://github.com/xiadeye) for contributing the iOS demo.
|
||||
- Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
|
||||
- Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services.
|
||||
- Thanks [lijinhan](https://github.com/lijinhan) for contributing a new way, i.e., Java SpringBoot, to make requests to the Hubserving deployment.
|
||||
- Thanks [Mejans](https://github.com/Mejans) for contributing the Occitan corpus and character set.
|
||||
- Thanks [LKKlein](https://github.com/LKKlein) for contributing a new deploying package with the Golang program language.
|
||||
- Thanks [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself) and [1084667371](https://github.com/1084667371) for contributing a new data annotation tool, i.e., PPOCRLabel.
|
|
@ -0,0 +1,195 @@
|
|||
English | [简体中文](README_ch.md)
|
||||
|
||||
## Style Text
|
||||
|
||||
### Contents
|
||||
- [1. Introduction](#Introduction)
|
||||
- [2. Preparation](#Preparation)
|
||||
- [3. Quick Start](#Quick_Start)
|
||||
- [4. Applications](#Applications)
|
||||
- [5. Code Structure](#Code_structure)
|
||||
|
||||
|
||||
<a name="Introduction"></a>
|
||||
### Introduction
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/3.png" width="800">
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/9.png" width="600">
|
||||
</div>
|
||||
|
||||
|
||||
The Style-Text data synthesis tool is based on Baidu's self-developed text editing algorithm "Editing Text in the Wild" ([https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047)).
|
||||
|
||||
Different from the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes:
|
||||
* (1) Text foreground style transfer module.
|
||||
* (2) Background extraction module.
|
||||
* (3) Fusion module.
|
||||
|
||||
After these three steps, image text style transfer can be achieved quickly. The following figure shows some results of the data synthesis tool.
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/10.png" width="1000">
|
||||
</div>
|
||||
|
||||
|
||||
<a name="Preparation"></a>
|
||||
### Preparation
|
||||
|
||||
1. Please refer to the [QUICK INSTALLATION](../doc/doc_en/installation_en.md) to install PaddlePaddle. A Python 3 environment is strongly recommended.
|
||||
2. Download the pretrained models and unzip them:
|
||||
|
||||
```bash
|
||||
cd StyleText
|
||||
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
|
||||
unzip style_text_models.zip
|
||||
```
|
||||
|
||||
If you save the models in another location, please modify the model file paths in `configs/config.yml`; all three of the following configurations need to be changed at the same time:
|
||||
|
||||
```
|
||||
bg_generator:
|
||||
pretrain: style_text_models/bg_generator
|
||||
...
|
||||
text_generator:
|
||||
pretrain: style_text_models/text_generator
|
||||
...
|
||||
fusion_generator:
|
||||
pretrain: style_text_models/fusion_generator
|
||||
```
|
||||
|
||||
<a name="Quick_Start"></a>
|
||||
### Quick Start
|
||||
|
||||
#### Synthesize a single image
|
||||
|
||||
1. You can run `tools/synth_image` to generate a demo image, which is saved in the current folder.
|
||||
|
||||
```bash
|
||||
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
|
||||
```
|
||||
|
||||
* Note: The language option must correspond to the corpus. Currently, the tool only supports English, Simplified Chinese, and Korean.
|
||||
|
||||
For example, given the following style image and the corpus `PaddleOCR`:
|
||||
|
||||
<div align="center">
|
||||
<img src="examples/style_images/2.jpg" width="300">
|
||||
</div>
|
||||
|
||||
The result `fake_fusion.jpg` will be generated.
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/4.jpg" width="300">
|
||||
</div>
|
||||
|
||||
In addition, the intermediate result `fake_bg.jpg` will also be saved, which is the extracted background.
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/7.jpg" width="300">
|
||||
</div>
|
||||
|
||||
|
||||
* `fake_text.jpg` is the generated text image, with the same font style as the style input.
|
||||
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/8.jpg" width="300">
|
||||
</div>
|
||||
|
||||
|
||||
#### Batch synthesis
|
||||
|
||||
In actual application scenarios, it is often necessary to synthesize images in batches and add them to the training set. Style-Text can use a batch of style images and a corpus to synthesize data in batches. The synthesis process is as follows:
|
||||
|
||||
1. The referenced dataset can be specified in `configs/dataset_config.yml`:
|
||||
|
||||
* `Global`:
|
||||
* `output_dir`: Output path for the synthesized data.
|
||||
* `StyleSampler`:
|
||||
* `image_home`: Directory of the style images.
|
||||
* `label_file`: File list of the style images. If labels are provided, it is the label file path.
|
||||
* `with_label`: Whether `label_file` is a label file.
|
||||
* `CorpusGenerator`:
|
||||
* `method`: Corpus generation method; `FileCorpus` and `EnNumCorpus` are supported. If `EnNumCorpus` is used, no other configuration is needed; otherwise you need to set `corpus_file` and `language`, as in the excerpt after this list.
|
||||
* `language`: Language of the corpus.
|
||||
* `corpus_file`: Filepath of the corpus.
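For reference, the shipped `configs/dataset_config.yml` combines these options as follows (excerpt):

```
Global:
  output_dir: output_data
StyleSampler:
  method: DatasetSampler
  image_home: examples
  label_file: examples/image_list.txt
  with_label: true
CorpusGenerator:
  method: FileCorpus
  language: ch
  corpus_file: examples/corpus/example.txt
```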
|
||||
|
||||
|
||||
We provide a general dataset containing Chinese, English and Korean (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)); some examples are given below:
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/5.png" width="800">
|
||||
</div>
|
||||
|
||||
2. You can run the following command to start the synthesis task:
|
||||
|
||||
``` bash
|
||||
python3 -m tools.synth_dataset -c configs/dataset_config.yml
|
||||
```
|
||||
|
||||
|
||||
<a name="Applications"></a>
|
||||
### Applications
|
||||
We take two scenarios as examples, metal-surface English and digit recognition and general Korean recognition, to illustrate practical cases of using Style-Text to synthesize data for improving text recognition. The following figure shows some examples of real scene images and synthetic images:
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/11.png" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
After adding the above synthetic data to training, the accuracy of the recognition model improves, as shown in the following table:
|
||||
|
||||
|
||||
| Scenario | Characters | Raw Data | Test Data | Accuracy<br/>(Raw Data Only) | Added Synthetic Data | Accuracy<br/>(Raw + Synthetic Data) | Improvement |
|
||||
| -------- | ---------- | -------- | -------- | -------------------------- | ------------ | ---------------------- | -------- |
|
||||
| Metal surface | English and numbers | 2203 | 650 | 0.5938 | 20000 | 0.7546 | 16% |
|
||||
| Random background | Korean | 5631 | 1230 | 0.3012 | 100000 | 0.5057 | 20% |
|
||||
|
||||
|
||||
<a name="Code_structure"></a>
|
||||
### Code Structure
|
||||
|
||||
```
|
||||
StyleText
|
||||
|-- arch // Network module files.
|
||||
| |-- base_module.py
|
||||
| |-- decoder.py
|
||||
| |-- encoder.py
|
||||
| |-- spectral_norm.py
|
||||
| `-- style_text_rec.py
|
||||
|-- configs // Config files.
|
||||
| |-- config.yml
|
||||
| `-- dataset_config.yml
|
||||
|-- engine // Synthesis engines.
|
||||
| |-- corpus_generators.py // Sample corpus from file or generate random corpus.
|
||||
| |-- predictors.py // Predict using network.
|
||||
| |-- style_samplers.py // Sample style images.
|
||||
| |-- synthesisers.py // Manage other engines to synthesize images.
|
||||
| |-- text_drawers.py // Generate standard input text images.
|
||||
| `-- writers.py // Write synthesis images and labels into files.
|
||||
|-- examples // Example files.
|
||||
| |-- corpus
|
||||
| | `-- example.txt
|
||||
| |-- image_list.txt
|
||||
| `-- style_images
|
||||
| |-- 1.jpg
|
||||
| `-- 2.jpg
|
||||
|-- fonts // Font files.
|
||||
| |-- ch_standard.ttf
|
||||
| |-- en_standard.ttf
|
||||
| `-- ko_standard.ttf
|
||||
|-- tools // Program entrance.
|
||||
| |-- __init__.py
|
||||
| |-- synth_dataset.py // Synthesize a dataset in batches.
|
||||
| `-- synth_image.py // Synthesize a single image.
|
||||
`-- utils // Module of basic functions.
|
||||
|-- config.py
|
||||
|-- load_params.py
|
||||
|-- logging.py
|
||||
|-- math_functions.py
|
||||
`-- sys_funcs.py
|
||||
```
|
|
@ -0,0 +1,179 @@
|
|||
简体中文 | [English](README.md)
|
||||
|
||||
## Style Text
|
||||
|
||||
|
||||
### Contents
|
||||
- [1. Introduction](#工具简介)
|
||||
- [2. Preparation](#环境配置)
|
||||
- [3. Quick Start](#快速上手)
|
||||
- [4. Applications](#应用案例)
|
||||
- [5. Code Structure](#代码结构)
|
||||
|
||||
<a name="工具简介"></a>
|
||||
### 1. Introduction
|
||||
<div align="center">
|
||||
<img src="doc/images/3.png" width="800">
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/1.png" width="600">
|
||||
</div>
|
||||
|
||||
|
||||
The Style-Text data synthesis tool is based on Baidu's self-developed text editing algorithm "Editing Text in the Wild" ([https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047)).
|
||||
|
||||
Different from commonly used GAN-based data synthesis tools, the main framework of Style-Text includes: (1) a text foreground style transfer module, (2) a background extraction module, and (3) a fusion module. After these three steps, image text style transfer can be achieved quickly. The following figure shows some results of this data synthesis tool.
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/2.png" width="1000">
|
||||
</div>
|
||||
|
||||
<a name="环境配置"></a>
|
||||
### 2. Preparation
|
||||
|
||||
1. Refer to [Quick Installation](../doc/doc_ch/installation.md) to install PaddleOCR.
|
||||
2. Enter the `StyleText` directory, download the models, and unzip them:
|
||||
|
||||
```bash
|
||||
cd StyleText
|
||||
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
|
||||
unzip style_text_models.zip
|
||||
```
|
||||
|
||||
If you save the models in another location, please modify the model file paths in `configs/config.yml`; all three of the following configurations need to be changed at the same time:
|
||||
|
||||
```
|
||||
bg_generator:
|
||||
pretrain: style_text_models/bg_generator
|
||||
...
|
||||
text_generator:
|
||||
pretrain: style_text_models/text_generator
|
||||
...
|
||||
fusion_generator:
|
||||
pretrain: style_text_models/fusion_generator
|
||||
```
|
||||
|
||||
<a name="快速上手"></a>
|
||||
### 3. Quick Start
|
||||
|
||||
#### Synthesize a single image
|
||||
Given a style image and a piece of text corpus, run `tools/synth_image` to synthesize a single image; the result is saved in the current directory:
|
||||
|
||||
```bash
|
||||
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
|
||||
```
|
||||
* Note: The language option must correspond to the corpus. Currently, the tool only supports English, Simplified Chinese, and Korean.
|
||||
|
||||
For example, given the following image and the corpus "PaddleOCR":
|
||||
|
||||
<div align="center">
|
||||
<img src="examples/style_images/2.jpg" width="300">
|
||||
</div>
|
||||
|
||||
The synthesized result `fake_fusion.jpg` is generated:
|
||||
<div align="center">
|
||||
<img src="doc/images/4.jpg" width="300">
|
||||
</div>
|
||||
|
||||
In addition, the program also generates and saves the intermediate result `fake_bg.jpg`: the background of the style reference image with the text removed;
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/7.jpg" width="300">
|
||||
</div>
|
||||
|
||||
`fake_text.jpg`: a text image on a gray background, rendered from the provided string in the style of the text in the style reference image.
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/8.jpg" width="300">
|
||||
</div>
|
||||
|
||||
#### Batch synthesis
|
||||
In practical application scenarios, it is often necessary to synthesize images in batches to supplement the training set. Style-Text can use a batch of style images and a corpus to synthesize data in batches. The synthesis process is as follows:
|
||||
|
||||
1. Configure the paths of the target-scene style images and the corpus in `configs/dataset_config.yml`, as follows:
|
||||
|
||||
* `Global`:
|
||||
* `output_dir`: Directory for saving the synthesized data.
|
||||
* `StyleSampler`:
|
||||
* `image_home`: Directory of the style images;
|
||||
* `label_file`: File listing the style image paths; if the dataset has labels, `label_file` is the label file path;
|
||||
* `with_label`: Flag indicating whether `label_file` is a label file.
|
||||
* `CorpusGenerator`:
|
||||
* `method`: Corpus generation method; `FileCorpus` and `EnNumCorpus` are currently available. If `EnNumCorpus` is used, no other configuration is needed; otherwise `corpus_file` and `language` need to be set;
|
||||
* `language`: Language of the corpus;
|
||||
* `corpus_file`: Path of the corpus file.
|
||||
|
||||
Style-Text also provides a batch of 50,000 general-scene images in Chinese, English and Korean to use as text style images, making it easy to synthesize text images with rich scenes; the figure below gives some examples.
|
||||
|
||||
50,000 general-scene images in Chinese, English and Korean: [download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/5.png" width="800">
|
||||
</div>
|
||||
|
||||
2. Run `tools/synth_dataset` to synthesize data:
|
||||
|
||||
``` bash
|
||||
python -m tools.synth_dataset -c configs/dataset_config.yml
|
||||
```
|
||||
|
||||
<a name="应用案例"></a>
|
||||
### 4. Applications
|
||||
We take two scenarios, metal-surface English and digit recognition and general Korean recognition, as examples to illustrate practical cases of using Style-Text to synthesize data for improving text recognition. The figure below gives some examples of real scene images and synthetic images:
|
||||
|
||||
<div align="center">
|
||||
<img src="doc/images/6.png" width="800">
|
||||
</div>
|
||||
|
||||
After adding the above synthetic data to training, the accuracy of the recognition model improves, as shown in the following table:
|
||||
|
||||
| Scenario | Characters | Raw Data | Test Data | Accuracy<br/>(Raw Data Only) | Added Synthetic Data | Accuracy<br/>(Raw + Synthetic Data) | Improvement |
|
||||
| -------- | ---------- | -------- | -------- | -------------------------- | ------------ | ---------------------- | -------- |
|
||||
| Metal surface | English and numbers | 2203 | 650 | 0.5938 | 20000 | 0.7546 | 16% |
|
||||
| Random background | Korean | 5631 | 1230 | 0.3012 | 100000 | 0.5057 | 20% |
|
||||
|
||||
|
||||
<a name="代码结构"></a>
|
||||
### 5. Code Structure
|
||||
|
||||
```
|
||||
StyleText
|
||||
|-- arch // Network module files.
|
||||
| |-- base_module.py
|
||||
| |-- decoder.py
|
||||
| |-- encoder.py
|
||||
| |-- spectral_norm.py
|
||||
| `-- style_text_rec.py
|
||||
|-- configs // Config files.
|
||||
| |-- config.yml
|
||||
| `-- dataset_config.yml
|
||||
|-- engine // Synthesis engines.
|
||||
| |-- corpus_generators.py // Sample corpus from file or generate random corpus.
|
||||
| |-- predictors.py // Generate data using the networks.
|
||||
| |-- style_samplers.py // Sample style images.
|
||||
| |-- synthesisers.py // Coordinate the other engines to synthesize data.
|
||||
| |-- text_drawers.py // Generate standard text images used as input.
|
||||
| `-- writers.py // Write synthesized images and labels to a local directory.
|
||||
|-- examples // Example files.
|
||||
| |-- corpus
|
||||
| | `-- example.txt
|
||||
| |-- image_list.txt
|
||||
| `-- style_images
|
||||
| |-- 1.jpg
|
||||
| `-- 2.jpg
|
||||
|-- fonts // Font files.
|
||||
| |-- ch_standard.ttf
|
||||
| |-- en_standard.ttf
|
||||
| `-- ko_standard.ttf
|
||||
|-- tools // Program entry points.
|
||||
| |-- __init__.py
|
||||
| |-- synth_dataset.py // Synthesize a dataset in batches.
|
||||
| `-- synth_image.py // Synthesize a single image.
|
||||
`-- utils // Modules of basic functions.
|
||||
|-- config.py
|
||||
|-- load_params.py
|
||||
|-- logging.py
|
||||
|-- math_functions.py
|
||||
`-- sys_funcs.py
|
||||
```
|
|
@ -0,0 +1,255 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import paddle
|
||||
import paddle.nn as nn
|
||||
|
||||
from arch.spectral_norm import spectral_norm
|
||||
|
||||
|
||||
class CBN(nn.Layer):
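    """Conv block: Conv2D optionally followed by a normalization layer and an activation."""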
|
||||
def __init__(self,
|
||||
name,
|
||||
in_channels,
|
||||
out_channels,
|
||||
kernel_size,
|
||||
stride=1,
|
||||
padding=0,
|
||||
dilation=1,
|
||||
groups=1,
|
||||
use_bias=False,
|
||||
norm_layer=None,
|
||||
act=None,
|
||||
act_attr=None):
|
||||
super(CBN, self).__init__()
|
||||
if use_bias:
|
||||
bias_attr = paddle.ParamAttr(name=name + "_bias")
|
||||
else:
|
||||
bias_attr = None
|
||||
self._conv = paddle.nn.Conv2D(
|
||||
in_channels=in_channels,
|
||||
out_channels=out_channels,
|
||||
kernel_size=kernel_size,
|
||||
stride=stride,
|
||||
padding=padding,
|
||||
dilation=dilation,
|
||||
groups=groups,
|
||||
weight_attr=paddle.ParamAttr(name=name + "_weights"),
|
||||
bias_attr=bias_attr)
|
||||
if norm_layer:
|
||||
self._norm_layer = getattr(paddle.nn, norm_layer)(
|
||||
num_features=out_channels, name=name + "_bn")
|
||||
else:
|
||||
self._norm_layer = None
|
||||
if act:
|
||||
if act_attr:
|
||||
self._act = getattr(paddle.nn, act)(**act_attr,
|
||||
name=name + "_" + act)
|
||||
else:
|
||||
self._act = getattr(paddle.nn, act)(name=name + "_" + act)
|
||||
else:
|
||||
self._act = None
|
||||
|
||||
def forward(self, x):
|
||||
out = self._conv(x)
|
||||
if self._norm_layer:
|
||||
out = self._norm_layer(out)
|
||||
if self._act:
|
||||
out = self._act(out)
|
||||
return out
|
||||
|
||||
|
||||
class SNConv(nn.Layer):
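    """Conv2D wrapped with spectral normalization, optionally followed by a normalization layer and an activation."""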
|
||||
def __init__(self,
|
||||
name,
|
||||
in_channels,
|
||||
out_channels,
|
||||
kernel_size,
|
||||
stride=1,
|
||||
padding=0,
|
||||
dilation=1,
|
||||
groups=1,
|
||||
use_bias=False,
|
||||
norm_layer=None,
|
||||
act=None,
|
||||
act_attr=None):
|
||||
super(SNConv, self).__init__()
|
||||
if use_bias:
|
||||
bias_attr = paddle.ParamAttr(name=name + "_bias")
|
||||
else:
|
||||
bias_attr = None
|
||||
self._sn_conv = spectral_norm(
|
||||
paddle.nn.Conv2D(
|
||||
in_channels=in_channels,
|
||||
out_channels=out_channels,
|
||||
kernel_size=kernel_size,
|
||||
stride=stride,
|
||||
padding=padding,
|
||||
dilation=dilation,
|
||||
groups=groups,
|
||||
weight_attr=paddle.ParamAttr(name=name + "_weights"),
|
||||
bias_attr=bias_attr))
|
||||
if norm_layer:
|
||||
self._norm_layer = getattr(paddle.nn, norm_layer)(
|
||||
num_features=out_channels, name=name + "_bn")
|
||||
else:
|
||||
self._norm_layer = None
|
||||
if act:
|
||||
if act_attr:
|
||||
self._act = getattr(paddle.nn, act)(**act_attr,
|
||||
name=name + "_" + act)
|
||||
else:
|
||||
self._act = getattr(paddle.nn, act)(name=name + "_" + act)
|
||||
else:
|
||||
self._act = None
|
||||
|
||||
def forward(self, x):
|
||||
out = self._sn_conv(x)
|
||||
if self._norm_layer:
|
||||
out = self._norm_layer(out)
|
||||
if self._act:
|
||||
out = self._act(out)
|
||||
return out
|
||||
|
||||
|
||||
class SNConvTranspose(nn.Layer):
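    """Conv2DTranspose wrapped with spectral normalization, optionally followed by a normalization layer and an activation."""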
|
||||
def __init__(self,
|
||||
name,
|
||||
in_channels,
|
||||
out_channels,
|
||||
kernel_size,
|
||||
stride=1,
|
||||
padding=0,
|
||||
output_padding=0,
|
||||
dilation=1,
|
||||
groups=1,
|
||||
use_bias=False,
|
||||
norm_layer=None,
|
||||
act=None,
|
||||
act_attr=None):
|
||||
super(SNConvTranspose, self).__init__()
|
||||
if use_bias:
|
||||
bias_attr = paddle.ParamAttr(name=name + "_bias")
|
||||
else:
|
||||
bias_attr = None
|
||||
self._sn_conv_transpose = spectral_norm(
|
||||
paddle.nn.Conv2DTranspose(
|
||||
in_channels=in_channels,
|
||||
out_channels=out_channels,
|
||||
kernel_size=kernel_size,
|
||||
stride=stride,
|
||||
padding=padding,
|
||||
output_padding=output_padding,
|
||||
dilation=dilation,
|
||||
groups=groups,
|
||||
weight_attr=paddle.ParamAttr(name=name + "_weights"),
|
||||
bias_attr=bias_attr))
|
||||
if norm_layer:
|
||||
self._norm_layer = getattr(paddle.nn, norm_layer)(
|
||||
num_features=out_channels, name=name + "_bn")
|
||||
else:
|
||||
self._norm_layer = None
|
||||
if act:
|
||||
if act_attr:
|
||||
self._act = getattr(paddle.nn, act)(**act_attr,
|
||||
name=name + "_" + act)
|
||||
else:
|
||||
self._act = getattr(paddle.nn, act)(name=name + "_" + act)
|
||||
else:
|
||||
self._act = None
|
||||
|
||||
def forward(self, x):
|
||||
out = self._sn_conv_transpose(x)
|
||||
if self._norm_layer:
|
||||
out = self._norm_layer(out)
|
||||
if self._act:
|
||||
out = self._act(out)
|
||||
return out
|
||||
|
||||
|
||||
class MiddleNet(nn.Layer):
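    """1x1 -> 3x3 -> 1x1 bottleneck of spectral-norm convs, with replicate padding around the 3x3 conv."""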
|
||||
def __init__(self, name, in_channels, mid_channels, out_channels,
|
||||
use_bias):
|
||||
super(MiddleNet, self).__init__()
|
||||
self._sn_conv1 = SNConv(
|
||||
name=name + "_sn_conv1",
|
||||
in_channels=in_channels,
|
||||
out_channels=mid_channels,
|
||||
kernel_size=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=None,
|
||||
act=None)
|
||||
self._pad2d = nn.Pad2D(padding=[1, 1, 1, 1], mode="replicate")
|
||||
self._sn_conv2 = SNConv(
|
||||
name=name + "_sn_conv2",
|
||||
in_channels=mid_channels,
|
||||
out_channels=mid_channels,
|
||||
kernel_size=3,
|
||||
use_bias=use_bias)
|
||||
self._sn_conv3 = SNConv(
|
||||
name=name + "_sn_conv3",
|
||||
in_channels=mid_channels,
|
||||
out_channels=out_channels,
|
||||
kernel_size=1,
|
||||
use_bias=use_bias)
|
||||
|
||||
def forward(self, x):
|
||||
|
||||
sn_conv1 = self._sn_conv1.forward(x)
|
||||
pad_2d = self._pad2d.forward(sn_conv1)
|
||||
sn_conv2 = self._sn_conv2.forward(pad_2d)
|
||||
sn_conv3 = self._sn_conv3.forward(sn_conv2)
|
||||
return sn_conv3
|
||||
|
||||
|
||||
class ResBlock(nn.Layer):
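    """Residual block: two 3x3 spectral-norm convs with replicate padding; the input is added to the output."""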
|
||||
def __init__(self, name, channels, norm_layer, use_dropout, use_dilation,
|
||||
use_bias):
|
||||
super(ResBlock, self).__init__()
|
||||
if use_dilation:
|
||||
padding_mat = [1, 1, 1, 1]
|
||||
else:
|
||||
padding_mat = [0, 0, 0, 0]
|
||||
self._pad1 = nn.Pad2D(padding_mat, mode="replicate")
|
||||
|
||||
self._sn_conv1 = SNConv(
|
||||
name=name + "_sn_conv1",
|
||||
in_channels=channels,
|
||||
out_channels=channels,
|
||||
kernel_size=3,
|
||||
padding=0,
|
||||
norm_layer=norm_layer,
|
||||
use_bias=use_bias,
|
||||
act="ReLU",
|
||||
act_attr=None)
|
||||
if use_dropout:
|
||||
self._dropout = nn.Dropout(0.5)
|
||||
else:
|
||||
self._dropout = None
|
||||
self._pad2 = nn.Pad2D([1, 1, 1, 1], mode="replicate")
|
||||
self._sn_conv2 = SNConv(
|
||||
name=name + "_sn_conv2",
|
||||
in_channels=channels,
|
||||
out_channels=channels,
|
||||
kernel_size=3,
|
||||
norm_layer=norm_layer,
|
||||
use_bias=use_bias,
|
||||
act="ReLU",
|
||||
act_attr=None)
|
||||
|
||||
def forward(self, x):
|
||||
pad1 = self._pad1.forward(x)
|
||||
sn_conv1 = self._sn_conv1.forward(pad1)
|
||||
pad2 = self._pad2.forward(sn_conv1)
|
||||
sn_conv2 = self._sn_conv2.forward(pad2)
|
||||
return sn_conv2 + x
|
|
@ -0,0 +1,251 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import paddle
|
||||
import paddle.nn as nn
|
||||
|
||||
from arch.base_module import SNConv, SNConvTranspose, ResBlock
|
||||
|
||||
|
||||
class Decoder(nn.Layer):
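    """Upsampling decoder: a stack of residual blocks, three stride-2 transposed convs, and an output conv."""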
|
||||
def __init__(self, name, encode_dim, out_channels, use_bias, norm_layer,
|
||||
act, act_attr, conv_block_dropout, conv_block_num,
|
||||
conv_block_dilation, out_conv_act, out_conv_act_attr):
|
||||
super(Decoder, self).__init__()
|
||||
conv_blocks = []
|
||||
for i in range(conv_block_num):
|
||||
conv_blocks.append(
|
||||
ResBlock(
|
||||
name="{}_conv_block_{}".format(name, i),
|
||||
channels=encode_dim * 8,
|
||||
norm_layer=norm_layer,
|
||||
use_dropout=conv_block_dropout,
|
||||
use_dilation=conv_block_dilation,
|
||||
use_bias=use_bias))
|
||||
self.conv_blocks = nn.Sequential(*conv_blocks)
|
||||
self._up1 = SNConvTranspose(
|
||||
name=name + "_up1",
|
||||
in_channels=encode_dim * 8,
|
||||
out_channels=encode_dim * 4,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._up2 = SNConvTranspose(
|
||||
name=name + "_up2",
|
||||
in_channels=encode_dim * 4,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._up3 = SNConvTranspose(
|
||||
name=name + "_up3",
|
||||
in_channels=encode_dim * 2,
|
||||
out_channels=encode_dim,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate")
|
||||
self._out_conv = SNConv(
|
||||
name=name + "_out_conv",
|
||||
in_channels=encode_dim,
|
||||
out_channels=out_channels,
|
||||
kernel_size=3,
|
||||
use_bias=use_bias,
|
||||
norm_layer=None,
|
||||
act=out_conv_act,
|
||||
act_attr=out_conv_act_attr)
|
||||
|
||||
def forward(self, x):
|
||||
if isinstance(x, (list, tuple)):
|
||||
x = paddle.concat(x, axis=1)
|
||||
output_dict = dict()
|
||||
output_dict["conv_blocks"] = self.conv_blocks.forward(x)
|
||||
output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"])
|
||||
output_dict["up2"] = self._up2.forward(output_dict["up1"])
|
||||
output_dict["up3"] = self._up3.forward(output_dict["up2"])
|
||||
output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"])
|
||||
output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"])
|
||||
return output_dict
|
||||
|
||||
|
||||
class DecoderUnet(nn.Layer):
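    """Decoder with U-Net style skips: encoder features are concatenated in before the second and third upsampling stages."""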
|
||||
def __init__(self, name, encode_dim, out_channels, use_bias, norm_layer,
|
||||
act, act_attr, conv_block_dropout, conv_block_num,
|
||||
conv_block_dilation, out_conv_act, out_conv_act_attr):
|
||||
super(DecoderUnet, self).__init__()
|
||||
conv_blocks = []
|
||||
for i in range(conv_block_num):
|
||||
conv_blocks.append(
|
||||
ResBlock(
|
||||
name="{}_conv_block_{}".format(name, i),
|
||||
channels=encode_dim * 8,
|
||||
norm_layer=norm_layer,
|
||||
use_dropout=conv_block_dropout,
|
||||
use_dilation=conv_block_dilation,
|
||||
use_bias=use_bias))
|
||||
self._conv_blocks = nn.Sequential(*conv_blocks)
|
||||
self._up1 = SNConvTranspose(
|
||||
name=name + "_up1",
|
||||
in_channels=encode_dim * 8,
|
||||
out_channels=encode_dim * 4,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._up2 = SNConvTranspose(
|
||||
name=name + "_up2",
|
||||
in_channels=encode_dim * 8,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._up3 = SNConvTranspose(
|
||||
name=name + "_up3",
|
||||
in_channels=encode_dim * 4,
|
||||
out_channels=encode_dim,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate")
|
||||
self._out_conv = SNConv(
|
||||
name=name + "_out_conv",
|
||||
in_channels=encode_dim,
|
||||
out_channels=out_channels,
|
||||
kernel_size=3,
|
||||
use_bias=use_bias,
|
||||
norm_layer=None,
|
||||
act=out_conv_act,
|
||||
act_attr=out_conv_act_attr)
|
||||
|
||||
def forward(self, x, y, feature2, feature1):
|
||||
output_dict = dict()
|
||||
output_dict["conv_blocks"] = self._conv_blocks(
|
||||
paddle.concat(
|
||||
(x, y), axis=1))
|
||||
output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"])
|
||||
output_dict["up2"] = self._up2.forward(
|
||||
paddle.concat(
|
||||
(output_dict["up1"], feature2), axis=1))
|
||||
output_dict["up3"] = self._up3.forward(
|
||||
paddle.concat(
|
||||
(output_dict["up2"], feature1), axis=1))
|
||||
output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"])
|
||||
output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"])
|
||||
return output_dict
|
||||
|
||||
|
||||
class SingleDecoder(nn.Layer):
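    """Decoder for a single encoded input, with skip connections at the second and third upsampling stages."""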
|
||||
def __init__(self, name, encode_dim, out_channels, use_bias, norm_layer,
|
||||
act, act_attr, conv_block_dropout, conv_block_num,
|
||||
conv_block_dilation, out_conv_act, out_conv_act_attr):
|
||||
super(SingleDecoder, self).__init__()
|
||||
conv_blocks = []
|
||||
for i in range(conv_block_num):
|
||||
conv_blocks.append(
|
||||
ResBlock(
|
||||
name="{}_conv_block_{}".format(name, i),
|
||||
channels=encode_dim * 4,
|
||||
norm_layer=norm_layer,
|
||||
use_dropout=conv_block_dropout,
|
||||
use_dilation=conv_block_dilation,
|
||||
use_bias=use_bias))
|
||||
self._conv_blocks = nn.Sequential(*conv_blocks)
|
||||
self._up1 = SNConvTranspose(
|
||||
name=name + "_up1",
|
||||
in_channels=encode_dim * 4,
|
||||
out_channels=encode_dim * 4,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._up2 = SNConvTranspose(
|
||||
name=name + "_up2",
|
||||
in_channels=encode_dim * 8,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._up3 = SNConvTranspose(
|
||||
name=name + "_up3",
|
||||
in_channels=encode_dim * 4,
|
||||
out_channels=encode_dim,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
output_padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate")
|
||||
self._out_conv = SNConv(
|
||||
name=name + "_out_conv",
|
||||
in_channels=encode_dim,
|
||||
out_channels=out_channels,
|
||||
kernel_size=3,
|
||||
use_bias=use_bias,
|
||||
norm_layer=None,
|
||||
act=out_conv_act,
|
||||
act_attr=out_conv_act_attr)
|
||||
|
||||
def forward(self, x, feature2, feature1):
|
||||
output_dict = dict()
|
||||
output_dict["conv_blocks"] = self._conv_blocks.forward(x)
|
||||
output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"])
|
||||
output_dict["up2"] = self._up2.forward(
|
||||
paddle.concat(
|
||||
(output_dict["up1"], feature2), axis=1))
|
||||
output_dict["up3"] = self._up3.forward(
|
||||
paddle.concat(
|
||||
(output_dict["up2"], feature1), axis=1))
|
||||
output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"])
|
||||
output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"])
|
||||
return output_dict
|
|
@ -0,0 +1,186 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import paddle
|
||||
import paddle.nn as nn
|
||||
|
||||
from arch.base_module import SNConv, SNConvTranspose, ResBlock
|
||||
|
||||
|
||||
class Encoder(nn.Layer):
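    """Downsampling encoder: a 7x7 input conv, three stride-2 convs, and a stack of residual blocks."""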
|
||||
def __init__(self, name, in_channels, encode_dim, use_bias, norm_layer,
|
||||
act, act_attr, conv_block_dropout, conv_block_num,
|
||||
conv_block_dilation):
|
||||
super(Encoder, self).__init__()
|
||||
self._pad2d = paddle.nn.Pad2D([3, 3, 3, 3], mode="replicate")
|
||||
self._in_conv = SNConv(
|
||||
name=name + "_in_conv",
|
||||
in_channels=in_channels,
|
||||
out_channels=encode_dim,
|
||||
kernel_size=7,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._down1 = SNConv(
|
||||
name=name + "_down1",
|
||||
in_channels=encode_dim,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._down2 = SNConv(
|
||||
name=name + "_down2",
|
||||
in_channels=encode_dim * 2,
|
||||
out_channels=encode_dim * 4,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._down3 = SNConv(
|
||||
name=name + "_down3",
|
||||
in_channels=encode_dim * 4,
|
||||
out_channels=encode_dim * 4,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
conv_blocks = []
|
||||
for i in range(conv_block_num):
|
||||
conv_blocks.append(
|
||||
ResBlock(
|
||||
name="{}_conv_block_{}".format(name, i),
|
||||
channels=encode_dim * 4,
|
||||
norm_layer=norm_layer,
|
||||
use_dropout=conv_block_dropout,
|
||||
use_dilation=conv_block_dilation,
|
||||
use_bias=use_bias))
|
||||
self._conv_blocks = nn.Sequential(*conv_blocks)
|
||||
|
||||
def forward(self, x):
|
||||
out_dict = dict()
|
||||
x = self._pad2d(x)
|
||||
out_dict["in_conv"] = self._in_conv.forward(x)
|
||||
out_dict["down1"] = self._down1.forward(out_dict["in_conv"])
|
||||
out_dict["down2"] = self._down2.forward(out_dict["down1"])
|
||||
out_dict["down3"] = self._down3.forward(out_dict["down2"])
|
||||
out_dict["res_blocks"] = self._conv_blocks.forward(out_dict["down3"])
|
||||
return out_dict
|
||||
|
||||
|
||||
class EncoderUnet(nn.Layer):
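    """U-Net style encoder: four stride-2 convs, then two upsampling stages; returns the concatenation of shallow and upsampled features."""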
|
||||
def __init__(self, name, in_channels, encode_dim, use_bias, norm_layer,
|
||||
act, act_attr):
|
||||
super(EncoderUnet, self).__init__()
|
||||
self._pad2d = paddle.nn.Pad2D([3, 3, 3, 3], mode="replicate")
|
||||
self._in_conv = SNConv(
|
||||
name=name + "_in_conv",
|
||||
in_channels=in_channels,
|
||||
out_channels=encode_dim,
|
||||
kernel_size=7,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._down1 = SNConv(
|
||||
name=name + "_down1",
|
||||
in_channels=encode_dim,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._down2 = SNConv(
|
||||
name=name + "_down2",
|
||||
in_channels=encode_dim * 2,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._down3 = SNConv(
|
||||
name=name + "_down3",
|
||||
in_channels=encode_dim * 2,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._down4 = SNConv(
|
||||
name=name + "_down4",
|
||||
in_channels=encode_dim * 2,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._up1 = SNConvTranspose(
|
||||
name=name + "_up1",
|
||||
in_channels=encode_dim * 2,
|
||||
out_channels=encode_dim * 2,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
self._up2 = SNConvTranspose(
|
||||
name=name + "_up2",
|
||||
in_channels=encode_dim * 4,
|
||||
out_channels=encode_dim * 4,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act=act,
|
||||
act_attr=act_attr)
|
||||
|
||||
def forward(self, x):
|
||||
output_dict = dict()
|
||||
x = self._pad2d(x)
|
||||
output_dict['in_conv'] = self._in_conv.forward(x)
|
||||
output_dict['down1'] = self._down1.forward(output_dict['in_conv'])
|
||||
output_dict['down2'] = self._down2.forward(output_dict['down1'])
|
||||
output_dict['down3'] = self._down3.forward(output_dict['down2'])
|
||||
output_dict['down4'] = self._down4.forward(output_dict['down3'])
|
||||
output_dict['up1'] = self._up1.forward(output_dict['down4'])
|
||||
output_dict['up2'] = self._up2.forward(
|
||||
paddle.concat(
|
||||
(output_dict['down3'], output_dict['up1']), axis=1))
|
||||
output_dict['concat'] = paddle.concat(
|
||||
(output_dict['down2'], output_dict['up2']), axis=1)
|
||||
return output_dict
|
|
@ -0,0 +1,150 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import paddle
|
||||
import paddle.nn as nn
|
||||
import paddle.nn.functional as F
|
||||
|
||||
|
||||
def normal_(x, mean=0., std=1.):
|
||||
temp_value = paddle.normal(mean, std, shape=x.shape)
|
||||
x.set_value(temp_value)
|
||||
return x
|
||||
|
||||
|
||||
class SpectralNorm(object):
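    """Forward pre-hook that divides a weight by its largest singular value, estimated with power iteration on each forward pass."""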
|
||||
def __init__(self, name='weight', n_power_iterations=1, dim=0, eps=1e-12):
|
||||
self.name = name
|
||||
self.dim = dim
|
||||
if n_power_iterations <= 0:
|
||||
raise ValueError('Expected n_power_iterations to be positive, but '
|
||||
'got n_power_iterations={}'.format(
|
||||
n_power_iterations))
|
||||
self.n_power_iterations = n_power_iterations
|
||||
self.eps = eps
|
||||
|
||||
def reshape_weight_to_matrix(self, weight):
|
||||
weight_mat = weight
|
||||
if self.dim != 0:
|
||||
# transpose dim to front
|
||||
weight_mat = weight_mat.transpose([
|
||||
self.dim,
|
||||
* [d for d in range(weight_mat.dim()) if d != self.dim]
|
||||
])
|
||||
|
||||
height = weight_mat.shape[0]
|
||||
|
||||
return weight_mat.reshape([height, -1])
|
||||
|
||||
def compute_weight(self, module, do_power_iteration):
|
||||
weight = getattr(module, self.name + '_orig')
|
||||
u = getattr(module, self.name + '_u')
|
||||
v = getattr(module, self.name + '_v')
|
||||
weight_mat = self.reshape_weight_to_matrix(weight)
|
||||
|
||||
if do_power_iteration:
|
||||
with paddle.no_grad():
|
||||
for _ in range(self.n_power_iterations):
|
||||
v.set_value(
|
||||
F.normalize(
|
||||
paddle.matmul(
|
||||
weight_mat,
|
||||
u,
|
||||
transpose_x=True,
|
||||
transpose_y=False),
|
||||
axis=0,
|
||||
epsilon=self.eps, ))
|
||||
|
||||
u.set_value(
|
||||
F.normalize(
|
||||
paddle.matmul(weight_mat, v),
|
||||
axis=0,
|
||||
epsilon=self.eps, ))
|
||||
if self.n_power_iterations > 0:
|
||||
u = u.clone()
|
||||
v = v.clone()
|
||||
|
||||
sigma = paddle.dot(u, paddle.mv(weight_mat, v))
|
||||
weight = weight / sigma
|
||||
return weight
|
||||
|
||||
def remove(self, module):
|
||||
with paddle.no_grad():
|
||||
weight = self.compute_weight(module, do_power_iteration=False)
|
||||
delattr(module, self.name)
|
||||
delattr(module, self.name + '_u')
|
||||
delattr(module, self.name + '_v')
|
||||
delattr(module, self.name + '_orig')
|
||||
|
||||
module.add_parameter(self.name, weight.detach())
|
||||
|
||||
def __call__(self, module, inputs):
|
||||
setattr(
|
||||
module,
|
||||
self.name,
|
||||
self.compute_weight(
|
||||
module, do_power_iteration=module.training))
|
||||
|
||||
@staticmethod
|
||||
def apply(module, name, n_power_iterations, dim, eps):
|
||||
for k, hook in module._forward_pre_hooks.items():
|
||||
if isinstance(hook, SpectralNorm) and hook.name == name:
|
||||
raise RuntimeError(
|
||||
"Cannot register two spectral_norm hooks on "
|
||||
"the same parameter {}".format(name))
|
||||
|
||||
fn = SpectralNorm(name, n_power_iterations, dim, eps)
|
||||
weight = module._parameters[name]
|
||||
|
||||
with paddle.no_grad():
|
||||
weight_mat = fn.reshape_weight_to_matrix(weight)
|
||||
h, w = weight_mat.shape
|
||||
|
||||
# randomly initialize u and v
|
||||
u = module.create_parameter([h])
|
||||
u = normal_(u, 0., 1.)
|
||||
v = module.create_parameter([w])
|
||||
v = normal_(v, 0., 1.)
|
||||
u = F.normalize(u, axis=0, epsilon=fn.eps)
|
||||
v = F.normalize(v, axis=0, epsilon=fn.eps)
|
||||
|
||||
# delete fn.name from parameters, otherwise you cannot set the attribute
|
||||
del module._parameters[fn.name]
|
||||
module.add_parameter(fn.name + "_orig", weight)
|
||||
# still need to assign weight back as fn.name because all sorts of
|
||||
# things may assume that it exists, e.g., when initializing weights.
|
||||
# However, we can't directly assign as it could be a Parameter and
|
||||
# gets added as a parameter. Instead, we register weight * 1.0 as a plain
|
||||
# attribute.
|
||||
setattr(module, fn.name, weight * 1.0)
|
||||
module.register_buffer(fn.name + "_u", u)
|
||||
module.register_buffer(fn.name + "_v", v)
|
||||
|
||||
module.register_forward_pre_hook(fn)
|
||||
return fn
|
||||
|
||||
|
||||
def spectral_norm(module,
|
||||
name='weight',
|
||||
n_power_iterations=1,
|
||||
eps=1e-12,
|
||||
dim=None):
|
||||
|
||||
if dim is None:
|
||||
if isinstance(module, (nn.Conv1DTranspose, nn.Conv2DTranspose,
|
||||
nn.Conv3DTranspose, nn.Linear)):
|
||||
dim = 1
|
||||
else:
|
||||
dim = 0
|
||||
SpectralNorm.apply(module, name, n_power_iterations, dim, eps)
|
||||
return module
|
|
@ -0,0 +1,285 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import paddle
|
||||
import paddle.nn as nn
|
||||
|
||||
from arch.base_module import MiddleNet, ResBlock
|
||||
from arch.encoder import Encoder
|
||||
from arch.decoder import Decoder, DecoderUnet, SingleDecoder
|
||||
from utils.load_params import load_dygraph_pretrain
|
||||
from utils.logging import get_logger
|
||||
|
||||
|
||||
class StyleTextRec(nn.Layer):
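    """Full Style-Text predictor: runs the text, background, and fusion generators, each initialized from pretrained weights."""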
|
||||
def __init__(self, config):
|
||||
super(StyleTextRec, self).__init__()
|
||||
self.logger = get_logger()
|
||||
self.text_generator = TextGenerator(config["Predictor"][
|
||||
"text_generator"])
|
||||
self.bg_generator = BgGeneratorWithMask(config["Predictor"][
|
||||
"bg_generator"])
|
||||
self.fusion_generator = FusionGeneratorSimple(config["Predictor"][
|
||||
"fusion_generator"])
|
||||
bg_generator_pretrain = config["Predictor"]["bg_generator"]["pretrain"]
|
||||
text_generator_pretrain = config["Predictor"]["text_generator"][
|
||||
"pretrain"]
|
||||
fusion_generator_pretrain = config["Predictor"]["fusion_generator"][
|
||||
"pretrain"]
|
||||
load_dygraph_pretrain(
|
||||
self.bg_generator,
|
||||
self.logger,
|
||||
path=bg_generator_pretrain,
|
||||
load_static_weights=False)
|
||||
load_dygraph_pretrain(
|
||||
self.text_generator,
|
||||
self.logger,
|
||||
path=text_generator_pretrain,
|
||||
load_static_weights=False)
|
||||
load_dygraph_pretrain(
|
||||
self.fusion_generator,
|
||||
self.logger,
|
||||
path=fusion_generator_pretrain,
|
||||
load_static_weights=False)
|
||||
|
||||
def forward(self, style_input, text_input):
|
||||
text_gen_output = self.text_generator.forward(style_input, text_input)
|
||||
fake_text = text_gen_output["fake_text"]
|
||||
fake_sk = text_gen_output["fake_sk"]
|
||||
bg_gen_output = self.bg_generator.forward(style_input)
|
||||
bg_encode_feature = bg_gen_output["bg_encode_feature"]
|
||||
bg_decode_feature1 = bg_gen_output["bg_decode_feature1"]
|
||||
bg_decode_feature2 = bg_gen_output["bg_decode_feature2"]
|
||||
fake_bg = bg_gen_output["fake_bg"]
|
||||
|
||||
fusion_gen_output = self.fusion_generator.forward(fake_text, fake_bg)
|
||||
fake_fusion = fusion_gen_output["fake_fusion"]
|
||||
return {
|
||||
"fake_fusion": fake_fusion,
|
||||
"fake_text": fake_text,
|
||||
"fake_sk": fake_sk,
|
||||
"fake_bg": fake_bg,
|
||||
}
|
||||
|
||||
|
||||
class TextGenerator(nn.Layer):
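    """Transfers the reference style onto the input text: separate style and text encoders, decoders for the styled text and its skeleton, and a MiddleNet fusing the two."""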
|
||||
def __init__(self, config):
|
||||
super(TextGenerator, self).__init__()
|
||||
name = config["module_name"]
|
||||
encode_dim = config["encode_dim"]
|
||||
norm_layer = config["norm_layer"]
|
||||
conv_block_dropout = config["conv_block_dropout"]
|
||||
conv_block_num = config["conv_block_num"]
|
||||
conv_block_dilation = config["conv_block_dilation"]
|
||||
if norm_layer == "InstanceNorm2D":
|
||||
use_bias = True
|
||||
else:
|
||||
use_bias = False
|
||||
self.encoder_text = Encoder(
|
||||
name=name + "_encoder_text",
|
||||
in_channels=3,
|
||||
encode_dim=encode_dim,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act="ReLU",
|
||||
act_attr=None,
|
||||
conv_block_dropout=conv_block_dropout,
|
||||
conv_block_num=conv_block_num,
|
||||
conv_block_dilation=conv_block_dilation)
|
||||
self.encoder_style = Encoder(
|
||||
name=name + "_encoder_style",
|
||||
in_channels=3,
|
||||
encode_dim=encode_dim,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act="ReLU",
|
||||
act_attr=None,
|
||||
conv_block_dropout=conv_block_dropout,
|
||||
conv_block_num=conv_block_num,
|
||||
conv_block_dilation=conv_block_dilation)
|
||||
self.decoder_text = Decoder(
|
||||
name=name + "_decoder_text",
|
||||
encode_dim=encode_dim,
|
||||
out_channels=int(encode_dim / 2),
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act="ReLU",
|
||||
act_attr=None,
|
||||
conv_block_dropout=conv_block_dropout,
|
||||
conv_block_num=conv_block_num,
|
||||
conv_block_dilation=conv_block_dilation,
|
||||
out_conv_act="Tanh",
|
||||
out_conv_act_attr=None)
|
||||
self.decoder_sk = Decoder(
|
||||
name=name + "_decoder_sk",
|
||||
encode_dim=encode_dim,
|
||||
out_channels=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act="ReLU",
|
||||
act_attr=None,
|
||||
conv_block_dropout=conv_block_dropout,
|
||||
conv_block_num=conv_block_num,
|
||||
conv_block_dilation=conv_block_dilation,
|
||||
out_conv_act="Sigmoid",
|
||||
out_conv_act_attr=None)
|
||||
|
||||
self.middle = MiddleNet(
|
||||
name=name + "_middle_net",
|
||||
in_channels=int(encode_dim / 2) + 1,
|
||||
mid_channels=encode_dim,
|
||||
out_channels=3,
|
||||
use_bias=use_bias)
|
||||
|
||||
def forward(self, style_input, text_input):
|
||||
style_feature = self.encoder_style.forward(style_input)["res_blocks"]
|
||||
text_feature = self.encoder_text.forward(text_input)["res_blocks"]
|
||||
fake_c_temp = self.decoder_text.forward([text_feature,
|
||||
style_feature])["out_conv"]
|
||||
fake_sk = self.decoder_sk.forward([text_feature,
|
||||
style_feature])["out_conv"]
|
||||
fake_text = self.middle(paddle.concat((fake_c_temp, fake_sk), axis=1))
|
||||
return {"fake_sk": fake_sk, "fake_text": fake_text}
|
||||
|
||||
|
||||
class BgGeneratorWithMask(nn.Layer):
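    """Extracts the text-free background: encodes the style image, decodes a background image and a text mask, and fuses them with a MiddleNet."""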
|
||||
def __init__(self, config):
|
||||
super(BgGeneratorWithMask, self).__init__()
|
||||
name = config["module_name"]
|
||||
encode_dim = config["encode_dim"]
|
||||
norm_layer = config["norm_layer"]
|
||||
conv_block_dropout = config["conv_block_dropout"]
|
||||
conv_block_num = config["conv_block_num"]
|
||||
conv_block_dilation = config["conv_block_dilation"]
|
||||
self.output_factor = config.get("output_factor", 1.0)
|
||||
|
||||
if norm_layer == "InstanceNorm2D":
|
||||
use_bias = True
|
||||
else:
|
||||
use_bias = False
|
||||
|
||||
self.encoder_bg = Encoder(
|
||||
name=name + "_encoder_bg",
|
||||
in_channels=3,
|
||||
encode_dim=encode_dim,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act="ReLU",
|
||||
act_attr=None,
|
||||
conv_block_dropout=conv_block_dropout,
|
||||
conv_block_num=conv_block_num,
|
||||
conv_block_dilation=conv_block_dilation)
|
||||
|
||||
self.decoder_bg = SingleDecoder(
|
||||
name=name + "_decoder_bg",
|
||||
encode_dim=encode_dim,
|
||||
out_channels=3,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act="ReLU",
|
||||
act_attr=None,
|
||||
conv_block_dropout=conv_block_dropout,
|
||||
conv_block_num=conv_block_num,
|
||||
conv_block_dilation=conv_block_dilation,
|
||||
out_conv_act="Tanh",
|
||||
out_conv_act_attr=None)
|
||||
|
||||
self.decoder_mask = Decoder(
|
||||
name=name + "_decoder_mask",
|
||||
encode_dim=encode_dim // 2,
|
||||
out_channels=1,
|
||||
use_bias=use_bias,
|
||||
norm_layer=norm_layer,
|
||||
act="ReLU",
|
||||
act_attr=None,
|
||||
conv_block_dropout=conv_block_dropout,
|
||||
conv_block_num=conv_block_num,
|
||||
conv_block_dilation=conv_block_dilation,
|
||||
out_conv_act="Sigmoid",
|
||||
out_conv_act_attr=None)
|
||||
|
||||
self.middle = MiddleNet(
|
||||
name=name + "_middle_net",
|
||||
in_channels=3 + 1,
|
||||
mid_channels=encode_dim,
|
||||
out_channels=3,
|
||||
use_bias=use_bias)
|
||||
|
||||
def forward(self, style_input):
|
||||
encode_bg_output = self.encoder_bg(style_input)
|
||||
decode_bg_output = self.decoder_bg(encode_bg_output["res_blocks"],
|
||||
encode_bg_output["down2"],
|
||||
encode_bg_output["down1"])
|
||||
|
||||
fake_c_temp = decode_bg_output["out_conv"]
|
||||
fake_bg_mask = self.decoder_mask.forward(encode_bg_output[
|
||||
"res_blocks"])["out_conv"]
|
||||
fake_bg = self.middle(
|
||||
paddle.concat(
|
||||
(fake_c_temp, fake_bg_mask), axis=1))
|
||||
return {
|
||||
"bg_encode_feature": encode_bg_output["res_blocks"],
|
||||
"bg_decode_feature1": decode_bg_output["up1"],
|
||||
"bg_decode_feature2": decode_bg_output["up2"],
|
||||
"fake_bg": fake_bg,
|
||||
"fake_bg_mask": fake_bg_mask,
|
||||
}
|
||||
|
||||
|
||||
class FusionGeneratorSimple(nn.Layer):
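    """Fuses generated text and background: a conv over the concatenated inputs, one residual block, and a conv reducing back to 3 channels."""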
|
||||
def __init__(self, config):
|
||||
super(FusionGeneratorSimple, self).__init__()
|
||||
name = config["module_name"]
|
||||
encode_dim = config["encode_dim"]
|
||||
norm_layer = config["norm_layer"]
|
||||
conv_block_dropout = config["conv_block_dropout"]
|
||||
conv_block_dilation = config["conv_block_dilation"]
|
||||
if norm_layer == "InstanceNorm2D":
|
||||
use_bias = True
|
||||
else:
|
||||
use_bias = False
|
||||
|
||||
self._conv = nn.Conv2D(
|
||||
in_channels=6,
|
||||
out_channels=encode_dim,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=1,
|
||||
groups=1,
|
||||
weight_attr=paddle.ParamAttr(name=name + "_conv_weights"),
|
||||
bias_attr=False)
|
||||
|
||||
self._res_block = ResBlock(
|
||||
name="{}_conv_block".format(name),
|
||||
channels=encode_dim,
|
||||
norm_layer=norm_layer,
|
||||
use_dropout=conv_block_dropout,
|
||||
use_dilation=conv_block_dilation,
|
||||
use_bias=use_bias)
|
||||
|
||||
self._reduce_conv = nn.Conv2D(
|
||||
in_channels=encode_dim,
|
||||
out_channels=3,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=1,
|
||||
groups=1,
|
||||
weight_attr=paddle.ParamAttr(name=name + "_reduce_conv_weights"),
|
||||
bias_attr=False)
|
||||
|
||||
def forward(self, fake_text, fake_bg):
|
||||
fake_concat = paddle.concat((fake_text, fake_bg), axis=1)
|
||||
fake_concat_tmp = self._conv(fake_concat)
|
||||
output_res = self._res_block(fake_concat_tmp)
|
||||
fake_fusion = self._reduce_conv(output_res)
|
||||
return {"fake_fusion": fake_fusion}
|
|
@ -0,0 +1,54 @@
Global:
  output_num: 10
  output_dir: output_data
  use_gpu: false
  image_height: 32
  image_width: 320
TextDrawer:
  fonts:
    en: fonts/en_standard.ttf
    ch: fonts/ch_standard.ttf
    ko: fonts/ko_standard.ttf
Predictor:
  method: StyleTextRecPredictor
  algorithm: StyleTextRec
  scale: 0.00392156862745098
  mean:
    - 0.5
    - 0.5
    - 0.5
  std:
    - 0.5
    - 0.5
    - 0.5
  expand_result: false
  bg_generator:
    pretrain: style_text_models/bg_generator
    module_name: bg_generator
    generator_type: BgGeneratorWithMask
    encode_dim: 64
    norm_layer: null
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
    output_factor: 1.05
  text_generator:
    pretrain: style_text_models/text_generator
    module_name: text_generator
    generator_type: TextGenerator
    encode_dim: 64
    norm_layer: InstanceNorm2D
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
  fusion_generator:
    pretrain: style_text_models/fusion_generator
    module_name: fusion_generator
    generator_type: FusionGeneratorSimple
    encode_dim: 64
    norm_layer: null
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
Writer:
  method: SimpleWriter
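
The block above is an ordinary YAML file consumed by `load_config`/`override_config` from `utils/config.py` (both appear later in this diff). A minimal loading sketch, assuming the file is saved as `configs/config.yml`:

```python
from utils.config import load_config, override_config

config = load_config("configs/config.yml")  # assumed path
config = override_config(config, options=["Global.output_num=20"])
print(config["Predictor"]["algorithm"])  # -> StyleTextRec
```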
@ -0,0 +1,64 @@
Global:
  output_num: 10
  output_dir: output_data
  use_gpu: false
  image_height: 32
  image_width: 320
  standard_font: fonts/en_standard.ttf
TextDrawer:
  fonts:
    en: fonts/en_standard.ttf
    ch: fonts/ch_standard.ttf
    ko: fonts/ko_standard.ttf
StyleSampler:
  method: DatasetSampler
  image_home: examples
  label_file: examples/image_list.txt
  with_label: true
CorpusGenerator:
  method: FileCorpus
  language: ch
  corpus_file: examples/corpus/example.txt
Predictor:
  method: StyleTextRecPredictor
  algorithm: StyleTextRec
  scale: 0.00392156862745098
  mean:
    - 0.5
    - 0.5
    - 0.5
  std:
    - 0.5
    - 0.5
    - 0.5
  expand_result: false
  bg_generator:
    pretrain: models/style_text_rec/bg_generator
    module_name: bg_generator
    generator_type: BgGeneratorWithMask
    encode_dim: 64
    norm_layer: null
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
    output_factor: 1.05
  text_generator:
    pretrain: models/style_text_rec/text_generator
    module_name: text_generator
    generator_type: TextGenerator
    encode_dim: 64
    norm_layer: InstanceNorm2D
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
  fusion_generator:
    pretrain: models/style_text_rec/fusion_generator
    module_name: fusion_generator
    generator_type: FusionGeneratorSimple
    encode_dim: 64
    norm_layer: null
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
Writer:
  method: SimpleWriter
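
This second config adds the `StyleSampler` and `CorpusGenerator` sections that batch synthesis needs. A sketch of the equivalent of `python3 tools/synth_dataset.py -c configs/dataset_config.yml` driven from Python; the config path is an assumption, and `sys.argv` is set by hand because `ArgsParser` reads the command line inside the constructor:

```python
import sys

from engine.synthesisers import DatasetSynthesiser

sys.argv = ["synth_dataset.py", "-c", "configs/dataset_config.yml", "--tag", "0"]
DatasetSynthesiser().synth_dataset()
```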
@ -0,0 +1,66 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random

from utils.logging import get_logger


class FileCorpus(object):
    def __init__(self, config):
        self.logger = get_logger()
        self.logger.info("using FileCorpus")

        self.char_list = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

        corpus_file = config["CorpusGenerator"]["corpus_file"]
        self.language = config["CorpusGenerator"]["language"]
        with open(corpus_file, 'r') as f:
            corpus_raw = f.read()
        self.corpus_list = corpus_raw.split("\n")[:-1]
        assert len(self.corpus_list) > 0
        random.shuffle(self.corpus_list)
        self.index = 0

    def generate(self, corpus_length=0):
        if self.index >= len(self.corpus_list):
            self.index = 0
            random.shuffle(self.corpus_list)
        corpus = self.corpus_list[self.index]
        if corpus_length != 0:
            corpus = corpus[0:corpus_length]
        if corpus_length > len(corpus):
            self.logger.warning("generated corpus is shorter than expected.")
        self.index += 1
        return self.language, corpus


class EnNumCorpus(object):
    def __init__(self, config):
        self.logger = get_logger()
        self.logger.info("using NumberCorpus")
        self.num_list = "0123456789"
        self.en_char_list = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
        self.height = config["Global"]["image_height"]
        self.max_width = config["Global"]["image_width"]

    def generate(self, corpus_length=0):
        corpus = ""
        if corpus_length == 0:
            corpus_length = random.randint(5, 15)
        for i in range(corpus_length):
            if random.random() < 0.2:
                corpus += "{}".format(random.choice(self.en_char_list))
            else:
                corpus += "{}".format(random.choice(self.num_list))
        return "en", corpus
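
`FileCorpus` simply cycles (and reshuffles) the lines of a text file. A small usage sketch, run from the project root so that `engine` and `utils` are importable; the corpus path is the one from the example config above:

```python
from engine.corpus_generators import FileCorpus

config = {
    "CorpusGenerator": {
        "corpus_file": "examples/corpus/example.txt",
        "language": "ch",
    }
}
corpus_gen = FileCorpus(config)
for _ in range(3):
    language, text = corpus_gen.generate()
    print(language, text)
```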
@ -0,0 +1,115 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
import math
import paddle

from arch import style_text_rec
from utils.sys_funcs import check_gpu
from utils.logging import get_logger


class StyleTextRecPredictor(object):
    def __init__(self, config):
        algorithm = config['Predictor']['algorithm']
        assert algorithm in ["StyleTextRec"
                             ], "Generator {} not supported.".format(algorithm)
        use_gpu = config["Global"]['use_gpu']
        check_gpu(use_gpu)
        self.logger = get_logger()
        self.generator = getattr(style_text_rec, algorithm)(config)
        self.height = config["Global"]["image_height"]
        self.width = config["Global"]["image_width"]
        self.scale = config["Predictor"]["scale"]
        self.mean = config["Predictor"]["mean"]
        self.std = config["Predictor"]["std"]
        self.expand_result = config["Predictor"]["expand_result"]

    def predict(self, style_input, text_input):
        style_input = self.rep_style_input(style_input, text_input)
        tensor_style_input = self.preprocess(style_input)
        tensor_text_input = self.preprocess(text_input)
        style_text_result = self.generator.forward(tensor_style_input,
                                                   tensor_text_input)
        fake_fusion = self.postprocess(style_text_result["fake_fusion"])
        fake_text = self.postprocess(style_text_result["fake_text"])
        fake_sk = self.postprocess(style_text_result["fake_sk"])
        fake_bg = self.postprocess(style_text_result["fake_bg"])
        bbox = self.get_text_boundary(fake_text)
        if bbox:
            left, right, top, bottom = bbox
            fake_fusion = fake_fusion[top:bottom, left:right, :]
            fake_text = fake_text[top:bottom, left:right, :]
            fake_sk = fake_sk[top:bottom, left:right, :]
            fake_bg = fake_bg[top:bottom, left:right, :]

        # fake_fusion = self.crop_by_text(img_fake_fusion, img_fake_text)
        return {
            "fake_fusion": fake_fusion,
            "fake_text": fake_text,
            "fake_sk": fake_sk,
            "fake_bg": fake_bg,
        }

    def preprocess(self, img):
        img = (img.astype('float32') * self.scale - self.mean) / self.std
        img_height, img_width, channel = img.shape
        assert channel == 3, "Please use an RGB image."
        ratio = img_width / float(img_height)
        if math.ceil(self.height * ratio) > self.width:
            resized_w = self.width
        else:
            resized_w = int(math.ceil(self.height * ratio))
        img = cv2.resize(img, (resized_w, self.height))

        new_img = np.zeros([self.height, self.width, 3]).astype('float32')
        new_img[:, 0:resized_w, :] = img
        img = new_img.transpose((2, 0, 1))
        img = img[np.newaxis, :, :, :]
        return paddle.to_tensor(img)

    def postprocess(self, tensor):
        img = tensor.numpy()[0]
        img = img.transpose((1, 2, 0))
        img = (img * self.std + self.mean) / self.scale
        img = np.maximum(img, 0.0)
        img = np.minimum(img, 255.0)
        img = img.astype('uint8')
        return img

    def rep_style_input(self, style_input, text_input):
        rep_num = int(1.2 * (text_input.shape[1] / text_input.shape[0]) /
                      (style_input.shape[1] / style_input.shape[0])) + 1
        style_input = np.tile(style_input, reps=[1, rep_num, 1])
        max_width = int(self.width / self.height * style_input.shape[0])
        style_input = style_input[:, :max_width, :]
        return style_input

    def get_text_boundary(self, text_img):
        img_height = text_img.shape[0]
        img_width = text_img.shape[1]
        bounder = 3
        text_canny_img = cv2.Canny(text_img, 10, 20)
        edge_num_h = text_canny_img.sum(axis=0)
        no_zero_list_h = np.where(edge_num_h > 0)[0]
        edge_num_w = text_canny_img.sum(axis=1)
        no_zero_list_w = np.where(edge_num_w > 0)[0]
        if len(no_zero_list_h) == 0 or len(no_zero_list_w) == 0:
            return None
        left = max(no_zero_list_h[0] - bounder, 0)
        right = min(no_zero_list_h[-1] + bounder, img_width)
        top = max(no_zero_list_w[0] - bounder, 0)
        bottom = min(no_zero_list_w[-1] + bounder, img_height)
        return [left, right, top, bottom]
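
With `scale = 1/255 ≈ 0.00392` and `mean = std = 0.5` (the values in both example configs), `preprocess` maps pixel values from [0, 255] to [-1, 1] and `postprocess` inverts the mapping. A quick numeric check of just that arithmetic:

```python
import numpy as np

img = np.array([0.0, 127.5, 255.0], dtype=np.float32)
scale, mean, std = 1 / 255.0, 0.5, 0.5

normed = (img * scale - mean) / std       # -> [-1.,  0.,  1.]
restored = (normed * std + mean) / scale  # -> [ 0., 127.5, 255.]
assert np.allclose(img, restored)
```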
@ -0,0 +1,62 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import random
import cv2


class DatasetSampler(object):
    def __init__(self, config):
        self.image_home = config["StyleSampler"]["image_home"]
        label_file = config["StyleSampler"]["label_file"]
        self.dataset_with_label = config["StyleSampler"]["with_label"]
        self.height = config["Global"]["image_height"]
        self.index = 0
        with open(label_file, "r") as f:
            label_raw = f.read()
            self.path_label_list = label_raw.split("\n")[:-1]
        assert len(self.path_label_list) > 0
        random.shuffle(self.path_label_list)

    def sample(self):
        if self.index >= len(self.path_label_list):
            random.shuffle(self.path_label_list)
            self.index = 0
        if self.dataset_with_label:
            path_label = self.path_label_list[self.index]
            rel_image_path, label = path_label.split('\t')
        else:
            rel_image_path = self.path_label_list[self.index]
            label = None
        img_path = "{}/{}".format(self.image_home, rel_image_path)
        image = cv2.imread(img_path)
        origin_height = image.shape[0]
        ratio = self.height / origin_height
        width = int(image.shape[1] * ratio)
        height = int(image.shape[0] * ratio)
        image = cv2.resize(image, (width, height))

        self.index += 1
        if label:
            return {"image": image, "label": label}
        else:
            return {"image": image}


def duplicate_image(image, width):
    image_width = image.shape[1]
    dup_num = width // image_width + 1
    image = np.tile(image, reps=[1, dup_num, 1])
    cropped_image = image[:, :width, :]
    return cropped_image
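
`duplicate_image` tiles a style image horizontally until it covers the requested width, then crops. A quick check of the shape contract:

```python
import numpy as np

from engine.style_samplers import duplicate_image

style = np.zeros((32, 100, 3), dtype=np.uint8)
wide = duplicate_image(style, width=320)  # tiled 4x to 400 px, then cropped
assert wide.shape == (32, 320, 3)
```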
@ -0,0 +1,71 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os

from utils.config import ArgsParser, load_config, override_config
from utils.logging import get_logger
from engine import style_samplers, corpus_generators, text_drawers, predictors, writers


class ImageSynthesiser(object):
    def __init__(self):
        self.FLAGS = ArgsParser().parse_args()
        self.config = load_config(self.FLAGS.config)
        self.config = override_config(self.config, options=self.FLAGS.override)
        self.output_dir = self.config["Global"]["output_dir"]
        if not os.path.exists(self.output_dir):
            os.mkdir(self.output_dir)
        self.logger = get_logger(
            log_file='{}/predict.log'.format(self.output_dir))

        self.text_drawer = text_drawers.StdTextDrawer(self.config)

        predictor_method = self.config["Predictor"]["method"]
        assert predictor_method is not None
        self.predictor = getattr(predictors, predictor_method)(self.config)

    def synth_image(self, corpus, style_input, language="en"):
        corpus, text_input = self.text_drawer.draw_text(corpus, language)
        synth_result = self.predictor.predict(style_input, text_input)
        return synth_result


class DatasetSynthesiser(ImageSynthesiser):
    def __init__(self):
        super(DatasetSynthesiser, self).__init__()
        self.tag = self.FLAGS.tag
        self.output_num = self.config["Global"]["output_num"]
        corpus_generator_method = self.config["CorpusGenerator"]["method"]
        self.corpus_generator = getattr(corpus_generators,
                                        corpus_generator_method)(self.config)

        style_sampler_method = self.config["StyleSampler"]["method"]
        assert style_sampler_method is not None
        self.style_sampler = style_samplers.DatasetSampler(self.config)
        self.writer = writers.SimpleWriter(self.config, self.tag)

    def synth_dataset(self):
        for i in range(self.output_num):
            style_data = self.style_sampler.sample()
            style_input = style_data["image"]
            corpus_language, text_input_label = self.corpus_generator.generate()
            text_input_label, text_input = self.text_drawer.draw_text(
                text_input_label, corpus_language)

            synth_result = self.predictor.predict(style_input, text_input)
            fake_fusion = synth_result["fake_fusion"]
            self.writer.save_image(fake_fusion, text_input_label)
        self.writer.save_label()
        self.writer.merge_label()
@ -0,0 +1,57 @@
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from utils.logging import get_logger


class StdTextDrawer(object):
    def __init__(self, config):
        self.logger = get_logger()
        self.max_width = config["Global"]["image_width"]
        self.char_list = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
        self.height = config["Global"]["image_height"]
        self.font_dict = {}
        self.load_fonts(config["TextDrawer"]["fonts"])
        self.support_languages = list(self.font_dict)

    def load_fonts(self, fonts_config):
        for language in fonts_config:
            font_path = fonts_config[language]
            font_height = self.get_valid_height(font_path)
            font = ImageFont.truetype(font_path, font_height)
            self.font_dict[language] = font

    def get_valid_height(self, font_path):
        font = ImageFont.truetype(font_path, self.height - 4)
        _, font_height = font.getsize(self.char_list)
        if font_height <= self.height - 4:
            return self.height - 4
        else:
            return int((self.height - 4)**2 / font_height)

    def draw_text(self, corpus, language="en", crop=True):
        if language not in self.support_languages:
            self.logger.warning(
                "language {} not supported, use en instead.".format(language))
            language = "en"
        if crop:
            width = min(self.max_width, len(corpus) * self.height) + 4
        else:
            width = len(corpus) * self.height + 4
        bg = Image.new("RGB", (width, self.height), color=(127, 127, 127))
        draw = ImageDraw.Draw(bg)

        char_x = 2
        font = self.font_dict[language]
        for i, char_i in enumerate(corpus):
            char_size = font.getsize(char_i)[0]
            draw.text((char_x, 2), char_i, fill=(0, 0, 0), font=font)
            char_x += char_size
            if char_x >= width:
                corpus = corpus[0:i + 1]
                self.logger.warning("corpus length exceed limit: {}".format(
                    corpus))
                break

        text_input = np.array(bg).astype(np.uint8)
        text_input = text_input[:, 0:char_x, :]
        return corpus, text_input
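
`StdTextDrawer` renders the corpus onto a grey canvas with the configured font for the requested language and crops to the drawn width. A minimal sketch, assuming the font files referenced by the example configs are present:

```python
from engine.text_drawers import StdTextDrawer

config = {
    "Global": {"image_height": 32, "image_width": 320},
    "TextDrawer": {"fonts": {"en": "fonts/en_standard.ttf"}},
}
drawer = StdTextDrawer(config)
corpus, text_input = drawer.draw_text("PaddleOCR", language="en")
print(text_input.shape)  # (32, W, 3) with W bounded by image_width + 4
```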
@ -0,0 +1,71 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import glob

from utils.logging import get_logger


class SimpleWriter(object):
    def __init__(self, config, tag):
        self.logger = get_logger()
        self.output_dir = config["Global"]["output_dir"]
        self.counter = 0
        self.label_dict = {}
        self.tag = tag
        self.label_file_index = 0

    def save_image(self, image, text_input_label):
        image_home = os.path.join(self.output_dir, "images", self.tag)
        if not os.path.exists(image_home):
            os.makedirs(image_home)

        image_path = os.path.join(image_home, "{}.png".format(self.counter))
        # todo support continue synth
        cv2.imwrite(image_path, image)
        self.logger.info("generate image: {}".format(image_path))

        image_name = os.path.join(self.tag, "{}.png".format(self.counter))
        self.label_dict[image_name] = text_input_label

        self.counter += 1
        if not self.counter % 100:
            self.save_label()

    def save_label(self):
        label_raw = ""
        label_home = os.path.join(self.output_dir, "label")
        if not os.path.exists(label_home):
            os.mkdir(label_home)
        for image_path in self.label_dict:
            label = self.label_dict[image_path]
            label_raw += "{}\t{}\n".format(image_path, label)
        label_file_path = os.path.join(label_home,
                                       "{}_label.txt".format(self.tag))
        with open(label_file_path, "w") as f:
            f.write(label_raw)
        self.label_file_index += 1

    def merge_label(self):
        label_raw = ""
        label_file_regex = os.path.join(self.output_dir, "label",
                                        "*_label.txt")
        label_file_list = glob.glob(label_file_regex)
        for label_file_i in label_file_list:
            with open(label_file_i, "r") as f:
                label_raw += f.read()
        label_file_path = os.path.join(self.output_dir, "label.txt")
        with open(label_file_path, "w") as f:
            f.write(label_raw)
@ -0,0 +1,2 @@
PaddleOCR
飞桨文字识别

@ -0,0 +1,2 @@
style_images/1.jpg NEATNESS
style_images/2.jpg 锁店君和宾馆
@ -0,0 +1,23 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from engine.synthesisers import DatasetSynthesiser


def synth_dataset():
    dataset_synthesiser = DatasetSynthesiser()
    dataset_synthesiser.synth_dataset()


if __name__ == '__main__':
    synth_dataset()
@ -0,0 +1,82 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import sys
import glob

from utils.config import ArgsParser
from engine.synthesisers import ImageSynthesiser

__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))


def synth_image():
    args = ArgsParser().parse_args()
    image_synthesiser = ImageSynthesiser()
    style_image_path = args.style_image
    img = cv2.imread(style_image_path)
    text_corpus = args.text_corpus
    language = args.language

    synth_result = image_synthesiser.synth_image(text_corpus, img, language)
    fake_fusion = synth_result["fake_fusion"]
    fake_text = synth_result["fake_text"]
    fake_bg = synth_result["fake_bg"]
    cv2.imwrite("fake_fusion.jpg", fake_fusion)
    cv2.imwrite("fake_text.jpg", fake_text)
    cv2.imwrite("fake_bg.jpg", fake_bg)


def batch_synth_images():
    image_synthesiser = ImageSynthesiser()

    corpus_file = "../StyleTextRec_data/test_20201208/test_text_list.txt"
    style_data_dir = "../StyleTextRec_data/test_20201208/style_images/"
    save_path = "./output_data/"
    corpus_list = []
    with open(corpus_file, "rb") as fin:
        lines = fin.readlines()
        for line in lines:
            substr = line.decode("utf-8").strip("\n").split("\t")
            corpus_list.append(substr)
    style_img_list = glob.glob("{}/*.jpg".format(style_data_dir))
    corpus_num = len(corpus_list)
    style_img_num = len(style_img_list)
    for cno in range(corpus_num):
        for sno in range(style_img_num):
            corpus, lang = corpus_list[cno]
            style_img_path = style_img_list[sno]
            img = cv2.imread(style_img_path)
            synth_result = image_synthesiser.synth_image(corpus, img, lang)
            fake_fusion = synth_result["fake_fusion"]
            fake_text = synth_result["fake_text"]
            fake_bg = synth_result["fake_bg"]
            for tp in range(2):
                if tp == 0:
                    prefix = "%s/c%d_s%d_" % (save_path, cno, sno)
                else:
                    prefix = "%s/s%d_c%d_" % (save_path, sno, cno)
                cv2.imwrite("%s_fake_fusion.jpg" % prefix, fake_fusion)
                cv2.imwrite("%s_fake_text.jpg" % prefix, fake_text)
                cv2.imwrite("%s_fake_bg.jpg" % prefix, fake_bg)
                cv2.imwrite("%s_input_style.jpg" % prefix, img)
                print(cno, corpus_num, sno, style_img_num)


if __name__ == '__main__':
    # batch_synth_images()
    synth_image()
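
A programmatic equivalent of `synth_image()` above, with the CLI arguments supplied by hand (`ArgsParser` requires at least `-c`; the paths below are the example assets from this diff, and the config path is an assumption):

```python
import sys

import cv2

from engine.synthesisers import ImageSynthesiser

sys.argv = ["synth_image.py", "-c", "configs/config.yml"]  # assumed config path
synthesiser = ImageSynthesiser()
style_img = cv2.imread("examples/style_images/2.jpg")
result = synthesiser.synth_image("PaddleOCR", style_img, language="en")
cv2.imwrite("fake_fusion.jpg", result["fake_fusion"])
```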
@ -0,0 +1,224 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import yaml
import os
from argparse import ArgumentParser, RawDescriptionHelpFormatter

from utils.logging import get_logger


def override(dl, ks, v):
    """
    Recursively replace an entry of a dict or list

    Args:
        dl(dict or list): dict or list to be replaced
        ks(list): list of keys
        v(str): value to be replaced
    """

    def str2num(v):
        try:
            return eval(v)
        except Exception:
            return v

    assert isinstance(dl, (list, dict)), ("{} should be a list or a dict")
    assert len(ks) > 0, ('length of keys should be larger than 0')
    if isinstance(dl, list):
        k = str2num(ks[0])
        if len(ks) == 1:
            assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
            dl[k] = str2num(v)
        else:
            override(dl[k], ks[1:], v)
    else:
        if len(ks) == 1:
            #assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
            if not ks[0] in dl:
                get_logger().warning('A new field ({}) detected!'.format(ks[0]))
            dl[ks[0]] = str2num(v)
        else:
            assert ks[0] in dl, (
                '({}) doesn\'t exist in {}, a new dict field is invalid'.
                format(ks[0], dl))
            override(dl[ks[0]], ks[1:], v)


def override_config(config, options=None):
    """
    Recursively override the config

    Args:
        config(dict): dict to be replaced
        options(list): list of pairs(key0.key1.idx.key2=value)
            such as: [
                'topk=2',
                'VALID.transforms.1.ResizeImage.resize_short=300'
            ]

    Returns:
        config(dict): replaced config
    """
    if options is not None:
        for opt in options:
            assert isinstance(opt, str), (
                "option({}) should be a str".format(opt))
            assert "=" in opt, (
                "option({}) should contain a ="
                "to distinguish between key and value".format(opt))
            pair = opt.split('=')
            assert len(pair) == 2, ("there can be only one = in the option")
            key, value = pair
            keys = key.split('.')
            override(config, keys, value)

    return config


class ArgsParser(ArgumentParser):
    def __init__(self):
        super(ArgsParser, self).__init__(
            formatter_class=RawDescriptionHelpFormatter)
        self.add_argument("-c", "--config", help="configuration file to use")
        self.add_argument(
            "-t", "--tag", default="0", help="tag for marking worker")
        self.add_argument(
            '-o',
            '--override',
            action='append',
            default=[],
            help='config options to be overridden')
        self.add_argument(
            "--style_image",
            default="examples/style_images/1.jpg",
            help="path of the style image")
        self.add_argument(
            "--text_corpus", default="PaddleOCR", help="text to synthesise")
        self.add_argument(
            "--language", default="en", help="language of the text corpus")

    def parse_args(self, argv=None):
        args = super(ArgsParser, self).parse_args(argv)
        assert args.config is not None, \
            "Please specify --config=configure_file_path."
        return args


def load_config(file_path):
    """
    Load config from yml/yaml file.
    Args:
        file_path (str): Path of the config file to be loaded.
    Returns: config
    """
    ext = os.path.splitext(file_path)[1]
    assert ext in ['.yml', '.yaml'], "only support yaml files for now"
    with open(file_path, 'rb') as f:
        config = yaml.load(f, Loader=yaml.Loader)

    return config


def gen_config():
    base_config = {
        "Global": {
            "algorithm": "SRNet",
            "use_gpu": True,
            "start_epoch": 1,
            "stage1_epoch_num": 100,
            "stage2_epoch_num": 100,
            "log_smooth_window": 20,
            "print_batch_step": 2,
            "save_model_dir": "./output/SRNet",
            "use_visualdl": False,
            "save_epoch_step": 10,
            "vgg_pretrain": "./pretrained/VGG19_pretrained",
            "vgg_load_static_pretrain": True
        },
        "Architecture": {
            "model_type": "data_aug",
            "algorithm": "SRNet",
            "net_g": {
                "name": "srnet_net_g",
                "encode_dim": 64,
                "norm": "batch",
                "use_dropout": False,
                "init_type": "xavier",
                "init_gain": 0.02,
                "use_dilation": 1
            },
            # input_nc, ndf, netD,
            # n_layers_D=3, norm='instance', use_sigmoid=False, init_type='normal', init_gain=0.02, gpu_id='cuda:0'
            "bg_discriminator": {
                "name": "srnet_bg_discriminator",
                "input_nc": 6,
                "ndf": 64,
                "netD": "basic",
                "norm": "none",
                "init_type": "xavier",
            },
            "fusion_discriminator": {
                "name": "srnet_fusion_discriminator",
                "input_nc": 6,
                "ndf": 64,
                "netD": "basic",
                "norm": "none",
                "init_type": "xavier",
            }
        },
        "Loss": {
            "lamb": 10,
            "perceptual_lamb": 1,
            "muvar_lamb": 50,
            "style_lamb": 500
        },
        "Optimizer": {
            "name": "Adam",
            "learning_rate": {
                "name": "lambda",
                "lr": 0.0002,
                "lr_decay_iters": 50
            },
            "beta1": 0.5,
            "beta2": 0.999,
        },
        "Train": {
            "batch_size_per_card": 8,
            "num_workers_per_card": 4,
            "dataset": {
                "delimiter": "\t",
                "data_dir": "/",
                "label_file": "tmp/label.txt",
                "transforms": [{
                    "DecodeImage": {
                        "to_rgb": True,
                        "to_np": False,
                        "channel_first": False
                    }
                }, {
                    "NormalizeImage": {
                        "scale": 1. / 255.,
                        "mean": [0.485, 0.456, 0.406],
                        "std": [0.229, 0.224, 0.225],
                        "order": None
                    }
                }, {
                    "ToCHWImage": None
                }]
            }
        }
    }
    with open("config.yml", "w") as f:
        yaml.dump(base_config, f)


if __name__ == '__main__':
    gen_config()
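
`override` walks a dotted key path, so list elements can be addressed by index, and `str2num` `eval`s each value, so Python literals such as `True`, `3`, or `0.5` arrive typed while arbitrary strings fall through unchanged. A short sketch:

```python
from utils.config import override_config

config = {"Global": {"use_gpu": False}, "mean": [0.5, 0.5, 0.5]}
config = override_config(config, options=["Global.use_gpu=True", "mean.0=0.4"])
assert config["Global"]["use_gpu"] is True
assert config["mean"][0] == 0.4
```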
@ -0,0 +1,27 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import paddle

__all__ = ['load_dygraph_pretrain']


def load_dygraph_pretrain(model, logger, path=None, load_static_weights=False):
    if not os.path.exists(path + '.pdparams'):
        raise ValueError("Model pretrain path {} does not "
                         "exist.".format(path))
    param_state_dict = paddle.load(path + '.pdparams')
    model.set_state_dict(param_state_dict)
    logger.info("load pretrained model from {}".format(path))
    return
@ -0,0 +1,65 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import logging
import functools
import paddle.distributed as dist

logger_initialized = {}


@functools.lru_cache()
def get_logger(name='srnet', log_file=None, log_level=logging.INFO):
    """Initialize and get a logger by name.
    If the logger has not been initialized, this method will initialize the
    logger by adding one or two handlers, otherwise the initialized logger will
    be directly returned. During initialization, a StreamHandler will always be
    added. If `log_file` is specified a FileHandler will also be added.
    Args:
        name (str): Logger name.
        log_file (str | None): The log filename. If specified, a FileHandler
            will be added to the logger.
        log_level (int): The logger level. Note that only the process of
            rank 0 is affected, and other processes will set the level to
            "Error" thus be silent most of the time.
    Returns:
        logging.Logger: The expected logger.
    """
    logger = logging.getLogger(name)
    if name in logger_initialized:
        return logger
    for logger_name in logger_initialized:
        if name.startswith(logger_name):
            return logger

    formatter = logging.Formatter(
        '[%(asctime)s] %(name)s %(levelname)s: %(message)s',
        datefmt="%Y/%m/%d %H:%M:%S")

    stream_handler = logging.StreamHandler(stream=sys.stdout)
    stream_handler.setFormatter(formatter)
    logger.addHandler(stream_handler)
    if log_file is not None and dist.get_rank() == 0:
        log_file_folder = os.path.split(log_file)[0]
        os.makedirs(log_file_folder, exist_ok=True)
        file_handler = logging.FileHandler(log_file, 'a')
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    if dist.get_rank() == 0:
        logger.setLevel(log_level)
    else:
        logger.setLevel(logging.ERROR)
    logger_initialized[name] = True
    return logger
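
`get_logger` memoizes per logger name, always logs to stdout, and only attaches the file handler (and the INFO level) on rank 0. Typical use, mirroring `ImageSynthesiser`:

```python
from utils.logging import get_logger

logger = get_logger(log_file="output_data/predict.log")
logger.info("starting synthesis")

# Later calls with the same name return the already-configured logger.
assert get_logger() is logger
```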
@ -0,0 +1,45 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle


def compute_mean_covariance(img):
    batch_size = img.shape[0]
    channel_num = img.shape[1]
    height = img.shape[2]
    width = img.shape[3]
    num_pixels = height * width

    # batch_size * channel_num * 1 * 1
    mu = img.mean(2, keepdim=True).mean(3, keepdim=True)

    # batch_size * channel_num * num_pixels
    img_hat = img - mu.expand_as(img)
    img_hat = img_hat.reshape([batch_size, channel_num, num_pixels])
    # batch_size * num_pixels * channel_num
    img_hat_transpose = img_hat.transpose([0, 2, 1])
    # batch_size * channel_num * channel_num
    covariance = paddle.bmm(img_hat, img_hat_transpose)
    covariance = covariance / num_pixels

    return mu, covariance


def dice_coefficient(y_true_cls, y_pred_cls, training_mask):
    eps = 1e-5
    intersection = paddle.sum(y_true_cls * y_pred_cls * training_mask)
    union = paddle.sum(y_true_cls * training_mask) + paddle.sum(
        y_pred_cls * training_mask) + eps
    loss = 1. - (2 * intersection / union)
    return loss
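
`dice_coefficient` computes `1 - 2|X∩Y| / (|X| + |Y|)` over the masked region. A worked check: with two positives in the target and one matching positive in the prediction, the loss is `1 - 2*1/3 ≈ 0.333`:

```python
import paddle

from utils.math_functions import dice_coefficient

y_true = paddle.to_tensor([1.0, 1.0, 0.0, 0.0])
y_pred = paddle.to_tensor([1.0, 0.0, 0.0, 0.0])
mask = paddle.ones([4])
loss = dice_coefficient(y_true, y_pred, mask)
print(float(loss))  # ~0.3333
```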
@ -0,0 +1,67 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
import errno
import paddle


def get_check_global_params(mode):
    check_params = [
        'use_gpu', 'max_text_length', 'image_shape',
        'character_type', 'loss_type'
    ]
    if mode == "train_eval":
        check_params = check_params + [
            'train_batch_size_per_card', 'test_batch_size_per_card'
        ]
    elif mode == "test":
        check_params = check_params + ['test_batch_size_per_card']
    return check_params


def check_gpu(use_gpu):
    """
    Log an error and exit when use_gpu is set to true but the
    CPU version of paddlepaddle is installed.
    """
    err = "Config use_gpu cannot be set as true while you are " \
          "using paddlepaddle cpu version ! \nPlease try: \n" \
          "\t1. Install paddlepaddle-gpu to run model on GPU \n" \
          "\t2. Set use_gpu as false in config file to run " \
          "model on CPU"
    if use_gpu:
        try:
            if not paddle.is_compiled_with_cuda():
                print(err)
                sys.exit(1)
        except Exception:
            print("Failed to check GPU state.")
            sys.exit(1)


def _mkdir_if_not_exist(path, logger):
    """
    mkdir if not exists; ignore the exception when multiple processes mkdir together
    """
    if not os.path.exists(path):
        try:
            os.makedirs(path)
        except OSError as e:
            if e.errno == errno.EEXIST and os.path.isdir(path):
                logger.warning(
                    'be happy if some process has already created {}'.format(
                        path))
            else:
                raise OSError('Failed to mkdir {}'.format(path))
@ -61,8 +61,8 @@ Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
    data_ratio_list: [0.5, 0.5]
    label_file_list: [./train_data/icdar2013/train_label_json.txt, ./train_data/icdar2015/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt]
    ratio_list: [0.1, 0.45, 0.3, 0.15]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
@ -60,8 +60,8 @@ Metric:
Train:
  dataset:
    name: SimpleDataSet
    label_file_list: [./train_data/icdar2013/train_label_json.txt, ./train_data/icdar2015/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt]
    ratio_list: [0.1, 0.45, 0.3, 0.15]
    label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
    data_ratio_list: [0.5, 0.5]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
@ -36,12 +36,13 @@ Architecture:
  algorithm: CRNN
  Transform:
  Backbone:
    name: ResNet
    layers: 34
    name: MobileNetV3
    scale: 0.5
    model_name: large
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 256
    hidden_size: 96
  Head:
    name: CTCHead
    fc_decay: 0
@ -12,7 +12,7 @@ def read_params():
    cfg = Config()

    #params for text classifier
    cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
    cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v2.0_cls_infer/"
    cfg.cls_image_shape = "3, 48, 192"
    cfg.label_list = ['0', '180']
    cfg.cls_batch_num = 30
@ -13,7 +13,7 @@ def read_params():

    #params for text detector
    cfg.det_algorithm = "DB"
    cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
    cfg.det_model_dir = "./inference/ch_ppocr_mobile_v2.0_det_infer/"
    cfg.det_limit_side_len = 960
    cfg.det_limit_type = 'max'

@ -27,16 +27,6 @@ def read_params():
    # cfg.det_east_cover_thresh = 0.1
    # cfg.det_east_nms_thresh = 0.2

    # #params for text recognizer
    # cfg.rec_algorithm = "CRNN"
    # cfg.rec_model_dir = "./inference/ch_det_mv3_crnn/"

    # cfg.rec_image_shape = "3, 32, 320"
    # cfg.rec_char_type = 'ch'
    # cfg.rec_batch_num = 30
    # cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
    # cfg.use_space_char = True

    cfg.use_zero_copy_run = False
    cfg.use_pdserving = False
|
@ -13,7 +13,7 @@ def read_params():
|
|||
|
||||
#params for text detector
|
||||
cfg.det_algorithm = "DB"
|
||||
cfg.det_model_dir = "./inference/ch_ppocr_mobile_v1.1_det_infer/"
|
||||
cfg.det_model_dir = "./inference/ch_ppocr_mobile_v2.0_det_infer/"
|
||||
cfg.det_limit_side_len = 960
|
||||
cfg.det_limit_type = 'max'
|
||||
|
||||
|
@ -29,7 +29,7 @@ def read_params():
|
|||
|
||||
#params for text recognizer
|
||||
cfg.rec_algorithm = "CRNN"
|
||||
cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v1.1_rec_infer/"
|
||||
cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v2.0_rec_infer/"
|
||||
|
||||
cfg.rec_image_shape = "3, 32, 320"
|
||||
cfg.rec_char_type = 'ch'
|
||||
|
@ -41,7 +41,7 @@ def read_params():
|
|||
|
||||
#params for text classifier
|
||||
cfg.use_angle_cls = True
|
||||
cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v1.1_cls_infer/"
|
||||
cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v2.0_cls_infer/"
|
||||
cfg.cls_image_shape = "3, 48, 192"
|
||||
cfg.label_list = ['0', '180']
|
||||
cfg.cls_batch_num = 30
|
||||
|
@ -49,5 +49,6 @@ def read_params():
|
|||
|
||||
cfg.use_zero_copy_run = False
|
||||
cfg.use_pdserving = False
|
||||
cfg.drop_score = 0.5
|
||||
|
||||
return cfg
|
||||
|
|
|
@ -2,7 +2,7 @@

PaddleOCR provides two service deployment methods:
- Deployment based on PaddleHub Serving: the code path is "`./deploy/hubserving`"; follow this tutorial.
- Deployment based on PaddleServing: the code path is "`./deploy/pdserving`"; refer to the [documentation](../../deploy/pdserving/readme.md) for usage.
- (coming soon) Deployment based on PaddleServing: the code path is "`./deploy/pdserving`"; refer to the [documentation](../../deploy/pdserving/readme.md) for usage.

# Service deployment based on PaddleHub Serving

@ -33,11 +33,11 @@ pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
```

### 2. Download the inference model
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra-lightweight model of v1.1 is used, and the default model path is:
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra-lightweight model of v2.0 is used, and the default model path is:
```
detection model: ./inference/ch_ppocr_mobile_v1.1_det_infer/
recognition model: ./inference/ch_ppocr_mobile_v1.1_rec_infer/
text direction classifier: ./inference/ch_ppocr_mobile_v1.1_cls_infer/
detection model: ./inference/ch_ppocr_mobile_v2.0_det_infer/
recognition model: ./inference/ch_ppocr_mobile_v2.0_rec_infer/
text direction classifier: ./inference/ch_ppocr_mobile_v2.0_cls_infer/
```

**The model path can be viewed and modified in `params.py`.** More models can be downloaded from the [model library](../../doc/doc_ch/models_list.md) provided by PaddleOCR, or replaced with your own trained and converted models.
@ -2,7 +2,7 @@ English | [简体中文](readme.md)

PaddleOCR provides 2 service deployment methods:
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please follow this tutorial.
- Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../../deploy/pdserving/readme.md) for usage.
- (coming soon) Based on **PaddleServing**: Code path is "`./deploy/pdserving`". Please refer to the [tutorial](../../deploy/pdserving/readme.md) for usage.

# Service deployment based on PaddleHub Serving

@ -34,11 +34,11 @@ pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
```

### 2. Download inference model
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra-lightweight model of v1.1 is used, and the default model path is:
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra-lightweight model of v2.0 is used, and the default model path is:
```
detection model: ./inference/ch_ppocr_mobile_v1.1_det_infer/
recognition model: ./inference/ch_ppocr_mobile_v1.1_rec_infer/
text direction classifier: ./inference/ch_ppocr_mobile_v1.1_cls_infer/
detection model: ./inference/ch_ppocr_mobile_v2.0_det_infer/
recognition model: ./inference/ch_ppocr_mobile_v2.0_rec_infer/
text direction classifier: ./inference/ch_ppocr_mobile_v2.0_cls_infer/
```

**The model path can be found and modified in `params.py`.** More models provided by PaddleOCR can be obtained from the [model library](../../doc/doc_en/models_list_en.md). You can also use models trained by yourself.
@ -1,79 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddle_serving_client import Client
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes
if sys.argv[1] == 'gpu':
    from paddle_serving_server_gpu.web_service import WebService
elif sys.argv[1] == 'cpu':
    from paddle_serving_server.web_service import WebService
import time
import re
import base64


class OCRService(WebService):
    def init_det(self):
        self.det_preprocess = Sequential([
            ResizeByFactor(32, 960), Div(255),
            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
                (2, 0, 1))
        ])
        self.filter_func = FilterBoxes(10, 10)
        self.post_func = DBPostProcess({
            "thresh": 0.3,
            "box_thresh": 0.5,
            "max_candidates": 1000,
            "unclip_ratio": 1.5,
            "min_size": 3
        })

    def preprocess(self, feed=[], fetch=[]):
        data = base64.b64decode(feed[0]["image"].encode('utf8'))
        data = np.fromstring(data, np.uint8)
        im = cv2.imdecode(data, cv2.IMREAD_COLOR)
        self.ori_h, self.ori_w, _ = im.shape
        det_img = self.det_preprocess(im)
        _, self.new_h, self.new_w = det_img.shape
        return {"image": det_img[np.newaxis, :].copy()}, ["concat_1.tmp_0"]

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        det_out = fetch_map["concat_1.tmp_0"]
        ratio_list = [
            float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
        ]
        dt_boxes_list = self.post_func(det_out, [ratio_list])
        dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
        return {"dt_boxes": dt_boxes.tolist()}


ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_det_model")
ocr_service.init_det()
if sys.argv[1] == 'gpu':
    ocr_service.set_gpus("0")
    ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
    ocr_service.run_debugger_service(gpu=True)
elif sys.argv[1] == 'cpu':
    ocr_service.prepare_server(workdir="workdir", port=9292)
    ocr_service.run_debugger_service()
ocr_service.init_det()
ocr_service.run_web_service()
@ -1,78 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddle_serving_client import Client
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes
if sys.argv[1] == 'gpu':
    from paddle_serving_server_gpu.web_service import WebService
elif sys.argv[1] == 'cpu':
    from paddle_serving_server.web_service import WebService
import time
import re
import base64


class OCRService(WebService):
    def init_det(self):
        self.det_preprocess = Sequential([
            ResizeByFactor(32, 960), Div(255),
            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
                (2, 0, 1))
        ])
        self.filter_func = FilterBoxes(10, 10)
        self.post_func = DBPostProcess({
            "thresh": 0.3,
            "box_thresh": 0.5,
            "max_candidates": 1000,
            "unclip_ratio": 1.5,
            "min_size": 3
        })

    def preprocess(self, feed=[], fetch=[]):
        data = base64.b64decode(feed[0]["image"].encode('utf8'))
        data = np.fromstring(data, np.uint8)
        im = cv2.imdecode(data, cv2.IMREAD_COLOR)
        self.ori_h, self.ori_w, _ = im.shape
        det_img = self.det_preprocess(im)
        _, self.new_h, self.new_w = det_img.shape
        print(det_img)
        return {"image": det_img}, ["concat_1.tmp_0"]

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        det_out = fetch_map["concat_1.tmp_0"]
        ratio_list = [
            float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
        ]
        dt_boxes_list = self.post_func(det_out, [ratio_list])
        dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
        return {"dt_boxes": dt_boxes.tolist()}


ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_det_model")
if sys.argv[1] == 'gpu':
    ocr_service.set_gpus("0")
    ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
elif sys.argv[1] == 'cpu':
    ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
ocr_service.init_det()
ocr_service.run_rpc_service()
ocr_service.run_web_service()
@ -1,114 +0,0 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from paddle_serving_client import Client
|
||||
from paddle_serving_app.reader import OCRReader
|
||||
import cv2
|
||||
import sys
|
||||
import numpy as np
|
||||
import os
|
||||
from paddle_serving_client import Client
|
||||
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
|
||||
from paddle_serving_app.reader import Div, Normalize, Transpose
|
||||
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
|
||||
if sys.argv[1] == 'gpu':
|
||||
from paddle_serving_server_gpu.web_service import WebService
|
||||
elif sys.argv[1] == 'cpu':
|
||||
from paddle_serving_server.web_service import WebService
|
||||
from paddle_serving_app.local_predict import Debugger
|
||||
import time
|
||||
import re
|
||||
import base64
|
||||
|
||||
|
||||
class OCRService(WebService):
|
||||
def init_det_debugger(self, det_model_config):
|
||||
self.det_preprocess = Sequential([
|
||||
ResizeByFactor(32, 960), Div(255),
|
||||
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
|
||||
(2, 0, 1))
|
||||
])
|
||||
self.det_client = Debugger()
|
||||
if sys.argv[1] == 'gpu':
|
||||
self.det_client.load_model_config(
|
||||
det_model_config, gpu=True, profile=False)
|
||||
elif sys.argv[1] == 'cpu':
|
||||
self.det_client.load_model_config(
|
||||
det_model_config, gpu=False, profile=False)
|
||||
self.ocr_reader = OCRReader()
|
||||
|
||||
def preprocess(self, feed=[], fetch=[]):
|
||||
data = base64.b64decode(feed[0]["image"].encode('utf8'))
|
||||
data = np.fromstring(data, np.uint8)
|
||||
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
|
||||
ori_h, ori_w, _ = im.shape
|
||||
det_img = self.det_preprocess(im)
|
||||
_, new_h, new_w = det_img.shape
|
||||
det_img = det_img[np.newaxis, :]
|
||||
det_img = det_img.copy()
|
||||
det_out = self.det_client.predict(
|
||||
feed={"image": det_img}, fetch=["concat_1.tmp_0"])
|
||||
filter_func = FilterBoxes(10, 10)
|
||||
post_func = DBPostProcess({
|
||||
"thresh": 0.3,
|
||||
"box_thresh": 0.5,
|
||||
"max_candidates": 1000,
|
||||
"unclip_ratio": 1.5,
|
||||
"min_size": 3
|
||||
})
|
||||
sorted_boxes = SortedBoxes()
|
||||
ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
|
||||
dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list])
|
||||
dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
|
||||
dt_boxes = sorted_boxes(dt_boxes)
|
||||
get_rotate_crop_image = GetRotateCropImage()
|
||||
img_list = []
|
||||
max_wh_ratio = 0
|
||||
for i, dtbox in enumerate(dt_boxes):
|
||||
boximg = get_rotate_crop_image(im, dt_boxes[i])
|
||||
img_list.append(boximg)
|
||||
h, w = boximg.shape[0:2]
|
||||
wh_ratio = w * 1.0 / h
|
||||
max_wh_ratio = max(max_wh_ratio, wh_ratio)
|
||||
if len(img_list) == 0:
|
||||
return [], []
|
||||
# resize_norm_img returns a (C, H, W) array; size the batch from the first crop
_, h, w = self.ocr_reader.resize_norm_img(img_list[0],
|
||||
max_wh_ratio).shape
|
||||
imgs = np.zeros((len(img_list), 3, h, w)).astype('float32')
|
||||
for idx, img in enumerate(img_list):  # 'idx' avoids shadowing the builtin id()
|
||||
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
|
||||
imgs[idx] = norm_img
|
||||
feed = {"image": imgs.copy()}
|
||||
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
|
||||
return feed, fetch
|
||||
|
||||
def postprocess(self, feed={}, fetch=[], fetch_map=None):
|
||||
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
|
||||
res_lst = []
|
||||
for res in rec_res:
|
||||
res_lst.append(res[0])
|
||||
res = {"res": res_lst}
|
||||
return res
|
||||
|
||||
|
||||
ocr_service = OCRService(name="ocr")
|
||||
ocr_service.load_model_config("ocr_rec_model")
|
||||
ocr_service.init_det_debugger(det_model_config="ocr_det_model")
|
||||
if sys.argv[1] == 'gpu':
|
||||
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
|
||||
ocr_service.run_debugger_service(gpu=True)
|
||||
elif sys.argv[1] == 'cpu':
|
||||
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
|
||||
ocr_service.run_debugger_service()
|
||||
ocr_service.run_web_service()
|
|
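A note on the batching logic in `preprocess` above: every cropped text line is resized to the width implied by the widest crop (`max_wh_ratio`), so all crops can be stacked into a single tensor and recognized in one batched pass. The width rule can be illustrated with the small sketch below (an approximation of the behaviour, not the exact `resize_norm_img` internals).

```
# Sketch: target width for a batch of text-line crops, assuming a fixed
# input height of 32 and the max_wh_ratio rule used in preprocess() above.
def batch_target_width(crop_shapes, height=32):
    max_wh_ratio = max(w / float(h) for h, w in crop_shapes)
    return int(height * max_wh_ratio)

print(batch_target_width([(30, 90), (32, 200), (28, 60)]))  # widest crop wins -> 200
```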
@ -1,37 +0,0 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
import requests
|
||||
import json
|
||||
import cv2
|
||||
import base64
|
||||
import os, sys
|
||||
import time
|
||||
|
||||
def cv2_to_base64(image):
|
||||
# 'image' is already the raw bytes of an image file, so only base64-encode it
|
||||
return base64.b64encode(image).decode('utf8')
|
||||
|
||||
|
||||
headers = {"Content-type": "application/json"}
|
||||
url = "http://127.0.0.1:9292/ocr/prediction"
|
||||
test_img_dir = "../../doc/imgs/"
|
||||
for img_file in os.listdir(test_img_dir):
|
||||
with open(os.path.join(test_img_dir, img_file), 'rb') as file:
|
||||
image_data1 = file.read()
|
||||
image = cv2_to_base64(image_data1)
|
||||
data = {"feed": [{"image": image}], "fetch": ["res"]}
|
||||
r = requests.post(url=url, headers=headers, data=json.dumps(data))
|
||||
print(r.json())
|
|
@ -1,105 +0,0 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from paddle_serving_client import Client
|
||||
from paddle_serving_app.reader import OCRReader
|
||||
import cv2
|
||||
import sys
|
||||
import numpy as np
|
||||
import os
|
||||
|
||||
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
|
||||
from paddle_serving_app.reader import Div, Normalize, Transpose
|
||||
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
|
||||
if sys.argv[1] == 'gpu':
|
||||
from paddle_serving_server_gpu.web_service import WebService
|
||||
elif sys.argv[1] == 'cpu':
|
||||
from paddle_serving_server.web_service import WebService
|
||||
import time
|
||||
import re
|
||||
import base64
|
||||
|
||||
|
||||
class OCRService(WebService):
|
||||
def init_det_client(self, det_port, det_client_config):
|
||||
self.det_preprocess = Sequential([
|
||||
ResizeByFactor(32, 960), Div(255),
|
||||
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
|
||||
(2, 0, 1))
|
||||
])
|
||||
self.det_client = Client()
|
||||
self.det_client.load_client_config(det_client_config)
|
||||
self.det_client.connect(["127.0.0.1:{}".format(det_port)])
|
||||
self.ocr_reader = OCRReader()
|
||||
|
||||
def preprocess(self, feed=[], fetch=[]):
|
||||
data = base64.b64decode(feed[0]["image"].encode('utf8'))
|
||||
data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated; frombuffer reads the raw bytes
|
||||
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
|
||||
ori_h, ori_w, _ = im.shape
|
||||
det_img = self.det_preprocess(im)
|
||||
det_out = self.det_client.predict(
|
||||
feed={"image": det_img}, fetch=["concat_1.tmp_0"])
|
||||
_, new_h, new_w = det_img.shape
|
||||
filter_func = FilterBoxes(10, 10)
|
||||
post_func = DBPostProcess({
|
||||
"thresh": 0.3,
|
||||
"box_thresh": 0.5,
|
||||
"max_candidates": 1000,
|
||||
"unclip_ratio": 1.5,
|
||||
"min_size": 3
|
||||
})
|
||||
sorted_boxes = SortedBoxes()
|
||||
ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
|
||||
dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list])
|
||||
dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
|
||||
dt_boxes = sorted_boxes(dt_boxes)
|
||||
get_rotate_crop_image = GetRotateCropImage()
|
||||
feed_list = []
|
||||
img_list = []
|
||||
max_wh_ratio = 0
|
||||
for i, dtbox in enumerate(dt_boxes):
|
||||
boximg = get_rotate_crop_image(im, dt_boxes[i])
|
||||
img_list.append(boximg)
|
||||
h, w = boximg.shape[0:2]
|
||||
wh_ratio = w * 1.0 / h
|
||||
max_wh_ratio = max(max_wh_ratio, wh_ratio)
|
||||
for img in img_list:
|
||||
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
|
||||
feed = {"image": norm_img}
|
||||
feed_list.append(feed)
|
||||
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
|
||||
return feed_list, fetch
|
||||
|
||||
def postprocess(self, feed={}, fetch=[], fetch_map=None):
|
||||
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
|
||||
res_lst = []
|
||||
for res in rec_res:
|
||||
res_lst.append(res[0])
|
||||
res = {"res": res_lst}
|
||||
return res
|
||||
|
||||
|
||||
ocr_service = OCRService(name="ocr")
|
||||
ocr_service.load_model_config("ocr_rec_model")
|
||||
if sys.argv[1] == 'gpu':
|
||||
ocr_service.set_gpus("0")
|
||||
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
|
||||
elif sys.argv[1] == 'cpu':
|
||||
ocr_service.prepare_server(workdir="workdir", port=9292)
|
||||
ocr_service.init_det_client(
|
||||
det_port=9293,
|
||||
det_client_config="ocr_det_client/serving_client_conf.prototxt")
|
||||
ocr_service.run_rpc_service()
|
||||
ocr_service.run_web_service()
|
|
@ -1,132 +0,0 @@
|
|||
# Paddle Serving 服务部署(Beta)
|
||||
|
||||
本教程将介绍基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)部署PaddleOCR在线预测服务的详细步骤。
|
||||
|
||||
## 快速启动服务
|
||||
|
||||
### 1. 准备环境
|
||||
我们先安装Paddle Serving相关组件
|
||||
我们推荐用户使用GPU来做Paddle Serving的OCR服务部署
|
||||
|
||||
**CUDA版本:9.0**
|
||||
|
||||
**CUDNN版本:7.0**
|
||||
|
||||
**操作系统版本:CentOS 6以上**
|
||||
|
||||
**Python3操作指南:**
|
||||
```
|
||||
#以下提供beta版本的paddle serving whl包,欢迎试用,正式版会在8月中正式上线
|
||||
#GPU用户下载server包使用这个链接
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server_gpu-0.3.2-py3-none-any.whl
|
||||
python -m pip install paddle_serving_server_gpu-0.3.2-py3-none-any.whl
|
||||
#CPU版本使用这个链接
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server-0.3.2-py3-none-any.whl
|
||||
python -m pip install paddle_serving_server-0.3.2-py3-none-any.whl
|
||||
#客户端和App包使用以下链接(CPU,GPU通用)
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_client-0.3.2-cp36-none-any.whl
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_app-0.1.2-py3-none-any.whl
|
||||
python -m pip install paddle_serving_app-0.1.2-py3-none-any.whl paddle_serving_client-0.3.2-cp36-none-any.whl
|
||||
```
|
||||
|
||||
**Python2操作指南:**
|
||||
```
|
||||
#以下提供beta版本的paddle serving whl包,欢迎试用,正式版会在8月中正式上线
|
||||
#GPU用户下载server包使用这个链接
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server_gpu-0.3.2-py2-none-any.whl
|
||||
python -m pip install paddle_serving_server_gpu-0.3.2-py2-none-any.whl
|
||||
#CPU版本使用这个链接
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_server-0.3.2-py2-none-any.whl
|
||||
python -m pip install paddle_serving_server-0.3.2-py2-none-any.whl
|
||||
|
||||
#客户端和App包使用以下链接(CPU,GPU通用)
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_app-0.1.2-py2-none-any.whl
|
||||
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/others/paddle_serving_client-0.3.2-cp27-none-any.whl
|
||||
python -m pip install paddle_serving_app-0.1.2-py2-none-any.whl paddle_serving_client-0.3.2-cp27-none-any.whl
|
||||
```
|
||||
|
||||
### 2. 模型转换
|
||||
可以使用`paddle_serving_app`提供的模型,执行下列命令
|
||||
```
|
||||
python -m paddle_serving_app.package --get_model ocr_rec
|
||||
tar -xzvf ocr_rec.tar.gz
|
||||
python -m paddle_serving_app.package --get_model ocr_det
|
||||
tar -xzvf ocr_det.tar.gz
|
||||
```
|
||||
执行上述命令会下载`db_crnn_mobile`的模型。如果想要使用规模更大的`db_crnn_server`模型,可以在下载预测模型并解压之后,参考[如何从Paddle保存的预测模型转为Paddle Serving格式可部署的模型](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md)自行转换。
|
||||
|
||||
我们以`ch_rec_r34_vd_crnn`模型作为例子,下载链接在:
|
||||
|
||||
```
|
||||
wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
|
||||
tar xf ch_rec_r34_vd_crnn_infer.tar
|
||||
```
|
||||
接下来我们按照Serving模型转换教程,运行下列python文件。
|
||||
```
|
||||
from paddle_serving_client.io import inference_model_to_serving
|
||||
inference_model_dir = "ch_rec_r34_vd_crnn"
|
||||
serving_client_dir = "serving_client_dir"
|
||||
serving_server_dir = "serving_server_dir"
|
||||
feed_var_names, fetch_var_names = inference_model_to_serving(
|
||||
inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params")
|
||||
```
|
||||
最终会在`serving_client_dir`和`serving_server_dir`生成客户端和服务端的模型配置。
|
||||
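转换完成后,`serving_server_dir`可直接作为启动服务时`--model`参数指向的目录,`serving_client_dir`中的`serving_client_conf.prototxt`则供客户端加载,例如可执行 `python -m paddle_serving_server.serve --model serving_server_dir --port 9293`(示意命令,假设目录名与上文一致)。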
|
||||
### 3. 启动服务
|
||||
启动服务可以根据实际需求选择启动`标准版`或者`快速版`,两种方式的对比如下表:
|
||||
|
||||
|版本|特点|适用场景|
|
||||
|-|-|-|
|
||||
|标准版|稳定性高,分布式部署|适用于吞吐量大,需要跨机房部署的情况|
|
||||
|快速版|部署方便,预测速度快|适用于对预测速度要求高,迭代速度快的场景|
|
||||
|
||||
#### 方式1. 启动标准版服务
|
||||
|
||||
```
|
||||
# cpu,gpu启动二选一,以下是cpu启动
|
||||
python -m paddle_serving_server.serve --model ocr_det_model --port 9293
|
||||
python ocr_web_server.py cpu
|
||||
# gpu启动
|
||||
python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0
|
||||
python ocr_web_server.py gpu
|
||||
```
|
||||
|
||||
#### 方式2. 启动快速版服务
|
||||
|
||||
```
|
||||
# cpu,gpu启动二选一,以下是cpu启动
|
||||
python ocr_local_server.py cpu
|
||||
# gpu启动
|
||||
python ocr_local_server.py gpu
|
||||
```
|
||||
|
||||
## 发送预测请求
|
||||
|
||||
```
|
||||
python ocr_web_client.py
|
||||
```
|
||||
|
||||
## 返回结果格式说明
|
||||
|
||||
返回结果是json格式
|
||||
```
|
||||
{u'result': {u'res': [u'\u571f\u5730\u6574\u6cbb\u4e0e\u571f\u58e4\u4fee\u590d\u7814\u7a76\u4e2d\u5fc3', u'\u534e\u5357\u519c\u4e1a\u5927\u5b661\u7d20\u56fe']}}
|
||||
```
|
||||
我们也可以打印结果json串中`res`字段的每一句话
|
||||
```
|
||||
土地整治与土壤修复研究中心
|
||||
华南农业大学1素图
|
||||
```
|
||||
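下面给出一个解析返回结果的最小示例(以上文展示的返回json串为输入;`json.loads`会自动把`\uXXXX`转义解码为中文):

```
import json

raw = '{"result": {"res": ["\\u571f\\u5730\\u6574\\u6cbb\\u4e0e\\u571f\\u58e4\\u4fee\\u590d\\u7814\\u7a76\\u4e2d\\u5fc3", "\\u534e\\u5357\\u519c\\u4e1a\\u5927\\u5b661\\u7d20\\u56fe"]}}'
for line in json.loads(raw)["result"]["res"]:
    print(line)
```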
|
||||
## 自定义修改服务逻辑
|
||||
|
||||
在`ocr_web_server.py`或是`ocr_local_server.py`当中,`preprocess`函数完成了检测服务和识别服务的前处理,`postprocess`函数完成了识别结果的后处理,如需自定义逻辑可在相应函数中修改。这两个函数调用了`paddle_serving_app`库为常见CV模型提供的前处理/后处理方法。
|
||||
|
||||
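例如,若希望服务同时返回识别文本与置信度,可以按如下思路改写`postprocess`(仅为示意,假设沿用上文`OCRService`类,且`rec_res`的元素为"(文本, 得分)"元组):

```
# 示意:在 postprocess 中同时返回文本与置信度
def postprocess(self, feed={}, fetch=[], fetch_map=None):
    rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
    return {"res": [{"text": text, "score": float(score)}
                    for text, score in rec_res]}
```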
如果想要单独启动Paddle Serving的检测服务和识别服务,参见下列表格, 执行对应的脚本即可,并且在命令行参数注明用的CPU或是GPU来提供服务。
|
||||
|
||||
| 模型 | 标准版 | 快速版 |
|
||||
| ---- | ----------------- | ------------------- |
|
||||
| 检测 | det_web_server.py | det_local_server.py |
|
||||
| 识别 | rec_web_server.py | rec_local_server.py |
|
||||
|
||||
更多信息参见[Paddle Serving](https://github.com/PaddlePaddle/Serving)
|
|
@ -1,79 +0,0 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from paddle_serving_client import Client
|
||||
from paddle_serving_app.reader import OCRReader
|
||||
import cv2
|
||||
import sys
|
||||
import numpy as np
|
||||
import os
|
||||
|
||||
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
|
||||
from paddle_serving_app.reader import Div, Normalize, Transpose
|
||||
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
|
||||
if sys.argv[1] == 'gpu':
|
||||
from paddle_serving_server_gpu.web_service import WebService
|
||||
elif sys.argv[1] == 'cpu':
|
||||
from paddle_serving_server.web_service import WebService
|
||||
import time
|
||||
import re
|
||||
import base64
|
||||
|
||||
|
||||
class OCRService(WebService):
|
||||
def init_rec(self):
|
||||
self.ocr_reader = OCRReader()
|
||||
|
||||
def preprocess(self, feed=[], fetch=[]):
|
||||
img_list = []
|
||||
for feed_data in feed:
|
||||
data = base64.b64decode(feed_data["image"].encode('utf8'))
|
||||
data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated; frombuffer reads the raw bytes
|
||||
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
|
||||
img_list.append(im)
|
||||
max_wh_ratio = 0
|
||||
for i, boximg in enumerate(img_list):
|
||||
h, w = boximg.shape[0:2]
|
||||
wh_ratio = w * 1.0 / h
|
||||
max_wh_ratio = max(max_wh_ratio, wh_ratio)
|
||||
# resize_norm_img returns a (C, H, W) array; size the batch from the first image
_, h, w = self.ocr_reader.resize_norm_img(img_list[0],
|
||||
max_wh_ratio).shape
|
||||
imgs = np.zeros((len(img_list), 3, h, w)).astype('float32')
|
||||
for i, img in enumerate(img_list):
|
||||
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
|
||||
imgs[i] = norm_img
|
||||
feed = {"image": imgs.copy()}
|
||||
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
|
||||
return feed, fetch
|
||||
|
||||
def postprocess(self, feed={}, fetch=[], fetch_map=None):
|
||||
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
|
||||
res_lst = []
|
||||
for res in rec_res:
|
||||
res_lst.append(res[0])
|
||||
res = {"res": res_lst}
|
||||
return res
|
||||
|
||||
|
||||
ocr_service = OCRService(name="ocr")
|
||||
ocr_service.load_model_config("ocr_rec_model")
|
||||
ocr_service.init_rec()
|
||||
if sys.argv[1] == 'gpu':
|
||||
ocr_service.set_gpus("0")
|
||||
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
|
||||
ocr_service.run_debugger_service(gpu=True)
|
||||
elif sys.argv[1] == 'cpu':
|
||||
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
|
||||
ocr_service.run_debugger_service()
|
||||
ocr_service.run_web_service()
|
|
@ -1,77 +0,0 @@
|
|||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from paddle_serving_client import Client
|
||||
from paddle_serving_app.reader import OCRReader
|
||||
import cv2
|
||||
import sys
|
||||
import numpy as np
|
||||
import os
|
||||
|
||||
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
|
||||
from paddle_serving_app.reader import Div, Normalize, Transpose
|
||||
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
|
||||
if sys.argv[1] == 'gpu':
|
||||
from paddle_serving_server_gpu.web_service import WebService
|
||||
elif sys.argv[1] == 'cpu':
|
||||
from paddle_serving_server.web_service import WebService
|
||||
import time
|
||||
import re
|
||||
import base64
|
||||
|
||||
|
||||
class OCRService(WebService):
|
||||
def init_rec(self):
|
||||
self.ocr_reader = OCRReader()
|
||||
|
||||
def preprocess(self, feed=[], fetch=[]):
|
||||
# TODO: to handle batch rec images
|
||||
img_list = []
|
||||
for feed_data in feed:
|
||||
data = base64.b64decode(feed_data["image"].encode('utf8'))
|
||||
data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated; frombuffer reads the raw bytes
|
||||
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
|
||||
img_list.append(im)
|
||||
feed_list = []
|
||||
max_wh_ratio = 0
|
||||
for i, boximg in enumerate(img_list):
|
||||
h, w = boximg.shape[0:2]
|
||||
wh_ratio = w * 1.0 / h
|
||||
max_wh_ratio = max(max_wh_ratio, wh_ratio)
|
||||
for img in img_list:
|
||||
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
|
||||
feed = {"image": norm_img}
|
||||
feed_list.append(feed)
|
||||
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
|
||||
return feed_list, fetch
|
||||
|
||||
def postprocess(self, feed={}, fetch=[], fetch_map=None):
|
||||
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
|
||||
res_lst = []
|
||||
for res in rec_res:
|
||||
res_lst.append(res[0])
|
||||
res = {"res": res_lst}
|
||||
return res
|
||||
|
||||
|
||||
ocr_service = OCRService(name="ocr")
|
||||
ocr_service.load_model_config("ocr_rec_model")
|
||||
ocr_service.init_rec()
|
||||
if sys.argv[1] == 'gpu':
|
||||
ocr_service.set_gpus("0")
|
||||
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
|
||||
elif sys.argv[1] == 'cpu':
|
||||
ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
|
||||
ocr_service.run_rpc_service()
|
||||
ocr_service.run_web_service()
|
|
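For reference, a request to this recognition-only service sends an already-cropped text-line image. Below is a hypothetical client sketch, assuming the service listens on local port 9292 and the sample word image from `doc/imgs_words` is available:

```
# Hypothetical client sketch for the recognition-only web service above.
import base64
import json
import requests

with open("../../doc/imgs_words/ch/word_1.jpg", "rb") as f:
    image = base64.b64encode(f.read()).decode('utf8')

data = {"feed": [{"image": image}], "fetch": ["res"]}
r = requests.post("http://127.0.0.1:9292/ocr/prediction",
                  headers={"Content-type": "application/json"},
                  data=json.dumps(data))
print(r.json())  # expected shape: {"result": {"res": ["..."]}}
```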
@ -1,6 +1,6 @@
|
|||
<a name="算法介绍"></a>
|
||||
## 算法介绍
|
||||
本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v1.1 系列模型下载](./models_list.md)。
|
||||
本文给出了PaddleOCR已支持的文本检测算法和文本识别算法列表,以及每个算法在**英文公开数据集**上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考[PP-OCR v2.0 系列模型下载](./models_list.md)。
|
||||
|
||||
- [1.文本检测算法](#文本检测算法)
|
||||
- [2.文本识别算法](#文本识别算法)
|
||||
|
@ -9,25 +9,25 @@
|
|||
### 1.文本检测算法
|
||||
|
||||
PaddleOCR开源的文本检测算法列表:
|
||||
- [x] DB([paper](https://arxiv.org/abs/1911.08947))(ppocr推荐)
|
||||
- [x] DB([paper]( https://arxiv.org/abs/1911.08947) )(ppocr推荐)
|
||||
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
|
||||
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))
|
||||
|
||||
在ICDAR2015文本检测公开数据集上,算法效果如下:
|
||||
|
||||
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
||||
|-|-|-|-|-|-|
|
||||
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[下载链接](link)|
|
||||
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[下载链接](link)|
|
||||
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[下载链接](link)|
|
||||
|DB|MobileNetV3|75.92%|73.18%|74.53%|[下载链接](link)|
|
||||
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[下载链接](link))|
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
|EAST|ResNet50_vd|88.76%|81.36%|84.90%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
|
||||
|EAST|MobileNetV3|78.24%|79.15%|78.69%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)|
|
||||
|DB|ResNet50_vd|86.41%|78.72%|82.38%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|
||||
|DB|MobileNetV3|77.29%|73.08%|75.12%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
|
||||
|SAST|ResNet50_vd|91.83%|81.80%|86.52%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|
||||
|
||||
在Total-text文本检测公开数据集上,算法效果如下:
|
||||
|
||||
|模型|骨干网络|precision|recall|Hmean|下载链接|
|
||||
|-|-|-|-|-|-|
|
||||
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[下载链接](link)|
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
|SAST|ResNet50_vd|89.05%|76.80%|82.47%|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)|
|
||||
|
||||
**说明:** SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:[百度云地址](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (提取码: 2bpi)
|
||||
|
||||
|
@ -38,9 +38,9 @@ PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训
|
|||
### 2.文本识别算法
|
||||
|
||||
PaddleOCR基于动态图开源的文本识别算法列表:
|
||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))(ppocr推荐)
|
||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717) )(ppocr推荐)
|
||||
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
||||
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
||||
- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) coming soon
|
||||
- [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
|
||||
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294)) coming soon
|
||||
|
||||
|
@ -48,12 +48,9 @@ PaddleOCR基于动态图开源的文本识别算法列表:
|
|||
|
||||
|模型|骨干网络|Avg Accuracy|模型存储命名|下载链接|
|
||||
|-|-|-|-|-|
|
||||
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[下载链接](link)|
|
||||
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[下载链接](link)|
|
||||
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[下载链接](link)|
|
||||
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[下载链接](link)|
|
||||
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[下载链接](link)|
|
||||
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[下载链接](link)|
|
||||
|
||||
|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
|
||||
|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
|
||||
|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
|
||||
|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|
|
||||
|
||||
PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。
|
||||
|
|
|
@ -62,9 +62,9 @@ PaddleOCR提供了训练脚本、评估脚本和预测脚本。
|
|||
*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
|
||||
|
||||
```
|
||||
# GPU训练 支持单卡,多卡训练,通过selected_gpus指定卡号
|
||||
# GPU训练 支持单卡,多卡训练,通过 '--gpus' 指定卡号,如果使用的paddle版本小于2.0rc1,请使用'--selected_gpus'参数选择要使用的GPU
|
||||
# 启动训练,下面的命令已经写入train.sh文件中,只需修改文件里的配置文件路径即可
|
||||
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
|
||||
python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
|
||||
```
|
||||
|
||||
- 数据增强
|
||||
|
@ -74,7 +74,7 @@ PaddleOCR提供了多种数据增强方式,如果您希望在训练时加入
|
|||
默认的扰动方式有:颜色空间转换(cvtColor)、模糊(blur)、抖动(jitter)、噪声(Gauss noise)、随机切割(random crop)、透视(perspective)、颜色反转(reverse)、随机数据增强(RandAugment)。
|
||||
|
||||
训练过程中除随机数据增强外每种扰动方式以50%的概率被选择,具体代码实现请参考:
|
||||
[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
|
||||
[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
|
||||
[randaugment.py](../../ppocr/data/imaug/randaugment.py)
|
||||
|
||||
*由于OpenCV的兼容性问题,扰动操作暂时只支持linux*
|
||||
|
|
|
@ -76,8 +76,8 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
|
|||
# 单机单卡训练 mv3_db 模型
|
||||
python3 tools/train.py -c configs/det/det_mv3_db.yml \
|
||||
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
|
||||
# 单机多卡训练,通过 --select_gpus 参数设置使用的GPU ID;
|
||||
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
|
||||
# 单机多卡训练,通过 --gpus 参数设置使用的GPU ID;如果使用的paddle版本小于2.0rc1,请使用'--selected_gpus'参数选择要使用的GPU
|
||||
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
|
||||
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
|
||||
```
|
||||
|
||||
|
@ -107,17 +107,13 @@ PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall
|
|||
|
||||
运行如下代码,根据配置文件`det_db_mv3.yml`中`save_res_path`指定的测试集检测结果文件,计算评估指标。
|
||||
|
||||
评估时设置后处理参数`box_thresh=0.6`,`unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化
|
||||
```shell
|
||||
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
|
||||
```
|
||||
评估时设置后处理参数`box_thresh=0.5`,`unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化
|
||||
训练中模型参数默认保存在`Global.save_model_dir`目录下。在评估指标时,需要设置`Global.checkpoints`指向保存的参数文件。
|
||||
|
||||
比如:
|
||||
```shell
|
||||
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
|
||||
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.5 PostProcess.unclip_ratio=1.5
|
||||
```
|
||||
|
||||
|
||||
* 注:`box_thresh`、`unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置
|
||||
|
||||
## 测试检测效果
|
||||
|
|
|
@ -22,9 +22,8 @@ inference 模型(`paddle.jit.save`保存的模型)
|
|||
- [三、文本识别模型推理](#文本识别模型推理)
|
||||
- [1. 超轻量中文识别模型推理](#超轻量中文识别模型推理)
|
||||
- [2. 基于CTC损失的识别模型推理](#基于CTC损失的识别模型推理)
|
||||
- [3. 基于Attention损失的识别模型推理](#基于Attention损失的识别模型推理)
|
||||
- [4. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
|
||||
- [5. 多语言模型的推理](#多语言模型的推理)
|
||||
- [3. 自定义文本识别字典的推理](#自定义文本识别字典的推理)
|
||||
- [4. 多语言模型的推理](#多语言模型的推理)
|
||||
|
||||
- [四、方向分类模型推理](#方向识别模型推理)
|
||||
- [1. 方向分类模型推理](#方向分类模型推理)
|
||||
|
@ -129,24 +128,32 @@ python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.pretrained_mo
|
|||
超轻量中文检测模型推理,可以执行如下命令:
|
||||
|
||||
```
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
|
||||
# 下载超轻量中文检测模型:
|
||||
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
|
||||
tar xf ch_ppocr_mobile_v2.0_det_infer.tar
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./ch_ppocr_mobile_v2.0_det_infer/"
|
||||
```
|
||||
|
||||
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
|
||||
|
||||

|
||||

|
||||
|
||||
通过参数`limit_type`和`det_limit_side_len`来对图片的尺寸进行限制限,`limit_type=max`为限制长边长度<`det_limit_side_len`,`limit_type=min`为限制短边长度>`det_limit_side_len`,
|
||||
图片不满足限制条件时(`limit_type=max`时长边长度>`det_limit_side_len`或`limit_type=min`时短边长度<`det_limit_side_len`),将对图片进行等比例缩放。
|
||||
该参数默认设置为`limit_type='max',det_max_side_len=960`。 如果输入图片的分辨率比较大,而且想使用更大的分辨率预测,可以执行如下命令:
|
||||
通过参数`limit_type`和`det_limit_side_len`来对图片的尺寸进行限制,
|
||||
`limit_type`可选参数为[`max`, `min`],
|
||||
`det_limit_side_len` 为正整数,一般设置为32的倍数,比如960。
|
||||
|
||||
参数默认设置为`limit_type='max', det_limit_side_len=960`,表示网络输入图像的最长边不能超过960,
|
||||
如果超过这个值,会对图像做等宽比的resize操作,确保最长边为`det_limit_side_len`。
|
||||
设置为`limit_type='min', det_limit_side_len=960` 则表示限制图像的最短边为960。
|
||||
|
||||
如果输入图片的分辨率比较大,而且想使用更大的分辨率预测,可以设置det_limit_side_len 为想要的值,比如1216:
|
||||
```
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1200
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216
|
||||
```
|
||||
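上述缩放规则可用下面的小段代码示意(仅为近似示意,并非`predict_det.py`的具体实现):

```
# 示意:limit_type 与 det_limit_side_len 的等比缩放规则(假设性实现)
def limited_resize_ratio(h, w, limit_type='max', limit_side_len=960):
    if limit_type == 'max':   # 长边不超过 limit_side_len
        ratio = float(limit_side_len) / max(h, w) if max(h, w) > limit_side_len else 1.0
    else:                     # limit_type == 'min':短边不小于 limit_side_len
        ratio = float(limit_side_len) / min(h, w) if min(h, w) < limit_side_len else 1.0
    return ratio              # 图像按该比例等比缩放

print(limited_resize_ratio(1080, 1920))         # 长边 1920 > 960,需要缩小
print(limited_resize_ratio(640, 480, 'min'))    # 短边 480 < 960,需要放大
```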
|
||||
如果想使用CPU进行预测,执行命令如下
|
||||
```
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
|
||||
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
|
||||
```
|
||||
|
||||
<a name="DB文本检测模型推理"></a>
|
||||
|
@ -173,7 +180,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_
|
|||
<a name="EAST文本检测模型推理"></a>
|
||||
### 3. EAST文本检测模型推理
|
||||
|
||||
首先将EAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例( [模型下载地址 (coming soon)](link) ),可以使用如下命令进行转换:
|
||||
首先将EAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例( [模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar) ),可以使用如下命令进行转换:
|
||||
|
||||
```
|
||||
python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_east
|
||||
|
@ -186,7 +193,7 @@ python3 tools/infer/predict_det.py --det_algorithm="EAST" --image_dir="./doc/img
|
|||
```
|
||||
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
|
||||
|
||||
(coming soon)
|
||||

|
||||
|
||||
**注意**:本代码库中,EAST后处理Locality-Aware NMS有python和c++两种版本,c++版速度明显快于python版。由于c++版本nms编译版本问题,只有python3.5环境下会调用c++版nms,其他情况将调用python版nms。
|
||||
|
||||
|
@ -194,7 +201,7 @@ python3 tools/infer/predict_det.py --det_algorithm="EAST" --image_dir="./doc/img
|
|||
<a name="SAST文本检测模型推理"></a>
|
||||
### 4. SAST文本检测模型推理
|
||||
#### (1). 四边形文本检测模型(ICDAR2015)
|
||||
首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例([模型下载地址(coming soon)](link)),可以使用如下命令进行转换:
|
||||
首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)),可以使用如下命令进行转换:
|
||||
```
|
||||
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_ic15
|
||||
|
||||
|
@ -205,10 +212,10 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img
|
|||
```
|
||||
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
|
||||
|
||||
(coming soon)
|
||||

|
||||
|
||||
#### (2). 弯曲文本检测模型(Total-Text)
|
||||
首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在Total-Text英文数据集训练的模型为例([模型下载地址(coming soon)](link)),可以使用如下命令进行转换:
|
||||
首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在Total-Text英文数据集训练的模型为例([模型下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)),可以使用如下命令进行转换:
|
||||
|
||||
```
|
||||
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_tt
|
||||
|
@ -221,7 +228,7 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img
|
|||
```
|
||||
可视化文本检测结果默认保存到`./inference_results`文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:
|
||||
|
||||
(coming soon)
|
||||

|
||||
|
||||
**注意**:本代码库中,SAST后处理Locality-Aware NMS有python和c++两种版本,c++版速度明显快于python版。由于c++版本nms编译版本问题,只有python3.5环境下会调用c++版nms,其他情况将调用python版nms。
|
||||
|
||||
|
@ -268,16 +275,6 @@ CRNN 文本识别模型推理,可以执行如下命令:
|
|||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
|
||||
```
|
||||
|
||||
<a name="基于Attention损失的识别模型推理"></a>
|
||||
### 3. 基于Attention损失的识别模型推理
|
||||
|
||||
基于Attention损失的识别模型与ctc不同,需要额外设置识别算法参数 --rec_algorithm="RARE"
|
||||
RARE 文本识别模型推理,可以执行如下命令:
|
||||
```
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE"
|
||||
|
||||
```
|
||||
|
||||

|
||||
|
||||
执行命令后,上面图像的识别结果如下:
|
||||
|
@ -297,7 +294,7 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
|
|||
dict_character = list(self.character_str)
|
||||
```
|
||||
|
||||
### 4. 自定义文本识别字典的推理
|
||||
### 3. 自定义文本识别字典的推理
|
||||
如果训练时修改了文本的字典,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径,并且设置 `rec_char_type=ch`
|
||||
|
||||
```
|
||||
|
@ -305,7 +302,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
|
|||
```
|
||||
|
||||
<a name="多语言模型的推理"></a>
|
||||
### 5. 多语言模型的推理
|
||||
### 4. 多语言模型的推理
|
||||
如果您需要预测的是其他语言模型,在使用inference模型预测时,需要通过`--rec_char_dict_path`指定使用的字典路径, 同时为了得到正确的可视化结果,
|
||||
需要通过 `--vis_font_path` 指定可视化的字体路径,`doc/` 路径下有默认提供的小语种字体,例如韩文识别:
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
经测试PaddleOCR可在glibc 2.23上运行,您也可以测试其他glibc版本或安装glibc 2.23
|
||||
PaddleOCR 工作环境
|
||||
- PaddlePaddle 2.0rc0+ ,推荐使用 PaddlePaddle 2.0rc0
|
||||
- PaddlePaddle 1.8+ ,推荐使用 PaddlePaddle 2.0rc1
|
||||
- python3.7
|
||||
- glibc 2.23
|
||||
- cuDNN 7.6+ (GPU)
|
||||
|
@ -35,11 +35,11 @@ sudo docker container exec -it ppocr /bin/bash
|
|||
pip3 install --upgrade pip
|
||||
|
||||
如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
|
||||
python3 -m pip install paddlepaddle-gpu==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
|
||||
python3 -m pip install paddlepaddle-gpu==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple
|
||||
|
||||
如果您的机器是CPU,请运行以下命令安装
|
||||
|
||||
python3 -m pip install paddlepaddle==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
|
||||
python3 -m pip install paddlepaddle==2.0.0rc1 -i https://mirror.baidu.com/pypi/simple
|
||||
|
||||
更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
|
||||
```
|
||||
|
|
|
@ -1,4 +1,5 @@
|
|||
## OCR模型列表(V2.0,2020年12月12日更新)
|
||||
**说明** :2.0版模型和[1.1版模型](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md)的主要区别在于动态图训练vs.静态图训练,模型性能上无明显差距。
|
||||
|
||||
- [一、文本检测模型](#文本检测模型)
|
||||
- [二、文本识别模型](#文本识别模型)
|
||||
|
@ -21,7 +22,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
|
|||
|
||||
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
|
||||
| --- | --- | --- | --- | --- |
|
||||
|ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| |[推理模型 (coming soon)](link) / [slim模型 (coming soon)](link)|
|
||||
|ch_ppocr_mobile_slim_v2.0_det|slim裁剪版超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| |推理模型 (coming soon) / slim模型 (coming soon)|
|
||||
|ch_ppocr_mobile_v2.0_det|原始超轻量模型,支持中英文、多语种文本检测|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|
|
||||
|ch_ppocr_server_v2.0_det|通用模型,支持中英文、多语种文本检测,比超轻量模型更大,但效果更好|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)|
|
||||
|
||||
|
@ -34,7 +35,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
|
|||
|
||||
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
|
||||
| --- | --- | --- | --- | --- |
|
||||
|ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| |[推理模型 (coming soon)](link) / [slim模型 (coming soon)](link) |
|
||||
|ch_ppocr_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| |推理模型 (coming soon) / slim模型 (coming soon) |
|
||||
|ch_ppocr_mobile_v2.0_rec|原始超轻量模型,支持中英文、数字识别|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|3.71M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
|
||||
|ch_ppocr_server_v2.0_rec|通用模型,支持中英文、数字识别|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
|
||||
|
||||
|
@ -45,7 +46,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
|
|||
|
||||
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
|
||||
| --- | --- | --- | --- | --- |
|
||||
|en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| |[推理模型 (coming soon )](link) / [slim模型 (coming soon)](link) |
|
||||
|en_number_mobile_slim_v2.0_rec|slim裁剪量化版超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| | 推理模型 (coming soon) / slim模型 (coming soon) |
|
||||
|en_number_mobile_v2.0_rec|原始超轻量模型,支持英文、数字识别|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.56M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
|
||||
|
||||
<a name="多语言识别模型"></a>
|
||||
|
@ -64,11 +65,5 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
|
|||
|
||||
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
|
||||
| --- | --- | --- | --- | --- |
|
||||
|ch_ppocr_mobile_slim_v2.0_cls|slim量化版模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| |[推理模型 (coming soon)](link) / [训练模型](link) / [slim模型](link) |
|
||||
|ch_ppocr_mobile_slim_v2.0_cls|slim量化版模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| |推理模型 (coming soon) / 训练模型 / slim模型 |
|
||||
|ch_ppocr_mobile_v2.0_cls|原始模型|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
|
||||
|
||||
|
||||
## OCR模型列表(V1.1,2020年9月22日更新)
|
||||
|
||||
[1.1系列模型地址](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/models_list.md)
|
||||
|
||||
|
|
|
@ -7,7 +7,7 @@
|
|||
- [字典](#字典)
|
||||
- [支持空格](#支持空格)
|
||||
|
||||
- [二、启动训练](#文本检测模型推理)
|
||||
- [二、启动训练](#启动训练)
|
||||
- [1. 数据增强](#数据增强)
|
||||
- [2. 训练](#训练)
|
||||
- [3. 小语种](#小语种)
|
||||
|
@ -167,7 +167,7 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc
|
|||
|
||||
```
|
||||
# GPU训练 支持单卡,多卡训练,通过--gpus参数指定卡号
|
||||
# 训练icdar15英文数据 并将训练日志保存为 tain_rec.log
|
||||
# 训练icdar15英文数据 训练日志会自动保存为 "{save_model_dir}" 下的train.log
|
||||
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
|
||||
```
|
||||
<a name="数据增强"></a>
|
||||
|
@ -200,11 +200,8 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
|
|||
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
||||
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
|
||||
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
|
||||
| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc |
|
||||
| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention |
|
||||
| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
|
||||
| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
|
||||
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
|
||||
|
||||
训练中文数据,推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml),如您希望尝试其他算法在中文数据集上的效果,请参考下列说明修改配置文件:
|
||||
|
||||
|
@ -356,8 +353,7 @@ python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkp
|
|||
|
||||
```
|
||||
infer_img: doc/imgs_words/en/word_1.png
|
||||
index: [19 24 18 23 29]
|
||||
word : joint
|
||||
result: ('joint', 0.9998967)
|
||||
```
|
||||
|
||||
预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml` 完成了中文模型的训练,
|
||||
|
@ -376,6 +372,5 @@ python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v
|
|||
|
||||
```
|
||||
infer_img: doc/imgs_words/ch/word_1.jpg
|
||||
index: [2092 177 312 2503]
|
||||
word : 韩国小馆
|
||||
result: ('韩国小馆', 0.997218)
|
||||
```
|
||||
|
|
|
@ -1,7 +1,10 @@
|
|||
# 更新
|
||||
- 2020.12.15 更新数据合成工具[Style-Text](../../StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。
|
||||
- 2020.12.07 [FAQ](../../doc/doc_ch/FAQ.md)新增5个高频问题,总数124个,并且计划以后每周一都会更新,欢迎大家持续关注。
|
||||
- 2020.11.25 更新半自动标注工具[PPOCRLabel](../../PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。
|
||||
- 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941
|
||||
- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](../../README_ch.md#PP-OCR)),适合在移动端部署使用。[模型下载](../../README_ch.md#模型下载)
|
||||
- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](../../README_ch.md#模型下载)
|
||||
- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见PP-OCR Pipline),适合在移动端部署使用。
|
||||
- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。
|
||||
- 2020.9.17 更新[英文识别模型](./models_list.md#english-recognition-model)和[多语种识别模型](./models_list.md#english-recognition-model),已支持`德语、法语、日语、韩语`,更多语种识别模型将持续更新。
|
||||
- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./FAQ.md)
|
||||
- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md)
|
||||
|
|
|
@ -1,49 +1,32 @@
|
|||
# 效果展示
|
||||
|
||||
<a name="通用ppocr_server_1.1效果展示"></a>
|
||||
## 通用ppocr_server_1.1效果展示
|
||||
<a name="超轻量ppocr_server_2.0效果展示"></a>
|
||||
## 通用ppocr_server_2.0 效果展示
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/1101.jpg" width="800">
|
||||
<img src="../imgs_results/1102.jpg" width="800">
|
||||
<img src="../imgs_results/1103.jpg" width="800">
|
||||
<img src="../imgs_results/1104.jpg" width="800">
|
||||
<img src="../imgs_results/1105.jpg" width="800">
|
||||
<img src="../imgs_results/1106.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00009282.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00059985.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00111002.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
<a name="英文识别模型效果展示"></a>
|
||||
## 英文识别模型效果展示
|
||||
<div align="center">
|
||||
<img src="../imgs_results/img_12.jpg" width="800">
|
||||
<img src="../imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
<a name="多语言识别模型效果展示"></a>
|
||||
## 多语言识别模型效果展示
|
||||
<div align="center">
|
||||
<img src="../imgs_results/1110.jpg" width="800">
|
||||
<img src="../imgs_results/1112.jpg" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
<a name="超轻量ppocr_mobile_1.0效果展示"></a>
|
||||
## 超轻量ppocr_mobile_1.0效果展示
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/1.jpg" width="800">
|
||||
<img src="../imgs_results/7.jpg" width="800">
|
||||
<img src="../imgs_results/6.jpg" width="800">
|
||||
<img src="../imgs_results/16.png" width="800">
|
||||
</div>
|
||||
|
||||
|
||||
<a name="通用ppocr_server_1.0效果展示"></a>
|
||||
## 通用ppocr_server_1.0效果展示
|
||||
|
||||
<div align="center">
|
||||
<img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800">
|
||||
<img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800">
|
||||
<img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800">
|
||||
<img src="../imgs_results/french_0.jpg" width="800">
|
||||
<img src="../imgs_results/korean.jpg" width="800">
|
||||
</div>
|
||||
|
|
|
@ -6,7 +6,7 @@
|
|||
|
||||
pip安装
|
||||
```bash
|
||||
pip install paddleocr
|
||||
pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本
|
||||
```
|
||||
|
||||
本地构建并安装
|
||||
|
@ -166,7 +166,7 @@ paddleocr -h
|
|||
|
||||
* 检测+分类+识别全流程
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true --cls true
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true
|
||||
```
|
||||
结果是一个list,每个item包含了文本框,文字和识别置信度
|
||||
```bash
|
||||
|
@ -190,7 +190,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
|
|||
|
||||
* 分类+识别
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --det false
|
||||
```
|
||||
|
||||
结果是一个list,每个item只包含识别结果和识别置信度
|
||||
|
@ -222,7 +222,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false
|
|||
|
||||
* 单独执行分类
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false --rec false
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --det false --rec false
|
||||
```
|
||||
|
||||
结果是一个list,每个item只包含分类结果和分类置信度
|
||||
|
@ -258,7 +258,7 @@ im_show.save('result.jpg')
|
|||
### 通过命令行使用
|
||||
|
||||
```bash
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
|
||||
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
|
||||
```
|
||||
|
||||
### 使用网络图片或者numpy数组作为输入
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
<a name="Algorithm_introduction"></a>
|
||||
## Algorithm introduction
|
||||
|
||||
This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v1.1 models list](./models_list_en.md).
|
||||
This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md).
|
||||
|
||||
|
||||
- [1. Text Detection Algorithm](#TEXTDETECTIONALGORITHM)
|
||||
|
@ -13,27 +13,27 @@ This tutorial lists the text detection algorithms and text recognition algorithm
|
|||
PaddleOCR open source text detection algorithms list:
|
||||
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
|
||||
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
|
||||
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research)
|
||||
- [x] SAST([paper](https://arxiv.org/abs/1908.05498) )(Baidu Self-Research)
|
||||
|
||||
On the ICDAR2015 dataset, the text detection result is as follows:
|
||||
|
||||
|Model|Backbone|precision|recall|Hmean|Download link|
|
||||
|-|-|-|-|-|-|
|
||||
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](link)|
|
||||
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](link)|
|
||||
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](link)|
|
||||
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](link)|
|
||||
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](link)|
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
|EAST|ResNet50_vd|88.76%|81.36%|84.90%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
|
||||
|EAST|MobileNetV3|78.24%|79.15%|78.69%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)|
|
||||
|DB|ResNet50_vd|86.41%|78.72%|82.38%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|
||||
|DB|MobileNetV3|77.29%|73.08%|75.12%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
|
||||
|SAST|ResNet50_vd|91.83%|81.80%|86.52%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|
||||
|
||||
On Total-Text dataset, the text detection result is as follows:
|
||||
|
||||
|Model|Backbone|precision|recall|Hmean|Download link|
|
||||
|-|-|-|-|-|-|
|
||||
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](link)|
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
|SAST|ResNet50_vd|89.05%|76.80%|82.47%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)|
|
||||
|
||||
**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
|
||||
|
||||
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
|
||||
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./detection_en.md)
|
||||
|
||||
<a name="TEXTRECOGNITIONALGORITHM"></a>
|
||||
### 2. Text Recognition Algorithm
|
||||
|
@ -41,20 +41,17 @@ For the training guide and use of PaddleOCR text detection algorithms, please re
|
|||
PaddleOCR open-source text recognition algorithms list:
|
||||
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
|
||||
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
|
||||
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
|
||||
- [ ] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) coming soon
|
||||
- [ ] RARE([paper](https://arxiv.org/abs/1603.03915v1)) coming soon
|
||||
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research) coming soon
|
||||
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294) )(Baidu Self-Research) coming soon
|
||||
|
||||
Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:
|
||||
|
||||
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|
||||
|-|-|-|-|-|
|
||||
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|
||||
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|
||||
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|
||||
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|
||||
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|
||||
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|
||||
|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
|
||||
|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
|
||||
|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
|
||||
|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|
|
||||
|
||||
|
||||
Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
|
||||
Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./recognition_en.md)
|
||||
|
|
|
@ -65,9 +65,9 @@ Start training:
|
|||
```
|
||||
# Set PYTHONPATH path
|
||||
export PYTHONPATH=$PYTHONPATH:.
|
||||
# GPU training Support single card and multi-card training, specify the card number through selected_gpus
|
||||
# GPU training Support single card and multi-card training, specify the card number through --gpus. If your paddle version is less than 2.0rc1, please use '--selected_gpus'
|
||||
# Start training, the following command has been written into the train.sh file, just modify the configuration file path in the file
|
||||
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
|
||||
python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
|
||||
```
|
||||
|
||||
- Data Augmentation
|
||||
|
@ -77,7 +77,7 @@ PaddleOCR provides a variety of data augmentation methods. If you want to add di
|
|||
The default perturbation methods are: cvtColor, blur, jitter, Gauss noise, random crop, perspective, color reverse, RandAugment.
|
||||
|
||||
Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to:
|
||||
[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
|
||||
[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
|
||||
[randaugment.py](../../ppocr/data/imaug/randaugment.py)
|
||||
|
||||
|
||||
|
|
|
@ -76,8 +76,10 @@ You can also use `-o` to change the training parameters without modifying the ym

python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001

# multi-GPU training
# Set the GPU IDs used by the '--selected_gpus' parameter;
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
# Set the GPU IDs used by the '--gpus' parameter; if your paddle version is lower than 2.0rc1, please use '--selected_gpus'
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
```

#### load trained model and continue training
@ -99,15 +101,11 @@ Run the following code to calculate the evaluation indicators. The result will b

When evaluating, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. If you use different datasets or models for training, these two parameters should be adjusted for a better result.

The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.

Such as:
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.
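For intuition about `unclip_ratio`: DB first predicts a shrunken text region and then expands ("unclips") each polygon before output. A minimal sketch of that expansion following the DB paper, not PaddleOCR's exact post-processing code:

```python
import pyclipper
from shapely.geometry import Polygon

def unclip(box, unclip_ratio=1.5):
    # box: list of (x, y) vertices of a detected text polygon.
    # The offset distance grows with the polygon's area and unclip_ratio
    # and shrinks with its perimeter, as in the DB paper.
    poly = Polygon(box)
    distance = poly.area * unclip_ratio / poly.length
    offset = pyclipper.PyclipperOffset()
    offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    return offset.Execute(distance)
```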
@ -25,9 +25,8 @@ Next, we first introduce how to convert a trained model into an inference model,

- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
- [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
- [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
- [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
- [5. MULTILINGUAL MODEL INFERENCE](#MULTILINGUAL_MODEL_INFERENCE)
- [3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
- [4. MULTILINGUAL MODEL INFERENCE](#MULTILINGUAL_MODEL_INFERENCE)

- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
- [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
@ -135,24 +134,33 @@ Because EAST and DB algorithms are very different, when inference, it is necessa

For lightweight Chinese detection model inference, you can execute the following commands:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
# download DB text detection inference model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# predict
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/"
```
The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

![](../imgs_results/det_res_2.jpg)
![](../imgs_results/det_res_22.jpg)
The size of the image is limited by the parameters `limit_type` and `det_limit_side_len`: `limit_type=max` limits the length of the long side to at most `det_limit_side_len`, and `limit_type=min` limits the length of the short side to at least `det_limit_side_len`.
When the picture does not meet the restriction conditions (for `limit_type=max`, long side > `det_limit_side_len`; for `min`, short side < `det_limit_side_len`), the image will be scaled proportionally.
This parameter is set to `limit_type='max', det_max_side_len=960` by default. If the resolution of the input picture is relatively large and you want to use a larger resolution prediction, you can execute the following command:
You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image.
The optional values of `limit_type` are [`max`, `min`], and `det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960.

The default setting of the parameters is `limit_type='max', det_limit_side_len=960`, which means that the longest side of the network input image cannot exceed 960;
if this value is exceeded, the image will be resized while keeping the aspect ratio so that the longest side is `det_limit_side_len`.
Set as `limit_type='min', det_limit_side_len=960`, it means that the shortest side of the image is limited to 960.

If the resolution of the input picture is relatively large and you want to use a larger resolution prediction, you can set det_limit_side_len to the desired value, such as 1216:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1200
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216
```
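For intuition, a minimal sketch of the resize rule just described; the helper name is hypothetical and this is not PaddleOCR's exact preprocessing code:

```python
import cv2

def resize_for_det(img, limit_type="max", limit_side_len=960):
    h, w = img.shape[:2]
    if limit_type == "max":
        # shrink so the longest side does not exceed limit_side_len
        ratio = min(1.0, limit_side_len / max(h, w))
    else:  # "min"
        # enlarge so the shortest side reaches at least limit_side_len
        ratio = max(1.0, limit_side_len / min(h, w))
    # keep the aspect ratio, then round each side to a multiple of 32
    resize_h = max(32, int(round(h * ratio / 32)) * 32)
    resize_w = max(32, int(round(w * ratio / 32)) * 32)
    return cv2.resize(img, (resize_w, resize_h))
```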
If you want to use the CPU for prediction, execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/22.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
```

<a name="DB_DETECTION"></a>
@ -179,7 +187,7 @@ The visualized text detection results are saved to the `./inference_results` fol

<a name="EAST_DETECTION"></a>
### 3. EAST TEXT DETECTION MODEL INFERENCE

First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link (coming soon)](link)), you can use the following command to convert:
First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)), you can use the following command to convert:

```
python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_east
@ -192,7 +200,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

(coming soon)
![](../imgs_results/det_res_img_10_east.jpg)

**Note**: EAST post-processing locality aware NMS has two versions: Python and C++. The C++ version is significantly faster than the Python version. Due to compilation issues with the C++ NMS, the C++ NMS is called only in a Python 3.5 environment; the Python NMS is called in all other cases.
@ -200,7 +208,7 @@ The visualized text detection results are saved to the `./inference_results` fol

<a name="SAST_DETECTION"></a>
### 4. SAST TEXT DETECTION MODEL INFERENCE
#### (1). Quadrangle text detection model (ICDAR2015)
First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link (coming soon)](link)), you can use the following command to convert:
First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)), you can use the following command to convert:

```
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_ic15
@ -214,10 +222,10 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

(coming soon)
![](./imgs_results/det_res_img_10_sast.jpg)

#### (2). Curved text detection model (Total-Text)
First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the Total-Text English dataset as an example ([model download link (coming soon)](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)), you can use the following command to convert:
First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the Total-Text English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)), you can use the following command to convert:

```
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_tt
@ -231,7 +239,7 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

(coming soon)
![](./imgs_results/det_res_img623_sast.jpg)

**Note**: SAST post-processing locality aware NMS has two versions: Python and C++. The C++ version is significantly faster than the Python version. Due to compilation issues with the C++ NMS, the C++ NMS is called only in a Python 3.5 environment; the Python NMS is called in all other cases.
@ -275,15 +283,6 @@ For CRNN text recognition model inference, execute the following commands:

python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
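For intuition, CTC-based recognizers such as CRNN and STAR-Net are typically decoded greedily by collapsing repeated labels and removing blanks. A minimal sketch of that decoding step, illustrative only and not PaddleOCR's actual implementation:

```python
import numpy as np

def ctc_greedy_decode(probs, charset, blank=0):
    # probs: (T, C) per-timestep class probabilities; index `blank` (0 here)
    # is the CTC blank, so class i >= 1 maps to charset[i - 1].
    ids = probs.argmax(axis=1)
    out, prev = [], blank
    for i in ids:
        if i != blank and i != prev:  # collapse repeats, drop blanks
            out.append(charset[i - 1])
        prev = i
    return "".join(out)
```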
<a name="ATTENTION-BASED_RECOGNITION"></a>
|
||||
### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE
|
||||
|
||||
The recognition model based on Attention loss is different from ctc, and additional recognition algorithm parameters need to be set --rec_algorithm="RARE"
|
||||
After executing the command, the recognition result of the above image is as follows:
|
||||
```bash
|
||||
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rare/" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_algorithm="RARE"
|
||||
```
|
||||
|
||||

|
||||
|
||||
After executing the command, the recognition result of the above image is as follows:
|
||||
|
@ -303,7 +302,7 @@ dict_character = list(self.character_str)

```

<a name="USING_CUSTOM_CHARACTERS"></a>
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
### 3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
If the text dictionary is modified during training, you need to specify the dictionary path via `--rec_char_dict_path` when using the inference model to predict, and set `rec_char_type=ch`:

```
@ -311,7 +310,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
```
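The custom dictionary itself is a plain text file with one character per line (this format note is based on PaddleOCR's shipped dictionaries; the excerpt below is a made-up sample):

```
a
b
c
0
1
```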
<a name="MULTILINGUAL_MODEL_INFERENCE"></a>
### 5. MULTILINGUAL MODEL INFERENCE
### 4. MULTILINGUAL MODEL INFERENCE
If you need to predict other language models, you need to specify the dictionary path via `--rec_char_dict_path` when using the inference model to predict. At the same time, in order to get the correct visualization results,
you need to specify the visualization font path through `--vis_font_path`. Fonts for several languages are provided by default under the `doc/` path, such as Korean recognition:
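For example, a hypothetical Korean recognition call; the model directory is a placeholder, and the dictionary and font paths are assumptions about the repository layout:

```bash
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" \
    --rec_model_dir="./inference/korean_rec/" \
    --rec_char_dict_path="./ppocr/utils/dict/korean_dict.txt" \
    --vis_font_path="./doc/fonts/korean.ttf"
```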
@ -3,7 +3,7 @@

After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glibc 2.23 for the best compatibility.

PaddleOCR working environment:
- PaddlePaddle1.8+, Recommend PaddlePaddle 2.0rc0
- PaddlePaddle 1.8+, Recommend PaddlePaddle 2.0rc1
- python3.7
- glibc 2.23
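To check which glibc version your machine provides, you can run (a standard GNU/Linux command, shown here for convenience):

```
ldd --version | head -n 1
```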
@ -38,10 +38,10 @@ sudo docker container exec -it ppocr /bin/bash

pip3 install --upgrade pip

# If you have cuda9 or cuda10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==2.0rc0 -i https://mirror.baidu.com/pypi/simple
python3 -m pip install paddlepaddle-gpu==2.0rc1 -i https://mirror.baidu.com/pypi/simple

# If you only have cpu on your machine, please run the following command to install
python3 -m pip install paddlepaddle==2.0rc0 -i https://mirror.baidu.com/pypi/simple
python3 -m pip install paddlepaddle==2.0rc1 -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
@ -1,4 +1,5 @@

## OCR model list (V1.1, updated on 2020.12.12)
## OCR model list (V2.0, updated on 2020.12.12)
**Note**: Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with the static graph paradigm, models 2.0 are the dynamic-graph trained version and achieve close performance.

- [1. Text Detection Model](#Detection)
- [2. Text Recognition Model](#Recognition)
@ -20,7 +21,7 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine

|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| |[inference model (coming soon)](link) / [slim model (coming soon)](link)|
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)| |inference model (coming soon) / slim model (coming soon)|
|ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|
|ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)|
@ -32,7 +33,7 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine

|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| |[inference model (coming soon)](link) / [slim model (coming soon)](link) |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| |inference model (coming soon) / slim model (coming soon) |
|ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|3.71M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
|ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
@ -44,7 +45,7 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine

|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| |[inference model (coming soon)](link) / [slim model (coming soon)](link) |
|en_number_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)| |inference model (coming soon) / slim model (coming soon) |
|en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.56M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |

<a name="Multilingual"></a>
@ -62,10 +63,6 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine

|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| |[inference model (coming soon)](link) / [trained model](link) / [slim model](link) |
|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| |inference model (coming soon) / trained model / slim model|
|ch_ppocr_mobile_v2.0_cls|Original model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |

## OCR model list (V1.1, updated on 2020.9.22)

[1.1 series model address](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/models_list.md)
@ -162,7 +162,7 @@ Start training:

```
# GPU training Support single card and multi-card training, specify the card number through --gpus
# Training icdar15 English data and saving the log as train_rec.log
# Training icdar15 English data. The training log will be automatically saved as train.log under "{save_model_dir}"
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
```
<a name="Data_Augmentation"></a>
@ -193,11 +193,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend

| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc |
| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention |
| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |

For training Chinese data, it is recommended to use [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
@ -350,8 +347,7 @@ Get the prediction result of the input image:

```
infer_img: doc/imgs_words/en/word_1.png
index: [19 24 18 23 29]
word : joint
result: ('joint', 0.9998967)
```

The configuration file used for prediction must be consistent with the training. For example, if you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml`, you can use the following command to predict the Chinese model:
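An illustrative prediction command for that case; the entry point and flags are assumptions consistent with the training command above, and the checkpoint path is a placeholder:

```
python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg
```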
@ -369,6 +365,5 @@ Get the prediction result of the input image:

```
infer_img: doc/imgs_words/ch/word_1.jpg
index: [2092 177 312 2503]
word : 韩国小馆
result: ('韩国小馆', 0.997218)
```
@ -1,8 +1,9 @@

# RECENT UPDATES
- 2020.12.15 Update the data synthesis tool, i.e., [Style-Text](../../StyleText/README.md), which makes it easy to synthesize a large number of images similar to the target scene image.
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly.
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipeline](../../README.md#PP-OCR-Pipline)), suitable for mobile deployment. [Model Downloads](../../README.md#Supported-Chinese-model-list)
- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](../../README.md#Supported-Chinese-model-list)
- 2020.9.17 Update [English recognition model](./models_list_en.md#english-recognition-model) and [Multilingual recognition model](./models_list_en.md#english-recognition-model); `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M, suitable for mobile deployment.
- 2020.9.17 Update English recognition model and Multilingual recognition model; `English`, `Chinese`, `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
- 2020.8.24 Support the use of PaddleOCR through whl package installation, please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)
- 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23 Release the playback and PPT of the live course on Bilibili, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
@ -1,49 +1,34 @@

# Visualization

<a name="ppocr_server_1.1"></a>
## ch_ppocr_server_1.1

<a name="ppocr_server_2.0"></a>
## ch_ppocr_server_2.0

<div align="center">
<img src="../imgs_results/1101.jpg" width="800">
<img src="../imgs_results/1102.jpg" width="800">
<img src="../imgs_results/1103.jpg" width="800">
<img src="../imgs_results/1104.jpg" width="800">
<img src="../imgs_results/1105.jpg" width="800">
<img src="../imgs_results/1106.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00009282.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00059985.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00111002.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg" width="800">
</div>

<a name="en_ppocr_mobile_1.1"></a>
## en_ppocr_mobile_1.1

<a name="en_ppocr_mobile_2.0"></a>
## en_ppocr_mobile_2.0
<div align="center">
<img src="../imgs_results/img_12.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
</div>

<a name="multilingual"></a>
## (multilingual)_ppocr_mobile_1.1
## (multilingual)_ppocr_mobile_2.0
<div align="center">
<img src="../imgs_results/1110.jpg" width="800">
<img src="../imgs_results/1112.jpg" width="800">
</div>

<a name="ppocr_mobile_1.0"></a>
## ppocr_mobile_1.0

<div align="center">
<img src="../imgs_results/1.jpg" width="800">
<img src="../imgs_results/7.jpg" width="800">
<img src="../imgs_results/6.jpg" width="800">
<img src="../imgs_results/16.png" width="800">
</div>

<a name="ppocr_server_1.0"></a>
## ppocr_server_1.0

<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800">
<img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800">
<img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800">
<img src="../imgs_results/french_0.jpg" width="800">
<img src="../imgs_results/korean.jpg" width="800">
</div>
@ -4,7 +4,7 @@

### install package
install by pypi
```bash
pip install paddleocr
pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
```

build own whl package and install
@ -172,7 +172,7 @@ paddleocr -h

* detection, classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true -cls true --lang en
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true --lang en
```

Output will be a list, each item contains bounding box, text and recognition confidence
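The same call through the Python API, as a minimal sketch (based on the paddleocr 2.0.x whl; treat the result layout as indicative rather than guaranteed):

```python
from paddleocr import PaddleOCR

# Downloads the detection, classification and recognition models on first run
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('PaddleOCR/doc/imgs_en/img_12.jpg', cls=True)
for box, (text, score) in result:
    print(box, text, score)
```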
@ -198,7 +198,7 @@ Output will be a list, each item contains bounding box, text and recognition con

* classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true -cls true --det false --lang en
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --lang en
```

Output will be a list, each item contains text and recognition confidence
@ -221,7 +221,7 @@ Output will be a list, each item only contains bounding box

* only recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --cls false --lang en
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --lang en
```

Output will be a list, each item contains text and recognition confidence
@ -231,7 +231,7 @@ Output will be a list, each item contains text and recognition confidence

* only classification
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true -cls true --det false --rec false
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --rec false
```

Output will be a list, each item contains classification result and confidence
@ -268,7 +268,7 @@ im_show.save('result.jpg')

### Use by command line

```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
```

### Use web images or numpy array as input