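commit
757501995a

@ -1,4 +1,12 @@
## Text angle classification

### Method introduction

Text angle classification is mainly used when the image is not at 0 degrees; in that case, the text lines detected in the image need to be rotated upright. In the PaddleOCR system,
the text-line images obtained from text detection are affine-transformed before being fed to the recognition model, so at that stage only a 0/180-degree classification of the text is needed. Therefore, PaddleOCR's built-in
text angle classifier **only supports 0 and 180 degree classification**. To support more angles, you can modify the algorithm yourself.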
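The correction step described above can be sketched in a few lines. This is a minimal illustration rather than PaddleOCR's implementation. It makes three assumptions:

- the classifier yields a label string (`'0'` or `'180'`) plus a confidence score;
- an image is represented as a nested list of pixel rows;
- an illustrative `cls_thresh` cut-off decides when a prediction is trusted, and lines below it are left unchanged.

```python
from typing import List, Sequence

Image = List[List[int]]  # rows of pixel values (single channel, for simplicity)


def rotate_180(img: Sequence[Sequence[int]]) -> Image:
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [list(reversed(row)) for row in reversed(img)]


def correct_text_line(img: Image, label: str, score: float,
                      cls_thresh: float = 0.9) -> Image:
    """Turn a text-line image upright if the classifier says it is upside down.

    `label` is the predicted class ('0' or '180') and `score` its confidence;
    predictions at or below the (illustrative) threshold are ignored.
    """
    if label == "180" and score > cls_thresh:
        return rotate_180(img)
    return img


if __name__ == "__main__":
    line = [[1, 2, 3],
            [4, 5, 6]]
    print(correct_text_line(line, "180", 0.99))  # [[6, 5, 4], [3, 2, 1]]
    print(correct_text_line(line, "0", 0.99))    # [[1, 2, 3], [4, 5, 6]]
```

A 180-degree rotation is a vertical flip combined with a horizontal flip, which is why reversing the rows and then each row is enough. With a real image array you would use the library's built-in rotation instead.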

Examples of 0 and 180 degree data samples:

![](../imgs_results/angle_class_example.jpg)

### Data preparation


@ -13,7 +21,7 @@ ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset
Please organize your data as described below.

- Training set

First, put the training images in the same folder (train_images), and use a txt file (cls_gt_train.txt) to record the image paths and labels.
It is recommended to put the training images in the same folder and use a txt file (cls_gt_train.txt) to record the image paths and labels.

**Note:** By default, the image path and the image label must be separated by `\t`; using any other separator will cause a training error.


@ -21,8 +29,8 @@ ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset

```
" Image file name    Image annotation "
train/word_001.jpg 0
train/word_002.jpg 180
train/cls/train/word_001.jpg 0
train/cls/train/word_002.jpg 180
```
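Since any separator other than `\t` breaks training, it can be worth validating a label file before launching a run. The helper below is a hypothetical sketch, not part of PaddleOCR. It splits each line on the first tab. When `allowed_labels` is given, it also checks the label against that set, for example `{"0", "180"}` for the angle classifier.

```python
from typing import Iterable, List, Optional, Set, Tuple


def parse_label_lines(lines: Iterable[str],
                      allowed_labels: Optional[Set[str]] = None
                      ) -> List[Tuple[str, str]]:
    """Parse 'image_path<TAB>label' lines into (path, label) pairs.

    Blank lines are skipped. Raises ValueError (with the 1-based line number)
    if a line has no tab, or if `allowed_labels` is given and the label is not in it.
    """
    samples: List[Tuple[str, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        path, sep, label = line.partition("\t")  # split on the first tab only
        if not sep:
            raise ValueError(f"line {lineno}: no tab between path and label: {line!r}")
        if allowed_labels is not None and label not in allowed_labels:
            raise ValueError(f"line {lineno}: unexpected label {label!r}")
        samples.append((path, label))
    return samples


if __name__ == "__main__":
    good = ["train/cls/train/word_001.jpg\t0\n",
            "train/cls/train/word_002.jpg\t180\n"]
    print(parse_label_lines(good, allowed_labels={"0", "180"}))
    try:
        parse_label_lines(["train/cls/train/word_001.jpg 0"])  # space instead of tab
    except ValueError as err:
        print("rejected:", err)
```

For a real file, pass the open handle directly, e.g. `parse_label_lines(open("cls_gt_train.txt", encoding="utf-8"), {"0", "180"})`. The error message reports the offending 1-based line number.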

The final training set should have the following file structure:

@ -1,61 +1,94 @@
## Text recognition

- [I. Data preparation](#数据准备)
    - [Data download](#数据下载)
    - [Custom dataset](#自定义数据集)
    - [Dictionary](#字典)
    - [Space support](#支持空格)
- [1 Data preparation](#数据准备)
    - [1.1 Custom dataset](#自定义数据集)
    - [1.2 Data download](#数据下载)
    - [1.3 Dictionary](#字典)
    - [1.4 Space support](#支持空格)

- [II. Start training](#启动训练)
    - [1. Data augmentation](#数据增强)
    - [2. Training](#训练)
    - [3. Multi-language](#小语种)
- [2 Start training](#启动训练)
    - [2.1 Data augmentation](#数据增强)
    - [2.2 Training](#训练)
    - [2.3 Multi-language](#小语种)

- [III. Evaluation](#评估)
- [3 Evaluation](#评估)

- [IV. Prediction](#预测)
    - [1. Training engine prediction](#训练引擎预测)
- [4 Prediction](#预测)
    - [4.1 Training engine prediction](#训练引擎预测)

<a name="数据准备"></a>
### Data preparation
### 1. Data preparation


PaddleOCR supports two data formats: `lmdb` for training on public datasets and debugging algorithms; `general data` for training on your own data:

Please set up the dataset as follows:
PaddleOCR supports two data formats:
- `lmdb` is used to train datasets stored in lmdb format;
- `general data` is used to train datasets stored as text files:

The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on disk, you only need to create a symbolic link to the dataset directory:

```
# Linux and macOS
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
# Windows
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```

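<a name="数据下载"></a>
* Data download
<a name="准备数据集"></a>
#### 1.1 Custom dataset
The following uses the general data format as an example to show how to prepare a dataset:

If you do not have a dataset locally, you can download the [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) data from the official website for quick verification. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format datasets required for the benchmark.
To reproduce the metrics reported in the SRN paper, you need to download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (extraction code: y3ry). The augmented data is generated by applying rotation and perturbation to MJSynth and SynthText. After downloading, unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_release/training/.
* Training set

<a name="自定义数据集"></a>
* Use your own dataset
It is recommended to put the training images in the same folder and use a txt file (rec_gt_train.txt) to record the image paths and labels. The txt file looks like this:

If you want to train on your own data, please organize it as described below.

- Training set

First, put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to record the image paths and labels.

**Note:** By default, the image path and the image label must be separated by `\t`; using any other separator will cause a training error
**Note:** In the txt file, the image path and the image label must be separated by `\t` by default; using any other separator will cause a training error.

```
" Image file name    Image annotation "

train_data/train_0001.jpg 简单可依赖
train_data/train_0002.jpg 用科技让复杂的世界更简单
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
...
```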
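When you generate `rec_gt_train.txt` yourself, the writer should enforce the same one-sample-per-line, tab-separated format. `write_rec_labels` below is a hypothetical helper sketched for illustration, not a PaddleOCR utility. It rejects paths or transcriptions that contain a tab or line break, because either would silently corrupt the file.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_rec_labels(samples: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (image_path, transcription) pairs as tab-separated lines.

    Returns the number of lines written. A tab or line break inside either
    field would break the one-sample-per-line format, so it is rejected.
    """
    count = 0
    for path, text in samples:
        for field in (path, text):
            if any(ch in field for ch in "\t\r\n"):
                raise ValueError(f"tab or newline not allowed in {field!r}")
        out.write(f"{path}\t{text}\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    n = write_rec_labels([
        ("train_data/rec/train/word_001.jpg", "简单可依赖"),
        ("train_data/rec/train/word_002.jpg", "用科技让复杂的世界更简单"),
    ], buf)
    print(n)
    print(buf.getvalue(), end="")
```

Writing to disk works the same way with `open("rec_gt_train.txt", "w", encoding="utf-8")`. Use UTF-8 so that Chinese transcriptions are preserved.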
PaddleOCR provides label files for training on the icdar2015 dataset, which can be downloaded as follows:

The final training set should have the following file structure:
```
|-train_data
    |-rec
        |- rec_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

Similar to the training set, the test set also needs a folder containing all images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:

```
|-train_data
    |-rec
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

<a name="数据下载"></a>

#### 1.2 Data download

If you do not have a dataset locally, you can download the [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) data from the official website for quick verification. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format datasets required for the benchmark.

If you use the public icdar2015 dataset, PaddleOCR provides label files for training on it, which can be downloaded as follows:

To reproduce the metrics reported in the SRN paper, you need to download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (extraction code: y3ry). The augmented data is generated by applying rotation and perturbation to MJSynth and SynthText. After downloading, unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_release/training/.

```
# Training set labels
@ -71,34 +104,8 @@ PaddleOCR also provides a data format conversion script that can convert the official-website labels into the supported
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
```

The final training set should have the following file structure:
```
|-train_data
    |-ic15_data
        |- rec_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

Similar to the training set, the test set also needs a folder containing all images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:

```
|-train_data
    |-ic15_data
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```
<a name="字典"></a>
- Dictionary
#### 1.3 Dictionary

Finally, you need to provide a dictionary ({word_dict_name}.txt) so that, during training, every character that appears can be mapped to a dictionary index.

@ -115,6 +122,10 @@ n

word_dict.txt contains one character per line and maps each character to a numeric index; for example, "and" is mapped to [2 5 1]
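The character-to-index mapping can be illustrated with a small sketch. It makes two assumptions:

- indexes are the 0-based line numbers of the dictionary file;
- the six-line dictionary used below (`l, d, a, e, r, n`) is hypothetical, chosen so that "and" maps to [2 5 1] as stated above.

PaddleOCR's own label encoder may reserve extra indexes (for example, for a CTC blank), which this sketch does not model.

```python
from typing import Dict, Iterable, List


def load_char_dict(lines: Iterable[str]) -> Dict[str, int]:
    """Map each character to the 0-based line on which it first appears."""
    char_to_idx: Dict[str, int] = {}
    for idx, line in enumerate(lines):
        ch = line.rstrip("\r\n")  # keep a literal space character if a line holds one
        char_to_idx.setdefault(ch, idx)
    return char_to_idx


def encode(text: str, char_to_idx: Dict[str, int]) -> List[int]:
    """Convert a label string to indexes; in this sketch unknown characters are dropped."""
    return [char_to_idx[ch] for ch in text if ch in char_to_idx]


if __name__ == "__main__":
    word_dict = ["l", "d", "a", "e", "r", "n"]  # hypothetical word_dict.txt lines
    mapping = load_char_dict(word_dict)
    print(encode("and", mapping))  # [2, 5, 1]
```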
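
* Built-in dictionaries

PaddleOCR ships with several built-in dictionaries that you can use as needed.

`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters

`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters

@ -130,7 +141,7 @@ word_dict.txt has one character per line, mapping characters to numeric indexes,
`ppocr/utils/dict/en_dict.txt` is an English dictionary with 63 characters


You can use them as needed.


The multi-language models are still at the demo stage, and we will keep optimizing them and adding more languages. **You are very welcome to contribute dictionaries and fonts for other languages**.
If you are willing, submit the dictionary file to [dict](../../ppocr/utils/dict) and we will acknowledge you in the repo.
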
@ -141,13 +152,13 @@ word_dict.txt has one character per line, mapping characters to numeric indexes,
and set `character_type` to `ch`.

<a name="支持空格"></a>
- Add space category
#### 1.4 Add space category

If you want to support recognition of the "space" category, set the `use_space_char` field in the yml file to `True`.


<a name="启动训练"></a>
### Start training
### 2. Start training

PaddleOCR provides training, evaluation, and prediction scripts. This section uses the CRNN recognition model as an example:


@ -172,7 +183,7 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
```
<a name="数据增强"></a>
- Data augmentation
#### 2.1 Data augmentation

PaddleOCR provides multiple data augmentation methods. If you want to add perturbations during training, set `distort: true` in the configuration file.


@ -183,7 +194,7 @@ PaddleOCR provides multiple data augmentation methods. If you want to add
*Due to OpenCV compatibility issues, the perturbation operations currently only support Linux*

<a name="训练"></a>
- Training
#### 2.2 Training

PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency; by default, evaluation runs every 500 iterations. During evaluation, the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy` by default.

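The alternating train/evaluate schedule comes down to a check on the iteration counter plus a running best score. The sketch below is illustrative only:

- `train_step`, `evaluate` and `save` are hypothetical callbacks standing in for PaddleOCR's internals;
- `eval_every=500` mirrors the default evaluation frequency mentioned above.

How the real `eval_batch_step` value is interpreted should be checked in the config file, for example whether it also encodes a starting step.

```python
from typing import Callable, List


def train_with_eval(num_iters: int,
                    train_step: Callable[[int], None],
                    evaluate: Callable[[], float],
                    save: Callable[[str], None],
                    eval_every: int = 500) -> float:
    """Train for `num_iters` iterations, evaluating every `eval_every` iterations.

    Each time the evaluation accuracy improves, the model is saved under the
    name 'best_accuracy'. Returns the best accuracy observed.
    """
    best_acc = float("-inf")
    for it in range(1, num_iters + 1):
        train_step(it)
        if it % eval_every == 0:
            acc = evaluate()
            if acc > best_acc:
                best_acc = acc
                save("best_accuracy")
    return best_acc


if __name__ == "__main__":
    fake_accs = iter([0.61, 0.74, 0.70, 0.78])  # made-up evaluation results
    saved: List[str] = []
    best = train_with_eval(2000,
                           train_step=lambda it: None,
                           evaluate=lambda: next(fake_accs),
                           save=saved.append)
    print(best, len(saved))  # 0.78 3
```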
@ -272,7 +283,7 @@ Eval:
**Note: the configuration file used for prediction/evaluation must be consistent with the one used for training.**

<a name="小语种"></a>
- Multi-language
#### 2.3 Multi-language

PaddleOCR currently supports recognition of 26 languages besides Chinese. A multi-language configuration file template is provided under `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).


@ -415,7 +426,7 @@ Eval:
...
```
<a name="评估"></a>
### Evaluation
### 3 Evaluation

The evaluation dataset can be set by modifying `label_file_path` under `Eval` in `configs/rec/rec_icdar15_train.yml`.

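For reference, recognition accuracy is commonly reported as exact-match accuracy: a sample counts as correct only if the predicted string equals its label. The sketch below computes that metric together with a mean normalized edit distance. Whether PaddleOCR's own metric applies extra normalization (such as stripping spaces) is not covered here; check that against its source.

```python
from typing import Dict, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def rec_metrics(preds: Sequence[str], labels: Sequence[str]) -> Dict[str, float]:
    """Exact-match accuracy and mean normalized edit distance over a dataset."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return {"acc": 0.0, "norm_edit_dis": 0.0}
    correct = sum(p == t for p, t in zip(preds, labels))
    ned = sum(levenshtein(p, t) / max(len(p), len(t), 1)
              for p, t in zip(preds, labels))
    return {"acc": correct / len(labels), "norm_edit_dis": ned / len(labels)}


if __name__ == "__main__":
    print(rec_metrics(["and", "dog"], ["and", "cat"]))  # {'acc': 0.5, 'norm_edit_dis': 0.5}
```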
@ -425,10 +436,10 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
```

<a name="预测"></a>
### Prediction
### 4 Prediction

<a name="训练引擎预测"></a>
* Training engine prediction
#### 4.1 Training engine prediction

With a model trained by PaddleOCR, you can quickly run prediction using the following script.

@ -1,8 +1,8 @@
# paddleocr package usage

## Get started quickly
## 1 Get started quickly

### Install the whl package
### 1.1 Install the whl package

Install with pip
```bash

@ -14,9 +14,12 @@ pip install "paddleocr>=2.0.1" # version 2.0.1+ is recommended
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the paddleocr version number
```
### 1. Use by code

* Full pipeline: detection + classification + recognition
## 2 Usage
### 2.1 Use by code
The paddleocr whl package automatically downloads the lightweight ppocr models as the default models; you can replace them with your own as described in Section 3 **Custom Model**.

* Full pipeline: detection + angle classifier + recognition
```python
from paddleocr import PaddleOCR, draw_ocr
# PaddleOCR currently supports Chinese-English, English, French, German, Korean and Japanese; switch between them with the lang parameter

@ -84,7 +87,7 @@ im_show.save('result.jpg')
</div>


* Classification + recognition
* Angle classifier + recognition
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory

@ -143,7 +146,7 @@ for line in result:
['韩国小馆', 0.9907421]
```

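The recognition-only output above prints one `[text, confidence]` pair per line, so a common next step is to drop low-confidence results. In the sketch below, `filter_results` and `min_score` are hypothetical names. The function also accepts `[box, [text, confidence]]` items, which is the shape we assume for the full detection + recognition pipeline. Apart from the first sample, the values are made up for illustration.

```python
from typing import Any, List, Sequence, Tuple


def _text_and_score(item: Sequence[Any]) -> Tuple[str, float]:
    """Accept either [text, score] or [box, [text, score]] (assumed shapes)."""
    if len(item) == 2 and isinstance(item[1], (list, tuple)):
        item = item[1]  # full-pipeline item: second element holds (text, score)
    return str(item[0]), float(item[1])


def filter_results(result: Sequence[Sequence[Any]],
                   min_score: float = 0.5) -> List[str]:
    """Return the recognized strings whose confidence is at least `min_score`."""
    kept = []
    for item in result:
        text, score = _text_and_score(item)
        if score >= min_score:
            kept.append(text)
    return kept


if __name__ == "__main__":
    rec_only = [['韩国小馆', 0.9907421],
                ['???', 0.31]]  # second entry is made up
    print(filter_results(rec_only, min_score=0.5))  # ['韩国小馆']
```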
* Classification only
* Angle classifier only
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory

@ -157,14 +160,14 @@ for line in result:
['0', 0.9999924]
```

### Use by command line
### 2.2 Use by command line

Show help information
```bash
paddleocr -h
```

* Full pipeline: detection + classification + recognition
* Full pipeline: detection + angle classifier + recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true
```

@ -188,7 +191,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
......
```

* Classification + recognition
* Angle classifier + recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --det false
```

@ -220,7 +223,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false
['韩国小馆', 0.9907421]
```

* Classification only
* Angle classifier only
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --det false --rec false
```

@ -230,11 +233,11 @@ paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls tru
['0', 0.9999924]
```

## Custom model
## 3 Custom model
When the built-in models do not meet your needs, you can use your own trained models.
First, refer to Section 1 of [inference.md](./inference.md) to convert the detection, classification and recognition models into inference models, then use them as follows:

### Use by code
### 3.1 Use by code
```python
from paddleocr import PaddleOCR, draw_ocr
# The model directories must contain the model and params files

@ -255,17 +258,17 @@ im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

### Use by command line
### 3.2 Use by command line

```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
```

### Use web images or numpy arrays as input
## 4 Use web images or numpy arrays as input

1. Web image
### 4.1 Web image

Use by code
- Use by code
```python
from paddleocr import PaddleOCR, draw_ocr
# PaddleOCR currently supports Chinese-English, English, French, German, Korean and Japanese; switch between them with the lang parameter

@ -286,12 +289,12 @@ im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
Command-line mode
- Command-line mode
```bash
paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
```

2. Numpy array
### 4.2 Numpy array
Numpy arrays are supported as input only when used from code
```python
from paddleocr import PaddleOCR, draw_ocr

@ -301,7 +304,7 @@ ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to downlo
img_path = 'PaddleOCR/doc/imgs/11.jpg'
img = cv2.imread(img_path)
# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # uncomment this line if your own trained model supports grayscale images
result = ocr.ocr(img_path, cls=True)
result = ocr.ocr(img, cls=True)
for line in result:
    print(line)

@ -316,7 +319,7 @@ im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
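`cv2.imread` returns pixels in BGR channel order. The commented-out `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` collapses them to one channel using the luma weights Y = 0.299·R + 0.587·G + 0.114·B. The pure-Python sketch below only illustrates that arithmetic on a nested list of `(B, G, R)` tuples. Its results match OpenCV's up to rounding; in real code, keep using OpenCV.

```python
from typing import List, Sequence, Tuple

BGRPixel = Tuple[int, int, int]  # (blue, green, red), the channel order cv2.imread uses


def bgr_to_gray(img: Sequence[Sequence[BGRPixel]]) -> List[List[int]]:
    """Collapse BGR pixels to one 8-bit channel: Y = 0.299 R + 0.587 G + 0.114 B."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (b, g, r) in row]
            for row in img]


if __name__ == "__main__":
    # pure red, pure green, pure blue (each given in BGR order)
    print(bgr_to_gray([[(0, 0, 255), (0, 255, 0), (255, 0, 0)]]))  # [[76, 150, 29]]
```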
|
||||
## 参数说明
|
||||
## 5 参数说明
|
||||
|
||||
| 字段 | 说明 | 默认值 |
|
||||
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
|
||||
|
|
|
@ -1,5 +1,12 @@
## TEXT ANGLE CLASSIFICATION

### Method introduction
Angle classification is used when the image is not at 0 degrees. In this case, the text lines detected in the image need to be rotated upright. In the PaddleOCR system,
the text-line image obtained after text detection is affine-transformed before being sent to the recognition model, so at that point only a 0/180-degree classification of the text is needed. Therefore, PaddleOCR's built-in text angle classifier **only supports 0 and 180 degree classification**. If you want to support more angles, you can modify the algorithm yourself.

Examples of 0 and 180 degree data samples:

![](../imgs_results/angle_class_example.jpg)
### DATA PREPARATION

Please organize the dataset as follows:

@ -1,59 +1,95 @@
## TEXT RECOGNITION

- [DATA PREPARATION](#DATA_PREPARATION)
    - [Dataset Download](#Dataset_download)
    - [Custom Dataset](#Costom_Dataset)
    - [Dictionary](#Dictionary)
    - [Add Space Category](#Add_space_category)
- [1 DATA PREPARATION](#DATA_PREPARATION)
    - [1.1 Custom Dataset](#Costom_Dataset)
    - [1.2 Dataset Download](#Dataset_download)
    - [1.3 Dictionary](#Dictionary)
    - [1.4 Add Space Category](#Add_space_category)

- [TRAINING](#TRAINING)
    - [Data Augmentation](#Data_Augmentation)
    - [Training](#Training)
    - [Multi-language](#Multi_language)
- [2 TRAINING](#TRAINING)
    - [2.1 Data Augmentation](#Data_Augmentation)
    - [2.2 Training](#Training)
    - [2.3 Multi-language](#Multi_language)

- [EVALUATION](#EVALUATION)
- [3 EVALUATION](#EVALUATION)

- [PREDICTION](#PREDICTION)
    - [Training engine prediction](#Training_engine_prediction)
- [4 PREDICTION](#PREDICTION)
    - [4.1 Training engine prediction](#Training_engine_prediction)

<a name="DATA_PREPARATION"></a>
### DATA PREPARATION


PaddleOCR supports two data formats: `LMDB` is used to train on public datasets and debug algorithms; `general data` is used to train on your own data:
PaddleOCR supports two data formats:
- `LMDB` is used to train datasets stored in lmdb format;
- `general data` is used to train datasets stored in text files:

Please organize the dataset as follows:

The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a symbolic link to the dataset directory:

```
# Linux and macOS
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
# Windows
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```

<a name="Dataset_download"></a>
* Dataset download

If you do not have a dataset locally, you can download the [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) data from the official website. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format datasets required for the benchmark.

If you want to reproduce the metrics reported in the SRN paper, you need to download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (extraction code: y3ry). The augmented data is obtained by applying rotation and perturbation to MJSynth and SynthText. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_release/training/.

<a name="Costom_Dataset"></a>
* Use your own dataset:
#### 1.1 Custom dataset

If you want to use your own data for training, please organize it as described below.

- Training set

First, put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to store the image paths and labels.
It is recommended to put the training images in the same folder and use a txt file (rec_gt_train.txt) to store the image paths and labels. The contents of the txt file are as follows:

**Note:** By default, the image path and the image label are separated by `\t`; using any other separator will cause a training error.

```
" Image file name           Image annotation "

train_data/train_0001.jpg 简单可依赖
train_data/train_0002.jpg 用科技让复杂的世界更简单
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
...
```

The final training set should have the following file structure:

```
|-train_data
    |-rec
        |- rec_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

Similar to the training set, the test set also needs a folder containing all images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:

```
|-train_data
    |-rec
    |-ic15_data
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

<a name="Dataset_download"></a>
#### 1.2 Dataset download

If you do not have a dataset locally, you can download the [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) data from the official website. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format datasets required for the benchmark.

If you want to reproduce the metrics reported in the SRN paper, you need to download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (extraction code: y3ry). The augmented data is obtained by applying rotation and perturbation to MJSynth and SynthText. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_release/training/.

PaddleOCR provides label files for training on the icdar2015 dataset, which can be downloaded as follows:

```

@ -63,35 +99,8 @@ wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_t
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```

The final training set should have the following file structure:

```
|-train_data
    |-ic15_data
        |- rec_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

Similar to the training set, the test set also needs a folder containing all images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:

```
|-train_data
    |-ic15_data
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```
<a name="Dictionary"></a>
- Dictionary
#### 1.3 Dictionary

Finally, you need to provide a dictionary ({word_dict_name}.txt) so that, during training, every character that appears can be mapped to a dictionary index.

@ -108,6 +117,8 @@ n

In `word_dict.txt`, each line contains a single character, and the file maps characters to numeric indexes; e.g. "and" will be mapped to [2 5 1]

PaddleOCR has built-in dictionaries, which can be used as needed.

`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.

`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.

@ -123,8 +134,6 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
`ppocr/utils/dict/en_dict.txt` is an English dictionary with 63 characters


You can use them as needed.

The multi-language models are still at the demo stage, and we will keep optimizing them and adding more languages. **You are very welcome to provide us with dictionaries and fonts for other languages**.
If you are willing, submit the dictionary file to [dict](../../ppocr/utils/dict) and we will acknowledge you in the repo.


@ -136,14 +145,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need a custom dictionary file, add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`.

<a name="Add_space_category"></a>
- Add space category
#### 1.4 Add space category

If you want to support recognition of the `space` category, set the `use_space_char` field in the yml file to `True`.

**Note: `use_space_char` only takes effect when `character_type=ch`**

<a name="TRAINING"></a>
### TRAINING
### 2 TRAINING

PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. This section uses the CRNN recognition model as an example:


@ -166,7 +175,7 @@ Start training:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
```
<a name="Data_Augmentation"></a>
- Data Augmentation
#### 2.1 Data Augmentation

PaddleOCR provides a variety of data augmentation methods. If you want to add perturbations during training, set `distort: true` in the configuration file.


@ -175,7 +184,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
During training, each perturbation method is applied with a probability of 50%. For the specific implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
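The per-method 50% rule can be sketched as one independent coin flip per perturbation. This is an illustrative re-implementation, not the code in `img_tools.py`. The two perturbations are placeholder functions acting on a plain list that stands in for an image, and `prob` defaults to the 50% stated above.

```python
import random
from typing import Callable, List, Optional, Sequence

Perturb = Callable[[List[int]], List[int]]


def apply_random_perturbations(sample: List[int],
                               ops: Sequence[Perturb],
                               prob: float = 0.5,
                               rng: Optional[random.Random] = None) -> List[int]:
    """Apply each perturbation independently with probability `prob`, in order."""
    if rng is None:
        rng = random.Random()
    for op in ops:
        if rng.random() < prob:  # one independent coin flip per method
            sample = op(sample)
    return sample


if __name__ == "__main__":
    # Placeholder perturbations on a 1-D list standing in for an image.
    def invert(px: List[int]) -> List[int]:
        return [255 - p for p in px]

    def reverse(px: List[int]) -> List[int]:
        return px[::-1]

    rng = random.Random(0)  # seeded so runs are reproducible
    for _ in range(3):
        print(apply_random_perturbations([0, 100, 200], [invert, reverse], rng=rng))
```

With `prob=0.5` and k perturbation methods, a sample passes through completely unchanged with probability 0.5^k. For two methods that is 1/4, so most samples receive at least one perturbation once several methods are enabled.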

<a name="Training"></a>
- Training
#### 2.2 Training

PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, evaluation runs every 500 iterations, and during evaluation the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy`.


@ -268,7 +277,7 @@ Eval:
**Note that the configuration file for prediction/evaluation must be consistent with the one used for training.**

<a name="Multi_language"></a>
- Multi-language
#### 2.3 Multi-language

PaddleOCR currently supports recognition of 26 languages besides Chinese. A multi-language configuration file template is
provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).


@ -420,7 +429,7 @@ Eval:
```

<a name="EVALUATION"></a>
### EVALUATION
### 3 EVALUATION

The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.


@ -430,10 +439,10 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
```

<a name="PREDICTION"></a>
### PREDICTION
### 4 PREDICTION

<a name="Training_engine_prediction"></a>
* Training engine prediction
#### 4.1 Training engine prediction

With a model trained by PaddleOCR, you can quickly run prediction using the following script.

@ -1,7 +1,7 @@
# paddleocr package

## Get started quickly
### Install package
## 1 Get started quickly
### 1.1 Install package
Install from PyPI
```bash
pip install "paddleocr>=2.0.1" # version 2.0.1+ is recommended

@ -12,9 +12,11 @@ build own whl package and install
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```
### 1. Use by code
## 2 Use
### 2.1 Use by code
The paddleocr whl package automatically downloads the lightweight ppocr models as the default models; they can be replaced with your own models as described in Section 3 **Custom Model**.

* Detection, classification and recognition
* Detection, angle classification and recognition
```python
from paddleocr import PaddleOCR, draw_ocr
# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.

@ -163,7 +165,7 @@ Output will be a list, each item contains classification result and confidence
['0', 0.99999964]
```

### Use by command line
### 2.2 Use by command line

Show help information
```bash

@ -239,11 +241,11 @@ Output will be a list, each item contains classification result and confidence
['0', 0.99999964]
```

## Use custom model
## 3 Use custom model
When the built-in models do not meet your needs, you can use your own trained models.
First, refer to the first section of [inference_en.md](./inference_en.md) to convert your detection, classification and recognition models to inference models, and then use them as follows:

### 1. Use by code
### 3.1 Use by code

```python
from paddleocr import PaddleOCR, draw_ocr

@ -265,17 +267,17 @@ im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

### Use by command line
### 3.2 Use by command line

```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
```
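When the custom-model command above is run from a script, for example over many images, building the arguments as a list avoids shell-quoting problems. The sketch below uses only the flags shown in that command. `build_paddleocr_cmd` and the directory values are placeholders, and flags whose value is `None` are simply omitted. The function only builds the argument list; pass it to `subprocess.run` to actually invoke `paddleocr`.

```python
from typing import List, Optional


def build_paddleocr_cmd(image_dir: str,
                        det_model_dir: Optional[str] = None,
                        rec_model_dir: Optional[str] = None,
                        rec_char_dict_path: Optional[str] = None,
                        cls_model_dir: Optional[str] = None,
                        use_angle_cls: bool = True) -> List[str]:
    """Assemble a `paddleocr` command line from the flags used above."""
    cmd = ["paddleocr", "--image_dir", image_dir]
    optional = [
        ("--det_model_dir", det_model_dir),
        ("--rec_model_dir", rec_model_dir),
        ("--rec_char_dict_path", rec_char_dict_path),
        ("--cls_model_dir", cls_model_dir),
    ]
    for flag, value in optional:
        if value is not None:  # omit flags that were not provided
            cmd += [flag, value]
    cmd += ["--use_angle_cls", "true" if use_angle_cls else "false"]
    return cmd


if __name__ == "__main__":
    cmd = build_paddleocr_cmd("PaddleOCR/doc/imgs/11.jpg",
                              det_model_dir="./inference/det",  # placeholder paths
                              rec_model_dir="./inference/rec")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```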

### Use web images or numpy arrays as input
## 4 Use web images or numpy arrays as input

1. Web image
### 4.1 Web image

Use by code
- Use by code
```python
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory

@ -294,12 +296,12 @@ im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
Use by command line
- Use by command line
```bash
paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
```

2. Numpy array
### 4.2 Numpy array
Numpy arrays are supported as input only when used from code

```python

@ -324,7 +326,7 @@ im_show.save('result.jpg')
```


## Parameter Description
## 5 Parameter Description

| Parameter | Description | Default value |
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|

Binary file not shown (new image size: 61 KiB).