fix recognition doc
parent 25d7eb87c2
commit a0f13f4340
@ -97,9 +97,23 @@ n
`word_dict.txt` contains one character per line and maps each character to a numeric index; for example, "and" is mapped to [2 5 1]
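
The character-to-index mapping described above can be sketched as follows. This is a minimal illustration, not PaddleOCR's own loader: the dictionary contents in `DICT_LINES` are hypothetical, and 0-based line numbering is an assumption chosen so that "and" yields [2 5 1]. The real training code may reserve extra indices.

```python
# Hypothetical dictionary contents, one character per line
# (an assumption for illustration, not the real word_dict.txt).
DICT_LINES = ["l", "d", "a", "e", "r", "n"]


def build_char_to_index(lines):
    """Map each character to the 0-based number of the line it first appears on."""
    char_to_index = {}
    for index, line in enumerate(lines):
        char = line.rstrip("\n")
        # Keep the first occurrence if a character is listed more than once.
        char_to_index.setdefault(char, index)
    return char_to_index


def encode(text, char_to_index):
    """Convert a label string into a list of dictionary indices."""
    return [char_to_index[char] for char in text]


char_to_index = build_char_to_index(DICT_LINES)
print(encode("and", char_to_index))  # [2, 5, 1]
```

A character that is missing from the dictionary raises `KeyError` in this sketch, which is one reason the dictionary has to cover every character in the training labels.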

`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters,

`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters,

`ppocr/utils/french_dict.txt` is a French dictionary with 118 characters

`ppocr/utils/japan_dict.txt` is a Japanese dictionary with 4399 characters

`ppocr/utils/korean_dict.txt` is a Korean dictionary with 3636 characters

`ppocr/utils/german_dict.txt` is a German dictionary with 131 characters

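You can use any of them as needed.

The current multi-language models are still at the demo stage; the models will continue to be optimized and more languages added. **You are very welcome to provide us with dictionaries and fonts for other languages.**
If you wish, you can submit dictionary files to [utils](../../ppocr/utils), and we will thank you in the Repo.
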
- Custom dictionary

To use a custom dict file, add a `character_dict_path` field to `configs/rec/rec_icdar15_train.yml` that points to your dictionary path.

@ -222,7 +236,39 @@ PaddleOCR also provides multi-language, under the `configs/rec/multi_languages` path
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |

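The multi-language models are trained the same way as the Chinese model. Each training dataset consists of 1 million (100w) synthetic samples. A small number of fonts and some test data can be downloaded from [Baidu Netdisk]().
The multi-language models are trained the same way as the Chinese model. Each training dataset consists of 1 million (100w) synthetic samples. A small number of fonts can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.

If you want to fine-tune starting from the existing models, modify the configuration file as described below:

Take `rec_french_lite_train` as an example: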
```
Global:
  ...
  # Add a custom dictionary; if you modify the dictionary, point the path to the new one
  character_dict_path: ./ppocr/utils/french_dict.txt
  # Add data augmentation during training
  distort: true
  # Recognize spaces
  use_space_char: true
  ...
  # Modify the reader type
  reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
  ...
...
```

You also need to modify the data reader file `rec_french_reader.yml`:

```
TrainReader:
  ...
  # Change to the directory where the training data is stored
  img_set_dir: ./train_data
  # Change to the name of the label file
  label_file_path: ./train_data/french_train.txt

...
```

### Evaluation

@ -92,9 +92,21 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a

`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.

`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.
`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters

`ppocr/utils/french_dict.txt` is a French dictionary with 118 characters

`ppocr/utils/japan_dict.txt` is a Japanese dictionary with 4399 characters

`ppocr/utils/korean_dict.txt` is a Korean dictionary with 3636 characters

`ppocr/utils/german_dict.txt` is a German dictionary with 131 characters

You can use any of them as needed.

The current multi-language models are still at the demo stage; the models will continue to be optimized and more languages added. **You are very welcome to provide us with dictionaries and fonts in other languages.**
If you like, you can submit the dictionary file to [utils](../../ppocr/utils) and we will thank you in the Repo.

You can use them if needed.

To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
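
One way to produce such a custom dict file is to collect every character that occurs in your training labels and write one per line. The sketch below assumes you already have the label strings in memory; `sample_labels`, `collect_characters`, `write_dict_file`, and the output file name are hypothetical names for illustration and are not part of PaddleOCR.

```python
import os
import tempfile


def collect_characters(labels):
    """Return the sorted set of characters that appear in the label strings."""
    return sorted(set("".join(labels)))


def write_dict_file(chars, path):
    """Write one character per line, the layout described for the dict files."""
    with open(path, "w", encoding="utf-8") as f:
        for char in chars:
            f.write(char + "\n")


# Hypothetical labels; in practice these would come from your own training data.
sample_labels = ["bonjour", "café", "été"]
chars = collect_characters(sample_labels)

# Hypothetical output location, used here only so the sketch runs anywhere.
dict_path = os.path.join(tempfile.gettempdir(), "my_custom_dict.txt")
write_dict_file(chars, dict_path)
print(len(chars), "characters written to", dict_path)
```

The resulting path is what `character_dict_path` would then point to.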
@ -215,7 +227,29 @@ PaddleOCR also provides multi-language. The configuration file in `configs/rec/m
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |

The multi-language models are trained the same way as the Chinese model. Each training dataset consists of 1 million (100w) synthetic samples. A small number of fonts and some test data can be downloaded from [Baidu Netdisk]().
The multi-language models are trained the same way as the Chinese model. Each training dataset consists of 1 million (100w) synthetic samples. A small number of fonts and some test data can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.

If you want to fine-tune starting from the existing models, modify the configuration file as described below:

Take `rec_french_lite_train` as an example:

```
Global:
  ...
  # Add a custom dictionary; if you modify the dictionary,
  # point the path to the new dictionary
  character_dict_path: ./ppocr/utils/french_dict.txt
  # Add data augmentation during training
  distort: true
  # Recognize spaces
  use_space_char: true
  ...
  # Modify the reader type
  reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
  ...
...
```

### EVALUATION
