fix recognition doc

2020-09-21 10:52:58 +00:00 · 2020-09-21 10:52:58 +00:00 · a0f13f4340
parent 25d7eb87c2
commit a0f13f4340
2 changed files with 84 additions and 4 deletions
--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@ -97,9 +97,23 @@ n
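 `word_dict.txt` has one character per line, mapping each character to a numeric index; "and" will be mapped to [2 5 1].

 `ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.
+
 `ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.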
+
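+`ppocr/utils/french_dict.txt` is a French dictionary with 118 characters.
+
+`ppocr/utils/japan_dict.txt` is a Japanese dictionary with 4399 characters.
+
+`ppocr/utils/korean_dict.txt` is a Korean dictionary with 3636 characters.
+
+`ppocr/utils/german_dict.txt` is a German dictionary with 131 characters.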
+
+
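 You can use them as needed.

+The multi-language models are still at the demo stage; we will keep optimizing them and adding languages. **You are very welcome to provide us with dictionaries and fonts for other languages**.
+If you like, you can submit the dictionary file to [utils](../../ppocr/utils), and we will thank you in the repo.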
+
 - Custom dictionary

 To use a custom dict file, add a `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` that points to your dictionary path.
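
Before pointing `character_dict_path` at a custom dictionary, it can help to sanity-check the file. The following is a minimal sketch, not part of PaddleOCR: the helper names `read_dict_entries` and `find_dict_problems` are hypothetical, and the rules it checks (exactly one character per line, no repeated entries) are assumptions based on the one-character-per-line description above.

```python
from collections import Counter


def read_dict_entries(path):
    """Read a dictionary file and return its lines without line endings."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]


def find_dict_problems(entries):
    """Return human-readable problems found in a list of dictionary entries."""
    problems = []
    # Assumed rule: every line holds exactly one character.
    for lineno, entry in enumerate(entries, start=1):
        if len(entry) != 1:
            problems.append(f"line {lineno}: expected one character, got {entry!r}")
    # Assumed rule: an entry should not be listed more than once.
    for entry, count in Counter(entries).items():
        if count > 1:
            problems.append(f"{entry!r} is listed {count} times")
    return problems


# In-memory example; for a real file, pass read_dict_entries("path/to/your_dict.txt").
sample = ["a", "b", "ab", "b"]
for problem in find_dict_problems(sample):
    print(problem)
```

An empty result means the file satisfies both assumed rules; it says nothing about whether the dictionary covers your training labels.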
@ -222,7 +236,39 @@ PaddleOCR also provides multi-language; under the `configs/rec/multi_languages` path
 | rec_japan_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Japanese |
 | rec_korean_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Korean |

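-The multi-language models are trained in the same way as the Chinese model. Each training dataset consists of 1 million (100w) synthetic data samples; a small number of fonts and test data can be downloaded from [Baidu Netdisk]().
+The multi-language models are trained in the same way as the Chinese model. Each training dataset consists of 1 million (100w) synthetic data samples; a small number of fonts can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.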
+
+If you want to fine-tune starting from the existing model, modify the configuration file as described below:
+
+Take `rec_french_lite_train` as an example:
+```
+Global:
+  ...
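+  # Add a custom dictionary; if you change the dictionary, point the path to the new one
+  character_dict_path: ./ppocr/utils/french_dict.txt
+  # Add data augmentation during training
+  distort: true
+  # Recognize spaces
+  use_space_char: true
+  ...
+  # Modify the reader type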
+  reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
+  ...
+...
+```
+
+You also need to modify the data reader file `rec_french_reader.yml`:
+
+```
+TrainReader:
+  ...
+  # Change to the directory where the training data is stored
+  img_set_dir: ./train_data
+  # Change the label file name
+  label_file_path: ./train_data/french_train.txt
+
+...
+```
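
When the dictionary and the label file are changed together, one quick check is whether every character in the labels is covered by the dictionary. The sketch below is illustrative only and is not PaddleOCR code: `find_unknown_chars` is a hypothetical helper, the label format (image path, a tab, then the text) is an assumption, and the `use_space_char` parameter only mirrors the name of the config flag above.

```python
def find_unknown_chars(label_lines, dict_chars, use_space_char=True):
    """Return {character: count} for label characters missing from the dictionary."""
    known = set(dict_chars)
    if use_space_char:
        # Mirrors the config flag by name only: treat the space as recognizable.
        known.add(" ")
    unknown = {}
    for line in label_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Assumed label format: "<image path>\t<label text>"
        _, sep, label = line.partition("\t")
        if not sep:
            raise ValueError(f"no tab separator in label line: {line!r}")
        for char in label:
            if char not in known:
                unknown[char] = unknown.get(char, 0) + 1
    return unknown


# Hypothetical labels and a deliberately tiny dictionary without digits.
labels = ["img/0001.jpg\tça va", "img/0002.jpg\tété 2020"]
dict_chars = list("çavét")
print(find_unknown_chars(labels, dict_chars))
```

Any character reported here would have to be added to the dictionary or removed from the labels before training; which of the two is appropriate depends on your data.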

 ### Evaluation

--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@ -92,9 +92,21 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a

 `ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.

-`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.
+`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.
+
+`ppocr/utils/french_dict.txt` is a French dictionary with 118 characters.
+
+`ppocr/utils/japan_dict.txt` is a Japanese dictionary with 4399 characters.
+
+`ppocr/utils/korean_dict.txt` is a Korean dictionary with 3636 characters.
+
+`ppocr/utils/german_dict.txt` is a German dictionary with 131 characters.
+
+You can use them as needed.
+
+The multi-language models are still at the demo stage; we will continue to optimize them and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**.
+If you like, you can submit the dictionary file to [utils](../../ppocr/utils), and we will thank you in the repo.

-You can use them if needed.

 To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
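
To make the character-to-index mapping concrete, here is a minimal sketch of how a one-character-per-line dictionary could be turned into indices. It is an illustration under stated assumptions, not PaddleOCR's implementation: `build_char_index` and `encode` are hypothetical names, indices are assumed to be 0-based line positions, and the real code may shift indices (for example to reserve a CTC blank).

```python
def build_char_index(dict_chars):
    """Map each character to its 0-based line position; the first occurrence wins."""
    char_to_index = {}
    for index, char in enumerate(dict_chars):
        char_to_index.setdefault(char, index)
    return char_to_index


def encode(text, char_to_index):
    """Convert text to a list of indices, failing loudly on unknown characters."""
    missing = sorted({c for c in text if c not in char_to_index})
    if missing:
        raise KeyError(f"characters not in dictionary: {missing}")
    return [char_to_index[c] for c in text]


# Hypothetical six-line dictionary, chosen so that "and" encodes to [2, 5, 1].
dict_chars = ["l", "d", "a", "e", "r", "n"]
char_to_index = build_char_index(dict_chars)
print(encode("and", char_to_index))  # prints [2, 5, 1]
```

Raising on unknown characters is a design choice for this sketch; it surfaces a mismatch between labels and dictionary early instead of silently dropping characters.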

@ -215,7 +227,29 @@ PaddleOCR also provides multi-language. The configuration file in `configs/rec/m
 | rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
 | rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |

-The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded on [Baidu Netdisk]().
+The multi-language models are trained in the same way as the Chinese model. Each training dataset consists of 1 million (100w) synthetic data samples. A small number of fonts and test data can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.
+
+If you want to fine-tune starting from the existing model, modify the configuration file as described below:
+
+Take `rec_french_lite_train` as an example:
+
+```
+Global:
+  ...
+  # Add a custom dictionary, if you modify the dictionary
+  # please point the path to the new dictionary
+  character_dict_path: ./ppocr/utils/french_dict.txt
+  # Add data augmentation during training
+  distort: true
+  # Recognize spaces
+  use_space_char: true
+  ...
+  # Modify reader type
+  reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
+  ...
+...
+```
+

 ### EVALUATION