diff --git a/doc/doc_ch/multi_languages.md b/doc/doc_ch/multi_languages.md index a8f7c2b7..eec09535 100644 --- a/doc/doc_ch/multi_languages.md +++ b/doc/doc_ch/multi_languages.md @@ -5,6 +5,25 @@ - 2021.4.9 支持**80种**语言的检测和识别 - 2021.4.9 支持**轻量高精度**英文模型检测识别 +PaddleOCR 旨在打造一套丰富、领先、且实用的OCR工具库,不仅提供了通用场景下的中英文模型,也提供了专门在英文场景下训练的模型, +和覆盖[80个语言](#语种缩写)的小语种模型。 + +其中英文模型支持,大小写字母和常见标点的检测识别,并优化了空格字符的识别: + +
+ +
+ +小语种模型覆盖了拉丁语系、阿拉伯语系、中文繁体、韩语、日语等等: + +
+ + +
+ + +本文档将简要介绍小语种模型的使用方法。 + - [1 安装](#安装) - [1.1 paddle 安装](#paddle安装) - [1.2 paddleocr package 安装](#paddleocr_package_安装) @@ -68,7 +87,11 @@ Paddleocr目前支持80个语种,可以通过修改--lang参数进行切换, paddleocr --image_dir doc/imgs/japan_2.jpg --lang=japan ``` -![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.0/doc/imgs/japan_2.jpg) + +
+ +
+ 结果是一个list,每个item包含了文本框,文字和识别置信度 ```text @@ -138,8 +161,10 @@ im_show.save('result.jpg') ``` 结果可视化: -![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.0/doc/imgs_results/korean.jpg) +
+ +
* 识别预测 @@ -152,7 +177,8 @@ for line in result: print(line) ``` -![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.0/doc/imgs_words/german/1.jpg) + +![](../imgs_words/german/1.jpg) 结果是一个tuple,只包含识别结果和识别置信度 @@ -187,7 +213,10 @@ im_show.save('result.jpg') ``` 结果可视化 : -![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.0/doc/imgs_results/whl/12_det.jpg) + +
+ +
ppocr 还支持方向分类, 更多使用方式请参考:[whl包使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_ch/whl.md)。 @@ -233,7 +262,7 @@ ppocr 支持使用自己的数据进行自定义训练或finetune, 其中识别 |卡纳达文|Kannada |kn| |泰米尔文|Tamil |ta| |南非荷兰文 |Afrikaans |af| -|阿塞拜疆文 |Azerbaijani |az| +|阿塞拜疆文 |Azerbaijani |az| |波斯尼亚文|Bosnian|bs| |捷克文|Czech|cs| |威尔士文 |Welsh |cy| diff --git a/doc/doc_en/multi_languages_en.md b/doc/doc_en/multi_languages_en.md index d1c4583f..3d786832 100644 --- a/doc/doc_en/multi_languages_en.md +++ b/doc/doc_en/multi_languages_en.md @@ -5,6 +5,26 @@ -2021.4.9 supports the detection and recognition of 80 languages -2021.4.9 supports **lightweight high-precision** English model detection and recognition +PaddleOCR aims to create a rich, leading, and practical OCR tool library. It provides not only +Chinese and English models for general scenarios, but also models trained specifically for +English scenarios and multilingual models covering [80 languages](#language_abbreviations). + +The English model supports the detection and recognition of uppercase and lowercase +letters and common punctuation, and its recognition of space characters has been optimized: + +
+ +
+ +The multilingual models cover Latin, Arabic, Traditional Chinese, Korean, Japanese, etc.: + +
+ + +
+ +This document will briefly introduce how to use the multilingual model. + -[1 Installation](#Install) -[1.1 paddle installation](#paddleinstallation) -[1.2 paddleocr package installation](#paddleocr_package_install) diff --git a/doc/imgs_results/multi_lang/en_1.jpg b/doc/imgs_results/multi_lang/en_1.jpg new file mode 100644 index 00000000..2dc84d3f Binary files /dev/null and b/doc/imgs_results/multi_lang/en_1.jpg differ diff --git a/doc/imgs_results/multi_lang/en_2.jpg b/doc/imgs_results/multi_lang/en_2.jpg new file mode 100644 index 00000000..455ec98e Binary files /dev/null and b/doc/imgs_results/multi_lang/en_2.jpg differ diff --git a/doc/imgs_results/multi_lang/en_3.jpg b/doc/imgs_results/multi_lang/en_3.jpg new file mode 100644 index 00000000..36eb063d Binary files /dev/null and b/doc/imgs_results/multi_lang/en_3.jpg differ diff --git a/doc/imgs_results/multi_lang/french_0.jpg b/doc/imgs_results/multi_lang/french_0.jpg new file mode 100644 index 00000000..3c2abe63 Binary files /dev/null and b/doc/imgs_results/multi_lang/french_0.jpg differ diff --git a/doc/imgs_results/multi_lang/japan_2.jpg b/doc/imgs_results/multi_lang/japan_2.jpg new file mode 100644 index 00000000..7038ba2e Binary files /dev/null and b/doc/imgs_results/multi_lang/japan_2.jpg differ diff --git a/ppocr/utils/en_dict.txt b/ppocr/utils/en_dict.txt new file mode 100644 index 00000000..7677d31b --- /dev/null +++ b/ppocr/utils/en_dict.txt @@ -0,0 +1,95 @@ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +: +; +< += +> +? +@ +A +B +C +D +E +F +G +H +I +J +K +L +M +N +O +P +Q +R +S +T +U +V +W +X +Y +Z +[ +\ +] +^ +_ +` +a +b +c +d +e +f +g +h +i +j +k +l +m +n +o +p +q +r +s +t +u +v +w +x +y +z +{ +| +} +~ +! +" +# +$ +% +& +' +( +) +* ++ +, +- +. +/ +
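Review note on `ppocr/utils/en_dict.txt`: the hunk header reports 95 added lines, while the entries visible above cover the 94 printable ASCII characters from `!` to `~`. The remaining line is presumably a single space, which would match the space-recognition change described in the docs, but the flattened diff does not let this be confirmed. The sketch below rebuilds the visible entries in the order shown and maps each character to its line index. It is only an illustration, not PaddleOCR's actual dictionary loader; `build_en_charset` and `load_char_dict` are hypothetical names.

```python
import string


def build_en_charset():
    """Return the 94 visible dictionary entries in the order shown in the diff.

    Hypothetical helper for illustration; not part of PaddleOCR.
    """
    ordered = (
        string.digits              # 0-9
        + ":;<=>?@"
        + string.ascii_uppercase   # A-Z
        + "[\\]^_`"
        + string.ascii_lowercase   # a-z
        + "{|}~"
        + "!\"#$%&'()*+,-./"
    )
    return list(ordered)


def load_char_dict(lines):
    """Map each character to its zero-based line index.

    Only the trailing newline is stripped, so an entry that consists of a
    single space survives. Calling strip() instead would turn it into "".
    """
    chars = [line.rstrip("\n") for line in lines]
    return {ch: idx for idx, ch in enumerate(chars)}


charset = build_en_charset()
char_to_index = load_char_dict(ch + "\n" for ch in charset)
```

If the file does end with a space entry, a loader that calls `strip()` on each line would silently drop it, so stripping only the newline is the safer choice for a dictionary like this one.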