Merge pull request #1459 from weisy11/dygraph

update style text doc, add corpus file descriptions
2020-12-16 14:34:01 +08:00 · 2020-12-16 14:34:01 +08:00 · 631fe2ecca
parent f38a22c0b3 f41851025b
commit 631fe2ecca
2 changed files with 19 additions and 2 deletions
--- a/StyleText/README.md
+++ b/StyleText/README.md
@@ -116,9 +116,17 @@ In actual application scenarios, it is often necessary to synthesize pictures in
   * `CorpusGenerator`：
     * `method`：Method of CorpusGenerator; supports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is used, no other configuration is needed; otherwise you need to set `corpus_file` and `language`.
     * `language`：Language of the corpus.
-     * `corpus_file`: Filepath of the corpus.
+     * `corpus_file`: Filepath of the corpus. The corpus file should be a plain text file; it is split on line endings ('\n'), and the corpus generator samples one line each time.


+Example of a corpus file:
+```
+PaddleOCR
+飞桨文字识别
+StyleText
+风格文本图像数据合成
+```
+
 We provide a general dataset of Chinese, English and Korean images (50,000 in all) for you to try ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)). Some examples are given below:

 <div align="center">
--- a/StyleText/README_ch.md
+++ b/StyleText/README_ch.md
@@ -102,7 +102,16 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_
   * `CorpusGenerator`：
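     * `method`: Corpus generation method; `FileCorpus` and `EnNumCorpus` are currently available. If `EnNumCorpus` is used, no other configuration is needed; otherwise `corpus_file` and `language` must be set.
     * `language`: Language of the corpus.
-     * `corpus_file`: Path to the corpus file.
+     * `corpus_file`: Path to the corpus file. The corpus file should be a text file. The corpus generator first splits the corpus by line, then randomly selects one line each time.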
+
+   Example of the corpus file format:
+   ```
+   PaddleOCR
+   飞桨文字识别
+   StyleText
+   风格文本图像数据合成
+   ...
+   ```

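  Style-Text also provides a general-scene dataset of 50,000 Chinese, English and Korean images for use as text style images, making it easier to synthesize text images with diverse scenes. Some examples are shown below.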
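
The `corpus_file` behavior documented in this change (split the file by line, then pick one line per sample) can be sketched in Python roughly as follows. This is a minimal illustration, not StyleText's actual implementation: `load_corpus` and `sample_line` are hypothetical names, and both the UTF-8 encoding and the skipping of blank lines are assumptions.

```python
import random
from pathlib import Path


def load_corpus(corpus_file):
    """Read a text corpus and return its lines (hypothetical helper)."""
    text = Path(corpus_file).read_text(encoding="utf-8")  # encoding is an assumption
    # Split on line endings; dropping blank lines is an assumption of this sketch.
    return [line.strip() for line in text.splitlines() if line.strip()]


def sample_line(lines, rng=random):
    """Pick one corpus line uniformly at random (hypothetical helper)."""
    return rng.choice(lines)
```

With a corpus file like the example above, `load_corpus` would return one string per line, and each call to `sample_line` would return one of them. Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.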