From 1e00e429b4018346c99d810841b40e38881a5bf1 Mon Sep 17 00:00:00 2001
From: xxupiano <xuxin1999102@126.com>
Date: Tue, 30 Nov 2021 14:26:22 +0800
Subject: [PATCH] Update README

---
 README.md    | 195 +++++++++++++++++++++++++++++++++------------------
 README_CN.md | 118 ++++++++++++++++++++++---------
 2 files changed, 214 insertions(+), 99 deletions(-)
diff --git a/README.md b/README.md
index 797e8b1..11a79b8 100644
--- a/README.md
+++ b/README.md
@@ -18,22 +18,23 @@
 
 <br>
 
-<h2 align="center">
-    <p>A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population</p>
-</h2>
+<h1 align="center">
+    <p>A Deep Learning Based Knowledge Extraction Toolkit<br>for Knowledge Base Population</p>
+</h1>
+
 
 DeepKE is a knowledge extraction toolkit supporting **low-resource** and **document-level** scenarios. It provides three functions based on **PyTorch**, including **Named Entity Recognition**, **Relation Extraciton** and **Attribute Extraction**.
 
 <br>
 
-## Prediction
+# Prediction
 
 There is a demonstration of prediction.<br>
 <img src="pics/demo.gif" width="636" height="494" align=center>
 
 <br>
 
-## Model Framework
+# Model Framework
 
 <h3 align="center">
     <img src="pics/architectures.png">
@@ -42,23 +43,25 @@ There is a demonstration of prediction.<br>
     Figure 1: The framework of DeepKE
 </p>
 
-- DeepKE contains three modules for **named entity recognition**, **relation extraction** and **attribute extraction**, the three tasks respectively.
-- Each module has its own submodules. For example, there are **standard**, **document-level** and **few-shot** submodules in the attribute extraction modular.
-- Each submodule compose of three parts: a **collection of tools**, which can function as tokenizer, dataloader, preprocessor and the like, a **encoder** and a part for **training and prediction**
+- DeepKE provides a unified framework for its three knowledge extraction functions: **named entity recognition**, **relation extraction** and **attribute extraction**.
+- Each task can be carried out in different scenarios. For example, relation extraction is supported in **standard**, **low-resource (few-shot)** and **document-level** settings.
+- Each application scenario comprises three components: **Data** (Tokenizer, Preprocessor and Loader), **Model** (Module, Encoder and Forwarder) and **Core** (Training, Evaluation and Prediction).
 
 <br>
 
-## Quickstart
+# Quickstart
 
-*DeepKE* is supported `pip install deepke`. Take the fully supervised attribute extraction for example.
+*DeepKE* can be installed with `pip install deepke`. The steps below use fully supervised relation extraction as an example. <br>(Please star✨ and fork :memo: !!!)
 
-**Step1** Download basic codes `git clone https://github.com/zjunlp/DeepKE.git ` (Please star✨ and fork :memo:)
+**Step1** Download the basic code
 
-**Step2** Create a virtual environment using`Anaconda` and enter it.
-
- We also provide dockerfile source code, you can create your own image, which is located in the docker folder.
+```bash
+git clone https://github.com/zjunlp/DeepKE.git
+```
 
+**Step2** Create a virtual environment using `Anaconda` and enter it.<br>
 
+We also provide a Dockerfile in the `docker` folder, which you can use to build your own image.
 
 ```bash
 conda create -n deepke python=3.8
@@ -86,21 +89,29 @@ conda activate deepke
 cd DeepKE/example/re/standard
 ```
 
-**Step4** Training (Parameters for training can be changed in the `conf` folder)
+**Step4** Download the dataset
+
+```bash
+wget 120.27.214.45/Data/re/standard/data.tar.gz
+
+tar -xzvf data.tar.gz
+```
+
+**Step5** Training (Parameters for training can be changed in the `conf` folder)
 
 ```bash
 python run.py
 ```
 
-**Step5** Prediction (Parameters for prediction can be changed in the `conf` folder)
+**Step6** Prediction (Parameters for prediction can be changed in the `conf` folder)
 
 ```bash
 python predict.py
 ```
 
+<br>
 
-
-### Requirements
+# Requirements
 
 > python == 3.8
 
@@ -117,9 +128,11 @@ python predict.py
 - opt-einsum==3.3.0
 - ujson
 
-### Introduction of Three Functions
+<br>
 
-#### 1. Named Entity Recognition
+# Introduction of Three Functions
+
+## 1. Named Entity Recognition
 
 - Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, organizations, etc.
 
@@ -134,10 +147,18 @@ python predict.py
 - Read the detailed process in specific README
   - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ner/standard)**
 
-    **Step1** Enter  `DeepKE/example/ner/standard`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
+    **Step1** Enter  `DeepKE/example/ner/standard`.  Download the dataset.
 
-    **Step2** Training
+    ```bash
+    wget 120.27.214.45/Data/ner/standard/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
 
+    **Step2** Training<br>
+
+    The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+  
     ```bash
     python run.py
     ```
@@ -147,26 +168,34 @@ python predict.py
     ```bash
     python predict.py
     ```
-
+  
   - **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)**
 
-    **Step1** Enter  `DeepKE/example/ner/few-shot`. The directory where the model is loaded and saved and the configuration parameters can be cusomized in the `conf` folder.<br>
-
-    **Step2** Training with default `CoNLL-2003` dataset.
+    **Step1** Enter  `DeepKE/example/ner/few-shot`.  Download the dataset.
 
+    ```bash
+    wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
+  
+    **Step2** Training in the low-resource setting<br>
+  
+    The directories from which the model is loaded and to which it is saved, as well as the configuration parameters, can be customized in the `conf` folder.
+  
     ```bash
     python run.py +train=few_shot
     ```
-
-    Users can modify `load_path` in `conf/train/few_shot.yaml` with the use of existing loaded model.<br>
-
+    
+    Users can set `load_path` in `conf/train/few_shot.yaml` to load an existing model.<br>
+    
     **Step3** Add `- predict` to `conf/config.yaml`, modify `loda_path` as the model path and `write_path` as the path where the predicted results are saved in `conf/predict.yaml`, and then run `python predict.py`
-
+    
     ```bash
     python predict.py
     ```
 
-#### 2. Relation Extraction
+## 2. Relation Extraction
 
 - Relationship extraction is the task of extracting semantic relations between entities from a unstructured text.
 
@@ -180,12 +209,20 @@ python predict.py
 
 - Read the detailed process in specific README
 
-  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/standard)** 
+  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/re/standard)** 
 
-    **Step1** Enter the `DeepKE/example/re/standard` folder. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
+    **Step1** Enter the `DeepKE/example/re/standard` folder.  Download the dataset.
 
-    **Step2** Training
+    ```bash
+    wget 120.27.214.45/Data/re/standard/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
 
+    **Step2** Training<br>
+
+    The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+  
     ```bash
     python run.py
     ```
@@ -195,42 +232,58 @@ python predict.py
     ```bash
     python predict.py
     ```
+  
+  - **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/re/few-shot)**
 
-  - **[FEW-SHOT](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/few-shot)**
+    **Step1** Enter `DeepKE/example/re/few-shot`. Download the dataset.
 
-    **Step1** Enter `DeepKE/example/re/few-shot`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
+    ```bash
+    wget 120.27.214.45/Data/re/few_shot/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
 
-    **Step 2** Training. Start with the model trained last time: modify `train_from_saved_model` in `conf/train.yaml`as the path where the model trained last time was saved. And the path saving logs generated in training can be customized by `log_dir`. <br>
+    **Step 2** Training<br>
 
+    - The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    - To resume from the previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for logs generated during training can be customized with `log_dir`.
+  
     ```bash
     python run.py
     ```
-
+  
     **Step3** Prediction
-
+  
+    ```bash
+    python predict.py
+    ```
+  
+  - **[DOCUMENT](https://github.com/zjunlp/DeepKE/tree/main/example/re/document)**<br>
+  
+    **Step1** Enter `DeepKE/example/re/document`.  Download the dataset.
+  
+    ```bash
+    wget 120.27.214.45/Data/re/document/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
+    
+    **Step2** Training<br>
+  
+    - The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    - To resume from the previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for logs generated during training can be customized with `log_dir`.
+    
+    ```bash
+    python run.py
+    ```
+    
+    **Step3** Prediction
+    
     ```bash
     python predict.py
     ```
 
-  - **[DOCUMENT](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/document)**<br>
-
-    Download the model `train_distant.json` from [*Google Drive*](https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw) to `data/`.
-
-    **Step1** Enter `DeepKE/example/re/document`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
-
-    **Step2** Training. Start with the model trained last time: modify `train_from_saved_model` in `conf/train.yaml`as the path where the model trained last time was saved. And the path saving logs generated in training can be customized by `log_dir`. 
-
-    ```bash
-    python run.py
-    ```
-
-    **Step3** Prediction
-
-    ```bash
-    python predict.py
-    ```
-
-#### 3. Attribute Extraction
+## 3. Attribute Extraction
 
 - Attribute extraction is to extract attributes for entities in a unstructed text.
 
@@ -243,25 +296,33 @@ python predict.py
   |        2014年10月1日许鞍华执导的电影《黄金时代》上映         | 上映时间 | 黄金时代 |     19     | 2014年10月1日 |     0      |
 
 - Read the detailed process in specific README
-  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/ae/standard)**
+  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ae/standard)**
 
-    **Step1** Enter the `DeepKE/example/ae/standard` folder. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
+    **Step1** Enter the `DeepKE/example/ae/standard` folder. Download the dataset.
 
-    **Step2** Training
+    ```bash
+    wget 120.27.214.45/Data/ae/standard/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
 
+    **Step2** Training<br>
+
+    The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    
     ```bash
     python run.py
     ```
-
+    
     **Step3** Prediction
-
+    
     ```bash
     python predict.py
     ```
 
+<br>
 
-
-## Notebook Tutorial
+# Notebook Tutorial
 
 This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. Users can study *DeepKE* with them.
 
@@ -297,7 +358,7 @@ This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. User
 
 <br>
 
-## Tips
+# Tips
 
 1. Using nearest mirror, like [THU](https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/) in China, will speed up the installation of *Anaconda*.
 2. Using nearest mirror, like [aliyun](http://mirrors.aliyun.com/pypi/simple/) in China, will speed up `pip install XXX`.
@@ -307,7 +368,7 @@ This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. User
 
 <br>
 
-## Developers
+# Developers
 
 Zhejiang University: Ningyu Zhang, Liankuan Tao, Haiyang Yu, Xiang Chen, Xin Xu, Xi Tian, Lei Li, Zhoubo Li, Shumin Deng, Yunzhi Yao, Hongbin Ye, Xin Xie, Guozhou Zheng, Huajun Chen
 
diff --git a/README_CN.md b/README_CN.md
index bbfcede..3cbd705 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -47,9 +47,9 @@ DeepKE包括了三个模块，可以进行命名实体识别、关系抽取以
 
 DeepKE支持pip安装使用，以常规全监督设定关系抽取为例，经过以下五个步骤就可以实现一个常规关系抽取模型
 
-**Step 1** 下载代码 ```git clone https://github.com/zjunlp/DeepKE.git```（别忘记star和fork哈！！！）
+**Step 1**：下载代码 ```git clone https://github.com/zjunlp/DeepKE.git```（别忘记star和fork哈！！！）
 
-**Step 2** 使用anaconda创建虚拟环境，进入虚拟环境(提供Dockerfile源码可自行创建镜像，位于docker文件夹中)
+**Step 2**：使用anaconda创建虚拟环境，进入虚拟环境(提供Dockerfile源码可自行创建镜像，位于docker文件夹中)
 
 ```
 conda create -n deepke python=3.8
@@ -70,24 +70,26 @@ python setup.py install
 python setup.py develop
 ```
 
-**Step 3**  进入任务文件夹，以常规关系抽取为例
+**Step 3** ：进入任务文件夹，以常规关系抽取为例
 
 ```
 cd DeepKE/example/re/standard
 ```
 
-**Step 4**  模型训练，训练用到的参数可在conf文件夹内修改
+**Step 4** ：模型训练，训练用到的参数可在conf文件夹内修改
 
 ```
 python run.py
 ```
 
-**Step 5**  模型预测。预测用到的参数可在conf文件夹内修改
+**Step 5** ：模型预测。预测用到的参数可在conf文件夹内修改
 
 ```
 python predict.py
 ```
 
+<br>
+
 ### 环境依赖
 
 > python == 3.8
@@ -118,11 +120,19 @@ python predict.py
   | 秦始皇兵马俑位于陕西省西安市，1961年被国务院公布为第一批全国重点文物保护单位，是世界八大奇迹之一。 |           秦始皇           | 陕西省，西安市 |             国务院             |
 
 - 具体流程请进入详细的README中
-  - **[常规全监督STANDARD](https://github.com/zjunlp/deepke/blob/main/example/ner/standard)** 
+  - **[常规全监督STANDARD](https://github.com/zjunlp/DeepKE/tree/main/example/ner/standard)** 
   
-     **Step1**: 进入`DeepKE/example/ner/standard`，数据集和参数配置可以分别在`data`和`conf`文件夹中修改；<br>
+     **Step1**: 进入`DeepKE/example/ner/standard`，下载数据集
      
-     **Step2**: 模型训练
+     ```bash
+     wget 120.27.214.45/Data/ner/standard/data.tar.gz
+     
+     tar -xzvf data.tar.gz
+     ```
+     
+     **Step2**: 模型训练<br>
+     
+     数据集和参数配置可以分别在`data`和`conf`文件夹中修改
      
      ```
      python run.py
@@ -135,9 +145,17 @@ python predict.py
      
   - **[少样本FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)** 
   
-    **Step1**: 进入`DeepKE/example/ner/few-shot`，模型加载和保存位置以及参数配置可以在`conf`文件夹中修改；<br>
+    **Step1**: 进入`DeepKE/example/ner/few-shot`，下载数据集
     
-    **Step2**：模型训练，默认使用`CoNLL-2003`数据集进行训练
+    ```bash
+    wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
+    
+    **Step2**：低资源场景下训练模型<br>
+    
+    模型加载和保存位置以及参数配置可以在`conf`文件夹中修改
     
      ```
      python run.py +train=few_shot
@@ -162,45 +180,71 @@ python predict.py
   |     提起杭州的美景，西湖总是第一个映入脑海的词语。     | 所在城市 |    西湖    |      8      |    杭州    |      2      |
 
 - 具体流程请进入详细的README中，RE包括了以下三个子功能
-  - **[常规全监督STANDARD](https://github.com/zjunlp/deepke/blob/main/example/re/standard)**  
+  - **[常规全监督STANDARD](https://github.com/zjunlp/DeepKE/tree/main/example/re/standard)**  
 
-    **Step1**：进入`DeepKE/example/re/standard`，数据集和参数配置可以分别进入`data`和`conf`文件夹中修改；<br>
+    **Step1**：进入`DeepKE/example/re/standard`，下载数据集
+  
+    ```bash
+    wget 120.27.214.45/Data/re/standard/data.tar.gz
     
-    **Step2**：模型训练
+    tar -xzvf data.tar.gz
+    ```
+  
+    **Step2**：模型训练<br>
 
+    数据集和参数配置可以分别进入`data`和`conf`文件夹中修改
+  
     ```
     python run.py
     ```
-    
+  
     **Step3**：模型预测
-
+  
     ```
     python predict.py
     ```
   
-  - **[少样本FEW-SHOT](https://github.com/zjunlp/deepke/blob/main/example/re/few-shot)**
+  - **[少样本FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/re/few-shot)**
   
-    **Step1**：进入`DeepKE/example/re/few-shot`，数据集和参数配置可以分别进入`data`和`conf`文件夹中修改；<br>
-  
-    **Step2**：模型训练，如需从上次训练的模型开始训练：设置`conf/train.yaml`中的`train_from_saved_model`为上次保存模型的路径，每次训练的日志默认保存在根目录，可用`log_dir`来配置；<br>
+    **Step1**：进入`DeepKE/example/re/few-shot`，下载数据集
+
+    ```bash
+    wget 120.27.214.45/Data/re/few_shot/data.tar.gz
     
+    tar -xzvf data.tar.gz
+    ```
+  
+    **Step2**：模型训练<br>
+  
+    - 数据集和参数配置可以分别进入`data`和`conf`文件夹中修改
+  
+    - 如需从上次训练的模型开始训练：设置`conf/train.yaml`中的`train_from_saved_model`为上次保存模型的路径，每次训练的日志默认保存在根目录，可用`log_dir`来配置
+  
     ```
     python run.py
     ```
-    
+  
     **Step3**：模型预测
-
+  
     ```
     python predict.py
     ```
-
-  - **[文档级DOCUMENT](https://github.com/zjunlp/deepke/blob/main/example/re/document)** <br>
-    ```train_distant.json```由于文件太大，请自行从Google Drive上下载到data/目录下；<br>
+  
+  - **[文档级DOCUMENT](https://github.com/zjunlp/DeepKE/tree/main/example/re/document)** <br>
+    
+    **Step1**：进入`DeepKE/example/re/document`，下载数据集
+    
+    ```bash
+    wget 120.27.214.45/Data/re/document/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
+    
+    **Step2**：模型训练<br>
+    
+    - 数据集和参数配置可以分别进入`data`和`conf`文件夹中修改
+    - 如需从上次训练的模型开始训练：设置`conf/train.yaml`中的`train_from_saved_model`为上次保存模型的路径，每次训练的日志默认保存在根目录，可用`log_dir`来配置；
     
-    **Step1**：进入`DeepKE/example/re/document`，数据集和参数配置可以分别进入`data`和`conf`文件夹中修改；<br>
-  
-    **Step2**：模型训练，如需从上次训练的模型开始训练：设置`conf/train.yaml`中的`train_from_saved_model`为上次保存模型的路径，每次训练的日志默认保存在根目录，可用`log_dir`来配置；
-  
     ```
     python run.py
     ```
@@ -221,22 +265,32 @@ python predict.py
   |        2014年10月1日许鞍华执导的电影《黄金时代》上映         | 上映时间 | 黄金时代 |     19     | 2014年10月1日 |     0      |
 
 - 具体流程请进入详细的README中
-  - **[常规全监督STANDARD](https://github.com/zjunlp/deepke/blob/main/example/ae/standard)**  
+  - **[常规全监督STANDARD](https://github.com/zjunlp/DeepKE/tree/main/example/ae/standard)**  
     
-    **Step1**：进入`DeepKE/example/re/standard`，数据集和参数配置可以分别进入`data`和`conf`文件夹中修改；<br>
+    **Step1**：进入`DeepKE/example/ae/standard`，下载数据集
     
-    **Step2**：模型训练
+    ```bash
+    wget 120.27.214.45/Data/ae/standard/data.tar.gz
+    
+    tar -xzvf data.tar.gz
+    ```
+    
+    **Step2**：模型训练<br>
 
+    数据集和参数配置可以分别进入`data`和`conf`文件夹中修改
+    
     ```
     python run.py
     ```
     
     **Step3**：模型预测
-
+    
     ```
     python predict.py
     ```
 
+<br>
+
 ### Notebook教程
 
 本工具提供了若干Notebook和Google Colab教程，用户可针对性调试学习。