From 1e00e429b4018346c99d810841b40e38881a5bf1 Mon Sep 17 00:00:00 2001
From: xxupiano
Date: Tue, 30 Nov 2021 14:26:22 +0800
Subject: [PATCH] Update README
---
README.md | 195 +++++++++++++++++++++++++++++++++------------------
README_CN.md | 118 ++++++++++++++++++++++---------
2 files changed, 214 insertions(+), 99 deletions(-)
diff --git a/README.md b/README.md
index 797e8b1..11a79b8 100644
--- a/README.md
+++ b/README.md
@@ -18,22 +18,23 @@
-
-
A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population
-
+
+
A Deep Learning Based Knowledge Extraction Toolkit
for Knowledge Base Population
+
+
DeepKE is a knowledge extraction toolkit supporting **low-resource** and **document-level** scenarios. It provides three functions based on **PyTorch**, including **Named Entity Recognition**, **Relation Extraciton** and **Attribute Extraction**.
-## Prediction
+# Prediction
There is a demonstration of prediction.
-## Model Framework
+# Model Framework
@@ -42,23 +43,25 @@ There is a demonstration of prediction.
Figure 1: The framework of DeepKE
-- DeepKE contains three modules for **named entity recognition**, **relation extraction** and **attribute extraction**, the three tasks respectively.
-- Each module has its own submodules. For example, there are **standard**, **document-level** and **few-shot** submodules in the attribute extraction modular.
-- Each submodule compose of three parts: a **collection of tools**, which can function as tokenizer, dataloader, preprocessor and the like, a **encoder** and a part for **training and prediction**
+- DeepKE contains a unified framework for **named entity recognition**, **relation extraction** and **attribute extraction**, the three knowledge extraction functions.
+- Each task can be implemented in different scenarios. For example, we can achieve relation extraction in **standard**, **low-resource (few-shot)** and **document-level** settings.
+- Each application scenario comprises three components: **Data** (Tokenizer, Preprocessor and Loader), **Model** (Module, Encoder and Forwarder) and **Core** (Training, Evaluation and Prediction).
-## Quickstart
+# Quickstart
-*DeepKE* is supported `pip install deepke`. Take the fully supervised attribute extraction for example.
+*DeepKE* can be installed with `pip install deepke`. The steps below use fully supervised relation extraction as an example.
(Please star✨ and fork :memo: !!!)
-**Step1** Download basic codes `git clone https://github.com/zjunlp/DeepKE.git ` (Please star✨ and fork :memo:)
+**Step1** Download the basic code
-**Step2** Create a virtual environment using`Anaconda` and enter it.
-
- We also provide dockerfile source code, you can create your own image, which is located in the docker folder.
+```bash
+git clone https://github.com/zjunlp/DeepKE.git
+```
+**Step2** Create a virtual environment using `Anaconda` and enter it.
+We also provide the Dockerfile source code in the `docker` folder, so you can build your own image.
```bash
conda create -n deepke python=3.8
@@ -86,21 +89,29 @@ conda activate deepke
cd DeepKE/example/re/standard
```
-**Step4** Training (Parameters for training can be changed in the `conf` folder)
+**Step4** Download the dataset
+
+```bash
+wget 120.27.214.45/Data/re/standard/data.tar.gz
+
+tar -xzvf data.tar.gz
+```
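A minimal sketch of the download step, assuming a POSIX shell with `wget` and `tar` available. The helper name `fetch_dataset` is hypothetical and not part of DeepKE; it only skips the download when the archive is already present, so the step can be re-run safely.

```bash
# Hypothetical helper (not part of DeepKE): download an archive unless it is
# already present, then unpack it into the current directory.
fetch_dataset() {
    url="$1"
    archive="${2:-data.tar.gz}"
    if [ ! -f "$archive" ]; then
        # -O names the output file; remove a partial file if the download fails.
        wget -O "$archive" "$url" || { rm -f "$archive"; return 1; }
    fi
    tar -xzf "$archive"
}

# Example (same URL as in Step4):
# fetch_dataset 120.27.214.45/Data/re/standard/data.tar.gz
```

Because the function returns a non-zero status when the download fails, it can be chained with `&&` in front of the training command.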
+
+**Step5** Training (Parameters for training can be changed in the `conf` folder)
```bash
python run.py
```
-**Step5** Prediction (Parameters for prediction can be changed in the `conf` folder)
+**Step6** Prediction (Parameters for prediction can be changed in the `conf` folder)
```bash
python predict.py
```
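Training and prediction can also be chained so that prediction only runs after training succeeds. The wrapper below is a sketch under that assumption; `run_steps` is a hypothetical name, not a DeepKE command.

```bash
# Hypothetical wrapper (not part of DeepKE): run each command in order and
# stop at the first one that fails.
run_steps() {
    for step in "$@"; do
        echo ">> $step"
        sh -c "$step" || { echo "failed: $step" >&2; return 1; }
    done
}

# Example, from inside DeepKE/example/re/standard:
# run_steps "python run.py" "python predict.py"
```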
+
-
-### Requirements
+# Requirements
> python == 3.8
@@ -117,9 +128,11 @@ python predict.py
- opt-einsum==3.3.0
- ujson
-### Introduction of Three Functions
+
-#### 1. Named Entity Recognition
+# Introduction of Three Functions
+
+## 1. Named Entity Recognition
- Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, organizations, etc.
@@ -134,10 +147,18 @@ python predict.py
- Read the detailed process in specific README
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ner/standard)**
- **Step1** Enter `DeepKE/example/ner/standard`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+ **Step1** Enter `DeepKE/example/ner/standard`. Download the dataset.
- **Step2** Training
+ ```bash
+ wget 120.27.214.45/Data/ner/standard/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+ **Step2** Training
+
+ The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+
```bash
python run.py
```
@@ -147,26 +168,34 @@ python predict.py
```bash
python predict.py
```
-
+
- **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)**
- **Step1** Enter `DeepKE/example/ner/few-shot`. The directory where the model is loaded and saved and the configuration parameters can be cusomized in the `conf` folder.
-
- **Step2** Training with default `CoNLL-2003` dataset.
+ **Step1** Enter `DeepKE/example/ner/few-shot`. Download the dataset.
+ ```bash
+ wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+
+ **Step2** Training in the low-resource setting
+
+ The directory where the model is loaded and saved and the configuration parameters can be customized in the `conf` folder.
+
```bash
python run.py +train=few_shot
```
-
- Users can modify `load_path` in `conf/train/few_shot.yaml` with the use of existing loaded model.
-
+
+ Users can set `load_path` in `conf/train/few_shot.yaml` to start from an existing model.
+
**Step3** Add `- predict` to `conf/config.yaml`, modify `loda_path` as the model path and `write_path` as the path where the predicted results are saved in `conf/predict.yaml`, and then run `python predict.py`
-
+
```bash
python predict.py
```
-#### 2. Relation Extraction
+## 2. Relation Extraction
- Relationship extraction is the task of extracting semantic relations between entities from a unstructured text.
@@ -180,12 +209,20 @@ python predict.py
- Read the detailed process in specific README
- - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/standard)**
+ - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/re/standard)**
- **Step1** Enter the `DeepKE/example/re/standard` folder. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+ **Step1** Enter the `DeepKE/example/re/standard` folder. Download the dataset.
- **Step2** Training
+ ```bash
+ wget 120.27.214.45/Data/re/standard/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+ **Step2** Training
+
+ The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+
```bash
python run.py
```
@@ -195,42 +232,58 @@ python predict.py
```bash
python predict.py
```
+
+ - **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/re/few-shot)**
- - **[FEW-SHOT](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/few-shot)**
+ **Step1** Enter `DeepKE/example/re/few-shot`. Download the dataset.
- **Step1** Enter `DeepKE/example/re/few-shot`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+ ```bash
+ wget 120.27.214.45/Data/re/few_shot/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
- **Step 2** Training. Start with the model trained last time: modify `train_from_saved_model` in `conf/train.yaml`as the path where the model trained last time was saved. And the path saving logs generated in training can be customized by `log_dir`.
+ **Step2** Training
+ - The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+ - To resume from the previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for the logs generated during training can be customized with `log_dir`.
+
```bash
python run.py
```
-
+
**Step3** Prediction
-
+
+ ```bash
+ python predict.py
+ ```
+
+ - **[DOCUMENT](https://github.com/zjunlp/DeepKE/tree/main/example/re/document)**
+
+ **Step1** Enter `DeepKE/example/re/document`. Download the dataset.
+
+ ```bash
+ wget 120.27.214.45/Data/re/document/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+
+ **Step2** Training
+
+ - The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+ - To resume from the previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for the logs generated during training can be customized with `log_dir`.
+
+ ```bash
+ python run.py
+ ```
+
+ **Step3** Prediction
+
```bash
python predict.py
```
- - **[DOCUMENT](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/document)**
-
- Download the model `train_distant.json` from [*Google Drive*](https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw) to `data/`.
-
- **Step1** Enter `DeepKE/example/re/document`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
-
- **Step2** Training. Start with the model trained last time: modify `train_from_saved_model` in `conf/train.yaml`as the path where the model trained last time was saved. And the path saving logs generated in training can be customized by `log_dir`.
-
- ```bash
- python run.py
- ```
-
- **Step3** Prediction
-
- ```bash
- python predict.py
- ```
-
-#### 3. Attribute Extraction
+## 3. Attribute Extraction
- Attribute extraction is to extract attributes for entities in a unstructed text.
@@ -243,25 +296,33 @@ python predict.py
| 2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0 |
- Read the detailed process in specific README
- - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/ae/standard)**
+ - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ae/standard)**
- **Step1** Enter the `DeepKE/example/ae/standard` folder. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+ **Step1** Enter the `DeepKE/example/ae/standard` folder. Download the dataset.
- **Step2** Training
+ ```bash
+ wget 120.27.214.45/Data/ae/standard/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+ **Step2** Training
+
+ The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+
```bash
python run.py
```
-
+
**Step3** Prediction
-
+
```bash
python predict.py
```
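The download commands in the sections above share one URL layout, `120.27.214.45/Data/<task>/<setting>/data.tar.gz`. The sketch below, with the hypothetical helper name `dataset_url`, builds that URL and accepts only the six task and setting pairs that appear in this README. Note that the data path uses `few_shot` while the example folder is named `few-shot`.

```bash
# Hypothetical helper (not part of DeepKE): print the dataset URL for a task
# and setting, limited to the combinations listed in this README.
dataset_url() {
    case "$1/$2" in
        ner/standard|ner/few_shot|re/standard|re/few_shot|re/document|ae/standard)
            echo "120.27.214.45/Data/$1/$2/data.tar.gz" ;;
        *)
            echo "unknown task/setting: $1/$2" >&2
            return 1 ;;
    esac
}

# Example:
# wget "$(dataset_url ner few_shot)" && tar -xzvf data.tar.gz
```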
+
-
-## Notebook Tutorial
+# Notebook Tutorial
This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. Users can study *DeepKE* with them.
@@ -297,7 +358,7 @@ This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. User
-## Tips
+# Tips
1. Using nearest mirror, like [THU](https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/) in China, will speed up the installation of *Anaconda*.
2. Using nearest mirror, like [aliyun](http://mirrors.aliyun.com/pypi/simple/) in China, will speed up `pip install XXX`.
@@ -307,7 +368,7 @@ This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. User
-## Developers
+# Developers
Zhejiang University: Ningyu Zhang, Liankuan Tao, Haiyang Yu, Xiang Chen, Xin Xu, Xi Tian, Lei Li, Zhoubo Li, Shumin Deng, Yunzhi Yao, Hongbin Ye, Xin Xie, Guozhou Zheng, Huajun Chen
diff --git a/README_CN.md b/README_CN.md
index bbfcede..3cbd705 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -47,9 +47,9 @@ DeepKE包括了三个模块,可以进行命名实体识别、关系抽取以
DeepKE支持pip安装使用,以常规全监督设定关系抽取为例,经过以下五个步骤就可以实现一个常规关系抽取模型
-**Step 1** 下载代码 ```git clone https://github.com/zjunlp/DeepKE.git```(别忘记star和fork哈!!!)
+**Step 1**:下载代码 ```git clone https://github.com/zjunlp/DeepKE.git```(别忘记star和fork哈!!!)
-**Step 2** 使用anaconda创建虚拟环境,进入虚拟环境(提供Dockerfile源码可自行创建镜像,位于docker文件夹中)
+**Step 2**:使用anaconda创建虚拟环境,进入虚拟环境(提供Dockerfile源码可自行创建镜像,位于docker文件夹中)
```
conda create -n deepke python=3.8
@@ -70,24 +70,26 @@ python setup.py install
python setup.py develop
```
-**Step 3** 进入任务文件夹,以常规关系抽取为例
+**Step 3** :进入任务文件夹,以常规关系抽取为例
```
cd DeepKE/example/re/standard
```
-**Step 4** 模型训练,训练用到的参数可在conf文件夹内修改
+**Step 4** :模型训练,训练用到的参数可在conf文件夹内修改
```
python run.py
```
-**Step 5** 模型预测。预测用到的参数可在conf文件夹内修改
+**Step 5** :模型预测。预测用到的参数可在conf文件夹内修改
```
python predict.py
```
+
+
### 环境依赖
> python == 3.8
@@ -118,11 +120,19 @@ python predict.py
| 秦始皇兵马俑位于陕西省西安市,1961年被国务院公布为第一批全国重点文物保护单位,是世界八大奇迹之一。 | 秦始皇 | 陕西省,西安市 | 国务院 |
- 具体流程请进入详细的README中
- - **[常规全监督STANDARD](https://github.com/zjunlp/deepke/blob/main/example/ner/standard)**
+ - **[常规全监督STANDARD](https://github.com/zjunlp/DeepKE/tree/main/example/ner/standard)**
- **Step1**: 进入`DeepKE/example/ner/standard`,数据集和参数配置可以分别在`data`和`conf`文件夹中修改;
+ **Step1**: 进入`DeepKE/example/ner/standard`,下载数据集
- **Step2**: 模型训练
+ ```bash
+ wget 120.27.214.45/Data/ner/standard/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+
+ **Step2**: 模型训练
+
+ 数据集和参数配置可以分别在`data`和`conf`文件夹中修改
```
python run.py
@@ -135,9 +145,17 @@ python predict.py
- **[少样本FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)**
- **Step1**: 进入`DeepKE/example/ner/few-shot`,模型加载和保存位置以及参数配置可以在`conf`文件夹中修改;
+ **Step1**: 进入`DeepKE/example/ner/few-shot`,下载数据集
- **Step2**:模型训练,默认使用`CoNLL-2003`数据集进行训练
+ ```bash
+ wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+
+ **Step2**:低资源场景下训练模型
+
+ 模型加载和保存位置以及参数配置可以在`conf`文件夹中修改
```
python run.py +train=few_shot
@@ -162,45 +180,71 @@ python predict.py
| 提起杭州的美景,西湖总是第一个映入脑海的词语。 | 所在城市 | 西湖 | 8 | 杭州 | 2 |
- 具体流程请进入详细的README中,RE包括了以下三个子功能
- - **[常规全监督STANDARD](https://github.com/zjunlp/deepke/blob/main/example/re/standard)**
+ - **[常规全监督STANDARD](https://github.com/zjunlp/DeepKE/tree/main/example/re/standard)**
- **Step1**:进入`DeepKE/example/re/standard`,数据集和参数配置可以分别进入`data`和`conf`文件夹中修改;
+ **Step1**:进入`DeepKE/example/re/standard`,下载数据集
+
+ ```bash
+ wget 120.27.214.45/Data/re/standard/data.tar.gz
- **Step2**:模型训练
+ tar -xzvf data.tar.gz
+ ```
+
+ **Step2**:模型训练
+ 数据集和参数配置可以分别进入`data`和`conf`文件夹中修改
+
```
python run.py
```
-
+
**Step3**:模型预测
-
+
```
python predict.py
```
- - **[少样本FEW-SHOT](https://github.com/zjunlp/deepke/blob/main/example/re/few-shot)**
+ - **[少样本FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/re/few-shot)**
- **Step1**:进入`DeepKE/example/re/few-shot`,数据集和参数配置可以分别进入`data`和`conf`文件夹中修改;
-
- **Step2**:模型训练,如需从上次训练的模型开始训练:设置`conf/train.yaml`中的`train_from_saved_model`为上次保存模型的路径,每次训练的日志默认保存在根目录,可用`log_dir`来配置;
+ **Step1**:进入`DeepKE/example/re/few-shot`,下载数据集
+
+ ```bash
+ wget 120.27.214.45/Data/re/few_shot/data.tar.gz
+ tar -xzvf data.tar.gz
+ ```
+
+ **Step2**:模型训练
+
+ - 数据集和参数配置可以分别进入`data`和`conf`文件夹中修改
+
+ - 如需从上次训练的模型开始训练:设置`conf/train.yaml`中的`train_from_saved_model`为上次保存模型的路径,每次训练的日志默认保存在根目录,可用`log_dir`来配置
+
```
python run.py
```
-
+
**Step3**:模型预测
-
+
```
python predict.py
```
-
- - **[文档级DOCUMENT](https://github.com/zjunlp/deepke/blob/main/example/re/document)**
- ```train_distant.json```由于文件太大,请自行从Google Drive上下载到data/目录下;
+
+ - **[文档级DOCUMENT](https://github.com/zjunlp/DeepKE/tree/main/example/re/document)**
+
+ **Step1**:进入`DeepKE/example/re/document`,下载数据集
+
+ ```bash
+ wget 120.27.214.45/Data/re/document/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+
+ **Step2**:模型训练
+
+ - 数据集和参数配置可以分别进入`data`和`conf`文件夹中修改
+ - 如需从上次训练的模型开始训练:设置`conf/train.yaml`中的`train_from_saved_model`为上次保存模型的路径,每次训练的日志默认保存在根目录,可用`log_dir`来配置;
- **Step1**:进入`DeepKE/example/re/document`,数据集和参数配置可以分别进入`data`和`conf`文件夹中修改;
-
- **Step2**:模型训练,如需从上次训练的模型开始训练:设置`conf/train.yaml`中的`train_from_saved_model`为上次保存模型的路径,每次训练的日志默认保存在根目录,可用`log_dir`来配置;
-
```
python run.py
```
@@ -221,22 +265,32 @@ python predict.py
| 2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0 |
- 具体流程请进入详细的README中
- - **[常规全监督STANDARD](https://github.com/zjunlp/deepke/blob/main/example/ae/standard)**
+ - **[常规全监督STANDARD](https://github.com/zjunlp/DeepKE/tree/main/example/ae/standard)**
- **Step1**:进入`DeepKE/example/re/standard`,数据集和参数配置可以分别进入`data`和`conf`文件夹中修改;
+ **Step1**:进入`DeepKE/example/ae/standard`,下载数据集
- **Step2**:模型训练
+ ```bash
+ wget 120.27.214.45/Data/ae/standard/data.tar.gz
+
+ tar -xzvf data.tar.gz
+ ```
+
+ **Step2**:模型训练
+ 数据集和参数配置可以分别进入`data`和`conf`文件夹中修改
+
```
python run.py
```
**Step3**:模型预测
-
+
```
python predict.py
```
+
+
### Notebook教程
本工具提供了若干Notebook和Google Colab教程,用户可针对性调试学习。