Update README

This commit is contained in:
xxupiano 2021-11-30 14:26:22 +08:00
parent c58b226929
commit 1e00e429b4
2 changed files with 214 additions and 99 deletions

README.md

@@ -18,22 +18,23 @@
<br>
<h2 align="center">
<p>A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population</p>
</h2>
<h1 align="center">
<p>A Deep Learning Based Knowledge Extraction Toolkit<br>for Knowledge Base Population</p>
</h1>
DeepKE is a knowledge extraction toolkit supporting **low-resource** and **document-level** scenarios. It provides three functions based on **PyTorch**, including **Named Entity Recognition**, **Relation Extraction** and **Attribute Extraction**.
<br>
## Prediction
# Prediction
There is a demonstration of prediction.<br>
<img src="pics/demo.gif" width="636" height="494" align=center>
<br>
## Model Framework
# Model Framework
<h3 align="center">
<img src="pics/architectures.png">
@@ -42,23 +43,25 @@ There is a demonstration of prediction.<br>
Figure 1: The framework of DeepKE
</p>
- DeepKE contains three modules, one for each of **named entity recognition**, **relation extraction** and **attribute extraction**.
- Each module has its own submodules. For example, there are **standard**, **document-level** and **few-shot** submodules in the attribute extraction module.
- Each submodule is composed of three parts: a **collection of tools** (tokenizer, dataloader, preprocessor and the like), an **encoder**, and a part for **training and prediction**.
- DeepKE contains a unified framework for **named entity recognition**, **relation extraction** and **attribute extraction**, the three knowledge extraction functions.
- Each task can be implemented in different scenarios. For example, we can achieve relation extraction in **standard**, **low-resource (few-shot)** and **document-level** settings.
- Each application scenario comprises three components: **Data**, including Tokenizer, Preprocessor and Loader; **Model**, including Module, Encoder and Forwarder; and **Core**, including Training, Evaluation and Prediction.
<br>
## Quickstart
# Quickstart
*DeepKE* can be installed with `pip install deepke`. Take fully supervised attribute extraction as an example.
*DeepKE* can be installed with `pip install deepke`. Take fully supervised relation extraction as an example. <br>(Please star✨ and fork :memo: !!!)
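For instance, installing the released package from PyPI might look like the following minimal sketch (the full environment setup is covered step by step below):
```bash
# Install DeepKE from PyPI; see the Requirements section for the expected Python version.
pip install deepke
```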
**Step1** Download the basic code with `git clone https://github.com/zjunlp/DeepKE.git` (Please star✨ and fork :memo:)
**Step1** Download the basic code
**Step2** Create a virtual environment using `Anaconda` and enter it.
We also provide a Dockerfile, located in the docker folder, so you can build your own image.
```bash
git clone https://github.com/zjunlp/DeepKE.git
```
**Step2** Create a virtual environment using `Anaconda` and enter it.<br>
We also provide a Dockerfile, located in the `docker` folder, so you can build your own image.
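If you prefer the Docker route, a minimal sketch of building the image is shown below; it assumes the Dockerfile sits directly under the `docker` folder, and the image tag `deepke` is only an example.
```bash
# Build a local image from the repository root (run this after the git clone in Step1).
docker build -t deepke -f docker/Dockerfile .
```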
```bash
conda create -n deepke python=3.8
@@ -86,21 +89,29 @@ conda activate deepke
cd DeepKE/example/re/standard
```
**Step4** Training (Parameters for training can be changed in the `conf` folder)
**Step4** Download the dataset
```bash
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
```
**Step5** Training (Parameters for training can be changed in the `conf` folder)
```bash
python run.py
```
**Step5** Prediction (Parameters for prediction can be changed in the `conf` folder)
**Step6** Prediction (Parameters for prediction can be changed in the `conf` folder)
```bash
python predict.py
```
<br>
### Requirements
# Requirements
> python == 3.8
@@ -117,9 +128,11 @@ python predict.py
- opt-einsum==3.3.0
- ujson
### Introduction of Three Functions
<br>
#### 1. Named Entity Recognition
# Introduction of Three Functions
## 1. Named Entity Recognition
- Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, etc.
@@ -134,10 +147,18 @@ python predict.py
- Read the detailed process in the task-specific README
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ner/standard)**
**Step1** Enter `DeepKE/example/ner/standard`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
**Step1** Enter `DeepKE/example/ner/standard`. Download the dataset.
**Step2** Training
```bash
wget 120.27.214.45/Data/ner/standard/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2** Training<br>
The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
```bash
python run.py
```
@@ -147,26 +168,34 @@ python predict.py
```bash
python predict.py
```
- **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)**
**Step1** Enter `DeepKE/example/ner/few-shot`. The directories for loading and saving the model, as well as the configuration parameters, can be customized in the `conf` folder.<br>
**Step2** Training with the default `CoNLL-2003` dataset.
**Step1** Enter `DeepKE/example/ner/few-shot`. Download the dataset.
```bash
wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2** Training in the low-resource setting <br>
The directories for loading and saving the model, as well as the configuration parameters, can be customized in the `conf` folder.
```bash
python run.py +train=few_shot
```
Users can modify `load_path` in `conf/train/few_shot.yaml` to use an existing trained model.<br>
**Step3** Add `- predict` to `conf/config.yaml`, then in `conf/predict.yaml` set `load_path` to the model path and `write_path` to the path where the predicted results are saved, and run `python predict.py` (see the sketch after the command below)
```bash
python predict.py
```
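A hedged sketch of the Step3 edits from the shell; the checkpoint and output paths are placeholder examples, and the keys are assumed to sit at the top level of `conf/predict.yaml`.
```bash
# Placeholder paths; only the keys load_path and write_path come from the README.
sed -i 's|load_path:.*|load_path: ./checkpoints/few_shot_ner.pt|' conf/predict.yaml
sed -i 's|write_path:.*|write_path: ./outputs/ner_predictions.txt|' conf/predict.yaml
# Also add "- predict" to the defaults in conf/config.yaml, as described above.
python predict.py
```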
#### 2. Relation Extraction
## 2. Relation Extraction
- Relation extraction is the task of extracting semantic relations between entities from unstructured text.
@@ -180,12 +209,20 @@ python predict.py
- Read the detailed process in the task-specific README
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/standard)**
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/re/standard)**
**Step1** Enter the `DeepKE/example/re/standard` folder. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
**Step1** Enter the `DeepKE/example/re/standard` folder. Download the dataset.
**Step2** Training
```bash
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2** Training<br>
The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
```bash
python run.py
```
@@ -195,42 +232,58 @@ python predict.py
```bash
python predict.py
```
- **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/re/few-shot)**
- **[FEW-SHOT](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/few-shot)**
**Step1** Enter `DeepKE/example/re/few-shot`. Download the dataset.
**Step1** Enter `DeepKE/example/re/few-shot`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
```bash
wget 120.27.214.45/Data/re/few_shot/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2** Training. To resume from the last trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for saving training logs can be customized via `log_dir`. <br>
**Step2** Training<br>
- The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
- To resume from the last trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for saving training logs can be customized via `log_dir` (see the sketch after the training command).
```bash
python run.py
```
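The resume options above can also be set from the shell, as in this minimal sketch; the checkpoint and log paths are placeholder examples, and the keys are assumed to be top-level entries in `conf/train.yaml`.
```bash
# Placeholder paths; only the keys train_from_saved_model and log_dir come from the README.
sed -i 's|train_from_saved_model:.*|train_from_saved_model: ./checkpoints/last_model.pt|' conf/train.yaml
sed -i 's|log_dir:.*|log_dir: ./logs|' conf/train.yaml
python run.py
```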
**Step3** Prediction
```bash
python predict.py
```
- **[DOCUMENT](https://github.com/zjunlp/DeepKE/tree/main/example/re/document)**<br>
**Step1** Enter `DeepKE/example/re/document`. Download the dataset.
```bash
wget 120.27.214.45/Data/re/document/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2** Training<br>
- The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
- To resume from the last trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for saving training logs can be customized via `log_dir`.
```bash
python run.py
```
**Step3** Prediction
```bash
python predict.py
```
- **[DOCUMENT](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/document)**<br>
Download `train_distant.json` from [*Google Drive*](https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw) to `data/`.
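One way to fetch it from the command line is `gdown`, a third-party downloader that is not part of DeepKE; a hedged sketch using the folder link above:
```bash
pip install gdown
# Download the shared folder's contents into data/; keep train_distant.json and
# discard anything you do not need.
gdown --folder "https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw" -O data/
```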
**Step1** Enter `DeepKE/example/re/document`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
**Step2** Training. To resume from the last trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for saving training logs can be customized via `log_dir`.
```bash
python run.py
```
**Step3** Prediction
```bash
python predict.py
```
#### 3. Attribute Extraction
## 3. Attribute Extraction
- Attribute extraction extracts attributes for entities from unstructured text.
@@ -243,25 +296,33 @@
| 2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0 |
- Read the detailed process in the task-specific README
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/ae/standard)**
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ae/standard)**
**Step1** Enter the `DeepKE/example/ae/standard` folder. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.<br>
**Step1** Enter the `DeepKE/example/ae/standard` folder. Download the dataset.
**Step2** Training
```bash
wget 120.27.214.45/Data/ae/standard/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2** Training<br>
The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
```bash
python run.py
```
**Step3** Prediction
```bash
python predict.py
```
<br>
## Notebook Tutorial
# Notebook Tutorial
This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. Users can study *DeepKE* with them.
@@ -297,7 +358,7 @@ This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. User
<br>
## Tips
# Tips
1. Using the nearest mirror, like [THU](https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/) in China, will speed up the installation of *Anaconda*.
2. Using the nearest mirror, like [aliyun](http://mirrors.aliyun.com/pypi/simple/) in China, will speed up `pip install XXX`.
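For example, the two mirrors above could be wired in roughly as follows (a sketch only; substitute whichever mirror is closest to you, and note the exact channel URLs may differ):
```bash
# Tip 1: add a THU conda channel; Tip 2: use the Aliyun index for a one-off pip install.
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
pip install deepke -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```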
@@ -307,7 +368,7 @@ This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. User
<br>
## Developers
# Developers
Zhejiang University: Ningyu Zhang, Liankuan Tao, Haiyang Yu, Xiang Chen, Xin Xu, Xi Tian, Lei Li, Zhoubo Li, Shumin Deng, Yunzhi Yao, Hongbin Ye, Xin Xie, Guozhou Zheng, Huajun Chen


@@ -47,9 +47,9 @@ DeepKE includes three modules for named entity recognition, relation extraction and
DeepKE supports installation and use via pip. Taking standard fully supervised relation extraction as an example, a standard relation extraction model can be implemented in the following five steps.
**Step 1** Download the code ```git clone https://github.com/zjunlp/DeepKE.git``` (don't forget to star and fork!)
**Step 2** Create a virtual environment with Anaconda and enter it (a Dockerfile is provided in the docker folder, so you can also build your own image)
```
conda create -n deepke python=3.8
@@ -70,24 +70,26 @@ python setup.py install
python setup.py develop
```
**Step 3** Enter the task folder, taking standard relation extraction as an example
```
cd DeepKE/example/re/standard
```
**Step 4** Model training (the training parameters can be modified in the conf folder)
```
python run.py
```
**Step 5** Model prediction (the prediction parameters can be modified in the conf folder)
```
python predict.py
```
<br>
### Requirements
> python == 3.8
@@ -118,11 +120,19 @@ python predict.py
| 秦始皇兵马俑位于陕西省西安市1961年被国务院公布为第一批全国重点文物保护单位是世界八大奇迹之一。 | 秦始皇 | 陕西省,西安市 | 国务院 |
- For the detailed procedure, see the corresponding README
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/main/example/ner/standard)**
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ner/standard)**
**Step1**: Enter `DeepKE/example/ner/standard`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;<br>
**Step1**: Enter `DeepKE/example/ner/standard` and download the dataset
**Step2**: Model training
```bash
wget 120.27.214.45/Data/ner/standard/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2**: Model training<br>
The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
```
python run.py
@@ -135,9 +145,17 @@ python predict.py
- **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)**
**Step1**: Enter `DeepKE/example/ner/few-shot`; the model loading and saving locations and the parameter configuration can be modified in the `conf` folder;<br>
**Step1**: Enter `DeepKE/example/ner/few-shot` and download the dataset
**Step2**: Model training, using the `CoNLL-2003` dataset by default
```bash
wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2**: Train the model in the low-resource setting<br>
The model loading and saving locations and the parameter configuration can be modified in the `conf` folder
```
python run.py +train=few_shot
@@ -162,45 +180,71 @@ python predict.py
| 提起杭州的美景,西湖总是第一个映入脑海的词语。 | 所在城市 | 西湖 | 8 | 杭州 | 2 |
- For the detailed procedure, see the corresponding README; RE includes the following three sub-functions
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/main/example/re/standard)**
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/re/standard)**
**Step1**: Enter `DeepKE/example/re/standard`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;<br>
**Step1**: Enter `DeepKE/example/re/standard` and download the dataset
**Step2**: Model training
```bash
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2**: Model training<br>
The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
```
python run.py
```
**Step3**: Model prediction
```
python predict.py
```
- **[FEW-SHOT](https://github.com/zjunlp/deepke/blob/main/example/re/few-shot)**
- **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/re/few-shot)**
**Step1**: Enter `DeepKE/example/re/few-shot`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;<br>
**Step2**: Model training; to resume from the last trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved; training logs are saved in the root directory by default and the location can be configured via `log_dir`;<br>
**Step1**: Enter `DeepKE/example/re/few-shot` and download the dataset
```bash
wget 120.27.214.45/Data/re/few_shot/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2**: Model training<br>
- The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
- To resume from the last trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved; training logs are saved in the root directory by default and the location can be configured via `log_dir`
```
python run.py
```
**Step3**: Model prediction
```
python predict.py
```
- **[DOCUMENT](https://github.com/zjunlp/deepke/blob/main/example/re/document)** <br>
```train_distant.json``` is too large to include, so please download it from Google Drive to the data/ directory yourself;<br>
- **[DOCUMENT](https://github.com/zjunlp/DeepKE/tree/main/example/re/document)** <br>
**Step1**: Enter `DeepKE/example/re/document` and download the dataset
```bash
wget 120.27.214.45/Data/re/document/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2**: Model training<br>
- The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
- To resume from the last trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved; training logs are saved in the root directory by default and the location can be configured via `log_dir`;
**Step1**: Enter `DeepKE/example/re/document`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;<br>
**Step2**: Model training; to resume from the last trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved; training logs are saved in the root directory by default and the location can be configured via `log_dir`;
```
python run.py
```
@@ -221,22 +265,32 @@ python predict.py
| 2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0 |
- For the detailed procedure, see the corresponding README
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/main/example/ae/standard)**
- **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ae/standard)**
**Step1**: Enter `DeepKE/example/ae/standard`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;<br>
**Step1**: Enter `DeepKE/example/ae/standard` and download the dataset
**Step2**: Model training
```bash
wget 120.27.214.45/Data/ae/standard/data.tar.gz
tar -xzvf data.tar.gz
```
**Step2**: Model training<br>
The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
```
python run.py
```
**Step3**: Model prediction
```
python predict.py
```
<br>
### Notebook Tutorials
This toolkit provides a number of Notebook and Google Colab tutorials that users can study and experiment with.