From 1e00e429b4018346c99d810841b40e38881a5bf1 Mon Sep 17 00:00:00 2001
From: xxupiano
Date: Tue, 30 Nov 2021 14:26:22 +0800
Subject: [PATCH] Update README

---
 README.md    | 195 +++++++++++++++++++++++++++++++++------------------
 README_CN.md | 118 ++++++++++++++++++++++---------
 2 files changed, 214 insertions(+), 99 deletions(-)

diff --git a/README.md b/README.md
index 797e8b1..11a79b8 100644
--- a/README.md
+++ b/README.md
@@ -18,22 +18,23 @@
-A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population
+A Deep Learning Based Knowledge Extraction Toolkit
+for Knowledge Base Population
+DeepKE is a knowledge extraction toolkit supporting **low-resource** and **document-level** scenarios. It provides three functions based on **PyTorch**: **Named Entity Recognition**, **Relation Extraction** and **Attribute Extraction**.
-## Prediction
+# Prediction
 There is a demonstration of prediction.

-## Model Framework
+# Model Framework

@@ -42,23 +43,25 @@ There is a demonstration of prediction.
Figure 1: The framework of DeepKE

-- DeepKE contains three modules for **named entity recognition**, **relation extraction** and **attribute extraction**, the three tasks respectively.
-- Each module has its own submodules. For example, there are **standard**, **document-level** and **few-shot** submodules in the attribute extraction modular.
-- Each submodule compose of three parts: a **collection of tools**, which can function as tokenizer, dataloader, preprocessor and the like, a **encoder** and a part for **training and prediction**
+- DeepKE provides a unified framework for its three knowledge extraction functions: **named entity recognition**, **relation extraction** and **attribute extraction**.
+- Each task can be run in different scenarios. For example, relation extraction is available in **standard**, **low-resource (few-shot)** and **document-level** settings.
+- Each application scenario comprises three components: **Data** (Tokenizer, Preprocessor and Loader), **Model** (Module, Encoder and Forwarder) and **Core** (Training, Evaluation and Prediction).
-## Quickstart
+# Quickstart
-*DeepKE* is supported `pip install deepke`. Take the fully supervised attribute extraction for example.
+*DeepKE* can be installed with `pip install deepke`. Take fully supervised relation extraction as an example.
+(Please star✨ and fork :memo: !!!)
-**Step1** Download basic codes `git clone https://github.com/zjunlp/DeepKE.git ` (Please star✨ and fork :memo:)
+**Step1** Download the basic code
-**Step2** Create a virtual environment using`Anaconda` and enter it.
-  - We also provide dockerfile source code, you can create your own image, which is located in the docker folder.
+```bash
+git clone https://github.com/zjunlp/DeepKE.git
+```
+**Step2** Create a virtual environment using `Anaconda` and enter it.
+We also provide the Dockerfile source code in the `docker` folder, so you can build your own image.
 ```bash
 conda create -n deepke python=3.8
@@ -86,21 +89,29 @@ conda activate deepke
 cd DeepKE/example/re/standard
 ```
-**Step4** Training (Parameters for training can be changed in the `conf` folder)
+**Step4** Download the dataset
+
+```bash
+wget 120.27.214.45/Data/re/standard/data.tar.gz
+
+tar -xzvf data.tar.gz
+```
+
+**Step5** Training (Parameters for training can be changed in the `conf` folder)
 ```bash
 python run.py
 ```
-**Step5** Prediction (Parameters for prediction can be changed in the `conf` folder)
+**Step6** Prediction (Parameters for prediction can be changed in the `conf` folder)
 ```bash
 python predict.py
 ```
+
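Every download step added in this patch pairs `wget` with `tar -xzvf`, and the URLs follow one pattern, `120.27.214.45/Data/<task>/<setting>/data.tar.gz`, except that the example folders are named `few-shot` while the archive paths use `few_shot`. A minimal POSIX `sh` sketch that derives the URL from the folder names follows; `dataset_url` and `fetch_dataset` are hypothetical helpers, not part of DeepKE, and the naming rule is inferred only from the URLs listed in this README.

```bash
#!/bin/sh
# Hypothetical helpers (not part of DeepKE). The URL pattern is inferred
# from the wget commands in this README, not from an official API.

# Print the archive URL for a task (ner, re, ae) and an example folder
# name (standard, few-shot, document). Folder names use hyphens while the
# archive paths use underscores, so hyphens are mapped to underscores.
dataset_url() {
    task=$1
    setting=$(printf '%s' "$2" | sed 's/-/_/g')
    printf '120.27.214.45/Data/%s/%s/data.tar.gz\n' "$task" "$setting"
}

# Download and unpack the archive into the current directory,
# mirroring the two README commands: wget, then tar -xzvf.
fetch_dataset() {
    url=$(dataset_url "$1" "$2") || return 1
    wget "$url" && tar -xzvf data.tar.gz
}
```

For example, running `fetch_dataset ner few-shot` from `DeepKE/example/ner/few-shot` would issue the same two commands as that example's Step1.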
-
-### Requirements
+# Requirements
 > python == 3.8
@@ -117,9 +128,11 @@ python predict.py
 - opt-einsum==3.3.0
 - ujson
-### Introduction of Three Functions
+
-#### 1. Named Entity Recognition
+# Introduction to the Three Functions
+
+## 1. Named Entity Recognition
 - Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, etc.
@@ -134,10 +147,18 @@ python predict.py
 - Read the detailed process in the corresponding README
   - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ner/standard)**
-    **Step1** Enter `DeepKE/example/ner/standard`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    **Step1** Enter `DeepKE/example/ner/standard`. Download the dataset.
-    **Step2** Training
+    ```bash
+    wget 120.27.214.45/Data/ner/standard/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+    **Step2** Training
+
+    The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+
     ```bash
     python run.py
     ```
@@ -147,26 +168,34 @@ python predict.py
     ```bash
     python predict.py
     ```
-
+
   - **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)**
-    **Step1** Enter `DeepKE/example/ner/few-shot`. The directory where the model is loaded and saved and the configuration parameters can be cusomized in the `conf` folder.
-
-    **Step2** Training with default `CoNLL-2003` dataset.
+    **Step1** Enter `DeepKE/example/ner/few-shot`. Download the dataset.
+    ```bash
+    wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+
+    **Step2** Training in the low-resource setting
+
+    The directory where the model is loaded from and saved to, as well as the configuration parameters, can be customized in the `conf` folder.
+
     ```bash
     python run.py +train=few_shot
     ```
-
-    Users can modify `load_path` in `conf/train/few_shot.yaml` with the use of existing loaded model.
-
+    Users can modify `load_path` in `conf/train/few_shot.yaml` to load an existing model.
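The few-shot steps ask you to point a path-valued key such as `load_path` (or `write_path` in `conf/predict.yaml`) at a file by editing YAML by hand. If you want to script that edit, here is a minimal sketch, assuming the key sits on its own unindented `key: value` line; `set_conf_value` is a hypothetical helper, not part of DeepKE, and the checkpoint path in the example comment is a placeholder. Nested or indented keys would need a YAML-aware tool instead.

```bash
#!/bin/sh
# Hypothetical helper (not part of DeepKE): replace the value of an
# unindented "key: value" line in a YAML config file.
set_conf_value() {
    file=$1
    key=$2
    value=$3
    # Fail loudly if the key is absent rather than silently doing nothing.
    if ! grep -q "^${key}:" "$file"; then
        echo "set_conf_value: '${key}' not found in ${file}" >&2
        return 1
    fi
    tmp="${file}.tmp.$$"
    # Write to a temporary file and move it back, which sidesteps the
    # differences between GNU and BSD `sed -i`. '|' is the sed delimiter,
    # so values containing '|' or '&' would need escaping.
    sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example (placeholder path):
# set_conf_value conf/train/few_shot.yaml load_path checkpoints/ner_model.pt
```

Writing through a temporary file keeps the edit portable; the trade-off is that only simple top-level keys are handled.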
+
     **Step3** Add `- predict` to `conf/config.yaml`, set `load_path` to the model path and `write_path` to the path where the prediction results are saved in `conf/predict.yaml`, and then run `python predict.py`
-
+
     ```bash
     python predict.py
     ```
-#### 2. Relation Extraction
+## 2. Relation Extraction
 - Relation extraction is the task of extracting semantic relations between entities from unstructured text.
@@ -180,12 +209,20 @@ python predict.py
 - Read the detailed process in the corresponding README
-  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/standard)**
+  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/re/standard)**
-    **Step1** Enter the `DeepKE/example/re/standard` folder. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    **Step1** Enter the `DeepKE/example/re/standard` folder. Download the dataset.
-    **Step2** Training
+    ```bash
+    wget 120.27.214.45/Data/re/standard/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+    **Step2** Training
+
+    The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+
     ```bash
     python run.py
     ```
@@ -195,42 +232,58 @@ python predict.py
     ```bash
     python predict.py
     ```
+
+  - **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/re/few-shot)**
-  - **[FEW-SHOT](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/few-shot)**
+    **Step1** Enter `DeepKE/example/re/few-shot`. Download the dataset.
-    **Step1** Enter `DeepKE/example/re/few-shot`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    ```bash
+    wget 120.27.214.45/Data/re/few_shot/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
-    **Step 2** Training. Start with the model trained last time: modify `train_from_saved_model` in `conf/train.yaml`as the path where the model trained last time was saved. And the path saving logs generated in training can be customized by `log_dir`.
+    **Step2** Training
+    - The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    - To start from a previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for training logs can be customized with `log_dir`.
+
     ```bash
     python run.py
     ```
-
+
     **Step3** Prediction
-
+
+    ```bash
+    python predict.py
+    ```
+
+  - **[DOCUMENT](https://github.com/zjunlp/DeepKE/tree/main/example/re/document)**
+
+    **Step1** Enter `DeepKE/example/re/document`. Download the dataset.
+
+    ```bash
+    wget 120.27.214.45/Data/re/document/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+
+    **Step2** Training
+
+    - The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    - To start from a previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path where that model was saved. The directory for training logs can be customized with `log_dir`.
+
+    ```bash
+    python run.py
+    ```
+
+    **Step3** Prediction
+
     ```bash
     python predict.py
     ```
-  - **[DOCUMENT](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/re/document)**
-    Download the model `train_distant.json` from [*Google Drive*](https://drive.google.com/drive/folders/1c5-0YwnoJx8NS6CV2f-NoTHR__BdkNqw) to `data/`.
-    **Step1** Enter `DeepKE/example/re/document`. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
-    **Step2** Training. Start with the model trained last time: modify `train_from_saved_model` in `conf/train.yaml`as the path where the model trained last time was saved. And the path saving logs generated in training can be customized by `log_dir`.
-
-    ```bash
-    python run.py
-    ```
-
-    **Step3** Prediction
-
-    ```bash
-    python predict.py
-    ```
-
-#### 3. Attribute Extraction
+## 3. Attribute Extraction
 - Attribute extraction aims to extract the attributes of entities from unstructured text.
@@ -243,25 +296,33 @@ python predict.py
 | 2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0 |
 - Read the detailed process in the corresponding README
-  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/test_new_deepke/example/ae/standard)**
+  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ae/standard)**
-    **Step1** Enter the `DeepKE/example/ae/standard` folder. The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+    **Step1** Enter the `DeepKE/example/ae/standard` folder. Download the dataset.
-    **Step2** Training
+    ```bash
+    wget 120.27.214.45/Data/ae/standard/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+    **Step2** Training
+
+    The dataset and parameters can be customized in the `data` folder and `conf` folder respectively.
+
     ```bash
     python run.py
     ```
-
+
     **Step3** Prediction
-
+
     ```bash
     python predict.py
     ```
+
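Every example in this section now downloads its dataset before `python run.py`, and each one reads its data and parameters from the `data` and `conf` folders. A small pre-flight check can make a skipped or failed download show up before training starts. This is a sketch assuming `data.tar.gz` unpacks into `data/`, as the README's references to the `data` folder suggest; `check_example_dir` is a hypothetical name, not a DeepKE command.

```bash
#!/bin/sh
# Hypothetical pre-flight check (not part of DeepKE): confirm that an
# example folder contains the data/ and conf/ directories the README
# refers to before training is started.
check_example_dir() {
    dir=${1:-.}
    for sub in data conf; do
        if [ ! -d "$dir/$sub" ]; then
            echo "check_example_dir: missing $dir/$sub" >&2
            return 1
        fi
    done
    return 0
}

# Example: train only when both folders are present.
# check_example_dir DeepKE/example/ae/standard && (cd DeepKE/example/ae/standard && python run.py)
```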
-
-## Notebook Tutorial
+# Notebook Tutorial
 This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. Users can study *DeepKE* with them.
@@ -297,7 +358,7 @@ This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. User
-## Tips
+# Tips
 1. Using the nearest mirror, like [THU](https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/) in China, will speed up the installation of *Anaconda*.
 2. Using the nearest mirror, like [aliyun](http://mirrors.aliyun.com/pypi/simple/) in China, will speed up `pip install XXX`.
@@ -307,7 +368,7 @@ This toolkit provides many `Jupyter Notebook` and `Google Colab` tutorials. User
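-## Developers
+# Developers
 Zhejiang University: Ningyu Zhang, Liankuan Tao, Haiyang Yu, Xiang Chen, Xin Xu, Xi Tian, Lei Li, Zhoubo Li, Shumin Deng, Yunzhi Yao, Hongbin Ye, Xin Xie, Guozhou Zheng, Huajun Chen
diff --git a/README_CN.md b/README_CN.md
index bbfcede..3cbd705 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -47,9 +47,9 @@ DeepKE includes three modules that can perform named entity recognition, relation extraction and
 DeepKE can be installed and used via pip. Taking relation extraction in the standard fully supervised setting as an example, a standard relation extraction model can be built in the following five steps
-**Step 1** Download the code ```git clone https://github.com/zjunlp/DeepKE.git``` (don't forget to star and fork!!!)
+**Step 1**: Download the code ```git clone https://github.com/zjunlp/DeepKE.git``` (don't forget to star and fork!!!)
-**Step 2** Create a virtual environment with Anaconda and enter it (the Dockerfile source code is provided in the docker folder if you want to build your own image)
+**Step 2**: Create a virtual environment with Anaconda and enter it (the Dockerfile source code is provided in the docker folder if you want to build your own image)
 ```
 conda create -n deepke python=3.8
@@ -70,24 +70,26 @@ python setup.py install
 python setup.py develop
 ```
-**Step 3** Enter the task folder, taking standard relation extraction as an example
+**Step 3**: Enter the task folder, taking standard relation extraction as an example
 ```
 cd DeepKE/example/re/standard
 ```
-**Step 4** Train the model; the parameters used for training can be modified in the conf folder
+**Step 4**: Train the model; the parameters used for training can be modified in the conf folder
 ```
 python run.py
 ```
-**Step 5** Predict with the model; the parameters used for prediction can be modified in the conf folder
+**Step 5**: Predict with the model; the parameters used for prediction can be modified in the conf folder
 ```
 python predict.py
 ```
+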
+
 ### Requirements
 > python == 3.8
@@ -118,11 +120,19 @@ python predict.py
 | 秦始皇兵马俑位于陕西省西安市,1961年被国务院公布为第一批全国重点文物保护单位,是世界八大奇迹之一。 | 秦始皇 | 陕西省,西安市 | 国务院 |
 - For the detailed process, see the corresponding README
-  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/main/example/ner/standard)**
+  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ner/standard)**
-    **Step1**: Enter `DeepKE/example/ner/standard`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;
+    **Step1**: Enter `DeepKE/example/ner/standard` and download the dataset
-    **Step2**: Train the model
+    ```bash
+    wget 120.27.214.45/Data/ner/standard/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+
+    **Step2**: Train the model
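+
+    The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
     ```
     python run.py
@@ -135,9 +145,17 @@ python predict.py
   - **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)**
-    **Step1**: Enter `DeepKE/example/ner/few-shot`; the model loading and saving locations and the parameter configuration can be modified in the `conf` folder;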
+    **Step1**: Enter `DeepKE/example/ner/few-shot` and download the dataset
-    **Step2**: Train the model; training uses the `CoNLL-2003` dataset by default
+    ```bash
+    wget 120.27.214.45/Data/ner/few_shot/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+
+    **Step2**: Train the model in the low-resource setting
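+
+    The model loading and saving locations and the parameter configuration can be modified in the `conf` folder
     ```
     python run.py +train=few_shot
@@ -162,45 +180,71 @@ python predict.py
 | 提起杭州的美景,西湖总是第一个映入脑海的词语。 | 所在城市 | 西湖 | 8 | 杭州 | 2 |
 - For the detailed process, see the corresponding README. RE includes the following three sub-functions
-  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/main/example/re/standard)**
+  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/re/standard)**
-    **Step1**: Enter `DeepKE/example/re/standard`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;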
+    **Step1**: Enter `DeepKE/example/re/standard` and download the dataset
+
+    ```bash
+    wget 120.27.214.45/Data/re/standard/data.tar.gz

-    **Step2**: Train the model
+    tar -xzvf data.tar.gz
+    ```
+
+    **Step2**: Train the model
+    The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
+
     ```
     python run.py
     ```
-
+
     **Step3**: Predict with the model
-
+
     ```
     python predict.py
     ```
-  - **[FEW-SHOT](https://github.com/zjunlp/deepke/blob/main/example/re/few-shot)**
+  - **[FEW-SHOT](https://github.com/zjunlp/DeepKE/tree/main/example/re/few-shot)**
-    **Step1**: Enter `DeepKE/example/re/few-shot`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;
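-
-    **Step2**: Train the model. To resume from the previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path of the saved model. Logs of each training run are saved in the root directory by default and can be configured with `log_dir`;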
+    **Step1**: Enter `DeepKE/example/re/few-shot` and download the dataset
+
+    ```bash
+    wget 120.27.214.45/Data/re/few_shot/data.tar.gz
+    tar -xzvf data.tar.gz
+    ```
+
+    **Step2**: Train the model
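+
+    - The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
+
+    - To resume from the previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path of the saved model. Logs of each training run are saved in the root directory by default and can be configured with `log_dir`
+
     ```
     python run.py
     ```
-
+
     **Step3**: Predict with the model
-
+
     ```
     python predict.py
     ```
-
-  - **[DOCUMENT](https://github.com/zjunlp/deepke/blob/main/example/re/document)**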
-    ```train_distant.json``` is too large to include, so please download it yourself from Google Drive into the data/ directory;
+
+  - **[DOCUMENT](https://github.com/zjunlp/DeepKE/tree/main/example/re/document)**
+
+    **Step1**: Enter `DeepKE/example/re/document` and download the dataset
+
+    ```bash
+    wget 120.27.214.45/Data/re/document/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+
+    **Step2**: Train the model
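+
+    - The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
+    - To resume from the previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path of the saved model. Logs of each training run are saved in the root directory by default and can be configured with `log_dir`;
-    **Step1**: Enter `DeepKE/example/re/document`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;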
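-    **Step2**: Train the model. To resume from the previously trained model, set `train_from_saved_model` in `conf/train.yaml` to the path of the saved model. Logs of each training run are saved in the root directory by default and can be configured with `log_dir`;
-
     ```
     python run.py
     ```
@@ -221,22 +265,32 @@ python predict.py
 | 2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0 |
 - For the detailed process, see the corresponding README
-  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/deepke/blob/main/example/ae/standard)**
+  - **[STANDARD (Fully Supervised)](https://github.com/zjunlp/DeepKE/tree/main/example/ae/standard)**
-    **Step1**: Enter `DeepKE/example/re/standard`; the dataset and parameter configuration can be modified in the `data` and `conf` folders respectively;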
+    **Step1**: Enter `DeepKE/example/ae/standard` and download the dataset
-    **Step2**: Train the model
+    ```bash
+    wget 120.27.214.45/Data/ae/standard/data.tar.gz
+
+    tar -xzvf data.tar.gz
+    ```
+
+    **Step2**: Train the model
+    The dataset and parameter configuration can be modified in the `data` and `conf` folders respectively
+
     ```
     python run.py
     ```
     **Step3**: Predict with the model
-
+
     ```
     python predict.py
     ```
+
+
 ### Notebook Tutorials
 This toolkit provides several Notebook and Google Colab tutorials, so users can debug and study the parts they are interested in.
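The `tar -xzvf data.tar.gz` step used throughout this patch extracts whatever `wget` saved, so a truncated or failed download only shows up as a confusing `tar` error. A minimal sketch of a safer extraction, assuming GNU or BSD `tar`: it lists the archive with `tar -tzf` first and only extracts if that succeeds. `safe_extract` is a hypothetical helper, not part of DeepKE.

```bash
#!/bin/sh
# Hypothetical helper (not part of DeepKE): list the archive first so a
# missing, truncated or non-gzip file is reported before extraction.
safe_extract() {
    archive=${1:-data.tar.gz}
    if ! tar -tzf "$archive" > /dev/null 2>&1; then
        echo "safe_extract: $archive is missing or not a valid .tar.gz" >&2
        return 1
    fi
    # Same command as in the README once the archive is known to be readable.
    tar -xzvf "$archive"
}
```

Listing first reads the archive twice, which is cheap for dataset tarballs of this kind and keeps a bad download from leaving a half-extracted `data/` folder.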