xxupiano 2021-09-25 17:30:49 +08:00
parent 66137ed549
commit acae147b2b
5 changed files with 1336092 additions and 324947 deletions

@@ -0,0 +1,28 @@
## People's Daily(人民日报) dataset
### Task
Named Entity Recognition
### Description
**Tags**: LOC (location), ORG (organization), PER (person)
**Tag Strategy**: BIO
**Separator**: a single space between each character and its tag (e.g. `北 B-LOC`)
**Data Size**:
Train data set ( [example.train](example.train) ):
|Sentences|Characters|LOC count|ORG count|PER count|
|:-:|:-:|:-:|:-:|:-:|
|20864|979180|16571|9277|8144|
Dev data set ( [example.dev](example.dev) ):
|Sentences|Characters|LOC count|ORG count|PER count|
|:-:|:-:|:-:|:-:|:-:|
|2318|109870|1951|984|884|
Test data set ( [example.test](example.test) ):
|Sentences|Characters|LOC count|ORG count|PER count|
|:-:|:-:|:-:|:-:|:-:|
|4636|219197|3658|2185|1864|
**Reference**:
<https://github.com/zjy-ucas/ChineseNER>
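
For orientation, here is a minimal sketch of how files in this format could be read and how the per-split counts in the tables above could be recomputed. It assumes one `character tag` pair per line separated by a single space, as described above, and additionally assumes that sentences are separated by blank lines, which this README does not state. The names `read_bio` and `dataset_stats` are hypothetical helpers, not part of this repository.

```python
from collections import Counter


def read_bio(lines):
    """Yield sentences as lists of (char, tag) pairs.

    Assumption: a blank line marks a sentence boundary.
    """
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Split on the last space so the tag is always the final field.
        char, tag = line.rsplit(" ", 1)
        sentence.append((char, tag))
    if sentence:
        yield sentence


def dataset_stats(sentences):
    """Count sentences, characters, and entities (one per B- tag)."""
    stats = Counter()
    for sentence in sentences:
        stats["sentences"] += 1
        stats["chars"] += len(sentence)
        for _, tag in sentence:
            if tag.startswith("B-"):
                stats[tag[2:]] += 1
    return stats


# Tiny in-memory sample standing in for a real file such as example.dev.
sample = ["北 B-LOC\n", "京 I-LOC\n", "很 O\n", "大 O\n", "\n", "李 B-PER\n", "明 I-PER\n"]
stats = dataset_stats(read_bio(sample))
```

To run it on a real split, pass an open file handle instead of `sample`, for example `dataset_stats(read_bio(open("example.dev", encoding="utf-8")))`. The UTF-8 encoding is likewise an assumption.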

File diffs suppressed because they are too large (3 files)

@@ -133,13 +133,13 @@ class Ner:
         tmp.append(word)
     else:
         wordstype = result[i-1][1][2:]
-        tag[wordstype].append(' '.join(tmp))
+        tag[wordstype].append(''.join(tmp))
         tmp.clear()
         tmp.append(word)
 elif i==len(result)-1:
     tmp.append(word)
     wordstype = result[i][1][2:]
-    tag[wordstype].append(' '.join(tmp))
+    tag[wordstype].append(''.join(tmp))
 else:
     tmp.append(word)
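
The change above swaps `' '.join(tmp)` for `''.join(tmp)`, so the characters of a Chinese entity are concatenated without spaces. The following standalone sketch illustrates the effect. It is not the `Ner` method from this repository: the function `collect_entities` and its `sep` parameter are hypothetical, and the input is assumed to be a list of `(character, BIO tag)` pairs.

```python
from collections import defaultdict


def collect_entities(result, sep=""):
    """Group (char, tag) pairs tagged with BIO into entity strings per type."""
    entities = defaultdict(list)
    current, current_type = [], None

    def flush():
        # Store the entity collected so far, then reset the buffer.
        nonlocal current, current_type
        if current:
            entities[current_type].append(sep.join(current))
        current, current_type = [], None

    for char, label in result:
        if label.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current, current_type = [char], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag continues the entity of the same type.
            current.append(char)
        else:
            # "O" or an I- tag without a matching B- ends the entity;
            # a stray I- tag is dropped in this sketch.
            flush()
    flush()
    return dict(entities)


result = [("北", "B-LOC"), ("京", "I-LOC"), ("的", "O"), ("李", "B-PER"), ("明", "I-PER")]
joined = collect_entities(result)            # {'LOC': ['北京'], 'PER': ['李明']}
spaced = collect_entities(result, sep=" ")   # {'LOC': ['北 京'], 'PER': ['李 明']}
```

With `sep=" "` the entity comes back as `北 京`, which corresponds to the behavior before this commit. The empty separator yields `北京`, which is the natural form for Chinese text.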