The WikiANN dataset (Pan et al. 2017) provides named entity recognition (NER) annotations for three entity types: person (PER), organization (ORG) and location (LOC). It was constructed from the linked entities in Wikipedia pages for 282 languages.
Identifier | Task Type | Metric | License | Website | Code | Download |
---|---|---|---|---|---|---|
U.Dep | POS Tagging | F1 (macro) | CC BY-SA 3.0 |
# | Train | Dev | Test |
---|---|---|---|
Examples | 16,237 | 7,029 | 7,263 |
  | train | validation | test |
---|---|---|---|
O | 0.786 | 0.789 | 0.791 |
LOC | 0.100 | 0.096 | 0.096 |
PER | 0.060 | 0.060 | 0.059 |
ORG | 0.054 | 0.055 | 0.055 |
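Each column in the table above sums to 1, so the values read as per-split label proportions. A minimal sketch of how such proportions could be computed follows. It assumes each split is a list of tag sequences and that the `B-`/`I-` prefixes are collapsed into a single entity type; the source states neither, and the function name `label_distribution` and the toy data are hypothetical.

```python
from collections import Counter


def label_distribution(tag_sequences):
    """Return the relative frequency of each label over all tokens in a split."""
    counts = Counter()
    for tags in tag_sequences:
        for tag in tags:
            # Assumption: collapse BIO prefixes, so "B-LOC" and "I-LOC" both count as "LOC".
            label = tag.split("-", 1)[1] if "-" in tag else tag
            counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Toy split (hypothetical data, not taken from WikiANN).
toy_split = [
    ["O", "O", "B-LOC", "I-LOC"],
    ["B-PER", "O", "O", "O"],
]
dist = label_distribution(toy_split)
```

Running the same function on the train, validation and test splits would give one column of the table per split.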
Vocabulary overlap between splits: the number of unique words shared by the row split and the column split, divided by the number of unique words in the row split.
  | train | validation | test |
---|---|---|---|
train | 1.000 | 0.142 | 0.144 |
validation | 0.075 | 1.000 | 0.099 |
test | 0.076 | 0.098 | 1.000 |
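The overlap measure described for this table can be sketched as below. This is an illustration under assumptions, not the code that produced the numbers: it treats each split as a list of token lists and compares raw tokens without lowercasing or other normalization, which the source does not specify. The name `vocabulary_overlap` and the toy splits are hypothetical.

```python
def vocabulary_overlap(splits):
    """Compute row-normalized vocabulary overlap between every pair of splits.

    splits: dict mapping a split name to a list of sentences (lists of tokens).
    Returns a nested dict: overlap[row][col] = |V_row & V_col| / |V_row|.
    """
    # Build the set of unique tokens for each split.
    vocab = {
        name: {token for sentence in sentences for token in sentence}
        for name, sentences in splits.items()
    }
    overlap = {}
    for row, row_vocab in vocab.items():
        overlap[row] = {}
        for col, col_vocab in vocab.items():
            shared = len(row_vocab & col_vocab)
            # Normalize by the row vocabulary, following the stated definition.
            overlap[row][col] = shared / len(row_vocab) if row_vocab else 0.0
    return overlap


# Toy splits (hypothetical data).
toy_splits = {
    "train": [["a", "b", "c", "d"]],
    "validation": [["a", "b"]],
    "test": [["a", "e"]],
}
matrix = vocabulary_overlap(toy_splits)
```

Because the denominator is the row vocabulary, the matrix is generally not symmetric, while the diagonal is always 1 for a non-empty split.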
bg:Видът O
bg:е O
bg:разпространен O
bg:в O
bg:Бурунди B-LOC
bg:, O
bg:Демократична B-LOC
bg:република I-LOC
bg:Конго I-LOC
bg:, O
bg:Замбия B-LOC
bg:и O
bg:Танзания B-LOC
bg:. O
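The sample above uses one `language:token TAG` pair per line with BIO tags. A minimal sketch for reading this format and grouping tagged tokens into entity spans is shown below. The helper names `parse_line` and `extract_entities` are hypothetical, and the sketch assumes the tag is the last whitespace-separated field and the language code precedes the first colon.

```python
def parse_line(line):
    """Split 'bg:token TAG' into (language, token, tag)."""
    prefixed_token, tag = line.rsplit(" ", 1)
    language, token = prefixed_token.split(":", 1)
    return language, token, tag


def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity text, entity type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the currently open entity of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity (ignored in this sketch).
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


sample = """bg:Видът O
bg:е O
bg:разпространен O
bg:в O
bg:Бурунди B-LOC
bg:, O
bg:Демократична B-LOC
bg:република I-LOC
bg:Конго I-LOC
bg:, O
bg:Замбия B-LOC
bg:и O
bg:Танзания B-LOC
bg:. O"""

parsed = [parse_line(line) for line in sample.splitlines()]
tokens = [token for _, token, _ in parsed]
tags = [tag for _, _, tag in parsed]
entities = extract_entities(tokens, tags)
```

For this sentence, the sketch yields four LOC entities, with the three tokens of "Демократична република Конго" merged into a single span.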
[1] Xiaoman Pan, Boliang Zhang, Jonathan May, Joel Nothman, Kevin Knight, and Heng Ji. 2017. Cross-lingual Name Tagging and Linking for 282 Languages. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1946–1958, Vancouver, Canada. Association for Computational Linguistics.
@inproceedings{pan-etal-2017-cross,
title = "Cross-lingual Name Tagging and Linking for 282 Languages",
author = "Pan, Xiaoman and
Zhang, Boliang and
May, Jonathan and
Nothman, Joel and
Knight, Kevin and
Ji, Heng",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P17-1178",
doi = "10.18653/v1/P17-1178",
pages = "1946--1958",
}
Attribution-NonCommercial 4.0 International (Apache License 2.0). See the LICENSE file.