The task focuses on cross-lingual, document-level extraction of named entities: systems should recognize, classify, and extract all named entity mentions in a document, but they are not required to detect the position of each mention.
The input text collection consists of sets of documents retrieved from the Web, each set being about a certain entity or event.
# | Train | Dev | Test |
---|---|---|---|
Examples | 724 | 182 | 301 |
Topic | Examples |
---|---|
Brexit | 598 |
Covid19 | 151 |
USElection2020 | 150 |
NordStream | 130 |
AsiaBibi | 94 |
Ryanair | 84 |
Total | 1,207 |
Tag | train | validation | test |
---|---|---|---|
O | 0.908 | 0.908 | 0.929 |
LOC | 0.029 | 0.027 | 0.020 |
ORG | 0.022 | 0.023 | 0.012 |
PER | 0.020 | 0.021 | 0.025 |
EVT | 0.008 | 0.008 | 0.007 |
PRO | 0.005 | 0.005 | 0.004 |
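A minimal sketch of how a per-tag token share like the one above could be computed. It assumes that each example carries a `ner_tags` list as in the sample record, and that the `B-` and `I-` prefixes are merged into a single entity type before counting; that merging is an assumption, not something the table states. The function name `label_distribution` and the toy examples are hypothetical.

```python
from collections import Counter


def label_distribution(examples):
    """Return the fraction of tokens per entity type, with B-/I- prefixes merged."""
    counts = Counter()
    for example in examples:
        for tag in example["ner_tags"]:
            # "B-PER" -> "PER", "I-LOC" -> "LOC", "O" stays "O"
            counts[tag.split("-", 1)[-1]] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy examples (made up for illustration, not taken from the dataset).
toy_examples = [
    {"ner_tags": ["B-PER", "O", "B-PER", "O"]},
    {"ner_tags": ["B-LOC", "I-LOC", "O", "O"]},
]
distribution = label_distribution(toy_examples)
```

On the toy input, half of the tokens are `O` and the remaining half is split evenly between `PER` and `LOC`.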
Vocabulary overlap between splits: the number of unique words shared by the row split and the column split, divided by the total number of unique words in the row split.
Row split | train | validation | test |
---|---|---|---|
train | 1.000 | 0.773 | 0.417 |
validation | 0.356 | 1.000 | 0.247 |
test | 0.434 | 0.560 | 1.000 |
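The overlap measure described above can be sketched as follows. This is an illustration under stated assumptions: "words" are taken to be the raw token strings, with no lowercasing or lemmatization, and the helper name `vocab_overlap` and the toy splits are hypothetical.

```python
def vocab_overlap(row_tokens, col_tokens):
    """Share of the row split's unique words that also occur in the column split."""
    row_vocab = set(row_tokens)
    col_vocab = set(col_tokens)
    if not row_vocab:
        return 0.0
    return len(row_vocab & col_vocab) / len(row_vocab)


# Toy splits (made-up words, not taken from the dataset).
splits = {
    "train": ["a", "b", "c", "d"],
    "validation": ["a", "b", "e"],
    "test": ["a", "f"],
}

# One row per split, one column per split, as in the table layout.
matrix = {
    row: {col: vocab_overlap(splits[row], splits[col]) for col in splits}
    for row in splits
}
```

Because the denominator is the row split's vocabulary size, the matrix is not symmetric: in the toy data the train-to-validation overlap is 2/4 while the validation-to-train overlap is 2/3.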
{
"tokens":
["Меркел","и","Путин","обсъдиха","реализацията","на","проекта","\"","Северен","поток","-","2","\"","Канцлерът","на","Германия","Ангела","Меркел","и","президентът","на","Русия","Владимир","Путин","са","обсъдили","по","телефона","реализацията","на","проекта","\"","Северен","поток","-","2","\"",".","Както","съобщава","пресслужбата","на","Кремъл","лидерите","са","потвърдили","позициите","на","страните","относно","разширението","на","действащия","магистрален","газопровод",".","По","-","рано","компанията","\"","Норд","стрим","\"",",","която","води","строителството",",","получи","разрешения","от","Германия","и","Финландия",".","Швеция","и","Дания","също","се","канят","да","предоставят","своите","териториални","води","за","този","грандиозен","строеж",".","По","дъното","на","Балтийско","море","\"","Северен","поток","-","2","\"","ще","свърже","директно","Русия","и","Германия",".","Годишно","по","него","ще","минават","55","милиарда","кум",".","м","газ",".","Превод",":","Блиц",".","Публикувай","директно","сам","без","цензура","във","\"","Фейсбук","\"","групата","За","реклама","виж","-"],
"ner_tags":
["B-PER","O","B-PER","O","O","O","O","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","B-LOC","B-PER","I-PER","O","O","O","B-LOC","B-PER","I-PER","O","O","O","O","O","O","O","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","O","O","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-ORG","I-ORG","O","O","O","O","O","O","O","O","O","B-LOC","O","B-LOC","O","B-LOC","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-LOC","I-LOC","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","O","B-LOC","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-ORG","O","O","O","O","O","O","O","O","B-PRO","O","O","O","O","O","O"],
"ID":"NordStream-00229243",
"langs":["bg"]
}
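Since the task asks for the set of entity mentions in a document rather than their positions, a record like the one above can be reduced to unique (mention, type) pairs. The sketch below assumes BIO-style tags as in the sample and joins the tokens of a mention with single spaces, which is a simplification of real detokenization. The function name `extract_mentions` is hypothetical, and the input is the first 13 tokens of the sample record.

```python
def extract_mentions(tokens, ner_tags):
    """Collect (mention text, entity type) pairs from parallel token and BIO tag lists."""
    mentions = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, ner_tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A B- tag (or a stray I- tag of a different type) starts a new mention.
            if current_tokens:
                mentions.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I":
            # Continuation of the current mention.
            current_tokens.append(token)
        else:
            # "O" closes any open mention.
            if current_tokens:
                mentions.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        mentions.append((" ".join(current_tokens), current_type))
    return mentions


# First 13 tokens and tags of the sample record.
tokens = ["Меркел", "и", "Путин", "обсъдиха", "реализацията", "на", "проекта",
          "\"", "Северен", "поток", "-", "2", "\""]
ner_tags = ["B-PER", "O", "B-PER", "O", "O", "O", "O",
            "O", "B-PRO", "I-PRO", "I-PRO", "I-PRO", "O"]

mentions = extract_mentions(tokens, ner_tags)
# Document-level view: drop repeated mentions while keeping first-seen order.
unique_mentions = list(dict.fromkeys(mentions))
```

Deduplicating with `dict.fromkeys` keeps the first-seen order, which makes the output easier to compare against the text; a plain `set` would serve equally well if order does not matter.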
[1] The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages (Piskorski et al., BSNLP 2017)
@inproceedings{piskorski-etal-2017-first,
title = "The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in {S}lavic Languages",
author = "Piskorski, Jakub and
Pivovarova, Lidia and
{\v{S}}najder, Jan and
Steinberger, Josef and
Yangarber, Roman",
booktitle = "Proceedings of the 6th Workshop on {B}alto-{S}lavic Natural Language Processing",
month = apr,
year = "2017",
address = "Valencia, Spain",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W17-1412",
doi = "10.18653/v1/W17-1412",
pages = "76--85",
}
[2] The Second Cross-Lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic Languages (Piskorski et al., BSNLP 2019)
@inproceedings{piskorski-etal-2019-second,
title = "The Second Cross-Lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across {S}lavic Languages",
author = "Piskorski, Jakub and
Laskova, Laska and
Marci{\'n}czuk, Micha{\l} and
Pivovarova, Lidia and
P{\v{r}}ib{\'a}{\v{n}}, Pavel and
Steinberger, Josef and
Yangarber, Roman",
booktitle = "Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing",
month = aug,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W19-3709",
doi = "10.18653/v1/W19-3709",
pages = "63--74",
}
[3] Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic Languages (Piskorski et al., BSNLP 2021)
@inproceedings{piskorski-etal-2021-slav,
title = "Slav-{NER}: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across {S}lavic Languages",
author = "Piskorski, Jakub and
Babych, Bogdan and
Kancheva, Zara and
Kanishcheva, Olga and
Lebedeva, Maria and
Marci{\'n}czuk, Micha{\l} and
Nakov, Preslav and
Osenova, Petya and
Pivovarova, Lidia and
Pollak, Senja and
P{\v{r}}ib{\'a}{\v{n}}, Pavel and
Radev, Ivaylo and
Robnik-Sikonja, Marko and
Starko, Vasyl and
Steinberger, Josef and
Yangarber, Roman",
booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
month = apr,
year = "2021",
address = "Kyiv, Ukraine",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.bsnlp-1.15",
pages = "122--133",
}