BSNLP Named Entity Recognition

The task focuses on cross-lingual, document-level extraction of named entities. Systems should recognize, classify, and extract all named entity mentions in a document; detecting the position of each mention is not required.

Identifier	Task Type	Metric	License	Website	Code	Download
BSNLP	Named Entity Recognition	F1 (macro)
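
To make the metric concrete, the following is a minimal sketch of macro-averaged F1, that is, the unweighted mean of per-label F1 scores. The function name `macro_f1` and the choice to score sets of (label, mention) pairs are assumptions made for illustration; the source does not state whether the benchmark scores tokens or mentions.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 over (label, item) pairs.

    `gold` and `pred` are iterables of (label, item) tuples. What an
    "item" is (token, mention string, ...) is an assumption of this sketch.
    """
    gold_set, pred_set = set(gold), set(pred)
    scores = []
    for label in labels:
        g = {pair for pair in gold_set if pair[0] == label}
        p = {pair for pair in pred_set if pair[0] == label}
        tp = len(g & p)  # pairs that are both predicted and correct
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0


gold = [("PER", "Меркел"), ("LOC", "Германия")]
pred = [("PER", "Меркел"), ("LOC", "Русия")]
score = macro_f1(gold, pred, labels=["PER", "LOC"])
```

Because every label contributes equally, rare types such as EVT and PRO weigh as much as frequent ones such as LOC.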

Data Source

The input text collection consists of sets of documents retrieved from the Web, each set focusing on a particular entity or event.

Data Description

#	Train	Dev	Test
Examples	724	182	301

High-level characteristics

#Documents: 1,207
#Sentences: 18,206
#Tokens: 431,592

Topic Distribution

Topic	Examples
Brexit	598
Covid19	151
USElection2020	150
NordStream	130
AsiaBibi	94
Ryanair	84
Total	1,207

Label Distribution

	train	validation	test
O	0.908	0.908	0.929
LOC	0.029	0.027	0.020
ORG	0.022	0.023	0.012
PER	0.020	0.021	0.025
EVT	0.008	0.008	0.007
PRO	0.005	0.005	0.004
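
A distribution of this kind could be computed roughly as follows. This is a sketch under two assumptions not confirmed by the source: that the fractions are token-level, and that the `B-`/`I-` prefixes are merged into a single label per entity type. The function name `label_distribution` is hypothetical.

```python
from collections import Counter


def label_distribution(examples):
    """Return the fraction of tokens carrying each label in one split."""
    counts = Counter()
    for example in examples:
        for tag in example["ner_tags"]:
            # Strip the BIO prefix so "B-LOC" and "I-LOC" both count as "LOC";
            # the plain "O" tag is left unchanged.
            counts[tag.split("-", 1)[-1]] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


split = [{"ner_tags": ["B-PER", "O", "B-LOC", "I-LOC"]}]
dist = label_distribution(split)
```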

Vocabulary Overlap

Each cell is the number of unique words shared by the row split and the column split, divided by the number of unique words in the row split.

	train	validation	test
train	1.000	0.773	0.417
validation	0.356	1.000	0.247
test	0.434	0.560	1.000
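
The definition above can be sketched as below. The helper names `vocab` and `overlap_matrix` are hypothetical, and treating each case-sensitive token as a "word" is an assumption; the source does not say whether any normalization was applied.

```python
def vocab(examples):
    """Collect the set of unique tokens in a split."""
    return {token for example in examples for token in example["tokens"]}


def overlap_matrix(splits):
    """Map row split -> column split -> |V_row & V_col| / |V_row|."""
    vocabs = {name: vocab(examples) for name, examples in splits.items()}
    matrix = {}
    for row, v_row in vocabs.items():
        matrix[row] = {}
        for col, v_col in vocabs.items():
            shared = len(v_row & v_col)
            # Normalize by the row vocabulary, as the definition states.
            matrix[row][col] = shared / len(v_row) if v_row else 0.0
    return matrix


splits = {
    "train": [{"tokens": ["a", "b", "c", "d"]}],
    "test": [{"tokens": ["a", "b", "x"]}],
}
matrix = overlap_matrix(splits)
```

Since the denominator is the row vocabulary, the matrix is generally not symmetric, and the diagonal is always 1.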

Example

{
	"tokens":
      ["Меркел","и","Путин","обсъдиха","реализацията","на","проекта","\"","Северен","поток","-","2","\"","Канцлерът","на","Германия","Ангела","Меркел","и","президентът","на","Русия","Владимир","Путин","са","обсъдили","по","телефона","реализацията","на","проекта","\"","Северен","поток","-","2","\"",".","Както","съобщава","пресслужбата","на","Кремъл","лидерите","са","потвърдили","позициите","на","страните","относно","разширението","на","действащия","магистрален","газопровод",".","По","-","рано","компанията","\"","Норд","стрим","\"",",","която","води","строителството",",","получи","разрешения","от","Германия","и","Финландия",".","Швеция","и","Дания","също","се","канят","да","предоставят","своите","териториални","води","за","този","грандиозен","строеж",".","По","дъното","на","Балтийско","море","\"","Северен","поток","-","2","\"","ще","свърже","директно","Русия","и","Германия",".","Годишно","по","него","ще","минават","55","милиарда","кум",".","м","газ",".","Превод",":","Блиц",".","Публикувай","директно","сам","без","цензура","във","\"","Фейсбук","\"","групата","За","реклама","виж","-"],
	"ner_tags":
      ["B-PER","O","B-PER","O","O","O","O","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","B-LOC","B-PER","I-PER","O","O","O","B-LOC","B-PER","I-PER","O","O","O","O","O","O","O","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","O","O","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-ORG","I-ORG","O","O","O","O","O","O","O","O","O","B-LOC","O","B-LOC","O","B-LOC","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-LOC","I-LOC","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","O","B-LOC","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-ORG","O","O","O","O","O","O","O","O","B-PRO","O","O","O","O","O","O"],
	"ID":"NordStream-00229243",
	"langs":["bg"]
}
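
Because the task asks for the entity mentions themselves rather than their positions, the token-level tags in a record like the one above can be collapsed into (type, mention) pairs. The following is a minimal sketch; the function name `extract_entities` is hypothetical, and joining tokens with single spaces only approximates the original surface form.

```python
def extract_entities(tokens, tags):
    """Collapse BIO tags into a list of (entity_type, mention_text) pairs."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Emit the mention collected so far, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue an open mention
            # (such stray tags are dropped in this sketch).
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities


tokens = ["Канцлерът", "на", "Германия", "Ангела", "Меркел"]
tags = ["O", "O", "B-LOC", "B-PER", "I-PER"]
mentions = extract_entities(tokens, tags)
```

Wrapping the result in `set(...)` gives a document-level inventory of mentions, with repeated mentions counted once.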

Citation

[1] The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages (Piskorski et al., BSNLP 2017)

@inproceedings{piskorski-etal-2017-first,
    title = "The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in {S}lavic Languages",
    author = "Piskorski, Jakub  and
      Pivovarova, Lidia  and
      {\v{S}}najder, Jan  and
      Steinberger, Josef  and
      Yangarber, Roman",
    booktitle = "Proceedings of the 6th Workshop on {B}alto-{S}lavic Natural Language Processing",
    month = apr,
    year = "2017",
    address = "Valencia, Spain",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-1412",
    doi = "10.18653/v1/W17-1412",
    pages = "76--85",
}

[2] The Second Cross-Lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic Languages (Piskorski et al., BSNLP 2019)

@inproceedings{piskorski-etal-2019-second,
    title = "The Second Cross-Lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across {S}lavic Languages",
    author = "Piskorski, Jakub  and
      Laskova, Laska  and
      Marci{\'n}czuk, Micha{\l}  and
      Pivovarova, Lidia  and
      P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
      Steinberger, Josef  and
      Yangarber, Roman",
    booktitle = "Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-3709",
    doi = "10.18653/v1/W19-3709",
    pages = "63--74",
}

[3] Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic Languages (Piskorski et al., BSNLP 2021)

@inproceedings{piskorski-etal-2021-slav,
    title = "Slav-{NER}: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across {S}lavic Languages",
    author = "Piskorski, Jakub  and
      Babych, Bogdan  and
      Kancheva, Zara  and
      Kanishcheva, Olga  and
      Lebedeva, Maria  and
      Marci{\'n}czuk, Micha{\l}  and
      Nakov, Preslav  and
      Osenova, Petya  and
      Pivovarova, Lidia  and
      Pollak, Senja  and
      P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
      Radev, Ivaylo  and
      Robnik-{\v{S}}ikonja, Marko  and
      Starko, Vasyl  and
      Steinberger, Josef  and
      Yangarber, Roman",
    booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Kyiv, Ukraine",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.bsnlp-1.15",
    pages = "122--133",
}

License

None.