BSNLP Named Entity Recognition

The task focuses on cross-lingual, document-level extraction of named entities: systems should recognize, classify, and extract all named entity mentions in a document. Detecting the position of each mention is not required.

Identifier: BSNLP
Task Type: Named Entity Recognition
Metric: F1 (macro)
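
The metric listed above is macro-averaged F1. As a rough illustration of what macro averaging means, the sketch below computes F1 separately for each entity type and takes the unweighted mean. The function name `macro_f1`, the representation of entities as (mention, type) pairs, and the toy data are assumptions made for this example; the scorer actually used for this dataset may match and aggregate entities differently.

```python
from collections import defaultdict

def macro_f1(gold, pred):
    """Unweighted mean of per-type F1 over entity types seen in gold or pred.

    gold, pred: sets of (mention, entity_type) pairs for one document
    (a hypothetical representation, not an official scorer format).
    """
    gold_by_type = defaultdict(set)
    pred_by_type = defaultdict(set)
    for mention, etype in gold:
        gold_by_type[etype].add(mention)
    for mention, etype in pred:
        pred_by_type[etype].add(mention)

    scores = []
    for etype in sorted(set(gold_by_type) | set(pred_by_type)):
        g, p = gold_by_type[etype], pred_by_type[etype]
        tp = len(g & p)  # mentions predicted with the correct type
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0

# Toy data for illustration only.
gold = {("Меркел", "PER"), ("Путин", "PER"), ("Германия", "LOC")}
pred = {("Меркел", "PER"), ("Германия", "LOC"), ("Кремъл", "LOC")}
score = macro_f1(gold, pred)
```

Because every type contributes equally to the mean, rare types such as EVT and PRO weigh as much as frequent ones such as LOC.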

Data Source

The input text collection consists of sets of documents retrieved from the Web, each set being about a certain entity or event.

Data Description

# Train Dev Test
Examples 724 182 301

High-level characteristics

  • #Documents: 1,207
  • #Sentences: 18,206
  • #Tokens: 431,592

Topic Distribution

Topic Examples
Brexit 598
Covid19 151
USElection2020 150
NordStream 130
AsiaBibi 94
Ryanair 84
Total 1,207
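
A table like the one above could be recomputed from the records themselves. The sketch below assumes that every record ID follows the `<topic>-<number>` pattern seen in the example record of this card (`NordStream-00229243`); the helper name `topic_counts` and all IDs except that one are made up for illustration.

```python
from collections import Counter

def topic_counts(records):
    """Count documents per topic, taking the topic as the ID text before the last hyphen."""
    return Counter(rec["ID"].rsplit("-", 1)[0] for rec in records)

records = [
    {"ID": "NordStream-00229243"},  # ID taken from the example record
    {"ID": "NordStream-00000001"},  # made-up ID for illustration
    {"ID": "Brexit-00000002"},      # made-up ID for illustration
]
counts = topic_counts(records)
```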

Label Distribution

Label train validation test
O 0.908 0.908 0.929
LOC 0.029 0.027 0.020
ORG 0.022 0.023 0.012
PER 0.020 0.021 0.025
EVT 0.008 0.008 0.007
PRO 0.005 0.005 0.004
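
The fractions above look like token-level label frequencies with the `B-`/`I-` prefixes collapsed. That reading is an assumption, as is the helper name `label_distribution`; under it, a minimal sketch of the computation for one split could be:

```python
from collections import Counter

def label_distribution(records):
    """Fraction of tokens per entity type, with B-/I- prefixes collapsed."""
    counts = Counter()
    for rec in records:
        for tag in rec["ner_tags"]:
            counts[tag.split("-", 1)[-1]] += 1  # "B-PER" -> "PER", "O" -> "O"
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}

# Toy record for illustration only.
records = [{"ner_tags": ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "O", "O"]}]
dist = label_distribution(records)
```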

Vocabulary Overlap

Each cell gives the number of unique words shared by the row split and the column split, divided by the total number of unique words in the row split.

  train validation test
train 1.000 0.773 0.417
validation 0.356 1.000 0.247
test 0.434 0.560 1.000
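
The definition above can be sketched as a small function. It assumes that "words" are the raw tokens with no lowercasing or other normalization, which the card does not specify; `vocab_overlap` is a hypothetical helper name and the token lists are toy data.

```python
def vocab_overlap(row_tokens, col_tokens):
    """Share of the row split's unique words that also occur in the column split."""
    row_vocab, col_vocab = set(row_tokens), set(col_tokens)
    return len(row_vocab & col_vocab) / len(row_vocab) if row_vocab else 0.0

# Toy token lists for illustration only.
train = ["Меркел", "и", "Путин", "обсъдиха"]
test = ["Путин", "и", "Германия"]

train_vs_test = vocab_overlap(train, test)  # 2 shared / 4 unique in train
test_vs_train = vocab_overlap(test, train)  # 2 shared / 3 unique in test
```

The measure is asymmetric because the denominator is always the vocabulary of the row split, which is why the table is not mirrored across its diagonal.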

Example

{
	"tokens":
      ["Меркел","и","Путин","обсъдиха","реализацията","на","проекта","\"","Северен","поток","-","2","\"","Канцлерът","на","Германия","Ангела","Меркел","и","президентът","на","Русия","Владимир","Путин","са","обсъдили","по","телефона","реализацията","на","проекта","\"","Северен","поток","-","2","\"",".","Както","съобщава","пресслужбата","на","Кремъл","лидерите","са","потвърдили","позициите","на","страните","относно","разширението","на","действащия","магистрален","газопровод",".","По","-","рано","компанията","\"","Норд","стрим","\"",",","която","води","строителството",",","получи","разрешения","от","Германия","и","Финландия",".","Швеция","и","Дания","също","се","канят","да","предоставят","своите","териториални","води","за","този","грандиозен","строеж",".","По","дъното","на","Балтийско","море","\"","Северен","поток","-","2","\"","ще","свърже","директно","Русия","и","Германия",".","Годишно","по","него","ще","минават","55","милиарда","кум",".","м","газ",".","Превод",":","Блиц",".","Публикувай","директно","сам","без","цензура","във","\"","Фейсбук","\"","групата","За","реклама","виж","-"],
	"ner_tags":
      ["B-PER","O","B-PER","O","O","O","O","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","B-LOC","B-PER","I-PER","O","O","O","B-LOC","B-PER","I-PER","O","O","O","O","O","O","O","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","O","O","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-ORG","I-ORG","O","O","O","O","O","O","O","O","O","B-LOC","O","B-LOC","O","B-LOC","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-LOC","I-LOC","O","B-PRO","I-PRO","I-PRO","I-PRO","O","O","O","O","B-LOC","O","B-LOC","O","O","O","O","O","O","O","O","O","O","O","O","O","O","O","B-ORG","O","O","O","O","O","O","O","O","B-PRO","O","O","O","O","O","O"],
	"ID":"NordStream-00229243",
	"langs":["bg"]
}
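
Since the task asks for the entities of a document rather than their positions, the token-level BIO tags of a record can be collapsed into a set of (mention, type) pairs. The sketch below shows one way to do that on an excerpt of the example record. The function name `decode_entities` is hypothetical, and joining tokens with single spaces is a simplification that does not restore the original spacing (for instance around hyphens or quotes).

```python
def decode_entities(tokens, ner_tags):
    """Group BIO-tagged tokens into (mention, type) pairs, in order of appearance."""
    entities, current_tokens, current_type = [], [], None
    for token, tag in zip(tokens, ner_tags):
        if tag.startswith("B-"):
            # A new mention starts; close the previous one if still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an I- tag that does not continue the open mention:
            # close the open mention; stray I- tokens are dropped.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities

# Excerpt of the example record (second sentence, tokens 13 to 23).
tokens = ["Канцлерът", "на", "Германия", "Ангела", "Меркел", "и",
          "президентът", "на", "Русия", "Владимир", "Путин"]
ner_tags = ["O", "O", "B-LOC", "B-PER", "I-PER", "O",
            "O", "O", "B-LOC", "B-PER", "I-PER"]

mentions = decode_entities(tokens, ner_tags)
document_entities = set(mentions)  # document-level view: positions and repeats discarded
```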

Citation

[1] The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages (Piskorski et al., BSNLP 2017)

@inproceedings{piskorski-etal-2017-first,
    title = "The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in {S}lavic Languages",
    author = "Piskorski, Jakub  and
      Pivovarova, Lidia  and
      {\v{S}}najder, Jan  and
      Steinberger, Josef  and
      Yangarber, Roman",
    booktitle = "Proceedings of the 6th Workshop on {B}alto-{S}lavic Natural Language Processing",
    month = apr,
    year = "2017",
    address = "Valencia, Spain",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-1412",
    doi = "10.18653/v1/W17-1412",
    pages = "76--85",
}

[2] The Second Cross-Lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic Languages (Piskorski et al., BSNLP 2019)

@inproceedings{piskorski-etal-2019-second,
    title = "The Second Cross-Lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across {S}lavic Languages",
    author = "Piskorski, Jakub  and
      Laskova, Laska  and
      Marci{\'n}czuk, Micha{\l}  and
      Pivovarova, Lidia  and
      P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
      Steinberger, Josef  and
      Yangarber, Roman",
    booktitle = "Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-3709",
    doi = "10.18653/v1/W19-3709",
    pages = "63--74",
}

[3] Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic Languages (Piskorski et al., BSNLP 2021)

@inproceedings{piskorski-etal-2021-slav,
    title = "Slav-{NER}: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across {S}lavic Languages",
    author = "Piskorski, Jakub  and
      Babych, Bogdan  and
      Kancheva, Zara  and
      Kanishcheva, Olga  and
      Lebedeva, Maria  and
      Marci{\'n}czuk, Micha{\l}  and
      Nakov, Preslav  and
      Osenova, Petya  and
      Pivovarova, Lidia  and
      Pollak, Senja  and
      P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
      Radev, Ivaylo  and
      Robnik-Sikonja, Marko  and
      Starko, Vasyl  and
      Steinberger, Josef  and
      Yangarber, Roman",
    booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Kyiv, Ukraine",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.bsnlp-1.15",
    pages = "122--133",
}

License

None.