Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 200 contributors producing more than 100 treebanks in over 70 languages.
Identifier | Task Type | Metric | License | Website | Code | Download |
---|---|---|---|---|---|---|
U.Dep | POS Tagging | F1 (macro) | CC BY-NC-SA 3.0 | | | |
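The task is scored with macro-averaged F1 over POS tags. Below is a minimal sketch of that metric. It assumes that gold and predicted tags are aligned token by token and that the average runs over every tag occurring in either sequence; the benchmark's exact label set is not stated here. `macro_f1` is a hypothetical helper name.

```python
from collections import Counter


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-tag F1, over every tag seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token by token")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct tag
        else:
            fp[p] += 1  # predicted tag was wrong
            fn[g] += 1  # gold tag was missed
    scores = []
    for tag in set(gold) | set(pred):
        precision = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        recall = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: one VERB mistagged as NOUN.
gold = ["ADP", "NOUN", "PUNCT", "VERB"]
pred = ["ADP", "NOUN", "PUNCT", "NOUN"]
print(round(macro_f1(gold, pred), 4))  # 0.6667
```

Every tag counts equally, so a single error on a rare tag such as VERB in this toy case pulls the score down far more than it would under accuracy. Averaging over the union of gold and predicted labels, with zero-division cases scored as 0, mirrors the default behaviour of scikit-learn's `f1_score(..., average="macro")`.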
UD_Bulgarian-BTB is based on the HPSG-based BulTreeBank, created at the Institute of Information and Communication Technologies, Bulgarian Academy of Sciences. The original BulTreeBank consists of 215,000 tokens (over 15,000 sentences).
All texts were processed automatically at the tokenization, morphological, and chunk levels; the full syntactic analysis was then performed manually by trained annotators.
# | Train | Dev | Test |
---|---|---|---|
Examples | 8,907 | 1,115 | 1,116 |
Relative frequency of each UPOS tag in each split:
  | train | validation | test |
---|---|---|---|
NOUN | 0.218 | 0.219 | 0.222 |
PUNCT | 0.141 | 0.141 | 0.144 |
ADP | 0.141 | 0.142 | 0.142 |
VERB | 0.110 | 0.111 | 0.108 |
ADJ | 0.087 | 0.087 | 0.088 |
PRON | 0.065 | 0.066 | 0.062 |
AUX | 0.056 | 0.055 | 0.056 |
PROPN | 0.055 | 0.052 | 0.051 |
ADV | 0.042 | 0.040 | 0.043 |
CCONJ | 0.031 | 0.032 | 0.030 |
DET | 0.015 | 0.018 | 0.017 |
NUM | 0.013 | 0.013 | 0.014 |
PART | 0.013 | 0.013 | 0.012 |
SCONJ | 0.010 | 0.011 | 0.010 |
INTJ | 0.001 | 0.001 | 0.001 |
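The proportions in the table above can be recomputed from the CoNLL-U files. Below is a minimal sketch. It assumes that each proportion is the count of one UPOS tag (the 4th column) divided by all syntactic words in the split, and it skips comment lines, multiword-token ranges, and empty nodes. `upos_distribution` and the file name in the usage comment are hypothetical.

```python
from collections import Counter


def upos_distribution(conllu_text: str) -> dict[str, float]:
    """Relative frequency of each UPOS tag over the syntactic words in a CoNLL-U string."""
    counts: Counter[str] = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separators and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes carry no UPOS of their own
        counts[cols[3]] += 1  # UPOS is the 4th column
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.most_common()} if total else {}


# Usage (the file path is hypothetical):
# with open("bg_btb-ud-train.conllu", encoding="utf-8") as f:
#     for tag, share in upos_distribution(f.read()).items():
#         print(f"{tag}\t{share:.3f}")
```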
Vocabulary overlap between splits: the number of unique words shared by the row and column splits, divided by the number of unique words in the column split.
  | train | validation | test |
---|---|---|---|
train | 1.000 | 0.671 | 0.689 |
validation | 0.163 | 1.000 | 0.327 |
test | 0.165 | 0.321 | 1.000 |
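The overlap matrix is asymmetric because each cell is normalized by the vocabulary size of only one of the two splits. Below is a minimal sketch. It assumes that "words" means unique FORM values (the 2nd CoNLL-U column), compared case-sensitively. The `by` parameter selects whether the row split or the column split supplies the denominator. Both helper names are hypothetical.

```python
def vocabulary(conllu_text: str) -> set[str]:
    """Unique word forms (FORM, 2nd column) in a CoNLL-U string."""
    words = set()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # sentence separators and comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes
        words.add(cols[1])
    return words


def overlap_matrix(vocabs: dict[str, set[str]], by: str = "row") -> dict[str, dict[str, float]]:
    """Shared unique words of each (row, col) pair, divided by the row or column vocabulary size."""
    if by not in ("row", "col"):
        raise ValueError('by must be "row" or "col"')
    matrix: dict[str, dict[str, float]] = {}
    for row, row_vocab in vocabs.items():
        matrix[row] = {}
        for col, col_vocab in vocabs.items():
            denom = len(row_vocab) if by == "row" else len(col_vocab)
            matrix[row][col] = len(row_vocab & col_vocab) / denom if denom else 0.0
    return matrix


# Toy vocabularies: a large split and a small one.
vocabs = {"train": {"a", "b", "c", "d"}, "test": {"a", "e"}}
print(overlap_matrix(vocabs, by="row")["train"]["test"])  # 0.25
print(overlap_matrix(vocabs, by="col")["train"]["test"])  # 0.5
```

The training vocabulary is far larger than the evaluation vocabularies, so the two conventions give very different numbers for the same pair. Check the direction before comparing against published figures.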
An example sentence in CoNLL-U format ("In the discussion, I suppose, important issues will be touched upon."); the format separates the ten fields of each token line with tabs:
```
# newdoc id = akadgram
# sent_id = akadgram-s2
# text = В дискусията, предполагам, ще се засегнат важни въпроси.
1 В в ADP R _ 2 case 2:case _
2 дискусията дискусия NOUN Ncfsd Definite=Def|Gender=Fem|Number=Sing 8 obl 8:obl:в SpaceAfter=No
3 , , PUNCT punct _ 4 punct 4:punct _
4 предполагам предполагам VERB Vpitf-r1s Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 8 advcl 8:advcl SpaceAfter=No
5 , , PUNCT punct _ 4 punct 4:punct _
6 ще ще AUX Tx _ 8 aux 8:aux _
7 се се PRON Ppxta Case=Acc|PronType=Prs|Reflex=Yes 8 expl 8:expl _
8 засегнат засегна-(се) VERB Vpptf-r3p Aspect=Perf|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act 0 root 0:root _
9 важни важен ADJ A-pi Definite=Ind|Degree=Pos|Number=Plur 10 amod 10:amod _
10 въпроси въпрос NOUN Ncmpi Definite=Ind|Gender=Masc|Number=Plur 8 nsubj:pass 8:nsubj:pass SpaceAfter=No
11 . . PUNCT punct _ 8 punct 8:punct _
```
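In the sample sentence, `SpaceAfter=No` in the MISC column (the 10th column) marks tokens that are not followed by whitespace. This flag is what lets the `# text` line be rebuilt from the tokens. Below is a minimal sketch of that reconstruction. It assumes tab-separated input, as CoNLL-U requires. `reconstruct_text` is a hypothetical helper. If multiword-token ranges are present, each range contributes its own surface form in place of the words it spans.

```python
def reconstruct_text(sentence: str) -> str:
    """Rebuild a sentence's surface text from FORM (2nd column) and SpaceAfter=No in MISC (10th column)."""
    pieces: list[str] = []
    covered_until = 0  # last word ID already written out via a multiword-token range
    for line in sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments
        cols = line.split("\t")
        token_id, form, misc = cols[0], cols[1], cols[9]
        if "." in token_id:
            continue  # empty nodes have no surface form
        if "-" in token_id:
            covered_until = int(token_id.split("-")[1])  # the range's form stands for its words
        elif int(token_id) <= covered_until:
            continue  # word already covered by its multiword token
        pieces.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


# Usage on the sample sentence; only FORM and MISC matter here, so the other columns are "_".
rows = [("В", "_"), ("дискусията", "SpaceAfter=No"), (",", "_"), ("предполагам", "SpaceAfter=No"),
        (",", "_"), ("ще", "_"), ("се", "_"), ("засегнат", "_"), ("важни", "_"),
        ("въпроси", "SpaceAfter=No"), (".", "_")]
sample = "\n".join(
    "\t".join([str(i), form, "_", "_", "_", "_", "_", "_", "_", misc])
    for i, (form, misc) in enumerate(rows, start=1)
)
print(reconstruct_text(sample))  # В дискусията, предполагам, ще се засегнат важни въпроси.
```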
[1] Petya Osenova and Kiril Simov. BTB-TR05: BulTreeBank Stylebook. BulTreeBank Project Technical Report № 05. 2004.
@techreport{OsenovaSimov2004,
author = {Petya Osenova and Kiril Simov},
title = {BTB-TR05: BulTreeBank Stylebook},
institution = {BulTreeBank Project},
number = {05},
year = {2004},
url = {http://www.bultreebank.org/TechRep/BTB-TR05.pdf}
}
[2] Kiril Simov and Petya Osenova. 2003. Practical Annotation Scheme for an HPSG Treebank of Bulgarian. In Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003.
@inproceedings{simov-osenova-2003-practical,
title = "Practical Annotation Scheme for an {HPSG} Treebank of {B}ulgarian",
author = "Simov, Kiril and Osenova, Petya",
booktitle = "Proceedings of 4th International Workshop on Linguistically Interpreted Corpora ({LINC}-03) at {EACL} 2003",
year = "2003",
url = "https://aclanthology.org/W03-2403",
}
[3] Kiril Simov, Gergana Popova, Petya Osenova. HPSG-based syntactic treebank of Bulgarian (BulTreeBank). In: “A Rainbow of Corpora: Corpus Linguistics and the Languages of the World”, edited by Andrew Wilson, Paul Rayson, and Tony McEnery; Lincom-Europa, Munich 2002, pp. 135-142.
@incollection{SimovOsPo2002,
author = {Kiril Simov and Gergana Popova and Petya Osenova},
title = {HPSG-based syntactic treebank of Bulgarian (BulTreeBank)},
booktitle = {A Rainbow of Corpora: Corpus Linguistics and the Languages of the World},
editor = {Andrew Wilson and Paul Rayson and Tony McEnery},
publisher = {Lincom-Europa},
pages = {135--142},
year = {2002},
}
[4] Kiril Simov, Petya Osenova and Milena Slavcheva. BTB-TR03: BulTreeBank Morphosyntactic Tagset. BulTreeBank Project Technical Report № 03. 2004.
@techreport{SimovOseSlav2004,
author = {Kiril Simov and Petya Osenova and Milena Slavcheva},
title = {BTB-TR03: BulTreeBank Morphosyntactic Tagset},
institution = {BulTreeBank Project},
number = {03},
year = {2004},
url = {http://www.bultreebank.org/TechRep/BTB-TR03.pdf}
}
[5] Kiril Simov, Petya Osenova, Alexander Simov, Milen Kouylekov. Design and Implementation of the Bulgarian HPSG-based Treebank. In Erhard Hinrichs and Kiril Simov, editors, Journal of Research on Language and Computation, Special Issue, Kluwer Academic Publishers, 2005, pp. 495-522.
@article{SimOsSimKo2005,
author = {Kiril Simov and Petya Osenova and Alexander Simov and Milen Kouylekov},
title = {Design and Implementation of the Bulgarian HPSG-based Treebank},
journal = {Journal of Research on Language and Computation. Special Issue},
year = {2005},
pages = {495--522},
publisher = {Kluwer Academic Publishers},
}
Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0).