The dataset targets fine-grained sentiment analysis of movie reviews. It was collected automatically from a Bulgarian ticket-booking website.
Identifier | Task Type | Metric | License |
---|---|---|---|
Cinexio | Sentiment Analysis | Pearson-Spearman Corr | Research |
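The metric column names both Pearson and Spearman correlation. The following is a minimal sketch of how the two could be computed for gold versus predicted ratings in pure Python. The function names are hypothetical, and reporting the arithmetic mean of the two coefficients is an assumption made here, not something the table states.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pearson_spearman(gold, predicted):
    p = pearson(gold, predicted)
    s = spearman(gold, predicted)
    # ASSUMPTION: the combined score is the plain mean of the two coefficients.
    return {"pearson": p, "spearman": s, "mean": (p + s) / 2}
```

Ratings in this dataset are heavily tied (many reviews share the same star value), which is why the rank helper averages tied positions instead of breaking ties arbitrarily.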
The data comes from a Bulgarian movie review website (Cinexio). The labels are derived automatically from the users' ratings.
The splits are made by movie ID (URL), so no movie has comments in more than one subset of the data.
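The movie-level split can be checked with a few lines of code. This is a sketch only: it assumes each subset has been loaded as a list of dictionaries carrying the `Cinexio_URL` field seen in the sample record, and `shared_movies` is a hypothetical helper name.

```python
def shared_movies(*splits):
    """Return the movie URLs that occur in more than one split.

    Each split is an iterable of records (dicts) with a "Cinexio_URL" key.
    An empty result means the splits are disjoint at the movie level.
    """
    first_seen = {}  # URL -> index of the split where it first appeared
    leaked = set()
    for index, records in enumerate(splits):
        for record in records:
            url = record["Cinexio_URL"]
            owner = first_seen.setdefault(url, index)
            if owner != index:
                leaked.add(url)
    return leaked
```

Calling `shared_movies(train, dev, test)` on correctly split data should return an empty set; any URL it returns identifies a movie whose comments leak across subsets.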
# | Train | Dev | Test |
---|---|---|---|
Examples | 8,155 | 811 | 861 |
Number of unique movies in each subset:
Share of examples with each user rating in each subset:
Rating | train | validation | test |
---|---|---|---|
0.0 | 0.003 | 0.006 | 0.008 |
0.5 | 0.016 | 0.036 | 0.038 |
1.0 | 0.054 | 0.086 | 0.086 |
1.5 | 0.005 | 0.009 | 0.003 |
2.0 | 0.041 | 0.051 | 0.051 |
2.5 | 0.013 | 0.010 | 0.016 |
3.0 | 0.083 | 0.067 | 0.080 |
3.5 | 0.031 | 0.021 | 0.031 |
4.0 | 0.149 | 0.120 | 0.127 |
4.5 | 0.052 | 0.052 | 0.048 |
5.0 | 0.553 | 0.544 | 0.510 |
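The rows above run over the rating values 0.0 to 5.0 and each column sums to roughly 1, so the table reads as the share of examples per rating. Under that reading, a minimal sketch for recomputing one column from loaded records could look as follows; `rating_distribution` is a hypothetical name and the `User_Rating` key is taken from the sample record.

```python
from collections import Counter


def rating_distribution(records):
    """Map each rating to its share of the records, rounded to 3 decimals."""
    counts = Counter(record["User_Rating"] for record in records)
    total = sum(counts.values())
    return {rating: round(counts[rating] / total, 3) for rating in sorted(counts)}
```

Ratings that never occur in a subset are simply absent from the returned dictionary rather than reported as 0.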
Vocabulary overlap between subsets: each cell is the number of unique words shared by the row and column subsets, divided by the total number of unique words in the column subset.
  | train | validation | test |
---|---|---|---|
train | 1.000 | 0.677 | 0.684 |
validation | 0.172 | 1.000 | 0.378 |
test | 0.183 | 0.396 | 1.000 |
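An overlap matrix of this kind can be sketched as below. The tokenization (lowercasing and extracting `\w+` runs) is an assumption, since the card does not say how words were split, and the function names are hypothetical. The denominator is exposed as a required `normalize_by` argument so that the normalization is always stated explicitly.

```python
import re


def vocabulary(comments):
    """Set of unique lowercased word tokens across all comments."""
    words = set()
    for comment in comments:
        # ASSUMPTION: a "word" is a maximal run of Unicode word characters.
        words.update(re.findall(r"\w+", comment.lower()))
    return words


def overlap_matrix(subsets, *, normalize_by):
    """Pairwise vocabulary overlap between named subsets.

    subsets: dict mapping a subset name to a list of comment strings.
    normalize_by: "row" or "column", the subset whose vocabulary size
    is used as the denominator of each cell.
    """
    if normalize_by not in ("row", "column"):
        raise ValueError("normalize_by must be 'row' or 'column'")
    vocab = {name: vocabulary(comments) for name, comments in subsets.items()}
    matrix = {}
    for row, row_vocab in vocab.items():
        matrix[row] = {}
        for col, col_vocab in vocab.items():
            denominator = len(row_vocab if normalize_by == "row" else col_vocab)
            shared = len(row_vocab & col_vocab)
            matrix[row][col] = shared / denominator if denominator else 0.0
    return matrix
```

Because the train subset is roughly ten times larger than the other two, the choice of denominator changes the off-diagonal values considerably, so it is worth stating alongside any reported numbers.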
```json
{
  "ID": 0,
  "Cinexio_URL": "http:\/\/www.cinexio.com\/sofia\/movie\/122",
  "Comment": "Пет звезди са му малко - заслужава поне още толкова :)",
  "User_Rating": 5.0,
  "Date": 1358726400000,
  "Category": 2
}
```
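A record like the one above can be read with the standard `json` module. The sketch below treats `Date` as milliseconds since the Unix epoch in UTC, which is an assumption based on the value's magnitude; the meaning of `Category` is not documented here, so it is left as a plain integer. `parse_record` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone

# The sample record, kept as a raw string so the escaped slashes survive.
RAW_RECORD = r'''{
"ID":0,
"Cinexio_URL":"http:\/\/www.cinexio.com\/sofia\/movie\/122",
"Comment":"Пет звезди са му малко - заслужава поне още толкова :)",
"User_Rating":5.0,
"Date":1358726400000,
"Category":2
}'''


def parse_record(raw):
    """Decode one JSON record and convert its Date field to a datetime."""
    record = json.loads(raw)  # "\/" is a valid JSON escape for "/"
    # ASSUMPTION: Date holds milliseconds since the Unix epoch, in UTC.
    record["Date"] = datetime.fromtimestamp(record["Date"] / 1000, tz=timezone.utc)
    return record


record = parse_record(RAW_RECORD)
print(record["Cinexio_URL"], record["User_Rating"], record["Date"].date())
```

Under the millisecond assumption, the sample timestamp decodes to 2013-01-21.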
[1] Borislav Kapukaranov and Preslav Nakov. 2015. Fine-Grained Sentiment Analysis for Movie Reviews in Bulgarian. In Proceedings of the International Conference Recent Advances in Natural Language Processing, pages 266–274, Hissar, Bulgaria. INCOMA Ltd. Shoumen, BULGARIA.
```bibtex
@inproceedings{kapukaranov-nakov-2015-fine,
    title = "Fine-Grained Sentiment Analysis for Movie Reviews in {B}ulgarian",
    author = "Kapukaranov, Borislav and
      Nakov, Preslav",
    booktitle = "Proceedings of the International Conference Recent Advances in Natural Language Processing",
    month = sep,
    year = "2015",
    address = "Hissar, Bulgaria",
    publisher = "INCOMA Ltd. Shoumen, BULGARIA",
    url = "https://aclanthology.org/R15-1036",
    pages = "266--274",
}
```
The dataset is licensed for research purposes.