Fine-Grained Sentiment Analysis for Movie Reviews in Bulgarian

The dataset targets fine-grained sentiment analysis of movie reviews. It was collected automatically from a Bulgarian ticket-booking website.

Identifier   Task Type            Metric                  License
Cinexio      Sentiment Analysis   Pearson-Spearman Corr   Research
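
The metric name suggests a score that combines the Pearson and Spearman correlations between gold ratings and predictions. The sketch below assumes the combined score is the plain mean of the two coefficients, as is common in GLUE-style benchmarks; that reading, the function names, and the toy values are assumptions rather than something this card specifies.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pearson_spearman(gold, predicted):
    """Assumed combined score: mean of the two coefficients."""
    return (pearson(gold, predicted) + spearman(gold, predicted)) / 2


gold = [5.0, 4.0, 1.0, 3.5, 5.0]       # hypothetical user ratings
predicted = [4.6, 4.1, 1.8, 3.0, 4.9]  # hypothetical model outputs
score = pearson_spearman(gold, predicted)
```

Averaging ranks for ties matters here because ratings take only eleven distinct values, so ties are frequent.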

Data Source

The reviews come from Cinexio, a Bulgarian movie review website. The labels are derived automatically from the users’ ratings.

Data Description

The splits are produced by movie ID (URL), so comments about the same movie never appear in two subsets of the data.

#          Train   Dev   Test
Examples   8,155   811   861
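
A movie-level split of this kind can be sketched as follows, assuming records shaped like the example entry in this card (with a Cinexio_URL field). The function name, the fractions, and the seed are illustrative only; this is not the procedure used to build the released splits.

```python
import random
from collections import defaultdict


def split_by_movie(records, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Assign whole movies (keyed by URL) to train/dev/test."""
    by_movie = defaultdict(list)
    for record in records:
        by_movie[record["Cinexio_URL"]].append(record)

    # Shuffle the movies, not the comments, so a movie stays in one subset.
    movies = sorted(by_movie)
    random.Random(seed).shuffle(movies)

    n_dev = round(len(movies) * dev_fraction)
    n_test = round(len(movies) * test_fraction)
    groups = {
        "dev": movies[:n_dev],
        "test": movies[n_dev:n_dev + n_test],
        "train": movies[n_dev + n_test:],
    }
    return {
        name: [r for url in urls for r in by_movie[url]]
        for name, urls in groups.items()
    }


# Hypothetical toy records: 10 movies with 3 comments each.
records = [
    {"Cinexio_URL": f"movie/{m}", "Comment": f"comment {c}"}
    for m in range(10) for c in range(3)
]
splits = split_by_movie(records)
```

Because the fractions apply to movies rather than comments, the number of examples per subset depends on how many comments each movie has.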

Domain Analysis

Number of unique movies in each subset:

  • Train: 257
  • Validation: 25
  • Test: 47

Label Distribution

Rating   train   validation   test
0.0      0.003   0.006        0.008
0.5      0.016   0.036        0.038
1.0      0.054   0.086        0.086
1.5      0.005   0.009        0.003
2.0      0.041   0.051        0.051
2.5      0.013   0.010        0.016
3.0      0.083   0.067        0.080
3.5      0.031   0.021        0.031
4.0      0.149   0.120        0.127
4.5      0.052   0.052        0.048
5.0      0.553   0.544        0.510
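
Each column above sums to roughly 1, so the values read as the fraction of examples carrying each rating. A minimal sketch of that computation, assuming a plain list of ratings as input (the function name and sample values are hypothetical):

```python
from collections import Counter


def label_distribution(ratings):
    """Fraction of examples per rating, rounded to three decimals."""
    counts = Counter(ratings)
    total = len(ratings)
    return {label: round(counts[label] / total, 3) for label in sorted(counts)}


ratings = [5.0, 5.0, 4.0, 3.0, 5.0, 0.5, 4.0, 5.0]  # hypothetical sample
distribution = label_distribution(ratings)
```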

Vocabulary Overlap

Each cell is the number of unique words shared by the row subset and the column subset, divided by the total number of unique words in the column subset.

             train   validation   test
train        1.000   0.677        0.684
validation   0.172   1.000        0.378
test         0.183   0.396        1.000
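
The overlap computation can be sketched as below. Two assumptions are made here. First, tokenization is lowercased whitespace splitting, which the card does not specify. Second, the denominator is the column subset's vocabulary, which is the reading consistent with the values above: the high values in the train row are only possible if the shared words are divided by the smaller validation or test vocabulary. Function names and toy comments are hypothetical.

```python
def vocabulary(comments):
    """Unique lowercased whitespace-separated tokens (assumed tokenization)."""
    return {token for comment in comments for token in comment.lower().split()}


def overlap_matrix(subsets):
    """cell[row][col] = shared unique words / unique words in col."""
    vocabs = {name: vocabulary(texts) for name, texts in subsets.items()}
    return {
        row: {
            col: round(len(vocabs[row] & vocabs[col]) / len(vocabs[col]), 3)
            for col in vocabs
        }
        for row in vocabs
    }


# Hypothetical toy comments standing in for the real subsets.
subsets = {
    "train": ["страхотен филм", "много слаб филм", "добър филм"],
    "validation": ["страхотен сюжет"],
}
matrix = overlap_matrix(subsets)
```

The matrix is asymmetric by construction: a large subset covers much of a small subset's vocabulary, while the reverse coverage is low.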

Example

{
   "ID":0,
   "Cinexio_URL":"http:\/\/www.cinexio.com\/sofia\/movie\/122",
   "Comment":"Пет звезди са му малко - заслужава поне още толкова :)",
   "User_Rating":5.0,
   "Date":1358726400000,
   "Category":2
}
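
A record like the one above can be read with the standard json module. The sketch assumes that Date is a Unix timestamp in milliseconds and that the last path segment of Cinexio_URL identifies the movie; neither is stated in this card, and the variable names are illustrative.

```python
import json
from datetime import datetime, timezone

# Raw string so the escaped slashes reach the JSON parser unchanged.
raw = r"""
{
   "ID":0,
   "Cinexio_URL":"http:\/\/www.cinexio.com\/sofia\/movie\/122",
   "Comment":"Пет звезди са му малко - заслужава поне още толкова :)",
   "User_Rating":5.0,
   "Date":1358726400000,
   "Category":2
}
"""

record = json.loads(raw)

# Assumption: the trailing URL segment is the movie identifier.
movie_id = record["Cinexio_URL"].rstrip("/").split("/")[-1]

# Assumption: Date counts milliseconds since the Unix epoch (UTC).
posted = datetime.fromtimestamp(record["Date"] / 1000, tz=timezone.utc)
```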

Citation

[1] Borislav Kapukaranov and Preslav Nakov. 2015. Fine-Grained Sentiment Analysis for Movie Reviews in Bulgarian. In Proceedings of the International Conference Recent Advances in Natural Language Processing, pages 266–274, Hissar, Bulgaria. INCOMA Ltd. Shoumen, BULGARIA.

@inproceedings{kapukaranov-nakov-2015-fine,
    title = "Fine-Grained Sentiment Analysis for Movie Reviews in {B}ulgarian",
    author = "Kapukaranov, Borislav  and
      Nakov, Preslav",
    booktitle = "Proceedings of the International Conference Recent Advances in Natural Language Processing",
    month = sep,
    year = "2015",
    address = "Hissar, Bulgaria",
    publisher = "INCOMA Ltd. Shoumen, BULGARIA",
    url = "https://aclanthology.org/R15-1036",
    pages = "266--274",
}

License

Research purposes.