XNLI: Evaluating Cross-lingual Sentence Representations
XNLI is a subset of a few thousand examples from MNLI that has been translated into 14 different languages, some of them relatively low-resource. As with MNLI, the goal is to predict textual entailment: given two sentences, decide whether sentence A entails, contradicts, or is neutral with respect to sentence B. It is therefore a three-way classification task (given two sentences, predict one of three labels).
| Identifier | Task Type | Metric | License | Website | Code | Download |
| --- | --- | --- | --- | --- | --- | --- |
| XNLI | NLI / Entailment | Accuracy | CC BY-NC 4.0 | | | |
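To make the task and its metric concrete, the following is a minimal sketch of accuracy for three-way NLI predictions. The function name `accuracy` and the toy label lists are illustrative assumptions, not part of any official XNLI evaluation script.

```python
# Label names as they appear in this dataset card.
LABELS = ("entailment", "neutral", "contradiction")


def accuracy(gold, predicted):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy data (hypothetical): three of the four predictions match the gold label.
gold = ["neutral", "entailment", "contradiction", "neutral"]
predicted = ["neutral", "entailment", "neutral", "neutral"]
print(accuracy(gold, predicted))
```

Because the three labels are balanced in every split, a classifier that always predicts the same label would score about one third under this metric.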
Data Source
Manually annotated sentences. The training and development sets are translated from English.
Data Description
| # | Train | Dev | Test |
| --- | --- | --- | --- |
| Examples | 392,702 | 5,010 | 2,490 |
Label Distribution
| | train | validation | test |
| --- | --- | --- | --- |
| contradiction | 0.333 | 0.333 | 0.333 |
| entailment | 0.333 | 0.333 | 0.333 |
| neutral | 0.333 | 0.333 | 0.333 |
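A label distribution of this kind can be recomputed from the `gold_label` column of any split. The sketch below is one way to do it; the helper name `label_distribution` and the rounding to three decimals are assumptions made to match the presentation here.

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to its relative frequency, rounded to three decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: round(count / total, 3)
        for label, count in sorted(counts.items())
    }


# Toy split (hypothetical) with one example per label.
toy_labels = ["neutral", "entailment", "contradiction"]
print(label_distribution(toy_labels))
```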
Vocabulary Overlap
Number of common words in the row and column divided by the total number of unique words in the row.
Premise
| | train | validation | test |
| --- | --- | --- | --- |
| train | 1.000 | 0.704 | 0.691 |
| validation | 0.053 | 1.000 | 0.258 |
| test | 0.087 | 0.436 | 1.000 |
Hypothesis
| | train | validation | test |
| --- | --- | --- | --- |
| train | 1.000 | 0.712 | 0.684 |
| validation | 0.064 | 1.000 | 0.290 |
| test | 0.100 | 0.478 | 1.000 |
Overall
| | train | validation | test |
| --- | --- | --- | --- |
| train | 1.000 | 0.702 | 0.674 |
| validation | 0.075 | 1.000 | 0.306 |
| test | 0.116 | 0.496 | 1.000 |
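The overlap ratio defined at the start of this section (common words divided by the unique words of the row split) can be sketched as follows. The tokenization is an assumption: this card does not say how words were split or normalized, so the sketch simply lowercases and splits on whitespace, and the function names are illustrative.

```python
def vocabulary(sentences):
    """Unique lowercased, whitespace-separated tokens (assumed tokenization)."""
    return {token for sentence in sentences for token in sentence.lower().split()}


def overlap(row_sentences, column_sentences):
    """Common words of row and column divided by the unique words in the row."""
    row_vocab = vocabulary(row_sentences)
    column_vocab = vocabulary(column_sentences)
    if not row_vocab:
        return 0.0
    return len(row_vocab & column_vocab) / len(row_vocab)


# Toy splits (hypothetical): the row has 4 unique words, 2 of them shared.
row = ["a b c d"]
column = ["a b x"]
print(overlap(row, column))
print(overlap(column, row))
```

The measure is asymmetric because the denominator depends on the row, which is why the tables are not mirrored across the diagonal.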
Example
| Field | Value |
| --- | --- |
| language | bg |
| gold_label | neutral |
| sentence1_binary_parse | |
| sentence2_binary_parse | |
| sentence1_parse | |
| sentence2_parse | |
| sentence1 | И той каза: Мамо, у дома съм. |
| sentence2 | Той се обади на майка си веднага щом училищният автобус го е оставил. |
| promptID | 1 |
| pairID | 1 |
| genre | facetoface |
| label1 | neutral |
| label2 | contradiction |
| label3 | neutral |
| label4 | neutral |
| label5 | neutral |
| sentence1_tokenized | И той каза : Мамо , у дома съм . |
| sentence2_tokenized | Той се обади на майка си веднага щом училищният автобус го е оставил . |
| match | True |
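A record like the one above can be read with Python's standard `csv` module. This is a sketch under two assumptions that the card does not state explicitly: that the file is tab-separated, and that the four parse columns are empty for this row. The function name `read_pairs` is illustrative.

```python
import csv
import io

# Column names copied from the example header in this card.
FIELDS = [
    "language", "gold_label",
    "sentence1_binary_parse", "sentence2_binary_parse",
    "sentence1_parse", "sentence2_parse",
    "sentence1", "sentence2",
    "promptID", "pairID", "genre",
    "label1", "label2", "label3", "label4", "label5",
    "sentence1_tokenized", "sentence2_tokenized", "match",
]


def read_pairs(handle):
    """Yield (language, premise, hypothesis, gold_label) for each row."""
    # Assumption: tab-separated values with no quoting.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row["language"], row["sentence1"], row["sentence2"], row["gold_label"]


# The example record, with the parse columns assumed to be empty.
sample_row = [
    "bg", "neutral", "", "", "", "",
    "И той каза: Мамо, у дома съм.",
    "Той се обади на майка си веднага щом училищният автобус го е оставил.",
    "1", "1", "facetoface",
    "neutral", "contradiction", "neutral", "neutral", "neutral",
    "И той каза : Мамо , у дома съм .",
    "Той се обади на майка си веднага щом училищният автобус го е оставил .",
    "True",
]
sample = "\t".join(FIELDS) + "\n" + "\t".join(sample_row) + "\n"

pairs = list(read_pairs(io.StringIO(sample)))
print(pairs[0])
```

For a real file, an open file handle (opened with `encoding="utf-8"` and `newline=""`) can be passed in place of the `io.StringIO` object.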
Citation
[1] Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating Cross-lingual Sentence Representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475–2485, Brussels, Belgium. Association for Computational Linguistics.
@inproceedings{conneau-etal-2018-xnli,
    title = "{XNLI}: Evaluating Cross-lingual Sentence Representations",
    author = "Conneau, Alexis and
      Rinott, Ruty and
      Lample, Guillaume and
      Williams, Adina and
      Bowman, Samuel and
      Schwenk, Holger and
      Stoyanov, Veselin",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D18-1269",
    doi = "10.18653/v1/D18-1269",
    pages = "2475--2485"
}
License
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
See the LICENSE file.