| Language | Model | NLP Task | Dataset | Dataset-Domain | Measure | Performance | mBERT | Difference with mBERT |
|---|---|---|---|---|---|---|---|---|
| Estonian | EstBERT (128) | POS (coarse) | UDv2.5 EDT | fiction, newspapers, scientific texts | Accuracy | 97.89 | 97.42 | 0.47 |
| Estonian | EstBERT (128) | XPOS | UDv2.5 EDT | fiction, newspapers, scientific texts | Accuracy | 98.4 | 98.06 | 0.34 |
| Estonian | EstBERT (128) | Morph | UDv2.5 EDT | fiction, newspapers, scientific texts | Accuracy | 96.93 | 96.24 | 0.69 |
| Estonian | EstBERT (128) | DP | UDv2.5 EDT | fiction, newspapers, scientific texts | LAS | 83.94 | N/A | N/A |
| Estonian | EstBERT (128) | DP | UDv2.5 EDT | fiction, newspapers, scientific texts | UAS | 86.7 | N/A | N/A |
| Estonian | EstBERT (128) | TC | Estonian Valency Corpus | news | Accuracy | 81.7 | 75.67 | 6.03 |
| Estonian | EstBERT (128) | SA | Estonian Valency Corpus | news | Accuracy | 74.36 | 70.23 | 4.13 |
| Estonian | EstBERT (128) | NER | EstNER | news | F1 | 90.11 | 86.51 | 3.6 |
| Estonian | EstBERT (512) | POS (coarse) | UDv2.5 EDT | fiction, newspapers, scientific texts | Accuracy | 97.84 | 97.43 | 0.41 |
| Estonian | EstBERT (512) | XPOS | UDv2.5 EDT | fiction, newspapers, scientific texts | Accuracy | 98.43 | 98.13 | 0.3 |
| Estonian | EstBERT (512) | Morph | UDv2.5 EDT | fiction, newspapers, scientific texts | Accuracy | 96.8 | 96.13 | 0.67 |
| Estonian | EstBERT (512) | TC | Estonian Valency Corpus | news | Accuracy | 80.96 | 74.94 | 6.02 |
| Estonian | EstBERT (512) | SA | Estonian Valency Corpus | news | Accuracy | 74.5 | 69.52 | 4.98 |
| Estonian | EstBERT (512) | NER | EstNER | news | F1 | 89.04 | 88.37 | 0.67 |
| Basque | BERTeus | NER | EIEC | news | F1 (test) | 87.06 | 81.52 | 5.54 |
| Basque | BERTeus | POS | UD-1.2 | news | Accuracy (test) | 97.76 | 96.37 | 1.39 |
| Basque | BERTeus | TC | BHTC | news | F1 (test) | 76.77 | 68.42 | 8.35 |
| Basque | BERTeus | SA | Basque Cultural Heritage Tweets Corpus | tweets | F1 (test) | 78.1 | 71.02 | 7.08 |
| Dutch | BERTje | NER | CoNLL-2002 | news | F1 (test) | 88.3 | 80.7 | 7.6 |
| Dutch | BERTje | NER | SoNaR-1 | news, social, legal, manual, wiki, web, press, proceedings, misc | F1 (test) | 82.1 | 79.7 | 2.4 |
| Dutch | BERTje | POS | UD-LassySmall | wiki | Accuracy (test) | 96.3 | 92.5 | 3.8 |
| Dutch | BERTje | POS (C) | SoNaR-1 | news, social, legal, manual, wiki, web, press, proceedings, misc | Accuracy (test) | 98.5 | 98.3 | 0.2 |
| Dutch | BERTje | POS (FG) | SoNaR-1 | news, social, legal, manual, wiki, web, press, proceedings, misc | Accuracy (test) | 96.8 | 96.2 | 0.6 |
| Dutch | BERTje | SRL-PA | SoNaR-1 | news, social, legal, manual, wiki, web, press, proceedings, misc | F1 (test) | 85.3 | 80.4 | 4.9 |
| Dutch | BERTje | SRL-M | SoNaR-1 | news, social, legal, manual, wiki, web, press, proceedings, misc | F1 (test) | 67.2 | 62.4 | 4.8 |
| Dutch | BERTje | SRT | SoNaR-1 | news, social, legal, manual, wiki, web, press, proceedings, misc | macro F1 (test) | 64.3 | 57.3 | 7.0 |
| Dutch | BERTje | SA | 110k Dutch Book Reviews Dataset | book reviews | Accuracy (test) | 93.0 | 89.1 | 3.9 |
| Dutch | BERTje | DDD | Europarl | proceedings | Accuracy (test) | 98.27 | 98.28 | -0.01 |
| Dutch | RobBERT | SA | 110k Dutch Book Reviews Dataset | book reviews | Accuracy (test) | 94.42 | N/A | N/A |
| Dutch | RobBERT | DDD | Europarl | proceedings | Accuracy (test) | 98.41 | 98.28 | 0.13 |
| French | CamemBERT | POS | GSD | blogs, news, reviews, wiki | UPOS | 98.19 | 97.48 | 0.71 |
| French | CamemBERT | DP-UAS | GSD | blogs, news, reviews, wiki | Accuracy (test) | 94.82 | 92.72 | 2.1 |
| French | CamemBERT | DP-LAS | GSD | blogs, news, reviews, wiki | Accuracy (test) | 92.47 | 89.73 | 2.74 |
| French | CamemBERT | POS | Sequoia | politics, news, wiki, agency | UPOS | 99.21 | 98.41 | 0.8 |
| French | CamemBERT | DP-UAS | Sequoia | politics, news, wiki, agency | Accuracy (test) | 95.56 | 93.24 | 2.32 |
| French | CamemBERT | DP-LAS | Sequoia | politics, news, wiki, agency | Accuracy (test) | 94.39 | 91.24 | 3.15 |
| French | CamemBERT | POS | Spoken | transcription | UPOS | 96.68 | 96.02 | 0.66 |
| French | CamemBERT | DP-UAS | Spoken | transcription | Accuracy (test) | 86.05 | 84.65 | 1.4 |
| French | CamemBERT | DP-LAS | Spoken | transcription | Accuracy (test) | 80.07 | 78.63 | 1.44 |
| French | CamemBERT | POS | ParTUT | legal, license, misc | UPOS | 97.63 | 97.35 | 0.28 |
| French | CamemBERT | DP-UAS | ParTUT | legal, license, misc | Accuracy (test) | 95.21 | 94.18 | 1.03 |
| French | CamemBERT | DP-LAS | ParTUT | legal, license, misc | Accuracy (test) | 92.2 | 91.37 | 0.83 |
| French | CamemBERT | POS | French Treebank | news | F1 | 87.93 | 82.75 | 5.18 |
| French | CamemBERT | NLI | XNLI (French) | transcription, politics, news, literature, misc | Accuracy | 81.2 | 76.9 | 4.3 |
| French | CamemBERT | SA | CLS (French) | book reviews | Accuracy | 93.4 | 86.15 | 7.25 |
| French | CamemBERT | SA | CLS (French) | DVD reviews | Accuracy | 92.7 | 86.9 | 5.8 |
| French | CamemBERT | SA | CLS (French) | music reviews | Accuracy | 94.15 | 86.65 | 7.5 |
| French | CamemBERT | PI | PAWS-X (French) | wiki | Accuracy | 89.8 | 89.3 | 0.5 |
| French | CamemBERT | POS | French Treebank | news | F1 (test) | 88.39 | 87.52 | 0.87 |
| French | CamemBERT | VSD | FrenchSemEval | wiki | F1 | 51.9 | 44.93 | 6.97 |
| French | CamemBERT | NSD | SemEval 2013 Task 12 (French) | news | F1 | 51.52 | 53.48 | -1.96 |
| French | FlauBERT | SA | CLS (French) | book reviews | Accuracy | 93.4 | 86.15 | 7.25 |
| French | FlauBERT | SA | CLS (French) | DVD reviews | Accuracy | 92.5 | 86.9 | 5.6 |
| French | FlauBERT | SA | CLS (French) | music reviews | Accuracy | 94.3 | 86.65 | 7.65 |
| French | FlauBERT | PI | PAWS-X (French) | wiki | Accuracy | 89.9 | 89.3 | 0.6 |
| French | FlauBERT | NLI | XNLI | transcription, politics, news, literature, misc | Accuracy | 81.3 | 76.9 | 4.4 |
| French | FlauBERT | NER | French Treebank | news | F1 (test) | 89.05 | 87.52 | 1.53 |
| French | FlauBERT | VSD | FrenchSemEval | wiki | F1 | 47.4 | 44.93 | 2.47 |
| French | FlauBERT | NSD | SemEval 2013 Task 12 (French) | news | F1 | 50.78 | 53.48 | -2.7 |
| Finnish | FinBERT | POS | Turku Dependency Treebank | wiki, news, blog, speech, legislative, fiction | UPOS | 98.23 | 96.97 | 1.26 |
| Finnish | FinBERT | POS | FinnTreeBank | grammar, news, literature, politics, legislative, misc | UPOS | 98.39 | 95.87 | 2.52 |
| Finnish | FinBERT | POS | Parallel UD treebank | wiki, news | UPOS | 98.08 | 97.58 | 0.5 |
| Finnish | FinBERT | NER | FiNER | wiki, news | F1 | 92.4 | 90.29 | 2.11 |
| Finnish | FinBERT | DP | Turku Dependency Treebank | wiki, news, blog, speech, legislative, fiction | LAS (predicted segmentation) | 91.93 | 86.32 | 5.61 |
| Finnish | FinBERT | DP | FinnTreeBank | grammar, news, literature, politics, legislative, misc | LAS (predicted segmentation) | 92.16 | 85.52 | 6.64 |
| Finnish | FinBERT | DP | Parallel UD treebank | wiki, news | LAS (predicted segmentation) | 92.54 | 89.18 | 3.36 |
| Finnish | FinBERT | TC | Yle news | news | Accuracy (test size 10K) | 90.57 | 88.44 | 2.13 |
| Finnish | FinBERT | TC | Ylilauta online discussion | social | Accuracy (test size 10K) | 79.18 | 67.92 | 11.26 |
| Italian | Italian BERT (XXL) | NER | WikiNER | wiki | F1 (test) | 93.61 | 93.53 | 0.08 |
| Italian | Italian BERT (XXL) | NER | I-CAB 2009 | news | F1 | 88.13 | 85.18 | 2.95 |
| Italian | Italian BERT (XXL) | POS | PoSTWITA | twitter | Accuracy | 93.75 | 91.54 | 2.21 |
| Italian | AlBERTo | SA | SENTIPOLC 2016 | twitter | F1 (test) | 72.23 | N/A | N/A |
| Italian | AlBERTo | SC | SENTIPOLC 2016 | twitter | F1 (test) | 79.06 | N/A | N/A |
| Italian | AlBERTo | ID | SENTIPOLC 2016 | twitter | F1 (test) | 60.9 | N/A | N/A |
| Italian | GilBERTo | POS | ParTUT | legal, license, misc | UPOS | 98.8 | 98.0 | 0.8 |
| Italian | GilBERTo | POS | ISDT | legal, news, wiki, misc | UPOS | 98.6 | 98.5 | 0.1 |
| Italian | GilBERTo | NER | WikiNER | wiki | F1 | 92.7 | 92.2 | 0.5 |
| Italian | UmBERTo | POS | ParTUT | legal, license, misc | Accuracy | 98.9 | N/A | N/A |
| Italian | UmBERTo | POS | ISDT | legal, news, wiki, misc | Accuracy | 98.98 | N/A | N/A |
| Italian | UmBERTo | NER | WikiNER | wiki | F1 | 92.53 | N/A | N/A |
| Italian | UmBERTo | NER | I-CAB 2007 | news | F1 | 92.53 | N/A | N/A |
| German | deepset-GermanBERT | IOL | germEval18Fine | twitter | F1 | 74.7 | 71.0 | 3.7 |
| German | deepset-GermanBERT | IOL | germEval18Coarse | twitter | F1 | 48.8 | 44.1 | 4.7 |
| German | deepset-GermanBERT | NER | germEval14 | wiki, news | F1 | 84.0 | 83.4 | 0.6 |
| German | deepset-GermanBERT | NER | CoNLL-2003 | news | F1 | 80.4 | 79.2 | 1.2 |
| German | deepset-GermanBERT | TC | 10kGNAD | news | Accuracy | 90.5 | 88.8 | 1.7 |
| German | deepset-GermanBERT | NER | CoNLL-2003 | news | F1 (test) | 83.7 | 82.55 | 1.15 |
| German | deepset-GermanBERT | NER | germEval14 | wiki, news | F1 (test) | 86.61 | 86.26 | 0.35 |
| German | deepset-GermanBERT | POS | Parallel UD treebank | wiki, news | Accuracy (test) | 98.56 | 98.58 | -0.02 |
| German | German BERT | POS | Parallel UD treebank | wiki, news | Accuracy (test) | 98.57 | 98.58 | -0.01 |
| German | German BERT | NER | germEval14 | wiki, news | F1 (test) | 86.89 | 86.26 | 0.63 |
| German | German BERT | NER | CoNLL-2003 | news | F1 (test) | 84.52 | 82.55 | 1.97 |
| German | German Europeana BERT | NER | LFT | news | F1 | 80.55 | 77.26 | 3.29 |
| German | German Europeana BERT | NER | ONB | news | F1 | 85.5 | 83.44 | 2.06 |
| Spanish | BETO | POS | Turku Dependency Treebank | wiki, news, blog, speech, legislative, fiction | UPOS | 98.97 | 97.1 | 1.87 |
| Spanish | BETO | NER | CoNLL 2000, 2002, 2007 | news | F1 | 88.43 | 87.38 | 1.05 |
| Spanish | BETO | TC | MLDoc | news | Accuracy | 95.6 | 95.7 | -0.1 |
| Spanish | BETO | PI | PAWS-X | wiki | Accuracy | 89.05 | 90.7 | -1.65 |
| Spanish | BETO | NLI | XNLI | transcription, politics, news, literature, misc | Accuracy | 82.01 | 78.5 | 3.51 |
| Spanish | BETO | SA | TASS 2020 | twitter | F1 (test) | 66.5 | N/A | N/A |
| Spanish | BETO | Emotion Analysis | TASS 2020 | twitter | F1 (test) | 52.1 | N/A | N/A |
| Spanish | BETO | Hate Speech Detection | SemEval 2019 Task 5: HatEval | twitter | F1 (test) | 76.8 | N/A | N/A |
| Spanish | BETO | ID | SemEval 2019 Task 5: HatEval | social media | F1 (test) | 70.6 | N/A | N/A |
| Russian | RuBERT | PI | Paraphraser | news | Accuracy | 84.99 | 81.66 | 3.33 |
| Russian | RuBERT | SA | RuSentiment | social | F1 | 72.63 | 70.82 | 1.81 |
| Russian | RuBERT | QA | SDSJ Task B | wiki | F1 (dev) | 84.6 | 83.39 | 1.21 |
| Slavic | SlavicBERT | NER | BSNLP-2019 dataset | web | Recall | 91.8 | N/A | N/A |
| Chinese | Ch-RoBERTa-wwm-ext-large | MRC | CMRC 2018 | wiki | F1 (test) | 90.0 | N/A | N/A |
| Chinese | Ch-RoBERTa-wwm-ext-large | MRC | DRCD | wiki | F1 (test) | 94.1 | N/A | N/A |
| Chinese | Ch-RoBERTa-wwm-ext-large | MRC | CJRC | law | F1 (test) | 81.0 | N/A | N/A |
| Chinese | Ch-RoBERTa-wwm-ext-large | NLI | XNLI | transcription, politics, news, literature, misc | Accuracy (test) | 80.6 | N/A | N/A |
| Chinese | Ch-RoBERTa-wwm-ext-large | SA | ChnSentiCorp | social, misc | Accuracy (test) | 94.9 | N/A | N/A |
| Chinese | Ch-RoBERTa-wwm-ext-large | SPM | LCQMC | social | Accuracy (test) | 86.8 | N/A | N/A |
| Chinese | Ch-RoBERTa-wwm-ext-large | SPM | BQ Corpus | log | Accuracy (test) | 84.9 | N/A | N/A |
| Chinese | Ch-RoBERTa-wwm-ext-large | TC | THUCNews | news | Accuracy (test) | 97.6 | N/A | N/A |
| Japanese | BERT Japanese | TC | Livedoor | news | F1 (macro) | 97.0 | N/A | N/A |
| Korean | KoBERT | SA | Naver | movie reviews | Accuracy | 90.1 | 87.5 | 2.6 |
| Thai | BERT-th | NLI | XNLI | transcription, politics, news, literature, misc | Accuracy | 68.9 | 66.1 | 2.8 |
| Thai | BERT-th | SA | Wongnai Review Dataset | restaurant reviews | Accuracy | 57.06 | N/A | N/A |
| Mongolian | Mongolian BERT | NER | Mongolian NER | news | F1 (test) | 81.46 | N/A | N/A |
| Turkish | BERTurk | POS | IMST dataset | misc | Accuracy | 96.93 | 95.38 | 1.55 |
| Turkish | BERTurk | NER | — | — | F1 | 94.85 | 93.61 | 1.24 |
| Arabic | AraBERT v1 | SA | AJGT | twitter | Accuracy | 93.8 | 83.6 | 10.2 |
| Arabic | AraBERT v1 | SA | HARD | hotel reviews | Accuracy | 96.1 | 95.7 | 0.4 |
| Arabic | AraBERT v1 | SA | ASTD | twitter | Accuracy | 92.6 | 80.1 | 12.5 |
| Arabic | AraBERT v1 | SA | ArSenTD-Lev | twitter | Accuracy | 59.4 | 51.0 | 8.4 |
| Arabic | AraBERT v1 | SA | LABR | book reviews | Accuracy | 86.7 | 83.0 | 3.7 |
| Arabic | AraBERT v1 | NER | ANER-corp | news | F1 (macro) | 81.9 | 78.4 | 3.5 |
| Arabic | AraBERT v1 | QA | ARCD | wiki | F1 (macro) | 62.7 | 61.3 | 1.4 |
| Portuguese | BERT-Large Portuguese | NER | HAREM | web, politics, fiction, email, transcriptions, news, misc | F1 (5 classes) | 83.3 | 79.44 | 3.86 |
| English | BERT-Base | NLI | MNLI (matched) | misc | Accuracy (dev) | 84.6 | N/A | N/A |
| English | BERT-Base | PI | Quora Question Pairs | social | F1 (dev) | 71.2 | N/A | N/A |
| English | BERT-Base | NLI | QNLI | wiki | Accuracy (dev) | 90.5 | N/A | N/A |
| English | BERT-Base | SA | Stanford Sentiment Treebank | movie reviews | Accuracy (dev) | 93.5 | N/A | N/A |
| English | BERT-Base | LA | CoLA | misc | Matthews Correlation (dev) | 52.1 | N/A | N/A |
| English | BERT-Base | STS | STS-B | misc | Pearson-Spearman Correlation (dev) | 85.8 | N/A | N/A |
| English | BERT-Base | PI | MRPC | news | F1 (dev) | 88.9 | N/A | N/A |
| English | BERT-Base | TER | RTE | news, wiki | Accuracy (dev) | 66.4 | N/A | N/A |
| English | BERT-Base | QA | SQuAD v1.1 | wiki | F1 (dev) | 88.5 | N/A | N/A |
| English | BERT-Base | CI | SWAG | video captions | Accuracy (dev) | 81.6 | N/A | N/A |
| English | BERT-Base | NLI | WNLI | fiction | Accuracy | 45.1 | N/A | N/A |
| English | BERT-Base | SA | IMDb | movie reviews | Accuracy | 93.46 | N/A | N/A |
| English | BERT-Large | NLI | MNLI (matched) | misc | Accuracy (dev) | 86.7 | N/A | N/A |
| English | BERT-Large | PI | Quora Question Pairs | social | F1 (dev) | 72.1 | N/A | N/A |
| English | BERT-Large | NLI | QNLI | wiki | Accuracy (dev) | 92.7 | N/A | N/A |
| English | BERT-Large | SA | Stanford Sentiment Treebank | movie reviews | Accuracy (dev) | 94.9 | N/A | N/A |
| English | BERT-Large | LA | CoLA | misc | Matthews Correlation (dev) | 60.5 | N/A | N/A |
| English | BERT-Large | STS | STS-B | misc | Pearson-Spearman Correlation (dev) | 86.5 | N/A | N/A |
| English | BERT-Large | PI | MRPC | news | F1 (dev) | 89.3 | N/A | N/A |
| English | BERT-Large | TER | RTE | news, wiki | Accuracy (dev) | 70.1 | N/A | N/A |
| English | BERT-Large | QA | SQuAD v1.1 | wiki | F1 (dev) | 90.9 | N/A | N/A |
| English | BERT-Large | QA | SQuAD v2.0 | wiki | F1 (dev) | 81.9 | N/A | N/A |
| English | BERT-Large | CI | SWAG | video captions | Accuracy (dev) | 86.6 | N/A | N/A |
| English | BERT-Large | RC | RACE | examinations | Accuracy | 72.0 | N/A | N/A |
| English | RoBERTa | NLI | MNLI (matched) | misc | Accuracy (dev) | 90.2 | N/A | N/A |
| English | RoBERTa | PI | QQP | social | F1 (dev) | 92.2 | N/A | N/A |
| English | RoBERTa | NLI | QNLI | wiki | Accuracy (dev) | 94.7 | N/A | N/A |
| English | RoBERTa | SA | Stanford Sentiment Treebank | movie reviews | Accuracy (dev) | 96.4 | N/A | N/A |
| English | RoBERTa | LA | CoLA | misc | Matthews Correlation (dev) | 68.0 | N/A | N/A |
| English | RoBERTa | STS | STS-B | misc | Pearson-Spearman Correlation (dev) | 92.4 | N/A | N/A |
| English | RoBERTa | PI | MRPC | news | F1 (dev) | 90.9 | N/A | N/A |
| English | RoBERTa | TER | RTE | news, wiki | Accuracy (dev) | 86.6 | N/A | N/A |
| English | RoBERTa | QA | SQuAD v1.1 | wiki | F1 (dev) | 94.6 | N/A | N/A |
| English | RoBERTa | QA | SQuAD v2.0 | wiki | F1 (dev) | 89.4 | N/A | N/A |
| English | RoBERTa | RC | RACE | examinations | Accuracy | 83.2 | N/A | N/A |
| English | ALBERT (1M) | NLI | MNLI (matched) | misc | Accuracy (dev) | 90.4 | N/A | N/A |
| English | ALBERT (1M) | PI | QQP | social | F1 (dev) | 92.0 | N/A | N/A |
| English | ALBERT (1M) | NLI | QNLI | wiki | Accuracy (dev) | 95.2 | N/A | N/A |
| English | ALBERT (1M) | SA | Stanford Sentiment Treebank | movie reviews | Accuracy (dev) | 96.8 | N/A | N/A |
| English | ALBERT (1M) | LA | CoLA | misc | Matthews Correlation (dev) | 68.7 | N/A | N/A |
| English | ALBERT (1M) | STS | STS-B | misc | Pearson-Spearman Correlation (dev) | 92.7 | N/A | N/A |
| English | ALBERT (1M) | PI | MRPC | news | F1 (dev) | 90.2 | N/A | N/A |
| English | ALBERT (1M) | TER | RTE | news, wiki | Accuracy (dev) | 88.1 | N/A | N/A |
| English | ALBERT (1M) | QA | SQuAD v1.1 | wiki | F1 (dev) | 94.8 | N/A | N/A |
| English | ALBERT (1M) | QA | SQuAD v2.0 | wiki | F1 (dev) | 89.9 | N/A | N/A |
| English | ALBERT (1M) | RC | RACE | examinations | Accuracy | 86.0 | N/A | N/A |
| English | ALBERT (1.5M) | NLI | MNLI (matched) | misc | Accuracy (dev) | 90.8 | N/A | N/A |
| English | ALBERT (1.5M) | PI | QQP | social | F1 (dev) | 92.2 | N/A | N/A |
| English | ALBERT (1.5M) | NLI | QNLI | wiki | Accuracy (dev) | 95.3 | N/A | N/A |
| English | ALBERT (1.5M) | SA | Stanford Sentiment Treebank | movie reviews | Accuracy (dev) | 96.9 | N/A | N/A |
| English | ALBERT (1.5M) | LA | CoLA | misc | Matthews Correlation (dev) | 71.4 | N/A | N/A |
| English | ALBERT (1.5M) | STS | STS-B | misc | Pearson-Spearman Correlation (dev) | 93.0 | N/A | N/A |
| English | ALBERT (1.5M) | PI | MRPC | news | F1 (dev) | 90.9 | N/A | N/A |
| English | ALBERT (1.5M) | TER | RTE | news, wiki | Accuracy (dev) | 89.2 | N/A | N/A |
| English | ALBERT (1.5M) | QA | SQuAD v1.1 | wiki | F1 (dev) | 94.8 | N/A | N/A |
| English | ALBERT (1.5M) | QA | SQuAD v2.0 | wiki | F1 (dev) | 90.2 | N/A | N/A |
| English | ALBERT (1.5M) | RC | RACE | examinations | Accuracy | 86.5 | N/A | N/A |
| English | DistilBERT | NLI | MNLI (matched) | misc | Accuracy (dev) | 79.0 | N/A | N/A |
| English | DistilBERT | PI | QQP | social | F1 (dev) | 84.9 | N/A | N/A |
| English | DistilBERT | NLI | QNLI | wiki | Accuracy (dev) | 85.3 | N/A | N/A |
| English | DistilBERT | SA | Stanford Sentiment Treebank | movie reviews | Accuracy (dev) | 90.7 | N/A | N/A |
| English | DistilBERT | LA | CoLA | misc | Matthews Correlation (dev) | 43.6 | N/A | N/A |
| English | DistilBERT | STS | STS-B | misc | Pearson-Spearman Correlation (dev) | 81.2 | N/A | N/A |
| English | DistilBERT | PI | MRPC | news | F1 (dev) | 87.5 | N/A | N/A |
| English | DistilBERT | TER | RTE | news, wiki | Accuracy (dev) | 59.9 | N/A | N/A |
| English | DistilBERT | QA | SQuAD v1.1 | wiki | F1 (dev) | 78.7 | N/A | N/A |
| English | DistilBERT | NLI | WNLI | fiction | Accuracy | 56.3 | N/A | N/A |
| English | DistilBERT | SA | IMDb | movie reviews | Accuracy | 92.82 | N/A | N/A |
| Yorùbá | Fine-Tuned mBERT | NER | Global Voices Yorùbá | news | F1 | 52.5 | 0.0 | 52.5 |
| Filipino | BERT-Tagalog | SA | — | electronic product reviews | Accuracy | 88.17 | N/A | N/A |
| Spanish | RoBERTuito | SA | TASS 2020 | twitter | F1 (test) | 70.07 | N/A | N/A |
| Spanish | RoBERTuito | Emotion Analysis | TASS 2020 | twitter | F1 (test) | 55.1 | N/A | N/A |
| Spanish | RoBERTuito | Hate Speech Detection | SemEval 2019 Task 5: HatEval | twitter | F1 (test) | 80.1 | N/A | N/A |
| Spanish | RoBERTuito | ID | SemEval 2019 Task 5: HatEval | social media | F1 (test) | 74.0 | N/A | N/A |
| Spanish | Spanish RoBERTa | SA | TASS 2020 | twitter | F1 (test) | 66.9 | N/A | N/A |
| Spanish | Spanish RoBERTa | Emotion Analysis | TASS 2020 | twitter | F1 (test) | 53.3 | N/A | N/A |
| Spanish | Spanish RoBERTa | Hate Speech Detection | SemEval 2019 Task 5: HatEval | twitter | F1 (test) | 76.6 | N/A | N/A |
| Spanish | Spanish RoBERTa | ID | SemEval 2019 Task 5: HatEval | social media | F1 (test) | 72.3 | N/A | N/A |
| Spanish | BERTin | SA | TASS 2020 | twitter | F1 (test) | 66.5 | N/A | N/A |
| Spanish | BERTin | Emotion Analysis | TASS 2020 | twitter | F1 (test) | 51.8 | N/A | N/A |
| Spanish | BERTin | Hate Speech Detection | SemEval 2019 Task 5: HatEval | twitter | F1 (test) | 76.7 | N/A | N/A |
| Spanish | BERTin | ID | SemEval 2019 Task 5: HatEval | social media | F1 (test) | 71.6 | N/A | N/A |
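
Every score in the table is compared against multilingual BERT (mBERT) on the same dataset and measure, so the last column is simply the language-specific score minus the mBERT score. For readers who want to try one of the listed models side by side with mBERT, here is a minimal sketch assuming the Hugging Face transformers library is installed (with PyTorch and, for CamemBERT, sentencepiece); the two Hub IDs shown are real, but IDs for the other models in the table should be taken from their official releases:

    # Minimal sketch: load a language-specific model and the mBERT baseline,
    # then encode one sentence with each. Hub IDs other than these two may differ.
    from transformers import AutoModel, AutoTokenizer

    for model_id in ["camembert-base", "bert-base-multilingual-cased"]:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModel.from_pretrained(model_id)
        inputs = tokenizer("Le modèle encode cette phrase.", return_tensors="pt")
        hidden = model(**inputs).last_hidden_state  # contextual token embeddings
        print(model_id, tuple(hidden.shape))

Both are base-size encoders, so each prints a (1, sequence_length, 768) hidden-state shape; the scores in the table come from fine-tuning such encoders on each downstream task.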

How to cite this

What the [MASK]? Making Sense of Language-Specific BERT Models

    @article{nozza2020what,
      title   = {What the [MASK]? Making Sense of Language-Specific BERT Models},
      author  = {Nozza, Debora and Bianchi, Federico and Hovy, Dirk},
      journal = {arXiv preprint arXiv:2003.02912},
      year    = {2020}
    }


Don't Forget!

This is a collaborative resource to help researchers understand and find the best BERT model for a given dataset, task, and language. The numbers here rely on self-reported performance, so we can give no guarantees of their accuracy; in the future, we hope to verify each model independently.

Do you want to add your model? Click Here