EFLLex aims to provide a lexicon of receptive vocabulary for English as a second/foreign language (EFL). It reports the normalized frequencies of words (lemmas) across five of the six levels of the CEFR (Common European Framework of Reference for Languages), excluding C2. Apart from information on single-word usage, the list will also contain multi-word expressions and information on their usage at different levels, something that is rarely present in resources of this kind. In the beta version, only single words are included.

The frequencies have been estimated on a corpus of 13 EFL textbooks and 8 on-line resources.

More details on the corpus, the computation and normalization of the word frequencies, and the resource itself can be found in:

Dürlich, L. and François, T., EFLLex: A Graded Lexical Resource for Learners of English as a Foreign Language. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, 7-12 May.

If you use EFLLex, please cite the above article.



Receptive lexicon
includes word frequencies observed in textbook reading activities and simplified readers


CEFR levels
A1 · A2 · B1 · B2 · C1


Lexical entries
lemma (word)
part of speech (tag)


Computed metrics
level_freq · normalized frequency (per 1 million words) for each level of the CEFR
total_freq · total normalized frequency in the source corpus
nb_doc · document frequency (number of documents in which the word occurs)

Lemma            POS-tag  A1      A2       B1       B2      C1       Total
cat              NN       77.40   351.71   39.19    28.57   22.53    79.38
empty            JJ       0       28.83    28.65    102.29  37.84    61.88
explore          VB       0       153.38   60.50    109.99  205.43   130.37
tiresome         JJ       0       0        0        4.97    15.5     6
video            NN       65.19   0        67.87    81.76   111.06   90.93
write            VB       758.66  1421.51  1064.47  682.26  1104.72  1053.96
shopping center  NN       0       45.12    9.80     0       15.50    11.45

Example entries from EFLLex


The resource is distributed as a .CSV file whose values are tab-separated, with 8 columns (see above), encoded in UTF-8. It can also be opened in a spreadsheet application such as Excel.
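
As an illustration of the format described above, the following minimal Python sketch reads tab-separated rows into a dictionary. It works on an in-memory sample copied from the example table; the header names (Lemma, POS-tag, A1 ... Total) are taken from that table and are only an assumption about the header of the actual file, and load_efllex is a hypothetical helper name.

```python
import csv
import io

# Sample rows copied from the example table. The header names are an
# assumption: the real file may label its columns differently.
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tTotal\n"
    "cat\tNN\t77.40\t351.71\t39.19\t28.57\t22.53\t79.38\n"
    "tiresome\tJJ\t0\t0\t0\t4.97\t15.5\t6\n"
    "shopping center\tNN\t0\t45.12\t9.80\t0\t15.50\t11.45\n"
)

LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def load_efllex(handle):
    """Read tab-separated rows into a dict keyed by (lemma, POS tag)."""
    reader = csv.DictReader(handle, delimiter="\t")
    entries = {}
    for row in reader:
        key = (row["Lemma"], row["POS-tag"])
        # Convert the five level columns and the total to floats.
        entries[key] = {col: float(row[col]) for col in LEVELS + ["Total"]}
    return entries


entries = load_efllex(io.StringIO(SAMPLE))
print(entries[("cat", "NN")]["A2"])
```

To read a downloaded file instead of the sample, the same function can be given a handle opened with open(path, encoding="utf-8", newline=""), where path is wherever the file was saved.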


The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
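
A comparison of this kind can also be sketched offline in Python. The example below uses frequencies copied from the example table and reports, for each word, the first level at which it is attested and the level at which it is most frequent. Both summaries are illustrative heuristics introduced here, not the behaviour of the online query interface, and the function names are hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Normalized frequencies (per million words) copied from the example table.
FREQS = {
    "empty": [0, 28.83, 28.65, 102.29, 37.84],
    "tiresome": [0, 0, 0, 4.97, 15.5],
    "write": [758.66, 1421.51, 1064.47, 682.26, 1104.72],
}


def first_attested_level(freqs):
    """Return the lowest CEFR level with a non-zero frequency, or None."""
    for level, value in zip(LEVELS, freqs):
        if value > 0:
            return level
    return None


def peak_level(freqs):
    """Return the CEFR level at which the frequency is highest."""
    best = max(range(len(LEVELS)), key=lambda i: freqs[i])
    return LEVELS[best]


for lemma, freqs in FREQS.items():
    print(lemma, first_attested_level(freqs), peak_level(freqs))
```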

The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
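
One simple way to approximate such an analysis is sketched below in Python. It flags a word as potentially difficult for a learner when the word is absent from the lexicon or is first attested above the learner's level. This rule is an assumption made for illustration and is not necessarily the method used by the online analyzer; the input is also assumed to be lemmatized already, and difficult_words is a hypothetical helper name.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Normalized frequencies (per million words) copied from the example table.
LEXICON = {
    "cat": [77.40, 351.71, 39.19, 28.57, 22.53],
    "empty": [0, 28.83, 28.65, 102.29, 37.84],
    "explore": [0, 153.38, 60.50, 109.99, 205.43],
    "tiresome": [0, 0, 0, 4.97, 15.5],
    "write": [758.66, 1421.51, 1064.47, 682.26, 1104.72],
}


def difficult_words(lemmas, learner_level, lexicon=LEXICON):
    """Flag lemmas that are unknown or first attested above learner_level."""
    cutoff = LEVELS.index(learner_level)
    flagged = []
    for lemma in lemmas:
        freqs = lexicon.get(lemma)
        if freqs is None:
            # Not in the lexicon at all: no level information available.
            flagged.append((lemma, "unknown"))
            continue
        # Index of the first level with a non-zero frequency, if any.
        first = next((i for i, f in enumerate(freqs) if f > 0), None)
        if first is None:
            flagged.append((lemma, "unattested"))
        elif first > cutoff:
            flagged.append((lemma, LEVELS[first]))
    return flagged


print(difficult_words(["cat", "explore", "tiresome", "zeitgeist"], "A1"))
```

Each flagged word is paired with the level at which it first appears, so the output can be read as a list of words a learner at the given level has probably not yet met in the source corpus.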

You can use EFLLex in NLP tasks but also for pedagogical and language assessment purposes.


