EFLLex aims to provide a lexicon of receptive vocabulary for English as a second/foreign language (EFL) that reports the normalized frequencies of words (lemmas) across five of the six levels of the CEFR (Common European Framework of Reference for Languages), excluding C2. In addition to information on single-word usage, the list will also contain multi-word expressions and information on their usage at different levels, something rarely found in resources of this kind. The beta version includes only single words.

The frequencies have been estimated on a corpus of 13 EFL textbooks and 8 on-line resources.

More details on the corpus, the computation and normalization of the word frequencies, and the resource itself can be found in:

Dürlich, L. and François, T., EFLLex: A Graded Lexical Resource for Learners of English as a Foreign Language. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, 7-12 May.

If you use EFLLex, please cite the above article.

Features

Receptive lexicon
includes word frequencies observed in textbook reading activities and simplified readers

CEFR levels
A1 · A2 · B1 · B2 · C1

Lexical entries
lemma (word)
part of speech (tag)

Computed metrics
level_freq · normalized frequency (per 1 million words) for each level of the CEFR
total_freq · total normalized frequency in the source corpus
nb_doc · document frequency
Lemma            POS-tag  A1      A2       B1       B2      C1       Total
cat              NN       77.40   351.71   39.19    28.57   22.53    79.38
empty            JJ       0       28.83    28.65    102.29  37.84    61.88
explore          VB       0       153.38   60.50    109.99  205.43   130.37
tiresome         JJ       0       0        0        4.97    15.5     6
video            NN       65.19   0        67.87    81.76   111.06   90.93
write            VB       758.66  1421.51  1064.47  682.26  1104.72  1053.96
shopping center  NN       0       45.12    9.80     0       15.50    11.45
Example entries from EFLLex

Format

The resource is distributed as a .CSV file with tab-separated values and 8 columns (see above), encoded in UTF-8. It can also be opened as an Excel sheet.
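As an illustration of the format, the following minimal Python sketch parses a tab-separated table of this shape with the standard `csv` module. The header names (`Lemma`, `POS-tag`, `A1` ... `Total`) are taken from the example table above and are an assumption about the downloaded file, whose actual header row may differ; the function name `read_efllex` is hypothetical.

```python
import csv
import io

LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def read_efllex(handle):
    """Parse a tab-separated EFLLex-style table into a dict keyed by (lemma, tag).

    Column names are assumed to match the example table; adjust them
    if the header row of the real file is different.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    entries = {}
    for row in reader:
        key = (row["Lemma"], row["POS-tag"])
        freqs = {level: float(row[level]) for level in LEVELS}
        freqs["Total"] = float(row["Total"])
        entries[key] = freqs
    return entries


# Two rows copied from the example table, joined with tabs.
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tTotal\n"
    "cat\tNN\t77.40\t351.71\t39.19\t28.57\t22.53\t79.38\n"
    "shopping center\tNN\t0\t45.12\t9.80\t0\t15.50\t11.45\n"
)

entries = read_efllex(io.StringIO(SAMPLE))
print(entries[("cat", "NN")])
```

For a file on disk, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the download was saved. Splitting on tabs rather than spaces keeps multi-word lemmas intact.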

Usage

Search

The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
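The same kind of comparison can be approximated offline. The sketch below is not the online query interface; it simply prints the per-level frequencies of a few entries side by side and reports the level at which each one peaks. The values are copied from the example table above, and the helper names `peak_level` and `format_row` are hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Normalized frequencies (per million words), copied from the example table.
ENTRIES = {
    ("cat", "NN"): [77.40, 351.71, 39.19, 28.57, 22.53],
    ("explore", "VB"): [0, 153.38, 60.50, 109.99, 205.43],
    ("tiresome", "JJ"): [0, 0, 0, 4.97, 15.5],
}


def peak_level(freqs):
    """Return the CEFR level at which the normalized frequency is highest."""
    best = max(range(len(LEVELS)), key=lambda i: freqs[i])
    return LEVELS[best]


def format_row(key, freqs):
    """Render one entry as a fixed-width text row followed by its peak level."""
    lemma, tag = key
    cells = "".join(f"{f:>9.2f}" for f in freqs)
    return f"{lemma:<10}{tag:<4}{cells}  peak={peak_level(freqs)}"


for key, freqs in ENTRIES.items():
    print(format_row(key, freqs))

# Map each lemma to the level where it is most frequent.
peaks = {key[0]: peak_level(freqs) for key, freqs in ENTRIES.items()}
```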

Analyse

The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
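One simple way to approximate this kind of analysis is sketched below. It is a heuristic chosen for illustration, not the method used by the online analyzer: a word is assigned the lowest CEFR level at which its frequency is non-zero, and it is flagged when that level lies above the reader's target level or when the word is missing from the lexicon. The input is assumed to be already lemmatized and POS-tagged; the function names are hypothetical and the frequencies are copied from the example table above.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Normalized frequencies (per million words), copied from the example table.
ENTRIES = {
    ("cat", "NN"): [77.40, 351.71, 39.19, 28.57, 22.53],
    ("empty", "JJ"): [0, 28.83, 28.65, 102.29, 37.84],
    ("tiresome", "JJ"): [0, 0, 0, 4.97, 15.5],
    ("write", "VB"): [758.66, 1421.51, 1064.47, 682.26, 1104.72],
}


def first_level(freqs):
    """Index of the lowest CEFR level with a non-zero frequency, or None."""
    for i, freq in enumerate(freqs):
        if freq > 0:
            return i
    return None


def difficult_words(tagged_lemmas, target_level, entries=ENTRIES):
    """Return (key, level) pairs for words first attested above target_level.

    Words absent from the lexicon are reported with the level "unknown".
    """
    target = LEVELS.index(target_level)
    flagged = []
    for key in tagged_lemmas:
        freqs = entries.get(key)
        idx = first_level(freqs) if freqs is not None else None
        if idx is None:
            flagged.append((key, "unknown"))
        elif idx > target:
            flagged.append((key, LEVELS[idx]))
    return flagged


# Hypothetical pre-tagged input; "quokka" is deliberately not in the lexicon.
text = [("write", "VB"), ("tiresome", "JJ"), ("cat", "NN"),
        ("empty", "JJ"), ("quokka", "NN")]
result = difficult_words(text, "A1")
print(result)
```

Other criteria are possible, for example requiring a minimum frequency at the target level rather than any non-zero value.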

Download

You can use EFLLex in NLP tasks but also for pedagogical and language assessment purposes.

Authors

Luise Dürlich

Thomas François
CEFRLex project coordinator
CENTAL, UCLouvain (BE)

Contributors

Brayan Delmée
Logo Design

Dorian Ricci, Baptiste Degryse & Anaïs Tack
Prototype design

Damien De Meyere
Website maintenance