EFLLex aims to provide a lexicon of receptive vocabulary for English as a second/foreign language (EFL) that reports the normalized frequencies of words (lemmas) across five of the six levels of the CEFR (Common European Framework of Reference for Languages), excluding C2. Apart from information on single-word usage, the list will also contain multi-word expressions and information on their usage at different levels, something that is rarely found in resources of this kind. In the beta version, only single words are included.
The frequencies have been estimated on a corpus of 13 EFL textbooks and 8 on-line resources.
More details on the corpus, the computation and normalization of the word frequencies, and the resource itself can be found in:
Dürlich, L. and François, T., EFLLex: A Graded Lexical Resource for Learners of English as a Foreign Language. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, 7-12 May.
If you use EFLLex, please cite the above article.
Features
Feature | Description |
---|---|
Receptive lexicon | includes word frequencies observed in textbook reading activities and simplified readers |
CEFR levels | A1 · A2 · B1 · B2 · C1 |
Lexical entries | lemma (word) · part of speech (tag) |
Computed metrics | level_freq: normalized frequency (per 1 million words) for each level of the CEFR · total_freq: total normalized frequency in the source corpus · nb_doc: document frequency |
Lemma | POS-tag | A1 | A2 | B1 | B2 | C1 | Total |
---|---|---|---|---|---|---|---|
cat | NN | 77.40 | 351.71 | 39.19 | 28.57 | 22.53 | 79.38 |
empty | JJ | 0 | 28.83 | 28.65 | 102.29 | 37.84 | 61.88 |
explore | VB | 0 | 153.38 | 60.50 | 109.99 | 205.43 | 130.37 |
tiresome | JJ | 0 | 0 | 0 | 4.97 | 15.5 | 6 |
video | NN | 65.19 | 0 | 67.87 | 81.76 | 111.06 | 90.93 |
write | VB | 758.66 | 1421.51 | 1064.47 | 682.26 | 1104.72 | 1053.96 |
shopping center | NN | 0 | 45.12 | 9.80 | 0 | 15.50 | 11.45 |
Format
The resource is distributed as a .csv file whose values are tab-separated, with 8 columns (see the table above), encoded in UTF-8. It can also be opened in a spreadsheet application such as Excel.
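As a minimal sketch of reading this format in Python, the example below parses tab-separated rows with the standard `csv` module. The sample rows are copied from the table above. The header names (`Lemma`, `POS-tag`, `A1` ... `Total`) are an assumption based on that table and may differ from those in the distributed file, and `read_entries` is a hypothetical helper, not part of EFLLex.

```python
import csv
import io

# Sample rows copied from the table above. The header names are an
# assumption and may differ from those in the distributed file.
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tTotal\n"
    "cat\tNN\t77.40\t351.71\t39.19\t28.57\t22.53\t79.38\n"
    "tiresome\tJJ\t0\t0\t0\t4.97\t15.5\t6\n"
    "shopping center\tNN\t0\t45.12\t9.80\t0\t15.50\t11.45\n"
)

LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def read_entries(handle):
    """Parse tab-separated rows into dicts with float frequencies."""
    reader = csv.DictReader(handle, delimiter="\t")
    entries = []
    for row in reader:
        entry = {"lemma": row["Lemma"], "pos": row["POS-tag"]}
        for col in LEVELS + ["Total"]:
            entry[col] = float(row[col])
        entries.append(entry)
    return entries


entries = read_entries(io.StringIO(SAMPLE))
for entry in entries:
    print(entry["lemma"], entry["pos"], entry["Total"])
```

To read the downloaded file instead of the in-memory sample, pass a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever you saved the file.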
Usage
Search: The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
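A comparison of this kind can also be sketched offline. The example below uses three rows from the table above and reports, for each lemma, the level with the highest frequency and each level's share of the summed level frequencies. The helpers `peak_level` and `level_shares` are hypothetical names introduced for illustration; they do not reproduce the online query interface.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Per-level normalized frequencies copied from the table above.
FREQS = {
    "cat": [77.40, 351.71, 39.19, 28.57, 22.53],
    "explore": [0, 153.38, 60.50, 109.99, 205.43],
    "tiresome": [0, 0, 0, 4.97, 15.5],
}


def peak_level(freqs):
    """Return the CEFR level at which the frequency is highest."""
    best = max(range(len(LEVELS)), key=lambda i: freqs[i])
    return LEVELS[best]


def level_shares(freqs):
    """Express each level's frequency as a share of the sum over levels."""
    total = sum(freqs)
    if total == 0:
        return [0.0] * len(freqs)
    return [f / total for f in freqs]


for lemma, freqs in FREQS.items():
    shares = ", ".join(
        f"{lvl}={share:.0%}" for lvl, share in zip(LEVELS, level_shares(freqs))
    )
    print(f"{lemma:10s} peak={peak_level(freqs)}  {shares}")
```

Shares make words with very different overall frequencies comparable: "cat" is concentrated at A2, whereas "tiresome" only appears from B2 onwards.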
Analyse: The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
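One simple way to approximate this kind of analysis is sketched below. It rests on an assumed heuristic, not on the method of the online analyzer: a lemma counts as difficult for a learner if the lowest level at which it has a nonzero frequency lies above the learner's level, or if it is missing from the lexicon. The input is assumed to be already lemmatized, and `first_level` and `difficult_words` are hypothetical helper names.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Per-level normalized frequencies copied from the table above.
LEXICON = {
    "cat": [77.40, 351.71, 39.19, 28.57, 22.53],
    "empty": [0, 28.83, 28.65, 102.29, 37.84],
    "tiresome": [0, 0, 0, 4.97, 15.5],
    "write": [758.66, 1421.51, 1064.47, 682.26, 1104.72],
}


def first_level(freqs):
    """Return the lowest level with a nonzero frequency, or None."""
    for level, freq in zip(LEVELS, freqs):
        if freq > 0:
            return level
    return None


def difficult_words(lemmas, learner_level):
    """List lemmas first attested above learner_level or absent from the lexicon.

    Assumed heuristic for illustration only.
    """
    limit = LEVELS.index(learner_level)
    flagged = []
    for lemma in lemmas:
        freqs = LEXICON.get(lemma)
        level = first_level(freqs) if freqs is not None else None
        if level is None or LEVELS.index(level) > limit:
            flagged.append(lemma)
    return flagged


print(difficult_words(["write", "empty", "tiresome", "cat"], "A1"))
# prints ['empty', 'tiresome']
```

A stricter variant could require a minimum frequency at the learner's level rather than any nonzero value; the threshold would have to be chosen empirically.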
Download: You can use EFLLex in NLP tasks but also for pedagogical and language assessment purposes.
Authors
Luise Dürlich
Thomas François
CEFRLex project coordinator
CENTAL, UCLouvain (BE)
Contributors
Brayan Delmée: logo design
Dorian Ricci, Baptiste Degryse & Anaïs Tack: prototype design
Damien De Meyere: website maintenance