FLELex is a lexicon for French as a foreign language (FFL) that reports the normalized frequencies of words (lemmas) at each level of the CEFR (Common European Framework of Reference for Languages).

The frequencies were estimated from a corpus of FFL textbooks and FFL simplified readers. More details on the corpus, on the computation and normalization of the word frequencies, and on the resource itself can be found in:

For FLELex (TreeTagger and CRF Tagger):

François, T., Gala, N., Watrin, P. and Fairon, C. (2014). FLELex: a graded lexical resource for French foreign learners. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland, 26-31 May.

For FLELex / Beacco:

Pintard, A. and François, T. (2020). Combining expert knowledge with frequency information to infer CEFR levels for words. In Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) (pp. 85-92).

If you use FLELex, please cite these articles.



Receptive lexicon
includes word frequencies observed in textbook reading activities and simplified readers


CEFR levels
A1 · A2 · B1 · B2 · C1 · C2


Lexical entries
lemma (word)
part of speech (tag)


Computed metrics
level_freq · normalized frequency (per 1 million words) for each level of the CEFR
total_freq · total normalized frequency in the source corpus
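To make the "per 1 million words" unit concrete, here is a minimal sketch of plain per-million normalization. It assumes a simple relative frequency scaled to 1,000,000 words, and that the total pools the counts of all levels; the actual FLELex procedure (described in the LREC 2014 paper) may include further adjustments. The function names and the figures in the example are hypothetical, not taken from FLELex.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Frequency of a word per 1 million words of a (sub)corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply before dividing to avoid an extra rounding step.
    return raw_count * 1_000_000 / corpus_size


def total_per_million(level_counts: list[int], level_sizes: list[int]) -> float:
    """Assumption: pool the per-level counts and sizes, then normalize once."""
    return per_million(sum(level_counts), sum(level_sizes))


# Hypothetical figures: 45 occurrences in a 150,000-word A1 subcorpus.
print(per_million(45, 150_000))  # 300.0
```

Under this pooling assumption, the total is not the average of the six level frequencies: larger subcorpora weigh more in it.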


To build a resource like FLELex, the corpus needs to be automatically POS-tagged. Different taggers do not necessarily share the same characteristics, and these differences affect the resulting resource. We therefore selected two taggers with very different features and built two versions of FLELex: FLELex-TT (TreeTagger) and FLELex-CRF (CRF Tagger). See the Download page for more details on the two versions.


The resource is distributed as a .csv file containing tab-separated values in 9 columns (lemma, POS tag, the six per-level frequencies and the total frequency; see above), encoded in UTF-8. It can also be opened in a spreadsheet program such as Excel. The Beacco version contains an additional column with the CEFR level derived from the distributional information.

Lemma           POS-tag       A1      A2      B1      B2      C1      C2   Total
voiture         NOM        633.3   598.5   482.7   202.7   271.9    25.9   461.5
abandonner      VER         35.5    62.3   104.8    79.8    73.6    28.5    78.2
justice         NOM          3.9    17.3    79.1    13.2   106.3    72.9    48.1
kilo            NOM         40.3    29.9    10.2       0     1.6       0    19.8
logique         NOM            0       0     6.8    18.6    36.3     9.6     9.9
en bas          ADV         34.9    28.5      13    32.8     1.6       0      24
en clair        ADV            0       0       0       0     8.2    19.5     1.2
sous réserve de PREP           0       0   0.361       0       0       0    0.03
Example entries from FLELex
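A minimal sketch of loading the tab-separated file with Python's standard `csv` module. The header names (`Lemma`, `POS-tag`, `A1` ... `C2`, `Total`) are assumed to match the example table; the real file's headers may differ, so adjust the keys accordingly. The `SAMPLE` string is a hypothetical excerpt built from the example entries.

```python
import csv
import io

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical excerpt in the documented layout (tab-separated, UTF-8).
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tC2\tTotal\n"
    "voiture\tNOM\t633.3\t598.5\t482.7\t202.7\t271.9\t25.9\t461.5\n"
    "logique\tNOM\t0\t0\t6.8\t18.6\t36.3\t9.6\t9.9\n"
    "en clair\tADV\t0\t0\t0\t0\t8.2\t19.5\t1.2\n"
)


def load_lexicon(handle):
    """Map (lemma, tag) to a dict of per-level frequencies plus the total."""
    lexicon = {}
    # QUOTE_NONE: treat any quote characters as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        freqs = {level: float(row[level]) for level in LEVELS}
        freqs["Total"] = float(row["Total"])
        lexicon[(row["Lemma"], row["POS-tag"])] = freqs
    return lexicon


lexicon = load_lexicon(io.StringIO(SAMPLE))
print(lexicon[("en clair", "ADV")]["C1"])  # 8.2
```

For the downloaded file, pass `open(path, encoding="utf-8", newline="")` instead of the `StringIO` object. Splitting on tabs rather than spaces keeps multi-word lemmas such as "sous réserve de" in a single field, and keying on (lemma, tag) keeps homographs with different POS tags apart.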


Search

The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
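As an illustration of comparing distributions along the CEFR scale, the sketch below summarizes each word by the first level at which it is observed and the level where its normalized frequency peaks. The frequencies are copied from the example entries from FLELex shown earlier; the two summary functions are hypothetical illustrations, not the computation behind the Search interface.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Per-level normalized frequencies (A1..C2) from the example entries.
ENTRIES = {
    "voiture": [633.3, 598.5, 482.7, 202.7, 271.9, 25.9],
    "justice": [3.9, 17.3, 79.1, 13.2, 106.3, 72.9],
    "logique": [0, 0, 6.8, 18.6, 36.3, 9.6],
}


def first_level(freqs):
    """Return the first CEFR level with a non-zero frequency, or None."""
    for level, freq in zip(LEVELS, freqs):
        if freq > 0:
            return level
    return None


def peak_level(freqs):
    """Return the CEFR level with the highest frequency (first one on ties)."""
    return LEVELS[max(range(len(freqs)), key=freqs.__getitem__)]


for word, freqs in ENTRIES.items():
    print(word, first_level(freqs), peak_level(freqs))
# voiture A1 A1
# justice A1 C1
# logique B1 C1
```

The two summaries can disagree: "justice" already appears at A1 but is most frequent at C1, which is why looking at the whole distribution is more informative than a single number.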

Analyze

The resource can also be used to analyze the lexical complexity of a text, in particular to identify which words in the text are likely to be difficult for learners at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
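A minimal sketch of one way to flag difficult words, assuming a simple rule: a lemma counts as difficult at a target level if FLELex reports a zero frequency at that level and at every lower level, or if it is absent from the lexicon. This rule, the function name and the mini-lexicon (copied from the example entries) are hypothetical; the online analyzer may work differently. The input is assumed to be already lemmatized, whereas a real analyzer would first tokenize, tag and lemmatize the text.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical mini-lexicon: lemma -> normalized frequencies (A1..C2).
LEXICON = {
    "voiture": [633.3, 598.5, 482.7, 202.7, 271.9, 25.9],
    "kilo": [40.3, 29.9, 10.2, 0, 1.6, 0],
    "logique": [0, 0, 6.8, 18.6, 36.3, 9.6],
    "justice": [3.9, 17.3, 79.1, 13.2, 106.3, 72.9],
}


def difficult_words(lemmas, level, lexicon=LEXICON):
    """Return the lemmas never observed at or below `level` (or unknown)."""
    cutoff = LEVELS.index(level) + 1  # number of levels up to and including `level`
    flagged = []
    for lemma in lemmas:
        freqs = lexicon.get(lemma)
        if freqs is None or not any(freq > 0 for freq in freqs[:cutoff]):
            flagged.append(lemma)
    return flagged


print(difficult_words(["voiture", "logique", "justice", "ordinateur"], "A2"))
# ['logique', 'ordinateur']
```

Treating any non-zero frequency as "known" is deliberately crude; a frequency threshold per level would make the rule less sensitive to rare occurrences.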

Download

FLELex can be used in NLP tasks as well as for pedagogical and language assessment purposes. Note that there are two versions of FLELex: FLELex-TT and FLELex-CRF.


FLELex is the result of a collaboration between three teams:


Brayan Delmée
Logo Design

Anaïs Tack & Baptiste Degryse
Prototype design

Damien De Meyere
Website maintenance