FLELex is a lexicon for French as a foreign language (FFL) that reports the normalized frequencies of words (lemmas) across each level of the CEFR (Common European Framework of Reference for Languages).

The frequencies have been estimated from a corpus of FFL textbooks and FFL simplified readers. More details on the corpus, the computation and normalization of the word frequencies, and the resource itself can be found in:

For FLELex (TreeTagger and CRF Tagger):

François, T., Gala, N., Watrin, P. and Fairon, C. (2014). FLELex: a graded lexical resource for French foreign learners. In the 9th International Conference on Language Resources and Evaluation (LREC 2014). Reykjavik, Iceland, 26-31 May.

For FLELex / Beacco:

Pintard, A. and François, T. (2020). Combining expert knowledge with frequency information to infer CEFR levels for words. In Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) (pp. 85-92).

If you use FLELex, please cite these articles.

Features

Receptive lexicon
includes word frequencies observed in textbook reading activities and simplified readers

CEFR levels
A1 · A2 · B1 · B2 · C1 · C2

Lexical entries
lemma (word)
part of speech (tag)

Computed metrics
level_freq · normalized frequency (per 1 million words) for each level of the CEFR
total_freq · total normalized frequency in the source corpus

Versions

To build a resource like FLELex, the corpus needs to be part-of-speech (POS) tagged automatically. Taggers differ in their characteristics, and these differences affect the resulting resource. We therefore selected two taggers with very different features and built two versions of FLELex: FLELex-TT (tagged with TreeTagger) and FLELex-CRF (tagged with a CRF tagger). See the Download page for more details on the two versions.

Format

The resource is distributed as a .CSV file with tab-separated values, encoded in UTF-8. It has 9 columns: the lemma, its POS tag, one normalized frequency per CEFR level (A1 to C2), and the total frequency (see above). The file can also be opened in Excel. The Beacco version contains an additional column with the CEFR level derived from the distributional information.

Lemma            POS-tag       A1     A2     B1     B2     C1     C2  Total
voiture          NOM        633.3  598.5  482.7  202.7  271.9   25.9  461.5
abandonner       VER         35.5   62.3  104.8   79.8   73.6   28.5   78.2
justice          NOM          3.9   17.3   79.1   13.2  106.3   72.9   48.1
kilo             NOM         40.3   29.9   10.2      0    1.6      0   19.8
logique          NOM            0      0    6.8   18.6   36.3    9.6    9.9
en bas           ADV         34.9   28.5     13   32.8    1.6      0     24
en clair         ADV            0      0      0      0    8.2   19.5    1.2
sous réserve de  PREP           0      0  0.361      0      0      0   0.03

Sample entries from FLELex
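
As an illustration of the format, the following minimal sketch reads tab-separated entries into a Python dictionary keyed by lemma and tag. It assumes the header row uses the column names shown in the sample entries (Lemma, POS-tag, A1 to C2, Total); the header of the downloadable file may differ, and the function name load_flelex is hypothetical. The sample text stands in for a real file, which would be opened with open(path, encoding="utf-8", newline="").

```python
import csv
import io

# Two rows copied from the sample entries, joined with tabs.
# Assumption: the real file has a header row with these column names.
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tC2\tTotal\n"
    "voiture\tNOM\t633.3\t598.5\t482.7\t202.7\t271.9\t25.9\t461.5\n"
    "en bas\tADV\t34.9\t28.5\t13\t32.8\t1.6\t0\t24\n"
)

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def load_flelex(handle):
    """Return {(lemma, tag): {"levels": {level: freq}, "total": freq}}."""
    # QUOTE_NONE keeps quote characters inside lemmas as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    lexicon = {}
    for row in reader:
        key = (row["Lemma"], row["POS-tag"])
        lexicon[key] = {
            "levels": {level: float(row[level]) for level in LEVELS},
            "total": float(row["Total"]),
        }
    return lexicon


lexicon = load_flelex(io.StringIO(SAMPLE))
print(lexicon[("en bas", "ADV")]["levels"]["B2"])
```

Keying on the (lemma, tag) pair rather than the lemma alone keeps homographs with different parts of speech apart, and splitting on tabs keeps multi-word lemmas such as "en bas" intact.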

Usage

Search

The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
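
The same kind of comparison can be done offline. The sketch below is only an illustration and does not reproduce the online interface: it prints the per-level frequencies of a few lemmas side by side, together with the level at which each lemma is most frequent. The values are copied from the sample entries above; the helper names peak_level and comparison_table are hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Per-level normalized frequencies copied from the sample entries.
FREQS = {
    "voiture": [633.3, 598.5, 482.7, 202.7, 271.9, 25.9],
    "justice": [3.9, 17.3, 79.1, 13.2, 106.3, 72.9],
    "logique": [0, 0, 6.8, 18.6, 36.3, 9.6],
}


def peak_level(freqs):
    """Return the level with the highest frequency (first one on ties)."""
    best = max(range(len(LEVELS)), key=lambda i: freqs[i])
    return LEVELS[best]


def comparison_table(words):
    """Build a fixed-width text table, one row per lemma."""
    header = "lemma".ljust(10) + "".join(lvl.rjust(8) for lvl in LEVELS) + "    peak"
    lines = [header]
    for word in words:
        freqs = FREQS[word]
        cells = "".join(f"{value:8.1f}" for value in freqs)
        lines.append(word.ljust(10) + cells + "    " + peak_level(freqs))
    return "\n".join(lines)


print(comparison_table(["voiture", "justice", "logique"]))
```

In this sample, "voiture" is most frequent at A1, whereas "justice" and "logique" both peak at C1.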

Analyze

The resource can also be used to analyze the complexity of words in a text, in particular to identify which words will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
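
One simple way to approximate such an analysis is sketched below. It rests on an assumption that is not taken from the online analyzer: a lemma is treated as known from the first CEFR level at which its frequency is non-zero, and it is flagged as difficult when that level lies above the target level or when the lemma is missing from the lexicon. The input is assumed to be already lemmatized, and the function names are hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Per-level normalized frequencies copied from the sample entries.
FREQS = {
    "voiture": [633.3, 598.5, 482.7, 202.7, 271.9, 25.9],
    "kilo": [40.3, 29.9, 10.2, 0, 1.6, 0],
    "logique": [0, 0, 6.8, 18.6, 36.3, 9.6],
    "en clair": [0, 0, 0, 0, 8.2, 19.5],
}


def first_level(freqs):
    """Index of the first level with a non-zero frequency, or None."""
    for index, freq in enumerate(freqs):
        if freq > 0:
            return index
    return None


def difficult_lemmas(lemmas, target):
    """Lemmas assumed difficult at `target` (first-occurrence heuristic)."""
    target_index = LEVELS.index(target)
    flagged = []
    for lemma in lemmas:
        freqs = FREQS.get(lemma)
        level = first_level(freqs) if freqs is not None else None
        # Unknown lemmas are flagged too, since no evidence supports them.
        if level is None or level > target_index:
            flagged.append(lemma)
    return flagged


print(difficult_lemmas(["voiture", "logique", "en clair", "inconnu"], "A2"))
# prints ['logique', 'en clair', 'inconnu']
```

Other rules are possible, for example a frequency threshold instead of the first non-zero level. The Beacco version already provides a level column derived from the distributional information, which could replace this heuristic.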

Download

FLELex can be used in NLP tasks as well as for pedagogical and language assessment purposes. Note that two versions are available: FLELex-TT and FLELex-CRF.

Authors

FLELex is the result of a collaboration between three teams:

Contributors

Brayan Delmée
Logo design

Anaïs Tack & Baptiste Degryse
Prototype design

Damien De Meyere
Website maintenance