FLELex is a lexicon for French as a foreign language (FFL) that reports the normalized frequencies of words (lemmas) across each level of the CEFR (Common European Framework of Reference for Languages).
The frequencies have been estimated on a corpus of FFL textbooks and FFL simplified readers. More details on the corpus, the computation and normalization of the word frequencies, and the resource itself can be found in the following articles.
For FLELex (TreeTagger and CRF Tagger):
François, T., Gala, N., Watrin, P. and Fairon, C. (2014). FLELex: a graded lexical resource for French foreign learners. In the 9th International Conference on Language Resources and Evaluation (LREC 2014). Reykjavik, Iceland, 26-31 May.
For FLELex / Beacco:
Pintard, A. and François, T. (2020). Combining expert knowledge with frequency information to infer CEFR levels for words. In Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) (pp. 85-92).
If you use FLELex, please cite these articles.
Features
Feature | Description |
---|---|
Receptive lexicon | includes word frequencies observed in textbook reading activities and simplified readers |
CEFR levels | A1 · A2 · B1 · B2 · C1 · C2 |
Lexical entries | lemma (word) · part of speech (tag) |
Computed metrics | level_freq: normalized frequency (per 1 million words) for each level of the CEFR · total_freq: total normalized frequency in the source corpus |
Versions
To build a resource like FLELex, the corpus needs to be tagged automatically for part of speech (POS). Taggers do not all share the same characteristics, and these differences affect the resulting resource. We have therefore selected two taggers with very different features and built two versions of FLELex: FLELex-TT (TreeTagger) and FLELex-CRF (CRF Tagger). See the Download page for more details on the two versions.
Format
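The resource is distributed as a .CSV file with tab-separated values, encoded in UTF-8. It has 9 columns: the lemma, its part-of-speech tag, one normalized frequency for each of the six CEFR levels, and the total frequency (see Features above). You can also open it as an Excel sheet. The Beacco version contains an additional column with the CEFR level derived from the distributional information.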
Lemma | POS-tag | A1 | A2 | B1 | B2 | C1 | C2 | Total |
---|---|---|---|---|---|---|---|---|
voiture | NOM | 633.3 | 598.5 | 482.7 | 202.7 | 271.9 | 25.9 | 461.5 |
abandonner | VER | 35.5 | 62.3 | 104.8 | 79.8 | 73.6 | 28.5 | 78.2 |
justice | NOM | 3.9 | 17.3 | 79.1 | 13.2 | 106.3 | 72.9 | 48.1 |
kilo | NOM | 40.3 | 29.9 | 10.2 | 0 | 1.6 | 0 | 19.8 |
logique | NOM | 0 | 0 | 6.8 | 18.6 | 36.3 | 9.6 | 9.9 |
en bas | ADV | 34.9 | 28.5 | 13 | 32.8 | 1.6 | 0 | 24 |
en clair | ADV | 0 | 0 | 0 | 0 | 8.2 | 19.5 | 1.2 |
sous réserve de | PREP | 0 | 0 | 0.361 | 0 | 0 | 0 | 0.03 |
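As an illustration of the format, here is a minimal Python sketch that reads a tab-separated file of this shape into a dictionary. The header names (`Lemma`, `POS-tag`, `A1` ... `C2`, `Total`) are taken from the example table and are an assumption: the headers in the downloaded files may differ. The function name `load_flelex` is hypothetical, and the inline sample only reproduces a few rows of the table.

```python
import csv
import io

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Inline sample reproducing three rows of the example table.
# Assumption: the real file uses these header names.
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tC2\tTotal\n"
    "voiture\tNOM\t633.3\t598.5\t482.7\t202.7\t271.9\t25.9\t461.5\n"
    "kilo\tNOM\t40.3\t29.9\t10.2\t0\t1.6\t0\t19.8\n"
    "en clair\tADV\t0\t0\t0\t0\t8.2\t19.5\t1.2\n"
)


def load_flelex(handle):
    """Map (lemma, POS tag) to a dict of per-level and total frequencies."""
    reader = csv.DictReader(handle, delimiter="\t")
    lexicon = {}
    for row in reader:
        key = (row["Lemma"], row["POS-tag"])
        lexicon[key] = {col: float(row[col]) for col in LEVELS + ["Total"]}
    return lexicon


lexicon = load_flelex(io.StringIO(SAMPLE))
print(lexicon[("kilo", "NOM")]["A1"])  # 40.3
```

To read a downloaded file instead of the sample, pass a handle opened with `open(path, encoding="utf-8", newline="")`. Keying on both lemma and tag keeps homographs with different parts of speech apart.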
Usage
Search: The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
Analyze: The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
Download: You can use FLELex in NLP tasks but also for pedagogical and language assessment purposes. Be aware that there are two versions of FLELex: FLELex-TT and FLELex-CRF.
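For offline use, the idea of identifying words that are difficult at a given level can be sketched with a simple heuristic. This is an assumption made for illustration, not the method used by the online analyzer: a word is assigned the first CEFR level at which its frequency is non-zero, and it is flagged when that level lies above the learner's level. The names `first_level` and `difficult_words` are hypothetical, and the frequencies are copied from the example table in the Format section.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Per-level frequencies (A1..C2) copied from the example table.
FREQS = {
    "voiture": [633.3, 598.5, 482.7, 202.7, 271.9, 25.9],
    "logique": [0, 0, 6.8, 18.6, 36.3, 9.6],
    "en clair": [0, 0, 0, 0, 8.2, 19.5],
}


def first_level(freqs):
    """Return the first level with a non-zero frequency, or None."""
    for level, freq in zip(LEVELS, freqs):
        if freq > 0:
            return level
    return None


def difficult_words(words, learner_level):
    """Flag words whose first attested level is above the learner's level."""
    threshold = LEVELS.index(learner_level)
    flagged = []
    for word in words:
        freqs = FREQS.get(word)
        level = first_level(freqs) if freqs is not None else None
        # Words missing from the lexicon are flagged too (a design choice).
        if level is None or LEVELS.index(level) > threshold:
            flagged.append(word)
    return flagged


print(difficult_words(["voiture", "logique", "en clair"], "B1"))  # ['en clair']
```

Other criteria are possible, for example a minimum frequency threshold instead of the first non-zero level. The Beacco version already provides a level column that could replace `first_level`.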
Authors
FLELex is the result of a collaboration between three teams:
- The Center for Natural Language Processing (CENTAL) at UCLouvain;
- The Laboratoire Parole et Langue of the Aix-Marseille University;
- The EarlyTracks company.
Contributors
- Brayan Delmée: logo design
- Anaïs Tack & Baptiste Degryse: prototype design
- Damien De Meyere: website maintenance