FLELex · A CEFR-graded lexical resource for French as a foreign language

FLELex is a lexicon for French as a foreign language (FFL) that reports the normalized frequencies of words (lemmas) across each level of the CEFR (Common European Framework of Reference for Languages).

The frequencies have been estimated on a corpus of FFL textbooks and FFL simplified readers. More details on the corpus, the computation and normalization of the word frequencies, and the resource itself can be found in:

For FLELex (TreeTagger and CRF tagger versions):

François, T., Gala, N., Watrin, P. and Fairon, C. (2014). FLELex: a graded lexical resource for French foreign learners. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland, 26-31 May.

For FLELex / Beacco:

Pintard, A. and François, T. (2020). Combining expert knowledge with frequency information to infer CEFR levels for words. In Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) (pp. 85-92).

If you use FLELex, please cite these articles.

Features

	Receptive lexicon: includes word frequencies observed in textbook reading activities and simplified readers
	CEFR levels: A1 · A2 · B1 · B2 · C1 · C2
	Lexical entries: lemma (`word`) · part of speech (`tag`)
	Computed metrics: `level_freq`, the normalized frequency (per 1 million words) for each level of the CEFR · `total_freq`, the total normalized frequency in the source corpus

Versions

To build a resource like FLELex, the corpus needs to be automatically part-of-speech (POS) tagged. Taggers do not necessarily share the same characteristics, and these differences affect the resulting resource. We have therefore selected two taggers with very different features and built two versions of FLELex: FLELex-TT and FLELex-CRF. See the Download page for more details on the two versions.

Format

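The resource is distributed as a .CSV file with tab-separated values, encoded in UTF-8. It has 9 columns: the lemma, its part-of-speech tag, one normalized frequency for each of the six CEFR levels, and the total normalized frequency (see the features listed above). You can also open it in Excel. The Beacco version contains an additional column with the CEFR level derived from the distributional information.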

*Example of some entries from FLELex*
Lemma	POS-tag	A1	A2	B1	B2	C1	C2	Total
voiture	NOM	633.3	598.5	482.7	202.7	271.9	25.9	461.5
abandonner	VER	35.5	62.3	104.8	79.8	73.6	28.5	78.2
justice	NOM	3.9	17.3	79.1	13.2	106.3	72.9	48.1
kilo	NOM	40.3	29.9	10.2	0	1.6	0	19.8
logique	NOM	0	0	6.8	18.6	36.3	9.6	9.9
en bas	ADV	34.9	28.5	13	32.8	1.6	0	24
en clair	ADV	0	0	0	0	8.2	19.5	1.2
sous réserve de	PREP	0	0	0.361	0	0	0	0.03
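
As an illustration of the format, the following minimal sketch reads tab-separated entries with Python's standard `csv` module. It assumes the header row matches the example table above (`Lemma`, `POS-tag`, `A1` to `C2`, `Total`); the column names in the downloaded file may differ, and the function name `load_flelex` is hypothetical.

```python
import csv
import io

# Sample rows copied from the example table. The header names are an
# assumption and may differ in the downloaded file.
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tC2\tTotal\n"
    "voiture\tNOM\t633.3\t598.5\t482.7\t202.7\t271.9\t25.9\t461.5\n"
    "kilo\tNOM\t40.3\t29.9\t10.2\t0\t1.6\t0\t19.8\n"
    "en clair\tADV\t0\t0\t0\t0\t8.2\t19.5\t1.2\n"
)

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def load_flelex(handle):
    """Return {(lemma, pos): {"levels": {level: freq}, "total": freq}}."""
    # QUOTE_NONE keeps quote characters inside lemmas as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    lexicon = {}
    for row in reader:
        key = (row["Lemma"], row["POS-tag"])
        lexicon[key] = {
            "levels": {level: float(row[level]) for level in LEVELS},
            "total": float(row["Total"]),
        }
    return lexicon


lexicon = load_flelex(io.StringIO(SAMPLE))
print(lexicon[("kilo", "NOM")]["levels"])
```

To read a file from disk instead, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded file. Entries are keyed by lemma and tag together, since the same lemma can appear with several parts of speech.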

Usage

The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
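
The same kind of comparison can be sketched offline. The example below is only an illustration, not the online interface: it uses frequencies copied from the example entries above and reports, for each word, the level at which its normalized frequency peaks. The helper names `peak_level` and `compare` are hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Frequencies per million words, copied from the example entries.
FREQS = {
    "voiture": [633.3, 598.5, 482.7, 202.7, 271.9, 25.9],
    "justice": [3.9, 17.3, 79.1, 13.2, 106.3, 72.9],
    "logique": [0, 0, 6.8, 18.6, 36.3, 9.6],
}


def peak_level(freqs):
    """Return the CEFR level where the normalized frequency is highest."""
    best = max(range(len(LEVELS)), key=lambda i: freqs[i])
    return LEVELS[best]


def compare(words):
    """Build one text line per word with its distribution and peak level."""
    header = f"{'word':<10}" + "".join(f"{lvl:>8}" for lvl in LEVELS)
    lines = [header + f"{'peak':>8}"]
    for word in words:
        freqs = FREQS[word]
        cells = "".join(f"{f:>8.1f}" for f in freqs)
        lines.append(f"{word:<10}{cells}{peak_level(freqs):>8}")
    return lines


for line in compare(["voiture", "justice", "logique"]):
    print(line)
```

In this sample, "voiture" peaks at A1 while "justice" and "logique" peak at C1, which matches the intuition that concrete everyday nouns appear earlier in textbooks than abstract ones.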

Analyze

The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
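
One simple way to approximate this kind of analysis is sketched below. It is a heuristic chosen for illustration, not the method used by the online analyzer or by the Beacco version: a lemma is flagged as difficult when the first CEFR level at which it has a non-zero frequency lies above the target level, or when it is missing from the lexicon. The input is assumed to be already lemmatized, and the helper names are hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Frequencies per million words, copied from the example entries.
FREQS = {
    "voiture": [633.3, 598.5, 482.7, 202.7, 271.9, 25.9],
    "abandonner": [35.5, 62.3, 104.8, 79.8, 73.6, 28.5],
    "kilo": [40.3, 29.9, 10.2, 0, 1.6, 0],
    "logique": [0, 0, 6.8, 18.6, 36.3, 9.6],
    "en clair": [0, 0, 0, 0, 8.2, 19.5],
}


def first_level(lemma):
    """Index of the first level with a non-zero frequency, or None."""
    freqs = FREQS.get(lemma)
    if freqs is None:
        return None  # lemma not in the lexicon
    for index, freq in enumerate(freqs):
        if freq > 0:
            return index
    return None


def difficult_lemmas(lemmas, target):
    """Return the lemmas first attested above the target CEFR level."""
    target_index = LEVELS.index(target)
    flagged = []
    for lemma in lemmas:
        index = first_level(lemma)
        # Unknown lemmas are flagged too (an assumption of this sketch).
        if index is None or index > target_index:
            flagged.append(lemma)
    return flagged


print(difficult_lemmas(["voiture", "logique", "en clair", "kilo"], "A2"))
```

With the sample data, the call above flags "logique" (first attested at B1) and "en clair" (first attested at C1) for an A2 reader. A threshold on the frequency, rather than a strict non-zero test, may be a more robust choice on the full resource.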

Download

You can use FLELex in NLP tasks but also for pedagogical and language assessment purposes. Be aware that there are two versions of FLELex: FLELex-TT and FLELex-CRF.

Authors

FLELex is the result of a collaboration between three teams:

The Center for Natural Language Processing (CENTAL) at UCLouvain;
The Laboratoire Parole et Langue at Aix-Marseille University;
The EarlyTracks company.

Contributors

Brayan Delmée
Logo Design

Anaïs Tack & Baptiste Degryse
Prototype design

Damien De Meyere
Website maintenance
