What is FLELex?
FLELex is a lexicon for French as a foreign language (FFL) that reports the normalized frequencies of words (lemmas) across each level of the CEFR (Common European Framework of Reference for Languages).
The frequencies have been estimated on a corpus of FFL textbooks and FFL simplified readers. More details on the corpus, the computation and normalization of the word frequencies, and the resource itself can be found in:
François, T., Gala, N., Watrin, P. and Fairon, C. FLELex: a graded lexical resource for French foreign learners. In the 9th International Conference on Language Resources and Evaluation (LREC 2014). Reykjavik, Iceland, 26-31 May.
What's in FLELex?
For every word in FLELex, you can find its part of speech (POS), its normalized frequency for each level of the CEFR, and its total normalized frequency in our corpus. Here is a sample entry from FLELex:
|Lemma||POS||A1||A2||B1||B2||C1||C2||Total|
|sous réserve de||PREP||0||0||0.361||0||0||0||0.03|
How to get it?
You can download it from the Download FLELex page. Be aware that there are two versions of FLELex: FLELex-TT and FLELex-CRF.
Why two versions of FLELex?
To build a resource such as FLELex, the corpus needs to be POS-tagged automatically, which is usually done with a tagger. Taggers do not all share the same characteristics, and these differences affect the resulting resource. We have therefore selected two taggers with very different features and built two different versions of FLELex.
See the Download FLELex page for more details on the two versions.
How to use it?
The format is a .CSV file with 9 columns (see above), encoded in UTF-8.
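As an illustration, the file could be loaded in Python along the following lines. This is only a sketch: the tab delimiter, the presence of a header row, the column order (lemma, POS, six CEFR levels from A1 to C2, total) and the function name `load_flelex` are assumptions made for the example, so check them against the file you downloaded.

```python
import csv
import io

# Assumed order of the six CEFR frequency columns.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def load_flelex(handle, delimiter="\t", has_header=True):
    """Read a 9-column FLELex-style file into a dict keyed by (lemma, POS).

    The delimiter and header row are assumptions; adjust them to the real file.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row
    lexicon = {}
    for row in reader:
        if len(row) != 9:
            continue  # ignore malformed or empty lines
        lemma, pos = row[0], row[1]
        freqs = [float(value) for value in row[2:8]]
        lexicon[(lemma, pos)] = {
            "levels": dict(zip(LEVELS, freqs)),
            "total": float(row[8]),
        }
    return lexicon


# In-memory sample built from the entry shown earlier on this page.
sample = (
    "lemma\ttag\tA1\tA2\tB1\tB2\tC1\tC2\ttotal\n"
    "sous réserve de\tPREP\t0\t0\t0.361\t0\t0\t0\t0.03\n"
)
lexicon = load_flelex(io.StringIO(sample))
print(lexicon[("sous réserve de", "PREP")]["levels"]["B1"])  # 0.361
```

For a file on disk, pass a handle opened with `open(path, encoding="utf-8", newline="")` instead of the in-memory sample.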
Who did it?
FLELex is the result of a collaboration between three teams:
There are two FLELex versions available. Feel free to download either of them and use it for your own research.
To help you choose which of the two you want to use, a few explanations are provided below.
FLELex-TT
|Number of entries||14,236 lemmas|
|Tagger used||TreeTagger (Schmid, 1994)|
|Includes multiword expressions||No|
|Recommended use||For NLP purposes, since the POS-tagset of FLELex-TT is the same as that of the TreeTagger. It is thus very easy to automatically analyse a text using TreeTagger and FLELex.|
FLELex-CRF
|Number of entries||17,871 lemmas|
|Tagger used||EarlyTracks CRF Tagger|
|Includes multiword expressions||Yes|
|Recommended use||For pedagogical purposes, since this resource includes multiword expressions (very useful for language learners) and the tagging accuracy was higher. However, for NLP use, you should either use the EarlyTracks CRF Tagger or adapt the tagset yourself.|
With FLELex, it is possible to analyse the lexical complexity of a French text for a specific CEFR proficiency level. All you need to do is enter a text of your choice and we'll do the analysis for you. For additional tips and tricks on how to interpret the analysis, please consult the "How-to" tab below.
The program analyses the lexical difficulty of French texts for learners of French as a foreign language and determines, for a specific CEFR level, the level of difficulty of each lexical unit in the text. For further information on how this analysis is done, see the How to interpret the results tab.
In order to use our program, you can either copy and paste a text, or upload a text file in the New text tab. Please note that currently only plain text files (e.g. .txt) are supported.
After having uploaded the text of your choice, you can define the CEFR level for which you wish to analyse the lexical difficulty. For researchers or teachers: you can pick the specific proficiency level which interests you. For learners: you can use your own proficiency level.
Finally, you can define the tagger you wish to use for lemmatising your text. You can choose between the two taggers used for creating FLELex, i.e. TreeTagger or the EarlyTracks CRF Tagger. However, only the former is currently fully integrated into the program. As a reminder, TreeTagger only lemmatises simple words, whereas the EarlyTracks CRF Tagger lemmatises both simple words and multiword expressions.
After having uploaded your text and having defined the proficiency level as well as the tagger, you are now ready to proceed to the analysis!
After having entered a text, the program analyses the level of difficulty of each lexical unit that has been lemmatised (except for numerical units and named entities). The level of difficulty corresponds to the CEFR level as attested in FLELex. In order to calculate this proficiency level, the program selects the level in which the lexical unit is first observed.
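The "first observed" rule can be sketched as follows. This is an illustrative reading of the description above, not the program's actual code: it assumes that "first observed" means the lowest CEFR level with a non-zero normalized frequency, and that the six frequencies are ordered from A1 to C2. The function name `first_observed_level` is hypothetical.

```python
# Assumed order of the six CEFR frequency columns.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def first_observed_level(frequencies):
    """Return the lowest CEFR level with a non-zero frequency, or None."""
    for level, freq in zip(CEFR_LEVELS, frequencies):
        if freq > 0:
            return level
    return None  # the unit is not attested at any level


# Per-level frequencies of the "sous réserve de" entry shown earlier.
print(first_observed_level([0, 0, 0.361, 0, 0, 0]))  # B1
```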
The program thus analyses the lexical complexity for a specific proficiency level by identifying the words whose CEFR level lies beyond the proficiency level that you have chosen. These difficult lexical units are highlighted, using different colours that reflect each word's CEFR level:
As for the lexical units that have not been found in FLELex, they are highlighted in red and are considered the most difficult, seeing that they have not been observed in French foreign language learning textbooks.
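The selection logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the program's implementation: lemmatisation is taken as already done, the lookup table maps each lemma directly to its CEFR level, and the function name `classify` and the status labels are hypothetical. The levels in `toy_levels` are made-up values for the example, not figures taken from FLELex.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def classify(lemmas, lemma_levels, target_level):
    """Label each lemma as 'known', 'difficult' or 'unknown' for a target level."""
    target_rank = LEVELS.index(target_level)
    result = []
    for lemma in lemmas:
        level = lemma_levels.get(lemma)
        if level is None:
            status = "unknown"  # not found in the lexicon: treated as most difficult
        elif LEVELS.index(level) > target_rank:
            status = "difficult"  # level lies beyond the chosen proficiency level
        else:
            status = "known"
        result.append((lemma, level, status))
    return result


# Toy lookup table with invented levels, for illustration only.
toy_levels = {"maison": "A1", "sous réserve de": "B1", "néanmoins": "C1"}

for item in classify(["maison", "sous réserve de", "néanmoins", "xylophage"],
                     toy_levels, "B1"):
    print(item)
```

With a target level of B1, only the lemma assigned to C1 is flagged as difficult, and the lemma missing from the table is reported as unknown.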
Finally, if you wish to obtain more linguistic information, you can click on each highlighted word to see its CEFR level, its lemma and its part of speech. If you wish to search for the word in FLELex, you can click on the chart symbol, which will redirect you to the Search FLELex page.