NT2Lex is a lexical database for Dutch as a foreign language (NT2) that includes frequency distributions of words observed in texts graded along the six-level scale of the Common European Framework of Reference for Languages. It is a receptive graded lexicon, with word frequencies observed in textbook reading activities and simplified readers targeting learners of Dutch.

More information can be found in the following paper. If you use the resource in your research or publications, please cite this paper.

Tack, A., François, T., Desmet, P. & Fairon, C. (2018). NT2Lex: A CEFR-Graded Lexical Resource for Dutch as a Foreign Language Linked to Open Dutch WordNet. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 137-146).

Features

Receptive lexicon
includes word frequencies observed in textbook reading activities and simplified readers

CEFR levels
A1 · A2 · B1 · B2 · C1

Lexical entries
lemma (word)
part of speech (tag) · simplified CGN tagset
sense number (sense_se-id) · Open Dutch WordNet
synset (sense_sy-id) · Open Dutch WordNet

Computed metrics
F · raw frequency
D · dispersion index
SFI · standard frequency index
U · normalized frequency (per 1 million words)
tf-idf · term frequency-inverse document frequency
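As an illustration of how two of these metrics relate, the sketch below derives SFI from U. It assumes that NT2Lex follows Carroll's commonly cited definition, SFI = 10 × (log10(U) + 4), which this page does not state explicitly; the function name is hypothetical.

```python
import math

def standard_frequency_index(u_per_million: float) -> float:
    """Return SFI for a normalized frequency U (per 1 million words).

    Assumption: Carroll's definition, SFI = 10 * (log10(U) + 4).
    """
    if u_per_million <= 0:
        raise ValueError("U must be positive to take its logarithm")
    return 10 * (math.log10(u_per_million) + 4)

# A word occurring once per million words
sfi_rare = standard_frequency_index(1.0)
# A word occurring one hundred times per million words
sfi_common = standard_frequency_index(100.0)
```

Under this definition, every tenfold increase in U adds 10 points to SFI, so the index compresses a very wide frequency range into a compact scale.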

Format

The resource is formatted as a tab-separated file (UTF-8 encoding) with one line per entry. Each entry has been automatically tagged with its lemma, part of speech, sense and synset (if applicable).

For each observed CEFR level, a number of frequency metrics (METRIC@LEVEL) are computed on the corresponding graded texts, as well as on all levels combined (METRIC@TOTAL). If an entry does not appear at a specific level, the corresponding columns are set to an empty symbol (-).
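A minimal sketch of reading such a file in Python follows. The column names (word, tag, F@A1, ...) and the sample rows are illustrative assumptions; check the header of the downloaded file for the actual names. The function name read_entries is hypothetical.

```python
import csv
import io

# Hypothetical sample; column names and values are illustrative only.
SAMPLE = (
    "word\ttag\tF@A1\tF@A2\tF@TOTAL\n"
    "huis\tN\t12\t30\t42\n"
    "desalniettemin\tBW\t-\t-\t3\n"
)

def read_entries(handle):
    """Yield one dict per entry, mapping '-' to None and numbers to float."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        entry = {}
        for column, value in row.items():
            if "@" in column:
                # METRIC@LEVEL or METRIC@TOTAL column
                entry[column] = None if value == "-" else float(value)
            else:
                # lemma, tag, sense and synset columns stay as text
                entry[column] = value
        yield entry

entries = list(read_entries(io.StringIO(SAMPLE)))
```

For a real file, the same function could be called on a handle opened with `open(path, encoding="utf-8", newline="")`.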

Usage

Search

The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
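The same kind of comparison can be sketched offline. The example below assumes that entries have already been loaded as dictionaries keyed by column name, with None marking a level where the word is not observed; the column names, the sample values, and the function name compare are all hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Hypothetical entries; values are illustrative, not taken from NT2Lex.
ENTRIES = [
    {"word": "huis", "tag": "N", "U@A1": 850.0, "U@A2": 620.0,
     "U@B1": 400.0, "U@B2": 310.0, "U@C1": 290.0},
    {"word": "beleid", "tag": "N", "U@A1": None, "U@A2": None,
     "U@B1": 45.0, "U@B2": 120.0, "U@C1": 180.0},
]

def compare(entries, words, metric="U"):
    """Map (word, tag) to the list of metric values ordered by CEFR level."""
    table = {}
    for entry in entries:
        if entry["word"] in words:
            key = (entry["word"], entry["tag"])
            table[key] = [entry.get(f"{metric}@{level}") for level in LEVELS]
    return table

table = compare(ENTRIES, {"huis", "beleid"})
for (word, tag), values in table.items():
    print(word, tag, values)
```

Keying on both the word and its tag keeps homographs with different parts of speech apart.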

Analyze

The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
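One simple heuristic for this task is sketched below; it is not necessarily the method used by the online analyzer. It assumes a word is difficult for a target level if its first attested level lies above the target, or if the word is missing from the lexicon. The lexicon contents and the function name difficult_words are hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Hypothetical mapping from lemma to the first level at which it is attested.
FIRST_LEVEL = {"huis": "A1", "fiets": "A1", "beleid": "B1", "nochtans": "B2"}

def difficult_words(lemmas, first_level, target):
    """Return lemmas first attested above the target level, or unknown."""
    cutoff = LEVELS.index(target)
    hard = []
    for lemma in lemmas:
        level = first_level.get(lemma)
        # Assumption: lemmas absent from the lexicon count as difficult.
        if level is None or LEVELS.index(level) > cutoff:
            hard.append(lemma)
    return hard

hard_for_a2 = difficult_words(["huis", "beleid", "nochtans", "xyzzy"],
                              FIRST_LEVEL, "A2")
```

The input is expected to be lemmatized already, since the lexical entries are lemmas rather than inflected forms.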

Download

You can use NT2Lex in NLP tasks as well as for pedagogical and language assessment purposes.

Authors

Anaïs Tack
CENTAL, UCLouvain (BE)
ITEC, KU Leuven (BE)

Thomas François
CENTAL, UCLouvain (BE)

Piet Desmet
ITEC, KU Leuven (BE)

Cédrick Fairon
CENTAL, UCLouvain (BE)

Contributors

Anne-Sophie Desmet
Corpus Annotation

Brayan Delmée
Logo Design

Damien De Meyere
Website

Acknowledgements

This research was funded by an F.R.S.-FNRS research grant.