NT2Lex · A CEFR-graded lexical resource for Dutch as a foreign language

NT2Lex is a lexical database for Dutch as a foreign language (NT2) that includes frequency distributions of words observed in texts graded along the six-level scale of the Common European Framework of Reference for Languages. It is a receptive graded lexicon, with word frequencies observed in textbook reading activities and simplified readers targeting learners of Dutch.

More information can be found in the following paper. If you use the resource in your research or publications, please cite this paper.

Tack, A., François, T., Desmet, P. & Fairon, C. (2018). NT2Lex: A CEFR-Graded Lexical Resource for Dutch as a Foreign Language Linked to Open Dutch WordNet. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 137-146).

Features

	Receptive lexicon: includes word frequencies observed in textbook reading activities and simplified readers
	CEFR levels: A1 · A2 · B1 · B2 · C1
	Lexical entries: lemma (`word`); part of speech (`tag`, simplified CGN tagset); sense number (`sense_se-id`, Open Dutch WordNet); synset (`sense_sy-id`, Open Dutch WordNet)
	Computed metrics: F (raw frequency); D (dispersion index); SFI (standard frequency index); U (normalized frequency per 1 million words); tf-idf (term frequency-inverse document frequency)

Format

The resource is formatted as a tab-separated file (UTF-8 encoding) with one line per entry. Each entry has been automatically tagged with its lemma, part of speech, sense and synset (if applicable).

For each observed CEFR level, a number of frequency metrics (METRIC@LEVEL) are computed on the corresponding graded texts, as well as over all levels combined (METRIC@TOTAL). If an entry does not appear at a specific level, the corresponding columns contain an empty symbol (-).
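As an illustration of the format described above, the following minimal Python sketch reads a tab-separated file and maps the empty symbol (-) to `None`. The sample rows, the tag values, and the concrete column names such as `F@A1` are invented here to follow the METRIC@LEVEL pattern; the header of the actual file may differ.

```python
import csv
import io

# Invented sample data: real files have more columns and different values.
# The tag values are placeholders, not the actual simplified CGN tags.
SAMPLE = (
    "word\ttag\tF@A1\tF@A2\tF@TOTAL\n"
    "huis\tN\t12\t30\t42\n"
    "desalniettemin\tADV\t-\t-\t1\n"
)

def read_entries(handle):
    """Yield one dict per entry, mapping the empty symbol '-' to None."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == "-" else value) for key, value in row.items()}

entries = list(read_entries(io.StringIO(SAMPLE)))
print(entries[1]["F@A1"])  # the entry was not observed at this level
```

To read a downloaded file instead of the sample string, a handle opened with `open(path, encoding="utf-8", newline="")` could be passed to `read_entries`, where `path` is wherever the file was saved.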

Usage

The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
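The same kind of comparison can be sketched offline. The example below assumes that entries have already been read into dictionaries keyed by column name, and that normalized frequency columns are named `U@A1`, `U@A2`, and so on, following the METRIC@LEVEL pattern. The rows and values are invented for illustration, and the helper names are hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Invented entries; '-' is the empty symbol for levels where the entry is absent.
ROWS = [
    {"word": "huis", "tag": "N", "U@A1": "850.0", "U@A2": "620.5",
     "U@B1": "410.0", "U@B2": "300.2", "U@C1": "-"},
    {"word": "beleid", "tag": "N", "U@A1": "-", "U@A2": "-",
     "U@B1": "35.0", "U@B2": "120.4", "U@C1": "210.9"},
]

def distribution(row, metric="U"):
    """Return {level: value or None} for one entry and one metric."""
    result = {}
    for level in LEVELS:
        raw = row.get(f"{metric}@{level}", "-")
        result[level] = None if raw == "-" else float(raw)
    return result

def compare(rows, words, metric="U"):
    """Collect per-level distributions for the requested lemmas.

    Results are keyed by (lemma, tag) because one lemma can have several entries.
    """
    wanted = set(words)
    return {
        (row["word"], row["tag"]): distribution(row, metric)
        for row in rows
        if row["word"] in wanted
    }

for key, dist in compare(ROWS, ["huis", "beleid"]).items():
    print(key, dist)
```

Keying by lemma and tag keeps homographs with different parts of speech apart; entries that also differ by sense would need the sense columns in the key as well.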

Analyse

The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
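One simple way to approximate such an analysis is sketched below. It is not the method used by the online analyzer. As an assumed heuristic, a lemma is flagged as difficult when the first level at which it is observed lies above the target level, or when it is not in the lexicon at all. The input text is assumed to be lemmatized already, and the rows and column names (`F@A1`, ...) are invented to follow the METRIC@LEVEL pattern.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Invented entries; '-' is the empty symbol for levels where the entry is absent.
ROWS = [
    {"word": "huis", "F@A1": "12", "F@A2": "30", "F@B1": "25", "F@B2": "18", "F@C1": "9"},
    {"word": "beleid", "F@A1": "-", "F@A2": "-", "F@B1": "4", "F@B2": "11", "F@C1": "20"},
]

def first_level(row, metric="F"):
    """Return the rank of the lowest level with a non-empty value, or None."""
    for rank, level in enumerate(LEVELS):
        if row.get(f"{metric}@{level}", "-") != "-":
            return rank
    return None

def build_index(rows):
    """Map each lemma to the lowest rank found across all of its entries."""
    index = {}
    for row in rows:
        rank = first_level(row)
        if rank is None:
            continue
        lemma = row["word"]
        index[lemma] = min(rank, index.get(lemma, rank))
    return index

def difficult_words(lemmas, index, target):
    """Flag lemmas first observed above the target level, or unknown lemmas."""
    limit = LEVELS.index(target)
    return [lemma for lemma in lemmas if index.get(lemma, len(LEVELS)) > limit]

flagged = difficult_words(["huis", "beleid", "xyzzy"], build_index(ROWS), "A2")
print(flagged)  # ['beleid', 'xyzzy']
```

Treating unknown lemmas as difficult is a deliberate simplification; a lemma missing from a graded lexicon may simply be rare in the source texts rather than hard.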

Download

You can use NT2Lex in NLP tasks, but also for pedagogical and language assessment purposes.

Authors

Anaïs Tack
CENTAL, UCLouvain (BE)
ITEC, KU Leuven (BE)

Thomas François
CENTAL, UCLouvain (BE)

Piet Desmet
ITEC, KU Leuven (BE)

Cédrick Fairon
CENTAL, UCLouvain (BE)

Contributors

Anne-Sophie Desmet
Corpus Annotation

Brayan Delmée
Logo Design

Damien De Meyere
Website

Acknowledgements

This research was funded by an F.R.S.-FNRS research grant.
