NT2Lex is a lexical database for Dutch as a foreign language (NT2) that includes frequency distributions of words observed in texts graded along the six-level scale of the Common European Framework of Reference for Languages. It is a receptive graded lexicon, with word frequencies observed in textbook reading activities and simplified readers targeting learners of Dutch.
More information can be found in the following paper. When using the resource(s) in your research or publication, please cite this paper as well.
Tack, A., François, T., Desmet, P. & Fairon, C. (2018). NT2Lex: A CEFR-Graded Lexical Resource for Dutch as a Foreign Language Linked to Open Dutch WordNet. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 137-146).
Features
Feature | Description |
---|---|
Receptive lexicon | includes word frequencies observed in textbook reading activities and simplified readers |
CEFR levels | A1 · A2 · B1 · B2 · C1 |
Lexical entries | lemma (word) · part of speech (tag), simplified CGN tagset · sense number (sense_se-id), Open Dutch WordNet · synset (sense_sy-id), Open Dutch WordNet |
Computed metrics | F: raw frequency · D: dispersion index · SFI: standard frequency index · U: normalized frequency (per 1 million words) · tf-idf: term frequency-inverse document frequency |
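To give a feel for how two of the computed metrics relate, here is a minimal Python sketch. It assumes that U is a plain per-million normalization of the raw frequency F, and that SFI follows Carroll's usual definition, SFI = 10 · (log10(U) + 4). Both are assumptions made for illustration: the values distributed with the resource may be computed differently (for example, with a dispersion adjustment), and the function names are hypothetical.

```python
import math


def per_million(raw_frequency, corpus_size):
    """Normalize a raw count to occurrences per 1 million words (assumed definition of U)."""
    return raw_frequency / corpus_size * 1_000_000


def standard_frequency_index(u):
    """Carroll-style SFI from a per-million frequency; assumed, not confirmed by the source."""
    return 10 * (math.log10(u) + 4)


# Made-up numbers: a word seen 50 times in a 2-million-word corpus.
u = per_million(50, 2_000_000)
sfi = standard_frequency_index(u)
print(u, sfi)
```

Under this definition, a word occurring once per million words has an SFI of 40, and each tenfold increase in frequency adds 10 points.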
Format
The resource is formatted as a tab-separated file (UTF-8 encoding) with one line per entry. Each entry has been automatically tagged with its lemma, part of speech, sense and synset (if applicable).
For each observed CEFR level, several frequency metrics (METRIC@LEVEL) are computed on the corresponding graded texts, as well as across all levels (METRIC@TOTAL). If an entry does not appear at a specific level, the corresponding columns contain a placeholder symbol (-).
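The layout described above can be read with Python's standard `csv` module. The sketch below is only illustrative: the sample rows, the column names (`word`, `tag`, `F@A1`, ...) and the values are assumptions, not copied from the actual file. It treats every column whose name contains `@` as a metric column and maps the `-` placeholder to `None`.

```python
import csv
import io

# Hypothetical sample in the described layout; column names and values
# are illustrative assumptions, not taken from the real resource.
SAMPLE = (
    "word\ttag\tF@A1\tF@A2\tF@TOTAL\n"
    "huis\tN\t12\t30\t42\n"
    "echter\tADV\t-\t5\t5\n"
)


def parse_row(row):
    """Convert metric columns (METRIC@LEVEL) to floats; '-' becomes None."""
    parsed = {}
    for column, cell in row.items():
        if "@" in column:
            parsed[column] = None if cell == "-" else float(cell)
        else:
            parsed[column] = cell
    return parsed


def read_entries(handle):
    """Yield one dict per lexical entry from a tab-separated file object."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield parse_row(row)


entries = list(read_entries(io.StringIO(SAMPLE)))
print(entries[1])
```

For the downloaded file, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever you saved the resource.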
Usage
Search: The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
Analyze: The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
Download: You can use NT2Lex in NLP tasks as well as for pedagogical and language assessment purposes.
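As one illustration of such use, the idea of identifying words that are difficult at a given level could be sketched in Python as follows. This is not the method used by the online analyzer, which the page does not describe. The sketch assumes a simple heuristic: a lemma is flagged when it is unknown to the lexicon or is first observed above the target level. The toy lexicon, its frequencies, and the function names are hypothetical.

```python
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1"]

# Hypothetical toy lexicon: lemma -> {level: raw frequency}; values are made up.
TOY_LEXICON = {
    "huis": {"A1": 12.0, "A2": 30.0},
    "echter": {"B1": 5.0},
    "desalniettemin": {"C1": 1.0},
}


def first_observed_level(lemma, lexicon):
    """Return the lowest CEFR level with a non-zero frequency, or None."""
    levels = lexicon.get(lemma)
    if not levels:
        return None
    for level in CEFR_ORDER:
        if levels.get(level):
            return level
    return None


def difficult_words(lemmas, target_level, lexicon):
    """Flag lemmas that are unknown or first observed above target_level."""
    cutoff = CEFR_ORDER.index(target_level)
    flagged = []
    for lemma in lemmas:
        level = first_observed_level(lemma, lexicon)
        if level is None or CEFR_ORDER.index(level) > cutoff:
            flagged.append(lemma)
    return flagged


result = difficult_words(["huis", "echter", "fiets"], "A2", TOY_LEXICON)
print(result)
```

The function expects lemmas, so a real text would first need to be tokenized and lemmatized. Treating unknown words as difficult is a design choice for this sketch, since absence from a graded lexicon gives no direct evidence about level.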
Authors
Anaïs Tack
CENTAL, UCLouvain (BE)
ITEC, KU Leuven (BE)
Thomas François
CENTAL, UCLouvain (BE)
Piet Desmet
ITEC, KU Leuven (BE)
Cédrick Fairon
CENTAL, UCLouvain (BE)
Contributors
Anne-Sophie Desmet
Corpus Annotation
Brayan Delmée
Logo Design
Damien De Meyere
Website
Acknowledgements
This research was funded by an F.R.S.-FNRS research grant.