SweLLex is a lexicon of productive vocabulary for Swedish as a second/foreign language (SVA). Like its sister resource, SVALex, it reports the normalized frequencies of words (lemmas) across six levels of the CEFR (Common European Framework of Reference for Languages). As in SVALex, it covers both single words and multi-word expressions and reports their usage at different levels, information that is rarely available in resources of this kind.
The frequencies were estimated on the SweLL corpus, a corpus of essays written by SVA learners, which is described in the following article:
Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell. 2016. SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies. Proceedings of LREC 2016, Slovenia.
More details on the SweLLex resource are provided in the following article:
Elena Volodina, Ildikó Pilán, Lorena Llozhi, Baptiste Degryse, Thomas François. 2016. SweLLex: second language learners' productive vocabulary. Proceedings of the workshop on NLP4CALL&LA. NEALT Proceedings Series / Linköping Electronic Conference Proceedings
If you use SweLLex, please cite this article.
Features
- Productive lexicon: includes word frequencies observed in learner essays
- CEFR levels: A1 · A2 · B1 · B2 · C1 · C2
- Lexical entries: lemma (word) and part of speech (tag)
- Computed metrics:
  - level_freq · normalized frequency (per 1 million words) for each level of the CEFR
  - total_freq · total normalized frequency in the source corpus
  - nb_doc · document frequency
Format
The lexicon is distributed as a .CSV file whose values are separated by tabs rather than commas. It has 8 columns (as in the sample below) and is encoded in UTF-8. You can also open it in a spreadsheet application such as Excel.
Lemma | POS-tag | A1 | A2 | B1 | B2 | C1 | Total |
---|---|---|---|---|---|---|---|
bil | NN_UTR | 430.2138 | 1234.2078 | 728.9847 | 422.283 | 363.5446 | 618.8567 |
överge | VB | 0 | 0 | 7.3203 | 24.5182 | 39.6516 | 17.2695 |
rättvisa | NN_UTR | 0 | 0 | 3.6601 | 25.6189 | 26.4344 | 13.6602 |
kilo | NN_NEU | 0 | 302.0833 | 145.1229 | 65.0611 | 13.2172 | 89.8907 |
resa | VB | 166.3009 | 375.2582 | 450.3526 | 298.4905 | 330.4297 | 356.362 |
låg | JJ | 0 | 49.315 | 125.922 | 217.3103 | 252.1311 | 156.126 |
så klart | ABM_MWE | 0 | 16.2635 | 81.6019 | 45.5033 | 13.2172 | 38.1738 |
till skillnad från | PPM_MWE | 0 | 0 | 5.3395 | 2.409 | 3.6699 | 5.1839 |
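As a minimal sketch of how such a file could be read, the Python snippet below parses a few of the rows above with the standard `csv` module. The header names (`Lemma`, `POS-tag`, `A1` ... `Total`) are taken from the sample table and are an assumption about the distributed file, which may label its columns differently. `read_lexicon` is a hypothetical helper, not part of any SweLLex tooling.

```python
import csv
import io

# Three rows copied from the sample table, laid out as tab-separated text.
# The header names are assumed to match the table shown above.
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tTotal\n"
    "bil\tNN_UTR\t430.2138\t1234.2078\t728.9847\t422.283\t363.5446\t618.8567\n"
    "överge\tVB\t0\t0\t7.3203\t24.5182\t39.6516\t17.2695\n"
    "så klart\tABM_MWE\t0\t16.2635\t81.6019\t45.5033\t13.2172\t38.1738\n"
)

LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def read_lexicon(handle):
    """Map (lemma, POS tag) to a dict of per-level and total frequencies."""
    entries = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        freqs = {level: float(row[level]) for level in LEVELS}
        freqs["Total"] = float(row["Total"])
        entries[(row["Lemma"], row["POS-tag"])] = freqs
    return entries


lexicon = read_lexicon(io.StringIO(SAMPLE))
print(lexicon[("bil", "NN_UTR")]["A2"])
```

For a file on disk, the same function can be given a handle opened with `encoding="utf-8"` and `newline=""`. Keying on the lemma together with the POS tag keeps homographs apart, and multi-word expressions such as "så klart" need no special handling because the separator is a tab, not a space.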
Usage
Search: The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
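The same kind of comparison can be sketched offline. The Python example below lines up the per-level frequencies of three lemmas from the sample table and reports the level at which each one peaks. The helper names `compare` and `peak_level` are hypothetical, and entries are keyed by lemma only for brevity; this is not how the online interface is implemented.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Normalized frequencies copied from the sample table (A1 to C1).
FREQS = {
    "bil": [430.2138, 1234.2078, 728.9847, 422.283, 363.5446],
    "resa": [166.3009, 375.2582, 450.3526, 298.4905, 330.4297],
    "låg": [0, 49.315, 125.922, 217.3103, 252.1311],
}


def peak_level(lemma):
    """Return the CEFR level at which the lemma is most frequent."""
    values = FREQS[lemma]
    return LEVELS[values.index(max(values))]


def compare(lemmas):
    """Build a plain-text table with one row per lemma, one column per level."""
    lines = ["lemma".ljust(8) + "".join(level.rjust(10) for level in LEVELS)]
    for lemma in lemmas:
        cells = "".join(f"{value:10.1f}" for value in FREQS[lemma])
        lines.append(lemma.ljust(8) + cells)
    return "\n".join(lines)


print(compare(["bil", "resa", "låg"]))
for lemma in FREQS:
    print(lemma, "peaks at", peak_level(lemma))
```

In these sample rows, "bil" peaks early (A2) while "låg" keeps rising up to C1.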
Analyse: The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
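One simple way to approximate this kind of analysis is sketched below in Python. It assumes that a lemma counts as difficult at a target level when it is missing from the lexicon or when its first non-zero frequency occurs at a higher level. That criterion is an assumption made for illustration, not the method used by the online analyzer; the helpers `first_level` and `difficult_lemmas` are hypothetical, and the input is assumed to be already lemmatized.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Normalized frequencies copied from the sample table (A1 to C1).
FREQS = {
    "bil": [430.2138, 1234.2078, 728.9847, 422.283, 363.5446],
    "kilo": [0, 302.0833, 145.1229, 65.0611, 13.2172],
    "överge": [0, 0, 7.3203, 24.5182, 39.6516],
    "rättvisa": [0, 0, 3.6601, 25.6189, 26.4344],
}


def first_level(lemma):
    """Return the lowest level with a non-zero frequency, or None if unknown."""
    for level, value in zip(LEVELS, FREQS.get(lemma, [])):
        if value > 0:
            return level
    return None


def difficult_lemmas(lemmas, target):
    """List lemmas that are unknown or first observed above the target level."""
    limit = LEVELS.index(target)
    hard = []
    for lemma in lemmas:
        level = first_level(lemma)
        if level is None or LEVELS.index(level) > limit:
            hard.append(lemma)
    return hard


# "xyz" is a placeholder for a lemma that is absent from the lexicon.
print(difficult_lemmas(["bil", "kilo", "överge", "xyz"], "A2"))
```

A stricter variant could require a minimum frequency at the target level instead of any non-zero value.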
Authors
SweLLex is the result of a collaboration between two teams:
- The Center for Natural Language Processing (CENTAL) at UCLouvain;
- The Språkbanken research unit of the University of Gothenburg.
Contributors
- Brayan Delmée: logo design
- Dorian Ricci, Baptiste Degryse & Anaïs Tack: prototype design
- Damien De Meyere: website maintenance