SweLLex is a lexicon of productive vocabulary for Swedish as a second/foreign language (SVA). Like its sister resource, SVALex, it reports the normalized frequencies of words (lemmas) across six levels of the CEFR (Common European Framework of Reference for Languages). Also like SVALex, it covers both single words and multi-word expressions and records their usage at different levels, information that is rarely present in resources of this kind.

The frequencies have been estimated on a corpus of essays written by SVA learners, the SweLL corpus, described in the following article:

Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell. 2016. SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies. Proceedings of LREC 2016, Slovenia.

More details on the SweLLex resource are provided in the following article:

Elena Volodina, Ildikó Pilán, Lorena Llozhi, Baptiste Degryse, Thomas François. 2016. SweLLex: second language learners' productive vocabulary. Proceedings of the workshop on NLP4CALL&LA. NEALT Proceedings Series / Linköping Electronic Conference Proceedings

If you use SweLLex, please cite this article.

Features

Productive lexicon
includes word frequencies observed in learner essays

CEFR levels
A1 · A2 · B1 · B2 · C1 · C2

Lexical entries
lemma (word)
part of speech (tag)

Computed metrics
level_freq · normalized frequency (per 1 million words) for each level of the CEFR
total_freq · total normalized frequency in the source corpus
nb_doc · document frequency
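
As a rough illustration of what "normalized frequency (per 1 million words)" means, the sketch below scales a raw count by the size of the sub-corpus it was counted in. The function name `per_million` and the numbers are hypothetical; the exact estimation procedure used for SweLLex is described in the article cited above and may differ from this simple relative frequency.

```python
def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Scale a raw count to occurrences per one million tokens.

    Assumption: a plain relative frequency, not necessarily the
    estimator used to build SweLLex.
    """
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count / corpus_tokens * 1_000_000


# Hypothetical numbers, for illustration only: a lemma seen 12 times
# in a level sub-corpus of 48,000 tokens.
print(round(per_million(12, 48_000), 4))
```

Normalizing this way makes counts comparable across levels whose sub-corpora differ in size.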

Format

The resource is distributed as a .CSV file with tab-separated values and 8 columns (see above), encoded in UTF-8. It can also be opened in a spreadsheet application such as Excel.

Lemma               POS-tag  A1        A2         B1        B2        C1        Total
bil                 NN_UTR   430.2138  1234.2078  728.9847  422.283   363.5446  618.8567
överge              VB       0         0          7.3203    24.5182   39.6516   17.2695
rättvisa            NN_UTR   0         0          3.6601    25.6189   26.4344   13.6602
kilo                NN_NEU   0         302.0833   145.1229  65.0611   13.2172   89.8907
resa                VB       166.3009  375.2582   450.3526  298.4905  330.4297  356.362
låg                 JJ       0         49.315     125.922   217.3103  252.1311  156.126
så klart            ABM_MWE  0         16.2635    81.6019   45.5033   13.2172   38.1738
till skillnad från  PPM_MWE  0         0          5.3395    2.409     3.6699    5.1839
Example entries from SweLLex
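
A minimal sketch of reading the file with Python's standard `csv` module is given here. It assumes that the file has a header row with the column names shown in the example table and that values are separated by tabs; the sample text and the helper `read_entries` are illustrative, not part of the resource.

```python
import csv
import io

# Level columns as they appear in the example table (assumed header names).
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Two rows copied from the example table, joined with tabs.
SAMPLE = (
    "Lemma\tPOS-tag\tA1\tA2\tB1\tB2\tC1\tTotal\n"
    "bil\tNN_UTR\t430.2138\t1234.2078\t728.9847\t422.283\t363.5446\t618.8567\n"
    "så klart\tABM_MWE\t0\t16.2635\t81.6019\t45.5033\t13.2172\t38.1738\n"
)


def read_entries(handle):
    """Parse tab-separated lexicon rows into a list of dictionaries."""
    entries = []
    for row in csv.DictReader(handle, delimiter="\t"):
        entries.append({
            "lemma": row["Lemma"],
            "pos": row["POS-tag"],
            "freq": {level: float(row[level]) for level in LEVELS},
            "total": float(row["Total"]),
        })
    return entries


entries = read_entries(io.StringIO(SAMPLE))
for entry in entries:
    print(entry["lemma"], entry["pos"], entry["freq"]["B1"])
```

For a file on disk, the same function can be given `open(path, encoding="utf-8", newline="")`. Splitting on tabs rather than spaces keeps multi-word expressions such as "så klart" in a single field.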

Usage

Search

The resource can be used to compare the frequency distribution of multiple words along the CEFR scale. An online query interface is available and can be accessed via the Search tab.
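
The same kind of comparison can be sketched offline. The example below takes two rows from the example table and reports, for each lemma, the first level at which it is attested and the level where its normalized frequency peaks. The helper names `first_level` and `peak_level` are hypothetical and do not describe how the online interface works.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Per-level normalized frequencies copied from the example table.
FREQS = {
    "bil": [430.2138, 1234.2078, 728.9847, 422.283, 363.5446],
    "överge": [0, 0, 7.3203, 24.5182, 39.6516],
}


def first_level(freqs):
    """Return the lowest level with a non-zero frequency, or None."""
    for level, freq in zip(LEVELS, freqs):
        if freq > 0:
            return level
    return None


def peak_level(freqs):
    """Return the level at which the frequency is highest."""
    best = max(range(len(freqs)), key=lambda i: freqs[i])
    return LEVELS[best]


for lemma, freqs in FREQS.items():
    print(lemma, "first seen:", first_level(freqs), "peak:", peak_level(freqs))
```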

Analyse

The resource can also be used to analyze the complexity of words in a text, in particular to identify which of the words in a text will be difficult at a given level. An online complexity analyzer is available and can be accessed via the Analyze tab.
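
One simple way to approximate this kind of analysis is sketched below. It is built on an assumption that is not taken from the source: a lemma counts as difficult at a target level if it is first attested in the lexicon only at a higher level. The online analyzer may use a different criterion. The mapping `FIRST_SEEN` is derived from the example table, the function `difficult_words` is hypothetical, and the input is assumed to be lemmatized already.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# First level with a non-zero frequency, read off the example table.
FIRST_SEEN = {
    "bil": "A1",
    "resa": "A1",
    "kilo": "A2",
    "låg": "A2",
    "överge": "B1",
    "rättvisa": "B1",
}


def difficult_words(lemmas, target_level):
    """Return lemmas first attested above target_level.

    Lemmas missing from FIRST_SEEN are skipped, since the lexicon
    gives no evidence about them either way.
    """
    target = LEVELS.index(target_level)
    return [
        lemma
        for lemma in lemmas
        if lemma in FIRST_SEEN and LEVELS.index(FIRST_SEEN[lemma]) > target
    ]


text_lemmas = ["bil", "kilo", "överge", "okänd"]
print(difficult_words(text_lemmas, "A1"))
print(difficult_words(text_lemmas, "A2"))
```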

Authors

SweLLex is the result of a collaboration between two teams:

Contributors

Brayan Delmée
Logo Design

Dorian Ricci, Baptiste Degryse & Anaïs Tack
Prototype design

Damien De Meyere
Website maintenance