LessLex

About LessLex

LessLex [1] is a novel multilingual lexical resource. Unlike the vast majority of existing approaches, we ground our embeddings on a sense inventory made available by the BabelNet semantic network. In this setting, multilingual access is governed by the mapping of terms onto their underlying sense descriptions, such that all vectors co-exist in the same semantic space. As a result, each term has a “blended” terminological vector along with the vectors describing all senses associated with that term.
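To make this layout concrete, the following minimal Python sketch models a store in which terms from different languages point to shared sense identifiers, so that every vector lives in one space. The `LexicalEntry` class, the identifiers, and the toy three-dimensional vectors are hypothetical illustrations; they are not the LessLex file format or an official API.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """A term with its blended vector and the IDs of its senses (hypothetical layout)."""
    blended: list[float]
    sense_ids: list[str] = field(default_factory=list)


# Toy sense vectors keyed by made-up sense identifiers (not real BabelNet IDs).
sense_vectors = {
    "sense:bank_institution": [0.9, 0.1, 0.0],
    "sense:bank_riverside": [0.0, 0.2, 0.9],
}

# Terms are keyed by (language, lemma); both languages reuse the same sense IDs.
terms = {
    ("en", "bank"): LexicalEntry(
        [0.5, 0.15, 0.45],
        ["sense:bank_institution", "sense:bank_riverside"],
    ),
    ("it", "banca"): LexicalEntry([0.9, 0.1, 0.0], ["sense:bank_institution"]),
}


def shared_senses(term_a, term_b):
    """Return the sense IDs two terms have in common, in term_a's order."""
    ids_b = set(terms[term_b].sense_ids)
    return [s for s in terms[term_a].sense_ids if s in ids_b]


common = shared_senses(("en", "bank"), ("it", "banca"))
```

Because the English and Italian entries resolve to the same sense identifier, a lookup in either language reaches the same sense vector, which is the property the paragraph above describes.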

LessLex has been tested on three tasks relevant to lexical semantics: conceptual similarity, contextual similarity, and semantic text similarity. We experimented over the principal data sets for such tasks in their multilingual and crosslingual variants, improving on or closely approaching state-of-the-art results.
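As an illustration of how sense vectors can be used for conceptual similarity, the sketch below scores two terms by the cosine similarity of their closest pair of senses. This maximum-over-sense-pairs rule is a common baseline and is assumed here for illustration only; it is not necessarily the scoring scheme used in [1]. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_sense_similarity(senses_a, senses_b):
    """Score two terms by their most similar pair of sense vectors."""
    if not senses_a or not senses_b:
        raise ValueError("each term needs at least one sense vector")
    return max(cosine(u, v) for u, v in product(senses_a, senses_b))


# Toy data: the first term has two senses, the second term has one.
senses_first = [[1.0, 0.0], [0.0, 1.0]]
senses_second = [[0.0, 2.0]]
score = max_sense_similarity(senses_first, senses_second)
```

Since all sense vectors share one space, the same function applies unchanged whether the two terms come from the same language or from different ones.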

Version 2.0

Version 2.0 of LessLex was built by merging ConceptNet Numberbatch 19.08 and BabelNet 4.0.

The following table reports some statistics on the resource.

| Statistics | All | Nouns | Verbs | Adjectives |
|---|---|---|---|---|
| Seed terms | 196,710 | 140,678 | 17,887 | 38,145 |
| Terms in BabelNet | 177,715 | 137,195 | 14,400 | 26,120 |
| T+ avg. cardinality | 6.35 | 6.25 | 9.15 | 5.51 |
| Discarded Senses | - | - | - | - |
| Unique Senses | 329,201 | 293,512 | 14,195 | 21,494 |
| Avg. senses per term | 3.08 | 3.41 | 2.83 | 1.50 |
| Total extracted terms | 406,217 | 382,317 | 12,824 | 11,076 |
| Avg. extracted terms per call | 1.22 | 1.41 | 1.24 | 1.00 |
| WordNet Coverage | 113,912 | 82,113 | 13,739 | 18,060 |

Publication date

9 July 2020

Download “LessLex 2.0”

LessLex_v1.0.tar.bz2 – Downloaded 171 times – 119.33 MB
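Once downloaded, the archive can be unpacked with Python's standard `tarfile` module. The sketch below is a minimal example: the archive name is copied from the listing above, while the destination directory `lesslex_data` and the helper `extract_archive` are hypothetical choices. Nothing is assumed about the files inside the archive; the function only reports their names.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a bzip2-compressed tarball and return the names of its members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as tar:
        names = tar.getnames()
        # Only extract archives obtained from a source you trust.
        tar.extractall(dest)
    return names


if __name__ == "__main__":
    archive = Path("LessLex_v1.0.tar.bz2")  # name as shown in the download listing
    if archive.exists():
        for name in extract_archive(archive, "lesslex_data"):
            print(name)
```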

Version 1.0

Version 1.0 of LessLex was built by merging ConceptNet Numberbatch 17.06 and BabelNet 4.0.

The following table reports some statistics on the resource.

| Statistics | All | Nouns | Verbs | Adjectives |
|---|---|---|---|---|
| Seed terms | 84,620 | 45,292 | 11,943 | 27,380 |
| Terms in BabelNet | 65,629 | 41,817 | 8,457 | 15,355 |
| T+ avg. cardinality | 6.40 | 6.16 | 9.47 | 6.37 |
| Discarded Senses | 16,666 | 14,737 | 368 | 1,561 |
| Unique Senses | 174,300 | 148,380 | 11,038 | 14,882 |
| Avg. senses per term | 4.80 | 6.12 | 3.77 | 1.77 |
| Total extracted terms | 227,850 | 206,603 | 8,671 | 12,576 |
| Avg. extracted terms per call | 1.40 | 1.46 | 1.06 | 1.05 |
| WordNet Coverage | 61,000 | - | - | - |

Publication date

9 July 2020

Download “LessLex 1.0”

LessLex_v1.0.tar.bz2 – Downloaded 100 times

All of the above data is licensed under a Creative Commons Attribution 3.0 United States License.

Reference papers

[1] D. Colla, E. Mensa, and D. P. Radicioni, “LessLex: Linking multilingual embeddings to sense representations of lexical items,” Computational Linguistics, vol. 46, no. 2, pp. 289-333, 2020.
BibTeX:
@article{colla2020lesslex,
author = {Colla, Davide and Mensa, Enrico and Radicioni, Daniele P.},
title = {LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items},
journal = {Computational Linguistics},
volume = {46},
number = {2},
pages = {289-333},
year = {2020},
doi = {10.1162/coli\_a\_00375},
URL = {https://doi.org/10.1162/coli_a_00375},
eprint = {https://doi.org/10.1162/coli_a_00375},
abstract = { We present LESSLEX, a novel multilingual lexical resource. Different from the vast majority of existing approaches, we ground our embeddings on a sense inventory made available from the BabelNet semantic network. In this setting, multilingual access is governed by the mapping of terms onto their underlying sense descriptions, such that all vectors co-exist in the same semantic space. As a result, for each term we have thus the “blended” terminological vector along with those describing all senses associated to that term. LESSLEX has been tested on three tasks relevant to lexical semantics: conceptual similarity, contextual similarity, and semantic text similarity. We experimented over the principal data sets for such tasks in their multilingual and crosslingual variants, improving on or closely approaching state-of-the-art results. We conclude by arguing that LESSLEX vectors may be relevant for practical applications and for research on conceptual and lexical access and competence. },
pdf={http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00375}
}