About ClOSeSt

ClOSeSt is a lexical resource that provides the most common senses (expressed as BabelNet synset IDs) for a set of terms given as input to the system [1]. Different versions of the resource are offered, depending on the selected set of input terms.

Learn about the ClOSeSt structure

ClOSeSt is a JSON text file that contains a series of entries, one for each retained sense of the terms given as input.

Each entry is organized into four fields, namely:

term: the term provided to the system as input;
bsi: the BabelNet synset ID for the term (its sense ID);
wikititle: the WikiTitle of the sense associated with the term (it may be empty);
synset: the list of lexicalizations for the sense representing the term.
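As a minimal sketch of how such entries could be consumed, the Python snippet below parses a small JSON sample and groups the retained senses by term. The top-level layout (a JSON array of objects), the sample values, and the BabelNet IDs are made-up assumptions for illustration; only the four field names come from the description above, so check the layout of the downloaded file before relying on it.

```python
import json
from collections import defaultdict

# Hypothetical sample: the array-of-objects layout and all values are
# assumptions; only the four field names come from the description above.
SAMPLE = """
[
  {"term": "bank", "bsi": "bn:00000001n", "wikititle": "Bank",
   "synset": ["bank", "banking_company"]},
  {"term": "bank", "bsi": "bn:00000002n", "wikititle": "",
   "synset": ["bank", "riverbank"]},
  {"term": "apple", "bsi": "bn:00000003n", "wikititle": "Apple",
   "synset": ["apple"]}
]
"""


def group_senses_by_term(entries):
    """Map each input term to the list of its retained sense entries."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["term"]].append(entry)
    return dict(grouped)


entries = json.loads(SAMPLE)
senses = group_senses_by_term(entries)

for term, term_senses in senses.items():
    # An empty wikititle is allowed, so it is not treated as an error here.
    sense_ids = [sense["bsi"] for sense in term_senses]
    print(term, sense_ids)
```

Grouping by term keeps all retained senses of a term together, which is convenient because a term can appear in more than one entry.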


COCA Complete v1.0

This version of ClOSeSt contains the most common senses for all the terms of the English language extracted from the COCA corpus.

Publication date

27 June 2017

Statistics about this version

Statistic	Count
Input terms	26,956
Initial average polysemy	4.81
Final number of entries	36,110
Final number of unique terms	24,649
Final number of unique senses	28,518
Final average polysemy	1.46
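The page does not state how the final average polysemy is computed. One reading that is consistent with the figures in the table is the number of final entries divided by the number of final unique terms. The short Python check below works through that arithmetic; the function name is hypothetical and the formula is an assumption, not a documented definition.

```python
def average_polysemy(num_entries, num_unique_terms):
    """Average number of retained senses per term (assumed definition)."""
    return num_entries / num_unique_terms


# Figures taken from the statistics table for COCA Complete v1.0.
final_entries = 36_110
final_unique_terms = 24_649

ratio = average_polysemy(final_entries, final_unique_terms)
print(f"{ratio:.2f}")  # prints 1.46
```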

Download “ClOSeSt COCA-Complete 1.0”

COCA-Complete_v1.0.tar.bz2

COCA Core v1.0

This version of ClOSeSt contains the most common senses for the 10,000 most frequent terms of the English language. The terms have been extracted from the COCA corpus.

Publication date

02 February 2017

Statistics about this version

Statistic	Count
Input terms	10,000
Initial average polysemy	6.08
Final number of entries	17,471
Final number of unique terms	9,775
Final number of unique senses	2,458
Final average polysemy	1.79

Download “ClOSeSt COCA-Core 1.0”

COCA-Core_v1.0.tar.bz2
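Both downloads are bzip2-compressed tar archives. A minimal sketch for inspecting and unpacking one with Python's standard `tarfile` module follows. The helper names and the local path are hypothetical, and the names of the files inside the archive are not documented here, so the sketch lists them before extracting anything.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the names of the files stored in a .tar.bz2 archive."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        return archive.getnames()


def extract_archive(archive_path, target_dir):
    """Unpack a .tar.bz2 archive into target_dir and return that directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        archive.extractall(target)
    return target


if __name__ == "__main__":
    # Assumption: the archive was downloaded into the working directory.
    archive = Path("COCA-Core_v1.0.tar.bz2")
    if archive.exists():
        print(list_archive_members(archive))
```

Listing the members first shows which file holds the JSON entries before anything is written to disk.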

All of the above data is licensed under a Creative Commons Attribution 3.0 United States License.

Reference papers

[1] A. Lieto, E. Mensa, and D. P. Radicioni, “Taming Sense Sparsity: a Common-Sense Approach,” in Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, 2016.

BibTeX:

@inproceedings{lieto16taming,
  author    = {Antonio Lieto and
               Enrico Mensa and
               Daniele P. Radicioni},
  title     = {{Taming Sense Sparsity: a Common-Sense Approach}},
  booktitle = {{Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) {\&} Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian.}},
  year      = {2016},
  url       = {http://ceur-ws.org/Vol-1749/paper31.pdf},
  pdf       = {http://delorean.di.unito.it/ls/papers/lieto16taming.pdf}
}