ClOSeSt

About ClOSeSt


ClOSeSt is a lexical resource that provides the most common senses (expressed as BabelNet synset IDs) for a set of terms given as input to the system [1]. Different versions of the resource are available, depending on the selected set of input terms.

ClOSeSt is distributed as a JSON text file containing a series of entries, one for each retained sense of the input terms.

Each entry is organized in four fields:

  • term: the term provided to the system as input;
  • bsi: the BabelNet synset ID of the sense (i.e., its sense ID);
  • wikititle: the WikiTitle (Wikipedia page title) of the sense associated with the term (it may be empty);
  • synset: the list of lexicalizations of the sense representing the term.
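As an illustration, the snippet below reads the file and groups the retained senses by input term. It is a minimal sketch that assumes the file is a single JSON array of objects carrying the four fields listed above; the actual layout of the distributed file may differ (for instance, one object per line), and the function names `load_senses` and `group_by_term` are hypothetical.

```python
import json
from collections import defaultdict


def group_by_term(entries):
    """Map each input term to the list of BabelNet synset IDs retained for it.

    Each entry is assumed to be a dict with the keys 'term', 'bsi',
    'wikititle' and 'synset' (a list of lexicalizations).
    """
    senses = defaultdict(list)
    for entry in entries:
        senses[entry["term"]].append(entry["bsi"])
    return dict(senses)


def load_senses(path):
    """Load a ClOSeSt file and group its senses by term.

    Assumption (not confirmed by this page): the file holds one JSON array
    of entry objects rather than, e.g., one JSON object per line.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return group_by_term(entries)
```

Terms that keep more than one sense after selection simply map to a list with several synset IDs, which makes the residual polysemy of each term easy to inspect.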

COCA Complete v1.0

This version of ClOSeSt contains the most common senses for all the terms of the English language extracted from the COCA corpus (Corpus of Contemporary American English).

Publication date

27 June 2017

Statistics about this version

Statistic                          Value
Input terms                        26,956
Initial average polysemy           4.81
Final number of entries            36,110
Final number of unique terms       24,649
Final number of unique senses      28,518
Final average polysemy             1.46
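The final figures in the table can be recomputed from the entries themselves. This is a minimal sketch, assuming the entries are loaded as a list of dictionaries with the fields described above, and assuming that "final average polysemy" means the number of entries divided by the number of unique terms: this page does not define it, but 36,110 / 24,649 ≈ 1.46 matches the reported value. The initial average polysemy cannot be recovered from the file, since it presumably refers to the candidate senses before selection. The function name `closest_stats` is hypothetical.

```python
def closest_stats(entries):
    """Recompute the 'Final ...' statistics from a list of ClOSeSt entries.

    Assumption: average polysemy = number of entries / number of unique
    terms (consistent with the published figures, not stated explicitly).
    """
    terms = {e["term"] for e in entries}   # distinct input terms kept
    senses = {e["bsi"] for e in entries}   # distinct BabelNet synset IDs
    n_entries = len(entries)
    return {
        "entries": n_entries,
        "unique_terms": len(terms),
        "unique_senses": len(senses),
        # Guard against an empty list to avoid division by zero.
        "average_polysemy": round(n_entries / len(terms), 2) if terms else 0.0,
    }
```

Note that the number of unique senses can exceed the number of unique terms, because a term may keep several senses and different terms may share none.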

Download “ClOSeSt COCA-Complete 1.0”

COCA-Complete_v1.0.tar.bz2

COCA Core v1.0

This version of ClOSeSt contains the most common senses for the 10,000 most frequent terms of the English language. The terms have been extracted from the COCA corpus.

Publication date

02 February 2017

Statistics about this version

Statistic                          Value
Input terms                        10,000
Initial average polysemy           6.08
Final number of entries            17,471
Final number of unique terms       9,775
Final number of unique senses      2,458
Final average polysemy             1.79
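Reading the final average polysemy as entries per unique term (an assumption; this page does not define it), the Core figures are consistent, and the selection reduces the average number of senses per term by roughly 70% with respect to the initial average polysemy:

```latex
\bar{p}_{\mathrm{final}} = \frac{17{,}471}{9{,}775} \approx 1.787 \approx 1.79,
\qquad
\frac{\bar{p}_{\mathrm{initial}} - \bar{p}_{\mathrm{final}}}{\bar{p}_{\mathrm{initial}}}
  = \frac{6.08 - 1.79}{6.08} \approx 0.706
```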

Download “ClOSeSt COCA-Core”

COCA-Core_v1.0.tar.bz2
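Both archives are bzip2-compressed tarballs. The sketch below opens one with Python's standard `tarfile` module and parses every JSON file inside it. Since the names of the files contained in the archives are not documented on this page, it simply picks up any regular member ending in `.json`; the function name `read_json_members` is hypothetical.

```python
import json
import tarfile


def read_json_members(archive_path):
    """Parse every .json file found in a .tar.bz2 archive.

    Returns a dict mapping each member name to its parsed JSON content.
    The member names inside the ClOSeSt archives are an unknown here, so
    no specific file name is assumed.
    """
    results = {}
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".json"):
                # extractfile() yields a binary file object; json.load
                # accepts bytes and detects the UTF encoding itself.
                with tar.extractfile(member) as f:
                    results[member.name] = json.load(f)
    return results
```

For example, `read_json_members("COCA-Core_v1.0.tar.bz2")` would return a dictionary keyed by the JSON file names found in the Core archive. Each file is loaded fully into memory.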

All of the above data is licensed under a Creative Commons Attribution 3.0 United States License.

Reference papers

[1] A. Lieto, E. Mensa, and D. P. Radicioni, “Taming Sense Sparsity: a Common-Sense Approach,” in Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, 2016.
BibTeX:
@inproceedings{lieto16taming,
author = {Antonio Lieto and
Enrico Mensa and
Daniele P. Radicioni},
title = {{Taming Sense Sparsity: a Common-Sense Approach}},
booktitle = {{Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) {\&} Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian.}},
year = {2016},
url = {http://ceur-ws.org/Vol-1749/paper31.pdf},
pdf = {http://delorean.di.unito.it/ls/papers/lieto16taming.pdf}
}