On this page you can find thesis and PhD projects proposed by our team. Contact us if you are interested.
TITLE: Natural Language Processing and text understanding: dealing with context and events
DESCRIPTION: AI applications today face unprecedented demands: processing a wide variety of unstructured text documents (news reports, product reviews, scientific papers, social media communications, magazines, etc.) to extract factual information, specific information and trends, opinions, and latent biases. While many successful lexical resources have been proposed in the last few years to deal with traditional NLP tasks, two influential areas of investigation are emerging. The first concerns the contextual nature of lexical meaning, which underlies semantic composition; the second concerns verbal semantics and its extension to account for events. This doctoral proposal targets these problems.
Various application domains (including document categorization, keyword extraction, open-domain question answering, text summarization, etc.) will be considered, and many applications can be envisioned, also in line with the interests of candidates, to tackle, e.g., figurative uses of language, fake content detection in social media, knowledge graph induction and embedding, and event detection and extraction. PhD candidates will have full opportunity to pursue their own interests in choosing the application field(s), such as the legal domain, the medical domain, cultural heritage, or the physical sciences.
TITLE: Performing Word Sense Disambiguation using Sense Embeddings
DESCRIPTION: Word Sense Disambiguation (WSD) is a long-standing but still open problem in Natural Language Processing (NLP). Current supervised WSD methods treat senses as discrete labels; recent work, however, has proposed performing WSD by predicting over a continuous sense embedding space rather than a discrete label space. This master's thesis proposal aims to adopt LessLex (a resource recently developed within the Computer Science Department) as the embedding space for solving the WSD task. LessLex is a set of semantic embeddings grounded in the BabelNet sense inventory: multilingual access is governed by the mapping of terms onto their underlying sense descriptions, so that all vectors (for both terms and senses) share the same semantic space. As a result, for each term we have the “blended” terminological vector along with the vectors describing all senses associated with that term. By exploiting this feature we aim to extend the currently proposed techniques, focusing especially on zero-shot Word Sense Disambiguation.
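To make the idea of disambiguating in a shared term/sense space concrete, here is a minimal sketch. It is not the LessLex API or the method the thesis will develop: the three-dimensional vectors, the sense labels (bank#riverside, bank#institution) and the function names are all invented for illustration. It only assumes, as described above, that term vectors and sense vectors live in the same space, so a context can be averaged and compared directly with each candidate sense.

```python
import math

# Toy vectors, invented for illustration only (not real LessLex data).
TERM_VECTORS = {
    "river": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "money": [0.1, 0.9, 0.1],
    "loan": [0.0, 0.8, 0.2],
}

# Hypothetical sense labels; a real inventory would use BabelNet synset IDs.
SENSE_VECTORS = {
    "bank": {
        "bank#riverside": [0.85, 0.15, 0.05],
        "bank#institution": [0.05, 0.9, 0.1],
    }
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def disambiguate(target, context_terms):
    """Return the candidate sense of `target` closest to the averaged context.

    Returns None when the target has no known senses or no context term
    has a vector.
    """
    context = [TERM_VECTORS[t] for t in context_terms if t in TERM_VECTORS]
    senses = SENSE_VECTORS.get(target, {})
    if not context or not senses:
        return None
    context_vector = mean_vector(context)
    return max(senses, key=lambda s: cosine(context_vector, senses[s]))


print(disambiguate("bank", ["river", "water"]))
print(disambiguate("bank", ["money", "loan"]))
```

Because a sense is selected purely by vector similarity, a sense with no annotated training examples can still be chosen as long as it has a vector, which is the property that makes this kind of setup attractive for zero-shot WSD.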
TITLE: Building a system for medical text analysis and misspelling correction
DESCRIPTION: Noise is a very common phenomenon in textual data, especially in informal texts such as chats, SMS, e-mails, and internal reports. Texts of this kind inherently contain spelling errors, special characters, non-standard word forms, grammar mistakes, and so on. While a lot of effort has been put into solving these issues for general texts, the literature shows a limited but significant interest in correcting noise in medical text documents; most of this work, however, focuses on the English language. This master’s thesis proposal aims to develop a system for the analysis and detection of misspellings in Italian medical text documents. State-of-the-art approaches and novel techniques based on neural networks will be exploited to detect misspellings in a collection of emergency room reports collected in Italian hospitals by the Italian National Institute of Health. The final spell-checker will be integrated into the pipeline of the ViDec (Violence Detector) system, which examines these textual reports to automatically detect injuries stemming from violent acts, while also making explicit which elements of the record define the violent act itself.
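As a point of reference for the detection step, the sketch below shows a simple dictionary-lookup baseline, not the neural approach the proposal targets and not part of ViDec. The tiny Italian lexicon, the example sentence and the function names are all invented for illustration; candidate corrections come from the standard-library difflib.get_close_matches, which ranks lexicon entries by string similarity.

```python
import difflib
import re

# Hypothetical mini-lexicon; a real system would use a large Italian
# medical vocabulary instead.
LEXICON = {
    "paziente", "frattura", "trauma", "cranico", "riferisce",
    "caduta", "dolore", "braccio", "al", "dopo", "con",
}


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens (with Italian accents)."""
    return re.findall(r"[a-zàèéìòù]+", text.lower())


def find_misspellings(text, lexicon=LEXICON, cutoff=0.8):
    """Return (token, suggestion) pairs for tokens missing from the lexicon.

    The suggestion is the most similar lexicon entry according to difflib,
    or None when no entry reaches the similarity cutoff.
    """
    candidates = sorted(lexicon)  # sorted for deterministic behaviour
    findings = []
    for token in tokenize(text):
        if token in lexicon:
            continue
        matches = difflib.get_close_matches(token, candidates, n=1, cutoff=cutoff)
        findings.append((token, matches[0] if matches else None))
    return findings


# Invented example sentence, not taken from real emergency room reports.
print(find_misspellings("Paziente riferisce dolroe al braccio dopo caduta"))
```

A baseline like this flags every out-of-vocabulary token, including valid drug names or abbreviations, which is one reason context-aware models are of interest for clinical text.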
Feel free to contact us if you are interested in any of the proposals.