PSG721: Neural text analysis models enhanced with external linguistic resources (2022-2025)

The accuracy of natural language processing (NLP) systems depends heavily on the amount of annotated data available. For many NLP tasks and languages, the annotated training sets are not large enough to train reliable models. While data annotation is an ongoing process, several data resources already exist, typically in the form of lexicons, ontologies and rule-based systems, that are not exploited in modern deep learning models. This project will study methods for integrating various external linguistic resources into neural networks. Enhancing neural models with this external information can significantly improve prediction accuracy in limited-data settings. The result of this project is a framework that specifies the relevant factors influencing the effect size of an enhanced model, giving specific guidelines on when and how best to incorporate external resources into NLP systems.

Principal Investigator: Kairit Sirts, PhD


EstBERT

Training EstBERT was part of the EKTB11 project. The continuing goal is to benchmark models fine-tuned from EstBERT on various Estonian NLP tasks and to make these fine-tuned models available via Hugging Face.


EKTB11: Neural network based text analysis models for Estonian (2018-2022)

During the last few years, the natural language processing field has experienced a considerable technological shift due to rapid developments in artificial neural network technology. For many text analysis tasks concerned with the automatic analysis of linguistic structure, neural models have proved more successful than the previously used statistical models. The automatic text analysis tools developed so far for Estonian are either rule-based or statistical. Based on the literature, adopting neural models can be expected to improve their performance. The goal of this project is to bring the automatic text analysis tools for Estonian up to date by transferring them to neural technologies in order to improve their accuracy and quality.

Principal Investigator: Kairit Sirts, PhD