Distributed and Self-organizing Systems


Automated Metadata Annotation of Research Data with Homonym Disambiguation



Research Area

Intelligent Information Management





CKAN is a popular repository and framework for research data and research data management. Its core functionality is uploading datasets and annotating them with metadata such as keywords, a description, and a title. When enriching a dataset with metadata, finding the most suitable tags in a predefined set of available keywords can be difficult, because it requires knowledge of the tags that already exist; without that knowledge, datasets may be annotated improperly. In addition, some tags are homonyms, which make it hard to classify a dataset correctly unless they are disambiguated.

This thesis explores algorithms and techniques for automatically annotating datasets with plausible tags based on a specified context, e.g., the dataset description, the title, or the author's previous work and field. The investigated and implemented techniques should also be aware of the semantics of each keyword and thus be capable of automatically disambiguating homonyms.

The objective of this thesis is to research the different approaches to automatically annotating datasets with keywords and their ability to disambiguate homonym tags as described above. To this end, a thorough state-of-the-art analysis must be conducted, the most applicable approaches implemented, and their performance evaluated objectively using existing metrics and benchmarks. Finally, a demonstrator has to be created that shows the capabilities of the implemented algorithms.
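
To illustrate what disambiguating a homonym tag from a dataset's context could look like, here is a minimal sketch of a simplified Lesk-style word-overlap heuristic. It is only one possible baseline and not the approach prescribed by this topic. The tag vocabulary `TAG_SENSES`, its glosses, and the function names are hypothetical and chosen purely for illustration; a real system would draw senses from a curated vocabulary or knowledge base.

```python
import re

# Hypothetical tag vocabulary (assumption): each ambiguous tag maps to
# candidate senses, each described by a short gloss of related words.
TAG_SENSES = {
    "python": {
        "python (programming language)": "programming language software code script library",
        "python (snake)": "snake reptile animal species habitat",
    },
    "cell": {
        "cell (biology)": "biology organism tissue membrane protein microscopy",
        "cell (battery)": "battery energy voltage electrode lithium storage",
    },
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(tag, context):
    """Return (sense, overlap) for the sense whose gloss shares the most
    words with the context; sense is None if no gloss overlaps at all."""
    context_words = tokenize(context)
    best_sense, best_overlap = None, 0
    for sense, gloss in TAG_SENSES[tag].items():
        overlap = len(tokenize(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


def suggest_tags(context):
    """Suggest one disambiguated sense per tag that matches the context."""
    suggestions = []
    for tag in TAG_SENSES:
        sense, _ = disambiguate(tag, context)
        if sense is not None:
            suggestions.append(sense)
    return suggestions


description = "Voltage and energy measurements of lithium battery cells during charging"
print(suggest_tags(description))
```

A plain word-overlap count like this ignores synonyms and word order, which is one reason the topic calls for a state-of-the-art analysis of semantically aware approaches.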
