Master's Thesis
NLP-Based Automation of Systematic Literature Reviews in Computer Science (focus on conduction phase)
Completion
2026/10
Research Area
Students
Md. Mehedee Zaman Khan
Advisers
Jan Haas M.Sc.
Dr.-Ing. Sebastian Heil
Description
In computer science research, the rapid growth of academic publications makes it increasingly difficult to conduct comprehensive and efficient literature reviews. Systematic literature reviews (SLRs) offer a structured methodology to address this challenge. Carrera-Rivera et al. have proposed a framework for conducting SLRs in computer science that is divided into a planning (or preparation) phase and a conducting phase in which the actual review is performed. Existing platforms such as Elicit or the Research Mode (often referred to as DeepResearch) of LLM providers offer partial solutions for automating the review process. However, they are either closed-source and require payment, or they do not follow formal SLR methods and are therefore unlikely to be deterministic, which hinders the reproducibility of results. The overall objective of this thesis is to construct a fully locally deployable SLR pipeline that leverages recent advances in generative AI and related NLP methods and produces an interactive taxonomy of approaches and methods for a specific research question in computer science.
The preparation part of the SLR framework specified by Carrera-Rivera et al. focuses on building search strings for relevant digital libraries, which are then used in the later review conduction phase. It is a critical component of the pipeline because it provides the foundational information needed to accurately identify publications relevant to the initial research question. This thesis focuses on the automation of the manual preparation phase utilizing the natural language processing capabilities of Large Language Models and related modern NLP tools. A state-of-the-art analysis of adjacent approaches for building search strings for the corresponding digital libraries must be performed. A concept building on the framework proposed by Carrera-Rivera et al. must be derived. Subsequently, a prototype must be implemented and described, serving as the basis for the later pipeline stages. The proposed approach will be evaluated using SLR search-query assessment frameworks taken from the literature. Alternatively, a similarly suitable evaluation strategy must be defined and applied to the proposed approach.
The second phase of the SLR framework proposed by Carrera-Rivera et al. focuses on conducting the literature review based on the outputs of the planning phase. This thesis aims to automate this conduction process using the natural language understanding capabilities of LLMs and related NLP methods. The output of the conduction process consists of a hierarchically structured taxonomy of algorithms, methods, and approaches relevant to the


