Masterarbeit
The role of Docker in Computational Reproducibility of Jupyter Notebooks from Scholarly Publications PubMed Central
Completion
2025/02
Research Area
Students
Hemanta Lo
Advisers
Dr. Sheeba Samuel
Prof. Dr.-Ing. Martin Gaedke
Description
Trustworthy science requires reproducibility, yet reproducibility remains a challenge in computational research. Differences in software versions, operating systems, and dependencies cause inconsistent results, especially when Jupyter notebooks are involved. Biomedical researchers often use these notebooks to document and share experiments, but the notebooks typically depend on specific software setups that are difficult for other researchers to recreate, which in turn makes the findings difficult to verify. Our study proposes using Docker to create consistent environments that capture the original research setting and replicate it exactly. Our project introduces a reproducibility pipeline that uses Docker containers to standardize computational environments. Although biomedical publications are used as a test case, the methodology is designed for broader applicability across diverse research domains. Running the notebooks inside these Docker environments lets us re-execute them in a controlled setting and attempt to replicate the original results. For each step, we document exactly which techniques we use, as well as any errors or differences from the original experiments, down to the smallest detail in the notebooks. The testing process, conducted on a limited set of five biomedical repositories, demonstrated promising results. Docker played a pivotal role in addressing reproducibility challenges by providing a controlled and isolated environment for execution. Detailed logging and comprehensive analysis allowed errors to be identified and resolved with precision.
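The idea of running a notebook inside a standardized Docker environment can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the function names, the `python:<version>-slim` base image, the `requirements.txt` file, the `/work` paths, and the image tag are all assumptions made for illustration. The sketch only generates a Dockerfile and the `docker` command lines; executing them requires a local Docker installation.

```python
import shlex
from pathlib import Path


def render_dockerfile(python_version: str,
                      requirements_file: str = "requirements.txt") -> str:
    """Return Dockerfile text that pins the interpreter and installs dependencies.

    Assumption: the repository ships a pip requirements file.
    """
    lines = [
        f"FROM python:{python_version}-slim",
        "WORKDIR /work",
        f"COPY {requirements_file} /tmp/requirements.txt",
        # jupyter and nbconvert are needed to execute the notebook headlessly
        "RUN pip install --no-cache-dir -r /tmp/requirements.txt jupyter nbconvert",
        "COPY . /work",
    ]
    return "\n".join(lines) + "\n"


def build_command(image_tag: str, context_dir: str) -> list[str]:
    """Command line that builds the image from the repository directory."""
    return ["docker", "build", "-t", image_tag, context_dir]


def run_command(image_tag: str, notebook: str, host_output_dir: str) -> list[str]:
    """Command line that executes the notebook inside the container.

    The executed copy is written to a bind-mounted host directory so that it
    survives the removal of the container (--rm).
    """
    host_dir = Path(host_output_dir).resolve()  # bind mounts need absolute paths
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/work/out",
        image_tag,
        "jupyter", "nbconvert", "--to", "notebook", "--execute", notebook,
        "--output-dir", "/work/out",
    ]


if __name__ == "__main__":
    # Hypothetical repository layout and names, for illustration only.
    print(render_dockerfile("3.9"))
    print(shlex.join(build_command("repro-demo", ".")))
    print(shlex.join(run_command("repro-demo", "analysis.ipynb", "out")))
```

The commands are kept as argument lists so they could be passed to `subprocess.run` and their exit codes and output logged, which is one way to record errors and differences from the original run.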


