Master's Thesis
Extending Jupyter Notebook Reproducibility Analysis to Non-GitHub Repositories (Codeberg, Zenodo)
Completion
in progress
Research Area
Intelligent Information Management
Students
David Keci
Advisers
Dr. Sheeba Samuel
Description
Most computational reproducibility studies of Jupyter notebooks have focused on notebooks hosted on GitHub, the dominant platform for open-source code sharing. However, a significant and growing portion of scientific software is hosted on alternative platforms such as Codeberg and Zenodo, each with its own community norms, metadata standards, and accessibility characteristics. Limiting reproducibility analysis to GitHub introduces selection bias and excludes important segments of the research software community, particularly those that prioritize open-source infrastructure or long-term archival. The FAIR Jupyter pipeline (https://doi.org/10.4230/TGDK.2.2.4) and its associated Knowledge Graph (KG) are currently designed for GitHub-hosted repositories, and their data collection, metadata extraction, and storage components are not compatible with Codeberg or Zenodo. This thesis aims to extend the FAIR Jupyter pipeline to support notebook discovery, retrieval, and reproducibility analysis for Codeberg and Zenodo. It will adapt existing pipeline components where possible and develop new connectors and metadata normalization layers for each platform. The thesis will also compare reproducibility rates and notebook characteristics across platforms to determine whether platform choice is associated with differences in reproducibility. The thesis should deliver production-ready pipeline extensions for Codeberg and Zenodo, including platform-specific API connectors, metadata normalization schemas, and integration tests. It should also produce a cross-platform dataset of reproducibility outcomes and a comparative analysis report. The KG schema should be extended to represent platform provenance, and documentation should guide future contributors in adding support for further platforms.
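As an illustration of what a metadata normalization layer could look like, the following is a minimal sketch and not part of the existing FAIR Jupyter pipeline. The `NotebookSource` class, the `normalize_*` functions, and the `NORMALIZERS` registry are hypothetical names. The input field names (`full_name`, `html_url`, `description`, `updated_at` for Codeberg; `doi`, `metadata.title`, `metadata.publication_date` for Zenodo) are assumptions about the JSON returned by each platform's API and would need to be checked against the current API documentation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class NotebookSource:
    """Hypothetical platform-independent record for one repository or deposit."""
    platform: str       # platform provenance, e.g. "codeberg" or "zenodo"
    identifier: str     # platform-specific identifier (repository name or DOI)
    url: str            # landing page for the repository or record
    title: str
    last_updated: str   # date string as delivered by the platform


def normalize_codeberg(repo: dict[str, Any]) -> NotebookSource:
    # Assumed shape: one repository object from the Codeberg (Forgejo) API.
    return NotebookSource(
        platform="codeberg",
        identifier=repo["full_name"],
        url=repo["html_url"],
        # Fall back to the repository name when no description is set.
        title=repo.get("description") or repo["full_name"],
        last_updated=repo.get("updated_at", ""),
    )


def normalize_zenodo(record: dict[str, Any]) -> NotebookSource:
    # Assumed shape: one record object from the Zenodo REST API.
    metadata = record.get("metadata", {})
    doi = record["doi"]
    return NotebookSource(
        platform="zenodo",
        identifier=doi,
        url=f"https://doi.org/{doi}",
        title=metadata.get("title", ""),
        last_updated=metadata.get("publication_date", ""),
    )


# Registry of normalizers; supporting a further platform means adding one entry.
NORMALIZERS: dict[str, Callable[[dict[str, Any]], NotebookSource]] = {
    "codeberg": normalize_codeberg,
    "zenodo": normalize_zenodo,
}


def normalize(platform: str, raw: dict[str, Any]) -> NotebookSource:
    """Convert a raw platform record into the common schema."""
    try:
        normalizer = NORMALIZERS[platform]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform}") from None
    return normalizer(raw)
```

Keeping the platform name on every normalized record is one possible way to carry platform provenance through to the KG, and the registry keeps platform-specific logic isolated from the rest of the pipeline.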


