Ethics & Sustainability

Open Science & Compliance Overview

FAIR Principles • CC0 Data • MIT Licensed Code • Open Access • Versioned Releases • RDF/OWL Standards • SPARQL Endpoint • Reproducible Pipeline

FAIR Principles

SemRepo adheres to the FAIR data principles:

Findable – All releases are published with persistent identifiers and are indexed via Zenodo, GitHub, and the project website.
Accessible – Data is openly available via RDF dumps and a public SPARQL endpoint without authentication barriers.
Interoperable – The dataset uses standardized semantic web formats (RDF, OWL, SPARQL, VoID) and links to external scholarly knowledge graphs.
Reusable – Data and pipelines are openly licensed and versioned to support reuse, replication, and extension.

Accessibility & Interoperability

SemRepo provides multiple access methods to support diverse use cases:

RDF data dumps for full dataset download
SPARQL endpoint for structured querying
Ontologies expressed in the W3C standards RDF and OWL for semantic consistency
Interlinking with external scholarly knowledge graphs for integration into broader ecosystems
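Any HTTP client can query the SPARQL endpoint using the SPARQL 1.1 Protocol. The Python sketch below is illustrative only. The endpoint URL is a hypothetical placeholder, because the actual SemRepo endpoint address is not given in this section, and the helper names build_query_url, parse_bindings, and run_query are our own. The sketch sends a SELECT query as a GET request and flattens the standard SPARQL 1.1 JSON results format into plain dictionaries.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the real SemRepo SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a GET request (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(payload: str) -> list:
    """Turn a SPARQL 1.1 JSON SELECT result into a list of {variable: value} dicts.

    Unbound variables are omitted from a binding, so a row may lack some keys.
    """
    doc = json.loads(payload)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send a SELECT query and return its rows (requires network access)."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Once ENDPOINT points at a live endpoint, run_query(ENDPOINT, QUERY) would return up to five triples. Parsing is kept separate from the network call so that it can be checked offline against a saved response.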

Because these access methods rely on open W3C standards, the dataset can be consumed with standard RDF tooling in both academic and industrial applications.

Reusability & Versioning

SemRepo follows a versioned release strategy with periodic updates (approximately twice per year, depending on upstream data availability).

Each release includes:

A complete dataset snapshot
Metadata and provenance information
Fully reproducible construction pipelines

This allows results built on SemRepo to be independently verified, reproduced, and extended by the research community.
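Independent verification begins with confirming that a downloaded snapshot is byte-identical to the published release. This section does not say that checksums are published, so the following is a minimal Python sketch under the assumption that a SHA-256 digest accompanies each snapshot. The function names sha256_of and verify_snapshot are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large dumps need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a b"" sentinel stops at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA-256 against a published hex digest.

    Surrounding whitespace and letter case in the published digest are ignored.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify_snapshot(Path("semrepo-dump.nt.gz"), published_hex) returns True only if the file matches; the file name is hypothetical. Reading in 1 MiB chunks keeps memory use constant even for full-dataset dumps.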

Licensing

SemRepo is released under open licenses to maximize reuse and transparency:

Data: Creative Commons CC0 (public domain dedication)
Code & Pipeline: MIT License

This licensing model permits reuse of both data and software in academic and commercial applications. CC0 waives all rights to the data, and the MIT License requires only that its copyright and permission notice be retained in copies of the code. Although CC0 imposes no attribution requirement, users are encouraged to cite the associated publication and dataset.

Ethical Considerations

SemRepo is built exclusively from publicly available software repositories and scholarly metadata sources.

However, we acknowledge that it inherits structural biases and coverage limitations from upstream sources such as GitHub and linked scholarly knowledge graphs, including uneven representation across languages, regions, and research communities. Transparent provenance tracking and regular updates are provided to support responsible interpretation and use of the dataset.