SemRepo is an RDF knowledge graph with over 81 million triples describing nearly 200,000 GitHub repositories linked to scientific research. It captures fine-grained repository-level metadata (e.g., contributors, issues, programming languages) and interlinks it with external scholarly knowledge graphs: repositories are linked to publications in LPWC, repository authors to their profiles in SemOpenAlex, and research artifacts (e.g., datasets, experiments) via MLSea.
What exactly do we provide?
- Periodically updated (approx. twice per year) RDF dump files of the SemRepo Knowledge Graph.
- A publicly accessible SPARQL endpoint containing the latest SemRepo Knowledge Graph data.
- An OWL ontology with 19 classes and 47 relations for modelling research-connected software repositories, including VoID and DCAT.
- An open-source pipeline for SemRepo construction and automatic interlinking.
- VoID and DCAT metadata descriptions for dataset discovery, access, and interoperability.
- URI resolution of the SemRepo Knowledge Graph within the Linked Open Data Cloud.
- Semantic interlinking with external scholarly knowledge graphs.
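
As a minimal sketch of how the SPARQL endpoint could be queried, the Python snippet below builds a standard SPARQL 1.1 Protocol GET request and flattens the JSON results. The endpoint URL is a hypothetical placeholder (the actual address is not stated here), the function names are illustrative, and the sample query deliberately avoids any SemRepo-specific class or property names.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the actual SemRepo SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Vocabulary-agnostic query: uses no SemRepo-specific classes or properties.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint: str, query: str) -> dict:
    """Send the query and decode the SPARQL JSON results document."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return json.load(resp)


def bindings_to_rows(results: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


# Usage (performs a network call, so it is left commented out):
# rows = bindings_to_rows(run_select(ENDPOINT, SAMPLE_QUERY))
```

Only the Python standard library is used here; a dedicated SPARQL client library would work equally well.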
How big is the SemRepo Knowledge Graph?
SemRepo.org contains (as of April 2026)*:
- Repositories: 197,566
- Contributors: 2,916,508
- Issues: 2,609,510
- Programming Languages: 387,284
- Organizations: 12,879
- Packages: 95,505
- Research Topics: 272,378
- Linked SemOpenAlex Authors: 11,867 (see https://semopenalex.org)
- Linked LPWC Repositories: 197,566 (see https://linkedpaperswithcode.com)
- MLSea Software entities: 148,185 (see https://w3id.org/mlsea)
*Core classes only; in total, SemRepo contains 19 classes.
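
Per-class figures of this kind can in principle be recomputed from the graph itself. The sketch below is one possible approach, not necessarily how the numbers above were produced: a generic SPARQL query groups instances by `rdf:type`, and a small Python helper turns a SPARQL JSON response into a class-to-count mapping. The sample response is made up for illustration; its IRIs and numbers are not SemRepo data.

```python
import json

# Vocabulary-agnostic: groups by whatever rdf:type values the graph contains,
# so no SemRepo class IRI has to be guessed.
COUNT_QUERY = """
SELECT ?class (COUNT(DISTINCT ?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
"""


def class_counts(results: dict) -> dict:
    """Map each class IRI to its instance count from SPARQL JSON results."""
    return {
        b["class"]["value"]: int(b["n"]["value"])
        for b in results["results"]["bindings"]
    }


# Made-up response in the SPARQL 1.1 JSON results format (illustrative only).
SAMPLE_RESPONSE = json.loads("""
{
  "head": {"vars": ["class", "n"]},
  "results": {"bindings": [
    {"class": {"type": "uri", "value": "https://example.org/ClassA"},
     "n": {"type": "literal",
           "datatype": "http://www.w3.org/2001/XMLSchema#integer",
           "value": "1200"}},
    {"class": {"type": "uri", "value": "https://example.org/ClassB"},
     "n": {"type": "literal",
           "datatype": "http://www.w3.org/2001/XMLSchema#integer",
           "value": "35"}}
  ]}
}
""")

counts = class_counts(SAMPLE_RESPONSE)
for iri, n in counts.items():
    print(f"{iri}: {n:,}")
```

On a graph of this size, such an aggregate query may be slow or hit endpoint timeouts, so restricting it to a single class at a time can be a practical alternative.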