We present SemRepo, an RDF knowledge graph of nearly 200,000 GitHub repositories linked to scientific papers. It contains over 81 million triples describing contributors, issues, dependencies, and languages, and connects repository authors to profiles in SemOpenAlex.
What exactly do we provide?
- Periodically updated RDF dump files of the SemRepo Knowledge Graph.
- URI resolution of the SemRepo Knowledge Graph within the Linked Open Data Cloud.
- A publicly accessible SPARQL endpoint containing the latest SemRepo Knowledge Graph data.
- Semantic interlinking of GitHub repositories with LinkedPapersWithCode, MLSea publications, and SemOpenAlex author profiles.
- An OWL ontology with 19 classes and 47 relations for modelling research-connected software repositories.
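The SPARQL endpoint can be queried with standard tooling. Below is a minimal sketch that uses only the Python standard library. It sends a SELECT query over the SPARQL 1.1 Protocol, requests JSON results, and flattens the result bindings into plain dictionaries. The endpoint URL, the `ex:` prefix, and the `ex:Repository` and `ex:name` terms are placeholders, not the actual SemRepo address or ontology vocabulary. Look up the real class and property IRIs in the SemRepo ontology before running it.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint: substitute the address published by SemRepo.
ENDPOINT = "https://example.org/sparql"

# Hypothetical query: the prefix, class, and property IRIs are placeholders,
# not confirmed terms of the SemRepo ontology.
QUERY = """
PREFIX ex: <https://example.org/semrepo-ontology#>
SELECT ?repo ?name WHERE {
  ?repo a ex:Repository ;
        ex:name ?name .
}
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into {variable: value} dicts.

    Unbound variables are simply absent from a row, as in the JSON format.
    """
    doc = json.loads(payload)
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query and return its rows (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Once `ENDPOINT` and the vocabulary point at the real service, `rows = run_query(ENDPOINT, QUERY)` returns one dictionary per result row. Keeping request building and result parsing separate from the network call lets each piece be checked offline.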
How big is the SemRepo Knowledge Graph?
SemRepo.org contains:
🗃️ Repositories: 197,566
🧑‍💻 Contributors: 2,916,508
🏷️ Issues: 2,609,510
🏢 Organizations: 12,879
🧠 Packages: 95,505
🧵 Topics: 272,378
🧑‍🔬 Linked SemOpenAlex Authors: 11,867
🔗 Repositories Linked to LinkedPapersWithCode (LPWC): 197,566
🔗 Linked MLSea Software Entities: 148,185
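These counts also support a few quick coverage ratios. The sketch below computes them from the figures listed above. Two readings are assumptions rather than facts stated by SemRepo: that the linked SemOpenAlex authors are a subset of the contributors, and that each MLSea software entity corresponds to a distinct SemRepo repository.

```python
# Counts copied from the SemRepo.org statistics above.
REPOSITORIES = 197_566
CONTRIBUTORS = 2_916_508
ISSUES = 2_609_510
LINKED_SEMOPENALEX_AUTHORS = 11_867
LINKED_LPWC_REPOSITORIES = 197_566
MLSEA_SOFTWARE_ENTITIES = 148_185


def percent(part: int, whole: int) -> float:
    """Return part / whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Each GitHub issue belongs to one repository, so this is a plain average.
issues_per_repository = round(ISSUES / REPOSITORIES, 2)
# Assumption: linked SemOpenAlex authors are a subset of the contributors.
author_link_rate = percent(LINKED_SEMOPENALEX_AUTHORS, CONTRIBUTORS)
# Share of repositories that carry an LPWC link.
lpwc_coverage = percent(LINKED_LPWC_REPOSITORIES, REPOSITORIES)
# Assumption: one MLSea software entity per distinct SemRepo repository.
mlsea_coverage = percent(MLSEA_SOFTWARE_ENTITIES, REPOSITORIES)

print(f"Issues per repository:           {issues_per_repository}")
print(f"Contributors linked to SemOpenAlex: {author_link_rate} %")
print(f"Repositories linked to LPWC:     {lpwc_coverage} %")
print(f"Repositories with an MLSea entity: {mlsea_coverage} %")
```

Under these assumptions the ratios come to about 13.21 issues per repository, a 0.41 % SemOpenAlex link rate among contributors, full LPWC coverage, and roughly 75 % MLSea coverage.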
Potential use cases:
- Track research reproducibility by linking papers to active GitHub repositories and development metrics.
- Analyze collaboration patterns through contributor–repository graphs and team activity signals.
- Bridge research and industry by identifying open-source implementations of academic work.
- Discover expertise based on real code contributions, languages used, and package dependencies.
- Monitor technology trends and software adoption across research domains over time.
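For the collaboration use case, a common first step is to project the bipartite contributor–repository graph onto a contributor–contributor graph. Each pair of people is weighted by the number of repositories they both contributed to. The sketch below assumes the (contributor, repository) pairs have already been extracted, for example with a SPARQL query. The edge list and identifiers are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (contributor, repository) edges; identifiers are illustrative.
EDGES = [
    ("alice", "repo-a"),
    ("bob", "repo-a"),
    ("carol", "repo-a"),
    ("alice", "repo-b"),
    ("bob", "repo-b"),
    ("carol", "repo-c"),
]


def co_contribution_counts(edges):
    """Project contributor-repository edges onto contributor pairs.

    Each unordered pair (sorted alphabetically) is counted once for every
    repository to which both contributors contributed.
    """
    members = defaultdict(set)
    for contributor, repo in edges:
        members[repo].add(contributor)
    pairs = Counter()
    for people in members.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(people), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    for pair, shared in co_contribution_counts(EDGES).most_common():
        print(pair, shared)
```

Here `alice` and `bob` share two repositories, so their pair gets weight 2. Every other pair shares at most one. The resulting weighted pairs can be fed into any graph library for community detection or centrality analysis.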