
SemRepo is an RDF knowledge graph with over 81 million triples describing nearly 200,000 GitHub repositories linked to scientific research. SemRepo captures fine-grained repository-level metadata (e.g., contributors, issues, programming languages) and interlinks this information with external scholarly knowledge graphs: repositories are connected to publications in LPWC, repository authors are linked to their profiles in SemOpenAlex, and research artifacts (e.g., datasets, experiments) are linked via MLSea.

What exactly do we provide?

  1. Periodically updated (approx. twice per year) RDF dump files of the SemRepo Knowledge Graph.
  2. A publicly accessible SPARQL endpoint containing the latest SemRepo Knowledge Graph data.
  3. Semantic interlinking of GitHub repositories with external scholarly knowledge graphs.
  4. An OWL ontology with 19 classes and 47 relations for modelling research-connected software repositories.
  5. An open-source pipeline for SemRepo construction and automatic interlinking, enabling future extensions of SemRepo.
  6. URI resolution of the SemRepo Knowledge Graph within the Linked Open Data Cloud.
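As an illustration of how the SPARQL endpoint could be queried, here is a minimal Python sketch using only the standard library. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (query sent via HTTP GET, results returned as SPARQL JSON); the URL `https://example.org/sparql` is a hypothetical placeholder, since the endpoint address is not given here. The query deliberately avoids SemRepo-specific vocabulary and simply lists the ten most populated classes.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the actual SemRepo SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Vocabulary-agnostic query: the ten classes with the most rdf:type instances.
QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 60.0) -> list[dict[str, str]]:
    """Send the query over the network and return the parsed bindings."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, QUERY)` with the real endpoint address should yield rows of the form `{"class": ..., "n": ...}`. Aggregates over the whole graph may run into server-side timeouts, so narrowing the pattern or querying a single class is likely to be more reliable.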

How big is the SemRepo Knowledge Graph?

SemRepo.org contains (as of April 2026)*:

๐Ÿ—ƒ๏ธ Repositories: 197,566

๐Ÿง‘โ€๐Ÿ’ป Contributors: 2,916,508

๐Ÿท๏ธ Issues: 2,609,510

🧠 Programming Languages: 387,284

๐Ÿข Organizations: 12,879

๐Ÿง  Packages: 95,505

๐Ÿงต Research Topics: 272,378

๐Ÿง‘โ€๐Ÿ”ฌ Linked SemOpenAlex Authors: 11,867 (see https://semopenalex.org)

๐Ÿ”— Linked LPWC Repositories: 197,566 (see https://linkedpaperswithcode.com)

๐Ÿ”— MLSea Software entities: 148,185 (see https://w3id.org/mlsea)

*Core classes only; in total, SemRepo contains 19 classes.
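Per-class figures like these could, in principle, be recomputed from an RDF dump. The sketch below assumes the dump is serialized as N-Triples (one triple per line, optionally gzip-compressed); the actual dump format and file names are not stated here. It counts all triples and tallies `rdf:type` triples per class IRI, streaming the file so that tens of millions of triples never need to fit in memory. It uses a regular expression rather than a full N-Triples parser, so it is a rough shortcut; a library such as rdflib would be the more robust choice.

```python
import gzip
import re
from collections import Counter
from typing import Iterable

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Matches one N-Triples line of the form "<subject or _:bnode> rdf:type <ClassIRI> ."
TYPE_TRIPLE = re.compile(r"^(\S+)\s+" + re.escape(RDF_TYPE) + r"\s+<([^>]+)>\s*\.\s*$")


def tally(lines: Iterable[str]) -> tuple[int, Counter]:
    """Return (total triple count, rdf:type triple count per class IRI).

    The per-class count equals the number of instances only if the dump
    contains no duplicate triples.
    """
    total = 0
    per_class: Counter = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        total += 1
        match = TYPE_TRIPLE.match(stripped)
        if match:
            per_class[match.group(2)] += 1
    return total, per_class


def tally_file(path: str) -> tuple[int, Counter]:
    """Stream an N-Triples file (gzip-compressed if it ends in .gz) through tally()."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return tally(handle)
```

Mapping the resulting class IRIs to the labels above (Repositories, Issues, and so on) requires the actual IRIs from the SemRepo ontology, which this sketch does not assume.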