Open Source Knowledge Graphs

A knowledge graph is a collection of data that represents entities and their relationships in a structured format. Open source knowledge graphs are freely available datasets that can be used for various applications such as natural language processing, machine learning, and data mining.

The following table compares some popular open source knowledge graphs:

Dataset | Number of Entities | Number of Facts | Number of Classes | Number of Relations | License | Description
DBpedia | 4.29 million | 411 million | 736 | 2,819 | CC-BY-SA 3.0 | Extracted from Wikipedia, contains entities and relations in various domains
YAGO | 5.13 million | 1.00 billion | 569,751 | 106 | CC-BY-SA 3.0 | Extracted from Wikipedia, contains entities and relations in various domains
WordNet | 175,979 | 207,016 | 4 | N/A | Open-source | Lexical database of English words and concepts
Freebase | 49.95 million | 3.12 billion | 53,092 | 70,902 | Open-source | User-contributed facts about people, places, and things
Wikidata | 18.69 million | 748.53 million | 302,280 | 1,874 | CC0 | Structured data from Wikipedia and other sources
OpenCyc | 41,029 | 2.41 million | 116,822 | 18,028 | Commercial and non-commercial | A large ontology and knowledge base of common sense knowledge
IMDb | 484 million | N/A | 7 | N/A | Free for non-commercial use | Contains information about movies, television shows,
MusicBrainz | 92.82 million | 37.87 million | 15 | N/A | CC-BY-SA 4.0 | Contains information about music artists, albums, songs and more

When choosing a knowledge graph for your application, you should consider the following factors:

  • The size and scope of the dataset: If your application needs broad coverage of entities and relationships, choose a dataset with a high number of facts, classes, and relations.
  • The domain of the dataset: If your application requires information from a specific domain, choose a dataset that specializes in that domain (for example, MusicBrainz for music).
  • The license of the dataset: Make sure the dataset's license permits your intended use (for example, commercial versus non-commercial use).
  • The quality of the data: Check that the dataset is accurate and reliable enough for your application.
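
As a rough illustration of the size and license factors, the sketch below encodes a few columns of the table above and filters them by a minimum fact count and a set of acceptable licenses. The `KnowledgeGraph` class, the `shortlist` function, and the example threshold are assumptions made for this illustration, not part of any dataset's tooling; the figures are copied from the table, with N/A stored as `None`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class KnowledgeGraph:
    """One row of the comparison table (hypothetical helper type)."""
    name: str
    entities: float
    facts: Optional[float]  # None where the table lists N/A
    license: str


# Figures copied from the comparison table.
DATASETS = [
    KnowledgeGraph("DBpedia", 4.29e6, 411e6, "CC-BY-SA 3.0"),
    KnowledgeGraph("YAGO", 5.13e6, 1.00e9, "CC-BY-SA 3.0"),
    KnowledgeGraph("WordNet", 175_979, 207_016, "Open-source"),
    KnowledgeGraph("Freebase", 49.95e6, 3.12e9, "Open-source"),
    KnowledgeGraph("Wikidata", 18.69e6, 748.53e6, "CC0"),
    KnowledgeGraph("OpenCyc", 41_029, 2.41e6, "Commercial and non-commercial"),
    KnowledgeGraph("IMDb", 484e6, None, "Free for non-commercial use"),
    KnowledgeGraph("MusicBrainz", 92.82e6, 37.87e6, "CC-BY-SA 4.0"),
]


def shortlist(graphs: Iterable[KnowledgeGraph],
              min_facts: float,
              allowed_licenses: Set[str]) -> List[str]:
    """Return names of graphs with enough facts and an acceptable license."""
    return [
        g.name
        for g in graphs
        if g.facts is not None          # skip rows with no fact count
        and g.facts >= min_facts
        and g.license in allowed_licenses
    ]


if __name__ == "__main__":
    # Example threshold and license set are arbitrary choices for the demo.
    print(shortlist(DATASETS, 500e6, {"CC0", "CC-BY-SA 3.0"}))
```

With a threshold of 500 million facts and only CC0 or CC-BY-SA 3.0 accepted, the shortlist contains YAGO and Wikidata. Domain fit and data quality are not captured by these numbers and still need a manual look.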

Note that several of the knowledge graphs above, including DBpedia and YAGO, are extracted from Wikipedia, so their data is not always accurate, and some quality control is necessary before you rely on them.
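
What quality control looks like depends on the dataset, but a first pass can be as simple as counting obvious problems in a sample of facts. The sketch below assumes facts are available as (subject, predicate, object) tuples; the `basic_quality_report` function and the sample triples are hypothetical and only illustrate the idea.

```python
from collections import Counter


def basic_quality_report(triples):
    """Count a few simple problems in a list of (subject, predicate, object) tuples."""
    counts = Counter(triples)
    # Every repeat of an identical triple beyond the first is a duplicate.
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    # A triple is incomplete if any part is None or blank.
    incomplete = sum(
        1 for t in triples
        if any(part is None or not str(part).strip() for part in t)
    )
    # A self-loop relates an entity to itself, which is often an extraction error.
    self_loops = sum(1 for s, _, o in triples if s == o)
    return {
        "total": len(triples),
        "duplicates": duplicates,
        "incomplete": incomplete,
        "self_loops": self_loops,
    }


# Hypothetical sample data, not taken from any real dataset.
sample = [
    ("Berlin", "capitalOf", "Germany"),
    ("Berlin", "capitalOf", "Germany"),  # exact duplicate
    ("Paris", "capitalOf", ""),          # missing object
    ("Rome", "sameAs", "Rome"),          # self-loop
]

if __name__ == "__main__":
    print(basic_quality_report(sample))
```

Checks like these only catch structural problems; whether a fact is actually true still has to be verified against a trusted source.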

To decide which dataset to use, explore the candidates and select the one that best fits the factors listed above. It can also help to run your application against several datasets and compare which one produces the best results.
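
One simple way to compare candidates is to measure how many of the entities your application needs appear in each dataset. The following is a minimal sketch under that assumption; the `coverage` and `rank_by_coverage` functions, the dataset names `graph_a` and `graph_b`, and the entity lists are all hypothetical placeholders for samples you would draw yourself.

```python
def coverage(required_entities, dataset_entities):
    """Fraction of required entities that are present in the dataset."""
    required = set(required_entities)
    if not required:
        return 0.0
    return len(required & set(dataset_entities)) / len(required)


def rank_by_coverage(required_entities, samples):
    """Return (dataset name, coverage) pairs, best coverage first."""
    scores = {
        name: coverage(required_entities, entities)
        for name, entities in samples.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs: the entities the application needs, and the
# entities found in a sample of each candidate dataset.
required = ["Berlin", "Paris", "Rome", "Madrid"]
samples = {
    "graph_a": {"Berlin", "Paris"},
    "graph_b": {"Berlin", "Paris", "Rome"},
}

if __name__ == "__main__":
    for name, score in rank_by_coverage(required, samples):
        print(f"{name}: {score:.0%}")
```

Coverage is only one possible metric; for a real comparison you would likely also measure the quality of your application's end results on each dataset.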