Knowledge Graph Glossary

Knowledge graphs touch many of our lives every day, yet many people don’t know the first thing about them. Ask your smart speaker what the weather will be. Watch sales prospects in your CRM automatically populate with a profile of who you’re about to speak with. Use an Excel add-in to pull in data on a list of organizations. Chances are, many of these interactions are powered by knowledge graphs (or should be!).
As the provider of the world’s largest commercially available Knowledge Graph™, Diffbot is in a great position to share fundamental concepts about what knowledge graphs are accomplishing today and what they will likely facilitate in the future. We’re excited to offer this glossary of knowledge graph-related terms and concepts for the next generation of KG users.

Automated Data Cleaning

Automated Data Cleaning involves the application of machine learning to accomplish the data cleaning objectives of modifying or removing data…

[...]

Automated Knowledge Base

Automated Knowledge Bases are large repositories of knowledge structured as entities and the relationships between them that are compiled through…

[...]

Commonsense Knowledge Bases

Commonsense Knowledge Bases are knowledge bases (or knowledge graphs) organized around the representation of data that everyone is expected to…

[...]

Data Archive

Data Archives retain information over time, often preserving a view of a given moment within a database. As opposed to…

[...]

Data Enrichment

Data Enrichment is the common practice of merging external, authoritative data with first-party customer or exploratory data. First-party “raw” customer…

[...]

Data Mining Confidence

Confidence within data mining is typically utilized for association-rule learning. In the case of market-basket analysis, confidence describes the relationships…

[...]
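As a rough illustration of confidence in market-basket analysis, the sketch below uses the standard association-rule definition: the confidence of a rule A → B is the share of transactions containing A that also contain B. The function name `confidence` and the sample baskets are hypothetical, chosen only for this example.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Computed as count(antecedent and consequent) / count(antecedent).
    Returns 0.0 when the antecedent never occurs.
    """
    antecedent = set(antecedent)
    consequent = set(consequent)
    # Transactions that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    if n_antecedent == 0:
        return 0.0
    # Transactions that contain both the antecedent and the consequent.
    n_both = sum(
        1 for t in transactions if (antecedent | consequent) <= set(t)
    )
    return n_both / n_antecedent


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk"},
]

# Bread appears in three baskets, two of which also contain butter.
conf_value = confidence(baskets, {"bread"}, {"butter"})
print(round(conf_value, 3))
```

Confidence is directional: the rule butter → bread would be scored against the baskets that contain butter, so it can differ from bread → butter.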

Data Provenance

Data provenance is metadata that is paired with records and details the origin of the data and the confidence in its truth.…

[...]

Data Science Lifecycle

The Data Science Lifecycle is an iterative approach to managing data science contributions within an organization. Seven steps are commonly…

[...]

DIKW Pyramid

The DIKW Pyramid — also commonly known as the information or data pyramid — is a series of related models…

[...]

Discussion Data

Discussion Data is the primary fuel of discussion analysis as well as one extremely valuable source for sentiment analysis for…

[...]

Entity Resolution

Entity Resolution refers to a portion of a knowledge graph build process in which information from separate records is reconciled…

[...]

Facet

A Facet is an aspect or type of entry within a knowledge graph entity. Examples of facets include the location…

[...]

Faceted Search

Faceted Search is a search that returns a count of the prevalence of one set of attributes within entities that…

[...]
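One way to picture faceted search is as a filter followed by a count of attribute values among the matching entities. The sketch below assumes a tiny in-memory list of entities; the field names (`type`, `industry`, `country`) and the helper `facet_counts` are hypothetical and do not reflect any particular knowledge graph's schema or API.

```python
from collections import Counter

# Hypothetical entities, invented for illustration.
entities = [
    {"type": "Organization", "industry": "Software", "country": "US"},
    {"type": "Organization", "industry": "Software", "country": "DE"},
    {"type": "Organization", "industry": "Retail", "country": "US"},
    {"type": "Person", "industry": None, "country": "US"},
]


def facet_counts(records, facet, **filters):
    """Count the values of `facet` among records matching all filters."""
    matches = [
        r for r in records
        if all(r.get(key) == value for key, value in filters.items())
    ]
    # Skip records that have no value for the requested facet.
    return Counter(r[facet] for r in matches if r.get(facet) is not None)


# How prevalent is each industry among organizations?
industry_counts = facet_counts(entities, "industry", type="Organization")
print(industry_counts.most_common())
```

The same helper can be called with a different facet or filter, for example counting `country` among entities whose `industry` is "Software".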

Firmographic Data

Firmographic Data — also known as firm demographic data — is data related to the fundamental characteristics of organizations. Often…

[...]

Folksonomy

A Folksonomy is a categorization system in which end users categorize content or entities with the use of tags or…

[...]

Graph

A graph is a mathematical concept used as a non-linear data structure within computer science. Graphs are often depicted visually,…

[...]
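As a minimal sketch of a graph as a data structure, the example below stores an undirected graph as an adjacency list and walks it breadth-first. The node labels and the helper names `build_graph` and `reachable` are placeholders for illustration only.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency list from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable(graph, start):
    """Return every node reachable from `start` via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get avoids adding new keys to the defaultdict during traversal.
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Two disconnected components: A-B-C and D-E.
graph = build_graph([("A", "B"), ("B", "C"), ("D", "E")])
component = reachable(graph, "A")
print(sorted(component))
```

Unlike a list or array, there is no single linear order here: each node simply records which other nodes it is connected to.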

Head Entities

Head Entities are entities within a knowledge graph that are more regularly referenced, linked to, downloaded, and utilized than other…

[...]

Induction

Induction is a form of reasoning in which premises are viewed as supplying some satisfactory evidence as to the truth…

[...]

Inference Engines

Inference Engines are components of artificial intelligence systems that apply logical rules to a knowledge graph (or base)…

[...]

Knowledge Base

Knowledge Bases are large repositories of structured or unstructured data for use within an information system. The term was originally…

[...]

Knowledge Engineering

Knowledge Engineering is a subset of engineering methods and questions within artificial intelligence that seeks to create systems that emulate…

[...]

Knowledge Fusion

Knowledge Fusion is a crucial and differentiating step within Diffbot’s Knowledge Graph™ build pipeline. Occurring after the linking of records,…

[...]

Knowledge Graph Entity

Knowledge Graph Entities are people, places, or “things” as defined within a knowledge graph. Grammatically, entities tend to be nouns…

[...]

Knowledge Graph Reasoner

A Knowledge Graph Reasoner (also called an inference engine, rules engine, or semantic engine) is an AI-enabled system…

[...]

Linked Data

Linked Data is data that is encoded alongside its semantic meaning. Linked data has been championed by numerous organizations in…

[...]

Long Tail

Long Tail data or entities are those that are less commonly referenced within a knowledge graph (or any data set).…

[...]

Natural Language Processing

Natural Language Processing is a field of inquiry and processes concerned with the interaction of computers and human language (speech…

[...]

Noise

Noise is typically thought of as unexplained variability in data. Noise is in contrast to a signal, which is clearly…

[...]

Ontology

An Ontology is a set of concepts or categories within one subject matter or domain that show properties of entities…

[...]

Organizational Data

Organizational Data — also called firmographic or firm demographic data — is data related to the fundamental characteristics of organizations.…

[...]

Origin

An Origin is the location or source of data that is incorporated into a fact. Origins are important for automated…

[...]

Overmerging

Overmerging occurs when a knowledge graph entity has too many records such that these records make data for the entity…

[...]

Product Data

Product Data includes all readable, measurable, and structurable data about products. While there is no universally accepted schema for all…

[...]

Properties

Properties are attributes or characteristics of knowledge graph entities. Properties vary depending on entity type and as described in a…

[...]

Proxies

Proxies — also known as proxy servers — are intermediate servers that receive web requests and redirect them. Proxy servers…

[...]

Record Linking

Record Linking is an important aspect of any knowledge graph build process that involves linking records to entities. An example…

[...]

Relation Extraction

Relation Extraction is the process of identifying associations between elements in unstructured data. For example, recognizing that Diffbot.com is the…

[...]

Schema

A Schema is a set of rules for how entities, attributes, and relationships between entities can be arranged in a…

[...]

Seed URL

A Seed URL in web crawling is a URL from which a web crawler will begin to traverse a site.…

[...]

Semantic Integration

Semantic Integration is the process of integrating information from diverse sources into a single structure. For example, pulling video conferencing…

[...]

Semantic Search

Semantic Search provides results based on semantic meaning among searched entities. Semantic search is distinguished from lexical search, which returns…

[...]

Structured Data

Structured Data refers to any form of data that resides in established fields within a record. This is distinguished from…

[...]

Transparency and Explainability

Transparency and Explainability of AI systems are concepts used to rate and discuss how AI systems come to the conclusions…

[...]

Unstructured Data

Unstructured Data is data that does not reside in established fields within a record. Examples of unstructured data include emails,…

[...]

URI

A URI (Uniform Resource Identifier) is a string of characters that identifies a resource, such as a web page or an entity within a knowledge graph.

[...]