Data Trends: Comparing Data Fabrics, Data Meshes, And Knowledge Graphs

Data meshes, fabrics, and knowledge graphs are all positioned as frameworks through which similar benefits are realized. 

All three promote interoperability and ease of integrating new data sources. To varying degrees, all three support real-time and event-driven data ingestion and processing. All three seek to avoid flat data output, data that needs additional processing once it has been extracted, and orphaned data that grows progressively stale. Additionally, given each system's focus on myriad (and a growing number of) data sources, robust data governance and semantic enrichment are at the forefront of all three.

With that said, there are differences between data mesh, fabric, and knowledge graphs. 

What Is A Data Fabric?

Data fabric is an architecture-centered design concept governing data access across many decentralized data sources. Data fabric methodologies emerged in response to the costly, slow, and low-value data integration cycles common to centralized data lakes and warehouses. Data fabric systems aspire to promote connectivity across disparate data sources, as well as reusability, by avoiding issues such as orphaned data or the large volumes of extraneous data that tend to accumulate in centralized data stores.

A focus on value-added data integration is central to the notion of data fabrics. Systems for semantic enrichment, linked data, and the harmonization of unstructured, semi-structured, and structured data are key to successful data fabric delivery. The creation of these systems is not decentralized: in a data fabric, data access is centralized and held under a single point of control.

A data fabric makes data available via objective-centered APIs. For example, if a user needs to build a dashboard comparing competitors' hiring trends with news monitoring around noteworthy market events, a data fabric approach would involve first ingesting these disparate data sources, adding context or additional fields to the data, and then exposing the data as an API for the dashboard.
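
As a rough sketch of what such an objective-centered endpoint might look like (the framework choice, source names, and fields here are illustrative assumptions, not a prescribed fabric stack):

```python
# A fabric-style, objective-centered API: two already-ingested and enriched
# sources served together for one dashboard objective. All names, fields,
# and the endpoint path are hypothetical.
from fastapi import FastAPI

app = FastAPI()

# In a real fabric these would be views over governed, semantically
# enriched sources rather than in-memory lists.
HIRING_TRENDS = [
    {"company": "CompetitorA", "quarter": "2021-Q1", "open_roles": 42},
    {"company": "CompetitorB", "quarter": "2021-Q1", "open_roles": 17},
]
NEWS_EVENTS = [
    {"company": "CompetitorA", "event": "acquisition", "date": "2021-02-03"},
]

@app.get("/dashboard/competitor-activity")
def competitor_activity(company: str):
    """Serve one objective: hiring trends joined with noteworthy news."""
    return {
        "company": company,
        "hiring": [h for h in HIRING_TRENDS if h["company"] == company],
        "news": [n for n in NEWS_EVENTS if n["company"] == company],
    }
```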

What Is A Data Mesh?

First and foremost, data mesh is an organization-centered approach to data management. A data management system built on data mesh principles enables users to access and query data from a variety of sources without first ingesting that data into a centralized warehouse. While architecture design is part of a data mesh, it is not as central to the characterization of a data mesh as it is to that of a data fabric.

From an organizational perspective, data mesh views each edge data source as a product owned by a business unit in charge of that domain. In relation to these decentralized data stores, data mesh serves as a connectivity layer that is built such that both technical and non-technical users can utilize data sets where they reside. 

Querying data closer to the source – without the need for transfer and ingestion into a central repository – can lower processing costs, decrease time to analysis, and avoid the privacy issues that arise when data is transferred between particular geographies.
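
A minimal sketch of that idea, assuming each domain team exposes its data product behind its own (hypothetical) query endpoint:

```python
# Mesh-style access: each domain team owns its data as a product with its
# own query endpoint, and consumers query data where it lives instead of
# copying it into a central warehouse. URLs and payloads are hypothetical.
import requests

DOMAIN_PRODUCTS = {
    "sales": "https://sales.example.com/api/v1/query",
    "hr": "https://hr.example.com/api/v1/query",
}

def query_domain(domain: str, query: dict) -> list:
    """Send the query to the domain that owns the data; nothing is re-ingested."""
    response = requests.post(DOMAIN_PRODUCTS[domain], json=query, timeout=30)
    response.raise_for_status()
    return response.json()["rows"]  # assumed response shape

# e.g. ask the HR domain for attrition by region without moving its data
rows = query_domain("hr", {"select": ["region", "attrition_rate"], "year": 2021})
```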

What Is A Knowledge Graph? 

In contrast to data meshes and fabrics, a knowledge graph is neither a connectivity-layer-centric solution nor a data management paradigm.

Knowledge graphs are graph databases built to preserve information and context. In particular, knowledge graphs are built around nodes (entities) and edges (relationships). Though data can be output in a format similar to a relational database, knowledge graphs provide better performance when traversing linked data and are much more adept at adding new fact types and data source formats “on the fly.”
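
A toy illustration of the node-and-edge model, here using networkx rather than any particular graph database (entity names and the revenue figure are made up):

```python
# The node/edge model in miniature, using networkx rather than a graph
# database. Entity names and the revenue figure are illustrative.
import networkx as nx

kg = nx.MultiDiGraph()
kg.add_node("Apple Inc", type="Organization", estimated_revenue_usd=365_000_000_000)
kg.add_node("Tim Cook", type="Person")
kg.add_edge("Tim Cook", "Apple Inc", relationship="ceo_of")

# A brand-new fact type added "on the fly": no schema migration required.
kg.add_edge("Apple Inc", "Cupertino", relationship="headquartered_in")

# Traversing linked data is a graph walk rather than a chain of joins.
for _, target, data in kg.out_edges("Tim Cook", data=True):
    print(f"Tim Cook --{data['relationship']}--> {target}")
```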

This makes knowledge graphs a natural choice for high-velocity, variably typed data like that used in news or market monitoring. In knowledge graphs, data is linked and often augmented with additional semantic features upon ingestion, aligning with the objectives of data fabrics. For example, within Diffbot’s Knowledge Graph we have organization entities for which we can infer detailed industry fields, machine learning-computed revenue estimates, as well as similarity scores between organizations.
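
A hedged sketch of pulling such enriched organization records via Diffbot's Knowledge Graph API; the endpoint, DQL syntax, and field names below follow Diffbot's public documentation as we understand it, so verify them against the current docs before relying on this:

```python
# Querying Diffbot's Knowledge Graph for enriched organization records.
# Endpoint, DQL syntax, and field names follow Diffbot's public docs as we
# understand them; verify against current documentation before use.
import requests

resp = requests.get(
    "https://kg.diffbot.com/kg/v3/dql",
    params={
        "type": "query",
        "token": "YOUR_DIFFBOT_TOKEN",  # placeholder
        "query": 'type:Organization industries:"Machine Learning" revenue.value>1000000',
        "size": 10,
    },
    timeout=30,
)
resp.raise_for_status()
for result in resp.json().get("data", []):  # assumed response shape
    entity = result.get("entity", {})
    print(entity.get("name"), entity.get("revenue", {}).get("value"))
```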

Use of knowledge organization systems (KOS) aligns with data fabric and mesh goals to add additional semantics to variable incoming data streams and to promote linked data. KOSs commonly utilized in Knowledge Graph construction include the following (a toy sketch of the taxonomy example appears after the list):

  • Glossaries/synonym rings: properly merge facts attached to entities mentioned in multiple ways
  • Unique identifiers: disambiguate entities with the same name (Apple Inc vs. Apple the fruit) 
  • Taxonomies: classify new entities in relation to old entities allowing for additional inferences (California is a state in the United States, therefore San Francisco is in the United States) 
  • Associative clustering: track loose relationships and similarities between entities (Pho is often associated with Vietnamese restaurants; machine learning engineers often work at AI startups)
  • Ontologies: rules, properties, and constraints applied to entities and relationships (only organizations have funding rounds)
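
Here is the promised toy sketch of the taxonomy case: containment is transitive, so facts never stated directly can still be inferred (entity names are illustrative):

```python
# Taxonomy-driven inference: "is part of" is transitive, so placement facts
# propagate up the hierarchy without being stated directly.
PARENT = {
    "San Francisco": "California",
    "California": "United States",
}

def contained_in(entity: str, region: str) -> bool:
    """Walk up the taxonomy one is-part-of fact at a time."""
    while entity in PARENT:
        entity = PARENT[entity]
        if entity == region:
            return True
    return False

# Never stated directly, but inferable from the two facts above:
assert contained_in("San Francisco", "United States")
```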

Also like data fabrics, knowledge graphs are often constructed with a single, centralized point of data access via an API or integrations.

As the provider of the world’s largest commercially-available Knowledge Graph, Diffbot has seen many successful use cases for Knowledge Graph data. These uses include:

  • Market monitoring: tracking of firmographic changes and key events
  • Product intelligence: building knowledge graphs of related products 
  • News monitoring: tracking key events and relationships in the news
  • Machine learning: labeled data with context enables quick workflows and explainability
  • Sales development: ability to filter through detailed firmographics and person records
  • Hiring and investing: track attrition, skill sets, and meaningful organizational events
  • Data enrichment: easily digestible structured and linked data with expanding field types
  • Product recommendations: serve up recommendations based on associated behaviors and products
  • Discussion tracking: velocity, sentiment, and influencer tracking
  • Fake news detection: the ability to corroborate facts across millions of articles and train models to predict accuracy of statements
  • Fraud detection: the ability to visualize and track complex relationships between regulatory bodies, private organizations, and key individuals
  • Supply chain / risk: the ability to visualize and track partnerships, key events, suppliers, vendors, locations, and hiring trends

Of course, many of the use cases above can also be supported with data fabrics and meshes. But where meshes and fabrics describe an entire ecosystem of data use and structure across an organization, knowledge graphs excel at augmenting other data stores and at supporting specific tasks.

Is It Really About All Three? 

There are pros and cons to using any of the three knowledge management frameworks listed above. And the choice is often not either/or. Data fabrics benefit from a single point of connectivity that can serve up standardized and semantically enriched data from disparate internal and external sources. A data mesh may be suitable for underlying portions of an organization where agility is more heavily prized. A data source of record can then be supplied for integration and release from a central point (data fabric) for other teams.

Additionally, data held in knowledge graphs may make sense for certain use cases within an organization utilizing a data fabric and/or mesh. A focus on interoperability and easy integration makes knowledge graph data great for augmentation and enrichment of data sets in other formats. A focus on providing context for information supports explainability, making knowledge graph data a preferred choice for machine learning and data science-centered initiatives within an organization.

Care to learn more about the world’s largest commercially-available Knowledge Graph? Reach out to our sales team today. 

No News Is Good News – Monitoring Average Sentiment By News Network With Diffbot’s Knowledge Graph

Ever have the feeling that news used to be more objective? That news organizations — now media empires — have moved into the realm of entertainment? Or that a cluster of news “across the aisle” from your beliefs is completely outrageous?

Many have these feelings, and coverage is rampant on bias and even straight up “fake” facts in news reporting.

With this in mind, we wanted to see if these hunches are valid. Has news gotten more negative over time? Is it a portion of the political spectrum driving this change? Or is it simply that bad things happen in the world and later get reported on?

To jump into this inquiry we utilized Diffbot’s Knowledge Graph. Diffbot is one of the few North American organizations to crawl the entire web. We apply AI-enabled web scrapers to pages that are publicly available to extract entities — think people, places, or things — and facts — think job titles, topics, and funding rounds.

We started our inquiry with some external coverage on bias in journalism provided by AllSides Media Bias Ratings.

Continue reading

Generating B2B Sales Leads With Diffbot’s Knowledge Graph

Lead generation is the single largest challenge for up to 85% of B2B marketers.

Simultaneously, marketing and sales dashboards are filled with ever more data. There are more ways to get in front of a potential lead than ever before. And nearly every org of interest has a digital footprint.

So what’s the deal? 🤔

Firmographic, demographic, technographic (components of quality market segmentation) data are spread across the web. And even once they’re pulled into our workflows they’re often siloed, still only semi-structured, or otherwise disconnected. Data brokers provide data that gets stale more quickly than quality curated web sources.

But the fact persists: all the lead generation data you typically need is spread across the public web.

You just need someone (or something 🤖) to find, read, and structure this data.

Continue reading

The 6 Biggest Difficulties With Data Cleaning (With Workarounds)

Data is the new soil.

David McCandless

If data is the new soil, then data cleaning is the act of tilling the field. It’s one of the least glamorous and (potentially) most time-consuming portions of the data science lifecycle. And without it, you don’t have a foundation from which solid insights can grow.

At its simplest, data cleaning revolves around two opposing needs:

  • The need to amend data points that will skew the quality of your results
  • The need to retain as much of your useful data as you can

These needs are often most strictly opposed when choosing to clean a data set by removing data points that are incorrect, corrupted, or otherwise unusable in their present format.

Perhaps the most important outcome of a data cleaning job is results standardized in a way that lets analytics and BI tools easily access any value, present data in dashboards, or otherwise manipulate the data.
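
A minimal pandas sketch of balancing those needs (column names and the file path are hypothetical): repair what can be standardized, impute where a sensible default exists, and drop only what is truly unusable.

```python
# Balancing the two needs: standardize and impute where possible, drop only
# rows that are beyond repair. Column names and the path are hypothetical.
import pandas as pd

df = pd.read_csv("leads.csv")

# Amend: standardize formats so BI tools can read every value uniformly.
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
df["country"] = df["country"].str.strip().str.upper()

# Retain: impute a sensible default rather than discarding the whole row.
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# Remove: only rows whose key field could not be repaired.
df = df.dropna(subset=["signup_date"])
```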

Continue reading

From Knowledge Graphs to Knowledge Workflows

2020 was undeniably the “Year of the Knowledge Graph.”

2020 was the year that Gartner put Knowledge Graphs at the peak of its hype cycle.

It was the year where 10% of the papers published at EMNLP referenced “knowledge” in their titles.

It was the year over 1000 engineers, enterprise users, and academics came together to talk about Knowledge Graphs at the 2nd Knowledge Graph Conference.

There are good reasons for this grassroots trend, as it isn’t any one company pushing it (ahem, I’m looking at you, Cognitive Computing), but rather a broad coalition of academics, industry vertical practitioners, and enterprise users who deal with building intelligent information systems.

Knowledge graphs represent the best of how we hope the “next step” of AI will look: intelligent systems that aren’t black boxes, but are explainable, that are grounded in the same real-world entities as us humans, and that are able to exchange knowledge with us in precise common vocabularies. It’s no coincidence that in the breakout year of the deep learning revolution (2012), Google introduced the Google Knowledge Graph as a way to provide interpretability to its otherwise opaque search ranking algorithms.

The Risk Of Hype: Touted Benefits Don’t Materialize

Continue reading

Robotic Process Automation Extraction Is A Time Saver. But It’s Not Built For The Future

Enough individuals have heard the siren song of Robotic Process Automation to build several $1B companies. Even if you don’t know the “household names” in the space, something about the buzzword abbreviated as “RPA” leaves the impression that you need it. That it boosts productivity. That it enables “smart” processes. 

RPA saves millions of work hours, for sure. But how solid is the foundation for processes built using RPA tech? 

First off, RPA operates by literally moving pixels across the screen. Repetitive tasks are automated by recording the “steps” someone would take to manipulate applications with their mouse, then replaying those steps without human oversight. There are plenty of situations in which this is handy. You need to move entries from a spreadsheet to a CRM. You need to move entries from a CRM to a CDP. You need to cut and paste thousands or millions of times between two windows in a browser.
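
A toy sketch of that pixel-level style of automation using pyautogui; the screen coordinates stand in for recorded “steps” and are, of course, made up, which is exactly the fragility at issue:

```python
# Pixel-level RPA in miniature, via pyautogui. The coordinates stand in for
# recorded "steps"; if the UI shifts even slightly, the replay breaks.
import pyautogui

COPY_CELL = (412, 318)   # recorded position of a spreadsheet cell
CRM_FIELD = (901, 455)   # recorded position of a CRM input field

pyautogui.click(*COPY_CELL)     # select the cell
pyautogui.hotkey("ctrl", "c")   # copy its contents
pyautogui.click(*CRM_FIELD)     # focus the CRM field
pyautogui.hotkey("ctrl", "v")   # paste
```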

These are legitimate issues within back end business workflows. And RPA remedies these issues. But what happens when your software is updated? Or you need to connect two new programs? Or your ecosystem of tools changes completely? Or you just want to use your data differently? 

This hints at the first issue with the foundation on which RPA is built: RPA can’t operate in environments it hasn’t seen (and received extensive documentation about).

Continue reading

The Ultimate Guide To Data Analysis

Data analysis comes at the tail end of the data lifecycle, directly after or simultaneously with data integration (in which data from different sources is pulled into a unified view). Data analysis involves cleaning, modeling, inspecting, and visualizing data.

The ultimate goal of data analysis is to provide useful data-driven insights for guiding organizational decisions. And without data analysis, you might as well not even collect data in the first place. Data analysis is the process of turning data into information, insight, or hopefully knowledge of a given domain.

Continue reading

Is RPA Tech Becoming Outdated? Process Bots vs Search Bots in 2020

The original robots who caught my attention had physical human characteristics, or at least a physically visible presence in three dimensions: C3PO and R2D2 form the perfect duo, one modeled to walk and talk like a bookish human, the other with metallic, baby-like cuteness and its own language.

Both were imagined, but still very tangible. And this imagery held staying power. This is how most of us still think about robots today. Look up the definition of robot and the following phrase surfaces: “a machine which resembles a human.” Only then comes a description of the types of actions robots actually undertake.

Most robots today aren’t in the places we’d think to look based on sci-fi stories or dictionary definitions. Most robots come in two types: they’re sidekicks for desktop and server activities at work, or robots that scour the internet to tag and index web content.

All in all, robots are typically still digital. Put another way, digital robots have come of age much faster than their mechanical cousins.

Continue reading

Diffbot State of Machine Learning Report – 2018

In what will likely be the first of many reports from the team here at Diffbot, we wanted to start with a topic near and dear to our (silicon) hearts: machine learning.

Using the Diffbot Knowledge Graph, and in only a matter of hours, we conducted the single largest survey of machine learning skills ever compiled in order to generate a clear, global picture of the machine learning workforce. All of the data contained here was pulled from our structured database of more than 1 trillion facts about 10 billion entities (and growing autonomously every day).

Of course, this is only scraping the surface of the data contained in our Knowledge Graph and, it’s worth noting, what you see below are not just numbers in a spreadsheet. Each of these data points represents an actual entity in our Knowledge Graph, with its own set of data attached and linked to thousands of other entities in the KG.

So, when we say there are 720,000+ people skilled in machine learning – each of those people has their own entry in the Knowledge Graph, rich with publicly available information about their education, location, public profiles, work history, and more.
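
For the curious, a hedged sketch of the kind of query behind these numbers, again assuming Diffbot’s DQL endpoint and a skills field as documented (verify before use):

```python
# Counting Person entities with a machine learning skill via Diffbot's DQL
# endpoint. Query syntax and field names are our best reading of the docs;
# verify before use.
import requests

resp = requests.get(
    "https://kg.diffbot.com/kg/v3/dql",
    params={
        "type": "query",
        "token": "YOUR_DIFFBOT_TOKEN",  # placeholder
        "query": 'type:Person skills.name:"Machine Learning"',
        "size": 1,  # we only need the total count, not the records
    },
    timeout=30,
)
print(resp.json().get("hits"))  # total matching people (field name per docs)
```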