Data meshes, fabrics, and knowledge graphs are all positioned as frameworks through which similar benefits are realized.
All three promote interoperability and ease the integration of new data sources. To varying degrees, all three support real-time and event-driven data ingestion and processing. All three seek to avoid flat data output, data that needs additional processing time once it has been extracted, and orphaned data that becomes progressively stale. Additionally, with their focus on a growing number of disparate data sources, robust data governance and semantic enrichment are at the forefront of each of these systems.
With that said, there are differences between data mesh, fabric, and knowledge graphs.
What Is A Data Fabric?
Data fabric is an architecture-centered design concept governing data access across many decentralized data sources. The data fabric approach arose in response to the costly, slow, and low-value data integration cycles common to centralized data lakes and warehouses. Data fabric systems aspire to promote connectivity across disparate data sources as well as reusability, avoiding issues such as orphaned data or the large volumes of extraneous data that tend to accumulate in centralized data stores.
A focus on value-added data integration is central to the notion of a data fabric. Systems for semantic enrichment, linked data, and the harmonization of unstructured, semi-structured, and structured data are key to successful data fabric delivery. The creation of these systems is not decentralized: in a data fabric, data access is centralized and held under a single point of control.
A data fabric exposes data via objective-centered APIs. For example, if a user needs to build a dashboard comparing competitors' hiring trends with news monitoring around noteworthy market events, a data fabric approach would first ingest these disparate data sources, add context or additional fields to the data, and then expose the data as an API for the dashboard.
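That ingest-enrich-expose flow can be sketched in a few lines of Python. This is a minimal illustration, not a real fabric implementation: the source records, field names, and the `fabric_api` function are all hypothetical stand-ins for the hiring and news sources mentioned above.

```python
# Hypothetical records from two disparate sources.
hiring_posts = [
    {"company": "Acme Corp", "role": "ML Engineer", "posted": "2023-05-01"},
]
news_items = [
    {"company": "Acme Corp", "headline": "Acme raises Series B", "date": "2023-05-02"},
]

def enrich(record, source):
    # Semantic enrichment step: tag each record with its provenance and a
    # normalized company key so records from different sources can be linked.
    out = dict(record)
    out["source"] = source
    out["company_key"] = record["company"].lower().replace(" ", "_")
    return out

def fabric_api(company_key):
    # The "objective-centered API": one call returns linked records from
    # every underlying source, without the caller touching them directly.
    unified = [enrich(r, "hiring") for r in hiring_posts] + \
              [enrich(r, "news") for r in news_items]
    return [r for r in unified if r["company_key"] == company_key]

records = fabric_api("acme_corp")
```

A dashboard would then consume `fabric_api` alone, never the underlying sources, which is the single point of access the fabric model calls for.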
What Is A Data Mesh?
First and foremost, data mesh is an organization-centered approach to data management. A data management system built on data mesh principles enables users to access and query data from a variety of sources without first ingesting this data into a centralized warehouse. While architecture design is part of a data mesh, it is not as central to the characterization of a data mesh as it is to a data fabric.
From an organizational perspective, data mesh views each edge data source as a product owned by a business unit in charge of that domain. In relation to these decentralized data stores, data mesh serves as a connectivity layer that is built such that both technical and non-technical users can utilize data sets where they reside.
Processing data closer to the source – without transferring and ingesting it into a central repository – can lower processing costs, decrease time-to-analysis, and avoid privacy issues around data transferred between particular geographies.
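The domain-as-product idea can be sketched as a thin connectivity layer that routes queries to whichever business unit owns the data, leaving the data where it lives. The `DomainProduct` class, domain names, and records below are illustrative assumptions, not a prescribed mesh design.

```python
class DomainProduct:
    """A data product owned and served by a single business unit."""

    def __init__(self, owner, records):
        self.owner = owner          # business unit in charge of this domain
        self._records = records     # data stays where it lives

    def query(self, **filters):
        # Answer queries locally instead of shipping data to a warehouse.
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]

# The mesh: a registry of decentralized domain products.
mesh = {
    "sales":  DomainProduct("Sales Ops",  [{"region": "EMEA", "deals": 12}]),
    "hiring": DomainProduct("People Ops", [{"region": "EMEA", "openings": 4}]),
}

def mesh_query(domain, **filters):
    # Connectivity layer: dispatch the query to the owning domain in place.
    return mesh[domain].query(**filters)

emea_deals = mesh_query("sales", region="EMEA")
```

Note that `mesh_query` never copies records into a central store; each domain team remains free to change how its product is stored, so long as the query interface holds.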
What Is A Knowledge Graph?
Unlike data meshes and fabrics, a knowledge graph is neither a connectivity-layer-centric solution nor a data management imperative.
Knowledge graphs are graph databases that are built to preserve information and context. In particular, knowledge graphs are built around nodes (entities) and edges (relationships). Though data can be output in a format similar to a relational database, knowledge graphs provide better performance when traversing linked data and are much more adept at adding new fact types and data source formats “on the fly.”
This makes knowledge graphs a natural choice for high velocity and variable type data like those used in news or market monitoring. Data is linked and often augmented with additional semantic features upon ingestion in knowledge graphs, aligning with the objectives of data fabrics. For example, within Diffbot’s Knowledge Graph we have organization entities for which we can infer detailed industry fields, machine learning-computed estimated revenue, as well as similarity scores between organizations.
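The node-and-edge structure described above can be sketched with plain Python dictionaries, with no graph database required for the idea. The entities and relationship types here are illustrative only, not a real knowledge graph schema.

```python
# Nodes are entities; edges are (subject, relationship, object) facts.
nodes = {
    "acme": {"type": "Organization", "name": "Acme Corp"},
    "jane": {"type": "Person", "name": "Jane Doe"},
    "sf":   {"type": "City", "name": "San Francisco"},
}
edges = [
    ("jane", "works_at", "acme"),
    ("acme", "headquartered_in", "sf"),
]

def neighbors(node_id, relation=None):
    # Traverse outgoing edges from one entity, optionally filtered
    # by relationship type.
    return [t for (s, r, t) in edges
            if s == node_id and (relation is None or r == relation)]

# Adding a new fact type "on the fly" is just appending an edge --
# no schema migration needed.
nodes["series_b"] = {"type": "FundingRound", "name": "Series B"}
edges.append(("acme", "raised_round", "series_b"))

hq = neighbors("acme", "headquartered_in")
```

The contrast with a relational store is visible in the last two lines: introducing the `raised_round` fact type required no table changes, only a new edge.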
Use of knowledge organization systems (KOSs) aligns with data fabric and mesh goals to add semantics to variable incoming data streams and promote linked data. KOSs commonly utilized in Knowledge Graph construction include:
- Glossaries/synonym rings: properly merge facts attached to entities mentioned in multiple ways
- Unique identifiers: disambiguate entities with the same name (Apple Inc vs. Apple the fruit)
- Taxonomies: classify new entities in relation to old entities allowing for additional inferences (California is a state in the United States, therefore San Francisco is in the United States)
- Associative clustering: track loose relationships and similarities between entities (Pho is often associated with Vietnamese restaurants; machine learning engineers often work at AI startups)
- Ontologies: rules, properties, constraints to entities and relationships (only organizations have funding rounds)
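The taxonomy bullet above can be made concrete with a short sketch: walking containment links transitively lets a system conclude that San Francisco is in the United States without that fact ever being stated directly. The `located_in` mapping and helper function are illustrative assumptions.

```python
# Hypothetical taxonomy of containment relationships.
located_in = {
    "San Francisco": "California",
    "California": "United States",
}

def transitive_locations(place):
    # Follow located_in links until we run out, collecting every
    # ancestor region along the way.
    chain = []
    while place in located_in:
        place = located_in[place]
        chain.append(place)
    return chain

ancestors = transitive_locations("San Francisco")
```

The inferred fact ("San Francisco is in the United States") was never asserted; it falls out of the taxonomy's structure, which is the kind of additional inference the list above describes.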
Also like data fabrics, knowledge graphs typically offer a single centralized point of data access via an API or integrations.
As the provider of the world’s largest commercially-available Knowledge Graph, Diffbot has seen many successful use cases for Knowledge Graph data. These uses include:
- Market monitoring: tracking of firmographic changes and key events
- Product intelligence: building knowledge graphs of related products
- News monitoring: tracking key events and relationships in the news
- Machine learning: readily available labeled data with context leads to quick workflows and explainability
- Sales development: ability to filter through detailed firmographics and person records
- Hiring and investing: track attrition, skill sets, and meaningful organizational events
- Data enrichment: easily digestible structured and linked data with expanding field types
- Product Recommendations: serve up recommendations based on associated behaviors and products
- Discussion tracking: velocity, sentiment, and influencer tracking
- Fake news detection: the ability to corroborate facts across millions of articles and train models to predict accuracy of statements
- Fraud detection: the ability to visualize and track complex relationships between regulatory bodies, private organizations, and key individuals
- Supply chain / risk: the ability to visualize and track partnerships, key events, suppliers, vendors, locations, and hiring trends
Of course, many of the use cases above can also be supported by data fabrics and meshes. But where meshes and fabrics describe an entire ecosystem of data use and structure across an organization, knowledge graphs excel at augmenting other data stores and at supporting specific tasks.
Is It Really About All Three?
There are pros and cons to using any of the three knowledge management frameworks listed above. And it’s often not an either/or choice. Data fabrics benefit from a single point of connectivity that can serve up standardized and semantically-enriched data from disparate internal and external sources. A data mesh may be suitable for portions of an organization where agility is more heavily prized. A data source of record can then be supplied for integration and release from a central point (data fabric) for other teams.
Additionally, data held in knowledge graphs may make sense for certain use cases within an organization utilizing a data fabric and/or mesh. A focus on interoperability and easy integration makes knowledge graph data great for augmentation and enrichment of data sets in other formats. A focus on providing context for information supports explainability, making knowledge graph data a preferred choice for machine learning and data science-centered initiatives within an organization.
Care to learn more about the world’s largest commercially-available Knowledge Graph? Reach out to our sales team today.