Generating B2B Sales Leads With Diffbot’s Knowledge Graph

Lead generation is the single largest challenge for up to 85% of B2B marketers.

Simultaneously, marketing and sales dashboards are filled with ever more data. There are more ways to get in front of a potential lead than ever before. And nearly every org of interest has a digital footprint.

So what’s the deal? 🤔

Firmographic, demographic, and technographic data (the components of quality market segmentation) are spread across the web. And even once they’re pulled into our workflows they’re often siloed, still only semi-structured, or otherwise disconnected. Data brokers provide data that gets stale more quickly than quality curated web sources.

But the fact persists: all the lead generation data you typically need is spread across the public web.

You just need someone (or something 🤖) to find, read, and structure this data.


Read More

The 6 Biggest Difficulties With Data Cleaning (With Workarounds)

Data is the new soil.

David McCandless

If data is the new soil, then data cleaning is the act of tilling the field. It’s one of the least glamorous and (potentially) most time-consuming portions of the data science lifecycle. And without it, you don’t have a foundation from which solid insights can grow.

At its simplest, data cleaning revolves around two opposing needs:

  • The need to amend data points that will skew the quality of your results
  • The need to retain as much of your useful data as you can

These needs are often most strictly opposed when choosing to clean a data set by removing data points that are incorrect, corrupted, or otherwise unusable in their present format.

Perhaps the most important outcome of a data cleaning job is that the data is standardized, so that analytics and BI tools can easily access any value, present data in dashboards, or otherwise manipulate the data.
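To make the tension between those two needs concrete, here is a minimal Python sketch. The records, field names, and accepted date formats are all invented for illustration. It standardizes whatever can be repaired and drops a row only when its identifying field is missing:

```python
from datetime import datetime

# Hypothetical raw records; field names and values are invented for illustration.
RAW_ROWS = [
    {"company": " Acme Corp ", "employees": "1,200", "founded": "2004-03-01"},
    {"company": "globex", "employees": "n/a", "founded": "03/15/1999"},
    {"company": "", "employees": "85", "founded": "2011-07-20"},
]

# Assumed input date formats; a real data set may need others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return an ISO date string, or None when no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_int(text):
    """Return an int after stripping thousands separators, or None."""
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def clean(rows):
    """Amend what can be repaired; drop only rows missing the key field."""
    cleaned = []
    for row in rows:
        name = row["company"].strip()
        if not name:  # unusable without an identifier, so this row is dropped
            continue
        cleaned.append({
            "company": name.title(),
            "employees": parse_int(row["employees"]),  # None marks a missing value
            "founded": parse_date(row["founded"]),
        })
    return cleaned


if __name__ == "__main__":
    for row in clean(RAW_ROWS):
        print(row)
```

Keeping an unparseable value as `None` rather than deleting the whole row is one way to retain useful data while still flagging what a dashboard should not trust.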


Read More

From Knowledge Graphs to Knowledge Workflows

2020 Was The “Year of the Knowledge Graph”

2020 was undeniably the “Year of the Knowledge Graph.”

2020 was the year that Gartner put Knowledge Graphs at the peak of its hype cycle.

It was the year when 10% of the papers published at EMNLP referenced “knowledge” in their titles.

It was the year over 1000 engineers, enterprise users, and academics came together to talk about Knowledge Graphs at the 2nd Knowledge Graph Conference.

There are good reasons for this grass-roots trend. It isn’t any one company pushing it (ahem, I’m looking at you, Cognitive Computing), but rather a broad coalition of academics, industry vertical practitioners, and enterprise users who generally deal with building intelligent information systems.

Knowledge graphs represent the best of what we hope the “next step” of AI looks like: intelligent systems that aren’t black boxes, but are explainable, that are grounded in the same real-world entities as us humans, and that are able to exchange knowledge with us using precise common vocabularies. It’s no coincidence that in the same year that marked the peak of the deep learning revolution (2012), Google introduced the Google Knowledge Graph as a way to provide interpretability to its otherwise opaque search ranking algorithms.

The Risk Of Hype: Touted Benefits Don’t Materialize


Read More

Robotic Process Automation Extraction Is A Time Saver. But It’s Not Built For the Future

Enough individuals have heard the siren song of Robotic Process Automation to build several $1B companies. Even if you don’t know the “household names” in the space, something about the buzzword abbreviated as “RPA” leaves the impression that you need it. That it boosts productivity. That it enables “smart” processes. 

RPA saves millions of work hours, for sure. But how solid is the foundation for processes built using RPA tech? 



First off, RPA operates by literally moving pixels across the screen. Repetitive tasks are automated by saving “steps” with which someone would manipulate applications with their mouse, and then enacting these steps without human oversight. There are plenty of examples for situations in which this is handy. You need to move entries from a spreadsheet to a CRM. You need to move entries from a CRM to a CDP. You need to cut and paste thousands or millions of times between two windows in a browser. 

These are legitimate issues within back end business workflows. And RPA remedies these issues. But what happens when your software is updated? Or you need to connect two new programs? Or your ecosystem of tools changes completely? Or you just want to use your data differently? 

This hints at the first issue with the foundation on which RPA is built: RPA can’t operate in environments that it hasn’t seen (and received extensive documentation about).


Read More

The Ultimate Guide To Data Analysis

Data analysis comes at the tail end of the data lifecycle, directly after or simultaneously with data integration (in which data from different sources is pulled into a unified view). Data analysis involves cleaning, modelling, inspecting and visualizing data.

The ultimate goal of data analysis is to provide useful data-driven insights for guiding organizational decisions. And without data analysis, you might as well not even collect data in the first place. Data analysis is the process of turning data into information, insight, or hopefully knowledge of a given domain.

Read More

Is RPA Tech Becoming Outdated? Process Bots vs Search Bots in 2020

The original robots that caught my attention had physical human characteristics, or at least a physically visible presence in three dimensions: C3PO and R2D2 form the perfect duo, one modeled to walk and talk like a bookish human, the other with metallic, baby-like cuteness and its own language.

Both were imagined, but still very tangible. And this imagery held staying power. This is how most of us still think about robots today. Look up the definition of robot and the following phrase surfaces: “a machine which resembles a human.” Only after that phrase comes a description of the types of actions robots actually undertake.

Most robots today aren’t in the places we’d think to look based on sci-fi stories or dictionary definitions. Most robots come in two types: they’re sidekicks for desktop and server activities at work, or robots that scour the internet to tag and index web content.

All in all, robots are typically still digital. Put another way, digital robots have come of age much faster than their mechanical cousins.


Read More

Diffbot State of Machine Learning Report – 2018

In what will likely be the first of many reports from the team here at Diffbot, we wanted to start with a topic near and dear to our (silicon) hearts: machine learning.

Using the Diffbot Knowledge Graph, and in only a matter of hours, we conducted the single largest survey of machine learning skills ever compiled in order to generate a clear, global picture of the machine learning workforce. All of the data contained here was pulled from our structured database of more than 1 trillion facts about 10 billion entities (and growing autonomously every day).

Of course, this is only scraping the surface of the data contained in our Knowledge Graph and, it’s worth noting, what you see below are not just numbers in a spreadsheet. Each of these data points represents an actual entity in our Knowledge Graph, with its own set of data attached and linked to thousands of other entities in the KG.

So, when we say there are 720,000+ people skilled in machine learning – each of those people has their own entry in the Knowledge Graph, rich with publicly available information about their education, location, public profiles, work history, and more.

Read More

RIP: The Semantic Web

The Semantic Web has been a hotly debated topic for many years now.

The conversation has gained some momentum recently in how we frame issues like search, SEO, and linked data.

Semantic technologies have long been heralded as the best way to add linked data to your site.

But since the rise of AI, many are now asking, “Is the Semantic Web dead?”

In short, yes.

One article from Semantico even gave it a eulogy several years ago, indicating that it’s been in the process of dying for several years.

Of course, it’s not quite dead. Like a butterfly in a cocoon, it’s merely in the process of evolving into something better.

But why does this transition matter?

The Semantic Web was important to a lot of the ways we view data and handle data on our sites, especially in how they relate to search and SEO.

Without the Semantic Web, we wouldn’t have the Google we know today, for example.

But Google and other tech giants are now moving beyond semantic technology into the realm of AI and Machine Learning.

With that in mind, here’s what you should know about the “death” of the Semantic Web and what it means for you.


What Is the Semantic Web?

The Semantic Web was our first attempt at structuring and organizing the data on our websites so that search engines like Google could easily read it.

As W3C defines it, the Semantic Web “provides a common framework that allows data to be shared across application, enterprise and community boundaries.”

The idea was that if everyone’s data could be organized semantically – logically – search would be a cinch.

In terms of search, the Semantic Web would use data to create associations of known entities through the “structured data” within the page markup.

But the Semantic Web was a bit tedious. It required users to manually tag every web page in order to fit into its system.

Much of the information we get from the Internet today is delivered in the form of HTML documents linked to each other through hyperlinks (this is the linked data mentioned earlier).

If users failed to connect this data (tag it) properly, it would fail.

Machines, too, have a hard time extracting meaning from the links without proper structure.

Machines also have trouble understanding intent, which is the foundation of search.

Semantic Web technology was the first attempt to determine intent by creating a database of information that all linked (and related to) each other.

It was far from perfect, but it worked for a time.
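As a rough illustration of what a database of linked facts can look like, here is a small Python sketch that stores statements as (subject, predicate, object) triples and queries them by pattern. The predicate names and the `match` helper are made up for this example and are not taken from any Semantic Web standard or library:

```python
# Illustrative facts as (subject, predicate, object) triples.
# The predicate names are invented for this sketch.
TRIPLES = [
    ("Diffbot", "is_a", "Company"),
    ("Diffbot", "builds", "Knowledge Graph"),
    ("Knowledge Graph", "is_a", "Product"),
    ("Google", "is_a", "Company"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def companies(triples):
    """List every subject that is tagged as a Company, in sorted order."""
    return sorted(s for s, _, _ in match(triples, predicate="is_a", obj="Company"))


if __name__ == "__main__":
    print(companies(TRIPLES))
    print(match(TRIPLES, subject="Diffbot"))
```

Every fact here had to be typed in by hand, which is the same burden the manual tagging described above placed on site owners.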

Unfortunately, with the rise of Machine Learning, deep learning and other forms of AI, the Semantic Web has become much less capable by comparison.

The Role of Machine Learning and AI in Search

Semantic technology is transitioning to AI.

In his article, “The Semantic Web is Dead, Long Live the Semantic Web,” Denny Britz argues that the Semantic Web has been replaced by the “API economy.”

“APIs are proliferating,” he says.

He also notes that the biggest reason that the Semantic Web is failing where other, smarter technologies are succeeding is that semantic languages were hard to use.

“Semantic Web technologies were complex and opaque, made by academics for academics,” he adds. “[They were] not accessible to many developers, and not scalable to industrial workloads.”

Diffbot’s Knowledge Graph, for instance, can now extract meaningful information from the web with high levels of accuracy.

The graph uses a combination of Machine Learning and probabilistic techniques, combined with lots of data.

In essence, AI and Machine Learning are now capable of doing everything that the Semantic Web originally aspired to do.

And they’ve made the old ways somewhat irrelevant.

What This Means for Structured Data

So what does this all mean for you, the average web data user?

For one, it means that your Google search results are going to be much more accurate.

For another, it means that the way you structure your site’s data will significantly impact its rankings on Google and how well Google’s AI will be able to read that data.

Using Schema Markup – a type of semantic vocabulary – for example, will be important to SEO.
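For a sense of what Schema Markup looks like in practice, here is a minimal Python sketch that builds a schema.org `Organization` description as JSON-LD and wraps it in the script tag that a page would embed. The helper names and the example.com values are placeholders for illustration, not a recommended or complete markup strategy:

```python
import json


def organization_jsonld(name, url, same_as=()):
    """Build a schema.org Organization description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
    }
    if same_as:  # optional links to other profiles for the same organization
        data["sameAs"] = list(same_as)
    return json.dumps(data, indent=2)


def script_tag(jsonld):
    """Wrap JSON-LD in the script element used to embed it in a page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # Placeholder values; swap in a real organization's details.
    markup = organization_jsonld(
        "Example Co",
        "https://www.example.com",
        same_as=["https://www.example.org/example-co"],
    )
    print(script_tag(markup))
```

Generating the markup from data you already hold, rather than writing it by hand on each page, avoids the manual tagging that made earlier semantic approaches tedious.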

But it also means that you will need to use more powerful scraping tools if you want to collect data from other sources around the web.

In his nearly decade-old article, “5 Problems of the Semantic Web,” James Simmons describes one of the biggest issues with the Semantic Web as the lack of a bottom-up approach to web scraping.

He says that in the future “content scrapers of the Semantic Web and beyond will be equipped with the ability to read the content within Web documents and feeds.”

This technology, he adds, “does not yet fully exist.”

Except that now it does.

With AI and Machine Learning, scraping technologies have improved to be able to process natural language as well as read structured (and unstructured) data in a highly accessible way.

The programming languages we use now are able to cut through the complexities of web data so that any site – regardless of size or number of HTML documents – can use data to grow.

In other words, the death of the Semantic Web is a very, very good thing for business.



While the Semantic Web deserves a lot of praise for being the first of its kind in the world, there comes a time for every technology to evolve.

It might be easier to say that the Semantic Web is transitioning, rather than dying, but the reality is that AI and Machine Learning are outpacing it at a significant rate.

The way that new data technologies are growing is a sign of things to come.

But this is good news for sites that want to use data to outpace the competition. With AI and Machine Learning, it’s possible to gather data from any site at any time.

You don’t need some sort of “futuristic web scraper” because the technology already exists today.

You can get the data you need, the way you need it, from the sources you need, with minimal effort.

If the Semantic Web has to die for this to happen, it’s a death we won’t shed any tears over.

Read More

How the Role of Chief Data Officer Is Changing in 2017


How in-demand is the role of Chief Data Officer? According to recent data studies, very.

2017 is set to be one of the biggest years yet for data analytics, and that means the demand for CDOs is on the rise.

Because companies are using data across multiple silos in a variety of functions, data is now much more of a commodity than it was a mere decade ago, and the role of the CDO is changing along with it.

While some of the primary responsibilities of CDOs haven’t changed – tracking and interpreting data trends, for example – organizations are now implementing data-driven strategies like never before, which means that CDOs are no longer the sole gatekeepers for big data.

For those who have either been in the role for many years and are now wondering what lies ahead, or those just stepping into it, this means facing a bevy of new challenges in the world of data management.

Here’s what you should know…


Changes to the Role of CDO

According to Mark Gambill, CMO of MicroStrategy, the CDO was originally born “as an attempt to create a bridge between functional leaders who need information in real time and the IT department.”

He argues that in a perfect world, there wouldn’t be a need for the role at all, but because data is “challenging, frustrating, and expensive” organizations need someone who can dedicate their time to sift through the complexities.

As Omri Kohl, founder of Pyramid Analytics, notes, organizations will need to take advantage of this role in 2017 like never before. He believes that companies looking to gain a competitive edge will need to shift their views on how to use data as well as leverage the capabilities of CDOs.

“As data becomes more robust, organizations are realizing that deploying a business analytics platform is not a nice-to-have anymore but instead a must-have,” he says. “And creating one role responsible for the centralized ownership of the overall data strategy will be critical to an organization’s success.”

This will require CDOs to understand the full capabilities of data and develop strategic solutions to utilize throughout the entire organization.

Because CDOs work with data across multiple departments and divisions, they will need to deal with more than just numbers and figures – they will need to deal in strategy.

CDOs in 2017 will be change agents: knowing how things operate, where and why there’s resistance to change, and how to help people understand the applications of complex data to drive growth.

And as data continues to become more prevalent and complex in the years to come, the primary role of the CDO will be not only to help organizations understand data, but to implement it in new and creative ways.

Challenges Faced by Modern CDOs

With such a shift in responsibilities, there are certain challenges faced by the modern CDO compared to earlier counterparts.

CDOs will be tasked with communicating and stewarding data in ways previous generations never did. They will need to take advantage of existing data while using it to drive practical innovation, all while setting priorities for the use of data throughout the company.

Essentially, modern CDOs will be responsible for:

  • Establishing the organization’s data strategy – CDOs must lead the transformation to becoming a “data-driven organization” and ensure that data is being valued and understood properly
  • Integrating data across multiple silos – Because data will need to be organized throughout many departments, CDOs are responsible for making sure it integrates well and drives tangible results
  • Monetizing and creating value from data – CDOs will need to monetize data to drive marketing and sales funnels instead of simply analyzing and reporting on trends
  • Understanding data security and risks – CDOs will be responsible for protecting data and understanding any and all potential risks and threats

They will also need to be able to communicate the “what, why, and how” of data to both the leadership and the technical members of their organizations. This means working closely with upper management and not just IT teams.

But how does a CDO do that, exactly?

How to Overcome Data Challenges in 2017

Tony Fross, VP of digital advisory services at Capgemini Consulting, believes that CDOs will need to position themselves as authorities in their respective companies more than ever before.

“CDOs need to be chiefs, and not buried four layers under the CEO. They have to have ownership of an enterprise strategy with broad horizontal input. Otherwise, they’re not truly CDOs.”

According to Fross, CDOs will need to create clear objectives and incentives for companies that still don’t value data, and they will need to promote the capabilities and potential of data for companies that already value it.

Part of that will be using fact-based evidence to support the usage of data alongside a level of emotional resonance to help communicate their message.

According to Gartner’s 2016 report, there is currently a lack of meaningful metrics to measure the effectiveness of the CDO, but CDOs can overcome this by delivering clear value and by positioning themselves as data authorities.

This means CDOs will need to:

  • Connect departments across the organization – Expanding the use of data beyond just “technical” departments is essential
  • Develop a data roadmap – Data strategies will need to capture the right data for use in areas like sales, marketing, and other customer-related channels
  • Turn data into action – Data should be harnessed in real-time so that organizations can run with it
  • Anticipate data needs and attack challenges head on – CDOs will need to take on leadership roles, not merely analytical ones

In order to face the many challenges of modern data, CDOs will need to be flexible enough to work with both technical and non-technical teams to develop data strategies that offer practical value.


Final Thoughts

In order for CDOs to be effective in helping to create data-driven organizations, they will need to not only monitor data, but also help leaders understand its value and apply it to multiple processes throughout the company.

This means creating strategic roadmaps to guide upper management and working closely with C-Level leadership to understand how data can be used to drive growth. This also means working with technical (and non-technical) departments to implement data in creative ways.

But most importantly, this means dedicating the time and energy to understanding the true value of data, communicating that value to others, and turning numbers into practical and measurable results.


Read More