Generating B2B Sales Leads With Diffbot’s Knowledge Graph

Lead generation is the single largest challenge for up to 85% of B2B marketers.

Simultaneously, marketing and sales dashboards are filled with ever more data. There are more ways to get in front of a potential lead than ever before. And nearly every org of interest has a digital footprint.

So what’s the deal? 🤔

Firmographic, demographic, and technographic data (the components of quality market segmentation) are spread across the web. Even once they’re pulled into our workflows, they’re often siloed, still only semi-structured, or otherwise disconnected. And data brokers provide data that goes stale more quickly than quality curated web sources.

But the fact remains: all the lead generation data you typically need is spread across the public web.

You just need someone (or something 🤖) to find, read, and structure this data.

Is Christian Bale a Christian? Is Mitt Romney a glove?

Download This Dataset of 12,118 Yahoo Answers for $1

With only 2 weeks left until May 4th (be with you), the internet is bursting with excitement over all the work that needs to be done before Yahoo Answers finally 404s.

From scheduling a 2nd COVID vaccine to your annual panic attack at missing the tax filing deadline (you probably didn’t; it was extended to May 17 in the U.S.), everyone has a lengthy agenda ahead of the shutdown of this iconic website.

These Are The Hardest Page Types To Scrape — With Workarounds For Each

Phrases like “the web is held together by [insert ad hoc, totally precarious binding agent]” have been around for a while for a reason.

While the services we rely on tend to sport hugely impressive availability, all things considered, that still doesn’t negate the fact that the macro web is a tangled mess of semi-structured or unstructured data and site-by-site nuances.

Put this together with the fact that the web is by far our largest source of valuable external data, and you have a task as high reward as it is error prone. That task is web scraping.

As one of three western entities to crawl and structure a vast majority of the web, we’ve learned a thing or two about where web crawling can go wrong, and we’ve incorporated many solutions into our rule-less Automatic Extraction APIs and Crawlbot.

In this guide we round up some of the most common challenges for teams or individuals trying to harvest data from the public web. And we provide a workaround for each. Want to see what rule-less extraction looks like for your site of interest? Check out our extraction test drive!

How Employbl Saved 250 Hours Building Their Career-Matching Database

We started with about 1,000 companies in the Employbl database, mostly in the Bay Area. Now with Diffbot we can expand to other cities and add thousands of additional companies. 

Connor Leech – CEO @Employbl

Fixing tech starts with hiring. And fixing hiring is an information problem. That’s what Connor Leech, cofounder and CEO at Employbl, discovered when creating a new talent marketplace meant to connect tech employees with the information-rich hiring experience they deserve.

Tech job seekers rely on a range of metrics to gauge the opportunity and stability of a potential employer.

While information like funding rounds, founders, team size, industry, and investors is often public, it can be hard to gather the myriad fields candidates value, in an up-to-date format, from around the web.

These difficulties are amplified by the fact that many tech startups are “long tail” entities that change regularly.

From Knowledge Graphs to Knowledge Workflows

2020 Was The “Year of the Knowledge Graph”

2020 was undeniably the “Year of the Knowledge Graph.”

2020 was the year that Gartner put Knowledge Graphs at the peak of its hype cycle.

It was the year in which 10% of the papers published at EMNLP referenced “knowledge” in their titles.

It was the year in which over 1,000 engineers, enterprise users, and academics came together to talk about Knowledge Graphs at the 2nd Knowledge Graph Conference.

There are good reasons for this grassroots trend. It isn’t any one company pushing it (ahem, I’m looking at you, Cognitive Computing), but rather a broad coalition of academics, industry vertical practitioners, and enterprise users who generally deal with building intelligent information systems.

Knowledge graphs represent the best of what we hope the “next step” of AI looks like: intelligent systems that aren’t black boxes but are explainable, that are grounded in the same real-world entities as us humans, and that are able to exchange knowledge with us using precise, common vocabularies. It’s no coincidence that in the same year that marked the peak of the deep learning revolution (2012), Google introduced the Google Knowledge Graph as a way to provide interpretability to its otherwise opaque search ranking algorithms.

The Risk Of Hype: Touted Benefits Don’t Materialize

Robotic Process Automation Extraction Is A Time Saver. But It’s Not Built For the Future

Enough individuals have heard the siren song of Robotic Process Automation to build several $1B companies. Even if you don’t know the “household names” in the space, something about the buzzword abbreviated as “RPA” leaves the impression that you need it. That it boosts productivity. That it enables “smart” processes. 

RPA saves millions of work hours, for sure. But how solid is the foundation for processes built using RPA tech? 

First off, RPA operates by literally moving pixels across the screen. Repetitive tasks are automated by recording the “steps” a person would take to manipulate applications with their mouse, and then replaying these steps without human oversight. There are plenty of situations in which this is handy. You need to move entries from a spreadsheet to a CRM. You need to move entries from a CRM to a CDP. You need to cut and paste thousands or millions of times between two windows in a browser.

These are legitimate issues within back end business workflows. And RPA remedies these issues. But what happens when your software is updated? Or you need to connect two new programs? Or your ecosystem of tools changes completely? Or you just want to use your data differently? 

This hints at the first issue with the foundation on which RPA is built: RPA can’t operate in environments it hasn’t seen (and received extensive documentation about).

The Ultimate Guide To Data Analysis


Data analysis comes at the tail end of the data lifecycle, performed directly after or simultaneously with data integration (in which data from different sources is pulled into a unified view). Data analysis involves cleaning, modelling, inspecting, and visualizing data.

The ultimate goal of data analysis is to provide useful data-driven insights for guiding organizational decisions. And without data analysis, you might as well not even collect data in the first place. Data analysis is the process of turning data into information, insight, or hopefully knowledge of a given domain.

Converting text documents into knowledge graphs with the Diffbot Natural Language API

Most of the world’s knowledge is encoded in natural language (e.g., news articles, books, emails, academic papers). It is estimated that 80 percent of business-relevant information originates in unstructured form, primarily text. However, the ambiguous nature of human communication makes it difficult for software engineers and data scientists to leverage this information in their applications.

After years of research, we are proud to announce the Diffbot Natural Language API, a new product to help businesses convert their text documents into knowledge graphs. Knowledge graphs represent information about real-world entities (e.g., people, organizations, products, articles) via their relationships with other entities (e.g., founded by, educated at, was mentioned in). This is the same production-grade technology that we use to build the world’s largest knowledge graph from the web, and we are making it available to all.
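To make the “entities connected by relationships” idea concrete, here is a minimal, hypothetical Python sketch of a knowledge graph stored as (subject, relationship, object) triples. It is not the Diffbot Natural Language API or its output format; the `KnowledgeGraph` class and the entities “Acme Corp”, “Jane Doe”, and “Example University” are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge in the graph: (subject entity, relationship, object entity)."""
    subject: str
    relation: str
    obj: str


class KnowledgeGraph:
    """A toy in-memory knowledge graph, indexed in both edge directions."""

    def __init__(self) -> None:
        self.outgoing = defaultdict(list)  # subject -> triples leaving it
        self.incoming = defaultdict(list)  # object  -> triples pointing at it

    def add(self, subject: str, relation: str, obj: str) -> None:
        t = Triple(subject, relation, obj)
        self.outgoing[subject].append(t)
        self.incoming[obj].append(t)

    def objects(self, subject: str, relation: str) -> list:
        """Entities reachable from `subject` via `relation`."""
        return [t.obj for t in self.outgoing.get(subject, []) if t.relation == relation]

    def subjects(self, relation: str, obj: str) -> list:
        """Reverse lookup: entities that point at `obj` via `relation`."""
        return [t.subject for t in self.incoming.get(obj, []) if t.relation == relation]


# Hypothetical facts, as they might be extracted from a news article.
kg = KnowledgeGraph()
kg.add("Acme Corp", "founded by", "Jane Doe")
kg.add("Jane Doe", "educated at", "Example University")
kg.add("Acme Corp", "mentioned in", "Article: Acme launches a widget")

print(kg.objects("Acme Corp", "founded by"))  # ['Jane Doe']
print(kg.subjects("founded by", "Jane Doe"))  # ['Acme Corp']
# Two hops: where was Acme's founder educated?
print([school for f in kg.objects("Acme Corp", "founded by")
       for school in kg.objects(f, "educated at")])  # ['Example University']
```

Indexing edges in both directions is what lets a graph answer “who founded Acme?” and “what did Jane Doe found?” equally cheaply, and chaining lookups gives multi-hop questions that are awkward to ask of unstructured text.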

Is RPA Tech Becoming Outdated? Process Bots vs Search Bots in 2020

The original robots who caught my attention had physical human characteristics, or at least a physically visible presence in three dimensions: C-3PO and R2-D2 form the perfect duo, one modeled to walk and talk like a bookish human, the other with metallic, baby-like cuteness and its own language.

Both were imagined, but still very tangible. And this imagery has had staying power: it’s how most of us still think about robots today. Look up the definition of “robot” and the first phrase to surface is “a machine which resembles a human.” Only after that comes a description of the types of actions robots actually undertake.

Most robots today aren’t in the places we’d think to look based on sci-fi stories or dictionary definitions. Most come in one of two types: sidekicks for desktop and server activities at work, or bots that scour the internet to tag and index web content.

All in all, robots are still typically digital. Put another way, digital robots have come of age much faster than their mechanical cousins.

Stories By DQL: Tracking the Sentiment of a City


The story: the sentiment of news mentions of Gaza fluctuates by as much as 2000% a week. 90% of news mentions about Minneapolis have had negative sentiment through the first week in June 2020 (they’re typically about 50% negative). Positive-sentiment news mentions about New York City have steadily increased week by week through the pandemic.

Locations are important. They help form our identities. They bring us together or apart. Governance organizations, journalists, and scholars routinely need to track how one location perceives another. From threat detection to product launches, news monitoring in Diffbot’s Knowledge Graph makes it easy to take a truly global news feed and dissect how entities are being talked about.

In this Story by DQL, discover ways to query millions of articles that feature location data (towns, cities, regions, nations).

How we got there: One of the most valuable aspects of Diffbot’s Knowledge Graph is the ability to utilize the relationships between different entity types. You can look for news mentions (article entities) related to people, products, brands, and more. You can look for what skills (skill or people entities) are held by which companies. You can look for discussions on specific products.
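The querying itself happens in DQL against Diffbot’s Knowledge Graph, and we don’t reproduce DQL syntax here. As a rough illustration of the aggregation behind figures like “90% negative” or a weekly swing of 2000%, here is a hypothetical Python sketch: follow the article-to-location relationship, bucket matching articles by ISO week, then compute each week’s negative share and the week-over-week percent change. The `Article` record, its fields, the `sentiment < 0` threshold, and the sample data are all assumptions, not Diffbot’s schema or real results.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Article:
    """Hypothetical, simplified article record (not Diffbot's schema)."""
    published: date
    locations: List[str]   # location entities tagged in the article
    sentiment: float       # assumed scale: < 0 negative, > 0 positive


def weekly_negative_share(articles: List[Article], location: str) -> Dict[Tuple[int, int], float]:
    """Fraction of a location's mentions in each ISO (year, week) that are negative."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    negatives: Dict[Tuple[int, int], int] = defaultdict(int)
    for a in articles:
        if location not in a.locations:
            continue  # keep only articles related to this location entity
        year, week, _ = a.published.isocalendar()
        totals[(year, week)] += 1
        if a.sentiment < 0:
            negatives[(year, week)] += 1
    return {wk: negatives[wk] / totals[wk] for wk in sorted(totals)}


def week_over_week_change(values: List[float]) -> List[Optional[float]]:
    """Percent change between consecutive weeks; None where the prior week is 0."""
    return [None if prev == 0 else (cur - prev) / prev * 100
            for prev, cur in zip(values, values[1:])]


# Made-up sample data for illustration only.
articles = [
    Article(date(2020, 6, 1), ["Minneapolis"], -0.8),
    Article(date(2020, 6, 2), ["Minneapolis", "New York City"], -0.4),
    Article(date(2020, 6, 3), ["Minneapolis"], 0.3),
    Article(date(2020, 6, 9), ["Minneapolis"], -0.2),
]
print(weekly_negative_share(articles, "Minneapolis"))
# {(2020, 23): 0.6666666666666666, (2020, 24): 1.0}
print(week_over_week_change([0.5, 1.0, 0.25]))  # [100.0, -75.0]
```

Returning `None` for a zero baseline keeps each output aligned with its week pair, since percent change from zero is undefined.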