Every Company That Sells Organization Data is Biased

Yes, even the biggest leaders in market intelligence. Even us.

Some focus solely on startups. Some only on venture-backed companies. But you probably wouldn’t even know. Because most won’t (or can’t) tell you what their data is biased towards! 🤭

“We have over 10M companies in our database!” is a meaningless statement if you can’t tell whether the data is a representative sample of Indian restaurants in the world, or perhaps more realistically, what they just happened to scrape.

Unless we’re talking at least 200M+ unique organizations strong, you’re looking at a biased dataset. And that’s still a conservative minimum.

This is common knowledge for data buyers, who make up for the lack of a known bias by evaluating datasets for known, easily verifiable data, like the Fortune 1000.

Given enough evaluation feedback cycles, most organization data brokers end up biased towards the Fortune 1000.

If your target is enterprise b2b, you’re in luck. You can find that data anywhere. Just check your spam folder.

If it’s anything even remotely more niched, like rubber gasket manufacturers or global non-profits focused on relieving poverty, you’re probably scraping this data yourself off a conference site.

And if your market intelligence application needs the closest thing to a truly representative sample of global organizations, it might seem impossible.

For data brokers, it just doesn’t make any sense to boil the ocean. It’s cheaper and easier to focus data entry resources on a few markets and whatever coverage gap feedback they get from lost deals.

Even if they did manage to compile all the companies on Earth, they would have to do it over and over again to keep their records fresh.

It’s an absurd and impractical human labor cost to maintain. So no one employs hundreds of people just to enter org data. Not even us.

We employ machines instead, which crawl millions of publicly accessible websites, interpret raw text into data autonomously, and structure each detail into facts on every organization known to the public web.

Which, as it turns out, is our known bias.

Read More

Is Christian Bale a Christian? Is Mitt Romney a glove?

Download This Dataset of 12,118 Yahoo Answers for $1

With only 2 weeks left till May 4th (be with you), the internet is bursting with excitement over all the work that needs to be done before Yahoo Answers finally 404s.

From scheduling a 2nd COVID vaccine to your annual panic attack at missing the tax filing deadline (you probably didn’t, it was extended to May 17 in the U.S.), there is nothing short of a lengthy agenda for everyone ahead of the shutdown of this iconic website.

(more…)

Read More

How to Estimate the Size of a Market with the Diffbot Knowledge Graph

Organizations are one of our most popular standard entities in the Diffbot Knowledge Graph, for good reason. Behind 200M+ company data profiles is an architecture that enables incredibly precise search and summarization, allowing anyone to estimate the size of a market and forecast business opportunity in any niche.

 

Pre-Requisites

 

Step 1 – Find Companies Like X

In a perfect world, every market and industry on the planet is neatly organized into well defined categories. In practice, this gets close, but not close enough, especially for niche markets.

What we’ll need instead is a combination of traits, including industry classifiers, keywords, and other characteristics that define companies in a market.

This is much easier to define by starting with companies we know that fit the bill. Think of it as searching for “companies like X”.

 
Box of Panettone cake

 

As an example, let’s start with finding companies like Bauducco, producer of this lovely Panettone cake. This is a market we’re hoping to sell say, a commercial cake baking oven to.

The closest definition of a market I might imagine for them is something like “packaged foods”. We could google this term and get some really generic hits for “food and beverage companies”, or we can do better.

We’ll start by looking this company up on Diffbot’s Knowledge Graph with a query like this

 
View In Knowledge Graphtype:Organization homepageUri:”bauducco.com”
 
Next, click through the most relevant result to a company profile.

Now let’s gather everything on this page that describes a company like Bauducco.

 
Diffbot company profile page for Bauducco

 

Under the company summary, the closest descriptor to their signature Panettone is “cakes”. Note that.

Under industries, they might be involved in agriculture to some degree, but we’re not really looking for other companies that are involved in agriculture. “Food and Drink Companies” will do!

That’s it.

Now that we have these traits, let’s construct a search query with DQL:

 
View In Knowledge Graphtype:Organization industries:"Food and Drink Companies" description:or("cakes", "cake")

Diffbot search results - 47,000 companies like Bauducco

 

Nearly 48,000 results! That’s a huge list of potential customers. Like the original google search, it’s a bit too generic to work with. Unlike results from Google though, we can segment this down as much as we’d like with just a few more parameters.

💡 Pro Tip: To see a full list of available traits to construct your query with, go to enhance.diffbot.com/ontology.

 

Step 2 – Remove Irrelevant Traits

What I’m first noticing is that there are a lot of international brands on this list. I’m interested in selling to companies like Bauducco in the U.S., so let’s trim this list to just companies with a presence in the United States.

 
View In Knowledge Graphtype:Organization industries:"Food and Drink Companies" description:or("cakes", "cake") locations.country.name:"United States"

Diffbot search results - companies like Bauducco in the U.S.
 

Note that there are two “location” attributes. A singular and a plural version. The plural version (“locations”) will match all known locations of a company. The singular version (“location”) will only match the known headquarters of a company.

Down to 8800 results. Much better. We’re not really interested in ice cream companies in this market either (after all, we’re selling a baking oven), so we’ll use the not() operator to filter ice cream companies out.

 
View In Knowledge Graphtype:Organization industries:"Food and Drink Companies" description:or("cakes", "cake") not(description:"ice cream") locations.country.name:"United States"
 

Let’s also say our oven is really only practical for large operations of at least 100 employees. We’ll add a minimum employee threshold to our query.

 
View In Knowledge Graphtype:Organization industries:"Food and Drink Companies" description:or("cakes", "cake") not(description:"ice cream") locations.country.name:"United States" nbEmployeesMin>=100


 

262 results. Now we’re really getting somewhere. Let’s stop here to calculate our total addressable market.

 

Step 4 – Calculate Total Addressable Market

To calculate TAM, we simply multiply the number of potential customers by the annual contract value of each customer.

TAM = Number of Potential Customers x Annual Contract Value

At a $1M average contract value with 262 potential customers, our TAM is approximately $262M.

This is just a starting point of course, we’ll want to assess existing competition, pricing sensitivity, as well as how much of the existing market would be willing to switch for our unique value proposition. We’ll leave that for another day.

 

Takeaways

Try replicating these steps for a market of your choosing. The ability to filter and summarize practically any field in the ontology provides limitless potential for market and competitive intelligence.

Need some inspiration? Here’re some additional examples:

Read More

How We Increased Our Lead Contact Rate by 46% with Diffbot Enhance

Hi! This is Jerome from Diffbot. You might’ve seen us around before. We’re known for our automatic extraction APIs, and our knowledge graph of the public web. Today, I’d like to introduce you to Diffbot Enhance, lead enrichment anywhere you need it.

Lead enrichment doesn’t get enough credit

When I first saw it in action, it looked like a gimmick - just fields populated in a CRM sold with shockingly pricey annual contracts up-sold alongside Salesforce.

Like keeping your personal address book up to date. Helpful? Sure. Necessary? Not really.

Sales always insists it’s helpful though. I didn’t get it.

Fast forward a few years, we noticed one day that 62% of our inbound leads never make it to a demo call. 62%! These are people who choose to ignore the self-start trial option, fill out a 6 field form, pass a captcha, and click a button that literally says request a demo.

Screenshot of sign up modal on Diffbot's homepage

(more…)

Read More