Articles by: Jerome Choo

Growth at Diffbot

If an API is free, you’re the marketing

Back in 2017, we wrote about why websites don’t have APIs. Mainly, that they’re expensive to maintain and open a company up to all sorts of data security liabilities, especially if the API itself isn’t the core product, as is the case with Twitter and Reddit.

So why then do APIs even exist at all? Like the old adage about free products — if an API is free, you’re the marketing.

You receive free access to a data platform; they gain a channel for people to discover their product.

(Paid APIs are a different case. Unlike free APIs, there is a direct value exchange that should theoretically lead to a viable business model.)

For many high growth tech startups, this is a compelling strategy. Notion released their free API less than a year after raising a massive $275M round in 2021. Slack continues to release developer friendly updates to their free API as marketing for their ever growing integration directory, a core feature of their messaging product.

Let’s be clear — there’s nothing inherently wrong with free-because-marketing APIs. After all, you’re probably using these APIs for similar non-altruistic reasons. The promise of free instant distribution to a huge audience or access to on-demand compute is a tempting draw.

But common power dynamics suggest that you’re eventually likely to be at the losing end of this relationship the moment their marketing spend >> marketing results.

How do you know if building on a free-because-marketing API is the right move for you? There are some obvious considerations, like measuring engineering lift against outcomes and preparing fallback options. Feel free to google that if you want to read AI generated SEO garbage.

Personally, I’d like to focus on one of the most overlooked considerations when deciding to build on an API.

Identifying APIs with a reason to stick around

In my opinion, there are only 3 types of APIs with a reason to stick around.

  1. Paid APIs, like Stripe, where the API itself is the product you’re paying for
  2. Product Experience APIs, like Notion and Slack, where your integration directly improves the core product experience
  3. Non-Profit APIs, like, where access is intentionally provided for the public good.

Salesforce, the #1 API on Postman, is a great example of a product experience API that encourages 3rd party developers to improve the product experience with data extensions and workflows. They might have started out as “the cloud CRM” back in the day, but their value proposition today is more accurately summed up as “the CRM connected to everything.” They have a reason to stick around.

Be careful with some product experience APIs sharing generous data access. If your integration does not directly improve upon their core product experience, you might be left in the lurch the moment they stop feeling so generous. (RIP Apollo)

In a 2014 conference presentation, Netflix revealed that despite enabling public access to their catalog API, 11 years of public API requests amounted to just one day of private (internal) API requests. While presented as an engineering prioritization problem, the simple truth is that there is no justification for a marketing channel whose impact is a mere drop in the bucket. They stopped issuing new API keys and shut down the API entirely by the end of 2014.

A free, for-profit API had no reason to stick around. (Though it did improve the product experience)

You should feel fairly confident building on APIs that pass this “reason to stick around” sniff test. Watch out for those pesky acquisition shutdowns though. WeatherKit is just not the same.

Finally, allow me to plug Diffbot. We’re profitable, and have been around for 12 years now as a dependable platform of paid web data APIs powering everything between market intelligence apps like AlphaSense and consumer apps like Readwise. We have a reason to stick around.

By the way, we also give free access to students, which we reserve the right to pull the plug on if marketing spend >> marketing results. I’m dead serious. Like the title of this post, students are very much the marketing. That’s why we have paid plans for anyone commercially serious about using our APIs.

It’s a great deal for one-off student projects, and we get to share cool projects built with Diffbot (like this one by Julien!). If you make something cool you have my promise we’ll keep your token active at least through your job interviews. We’ll make something work, and not in a u/spez kind of way.

A Prompt Template For Structured News Summarization

In the 2002 movie The Time Machine, Dr. Alexander Hartdegen, played by Guy Pearce, invents a time machine and travels forward in time to 2030. He stops by the New York Public Library and meets Vox-114, a holographic library assistant who is “connected to every database on the planet”. Vox-114 retrieves and summarizes facts conversationally, with a simple wave of a holographic hand. He even insists that time travel is not possible!

Well, we’re finally here with 7 years to spare. ChatGPT does it all, including the same reference Vox made to fiction when asked about time travel, but decidedly less tongue in cheek.

Except that ChatGPT is not “a compendium of all human knowledge”. It’s a language model trained on gargantuan stores of human knowledge to predict next word associations with human-like conversational precision.

Let’s try a contrived example using GPT-3.5

Oho! That actually looks pretty good. In fact, all of these headlines are real events, but the dates are garbage. 1 of them is correct, 3 off by a year, and 3 off by some months. Here’s how it breaks down (sources linked) —

  • In January 2015, Twitter introduced “While you were away” feature which shows users the most popular tweets they might have missed.
  • In April 2015 (actually January/February 2015), Twitter announced its acquisition of live-streaming app Periscope.
  • In June 2015 (actually July 2016), Twitter launched its first advertising campaign called “See What’s Happening” to promote the platform.
  • In August 2015 (actually May 2016), Twitter made changes to its 140-character limit, allowing users to include images, GIFs, videos, and polls without affecting the character count. (Twitter did make changes on the character limit in August 2015, but it was specifically to DMs.)
  • In October 2015 (actually October 2016), Twitter announced the shutdown of Vine, its short-form video platform.
  • In November 2015 (actually October 2015), Twitter rolled out a new feature called Moments, which provides a curated collection of tweets on a specific topic.
  • In December 2015 (actually October 2015), Twitter CEO Jack Dorsey announced that the company would be laying off 8% of its workforce in an effort to cut costs.

I don’t think Dr. Hartdegen would be impressed. But this isn’t breaking news; many of us are well aware by now that ChatGPT makes stuff up and isn’t knowledgeable of events beyond 2021.

In this prompt study, I constrain GPT’s fact recall to a trusted news graph, and take advantage of its language transformation capabilities to cluster and generate top line summaries of similar events. Response output is also formatted in JSON, making it easy to plug into data pipelines.

The technique can be applied to both point-in-time media research and real-time monitoring. I will demonstrate how to do both.

ChatGPT talks to a knowledge graph

A common explanation for ChatGPT’s lack of current event knowledge is that it hasn’t been trained on recent news.

While technically true, further training only reinforces word patterns in the model, which would carry the same limitations (lack of provenance and inaccurate dates) seen in the Twitter example above.

Instead, we will supply recent knowledge in the prompt, which also enables GPT to understand and act on it structurally (e.g. citing from the corpus).

Let’s try to figure out what happened at Twitter in 2015 again. This time, we will provide a sample of 50 headlines mentioning Twitter in 2015, sourced from the Diffbot Knowledge Graph. Here is the DQL used to pull this sample:

type:Article title:'Twitter''Business' tags.{label:'Twitter' score>0.95} language:'en' date>="2015-01-01" date<="2015-12-31" sortBy:date

We’ll want to format the response as CSV and request the date and headline fields. Plug the CSV results of this query into a prompt template as follows:

The following is a list of headlines related to Twitter each with a date attached. Generate a list of the top 5 things that happened at Twitter based on these headlines alone. Use the following format for each item on the list:

On <March 11, 2015>, <summary of what happened>.

When reviewing these headlines, ignore stories, gossip, editorials, opinions, politics, or any headlines not related to a decision or action made by Twitter the company. Focus only on headlines that could exist on a Twitter press release. Do not hallucinate.

Order the list by earliest to latest.

2015-03-11	Twitter updates its rules to specifically ban ‘revenge porn’
2015-01-07	The Story of Twitter's Fail Whale
2015-11-23	Bezos tweets! Twitter feud with Warren Buffett next?
2015-06-11	Twitter's Dick Costolo (briefly) got richer by quitting
2015-10-04	Twitter names Jack Dorsey as CEO
2015-06-06	Here's an Android app that gives people in censored countries access to Twitter
2015-11-02	Twitter ditches stars and favorites for hearts and likes
2015-10-05	Twitter Names Co-Founder Jack Dorsey CEO
2015-10-13	Why Twitter Is Laying Off 8 Percent of Its Employees
2015-03-26	Twitter's Periscope Live Streaming App Makes Everyone a Reality Star
2015-12-21	How Jack Dorsey Runs Both Twitter, Square
2015-07-26	When will Twitter name a new CEO?
2015-09-15	Twitter Courts U.S. Presidential Campaigns With New Donations Service
2015-11-03	Inside Twitter's big diversity problem
2015-06-11	Twitter (TWTR) CEO Dick Costolo Stepping Down
2015-07-21	Twitter throws frat-themed party in midst of discrimination suit
2015-06-22	Twitter Says Its New Chief Must Work Full Time
2015-12-17	Twitter blows up over Martin Shkreli's arrest
2015-08-09	#Touchdown! NFL partners with Twitter
2015-09-02	Twitter could name its new CEO today
2015-07-11	Twitter Accidentally Made Scott Walker a Presidential Candidate Ahead of Schedule
2015-10-13	Twitter just hired Google's $130 million man
2015-10-26	Twitter still hasn't found its groove - stock tanks
2015-10-06	Saudi prince now owns 5% of Twitter
2015-07-27	Conan O'Brien accused of stealing jokes from Twitter
2015-10-05	Jack Dorsey Will Return As Twitter CEO
2015-08-19	#EpicFail: Twitter falls below $26 IPO price
2015-07-13	Twitter shares soar on phony Bloomberg story
2015-03-09	Twitter Acquires Live-Video Streaming Startup Periscope
2015-01-26	Twitter Chat on the Internet of Things
2015-03-12	Twitter bans 'revenge porn'
2015-06-03	Big Twitter investor Chris Sacca explains what the company needs to do next
2015-06-01	A Twitter bot has spent the entire day scolding people who are talking about Caitlyn Jenner
2015-02-05	Twitter’s battle against abuse has high stakes for the company
2015-01-15	New York Post and UPI Twitter accounts hacked
2015-06-11	For Twitter, Future Means Here and Now
2015-07-26	Kim Kardashian tweets and Twitter's boss listens
2015-04-28	Twitter Shares Tumble After Nasdaq Leaks Results Early
2015-09-18	Twitter Helps Coke With Ad ‘Emoji’
2015-10-12	Twitter cuts 8% of its workforce
2015-03-01	ISIS supporters threaten Twitter founder Jack Dorsey
2015-12-15	Twitter is too cool for school. It must go mainstream
2015-07-31	James Woods Sues Twitter Troll for $10 Million Over ‘Cocaine Addict’ Tweet
2015-10-14	The 129 finance people you have to follow on Twitter
2015-04-02	Chelsea Manning joins Twitter and gets over 1,000 followers before posting
2015-10-12	Did the NFL kill Twitter accounts for Deadspin, SB Nation?
2015-04-28	Will Dick Costolo's Twitter Keep Disappointing Investors?
2015-06-10	Twitter CEO Dick Costolo quits
2015-02-17	Boss Talk: Twitter Safety Officer Talks Building Trust

This is GPT-3.5’s response:

Based on the provided headlines, here are the top 5 things that happened at Twitter:
1. On March 11, 2015, Twitter updated its rules to specifically ban 'revenge porn.'
2. On June 11, 2015, Twitter CEO Dick Costolo announced his resignation.
3. On October 4, 2015, Twitter named Jack Dorsey as CEO.
4. On October 5, 2015, it was officially confirmed that Jack Dorsey would be the full-time CEO of Twitter.
5. On November 2, 2015, Twitter replaced stars and favorites with hearts and likes.

Note: The list is ordered chronologically based on the headlines.

A great start. It was at this point where I proceeded to spend two frustrating weeks getting ChatGPT to actually cite from my provided sources, a surprisingly complicated task.

I’ll spare you the bitter details of dead ends and square ones. Instead, here is what I learned:

  • Instruction order matters. If you are providing context, state this before dropping in the context. Always end with instructions.
  • ChatGPT will error with too much text in the prompt. A rookie mistake. Don’t stuff your prompt with thousands of headlines. More below.
  • ChatGPT will ignore your instructions if there is too much context. Even if your prompt is within the token limit, providing too much context in your prompt will cause ChatGPT to ignore your instructions. I’ve found the sweet spot to be roughly 60 headlines.
  • Lose the URLs. Indexes or IDs work better. URLs in your context dump will just take up precious token space. ID your sources to save on tokens.
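
As a rough illustration of these lessons, here is a minimal Python sketch that assembles a prompt in that order: a line announcing the context, then the headlines as CSV with short IDs instead of URLs, then the instructions last. The function name `build_prompt` and the article dict keys (`title`, `id`, `date`) are assumptions made for this example, not part of any Diffbot or OpenAI API, and the cap of 60 simply mirrors the sweet spot mentioned above.

```python
import csv
import io

MAX_HEADLINES = 60  # rough sweet spot reported in the list above


def build_prompt(company, articles, max_headlines=MAX_HEADLINES):
    """Assemble a prompt: context label first, CSV context next, instructions last.

    `articles` is a list of dicts with hypothetical keys
    "title", "id" and "date" (epoch milliseconds).
    """
    rows = io.StringIO()
    writer = csv.writer(rows, lineterminator="\n")
    for article in articles[:max_headlines]:
        # Short IDs instead of URLs keep the context small.
        writer.writerow([article["title"], article["id"], article["date"]])

    intro = (
        "The following is the CSV output of a search for articles "
        f"on the company {company}:"
    )
    instructions = (
        f"Based on the above headlines, summarize 1-4 major events that happened at {company} "
        "in a JSON array of events with each event having a key name, date "
        "(in its original epoch time date format), and a JSON array of citationIds "
        "to the relevant articles. Cluster similar headlines to the same event, "
        "even if they may not occur on the same date. If they do not have the same date, "
        "use the earliest known date. Summaries should include a verb. "
        "Only return the JSON, no further text or explanation, do not use markdown."
    )
    # Context goes in the middle; the instructions always come last.
    return f"{intro}\n\n{rows.getvalue()}\n{instructions}"
```

The `csv` writer quotes any headline that contains a comma, so a row never gains an extra column by accident.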

Here is the prompt that finally got ChatGPT to cite from my provided headlines:

The following is the CSV output of a search for articles on the company FTX:

FTX collapse: Black Swan really happen!,ART195719715414,1668614400000
"FTX did not have permission to take on Dutch clients, says central bank",ART6137214573,1668614760000
<...and 58 more headlines>

Based on the above headlines, summarize 1-4 major events that happened at FTX in a JSON array of events with each event having a key name, date (in its original epoch time date format), and a JSON array of citationIds to the relevant articles. Cluster similar headlines to the same event, even if they may not occur on the same date. If they do not have the same date, use the earliest known date. Ignore headlines about FTX's stock price, stock performance, dividends, or market performance. Also ignore headlines about gossip, social commentaries, thoughts, potentials, explorations, hearsay, maybes, and opinions. Summaries should include a verb. Only return the JSON, no further text or explanation, do not use markdown.

With this prompt, ChatGPT returns the following response:

    [
      {
        "name": "FTX Collapse",
        "date": 1668614400000,
        "citationIds": ["ART195719715414"]
      },
      {
        "name": "FTX Did Not Have Permission to Serve Dutch Clients",
        "date": 1668614760000,
        "citationIds": ["ART6137214573"]
      },
      {
        "name": "Celebrities Named in FTX Suit",
        "date": 1668615900000,
        "citationIds": ["ART257680608131"]
      },
      {
        "name": "FTX's Bankruptcy Explained",
        "date": 1668616440000,
        "citationIds": ["ART226107828117"]
      },
      {
        "name": "Billionaire Mike Novogratz Warns of More Failures",
        "date": 1668617820000,
        "citationIds": ["ART149839769712"]
      },
      {
        "name": "Coinbase CFO Anticipates Contagion Impact of FTX Collapse",
        "date": 1668620400000,
        "citationIds": ["ART127028717456", "ART246501061069"]
      },
      {
        "name": "FTX Investors Sue Bankman-Fried Over Yield-Bearing Crypto Accounts",
        "date": 1668621600000,
        "citationIds": ["ART237167380194"]
      },
      {
        "name": "Relief Fund Established for Partners Affected by FTX Collapse",
        "date": 1668623761000,
        "citationIds": ["ART76130626106", "ART211198078567"]
      },
      {
        "name": "FTX Management and Bahamas in Legal Battle Over Bankruptcy Control",
        "date": 1668628260000,
        "citationIds": ["ART112350911287"]
      },
      {
        "name": "Former SEC Official Doubts FTX Crash Will Lead to Crypto Regulations",
        "date": 1668624600000,
        "citationIds": ["ART35623621602"]
      }
    ]

This prompt transforms 60 raw headlines to 10 top line summaries citing real news article sources without hallucinations. It also formats the response into a JSON array! We finally have our working proof of concept.
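
If you plug a response like this into a pipeline, it is worth checking the citations before trusting them. The sketch below is one possible way to do that in Python, not the method used in the repo: it parses the JSON array, converts the epoch-millisecond dates to ISO dates, and separates cited IDs that were in the prompt from any that were not. The function name `parse_events` and the `unknownIds` key are made up for this example.

```python
import json
from datetime import datetime, timezone


def parse_events(response_text, known_ids):
    """Parse the model's JSON array and flag citations that were never supplied.

    `known_ids` is the set of article IDs that were included in the prompt.
    """
    events = json.loads(response_text)
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")

    checked = []
    for event in events:
        cited = event.get("citationIds", [])
        # Epoch milliseconds -> ISO date (UTC) for readability.
        day = datetime.fromtimestamp(event["date"] / 1000, tz=timezone.utc).date()
        checked.append({
            "name": event["name"],
            "date": day.isoformat(),
            "citationIds": [cid for cid in cited if cid in known_ids],
            # Any ID that was not in the prompt context is treated as made up.
            "unknownIds": [cid for cid in cited if cid not in known_ids],
        })
    return checked
```

An event whose `unknownIds` list is non-empty is a good candidate for a retry or a manual look.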

This is our boilerplate prompt. From here, we can also filter the output to specific types of events.

Traditionally this would mean setting up complex layers of string filters and even classification models to identify different types of articles. With GPT, we can simply tune the instruction with plain English to highlight only the events we care about.

Let’s take it back to Twitter and focus on top lines from the last week.

First, tweak the DQL to query for headlines in the news within the last 7 days.

type:Article title:'Twitter''Business' tags.{label:'Twitter' score>0.95} language:'en' date<=7d sortBy:date

Then we’ll adjust the instruction of the prompt to summarize only top-line, revenue-related headlines on Twitter. Here is the new instruction snippet and corresponding response:

Based on the above headlines, summarize any major events that happened at Twitter in an array of JSONs with each JSON item having a key name, date (in its original epoch time date format), and a JSON array of citationIds to the relevant articles. Cluster similar headlines to the same event, even if they may not occur on the same date. If they do not have the same date, use the earliest known date. 

Only summarize headlines directly related to revenue. If there are no headlines related to revenue, return an empty array. Summaries should include a verb. Only return the JSON, no further text or explanation, do not use markdown.
    "name": "Twitter's ad revenue plummets",
    "date": 1685990580000,
    "citationIds": [
    "name": "Twitter bans over 25 lakh accounts in India",
    "date": 1685685720000,
    "citationIds": [
    "name": "Twitter Value Collapses After Fidelity Writedown",
    "date": 1685634000000,
    "citationIds": [

How about something trickier? Like politics?

Based on the above headlines, summarize any major events that happened at Twitter in an array of JSONs with each JSON item having a key name, date (in its original epoch time date format), and a JSON array of citationIds to the relevant articles. Cluster similar headlines to the same event, even if they may not occur on the same date. If they do not have the same date, use the earliest known date. 

Only summarize headlines directly related to politics. If there are no headlines related to politics, return an empty array. Summaries should include a verb. Only return the JSON, no further text or explanation, do not use markdown.
    "name": "Twitter's Disagreement with EU on Disinformation Code",
    "date": 1686061842000,
    "citationIds": [
    "name": "Senators Warning Twitter and Elon Musk over Data Privacy",
    "date": 1685951340000,
    "citationIds": [

One more go, this time we’ll give it a curveball.

Based on the above headlines, summarize any major events that happened at Twitter in an array of JSONs with each JSON item having a key name, date (in its original epoch time date format), and a JSON array of citationIds to the relevant articles. Cluster similar headlines to the same event, even if they may not occur on the same date. If they do not have the same date, use the earliest known date. 

Only summarize headlines directly related to sushi. If there are no headlines related to sushi, return an empty array. Summaries should include a verb. Only return the JSON, no further text or explanation, do not use markdown.

Being explicit is important here. ChatGPT loves to please. Without the option to return an empty array, ChatGPT becomes highly unpredictable. In my testing, I got it to hallucinate sushi related headlines while simultaneously informing me that there are no sushi related headlines.
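
Since only the topic changes between these three prompts, the instruction can be treated as a template. Here is a small Python sketch under that assumption; `topic_instruction` and `events_or_empty` are hypothetical helper names, and the template text is copied from the instructions above. The key design choice is that an empty array is accepted as a valid answer rather than an error.

```python
import json

INSTRUCTION_TEMPLATE = (
    "Based on the above headlines, summarize any major events that happened at {company} "
    "in an array of JSONs with each JSON item having a key name, date "
    "(in its original epoch time date format), and a JSON array of citationIds "
    "to the relevant articles. Cluster similar headlines to the same event, "
    "even if they may not occur on the same date. If they do not have the same date, "
    "use the earliest known date.\n\n"
    "Only summarize headlines directly related to {topic}. "
    "If there are no headlines related to {topic}, return an empty array. "
    "Summaries should include a verb. Only return the JSON, no further text "
    "or explanation, do not use markdown."
)


def topic_instruction(company, topic):
    """Fill in the company and topic; the explicit empty-array escape hatch stays in."""
    return INSTRUCTION_TEMPLATE.format(company=company, topic=topic)


def events_or_empty(response_text):
    """Parse the response; [] means 'nothing on this topic', not a failure."""
    events = json.loads(response_text.strip())
    if not isinstance(events, list):
        raise ValueError("expected a JSON array")
    return events
```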

How do I use this?

Check out the GitHub repo for some code examples in Python that can be easily translated into a news monitoring workflow.

Not a developer? Stay tuned for a feature I’m building in LeadGraph that uses this technique to summarize and highlight the latest headlines from your target accounts.

Bonus: Examining the rise and fall of FTX

If we can reliably summarize the top lines from a blob of 60 headlines, what would it look like if we ran this prompt across all known articles on a company like FTX?

I hoped to generate something close to the timeline walls you see in history museums.

And boy did I.

The script takes a single input – the name of an organization – and summarizes the major events within blobs of headlines. Here is the high-level order of operations:

  1. Enhance the org name with Diffbot KG to obtain a foundingDate
  2. Use the foundingDate as a start date in our Diffbot News Graph article query mentioning the company (60 at a time)
  3. Plug the 60 headlines into a request to the chat completions OpenAI endpoint using the gpt-3.5-turbo model
  4. Write GPT’s JSON response into a jsonl file
  5. Loop steps 2-4 until there are no articles left

The same GitHub repo includes a Python script to reproduce this yourself. You will need an OpenAI API token as well as a Diffbot token. A warning: processing 60 headlines at a time takes a while, but the results are stunning.
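
The script in the repo is the reference. For orientation only, the loop described in the steps above might be sketched roughly like this. The two callables are placeholders I am assuming for this example: `fetch_batch` would wrap the Diffbot article query and `summarize` would wrap the chat completions request, and neither is a real library function.

```python
import json


def summarize_timeline(fetch_batch, summarize, out_path, batch_size=60):
    """Page through headlines `batch_size` at a time and write events to a JSONL file.

    fetch_batch(offset, size) -> list of articles (hypothetical wrapper around the article query)
    summarize(articles)       -> list of event dicts (hypothetical wrapper around the GPT call)
    Returns the number of events written.
    """
    offset = 0
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        while True:
            articles = fetch_batch(offset, batch_size)
            if not articles:
                break  # no articles left, so the loop ends
            for event in summarize(articles):
                # One JSON object per line keeps the file appendable and streamable.
                out.write(json.dumps(event) + "\n")
                written += 1
            offset += len(articles)
    return written
```

Passing the two steps in as functions keeps the loop testable without network access or API tokens.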

Automatically Enrich Your Pipeline Without Code

Congratulations! You’ve reached the problem past you decided would be a “good” problem to have for future you — managing the flood of thousands of inbound leads.

What started as a trickle of leads is now a new full-time job of googling, validating, scoring, and assigning each lead to the right account rep. It’s hardly sustainable, and you’re finally putting your foot down to do something about it.

There are options of course. Everything from your $20,000+ auto-renewing annual contracts with big sales intelligence data platforms promising to solve all your data woes to hiring an intern. None of which will solve your problem today, and as it turns out, you’ll probably still end up doing the majority of the work (I know I did…).

Thankfully these days, no-code platforms like Databar exist to solve this. Databar’s no-code API connector allows you to automate your revenue operating system without touching a single line of code. And with Diffbot’s latest partnership with Databar, you can now enrich your thousands of inbound leads with facts from Diffbot’s Knowledge Graph in just a few clicks.

Screenshot of Diffbot’s integration on Databar

What can I do with Databar & Diffbot?

Starting today, Databar will offer the following enrichments

Upload a CSV into Databar’s familiar spreadsheet interface and follow the prompts to deploy any of these enrichments.

Do I need a Diffbot account to use Databar?

Nope! If all you need automated are the enrichments listed above, a Diffbot account is not required to enhance your leads. However, if you wish to dive deeper into all the possible ways to enhance your database of accounts and contacts directly, contact us at

How do I get started?

Sign up for Databar here for free.

Every Company That Sells Organization Data is Biased

Yes, even the biggest leaders in market intelligence. Even us.

Some focus solely on startups. Some only on venture-backed companies. But you probably wouldn’t even know. Because most won’t (or can’t) tell you what their data is biased towards! 🤭

“We have over 10M companies in our database!” is a meaningless statement if you can’t tell whether the data is a representative sample of Indian restaurants in the world, or perhaps more realistically, what they just happened to scrape.

Unless we’re talking at least 200M+ unique organizations strong, you’re looking at a biased dataset. And that’s still a conservative minimum.

This is common knowledge for data buyers, who make up for the lack of a known bias by evaluating datasets for known, easily verifiable data, like the Fortune 1000.

Given enough evaluation feedback cycles, most organization data brokers end up biased towards the Fortune 1000.

If your target is enterprise b2b, you’re in luck. You can find that data anywhere. Just check your spam folder.

If it’s anything even remotely more niched, like rubber gasket manufacturers or global non-profits focused on relieving poverty, you’re probably scraping this data yourself off a conference site.

And if your market intelligence application needs the closest thing to a truly representative sample of global organizations, it might seem impossible.

For data brokers, it just doesn’t make any sense to boil the ocean. It’s cheaper and easier to focus data entry resources on a few markets and whatever coverage gap feedback they get from lost deals.

Even if they did manage to compile all the companies on Earth, they would have to do it over and over again to keep their records fresh.

It’s an absurd and impractical human labor cost to maintain. So no one employs hundreds of people just to enter org data. Not even us.

We employ machines instead, which crawl millions of publicly accessible websites, interpret raw text into data autonomously, and structure each detail into facts on every organization known to the public web.

Which, as it turns out, is our known bias.

Download This Dataset of 12,118 Yahoo Answers for $1

With only 2 weeks left till May 4th (be with you), the internet is bursting with excitement over all the work that needs to be done before Yahoo Answers finally 404s.

From scheduling a 2nd COVID vaccine to your annual panic attack at missing the tax filing deadline (you probably didn’t, it was extended to May 17 in the U.S.), there is nothing short of a lengthy agenda for everyone ahead of the shutdown of this iconic website.


How to Estimate the Size of a Market with the Diffbot Knowledge Graph

Organizations are one of our most popular standard entities in the Diffbot Knowledge Graph, for good reason. Behind 200M+ company data profiles is an architecture that enables incredibly precise search and summarization, allowing anyone to estimate the size of a market and forecast business opportunity in any niche.


Step 1 – Find Companies Like X

In a perfect world, every market and industry on the planet is neatly organized into well defined categories. In practice, this gets close, but not close enough, especially for niche markets.

What we’ll need instead is a combination of traits, including industry classifiers, keywords, and other characteristics that define companies in a market.

This is much easier to define by starting with companies we know that fit the bill. Think of it as searching for “companies like X”.

Box of Panettone cake

As an example, let’s start with finding companies like Bauducco, producer of this lovely Panettone cake. This is a market we’re hoping to sell, say, a commercial cake baking oven to.

The closest definition of a market I might imagine for them is something like “packaged foods”. We could google this term and get some really generic hits for “food and beverage companies”, or we can do better.

We’ll start by looking this company up on Diffbot’s Knowledge Graph with a query like this:

type:Organization homepageUri:””

Next, click through the most relevant result to a company profile.

Now let’s gather everything on this page that describes a company like Bauducco.

Diffbot company profile page for Bauducco

Under the company summary, the closest descriptor to their signature Panettone is “cakes”. Note that.

Under industries, they might be involved in agriculture to some degree, but we’re not really looking for other companies that are involved in agriculture. “Food and Drink Companies” will do!

That’s it.

Now that we have these traits, let’s construct a search query with DQL:

type:Organization industries:"Food and Drink Companies" description:or("cakes", "cake")

Diffbot search results - 47,000 companies like Bauducco

Nearly 48,000 results! That’s a huge list of potential customers. Like the original google search, it’s a bit too generic to work with. Unlike results from Google though, we can segment this down as much as we’d like with just a few more parameters.

💡 Pro Tip: To see a full list of available traits to construct your query with, go to

Step 2 – Remove Irrelevant Traits

What I’m first noticing is that there are a lot of international brands on this list. I’m interested in selling to companies like Bauducco in the U.S., so let’s trim this list to just companies with a presence in the United States.

type:Organization industries:"Food and Drink Companies" description:or("cakes", "cake")"United States"

Diffbot search results - companies like Bauducco in the U.S.

Note that there are two “location” attributes. A singular and a plural version. The plural version (“locations”) will match all known locations of a company. The singular version (“location”) will only match the known headquarters of a company.

Down to 8800 results. Much better. We’re not really interested in ice cream companies in this market either (after all, we’re selling a baking oven), so we’ll use the not() operator to filter ice cream companies out.

type:Organization industries:"Food and Drink Companies" description:or("cakes", "cake") not(description:"ice cream")"United States"

Let’s also say our oven is really only practical for large operations of at least 100 employees. We’ll add a minimum employee threshold to our query.

type:Organization industries:"Food and Drink Companies" description:or("cakes", "cake") not(description:"ice cream")"United States" nbEmployeesMin>=100

262 results. Now we’re really getting somewhere. Let’s stop here to calculate our total addressable market.
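
If you find yourself rebuilding these queries often, the clauses can be composed programmatically. Below is a minimal Python sketch that only assembles the query string from the clauses shown in this walkthrough; `build_org_query` is a name invented for this example, it does not call any API, and the location filter is deliberately left out.

```python
def build_org_query(industry, keywords, exclude=None, min_employees=None):
    """Compose a DQL organization query string from the clauses used above."""
    quoted = ", ".join(f'"{keyword}"' for keyword in keywords)
    clauses = [
        "type:Organization",
        f'industries:"{industry}"',
        f"description:or({quoted})",
    ]
    # Each excluded term becomes its own not() clause on the description.
    for term in exclude or []:
        clauses.append(f'not(description:"{term}")')
    if min_employees is not None:
        clauses.append(f"nbEmployeesMin>={min_employees}")
    return " ".join(clauses)


query = build_org_query(
    "Food and Drink Companies",
    ["cakes", "cake"],
    exclude=["ice cream"],
    min_employees=100,
)
print(query)
```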

Step 3 – Calculate Total Addressable Market

To calculate TAM, we simply multiply the number of potential customers by the annual contract value of each customer.

TAM = Number of Potential Customers x Annual Contract Value

At a $1M average contract value with 262 potential customers, our TAM is approximately $262M.
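
The same arithmetic as a tiny Python sketch, in case you want to rerun it as the result count or contract value changes. The function name is made up for this example, and the inputs are the two figures from the paragraph above.

```python
def total_addressable_market(num_customers, annual_contract_value):
    """TAM = number of potential customers x annual contract value."""
    return num_customers * annual_contract_value


tam = total_addressable_market(262, 1_000_000)
print(f"TAM: ${tam / 1_000_000:,.0f}M")  # prints "TAM: $262M"
```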

This is just a starting point, of course. We’ll want to assess existing competition, pricing sensitivity, and how much of the existing market would be willing to switch for our unique value proposition. We’ll leave that for another day.


Try replicating these steps for a market of your choosing. The ability to filter and summarize practically any field in the ontology provides limitless potential for market and competitive intelligence.

Need some inspiration? Here’re some additional examples:

How We Increased Our Lead Contact Rate by 46% with Diffbot Enhance

Hi! This is Jerome from Diffbot. You might’ve seen us around before. We’re known for our automatic extraction APIs, and our knowledge graph of the public web. Today, I’d like to introduce you to Diffbot Enhance, lead enrichment anywhere you need it.

Lead enrichment doesn’t get enough credit

When I first saw it in action, it looked like a gimmick - just fields populated in a CRM sold with shockingly pricey annual contracts up-sold alongside Salesforce.

Like keeping your personal address book up to date. Helpful? Sure. Necessary? Not really.

Sales always insists it’s helpful though. I didn’t get it.

Fast forward a few years, and we noticed one day that 62% of our inbound leads never made it to a demo call. 62%! These are people who chose to ignore the self-start trial option, filled out a 6 field form, passed a captcha, and clicked a button that literally says request a demo.

Screenshot of sign up modal on Diffbot's homepage
