Analyze Your Total Addressable Market (TAM) With Diffbot’s Knowledge Graph

Total addressable market (TAM) is the (hopefully) large figure that represents potential revenue for a given product or service. These figures are useful for fundraising, assessing market saturation, and prioritizing opportunities.

In our recently published guide to writing a market intelligence report with the Knowledge Graph, we worked through creating a report for a fictitious company, Acme Energy. Acme Energy provides backup energy services and energy disruption mitigation for hospitals. In this guide we’ll work through finding and visualizing three useful TAM-related datasets with Diffbot’s Knowledge Graph.

In particular, we’ll look at how you can quickly surface the datasets needed for three visualizations: TAM by state, competitors by state, and TAM relative to competitors by state.

To follow along, you’ll need:

  • Access to Diffbot’s Knowledge Graph (find a free two week trial here)
  • Google Sheets (or equivalent spreadsheet software)
  • I’ll use Infogram to visualize the data. Feel free to use any charting tool with mapping capabilities.

Step One: Define Service Set

There are three ways to calculate TAM. One of the most straightforward (if you have existing products or services) is as follows:

  • (# of potential customers) x (annual contract value)

In our case let’s look at a hypothetical in which Acme Energy sells two service sets.

  • $5,000 ACV deals to hospitals with fewer than 500 employees
  • $100,000 ACV deals to hospitals with more than 500 employees

Because we have two distinct sets of customers here, we’ll need to calculate both TAMs separately and add them together. In particular, we’ll need to calculate the following:

  • (# of hospitals with fewer than 500 employees) x $5,000
  • (# of hospitals with more than 500 employees) x $100,000

In the next step we’ll find our figures for the first portion of these formulas.

Step Two: Calculate Total Addressable Market

In Diffbot’s Knowledge Graph we can query for organizations based on specific firmographics. Both industries and number of employees are attached to organizations, which makes it easy to return the number of hospitals needed for our calculation. Below I’ll show two routes to obtaining your data. The first will utilize the visual query builder, which allows you to craft basic search queries in a beginner-friendly way. The second involves using Diffbot Query Language (DQL), which is slightly more involved, but allows for greater control over your query. New to using DQL? Start by simply pasting in the queries typed out below or check out our DQL Quick Start guide.

Using the Visual Query Builder

We can form an initial hospital query using a few fields: industries, nbEmployees, and location. Start by choosing the type of entity you want returned (organization). Then simply toggle the location to United States, the industry name to hospitals, and the nbEmployees to <=500.

One quick query returns over 100,000 results! To obtain the second group of hospitals (with greater than 500 employees), simply alter the nbEmployees field. Also of note to the right of the screen is the preview of your query. This shows you the DQL version of your query and is a great way to start familiarizing yourself with what this query language looks like.

Using Diffbot Query Language

While this visual query is a great starting point, this particular data set could use some more work. As I looked through the returned organizations I saw some veterinary hospitals, optometric clinics, and home health businesses returned. While these may in some senses be “hospitals,” they aren’t what we’re looking for here. This is an instance in which DQL comes in handy.

The query I eventually settled on excludes organizations in industries that are sometimes related to hospitals, and requires that “hospital” appear in the name of the organization returned. This seemed to provide the most reliable dataset.

type:Organization"United States" industries:"Hospitals" not(industries:or("optometrists","home health care","physiotherapy organization", "financial services companies")) name:"Hospital" nbEmployees>=500

This query returns 1,244 results, the number of large hospitals for one half of our TAM equation. By changing the nbEmployees to nbEmployees<=500 we can find our other number. Plugged into the equation this means that our TAM is as follows.

  • (1,244 x $100,000) + (11,151 x $5,000) = $180,155,000
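The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the segment names and the `total_addressable_market` helper are made up for this example, while the counts and contract values are the ones quoted above.

```python
# Each segment pairs a customer count with an annual contract value (ACV).
# Counts come from the two queries above; ACVs are Acme Energy's two service sets.
segments = [
    {"name": "large hospitals", "customers": 1_244, "acv": 100_000},
    {"name": "small hospitals", "customers": 11_151, "acv": 5_000},
]


def total_addressable_market(segments):
    """Sum (# of potential customers) x (annual contract value) over all segments."""
    return sum(s["customers"] * s["acv"] for s in segments)


tam = total_addressable_market(segments)
print(f"${tam:,}")  # $180,155,000
```

Keeping each segment as its own record makes it easy to add a third service set later without touching the formula.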

While we could export all of this data, using DQL enables facet queries, which are a useful way to quickly summarize the results of a specific field. In this case we can use this to return a summary of which states provide the most TAM.

type:Organization"United States" industries:"Hospitals" not(industries:or("optometrists","home health care","physiotherapy organization", "financial services companies")) name:"Hospital" nbEmployees<=500

To obtain the complete dataset we'll yet again need to alter the nbEmployees field and then download the results. I ended up pulling both datasets into the same spreadsheet to perform the simple TAM arithmetic for all states at once.

After converting the number of large and small hospitals per state into state-by-state TAM, we can analyze the data as we wish. In my case I pulled the numbers into a data visualization tool to see which regions have the largest opportunities.
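If you'd rather script this step than use a spreadsheet, the conversion might look something like the sketch below. The per-state counts are invented placeholders (not Knowledge Graph results), and the dictionary layout is just an assumption about how you might load the two downloaded summaries.

```python
# Hypothetical per-state hospital counts, standing in for the two downloaded
# datasets. The numbers are illustrative only, not real query results.
large_by_state = {"Texas": 3, "Ohio": 2}
small_by_state = {"Texas": 10, "Ohio": 4, "Iowa": 6}

LARGE_ACV = 100_000  # ACV for hospitals in the larger segment
SMALL_ACV = 5_000    # ACV for hospitals in the smaller segment


def tam_by_state(large, small, large_acv=LARGE_ACV, small_acv=SMALL_ACV):
    """Combine both segments into one TAM figure per state.

    A state missing from one dataset is treated as having zero hospitals
    in that segment.
    """
    states = set(large) | set(small)
    return {
        state: large.get(state, 0) * large_acv + small.get(state, 0) * small_acv
        for state in states
    }


state_tam = tam_by_state(large_by_state, small_by_state)
for state, value in sorted(state_tam.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: ${value:,}")
```

Taking the union of states matters here: a state with only small hospitals would otherwise drop out of the map entirely.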

What we've done here is quickly survey the number of hospitals by location and size across the United States. This search wouldn't have been possible in consumer search engines. And it's a good starting point. But the general trend above is still similar to a population density map. Perhaps there's more we can do to surface where opportunity lies for our fictitious Acme Energy.

Step Three: Analyze Competitors

In case our initial query for small hospitals didn't already make this clear, the Knowledge Graph excels at long-tail information on small and mid-market businesses. We have over 250M organizations in total, with solid coverage worldwide and across many industries.

To show this at work, let's surface a dataset of Acme Energy's competitors and plot it on a similar map to our TAM by state graphic.

Using the Visual Query Builder

After several exploratory queries, the query that yielded the best results for Acme Energy's competitors relied on the description field. This field is a few-sentence summary of what an organization does. While we could look at energy companies at an industry level, that would be a much more general query. What we're after here are American companies that provide services related to backup power.

Our visual query builder results return 327 backup energy providers across the United States. Clicking through some of the organizations' profiles shows that they offer the precise service set of Acme Energy. The only downside to using the visual query builder is that it can't presently facet (provide a summary view). This means that you would need to export the data to CSV and do a small bit of data wrangling to determine the number of competitors by state.
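As a rough idea of what that data wrangling could look like, here is a minimal sketch using only Python's standard library. The column names ("name", "state") and the sample rows are assumptions made for illustration; a real export will have its own headers, so adjust the column argument accordingly.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported CSV file. The column names and rows are
# hypothetical placeholders, not an actual export.
exported_csv = io.StringIO(
    "name,state\n"
    "Example Power Co,Texas\n"
    "Sample Generators LLC,Texas\n"
    "Placeholder Backup Inc,New York\n"
)


def count_by_column(csv_file, column):
    """Count rows per distinct value of `column`, skipping blank values."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


competitors_by_state = count_by_column(exported_csv, "state")
print(competitors_by_state.most_common())
```

With a file on disk you would pass `open("export.csv", newline="")` in place of the `StringIO` object.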

Using Diffbot Query Language

With Diffbot Query Language we can use the same query we generated with the visual query builder and simply add a facet statement to the end (similar to how we faceted to get TAM by state).

type:Organization description:"backup power""United States"

After exporting our facet view, we can move straight to visualization or analysis.

Step Four: Analyze Competitors By TAM

While our competitors map largely also follows population density (with the exception of New York), with some simple arithmetic we can gain an even clearer view of where opportunity may lie.

Using our datasets for TAM by state and competitors by state, we can simply divide the two to provide a general view of how much unclaimed market there is.
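The division itself is simple, but it is worth deciding up front what to do with states that have no known competitors. The sketch below uses invented figures and a hypothetical `tam_per_competitor` helper, and marks those states as `None` rather than dividing by zero.

```python
# Hypothetical inputs (illustrative figures only, not Knowledge Graph results).
state_tam = {"Texas": 350_000, "Ohio": 220_000, "Iowa": 30_000}
competitors_by_state = {"Texas": 7, "Ohio": 4}


def tam_per_competitor(tam, competitors):
    """Divide each state's TAM by its competitor count.

    States with no known competitors have no defined ratio, so they are
    reported as None instead of raising a ZeroDivisionError.
    """
    result = {}
    for state, value in tam.items():
        n = competitors.get(state, 0)
        result[state] = value / n if n else None
    return result


ratios = tam_per_competitor(state_tam, competitors_by_state)
for state, ratio in ratios.items():
    label = "no known competitors" if ratio is None else f"${ratio:,.0f} per competitor"
    print(f"{state}: {label}")
```

How you chart the `None` states is a judgment call: they may be the most open markets, or simply gaps in the competitor query.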

Loading the resulting data into the same format provides the following visualization:

While state-by-state location may not matter for some industries (say, SaaS), many market intelligence analyses go to great depth to obtain state-by-state data. In this case we've surfaced relative opportunity in North Dakota and Iowa that wasn't apparent in our initial data set.

Our Knowledge Graph is based on web-wide crawls that update our organization database every few days. Want to see what coverage is like for your industry? Try out a free two-week trial or contact sales for a customized demo!

Dear Diffy, Find Me A Coworking Space

Disclaimer: this article is about a very mundane consumer search. With that said, how knowledge work and fact accumulation are typically performed has wide-reaching implications for knowledge workflows.

The other day I was searching for coworking spaces.

As in many domains of knowledge, data coverage online was largely human curated. Lists with some undisclosed methodology provided the writer’s favorite coworking spots by city.

Sure, any major search engine will return a list of results plotted on a map. But I’m sure we’ve all run into the following.

  1. Load map…
  2. Pan slightly to surface more results…
  3. Zoom slightly to surface more results…
  4. Pan the opposite direction to try and find a result that had caught our eye…
  5. Try to recall the name that caught our eye in a new search…

Five steps to seek further data points on a single search result. Devoid of context, data provenance, and the ability to analyze at scale.

Sure, consumer search works in many, many cases. So do phone books.

If you’re a power user, a data hoarder, or a productivity buff, you can likely see the appeal of a search that actually returns comprehensive data. If you’re building an intelligent application or performing market intelligence, using search that won’t let you explore the underlying data is just a waste of time.

So after this predictable foray in which I ignored the advice of several articles, scrolled around a map, and got sidetracked once or twice, I decided to resort to a different sort of search: Diffbot’s Knowledge Graph.


  • The title of our article may not make much sense if you haven’t been acquainted with Diffy, Diffbot’s web-reading bot
  • You see the promise of external web data for many applications… if it were structured (or at least felt disappointment at consumer search engines keeping you from public web data)

Opening the Knowledge Graph, it took all of 20 seconds to return data on over 4,000 coworking spaces. And sure, unless you’re selling a service to coworking spaces, you may wonder why anyone would need all this data as a personal consumer…

4000+ coworking space entities in ~20s

Maybe it’s simple curiosity. Maybe it’s the principle of it all; the fact that all of this information is publicly available online, but not in a structured format. Maybe this is just an analogy for non-consumer searches that also can’t be performed on major search engines. Any way you take it, search of the present is flawed for many uses, and it’s still our primary collective data source.

So what does search in the Knowledge Graph look like?

Well it starts with entities.

Knowledge graphs are built around entities (think people, places, or things) and relationships between entities. The types of relationships that can occur between entities, and the types of facts attached to entities are prescribed by a schema. One of the major “selling points” for knowledge graphs is that they have flexible schemas. That is — more so than other types of databases — they can adapt to what types of facts matter out in the world.

The Importance of Structured Web Data

At their core knowledge graphs (the category of graphs) can be built from any underlying data set. In the case of Diffbot’s Knowledge Graph, it’s the world’s largest structured feed of web data. Diffbot is one of only a handful of organizations to crawl the web. And using machine vision and natural language processing we’re able to pull out mentions of entities as well as infer facts and relationships.

Why is this important?

The web is largely made up of unstructured or semi-structured data. This means you can’t easily filter, sort, or manipulate this data at scale. While the internet is our largest collective source of knowledge, it’s not organized for modern knowledge work.

Diffbot’s products center around organizing the world’s information, whether through our AI-enabled web scrapers, our Knowledge Graph, or our Natural Language API. The ability to source the information from the web in a structured way provides the bedrock for machine learning initiatives, market intelligence, news monitoring, as well as the monitoring of large ecommerce datasets.

The State of Coworking Spaces As Told By AI

So what can you learn from a coworking space dataset that’s much more explorable than consumer search?

It turns out a lot.

While each individual data point is available online, the data isn’t aggregated anywhere else in quite as explorable a format.

In our case we can start with a simple facet query. Faceted search provides a summary view of the value of one fact type attached to a set of entities. So with this sort of query we can quickly discover what locations have the most coworking spaces.

By simply adding a facet statement to our query, we can turn over 4,000 unique results into an observation. While data found about these coworking spaces across the web would be in many different formats (and in many languages), knowledge graphs help to consolidate similar entities around standard fields.

An additional strength of knowledge graphs is that data points can be consolidated from many different sources with data provenance and then built off of. Using natural language processing and machine learning, fields can be computed or inferred from many underlying data sources. Our original query looked at organization entities with “coworking spaces” as part of their description. But an AI-generated field of “descriptors” allows for additional granularity. Let’s look at a facet view of the most common services offered by coworking spaces.

Depending on your experience with a range of coworking spaces, descriptors such as “expat,” “civil & social organization,” or “self improvement” may be novel. By amalgamating tens of thousands of online mentions, articles, and entries into this subset of org entities, the Knowledge Graph dramatically cuts down on the time spent accumulating facts.
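To make the idea of a facet view over a multi-valued field concrete, here is a small Python sketch of the kind of summary a facet produces. It is a conceptual illustration only, not how the Knowledge Graph computes facets: the entity records and the `facet` helper are invented for this example.

```python
from collections import Counter

# Hypothetical entities; the names and descriptor values are placeholders.
entities = [
    {"name": "Space A", "descriptors": ["coworking", "expat"]},
    {"name": "Space B", "descriptors": ["coworking", "self improvement"]},
    {"name": "Space C", "descriptors": ["coworking"]},
]


def facet(entities, field):
    """Count how many entities carry each value of `field`.

    Handles both single-valued fields (a string) and multi-valued fields
    (a list), and counts each entity at most once per distinct value.
    """
    counts = Counter()
    for entity in entities:
        values = entity.get(field, [])
        if isinstance(values, str):
            values = [values]
        counts.update(set(values))
    return counts


for value, count in facet(entities, "descriptors").most_common():
    print(f"{value}: {count}")
```

The point is that a facet collapses thousands of records into a short ranked list for one field, which is what turns a result set into an observation.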

One final area in which consumer search is severely lacking (or just impractical) is market research. Industry-specific events such as funding rounds, openings of new offices, key executive hires or departures, or clues as to private organization revenue can be hard to pinpoint across the web. Softer signals like sentiment around topics or velocity of news coverage can also be informative.

Diffbot’s article index is roughly 50x the size of Google News. Unlike traditional content channels, you aren’t presented with content that’s gamed the system or paid to get your attention. Additionally, where consumer search engines are siloed by language or location, Diffbot’s article index is pan-lingual. Because articles are augmented by additional filterable fields, the underlying articles can yield unique observations on sentiment, key happenings, and more. All underlying article data is returned as well, so you can dig in once you’ve found an interesting angle.

For a deeper dive into creating custom news feeds around organizations and events be sure to check out our Knowledge Graph news monitoring test drive.


Maybe you don’t buy the segue from what really is a consumer search (“coworking spaces near me”) to the copious coworking data available in the Knowledge Graph. But the fact of the matter is that a great deal of knowledge work still relies on human fact accumulation. Without automated ways to structure unstructured data, there’s a definite floor to the cost per fact.

Knowledge graphs provide a bedrock for knowledge workflows reengineered from the ground up. In particular:

  • Knowledge graphs mirror what we care about “in the world” (entities and relationships)
  • Knowledge graphs provide flexible schemas allowing for fact types attached to entities to change over time (as the world changes)
  • Automated knowledge graphs provide one of the only feasible ways to structure market intel and news monitoring data that can be spread across the web
  • Knowledge graphs that don’t expose their underlying data aren’t suitable for use in intelligent applications or machine learning use cases
  • Knowledge graphs that provide additionally computed fields (sentiment, tags, inferences on revenue or events) provide additional value for market intelligence and news monitoring

The Top Coding Bootcamps For Founders According To The Knowledge Graph

Last week we took a look at the top universities for female founders. In our results, we noted that our web-reading AI associates tech bootcamp attendance with education, and a large cluster of founders attended specific universities in conjunction with bootcamps.

New to the Knowledge Graph? Diffbot’s Knowledge Graph is constructed by crawling the vast majority of the web and structuring data on pages using NLP and machine vision. The end result is one of the world’s largest databases of organizations, people, articles, products and more, all linked and with data provenance.

To return results from the Knowledge Graph, you submit queries that filter which entities are returned. In this case we queried the Knowledge Graph to return individuals who:

  1. Attended an educational institution with the name of a top bootcamp
  2. Have held a job title including “CEO,” “chief executive officer,” or “founder”

We then returned a facet (summary) view of how many of these individuals attended each bootcamp.
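The logic of that query (two filters followed by a facet) can be sketched in plain Python. This is a conceptual illustration rather than the query we ran: the bootcamp names, the person records, and the field names are all hypothetical.

```python
from collections import Counter

# Hypothetical bootcamp names and person records, for illustration only.
BOOTCAMPS = {"Example Bootcamp", "Sample Code Academy"}
FOUNDER_TERMS = ("ceo", "chief executive officer", "founder")

people = [
    {"name": "Person A", "schools": ["Example Bootcamp", "State University"], "titles": ["Co-Founder"]},
    {"name": "Person B", "schools": ["Sample Code Academy"], "titles": ["Software Engineer"]},
    {"name": "Person C", "schools": ["Example Bootcamp"], "titles": ["CEO"]},
]


def is_founder(person):
    """True if any job title the person has held includes a founder-style term."""
    return any(
        term in title.lower()
        for title in person["titles"]
        for term in FOUNDER_TERMS
    )


def founders_by_bootcamp(people):
    """Filter to founders, then count how many attended each bootcamp."""
    counts = Counter()
    for person in people:
        if is_founder(person):
            counts.update(BOOTCAMPS.intersection(person["schools"]))
    return counts


print(founders_by_bootcamp(people).most_common())
```

Note that simple substring matching on titles is deliberately loose, so a real analysis would want to spot-check the matches.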


The Best Schools For Female Founders According To The Knowledge Graph

Upon seeing Crunchbase’s annual ranking of the best schools for graduating entrepreneurs, we wanted to see how our Knowledge Graph results stack up.

The Diffbot Knowledge Graph is sourced from crawling a majority of the web and extracting entities and facts using NLP and machine vision.

Two prominent entity types are person and organization entities. When the two are paired, powerful observations sourced from across the web are possible. In this exploration we returned all person entities within the Knowledge Graph who are currently founders and who are female. We filtered to make sure each organization had at least some publicly disclosed funding, and then we took a look at a summary view of which schools these founders had attended. You can check out the Knowledge Graph query here with a free trial.

While the top schools for female founders were consistent with Crunchbase’s coverage, you may wonder why the numbers vary so dramatically. Crunchbase’s ranking this year looked at 2019-2020 graduates, and Crunchbase’s data is centered around tech and startup firmographics. While Diffbot’s Knowledge Graph certainly has firmographic details on tech-centered companies, our database of organizations is much wider ranging (over 250M orgs at last count). This means our list includes founders of all sorts of endeavors: non-profits, artistic organizations, medical organizations, and tech companies to name a few.


Startup Revenue By County With Diffbot’s Knowledge Graph

What can you do with billions of web-sourced facts on hundreds of millions of organizations? Beyond analyzing the facts themselves, you (or a machine of your choice) can learn a lot. Historically, our Knowledge Graph has had one of the largest collections of publicly-disclosed organization revenue. Recently, we’ve applied machine learning processes across many org fields to estimate revenue for private organizations as well.


Using the Knowledge Graph to Segment Big Tech Investments By Industry

Every big tech investment is big news. If your firm raises a funding round with prestigious investors or is acquired, you better bet you’ll spread the news far and wide.

But where can you go for this information en masse? Even covering a handful of big investors over a handful of years can lead to a list of thousands of firms they’ve invested in. And a list of firms by itself isn’t that useful. Sure, some big names pop out. But how do you see what “plays” big tech is making?

That’s where our web-reading bots come in. By working through billions of web pages using NLP and machine vision, Diffbot’s Knowledge Graph is the largest public-web sourced database of organizations, articles, people, products, and events. For each entity — organization, articles, people, etc. — facts are vetted and accumulated to create a filterable, searchable database of “things.” So when we wanted to check out which industries big tech has invested in over the last decade, we knew right where to turn. No analyst middlepersons, just public web data structured into a market intel-rich format.

Big Tech Investment By Industry 2010-2021

Distribution of industries of organizations invested in by Facebook, Alphabet, Amazon, Microsoft, Apple, and Netflix from 2010 to July 2021. Firmographic data sourced from Diffbot’s Knowledge Graph.

Context Matters, Tracking Quote Spread Across The Web In A Historic Year

Hindsight is 20/20. And as we usher in a new president in what has been one of the most tumultuous years in American history, we can begin to see clarity about the forces that moved throughout our jobs, our lives, and our collective imagination.

Another way to put this is that over time we tend to have more context.

Within Diffbot’s Knowledge Graph, one unique lens through which we can leverage the context of semantic data is by looking at the speakers of quotes.

When our AI reads articles it pulls out quotes, and when it can it attributes a speaker to these quotes. As our crawlers traverse the entirety of the public web, sources of quotes are validated and over time some quotes circulate more than others.

When performing a facet search, this lets us essentially show something like a retweet count for the entire web. This answers questions like whose voices are being heard? And what speakers are the most widely cited in a given topic?

To commemorate the end of an era, let’s take a look at a few of the most circulated statements of the last 365 days.

What were the 10 most circulated quotes across the web by President Joe Biden in the last 365 days?


Stories By DQL: Tracking the Sentiment of a City

The story: the sentiment of news mentions of Gaza fluctuates by as much as 2000% a week. 90% of news mentions about Minneapolis have had negative sentiment through the first week of June 2020 (they’re typically about 50% negative). Positive sentiment news mentions about New York City have steadily increased week by week through the pandemic.

Locations are important. They help form our identities. They bring us together or apart. Governance organizations, journalists, and scholars routinely need to track how one location perceives another. From threat detection to product launches, news monitoring in Diffbot’s Knowledge Graph makes it easy to take a truly global news feed and dissect how entities are being talked about.

In this story by DQL discover ways to query millions of articles that feature location data (towns, cities, regions, nations).

How we got there: One of the most valuable aspects of Diffbot’s Knowledge Graph is the ability to utilize the relationships between different entity types. You can look for news mentions (article entities) related to people, products, brands, and more. You can look for what skills (skill or people entities) are held by which companies. You can look for discussions on specific products.

Stories By DQL: George Floyd, Police, and Donald Trump

We will get justice. We will get it. We will not let this door close.

– Philonise Floyd, Brother of George Floyd

News coverage this week centered on George Floyd, police, and Donald Trump. COVID-19 related news continues to dominate globally.
That’s the macro story from all Knowledge Graph articles published in the last week. But Knowledge Graph article entities provide users with many ways to traverse and dissect breaking news. By facet searching for the most common phrases in articles tagged “George Floyd” you see a nuanced view of the voices being heard.

In this story hopefully you can begin to see the power of global news mentions that can be sliced and diced on so many levels. Wondering how to gain these insights for yourself? Below we’ll work through how to perform these queries in detail.

How we got there: Diffbot’s Knowledge Graph holds hundreds of millions of article entities at any given moment. These articles are of truly global origins, and are parsed by our cutting-edge machine vision and natural language processing systems to take unstructured article data and transform it into structured, query-able entities.
