Converting text documents into knowledge graphs with the Diffbot Natural Language API

Most of the world’s knowledge is encoded in natural language (e.g., news articles, books, emails, academic papers). It is estimated that 80 percent of business-relevant information originates in unstructured form, primarily text. However, the ambiguous nature of human communication makes it difficult for software engineers and data scientists to leverage this information in their applications.

After years of research, we are proud to announce the Diffbot Natural Language API, a new product to help businesses convert their text documents into knowledge graphs. Knowledge graphs represent information about real-world entities (e.g., people, organizations, products, articles) via their relationships with other entities (e.g., founded by, educated at, was mentioned in). This is the same production-grade technology that we use to build the world’s largest knowledge graph from the web, and we are making it available to all.


Businesses use the Natural Language API to:

  • Monitor the news and stay on top of developments in their industry, identify trends, and monitor the sentiment around their brand and competitors.
  • Explore and understand the contents of large collections of internal documents.
  • Build their own knowledge graphs, such as the largest genealogy of the Game of Thrones world built to date.

State of the Art Accuracy

We’ve benchmarked our NL API against all major natural language processing products (Google, Amazon, IBM Watson, Microsoft). You can read more on our Natural Language API page. In summary, we’ve come out ahead of all competition in terms of:

    • Entity Linking: Linking a name to its corresponding entity in a Knowledge Graph
    • Relation Extraction: Identifying the relationship between entities in the text (e.g., founded by, educated at, child, spouse)
    • Entity Sentiment: the sentiment of the author towards an entity. Example: “I love Apple products, but the Mac Pro is too pricey.” is positive towards Apple and negative towards the Mac Pro.

Powered by the Diffbot Knowledge Graph

The Natural Language API is powered by the Diffbot Knowledge Graph, a knowledge graph of the web containing over 10 billion entities, including people, organizations, locations, products, and articles. This integration allows businesses to get additional information about each entity mentioned in a text document, including images, textual descriptions, and hundreds of additional data points.

The integration with the Diffbot Knowledge Graph also enables the disambiguation of entities such as “Apple” (the company) vs “apple” (the fruit) since they have different unique identifiers in the Knowledge Graph. Ignoring entity disambiguation can have serious consequences. For instance, many stock market traders have invested in the ticker “ZOOM” (Zoom Technologies) when they intended to invest in the ticker “ZM” (Zoom Video Communications). In addition, trading robots are famous for confusing Anne Hathaway and Berkshire Hathaway.

Join Diffbot’s mission to democratize access to the world’s web data with a 14-day free trial, today!