Articles by: Diffy

Quasi-sentient robot. Stares at web pages all day.

Converting text documents into knowledge graphs with the Diffbot Natural Language API

Most of the world’s knowledge is encoded in natural language (e.g., news articles, books, emails, academic papers). It is estimated that 80 percent of business-relevant information originates in unstructured form, primarily text. However, the ambiguous nature of human communication makes it difficult for software engineers and data scientists to leverage this information in their applications.

After years of research, we are proud to announce the Diffbot Natural Language API, a new product to help businesses convert their text documents into knowledge graphs. Knowledge graphs represent information about real-world entities (e.g., people, organizations, products, articles) via their relationships with other entities (e.g., founded by, educated at, was mentioned in). This is the same production-grade technology that we use to build the world’s largest knowledge graph from the web, and we are making it available to all.
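To make the entity-and-relationship idea concrete, here is a minimal Python sketch of a knowledge graph stored as (subject, relation, object) triples. It is a toy illustration under our own assumptions: the `TripleStore` class, the entity names, and the storage layout are hypothetical and do not reflect the Diffbot Natural Language API’s output format or the Knowledge Graph’s internals.

```python
from collections import defaultdict


class TripleStore:
    """A toy in-memory knowledge graph of (subject, relation, object) facts.

    Hypothetical illustration only; not the Diffbot API's data model.
    """

    def __init__(self):
        # Index facts by subject so lookups don't scan every triple.
        self._by_subject = defaultdict(set)

    def add(self, subject, relation, obj):
        """Record one fact, e.g. ("ExampleCo", "founded by", "Jane Doe")."""
        self._by_subject[subject].add((relation, obj))

    def objects(self, subject, relation):
        """Return, sorted, every entity linked to `subject` via `relation`."""
        facts = self._by_subject.get(subject, ())
        return sorted(o for r, o in facts if r == relation)

    def relations(self, subject):
        """Return, sorted, every (relation, object) pair recorded for `subject`."""
        return sorted(self._by_subject.get(subject, ()))


# Hypothetical entities; the relation names echo the examples in the text.
kg = TripleStore()
kg.add("ExampleCo", "founded by", "Jane Doe")
kg.add("Jane Doe", "educated at", "Example University")
kg.add("Jane Doe", "was mentioned in", "Example News Article")
print(kg.objects("ExampleCo", "founded by"))  # ['Jane Doe']
```

Indexing by subject keeps the sketch short; a real graph store would also index by object and relation so it can answer questions like “who was educated at Example University?”.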


Meet Diffy, The Data Extraction Robot


As a symbol of our organization, Diffy, whose sticker is highly coveted by fans, may someday become as iconic as other company mascots. Although not always physically present at events, Diffy is a brand ambassador who is instantly recognizable (at least to the Diffbot team).

First and foremost, what are you?


Welcome Tomer Balan – Senior Software Engineer

Hi there!

My name is Tomer, and I’m a new Senior Software Engineer at Diffbot! I am from Tel Aviv, Israel, a city with beautiful beaches, restaurants, bars, and a lot of technology.

As far back as I can remember, I have always been passionate about identifying and solving business problems using technology. For that reason, I chose to study Industrial and Management Engineering at Ben-Gurion University.

While working as a Data Solutions Engineer at eBay, I signed up for a Diffbot trial and was amazed by the product and technology. I found Diffbot’s technology helpful for many use cases in the company. By day, I used the Analyze and Product APIs, and by night, I investigated the new Knowledge Graph. Diffbot helped eBay reduce its manual work by 30%, which lowered expenses and saved time. As a former Diffbot customer, I was fortunate to have developed a great connection with the CEO, Mike, and the rest of the team.

I am proud to be one of the first customers to join the Diffbot team. One of my main responsibilities at Diffbot is building tools that help non-technical users understand our technology. I will be working on some very exciting projects, so stay tuned!

In my free time, you can find me playing poker (online and offline), playing volleyball on the beach, watching European soccer, hanging out with friends and traveling (17 countries and counting!).  

I look forward to my new adventures and challenges at Diffbot!

Welcome Priya Venkateshan – Senior Software Engineer

Hi, my name is Priya, and I’m a Senior Software Engineer at Diffbot.

I grew up in Bangalore, India, and since everyone around me was getting into programming, I did too, though I secretly wanted to go into medicine or journalism. However, thanks to the professors and peers at NITK Surathkal, I discovered machine learning, got really into it, and was soon trying to learn information retrieval concepts from grainy MIT OpenCourseWare videos. Soon after, I enrolled in UC Irvine’s master’s program in Computer Science and enjoyed its wide variety of cutting-edge machine learning courses. During my time there, I worked on a thesis involving graphical models for coreference resolution.

My experience in the industry, especially with eCommerce data, has taught me the importance of domain knowledge and scale in making machine learning models useful in a real-world context, and has given me an interest in best practices for supporting data and models in production. Prior to Diffbot, I worked at eBay, where we crawled product websites to augment the eBay catalog. Diffbot was one of the tools our organization used toward this goal. Having worked on web crawling and data extraction in a specific domain, I find it exciting to work on the generalized versions of those problems at Diffbot, especially since I get to impact a state-of-the-art commercial knowledge graph and work with some of the best minds focused on these problems.

In my spare time, I write and edit fiction, and I use technology to make life easier for fiction writers.

Welcome Rick Deininger – Technical Support Engineer

Hello! My name is Rick Deininger and I am the newest member of the technical support team at Diffbot.

I began working with computers when I was very little. However, as I grew up, I had less access to computers, so I was not able to pursue those interests as much. After graduating from high school, I attended several schools for widely varied fields such as Systems Administration, Computer Engineering, and Technical Writing. Unfortunately, various circumstances prevented me from completing a degree.

Eventually, I suffered a serious illness that forced me to take some time off from pursuing an education and career. During this time, I operated online businesses, and built small websites and scripts to support them. None of these businesses were ever a big success, but in the course of developing them, I did a lot of web scraping and learned several languages and technologies including Ruby, Python, JS, PHP, Selenium, and various kinds of web servers and databases.

A few years ago, I had an opportunity to go back to school for a couple of years, and I did so with the hope that it would allow me to break back into the IT field. I attended De Anza College and completed 3 AA degrees simultaneously, including one in Network Administration and another in Information Security. During this time, I also studied for my AWS Solutions Architect certification, which I completed a few months after graduating.

I had never heard of Diffbot before finding the posting for this role online. What first attracted me to the company was its focus on web scraping, a subject that has at times been both my “bread and butter” and a bit of a hobby. However, as I have gained knowledge about the company and its product offerings, I have developed an increasing interest in the Diffbot Knowledge Graph, which seems sure to one day be a cornerstone of the IT industry. My mind is boggled by the sheer amount of data that Diffbot has mined through machine learning and natural language processing, and I can’t wait to see all the ways that other businesses are utilizing this data and this service to enhance their own product offerings.

I will be remotely supporting Diffbot’s products and APIs for their various customers. I have developed strong written communication and troubleshooting skills over the years, and I hope to bring these skills to bear to provide a high level of support to Diffbot’s customers.

Welcome Dimitris Kontokostas – Software Engineer

Hello there! I am Dimitris, and I am a new Software Engineer at Diffbot!

As the name implies, I am Greek. In particular, I am from the northern part of the mainland (far away from the beautiful beaches). I hold a degree in Electronics & Computer Engineering from the Technical University of Crete, an MSc in Web Science from Aristotle University of Thessaloniki, and a PhD on Large-scale Knowledge Extraction & Quality Assessment from the University of Leipzig. Through my MSc and PhD programs, I was exposed to knowledge graphs and became fascinated by the field. I have been working with knowledge graphs for almost 10 years now, and I am still amazed by the new possibilities they can unlock.

The Diffbot Knowledge Graph is really intriguing. It operates at web scale, using state-of-the-art research on knowledge extraction, and expands well beyond the notable entities (e.g., celebrities, established companies) that most knowledge graphs are limited to. I felt this was a perfect match for my research and professional skills, and a great challenge to work on. My primary focus at Diffbot will be data quality and knowledge fusion. I will extend existing approaches and create new techniques to further improve and evaluate the accuracy of our knowledge graph.

In my free time, I enjoy spending time with family & friends, playing basketball, running and skiing.

I look forward to my new knowledge graph adventures at Diffbot!

Welcome Julia Wiedmann – Machine Learning Engineer (Research)


Hi there! My name is Julia and I am a Machine Learning Research Engineer at Diffbot.

Born and raised in Vienna, Austria, I moved to London to study business. While I enjoyed my degree, I realized I was much more interested in technology, so I taught myself enough coding and math to qualify for a master’s degree in Computer Science at University College London (UCL). I recently finished my PhD at the University of Oxford, primarily developing new methods for machine learning-based web data extraction that can extract data from new pages at scale and without site-level supervision. My research interests widely overlap with Diffbot’s work, so I interned here last summer building the EventsAPI, a new addition to the Automatic APIs that automatically extracts events from the open web. This allows events to be added to the Diffbot Knowledge Graph and lets customers use the API directly. Now, as an excited full-timer, I am focusing on improving and expanding the Automatic API extraction methods.

In my free time I love all sorts of sports, such as cycling, skiing, hiking, and running.

I look forward to more adventures at Diffbot!


Welcome Lorenzo Torres – Technical Sourcer

I received my bachelor’s degree from the University of Southern California (Fight On!) and then became a food truck driver and manager for a waffle dessert company in Pasadena, CA, called Waffles de Liege. After 3 years, I felt that I wasn’t doing anything fulfilling and realized that I wanted to make an impact and a difference in people’s lives. I left the industry and spoke to friends and family about my next career. They mentioned recruiting and how, in the right field, it can change a person’s life and make a real difference. I went to various recruiting networking events and created my own position at a startup called FabFitFun via their “create your job” submission. Impressed by my background, they hired me as an intern and then converted me to a full-time Recruiting Coordinator. From there, I spent half a year at Robert Half doing legal recruiting, and then moved to a smaller tech recruiting agency called Strategic Employment Partners for technical sourcing. With minimal sourcing knowledge and only 7 months into the company, I received the 2017 Recruiter of the Year Award. I was then transferred to their New York branch, where I received the 2018 Recruiter of the Year Award. I now find myself, for the first time, in an in-house Technical Sourcer position with an awesome team, product, and company vision I can truly get behind.

Outside of work, I love singing with friends and family (covers online and sometimes karaoke; I was in choir from age 10 all the way up to 19, and also trained formally), eating a lot (trying new restaurants and cooking), watching WWE, UFC, or anything entertaining, playing Dance Dance Revolution (I used to compete), and just having a good ole time with friends and family.

I’m excited to help grow the teams here at Diffbot, and I can’t wait to see what the future holds for all of us.

Welcome Paramita Mirza – Machine Learning Engineer Intern


Hello there! I’m Paramita (sounds similar to ‘parameter’) and I’m excited to be part of the Relation Extraction research team at Diffbot.

I received my PhD from the University of Trento/FBK-ICT in spring 2016. Under the supervision of Sara Tonelli, my PhD research focused on extracting temporal and causal relations between events from natural language texts, as part of the NewsReader project. I then joined Gerhard Weikum’s group as a postdoc at the Max Planck Institute for Informatics, where the YAGO (Yet Another Great Ontology) knowledge base was developed. My research interests have always revolved around information extraction and machine learning approaches to it: making sense of unstructured text and building structured knowledge out of it.

I learned more about Diffbot when I attended ISWC’18 in Monterey, and found out that they are steadily building a gigantic knowledge graph out of the whole web. I got hooked and decided to temporarily escape rainy Germany for sunny California. Here at Diffbot, I’ll be working on extracting temporal qualifiers for the facts used to build the Diffbot Knowledge Graph, making sure that we can differentiate fresh facts from historical ones. I would love to hear your ideas and suggestions on how to make that happen!

Turn Existing Customer Data into Fresh Marketing Opportunities with Knowledge Graph

I wanted to use our own tech to show that you can cross-reference your sales data with the 10+ billion entities stored in the Diffbot Knowledge Graph to find marketing opportunities with a little #KnowledgeHack. I wasn’t disappointed with what I found.

Because the Diffbot Knowledge Graph (KG) focuses on people, companies, and location data, I wanted to see how it could help me target the right people with a timely message via one of the major ad platforms like Facebook, AdWords, or LinkedIn.

This “how-to” guide shows you, step by step, how I used the Diffbot Knowledge Graph to explode a few of our best customers’ data into a list of thousands of high-value marketing targets in just a few steps:

  1. Take a small number of existing customers.
  2. Define an Ideal Customer Profile (ICP) based on their common attributes and connections.
  3. Find every person and/or business online who matches that profile.
  4. Analyze those people as a group, and build a marketing campaign with the insights.


A couple of caveats:

  1. This is not a silver bullet, and it requires some critical thinking on your part. Following this guide will give you useful data, but it won’t do your marketing for you.
  2. You will need a Diffbot Knowledge Graph (DKG) account to do this. The whole technique revolves around the vast amount of people and company data stored in the DKG, and its ability to search through their connections to get results.

Step One

Define an ideal customer profile (ICP) for a campaign based on your own customers.

Find a few examples of your best customers.

To find them, simply ask your sales team who the best customers or leads are, or run a report in your CRM to show you your top existing customers.

For example, run an “All Closed Won by Revenue” or, even better, an “All Closed Won by LTV” report.


That will give you the names and locations of several example people you can use to create a template to find other similar (look-alike) candidates.
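If your CRM can export deals, a report like “All Closed Won by LTV” boils down to a filter and a sort. The sketch below assumes a hypothetical export with `customer`, `location`, `stage`, and `lifetime_value` fields and a hypothetical `top_customers_by_ltv` helper; real CRM exports will use their own field names and stage labels.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    customer: str
    location: str
    stage: str              # e.g. "Closed Won" or "Closed Lost" (hypothetical labels)
    lifetime_value: float   # LTV in your CRM's currency


def top_customers_by_ltv(deals, n=3):
    """Rough equivalent of an 'All Closed Won by LTV' report (hypothetical helper)."""
    won = [d for d in deals if d.stage == "Closed Won"]
    won.sort(key=lambda d: d.lifetime_value, reverse=True)
    # Names and locations are what you need to look these people up in the KG later.
    return [(d.customer, d.location) for d in won[:n]]


# Made-up sample data for illustration.
deals = [
    Deal("Customer A", "Chicago", "Closed Won", 48_000),
    Deal("Customer B", "Austin", "Closed Lost", 90_000),
    Deal("Customer C", "Boston", "Closed Won", 120_000),
    Deal("Customer D", "Seattle", "Closed Won", 15_000),
]
print(top_customers_by_ltv(deals, n=2))  # [('Customer C', 'Boston'), ('Customer A', 'Chicago')]
```

Sorting by LTV rather than by a single deal’s revenue favors customers who keep coming back, which is usually the profile you want more of.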

For this guide, I used made-up existing customers instead: example profiles I found by searching for “People who are currently employed as ecommerce managers at companies with more than 300 employees.” You can see the query for this example below:

The query above filters for type=person, a current job title of “ecommerce manager,” and a current employer with more than 300 employees. Don’t worry too much about the search query or how to write one right now; there are plenty of guides and documentation during onboarding that show you how easy it is. For now, just imagine it’s like making filters in Excel or Google Sheets.

That search gives some results you can substitute in place of actual existing customers.

Step Two

Explode a few prime example customers into thousands of similar potential customers.

Once you have your existing (or made-up, see above) customer profiles, you can find them in the KG with a simple query like this:

And view their information by clicking on their profile from the results:

You will quickly begin to spot commonalities between the profiles. Excuse the crude visualization, but it will look something like this:

In this example, you can see several similarities between your existing customers.

  • Job title
  • Skills
  • Experience
  • Education
  • Industries
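Spotting commonalities by eye works for a handful of profiles; with more, you can count them. This sketch assumes each seed profile has been copied into a plain Python dict with list-valued fields such as `skills` (a hypothetical layout, not the Knowledge Graph’s schema), and the `shared_attributes` helper is likewise hypothetical. It keeps the values shared by at least a given fraction of profiles.

```python
from collections import Counter


def shared_attributes(profiles, field, min_share=1.0):
    """Return values of `field` found in at least `min_share` of the profiles.

    min_share=1.0 means "every profile has it"; lower it to allow near-misses.
    """
    counts = Counter()
    for profile in profiles:
        # Count each value once per profile, even if it is listed twice.
        counts.update(set(profile.get(field, [])))
    threshold = min_share * len(profiles)
    return sorted(value for value, count in counts.items() if count >= threshold)


# Made-up seed profiles for illustration.
seeds = [
    {"skills": ["Digital Marketing", "Analytics", "SEO"], "industries": ["Retail"]},
    {"skills": ["Digital Marketing", "Analytics", "Digital Strategy"], "industries": ["Retail", "Apparel"]},
    {"skills": ["Digital Marketing", "Digital Strategy", "Analytics"], "industries": ["Retail"]},
]
print(shared_attributes(seeds, "skills"))  # ['Analytics', 'Digital Marketing']
```

Running the same function with `min_share=0.6` also surfaces “Digital Strategy”, which two of the three seeds share; that kind of near-miss is often worth adding to the profile.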

And you can do the same with the employers’ profiles, too.

Click through to see the people and employers to compare and contrast for similarities.

In this case, the companies of the example customers I found all have at least 5,000 employees and all use jQuery as a front-end technology. At first, that might seem irrelevant, but here comes the good bit…

You can use those common attributes to find more people just like them and create a look-alike audience at web scale. How?

Build a query that looks for those common attributes, like this example:

  • Skills: Digital Marketing, digital strategy, Analytics
  • Current Job titles: ecommerce
  • Past job titles: Manager
  • Locations: Major cities
  • Current Employer Company size: 5,000+
  • Current employer location city size: 100,000+
  • Current Employer Technology Used: jQuery
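Conceptually, a query like this is a set of filters applied together. The sketch below expresses the same criteria as a Python predicate over locally held candidate records. It is an illustration only: the `Candidate` fields and `matches_icp` are hypothetical, matching any one of the target skills is our assumption, the “major cities” location criterion is left out for brevity, and in practice the filtering happens in the Knowledge Graph query itself rather than in your own code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout, not the Knowledge Graph schema.
    name: str
    skills: list
    current_title: str
    past_titles: list
    employer_size: int
    employer_city_population: int
    employer_technologies: list


# Assumption: a candidate needs at least one of these skills.
ICP_SKILLS = {"digital marketing", "digital strategy", "analytics"}


def matches_icp(c: Candidate) -> bool:
    """Check one candidate against the look-alike criteria listed above."""
    skills = {s.lower() for s in c.skills}
    technologies = {t.lower() for t in c.employer_technologies}
    return (
        bool(skills & ICP_SKILLS)                                # a target skill
        and "ecommerce" in c.current_title.lower()               # current title mentions ecommerce
        and any("manager" in t.lower() for t in c.past_titles)  # a past title mentions manager
        and c.employer_size >= 5_000                             # employer has 5,000+ staff
        and c.employer_city_population >= 100_000                # employer city of 100,000+
        and "jquery" in technologies                             # employer uses jQuery
    )


# Made-up candidates for illustration.
candidates = [
    Candidate("Candidate A", ["Digital Marketing", "SEO"], "Ecommerce Manager",
              ["Marketing Manager"], 12_000, 2_700_000, ["jQuery", "React"]),
    Candidate("Candidate B", ["Analytics"], "Head of Ecommerce",
              ["Analyst"], 8_000, 900_000, ["jQuery"]),
    Candidate("Candidate C", ["Digital Strategy"], "Ecommerce Lead",
              ["Project Manager"], 800, 300_000, ["jQuery"]),
]
print([c.name for c in candidates if matches_icp(c)])  # ['Candidate A']
```

Candidate B fails because no past title mentions “manager”, and Candidate C fails on employer size; each criterion narrows the list, which is why the combined query lands on a focused set of people.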

Hooray! At the time of writing, that query returns 2,363 people.

That is a list of all the people who are a good likeness for your Ideal Customer Profile. Perfect! Of course, you will need to check the data and remove any people who don’t meet your particular needs, but in general, you have a great dataset to start working with.

How can you use that information?

Any good salesperson or marketer will know several ways to use that data to generate demand and leads from that market.

  1. You can use their social media information to reach out to them with a tweet or message.
  2. You can target ads at these people and organizations via LinkedIn, Facebook, and other platforms.
  3. You can use other data enrichment tools, such as Pipl, to learn even more about those people.
  4. You can invite them to your events, webinars, and other engagement platforms.

But what to say to them?

In this case, we know the following about them:

  • They work in large organizations in major cities.
  • They do management work in and around digital marketing.
  • Their employers often use jQuery and other similar front-end technologies.
  • Your existing customers’ use cases are likely to be relevant.

For Diffbot, that may well mean that we:

  • Write a “how to” blog post about how to use Diffbot to help them do something cool in marketing.
  • Sponsor and/or attend local events about digital marketing, and evangelize our Knowledge Graph in context of their needs.

However, I wanted to take it a step further and learn more about these people using the Knowledge Graph to build a better picture of the market. To do that, I started segmenting and grouping the data using some advanced Knowledge Graph features.

Bonus Step Four

Analyze the group of people who match my ICP for further insights.

Here are some basic things you can learn:

“Who are the companies that currently employ this type of person the most?”

“What are the descriptors of companies that currently employ this type of person the most?”

“What is the gender split of this type of person?”

“What is the location split of this type of person?”
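Each of these questions is a group-and-count over the matched people. A minimal sketch, assuming you have exported the results into dicts with hypothetical `employer`, `gender`, and `city` keys (the `share_by` helper is also hypothetical):

```python
from collections import Counter


def share_by(people, key):
    """Return (value, share) pairs for `key`, most common first.

    People missing the key are skipped so they don't skew the split.
    """
    counts = Counter(p[key] for p in people if p.get(key))
    total = sum(counts.values())
    return [(value, round(count / total, 2)) for value, count in counts.most_common()]


# Made-up records for illustration.
people = [
    {"employer": "ExampleRetail", "gender": "female", "city": "New York"},
    {"employer": "ExampleRetail", "gender": "male", "city": "London"},
    {"employer": "ShopCo", "gender": "female", "city": "New York"},
    {"employer": "ExampleRetail", "gender": "female", "city": "Chicago"},
]
print(share_by(people, "gender"))  # [('female', 0.75), ('male', 0.25)]
```

The same function answers the company-descriptor question if you first flatten each employer’s descriptor list into one row per descriptor.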

Now you’re armed with data.

With it, you can tailor your marketing activity to match the audience’s gender, location, and employer type. And don’t forget, you still have your list of 2,000+ leads from earlier in the process.

Based on this research, we are now considering how we can target those customers with interesting, intelligent, and high-value marketing activity, perhaps by joining digital marketing and ecommerce hackathons in those locations. Perhaps by writing some API script templates in jQuery? Perhaps simply by answering questions on Stack Overflow about marketing and ecommerce data!

Rinse and repeat for your different customer segments, and you will have all the insights you need to grow your business.

Try this technique for yourself

To try this technique for yourself, you need access to the Knowledge Graph, which you can request here. If you have any questions, please leave a comment below.