Articles by: Diffy

Quasi-sentient robot. Stares at web pages all day.

Converting text documents into knowledge graphs with the Diffbot Natural Language API

Most of the world’s knowledge is encoded in natural language (e.g., news articles, books, emails, academic papers). It is estimated that 80 percent of business-relevant information originates in unstructured form, primarily text. However, the ambiguous nature of human communication makes it difficult for software engineers and data scientists to leverage this information in their applications.

After years of research, we are proud to announce the Diffbot Natural Language API, a new product to help businesses convert their text documents into knowledge graphs. Knowledge graphs represent information about real-world entities (e.g., people, organizations, products, articles) via their relationships with other entities (e.g., founded by, educated at, was mentioned in). This is the same production-grade technology that we use to build the world’s largest knowledge graph from the web, and we are making it available to all.
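To make the idea of "entities connected by relationships" concrete, here is a minimal sketch of a knowledge graph stored as (subject, relation, object) triples. The entity names and relation labels (`foundedBy`, `educatedAt`, `mentionedIn`) are hypothetical illustrations, not Diffbot's actual data model or API.

```python
from collections import defaultdict

# Hypothetical triples: (subject, relation, object). Illustrative only.
triples = [
    ("Acme Corp", "foundedBy", "Jane Doe"),
    ("Jane Doe", "educatedAt", "Example University"),
    ("Acme Corp", "mentionedIn", "Example News Article"),
]

def build_index(triples):
    """Index outgoing edges per entity so relationships can be traversed."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index

def related(index, entity, relation):
    """Return every entity linked to `entity` via `relation` (empty list if none)."""
    return [obj for rel, obj in index.get(entity, []) if rel == relation]

graph = build_index(triples)
print(related(graph, "Acme Corp", "foundedBy"))  # ['Jane Doe']
```

A production knowledge graph adds typed entities, provenance, and far richer indexing, but the basic unit stays the same: facts expressed as edges between entities.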


Meet Diffy, The Data Extraction Robot

Diffy

As a symbol of our organization, Diffy, whose sticker is highly coveted by fans, may someday become as iconic as other company mascots. Although not always physically present for events, Diffy is a brand ambassador who is instantly recognizable (at least to the Diffbot team).

First and foremost, what are you?


Welcome Tomer Balan – Senior Software Engineer

Hi there!

My name is Tomer, and I’m a new Senior Software Engineer at Diffbot! I am from Tel Aviv, Israel, a city with beautiful beaches, restaurants, bars, and a lot of technology.

As far back as I can remember, I have always been passionate about identifying and solving business problems using technology. For that reason, I chose to study Industrial and Management Engineering at Ben-Gurion University.

While working as a Data Solutions Engineer at eBay, I signed up for a Diffbot trial and was amazed by the product and technology. I found Diffbot’s technology to be helpful for many use cases in the company. By day, I used the Analyze and Product APIs, and by night, I investigated the new Knowledge Graph. Diffbot helped eBay reduce its manual work by 30%, which lowered expenses and saved time. As a former Diffbot customer, I was fortunate to develop a great connection with the CEO, Mike, and the rest of the team.

I am proud to be one of the first customers to join the Diffbot team. One of my main responsibilities at Diffbot is to build tools that enable non-technical users to understand our technology. I will be working on some very exciting projects, so stay tuned!

In my free time, you can find me playing poker (online and offline), playing volleyball on the beach, watching European soccer, hanging out with friends and traveling (17 countries and counting!).  

I look forward to my new adventures and challenges at Diffbot!

Welcome Priya Venkateshan – Senior Software Engineer

Hi, my name is Priya, and I’m a Senior Software Engineer at Diffbot.

I grew up in Bangalore, India, and since everyone around me was getting into programming, I did too, though I secretly wanted to go into medicine or journalism. However, thanks to the professors and peers at NITK Surathkal, I discovered machine learning, got really into it, and was soon trying to learn information retrieval concepts from grainy MIT OpenCourseWare videos. Soon after, I enrolled in UC Irvine’s master’s program in Computer Science and enjoyed its wide variety of cutting-edge courses in machine learning. During my time there, I worked on a thesis involving graphical models for coreference resolution.

My experience in the industry, especially with eCommerce data, has taught me the importance of domain knowledge and scale in making machine learning models useful in a real-world context. It has also given me an interest in best practices for supporting data and models in production. Prior to Diffbot, I worked at eBay, where we crawled product websites to augment the eBay catalog. Diffbot was one of the tools our organization used toward this goal. Having worked on web crawling and data extraction in a specific domain, I find it exciting to work on the generalized versions of those problems at Diffbot, especially since I get to contribute to a state-of-the-art commercial knowledge graph and work with some of the best minds focused on these problems.

In my spare time, I write and edit fiction, and I use technology to make life easier for fiction writers.

Welcome Dimitris Kontokostas – Software Engineer

Hello there! I am Dimitris, and I am a new Software Engineer at Diffbot!

As the name implies, I am Greek. In particular, I am from the northern part of the mainland (far away from the beautiful beaches). I hold a degree in Electronics & Computer Engineering from the Technical University of Crete, an MSc in Web Science from Aristotle University of Thessaloniki, and a PhD on Large-scale Knowledge Extraction & Quality Assessment from the University of Leipzig. Through my MSc and PhD programs, I was exposed to knowledge graphs and became fascinated by the field. I have been working with knowledge graphs for almost 10 years now, and I am still amazed by the new possibilities they can unlock.

The Diffbot Knowledge Graph is really intriguing. It operates at web scale using state-of-the-art research on knowledge extraction, and it extends well beyond the notable entities (e.g., celebrities, established companies) that most knowledge graphs are limited to. I felt this was a perfect match for my research and professional skills and a great challenge to work on. My primary focus at Diffbot will be data quality and knowledge fusion. I will extend existing approaches and create new techniques to further improve and evaluate the accuracy of our knowledge graph.

In my free time, I enjoy spending time with family & friends, playing basketball, running and skiing.

I look forward to my new knowledge graph adventures at Diffbot!

Welcome Julia Wiedmann – Machine Learning Engineer (Research)

 

Hi there! My name is Julia, and I am a Machine Learning Research Engineer at Diffbot.

Born and raised in Vienna, Austria, I moved to London to study Business. While I enjoyed my degree, I realized I was much more interested in technology, so I taught myself enough coding and math to qualify for a master’s degree in Computer Science at University College London (UCL). I recently finished my PhD at the University of Oxford, where I primarily developed new machine learning-based methods for web data extraction that can extract data from new pages at scale and without site-level supervision. My research interests overlap widely with Diffbot’s work, so I interned here last summer building the EventsAPI. The EventsAPI is a new addition to the Automatic APIs that automatically extracts events from the open web. This allows events to be added to the Diffbot Knowledge Graph and lets customers use the API directly. Now, as an excited full-timer, I am focusing on improving and expanding the Automatic API extraction methods.

In my free time, I love all sorts of sports, such as cycling, skiing, hiking, and running.

I look forward to more adventures at Diffbot!

 

Welcome Lorenzo Torres – Technical Sourcer

I received my bachelor’s degree from the University of Southern California (Fight On!) and then became a food truck driver and manager for a waffle dessert company in Pasadena, CA, called Waffles de Liege. After 3 years, I felt that I wasn’t doing anything fulfilling and realized that I wanted to make an impact and a difference in people’s lives. I left the industry and spoke to friends and family about my next career. They mentioned recruiting and how it can change a person’s life, as well as make a difference in the right field. I went to various recruiting networking events and created my position with a startup called FabFitFun via their “create your job” submission. Impressed by my background, they hired me as an intern and then converted me to a full-time Recruiting Coordinator. From there, I spent half a year at Robert Half doing legal recruiting, and then moved to a smaller tech recruiting agency called Strategic Employment Partners for Technical Sourcing. With minimal sourcing knowledge and only 7 months into the company, I received the 2017 Recruiter of the Year Award. I was then transferred to their New York branch, and once again received the Recruiter of the Year Award in 2018. I now find myself, for the first time, in an in-house Technical Sourcer position with an awesome team, product, and company vision I can truly get behind.

Outside of work, I love singing with friends and family, whether covers online or sometimes karaoke (I sang in a choir from age 10 all the way up to 19, and also trained formally). I also enjoy eating a lot (trying new restaurants and cooking), watching WWE/UFC/anything entertaining, playing Dance Dance Revolution (I used to compete), and just having a good ole time with friends and family.

I’m excited to help grow the teams here at Diffbot, and I can’t wait to see what the future holds for all of us.

Welcome Paramita Mirza – Machine Learning Engineer Intern

 

Hello there! I’m Paramita (sounds similar to ‘parameter’) and I’m excited to be part of the Relation Extraction research team at Diffbot.

I received my PhD from the University of Trento/FBK-ICT in spring 2016. Under the supervision of Sara Tonelli, my PhD research focused on extracting temporal and causal relations between events from natural language texts, as part of the NewsReader project. I then joined Gerhard Weikum’s group at the Max Planck Institute for Informatics, where the YAGO (Yet Another Great Ontology) knowledge base was developed, as a postdoc. My research interests have always revolved around information extraction and machine learning approaches to it: making sense of unstructured text and building structured knowledge out of it.

I learned more about Diffbot when I attended ISWC’18 in Monterey and found out that they were steadily on their way to building a gigantic knowledge graph out of the whole web. I got hooked and decided to temporarily escape rainy Germany for sunny California. Here at Diffbot, I’ll be working on extracting temporal qualifiers for the facts extracted to build the Diffbot Knowledge Graph, making sure that we can differentiate fresh facts from historical ones. I would love to hear your ideas and suggestions on how to make that happen!

Turn Existing Customer Data into Fresh Marketing Opportunities with Knowledge Graph

I wanted to use our own tech to show that you can cross-reference your sales data with the 10+ billion entities stored in the Diffbot Knowledge Graph to find marketing opportunities with a little #KnowledgeHack. I wasn’t disappointed with what I found.

Because the Diffbot Knowledge Graph (KG) focuses on people, companies, and location data, I wanted to see how it could help me target the right people with a timely message via one of the major ad platforms like Facebook, AdWords, or LinkedIn.

This “how-to” guide shows you, step by step, how I used the Diffbot Knowledge Graph to explode a few of our best customers’ data into a list of thousands of high-value marketing targets in just a few steps:

  1. Take a small number of existing customers.
  2. Define an Ideal Customer Profile (ICP) based on their common attributes and connections.
  3. Find every person and/or business online who matches that profile.
  4. Analyze those people as a group, and build a marketing campaign with the insights.

Caveats

  1. This is not a silver bullet, and it requires some critical thinking on your part. Following this guide will give you useful data, but it won’t do your marketing for you.
  2. You will need a Diffbot Knowledge Graph (DKG) account to do this. The whole technique revolves around using the vast amount of people and company data stored in the DKG, and its ability to search through their connections to get results.

Step One

Define an ideal customer profile (ICP) for a campaign based on your own customers.

Find a few examples of your best customers.

To find them, simply ask your sales team who the best customers or leads are, or run a report in your CRM to show you your top existing customers.

E.g.: Run an “All Closed Won by Revenue” or, even better, “All Closed Won by LTV” report.

 

That will give you the names and locations of several example people you can use to create a template to find other similar (look-alike) candidates.
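If your CRM can export deals, the "All Closed Won by LTV" report amounts to a filter and a sort. The sketch below assumes a hypothetical export with `customer`, `location`, `stage`, and `lifetime_value` fields. Real CRMs name these differently, so adapt it to your own schema.

```python
from dataclasses import dataclass

# Hypothetical CRM export row; field names are assumptions, not any CRM's real schema.
@dataclass
class Deal:
    customer: str
    location: str
    stage: str
    lifetime_value: float

def top_closed_won_by_ltv(deals, n):
    """Mimic an "All Closed Won by LTV" report: keep won deals, sort by LTV, take the top n."""
    won = [d for d in deals if d.stage == "Closed Won"]
    return sorted(won, key=lambda d: d.lifetime_value, reverse=True)[:n]

deals = [
    Deal("Ana", "Boston", "Closed Won", 48_000),
    Deal("Ben", "Austin", "Closed Lost", 90_000),
    Deal("Cho", "Seattle", "Closed Won", 72_000),
    Deal("Dev", "Denver", "Closed Won", 15_000),
]

# Names and locations of the best customers become the template for look-alikes.
for deal in top_closed_won_by_ltv(deals, 2):
    print(deal.customer, deal.location, deal.lifetime_value)
```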

For this guide, I used made-up existing customers: example profiles I found by searching for “People who are currently employed as ecommerce managers at companies with more than 300 employees.” You can see the query for this example below:

The query above filters for type = Person, a current job title of “ecommerce manager,” and a current employer with more than 300 employees. Don’t worry too much about the search query and how to write one right now; there are lots of guides and documentation during onboarding that show you how easy it is. For now, just imagine it’s like making filters in Excel or Google Sheets.
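To illustrate the "filters in a spreadsheet" analogy, here is the same three-part filter applied to a few local records. This is not DQL and does not call the Knowledge Graph. The record fields (`type`, `current_title`, `employer_size`) are hypothetical stand-ins for the much richer entity schema.

```python
# Hypothetical records; real Knowledge Graph entities have a much richer schema.
records = [
    {"type": "Person", "name": "P1", "current_title": "Ecommerce Manager", "employer_size": 1200},
    {"type": "Person", "name": "P2", "current_title": "Ecommerce Manager", "employer_size": 150},
    {"type": "Person", "name": "P3", "current_title": "Data Scientist", "employer_size": 8000},
    {"type": "Organization", "name": "O1", "employer_size": 500},
]

def matches(record):
    """Spreadsheet-style filter mirroring the example search (not real DQL)."""
    return (
        record["type"] == "Person"                                          # type = Person
        and record.get("current_title", "").lower() == "ecommerce manager"  # current job title
        and record.get("employer_size", 0) > 300                            # employer > 300 employees
    )

hits = [r["name"] for r in records if matches(r)]
print(hits)  # ['P1']
```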

That search gives some results you can substitute in place of actual existing customers.

Step Two

Explode a few prime example customers into thousands of similar potential customers.

Once you have your existing (or made-up, see above) customer profiles, you can find them in the KG with a simple query like this:

And view their information by clicking on their profile from the results:

You will quickly begin to spot commonalities between the profiles. Excuse the crude visualization, but it will look something like this:

In this example, you can see several similarities between your existing customers.

  • Job title
  • Skills
  • Experience
  • Education
  • Industries

And you can do the same with the employers’ profiles, too.

Click through to see the people and employers to compare and contrast for similarities.
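Once you have written down each customer's attributes, finding what they all share is a set intersection. The following sketch assumes you have tagged each example customer with hypothetical `category:value` strings, and it takes the smallest employer size as a shared lower bound. The data is invented for illustration.

```python
# Hypothetical attribute tags for three example customers (illustrative data only).
customers = {
    "Customer A": {"skill:Digital Marketing", "skill:Analytics", "title:Ecommerce Manager", "tech:jQuery", "skill:SEO"},
    "Customer B": {"skill:Digital Marketing", "skill:Analytics", "title:Ecommerce Manager", "tech:jQuery", "tech:React"},
    "Customer C": {"skill:Digital Marketing", "skill:Analytics", "title:Ecommerce Manager", "tech:jQuery"},
}
employer_sizes = {"Customer A": 5200, "Customer B": 12000, "Customer C": 5000}

def common_attributes(profiles):
    """Return the attributes shared by every profile (set intersection)."""
    sets = list(profiles.values())
    return set.intersection(*sets) if sets else set()

common = common_attributes(customers)
min_employer_size = min(employer_sizes.values())  # shared lower bound on company size

print(sorted(common))
print(min_employer_size)
```

Attributes that survive the intersection, plus numeric lower bounds like the employer size, become the criteria for a look-alike search.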

In this case, the companies of the example customers I found have no less than 5,000 employees, and all use jQuery as a front-end technology. At first, that might seem irrelevant, but here comes the good bit…

You can use those common attributes to find more people just like them and create a look-alike audience at web scale. How?

Build a query that looks for those common attributes, like this example:

  • Skills: Digital Marketing, digital strategy, Analytics
  • Current Job titles: ecommerce
  • Past job titles: Manager
  • Locations: Major cities
  • Current Employer Company size: 5,000+
  • Current employer location city size: 100,000+
  • Current Employer Technology Used: jQuery
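The criteria above can be read as one combined predicate. The sketch below applies it to hypothetical local candidate records rather than querying the Knowledge Graph. Two interpretations are assumptions: a candidate needs at least one of the listed skills, and "major cities" is approximated by the employer's city population threshold.

```python
from __future__ import annotations

from dataclasses import dataclass

TARGET_SKILLS = {"digital marketing", "digital strategy", "analytics"}

# Hypothetical candidate record; field names are illustrative assumptions.
@dataclass
class Candidate:
    name: str
    skills: set[str]
    current_title: str
    past_titles: list[str]
    employer_size: int
    employer_city_population: int
    employer_tech: set[str]

def is_lookalike(c: Candidate) -> bool:
    """Check one candidate against the look-alike criteria listed above."""
    return (
        bool({s.lower() for s in c.skills} & TARGET_SKILLS)    # at least one target skill (assumption)
        and "ecommerce" in c.current_title.lower()             # current title mentions ecommerce
        and any("manager" in t.lower() for t in c.past_titles) # a past "Manager" title
        and c.employer_size >= 5_000                           # employer has 5,000+ employees
        and c.employer_city_population >= 100_000              # employer city has 100,000+ people
        and "jquery" in {t.lower() for t in c.employer_tech}   # employer uses jQuery
    )

candidates = [
    Candidate("Lead 1", {"Digital Marketing", "SEO"}, "Head of Ecommerce", ["Marketing Manager"], 12_000, 2_700_000, {"jQuery", "React"}),
    Candidate("Lead 2", {"Analytics"}, "Ecommerce Director", ["Sales Manager"], 8_000, 900_000, {"Vue"}),
    Candidate("Lead 3", {"Digital Strategy"}, "Ecommerce Lead", ["Product Manager"], 1_200, 500_000, {"jQuery"}),
]

print([c.name for c in candidates if is_lookalike(c)])
```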

Hooray! At the time of writing, that query returns 2,363 people.

That is a list of all the people who are a good likeness for your Ideal Customer Profile. Perfect! Of course, you will need to check the data and remove any people who don’t meet your particular needs, but in general, you have a great dataset to start working with.

How to use that information?

Any good salesperson or marketer will know several ways to use that data to generate demand and leads from that market.

  1. You can use their social media information to reach out to them with a tweet or message.
  2. You can target ads at these people and organizations via LinkedIn, Facebook, and other platforms.
  3. You can use other data enrichment tools, such as Pipl, to learn even more about those people.
  4. You can invite them to your events, webinars, and other engagement platforms.

But what to say to them?

In this case, we know the following about them:

  • They work in large organizations in major cities.
  • They work in management roles in and around digital marketing.
  • Their employers often use jQuery and other similar front-end technologies.
  • Your existing customers’ use cases are likely to be relevant.

For Diffbot, that may well mean that we:

  • Write a “how to” blog post about how to use Diffbot to help them do something cool in marketing.
  • Sponsor and/or attend local events about digital marketing, and evangelize our Knowledge Graph in the context of their needs.

However, I wanted to take it a step further and learn more about these people using the Knowledge Graph to build a better picture of the market. To do that, I started segmenting and grouping the data using some advanced Knowledge Graph features.

Bonus Step Four

Analyze the group of people who match my ICP for further insights.

Here are some basic things you can learn:

“Who are the companies that currently employ this type of person the most?”

“What are the descriptors of companies that currently employ this type of person the most?”

“What is the gender split of this type of person?”

“What is the location split of this type of person?”
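Each of these questions is a grouped count over the matched people. The following minimal sketch uses `collections.Counter` on hypothetical records. The field names (`employer`, `employer_industry`, `gender`, `city`) are assumptions, and the Knowledge Graph's own aggregation features are not shown here.

```python
from collections import Counter

# Hypothetical matched people; fields are illustrative assumptions.
matched_people = [
    {"employer": "Retailer A", "employer_industry": "Retail", "gender": "female", "city": "New York"},
    {"employer": "Retailer A", "employer_industry": "Retail", "gender": "male", "city": "London"},
    {"employer": "Bank B", "employer_industry": "Finance", "gender": "female", "city": "New York"},
]

def split(records, field):
    """Share of records per value of `field`, most common first."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return [(value, count / total) for value, count in counts.most_common()]

# One line per question: top employers, employer descriptors, gender split, location split.
for field in ("employer", "employer_industry", "gender", "city"):
    print(field, split(matched_people, field))
```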

Now you’re armed with Data.

Now that you are armed with the data you need, you can tailor your marketing activity to match the audience gender, location, and employer type. And don’t forget, you have a list of 2,300+ leads from earlier in the process.

Off the back of this research, we are now considering how we can target those customers with some interesting, intelligent, and high-value marketing activity: perhaps joining digital marketing and ecommerce hackathons in those locations. Perhaps writing some API script templates in jQuery? Perhaps simply answering questions on Stack Overflow relating to marketing and ecommerce data!

Rinse and repeat for your different customer segments, and you will have all the insights you need to grow your business.

Try this technique for yourself

To try this technique for yourself, you need access to the Knowledge Graph, which you can request here. If you have any questions, please leave a comment below.

Introducing the Diffbot Knowledge Graph

Meet the largest database of human knowledge ever created: Diffbot Knowledge Graph

Diffbot is pleased to announce the launch of a new product: Diffbot Knowledge Graph.

What is the Knowledge Graph?

Eight years ago, Diffbot revolutionized web data extraction with AI data extractors (AI:X). Now, Diffbot is set to disrupt how businesses interact with data from the web again with the all-new DKG (Diffbot Knowledge Graph).

“What we’ve built is the first Knowledge Graph that organizations can use to access the full breadth of information contained on the Web. Unlocking that data and giving organizations instant access to those deep connections completely changes knowledge-based work as we know it.”

– Mike Tung, founder and CEO of Diffbot.

Unlocking knowledge from the Web

Ever wished there was a search engine that gave you answers to your questions with data, rather than a list of links to URLs?

Using our trademark combination of machine learning and computer vision, the DKG is curated by AI and built for the enterprise, unlocking the entire Web as a source of searchable data. The DKG is a graph database of over 10 billion connected entities (people, companies, products, articles, and discussions) covering more than 1 trillion facts!

In contrast to other solutions marketed as Knowledge Graphs, the DKG is:

  • Fully autonomous and curated using Artificial Intelligence, unlike other knowledge graphs which are only partially autonomous and largely curated through manual labor.
  • Built specifically to provide knowledge as the end product, paid for and owned by the customer. No other company makes this available to their customers, as other knowledge graphs have been built to support ad-based search engine business models.
  • Web-wide, regardless of originating language. Diffbot technology can extract, understand, and make searchable any information in French, Chinese, and Cyrillic-script languages just as easily as in English.
  • Constantly rebuilt, from scratch, which is critical to the business value of the DKG. This rebuilding process ensures that DKG data is fresh, accurate, and comprehensive.

Why?

A Web-wide, comprehensive, and interconnected knowledge graph has the power to transform how enterprises do business. In our vision of the future, human beings won’t spend time sifting through mountains of data trying to determine what’s true. AI is so much better at doing that.

Right now, 30 percent of a knowledge worker’s job is data gathering. There’s a big opportunity in the market for a horizontal knowledge graph: a database of information about people, businesses, and things. Other knowledge graphs are little more than restructured Wikipedia facts with the simplest, narrowest connections drawn between them. We knew we could do better, so we’re building the first comprehensive map of human knowledge by analyzing every page on the Internet.

Knowledge is needed for AI

The other reason we’re building the DKG is to enable the next generation of AI to understand the relationships between the entities in the world it represents. True AI needs the ability to make informed decisions based on deep understanding and knowledge of how entities and concepts are linked together.

We’ve already seen some fantastic research from universities and industry built on top of the DKG, including a particularly impressive state-of-the-art question-answering (Q&A) AI.

Evolution from Data to Knowledge

There is a subtle but pivotal difference between data and knowledge. While data helps many businesses, knowledge has the power to be transformative for any business.

Define “Data”:

Facts and statistics collected together for reference or analysis.

Define “Knowledge”:

Facts, information, and skills acquired through experience or education; the theoretical or practical understanding of a subject.

– Oxford Dictionary

The key to the DKG’s value is how it encompasses the whole Web, how it joins together all the data points from many sources into individual entities, and, importantly, how it then connects those entities together according to their relationships.
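The two ideas in that paragraph, fusing records into entities and then linking the entities, can be sketched in a few lines. This is a deliberately naive illustration built on hypothetical records. It matches entities by normalized name and keeps the first value seen for each field, whereas real entity resolution and knowledge fusion (including Diffbot's) are far more sophisticated.

```python
from collections import defaultdict

# Hypothetical records about the same real-world entities, extracted from different pages.
records = [
    {"source": "page1", "name": "Jane Doe", "title": "CTO", "employer": "Acme Corp"},
    {"source": "page2", "name": "jane doe ", "city": "Austin"},
    {"source": "page3", "name": "Acme Corp", "employees": 5200},
]

def normalize(name):
    """Naive entity key: lowercase with collapsed whitespace (simplifying assumption)."""
    return " ".join(name.lower().split())

def fuse(records):
    """Merge records that share a key into one entity; the first value per field wins."""
    entities = defaultdict(dict)
    for rec in records:
        entity = entities[normalize(rec["name"])]
        entity.setdefault("sources", []).append(rec["source"])
        entity.setdefault("name", rec["name"].strip())
        for field, value in rec.items():
            if field not in ("source", "name"):
                entity.setdefault(field, value)
    return dict(entities)

def link(entities):
    """Connect entities: a person's 'employer' field becomes an edge to that organization."""
    edges = []
    for key, entity in entities.items():
        employer = entity.get("employer")
        if employer and normalize(employer) in entities:
            edges.append((key, "employedBy", normalize(employer)))
    return edges

entities = fuse(records)
print(link(entities))
```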

By building a practical contextual understanding of all data online, the DKG is able to answer complex questions like “How many people with the skill ‘Java’ who used to work at IBM in a junior role now work at Facebook as senior managers?” by providing you with a number and a list of people who meet the criteria.
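To show why such a question needs connected employment histories rather than flat profiles, here is the same question evaluated over a few hypothetical person records. This is not how DQL expresses it. The `Person` and `Job` shapes and the title-matching rules are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical shapes for people and their employment history.
@dataclass
class Job:
    employer: str
    title: str
    is_current: bool

@dataclass
class Person:
    name: str
    skills: set[str]
    jobs: list[Job]

def answers_question(p: Person) -> bool:
    """Java skill + a past junior role at IBM + a current senior manager role at Facebook."""
    has_java = "java" in {s.lower() for s in p.skills}
    was_junior_at_ibm = any(
        j.employer == "IBM" and not j.is_current and "junior" in j.title.lower() for j in p.jobs
    )
    is_senior_manager_at_fb = any(
        j.employer == "Facebook" and j.is_current and "senior manager" in j.title.lower() for j in p.jobs
    )
    return has_java and was_junior_at_ibm and is_senior_manager_at_fb

people = [
    Person("Alex", {"Java", "SQL"}, [Job("IBM", "Junior Developer", False), Job("Facebook", "Senior Manager, Engineering", True)]),
    Person("Blair", {"Python"}, [Job("IBM", "Junior Analyst", False), Job("Facebook", "Senior Manager", True)]),
    Person("Casey", {"Java"}, [Job("IBM", "Senior Architect", False), Job("Facebook", "Senior Manager", True)]),
    Person("Drew", {"java"}, [Job("IBM", "Junior Engineer", False), Job("Facebook", "Senior Manager", False)]),
]

answer = [p.name for p in people if answers_question(p)]
print(len(answer), answer)  # the number plus the list of matching people
```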

To access the DKG, Diffbot created a search query language called Diffbot Query Language (DQL). It’s flexible enough to let you perform granular searches to find the one exact piece of information you need out of the trillions, or to gather massive datasets for broad analysis. DQL has all the tools you need to access the world’s largest knowledge source with highly accurate, precise searches.

Ready to Use Now

Now, any business that wants instant access to all of the world’s knowledge can simply sign up for the DKG and turn the entire Web into their personal database for business intelligence across:

  • People: skills, employment history, education, social profiles
  • Companies: rich profiles of companies and the workforce globally, from Fortune 500 to SMBs
  • Locations: mapping data, addresses, business types, zoning information
  • Articles: every news article, dateline, byline from anywhere on the Web, in any language
  • Products: pricing, specifications, and reviews for every SKU across major ecommerce engines and individual retailers
  • Discussions: chats, social sharing, and conversations everywhere from article comments to web forums like Reddit
  • Images: billions of images on the web organized using image recognition and metadata collection

Want to learn more about the Diffbot Knowledge Graph?


Knowledge Graph in the Press