Converting text documents into knowledge graphs with the Diffbot Natural Language API

Most of the world’s knowledge is encoded in natural language (e.g., news articles, books, emails, academic papers). It is estimated that 80 percent of business-relevant information originates in unstructured form, primarily text. However, the ambiguous nature of human communication makes it difficult for software engineers and data scientists to leverage this information in their applications.

After years of research, we are proud to announce the Diffbot Natural Language API, a new product to help businesses convert their text documents into knowledge graphs. Knowledge graphs represent information about real-world entities (e.g., people, organizations, products, articles) via their relationships with other entities (e.g., founded by, educated at, was mentioned in). This is the same production-grade technology that we use to build the world’s largest knowledge graph from the web, and we are making it available to all.
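
To make the data model concrete, here is a minimal Python sketch of a knowledge graph stored as (entity, relationship, entity) edges. It only illustrates the idea described above. It is not the Diffbot Natural Language API or Diffbot's internal representation, and the `KnowledgeGraph` class and its methods are hypothetical. The two sample facts come from posts elsewhere on this page.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy knowledge graph (hypothetical): entities linked by named relationships."""

    def __init__(self) -> None:
        # Maps each subject entity to its outgoing (relationship, object) edges.
        self._edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, relationship: str, obj: str) -> None:
        """Record the fact (subject, relationship, obj) as a directed edge."""
        self._edges[subject].append((relationship, obj))

    def objects(self, subject: str, relationship: str) -> list[str]:
        """Entities reached from `subject` via `relationship`."""
        return [o for r, o in self._edges.get(subject, []) if r == relationship]

    def subjects(self, relationship: str, obj: str) -> list[str]:
        """Reverse lookup: entities whose `relationship` edge points at `obj`."""
        return [
            s
            for s, edges in self._edges.items()
            for r, o in edges
            if r == relationship and o == obj
        ]


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add("Diffbot", "founded by", "Mike Tung")
    kg.add("Ariadne Caldwell", "educated at", "San Francisco State University")
    print(kg.objects("Diffbot", "founded by"))  # ['Mike Tung']
    print(kg.subjects("educated at", "San Francisco State University"))  # ['Ariadne Caldwell']
```

Keying edges by subject makes forward lookups cheap. The reverse lookup scans every edge, so a production-scale graph would index both directions.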

Welcome Huzail Ssemakula – Technical Recruiter

Hello everyone,

My name is Huzail Ssemakula (pronounced who-zail semakula). I will be joining Diffbot as a technical recruiter on our amazing recruiting team.

My recruiting journey started at Amazon, where I helped grow several teams, including Amazon Prime and Alexa. Since then, my career has taken me to Google and Cruise Automation, where I focused on working with teams solving machine learning problems. I am very impressed with what Diffbot has achieved with such a small group of talented people, and I look forward to helping the company achieve its mission by bringing more talented individuals into the Diffbot family.

On a personal note, I enjoy spending time outdoors with my family. We live in the Seattle area, surrounded by nature and parks. I’m also huge on sports, with soccer/futbol being my number one. Fun facts: I’m left-handed, I was born and raised in Uganda, and I speak several languages.

I can’t wait to meet you all in person once the world is back in order 🙂

Welcome Ariadne Caldwell – Executive Assistant to Diffbot’s CEO and Founder, Mike Tung

Hi everyone, I’m Ariadne Caldwell. I recently joined Diffbot as the Executive Assistant to CEO Mike Tung. For the past five years, I have supported C-level and high-profile executives across industries such as SaaS, real estate, and food & hospitality. I’m passionate about supporting teams that solve complex problems with industry-leading solutions.

I love working on special projects and company initiatives. In my previous roles, I led social media strategy, created and ran a podcast, managed recruitment processes, edited and produced videos, designed brand collateral, and took on other tasks outside the typical Executive Assistant scope of work.

My goals are to provide proactive and strategic administrative support across the organization. I believe relationship building is key to forming an inclusive and welcoming company culture.

Born and raised in the Bay Area, I am a San Francisco State University graduate with a Bachelor of Science in Business Administration – International Business. 

I enjoy traveling, scuba diving, writing, reading, and spending time with my family. 

Very excited to be a new member of the Diffbot team! 

Welcome Ondrej Pacovsky – Machine Learning Engineer

Hi there! I am Ondřej Pacovský, from the mighty Czech Republic. I have just started as a Senior Machine Learning Engineer in the research group.

A little bit about myself: I earned my first money writing software when I was 14 and decided to focus my efforts on making computers smarter, that is, on artificial intelligence. After graduating from Charles University and the University of Sussex, I worked in game development as an AI expert and also as a lead developer on the side. I then joined Google and worked on various machine learning projects, most notably Gmail Priority Inbox and the Google Knowledge Graph. Later, I returned to Prague to co-found Eyen, a company specializing in cryo-electron microscopy data analysis and special-purpose GPU development.

With Eyen running beautifully on its own, I was looking for a meaningful opportunity to push the boundaries of AI. Diffbot’s mission fits that goal perfectly. In fact, what we’re doing here was my dream as a boy, when I first started thinking about intelligent computers: a machine that learns about the world just by observing it.

I greatly enjoy teaching my own small biological brains to be smarter than me. I play ice hockey, soccer, tennis, and squash, and I have climbed a 6,000 m mountain. I also developed and installed my own smart home and enjoy making wood furniture.

Welcome Steve Peterson – Enterprise Account Executive

Hi everyone, I’m Steve Peterson. I’ve just joined Aron’s team to help with enterprise sales to Global 2000 accounts. Having worked with big data and analytics tools at Oracle and IBM, I’m excited to join a company whose vision is to distill the entire web into a structured format that businesses can easily incorporate into their data pipelines. This is a very exciting place to be! I’m working on helping Diffbot become an essential piece of the data landscape for billion-dollar companies in areas like PR, marketing, sales, and even M&A. Most companies had never heard of ZoomInfo three years ago; now it’s table stakes for most sales orgs. I see a similar future for Diffbot, but across many parts of the enterprise data stack.

I’m based in the northern part of Phoenix, just past the first set of mountains on the way to Flagstaff. My wife, Cari, and I have five kids between us and five grandkids, with one more on the way (we’re thankful she had some kids early in life!). We enjoy concerts, the pool, camping, and boating. I take my home office buddy Dallas (the sweetest dog you’ve ever met, a grey pit bull) with me on desert walks almost every morning.

KnowledgeNet: A Benchmark for Knowledge Base Population

EMNLP 2019 paper, dataset, leaderboard, and code

Knowledge bases (also known as knowledge graphs or ontologies) are valuable resources for developing intelligent applications, including search, question answering, and recommendation systems. However, high-quality knowledge bases still rely mostly on structured data curated by humans. This reliance on human curation is a major obstacle to creating comprehensive, always-up-to-date knowledge bases such as the Diffbot Knowledge Graph.

The problem of automatically augmenting a knowledge base with facts expressed in natural language is known as Knowledge Base Population (KBP). This problem has been extensively studied in the last couple of decades; however, progress has been slow in part because of the lack of benchmark datasets. 

KnowledgeNet is a benchmark dataset for populating Wikidata with facts expressed in natural language on the web. Facts are of the form (subject; property; object), where subject and object are linked to Wikidata. For instance, the dataset contains text expressing the fact (Gennaro Basile; RESIDENCE; Moravia), in the passage:

“Gennaro Basile was an Italian painter, born in Naples but active in the German-speaking countries. He settled at Brunn, in Moravia, and lived about 1756…”

KBP has mainly been evaluated through the annual Text Analysis Conference (TAC) evaluations. TAC evaluations are performed manually and are hard to reproduce for new systems. In contrast, KnowledgeNet provides an automated, reproducible way to evaluate KBP systems at any time, rather than once a year. We hope a faster evaluation cycle will accelerate the rate of improvement for KBP.
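
The kind of automated scoring that makes this possible can be sketched in a few lines of Python: treat each fact as a (subject; property; object) triple and compare a system's predicted facts with the gold facts using precision, recall, and F1. This is a simplified illustration that assumes exact matching of already-linked triples. The official KnowledgeNet evaluation may differ in details such as how mentions and links are matched, and the `Fact` type, the `score` function, and the incorrect `Naples` prediction below are hypothetical.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """A fact of the form (subject; property; object)."""
    subject: str
    prop: str
    obj: str


def score(predicted: set[Fact], gold: set[Fact]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted facts, using exact triple matching."""
    correct = len(predicted & gold)  # facts the system got exactly right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall (0 when nothing is correct).
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {Fact("Gennaro Basile", "RESIDENCE", "Moravia")}
    predicted = {
        Fact("Gennaro Basile", "RESIDENCE", "Moravia"),
        # Hypothetical error: the passage says he was born in Naples, not that he lived there.
        Fact("Gennaro Basile", "RESIDENCE", "Naples"),
    }
    print(score(predicted, gold))  # precision 0.5, recall 1.0, F1 about 0.667
```

Because the comparison is mechanical, a system can be rescored against the same gold facts every time it changes, which is what allows evaluation at any time rather than once a year.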

Please refer to our EMNLP 2019 paper for details on KnowledgeNet; here are some takeaways:

  • State-of-the-art models (using BERT) are far from achieving human performance (0.504 vs 0.822).
  • The traditional pipeline approach for this problem is severely limited by error propagation.
  • KnowledgeNet enables the development of end-to-end systems, which are a promising solution for addressing error propagation.

The State of Donald Trump’s Media

As we hurtle toward the end of 2019 and, just as inevitably, another election cycle here in the US, we gave Diffy a special mission using the Diffbot Knowledge Graph: analyzing our global obsession with President Donald Trump.

While the most important takeaway is almost certainly that President Trump gets plenty of headlines, the results of our analysis are no less newsworthy in their own right.

After poring over more than 158 million stories published globally in 2019, we discovered:

  • China was, by far, the largest distributor of news in 2019, producing nearly 13.5 million stories this year (in total, not all about Donald Trump), with the United States media a “close” second at just over 10.6 million stories.
  • Hong Kong dominated the news in China, while Donald Trump captured the most US headlines and, surprisingly, the most Russian headlines as well (surpassing even Vladimir Putin).
  • Germany shared the US obsession with Trump, publishing more than 90,000 stories about the President in 2019.
  • Ukraine, impeachment, and Joe Biden appeared alongside Trump in more 2019 stories than any other topics.
  • President Trump enjoyed more than 15x the media coverage of his nearest Democratic opponent.

Check out the full report below and share it.

Can I Access All Google Knowledge Graph Data Through the Google Knowledge Graph Search API?

The Google Knowledge Graph is one of the most recognizable sources of contextually-linked facts on people, books, organizations, events, and more. 

Access to all of this information, including how each knowledge graph entity is linked, could be a boon to many services and applications. To that end, Google offers the Knowledge Graph Search API.

While at first glance this may seem to be your golden ticket to Google’s Knowledge Graph data, think again. 
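
One way to see what the API offers is to look at what a request actually involves. The Python sketch below builds a query against the public `entities:search` endpoint and pulls entity names and relevance scores out of the JSON-LD response (`itemListElement`, `result`, `resultScore`). As far as Google's public documentation describes, each hit is a short summary of one matching entity (name, types, description) rather than the edges linking it to other entities. The helper function names are our own, `api_key` is assumed to be a key you obtain from Google Cloud, and the live call needs network access.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the Google Knowledge Graph Search API.
KG_SEARCH_URL = "https://kgsearch.googleapis.com/v1/entities:search"


def build_search_url(query: str, api_key: str, limit: int = 10) -> str:
    """Build a request URL for entities matching `query` (hypothetical helper)."""
    params = {"query": query, "key": api_key, "limit": limit}
    return f"{KG_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def summarize(response: dict) -> list[tuple[str, float]]:
    """Extract (name, resultScore) pairs from a JSON-LD search response.

    Each hit describes a single entity; no relationships to other
    entities are read here because the hits do not list them.
    """
    return [
        (item["result"].get("name", ""), item.get("resultScore", 0.0))
        for item in response.get("itemListElement", [])
    ]


def search(query: str, api_key: str, limit: int = 10) -> list[tuple[str, float]]:
    """Perform the live HTTP request and summarize the results."""
    with urllib.request.urlopen(build_search_url(query, api_key, limit)) as resp:
        return summarize(json.load(resp))
```

Calling `search("Diffbot", api_key)` returns a ranked list of matching entities; reconstructing how those entities connect to one another would take many further queries and still would not expose the underlying graph.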

Analyzing the EU – Data, AI, and Development Skills Report 2019

Given how popular our 2019 Machine Learning report turned out to be with our community, we wanted to revisit the topic with both a narrower geography and a broader set of questions.

For this report, we focused on the EU. With Brexit still looming large, we looked at the EU (Britain included) to see how AI-related skills break down across the Union: Who has the most talent? Who produces the most talent per capita? Which countries have the most equitable gender split?

Click through the full report below to find out more…