Welcome Huzail Ssemakula – Technical Recruiter

Hello everyone,

My name is Huzail Ssemakula (pronounced who-zail semakula). I will be joining Diffbot as a technical recruiter on our amazing recruiting team.

My recruiting journey started at Amazon, where I helped grow several teams such as Amazon Prime and the Alexa technologies. Since then, my career has taken me to Google and Cruise Automation, where I focused on working with teams solving machine learning problems. I am very impressed with what Diffbot has achieved with such a small group of talented people. I look forward to helping the company achieve its mission by bringing more talented individuals into the Diffbot family.

On a personal note, I enjoy spending time with my family doing outdoor activities. We live in the Seattle area, surrounded by nature and parks. I’m also huge on sports, with soccer/futbol being my number one. Fun facts: I’m left-handed, I was born and raised in Uganda, and I can speak several languages.

I can’t wait to meet you all in person once the world is back in order 🙂


Welcome Ariadne Caldwell – Executive Assistant to Diffbot’s CEO and Founder, Mike Tung

Hi everyone, I’m Ariadne Caldwell. I recently joined Diffbot as the Executive Assistant to CEO Mike Tung. For the past five years, I have supported C-level and high-profile executives across industries such as SaaS, real estate, and food and hospitality. I’m passionate about supporting teams who solve complex problems with industry-leading solutions.

I love working on special projects and company initiatives. In my previous roles I have led social media strategy, created and produced a podcast, managed recruitment processes, edited and produced videos, designed brand collateral, and taken on other tasks that go beyond the typical Executive Assistant scope of work.

My goals are to provide proactive and strategic administrative support across the organization. I believe relationship building is key to forming an inclusive and welcoming company culture.

Born and raised in the Bay Area, I am a San Francisco State University graduate with a Bachelor of Science in Business Administration – International Business. 

I enjoy traveling, scuba diving, writing, reading, and spending time with my family. 

Very excited to be a new member of the Diffbot team! 


Welcome Ondrej Pacovsky – Machine Learning Engineer

Hi there! I am Ondřej Pacovský, from the mighty Czech Republic. I have just started as a Senior Machine Learning Engineer in the research group.

A little bit about myself – I made my first cash writing software when I was 14 and decided to focus my endeavors on making computers smarter, that is, on artificial intelligence. After graduating from Charles University and Sussex University, I started working in game development as an AI expert and also a lead developer on the side. I then joined Google and worked on various machine learning projects, most notably Gmail Priority Inbox and the Google Knowledge Graph. I returned to Prague to co-found Eyen, a company specializing in cryo-electron microscopy data analysis and special-purpose GPU development.

With Eyen working beautifully on its own, I was looking for a meaningful opportunity to push the boundaries of AI. Diffbot’s mission fits that goal perfectly – in fact, what we’re doing here was my initial dream when I was thinking about intelligent computers as a boy: a machine that learns about the world just by observing it.

I greatly enjoy teaching my own small biological brains to be smarter than me. I play ice hockey, soccer, tennis, squash and have climbed a 6000m mountain. I also developed and installed my own smart home and enjoy making wood furniture.

 


Welcome Steve Peterson – Enterprise Account Executive

Hi everyone, I’m Steve Peterson. I’ve just joined Aron’s team to help with Enterprise Sales to Global 2000 accounts. Having had some experience with Big Data and Analytics tools at Oracle and IBM, I’m excited to join a company with a vision to distill the entire Web into a structured format that businesses can easily incorporate into their data pipelines. This is a very exciting place to be! I’m working on helping Diffbot become an essential piece of the data landscape for billion-dollar companies in areas like PR, Marketing, Sales, and even M&A. Most companies had never heard of zoom.info three years ago; now it’s table stakes for most Sales orgs. I see a similar future for Diffbot, but in many parts of the Enterprise Data Stack.

I’m based in the north part of Phoenix, just past the first set of mountains heading to Flagstaff. My wife, Cari, and I have 5 kids between us, and 5 grandkids, with one on the way (we’re thankful she had some kids early in life!). We enjoy concerts, the pool, camping and boating. I take my home office buddy Dallas (a grey pit bull, and the sweetest dog you’ve ever met) with me on desert walks almost every morning.

 


KnowledgeNet: A Benchmark for Knowledge Base Population

EMNLP 2019 paper, dataset, leaderboard and code

Knowledge bases (also known as knowledge graphs or ontologies) are valuable resources for developing intelligent applications, including search, question answering, and recommendation systems. However, high-quality knowledge bases still mostly rely on structured data curated by humans. Such reliance on human curation is a major obstacle to the creation of comprehensive, always-up-to-date knowledge bases such as the Diffbot Knowledge Graph.

The problem of automatically augmenting a knowledge base with facts expressed in natural language is known as Knowledge Base Population (KBP). This problem has been extensively studied in the last couple of decades; however, progress has been slow in part because of the lack of benchmark datasets. 

 


 

KnowledgeNet is a benchmark dataset for populating Wikidata with facts expressed in natural language on the web. Facts are of the form (subject; property; object), where subject and object are linked to Wikidata. For instance, the dataset contains text expressing the fact (Gennaro Basile; RESIDENCE; Moravia), in the passage:

“Gennaro Basile was an Italian painter, born in Naples but active in the German-speaking countries. He settled at Brunn, in Moravia, and lived about 1756…”
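To make the (subject; property; object) form concrete, here is a minimal Python sketch that parses the example fact into a small record. The `Fact` type and `parse_fact` helper are hypothetical names chosen for illustration; they are not part of the KnowledgeNet dataset format or code, and the Wikidata links are left out because the text above does not give them.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One fact in (subject; property; object) form (hypothetical type)."""
    subject: str
    prop: str
    obj: str


def parse_fact(text: str) -> Fact:
    """Parse a string such as '(A; PROPERTY; B)' into a Fact."""
    stripped = text.strip()
    if not (stripped.startswith("(") and stripped.endswith(")")):
        raise ValueError("expected a fact wrapped in parentheses")
    # Drop the parentheses and split on the ';' separators.
    parts = [part.strip() for part in stripped[1:-1].split(";")]
    if len(parts) != 3:
        raise ValueError("expected exactly three ';'-separated fields")
    return Fact(*parts)


fact = parse_fact("(Gennaro Basile; RESIDENCE; Moravia)")
print(fact.subject, fact.prop, fact.obj)
```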

KBP has been mainly evaluated via annual contests promoted by TAC. TAC evaluations are performed manually and are hard to reproduce for new systems. Unlike TAC, KnowledgeNet employs an automated and reproducible way to evaluate KBP systems at any time, rather than once a year. We hope a faster evaluation cycle will accelerate the rate of improvement for KBP.

Please refer to our EMNLP 2019 Paper for details on KnowledgeNet, but here are some takeaways:

  • State-of-the-art models (using BERT) are far from achieving human performance (an F1 score of 0.504 vs. 0.822).
  • The traditional pipeline approach for this problem is severely limited by error propagation.
  • KnowledgeNet enables the development of end-to-end systems, which are a promising solution for addressing error propagation.


The State of Donald Trump’s Media

As we hurtle towards the end of 2019 and, just as inevitably, another election cycle here in the US, we decided to task Diffy with a special mission using the Diffbot Knowledge Graph: analyzing our global obsession with President Donald Trump.

While the most important takeaway is almost certainly that President Trump gets plenty of headlines, the results of our analysis are no less newsworthy in their own right.

After poring over more than 158 million stories published globally in 2019, we discovered:

  • China was, by far, the largest distributor of news in 2019, producing nearly 13.5 million stories this year (in total, not all about Donald Trump), with the United States media a “close” second at just over 10.6 million stories.
  • Hong Kong dominated the news in China, while Donald Trump captured the most US headlines and, surprisingly, the most Russian headlines as well (surpassing even Vladimir Putin).
  • Germany shared the US obsession with Trump, writing more than 90k stories about the President in 2019
  • Ukraine, Impeachment, and Joe Biden all shared the most stories with Trump in 2019
  • President Trump enjoyed more than 15x the media coverage of his nearest Democratic opponent.

Check out the full report below and share.

 


Can I Access All Google Knowledge Graph Data Through the Google Knowledge Graph Search API?

The Google Knowledge Graph is one of the most recognizable sources of contextually-linked facts on people, books, organizations, events, and more. 

Access to all of this information — including how each knowledge graph entity is linked — could be a boon to many services and applications. On this front Google has developed the Knowledge Graph Search API.

While at first glance this may seem to be your golden ticket to Google’s Knowledge Graph data, think again. 

Analyzing the EU – Data, AI, and Development Skills Report 2019

 

Given how popular our 2019 Machine Learning report turned out to be with our community, we wanted to revisit the question both with a more specific geography and a broader set of questions.

For this report, we focused on the EU. With Brexit still looming large, we took a look at the EU (Britain included) to see what the breakdown of AI-related skills looked like in the Union: who has the most talent? Who produces the most talent per capita? Which countries have the most equitable gender split?

Click through the full report below to find out more…

 


Turn Existing Customer Data into Fresh Marketing Opportunities with Knowledge Graph

I wanted to use our own tech and show that you can cross-reference your sales data with the 10+ billion entities stored in Diffbot Knowledge Graph, to find marketing opportunities with a little #KnowledgeHack. I wasn’t disappointed with what I found.

Because the Diffbot Knowledge Graph (KG) focuses on people, companies, and location data, I wanted to see how it could help me target the right people with a timely message via one of the major ad platforms like Facebook, AdWords, or LinkedIn.

This “how-to” guide shows you, step by step, how I used the Diffbot Knowledge Graph to explode a few of our best customers’ data into a list of thousands of high-value marketing targets in just a few steps:

  1. Take a small number of existing customers.
  2. Define an Ideal Customer Profile (ICP) based on their common attributes and connections.
  3. Find every person and/or business online who matches that profile.
  4. Analyze those people as a group, and build a marketing campaign with the insights.

Caveats

  1. This is not a silver bullet and requires some critical thinking on your part. Following this guide will give you useful data, but it won’t do your marketing for you.
  2. You will need a Diffbot Knowledge Graph (DKG) account to do this. The whole technique revolves around using the vast amount of people and company data stored in the DKG, and its ability to search through their connections to get results.

Step One

Define an ideal customer profile (ICP) for a campaign based on your own customers.

Find a few examples of your best customers.

To find them, simply ask your sales team who the best customers or leads are, or run a report in your CRM to show you your top existing customers.

E.g.: Run an “All Closed Won by Revenue” or, even better, “All Closed Won by LTV” report.

 

That will give you the names and locations of several example people you can use to create a template to find other similar (look-alike) candidates.

For this guide, I decided to use made-up customers in place of real ones: example profiles I found by searching for “People who are currently employed as ecommerce managers at companies with more than 300 employees.” You can see the query for this example below:

The query above basically filters for type=person, current employment job title = “ecommerce manager,” and their current employer has more than 300 employees. Don’t worry too much about the search query and how to make those right now; there are lots of guides and documentation during onboarding that show you how easy it is. For now, just imagine it’s like making filters in Excel or Google Sheets.
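If the spreadsheet-filter analogy helps, the same three conditions can be sketched in plain Python over a handful of in-memory records. This is only an illustration of the filter logic, not the Knowledge Graph query syntax: the records, the field names (`type`, `current_title`, `employer_size`), and the `matches_profile` helper are all made up for this example.

```python
# Made-up records standing in for search results; field names are hypothetical.
people = [
    {"type": "person", "name": "Person A",
     "current_title": "Ecommerce Manager", "employer_size": 1200},
    {"type": "person", "name": "Person B",
     "current_title": "Ecommerce Manager", "employer_size": 150},
    {"type": "person", "name": "Person C",
     "current_title": "Data Engineer", "employer_size": 8000},
    {"type": "organization", "name": "Company D",
     "current_title": "", "employer_size": 0},
]


def matches_profile(record, title="ecommerce manager", min_employees=300):
    """Apply the three filters: type, current job title, employer size."""
    return (
        record["type"] == "person"
        and record["current_title"].lower() == title
        and record["employer_size"] > min_employees
    )


example_customers = [p["name"] for p in people if matches_profile(p)]
print(example_customers)
```

Each condition maps to one filter column, so adding or loosening a criterion is a one-line change.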

That search gives some results you can substitute in place of actual existing customers.

Step Two

Explode a few prime example customers into thousands of similar potential customers.

Once you have your existing (or made up — see above) customer profiles, you can find them in the KG with a simple query like this:

And view their information by clicking on their profile from the results:

You will quickly begin to spot commonalities between the profiles. Excuse the crude visualization, but it will look something like this:

In this example, you can see several similarities between your existing customers.

  • Job title
  • Skills
  • Experience
  • Education
  • Industries

And you can do the same with the employer’s profiles, too.

Click through to see the people and employers to compare and contrast for similarities.
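Spotting the overlap by eye works for a few profiles; for more, a rough sketch like the one below can do the comparison. It assumes you have copied each profile's attributes into a dictionary of sets by hand. The sample values and the `common_values` helper are hypothetical and not something the Knowledge Graph returns in this shape.

```python
# Hand-copied attributes from three example profiles (made-up values).
profiles = [
    {"skills": {"Digital Marketing", "Analytics", "SEO"},
     "industries": {"Retail", "Ecommerce"}},
    {"skills": {"Digital Marketing", "Analytics", "Email Marketing"},
     "industries": {"Ecommerce", "Fashion"}},
    {"skills": {"Analytics", "Digital Marketing", "Digital Strategy"},
     "industries": {"Ecommerce"}},
]


def common_values(profiles, field):
    """Return the values of `field` that every profile shares."""
    value_sets = [set(profile[field]) for profile in profiles]
    return set.intersection(*value_sets) if value_sets else set()


shared_skills = common_values(profiles, "skills")
shared_industries = common_values(profiles, "industries")
print(sorted(shared_skills), sorted(shared_industries))
```

The same helper works on employer profiles if you collect their attributes (size band, technologies used) in the same way.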

In this case, the companies of the example customers I found have no fewer than 5,000 employees, and all use jQuery as a front-end technology. At first, that might seem irrelevant, but here comes the good bit…

You can use those common attributes to find more people just like them and create a look-alike audience at web scale. How?

Build a query that looks for those common attributes, like this example:

  • Skills: Digital Marketing, digital strategy, Analytics
  • Current Job titles: ecommerce
  • Past job titles: Manager
  • Locations: Major cities
  • Current Employer Company size: 5,000+
  • Current employer location city size: 100,000+
  • Current Employer Technology Used: jQuery

Hooray! That query returns 2,363 people (at time of writing).

That is a list of all the people who are a good likeness for your Ideal Customer Profile. Perfect! Of course, you will need to check the data and remove any people who don’t meet your particular needs, but in general, you have a great dataset to start working with.
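For the checking step, it can help to restate the look-alike criteria as an explicit test you can run over exported results. The sketch below is one possible reading of the attribute list above, under stated assumptions: a candidate needs at least one of the listed skills, "major cities" is approximated by the 100,000+ city size, and every field name and sample record is hypothetical.

```python
# Target skills from the Ideal Customer Profile, lowercased for comparison.
ICP_SKILLS = {"digital marketing", "digital strategy", "analytics"}


def is_lookalike(candidate):
    """Check one candidate against the look-alike criteria (assumed reading)."""
    skills = {skill.lower() for skill in candidate["skills"]}
    technologies = {tech.lower() for tech in candidate["employer_technologies"]}
    return (
        bool(skills & ICP_SKILLS)  # assumption: any one listed skill is enough
        and "ecommerce" in candidate["current_title"].lower()
        and any("manager" in t.lower() for t in candidate["past_titles"])
        and candidate["employer_size"] >= 5000
        and candidate["employer_city_population"] >= 100_000
        and "jquery" in technologies
    )


# Made-up candidates for illustration.
candidates = [
    {"name": "Candidate A", "skills": ["Analytics", "SEO"],
     "current_title": "Head of Ecommerce",
     "past_titles": ["Marketing Manager"],
     "employer_size": 12000, "employer_city_population": 650000,
     "employer_technologies": ["jQuery", "React"]},
    {"name": "Candidate B", "skills": ["Digital Marketing"],
     "current_title": "Ecommerce Lead",
     "past_titles": ["Account Manager"],
     "employer_size": 900, "employer_city_population": 2000000,
     "employer_technologies": ["jQuery"]},
]

lookalikes = [c["name"] for c in candidates if is_lookalike(c)]
print(lookalikes)
```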

How to use that information?

Any good salesperson or marketer will know several ways to use that data to generate demand and leads from that market.

  1. You can use their social media information to reach out to them with a tweet or message.
  2. You can target ads at these people and organizations via LinkedIn, Facebook, and other platforms.
  3. Use other data enrichment tools such as Pipl to learn even more about those people.
  4. You can invite them to your events, webinars, and other engagement platforms.

But what to say to them?

In this case, we know the following about them:

  • They work in large organizations.
  • They are based in major cities.
  • They work in management roles in and around digital marketing.
  • They often use jQuery and other similar front-end technologies.
  • Your existing customers’ use cases are likely to be relevant to them.

For Diffbot, that may well mean that we:

  • Write a “how to” blog post about how to use Diffbot to help them do something cool in marketing.
  • Sponsor and/or attend local events about digital marketing, and evangelize our Knowledge Graph in context of their needs.

However, I wanted to take it a step further and learn more about these people using the Knowledge Graph to build a better picture of the market. To do that, I started segmenting and grouping the data using some advanced Knowledge Graph features.

Bonus Step Four

Analyze the group of people who match my ICP for further insights.

Here are some basic things you can learn:

“Who are the companies that currently employ this type of person the most?”

“What are the descriptors of companies that currently employ this type of person the most?”

“What is the gender split of this type of person?”

“What is the location split of this type of person?”
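Each of these questions is a group-and-count over the list of matching people. If you export the results, a minimal Python sketch of that kind of split could look like this; the `split_by` helper, the field names, and the sample leads are hypothetical, and this does not use the Knowledge Graph's own grouping features.

```python
from collections import Counter

# Made-up exported leads; field names are hypothetical.
leads = [
    {"employer": "Company X", "gender": "female", "city": "London"},
    {"employer": "Company X", "gender": "male", "city": "London"},
    {"employer": "Company Y", "gender": "female", "city": "Chicago"},
    {"employer": "Company X", "gender": "female", "city": "London"},
]


def split_by(leads, field):
    """Return each value of `field` with its share of all leads, largest first."""
    counts = Counter(lead[field] for lead in leads)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


top_employers = Counter(lead["employer"] for lead in leads).most_common(1)
gender_split = split_by(leads, "gender")
location_split = split_by(leads, "city")
print(top_employers, gender_split, location_split)
```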

Now you’re armed with Data.

Now that you are armed with the data you need, you can tailor your marketing activity to match the audience gender, location, and employer type. And don’t forget, you have a list of more than 2,300 leads from earlier in the process.

Off the back of this research, we are now considering how we can target those customers with some interesting, intelligent, and high-value marketing activity: perhaps joining digital marketing and ecommerce hackathons in those locations, perhaps writing some API script templates in jQuery, or perhaps simply answering questions on Stack Overflow relating to marketing and ecommerce data.

Rinse and repeat for your different customer segments, and you will have all the insights you need to grow your business.

Try this technique for yourself

To try this technique for yourself, you need access to the Knowledge Graph, which you can request here. If you have any questions, please leave comments below.
