Turn Existing Customer Data into Fresh Marketing Opportunities with Knowledge Graph

I wanted to use our own tech to show that you can cross-reference your sales data with the 10+ billion entities stored in the Diffbot Knowledge Graph to find marketing opportunities with a little #KnowledgeHack. I wasn’t disappointed with what I found.

Because the Diffbot Knowledge Graph (KG) focuses on people, companies, and location data, I wanted to see how it could help me target the right people with a timely message via one of the major ad platforms like Facebook, AdWords, or LinkedIn.

This “how-to” guide shows you, step by step, how I used the Diffbot Knowledge Graph to explode a few of our best customers’ data into a list of thousands of high-value marketing targets in just a few steps:

  1. Take a small number of existing customers.
  2. Define an Ideal Customer Profile (ICP) based on their common attributes and connections.
  3. Find every person and/or business online who matches that profile.
  4. Analyze those people as a group, and build a marketing campaign with the insights.

Caveats

  1. This is not a silver bullet, and it requires some critical thinking on your part. Following this guide will give you useful data; it won’t do your marketing for you.
  2. You will need a Diffbot Knowledge Graph account to do this. The whole technique revolves around using the vast amount of people and company data stored in the KG, and its ability to search through their connections to get results.

Step One

Define an ideal customer profile (ICP) for a campaign based on your own customers.

Find a few examples of your best customers.

To find them, simply ask your sales team who the best customers or leads are, or run a report in your CRM to show you your top existing customers.

For example, run an “All Closed Won by Revenue” report or, even better, an “All Closed Won by LTV” report.

 

That will give you the names and locations of several example people you can use to create a template to find other similar (look-alike) candidates.

For this guide, I used made-up stand-ins for existing customers: example profiles I found by searching for “People who are currently employed as ecommerce managers at companies with more than 300 employees.” You can see the query for this example below:
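
In the KG’s query syntax, that search looks something like the sketch below. The field names are my approximation of the schema, so treat this as illustrative rather than copy-paste-ready:

    type:Person employments.{title:"ecommerce manager" isCurrent:true employer.nbEmployees>300}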

The query above basically filters for type=person, a current job title of “ecommerce manager,” and a current employer with more than 300 employees. Don’t worry too much about how to write these queries right now; there are lots of guides and documentation during onboarding that show you how easy it is. For now, just imagine it’s like making filters in Excel or Google Sheets.

That search gives some results you can substitute in place of actual existing customers.

Step Two

Explode a few prime example customers into thousands of similar potential customers.

Once you have your existing (or made-up; see above) customer profiles, you can find them in the KG with a simple query like this:
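
For instance, a lookup by name, with a made-up “Jane Smith” standing in for a real customer:

    type:Person name:"Jane Smith"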

And view their information by clicking on their profile from the results:

You will quickly begin to spot commonalities between the profiles. Excuse the crude visualization, but it will look something like this:

In this example, you can see several similarities between your existing customers.

  • Job title
  • Skills
  • Experience
  • Education
  • Industries

And you can do the same with the employers’ profiles, too.

Click through to the people and their employers, and compare and contrast them for similarities.

In this case, the companies of the example customers I found have no less than 5,000 employees, and all use jQuery as a front-end technology. At first, that might seem irrelevant, but here comes the good bit…

You can use those common attributes to find more people just like them, creating a look-alike audience at web scale. How?

Build a query that looks for those common attributes, like this example:

  • Skills: digital marketing, digital strategy, analytics
  • Current job title: ecommerce
  • Past job titles: manager
  • Locations: major cities
  • Current employer size: 5,000+ employees
  • Current employer city size: 100,000+ people
  • Current employer technology used: jQuery
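
Stitched together, those attributes become a single query along these lines. As before, the field names are my best guess at the KG schema, so use this as a sketch and check the docs for exact syntax:

    type:Person
        skills.name:"digital marketing" skills.name:"digital strategy" skills.name:"analytics"
        employments.{title:"ecommerce" isCurrent:true employer.nbEmployees>=5000}
        employments.employer.technographics.technology.name:"jQuery"
        locations.city.population>100000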

Hooray! That query returns 2,363 people (at the time of writing).

That is a list of all the people who are a good match for your Ideal Customer Profile. Perfect! Of course, you will need to check the data and remove any people who don’t meet your particular needs, but in general, you have a great dataset to start working with.

How do you use that information?

Any good salesperson or marketer will know several ways to use that data to generate demand and leads from that market.

  1. You can use their social media information to reach out to them with a tweet or message.
  2. You can target ads at these people and organizations via LinkedIn, Facebook, and other platforms.
  3. You can use other data-enrichment tools, such as Pipl, to learn even more about those people.
  4. You can invite them to your events, webinars, and other engagement platforms.

But what to say to them?

In this case, we know the following about them:

  • They work in large organizations in major cities.
  • They hold management roles in and around digital marketing.
  • They often use jQuery and other similar front-end technologies.
  • Your existing customers’ use cases are likely to be relevant to them.

For Diffbot, that may well mean that we:

  • Write a “how to” blog post about how to use Diffbot to help them do something cool in marketing.
  • Sponsor and/or attend local events about digital marketing, and evangelize our Knowledge Graph in context of their needs.

However, I wanted to take it a step further and learn more about these people using the Knowledge Graph to build a better picture of the market. To do that, I started segmenting and grouping the data using some advanced Knowledge Graph features.

Bonus Step Four

Analyze the group of people who match my ICP for further insights.

Here are some basic things you can learn:

“Who are the companies that currently employ this type of person the most?”

“What are the descriptors of companies that currently employ this type of person the most?”

“What is the gender split of this type of person?”

“What is the location split of this type of person?”
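
Each of those questions maps onto a facet query over the same ICP result set. Sketched in the same hypothetical syntax (the KG supports faceting, but verify the exact field names against the documentation):

    type:Person <your ICP filters> facet:employments.employer.name
    type:Person <your ICP filters> facet:employments.employer.descriptors
    type:Person <your ICP filters> facet:gender
    type:Person <your ICP filters> facet:locations.country.name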

Now you’re armed with data.

Now that you are armed with the data you need, you can tailor your marketing activity to match the audience’s gender, location, and employer type. And don’t forget, you have a list of 2,300+ leads from earlier in the process.

Off the back of this research, we are now considering how we can target those customers with some interesting, intelligent, and high-value marketing activity. Perhaps joining digital marketing and ecommerce hackathons in those locations. Perhaps writing some API script templates in jQuery? Perhaps simply answering questions on Stack Overflow relating to marketing and ecommerce data!

Rinse and repeat for your different customer segments, and you will have all the insights you need to grow your business.

Try this technique for yourself

To try this technique for yourself, you do need access to the Knowledge Graph, which you can request here. If you have any questions, please leave them in the comments below.


What’s the Difference Between Web Scraping and Diffbot?

Web scraping is one of the best techniques for extracting important data from websites to use in your business or applications, but not all data is created equal and not all web scraping tools can get you the data you need.

Collecting data from the web isn’t necessarily the hard part. Web scraping techniques utilize web crawlers, which are essentially just programs or automated scripts that collect various bits of data from different sources.

Any developer can build a relatively simple web scraper for their own use, and there are certainly companies out there that have their own web crawlers to gather data for them (Amazon is a big one).

But the web scraping process isn’t always straightforward, and there are many complications that can cause scrapers to break or become less efficient. So while there are plenty of web crawlers out there that can get you some of the data you need, not all can produce usable results.

Here’s what you need to know.

Don’t Miss: 9 Things Diffbot Does That Others Don’t

Getting Enough (of the Right) Data

There are actually plenty of ways you can get data from the web without using a web crawler. For instance, many sites have official APIs that will pull data for you; Twitter is one example. If you wanted to know how many people were mentioning you on Twitter, you could use its API to gather that data without too much effort.

The problem, however, is that your options when using a site-specific API are somewhat limited: you can only get information from one site at a time, and some APIs (like Twitter’s) are rate limited, meaning that you have to pay fees to access more information.

In order to make data useful, you need a lot of it. That’s where more generic web crawlers come in handy; they can be programmed to pull data from numerous sites (hundreds, thousands, even millions) if you know what data you’re looking for.

The key is that you have to know what data you’re looking for. Your average web crawler can pull data, but it can’t always give you structured data.

If you were looking to pull news articles or blog posts from multiple websites, for example, any web scraper could pull that content for you. But it would also pull ads, navigation, and a variety of other data you don’t want. It would then be your job to sort through that data for the content you do want.

If you want to pull the most accurate data, what you really need is a tool that can extract clean text from news articles and blog posts without extraneous data in the mix.

This is precisely why Diffbot has tools like our Article API (which does the above) as well as a variety of other specific APIs (like Product, Video, Image, and Page extraction) that can get you the right data from hundreds of thousands of websites automatically with zero configuration.
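
Calling the Article API is a single HTTP request. For example, with your own token and a target article URL substituted in:

    GET https://api.diffbot.com/v3/article?token=YOUR_TOKEN&url=https://example.com/some-article

The JSON response contains the clean article title, text, author, and date, with the ads and navigation already stripped out.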

How Structure Affects Your Outcome

You also have to worry about the quality of the data you’re getting, especially if you’re trying to extract a lot of it from hundreds or thousands of sources.

Apps, programs, and even analysis tools – anything you would be feeding data to – rely for the most part on highly structured data, which means that the way your data is delivered is important.

Web crawlers can pull data from the web, but not all of them can give you structured data, or at least high-quality structured data.

Think of it like this: You could go to a website, find a table of information that’s relevant to your needs, and then copy it and paste it into an Excel file. It’s a time-consuming process, which a web scraper could handle for you en masse, and much faster than you could do it by hand.

But what a scraper can’t do is handle websites that don’t already have that information formatted perfectly, such as sites with badly formatted HTML and little to no underlying structure.

Sites with CAPTCHAs, paywalls, or other authentication systems may be difficult to pull data from with a simple scraper. Session-based sites that track users with cookies, sites whose server admins block automated access, and sites that lack complete item listings or good search features can all wreak havoc when it comes to getting well-organized data.

While a simple web crawler can give you structured data, it can’t handle the complexities or abnormalities that pop up when browsing thousands of sites at once. This means that no matter how powerful it is, you’re still not getting all the data you could.

That’s why Diffbot works so well; we’re built for complexities.

 

Our APIs can be tweaked for complicated scenarios, and we have several other features, like entity tagging, that can find the right data sources even on poorly structured sites.

We offer proxying for difficult-to-reach sites that block traditional crawlers, as well as automatic ban detection and automatic retries, making it easier to get data from difficult sites. Our infrastructure is based on Gigablast, which we’ve open-sourced.

Why Simple Crawlers Aren’t Enough

There are many other issues with your average web crawler as well, including things like maintenance and stale data.

You can design a web crawler for specific purposes, like pulling clean text from a single blog or pulling product listings from an ecommerce site. But in order to get the sheer amount of data you need, you have to run your crawler multiple times, across thousands or more sites, and you have to adjust for every complex site as needed.

This can work fine for smaller operations, like if you wanted to crawl your own ecommerce site to generate a product database, for instance.

If you wanted to do this on multiple sites, or even on a single site as large as Amazon (which boasts nearly 500 million products and rising), you would have to run your crawler every minute of every day across multiple clusters of servers in order to get any fresh, usable data.

Should your crawler break, encounter a site that it can’t handle, or simply need an update to gather new data (or maybe you’re using multiple crawlers to gather different types of data), you’re facing countless hours of upkeep and coding.

That’s one of the biggest things that separates Diffbot from your average web scraper: we do the grunt work for you. Our programs are quick and easy to use; any developer can run a complex crawl in a matter of seconds.

As we said, any developer can build a web scraper. That’s not really the problem. The problem is that not every developer can (or should) spend most of their time running, operating, and optimizing a crawler. There are endless important tasks that developers are paid to do, and babysitting web data shouldn’t be one of them.

Here’s a rundown of what makes Diffbot so different and why it matters to you.

Final Thoughts

There are certainly instances where a basic web scraper will get the job done, and not every company needs something robust to gather the data they need.

However, knowing that the more data you have, the better your results will be (especially if that data is fresh, well structured, and contains the information you want), there is something to be said for having a third-party vendor on your side.

And just because you can build a web crawler doesn’t mean you should have to. Developers work hard building complex programs and apps for businesses, and they should focus on their craft instead of spending energy scraping the web.

Let me tell you from personal experience: writing and maintaining a web scraper is the bane of most developers’ existence. Now no one is forced to draw the short straw.

That’s why Diffbot exists.


Video: Crawling Basics and Advanced Techniques for Web Site Data Extraction

Just for the visual and auditory learners — and/or those of you who prefer your web crawling with the dulcet tones of yours truly — a couple of Crawlbot tutorials to help you get up and running:

Crawlbot Basics

A quick overview of Crawlbot using the Analyze API to automatically identify and extract products from an e-commerce site.
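
If you prefer text to video, creating a crawl like the one in the tutorial is roughly one API call. The parameter names below follow the Crawlbot docs; the seed URL is a placeholder:

    POST https://api.diffbot.com/v3/crawl
        token=YOUR_TOKEN
        name=shop-crawl
        seeds=https://www.example-shop.com
        apiUrl=https://api.diffbot.com/v3/analyze?mode=auto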

Advanced Usage

This tutorial discusses some of the methods for narrowing your crawl within a site, and setting up a repeat or recurring crawl.
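
In the API, those controls are optional parameters on the same crawl request, roughly like so (names per the Crawlbot docs; values are illustrative):

    urlCrawlPattern=/products/    only spider URLs matching this pattern
    maxToCrawl=100000             cap the number of pages spidered
    repeat=7.0                    recrawl every 7 days
    onlyProcessIfNew=1            only process pages not seen in a previous round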



Various Ways to Control Your Crawlbot Crawls for Web Data

In 2013 we welcomed Matt Wells, founder of Gigablast (and henceforth known as our grand search poobah), aboard to head up our burgeoning crawl and search infrastructure. Since then we’ve released Crawlbot 2.0, our Bulk Service/Bulk API, and our Search API — and are hard at work on more exciting stuff.

Crawlbot 2.0 included a number of ways to control which parts of sites are spidered, both to improve performance and to make sure only specific data is returned in some cases. Here’s a quick overview of the various ways to control Crawlbot.



Article API: Returning Clean and Consistent HTML

We’ve long offered HTML as a response element in our Article API (as an alternative to our plain-text text field). This is useful for maintaining inline images, text formatting, external links, etc.

Until recently, the HTML we returned was a direct copy of the underlying source, warts and all — which, if you work with web markup, you’ll know tilts heavily toward the “warts” side. Now, as many of our long-waiting customers have started to see, our html field returns normalized markup according to our new HTML Specification.
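
If the html field isn’t already part of your default response, the fields parameter is the way to ask for optional response elements. Something like this, with token and URL as placeholders:

    GET https://api.diffbot.com/v3/article?token=YOUR_TOKEN&url=https://example.com/post&fields=html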

