How Employbl Saved 250 Hours Building Their Career-Matching Database

We started with about 1,000 companies in the Employbl database, mostly in the Bay Area. Now with Diffbot we can expand to other cities and add thousands of additional companies.
Connor Leech – CEO @Employbl

Fixing tech starts with hiring. And fixing hiring is an information problem. That’s what Connor Leech, cofounder and CEO at Employbl, discovered when creating a new talent marketplace meant to give tech employees the information-rich hiring experience they deserve.

Tech job seekers rely on a range of metrics to gauge the opportunity and stability of a potential employer.

While information like funding rounds, founders, team size, industry, and investors is often public, it can be hard to grab the myriad fields candidates value in an up-to-date format from around the web.

These difficulties are amplified by the fact that many tech startups are “long tail” entities that also change regularly.

To solve these issues, Employbl partnered with Diffbot to gain web-wide firmographic coverage for brand new and established orgs.

Diffbot’s Firmographic Data Toolkit

Data on hundreds of millions of companies
Linked data including top management, employees, skills, news mentions
Robust filtering by over 20 fields (industry codes, location, subsidiaries, founding date, employees, market cap, descriptions, many others)
Graphable relationships
Access within our Knowledge Graph dashboard as well as integrations in Excel, Google Sheets, and Zapier (for person data)
Data enrichment and search capabilities

The Problem

Employbl started with a good deal of information. Their database contained info meant for job seekers on over 1,000 tech companies. The issue was that there was no central location to update this data from. If they continued manually, valuable time for an early stage startup would be eaten up. Their index likely wouldn’t get larger, as too much time would be spent just keeping it up to date. If they continued scraping, the range of sites they needed information from would lead to a ton of scraper maintenance (and each scraper would need to be tailored to each site).

In short, Employbl needed web-wide coverage for large and small tech companies that updated when company sites updated. And they needed this data in a structured format with useful fields users could filter companies with.
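To make "structured format with useful fields" concrete, here is a minimal sketch in Python of what a filterable company record could look like. The field names, the `filter_companies` helper, and the sample rows are all hypothetical placeholders for illustration; they are not Employbl's schema or Diffbot's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Company:
    # Hypothetical fields, chosen to mirror what the article says candidates value.
    name: str
    city: str
    industry: str
    employees: Optional[int] = None          # None when the figure is unknown
    total_funding_usd: Optional[int] = None  # None when the figure is unknown


def filter_companies(
    companies: List[Company],
    *,
    city: Optional[str] = None,
    industry: Optional[str] = None,
    min_employees: Optional[int] = None,
) -> List[Company]:
    """Return the companies matching every filter that was supplied."""
    results = []
    for c in companies:
        if city is not None and c.city != city:
            continue
        if industry is not None and c.industry != industry:
            continue
        # A record with an unknown headcount cannot satisfy a minimum.
        if min_employees is not None and (
            c.employees is None or c.employees < min_employees
        ):
            continue
        results.append(c)
    return results


# Made-up placeholder records, not real companies.
SAMPLE = [
    Company("Example Robotics", "San Francisco", "Robotics", 120, 40_000_000),
    Company("Example Payments", "San Francisco", "Fintech", 35, 5_000_000),
    Company("Example Health", "Austin", "Healthtech"),
]

if __name__ == "__main__":
    for company in filter_companies(SAMPLE, city="San Francisco", min_employees=50):
        print(company.name)
```

Once every record shares the same fields, a filter like this is a few lines of code. Getting thousands of records into that shape, and keeping them current, is the hard part.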

Employbl originally tried Clearbit for enriching organizational data, but found that there was no service tier aligned with newly started orgs or solo entrepreneurs. Additionally, Clearbit would charge the same amount whether an organizational data enrichment call returned data or not.

How Employbl Built Out a Richer Database in Less Time

Building out the scraping ourselves would have taken weeks or months to develop and would have taken away focus. And it positively wouldn’t be as comprehensive as the data we get from Diffbot.
Connor Leech – CEO @Employbl

Manually, it was taking Employbl about 3 minutes to update each company in their database. At 5,000 companies, that would be about 250 hours of time! Alternatively, setting up their own scraping might have taken weeks or months and would have required ongoing maintenance.
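The arithmetic behind that estimate is simple enough to check. The short Python sketch below uses only the two figures quoted above (3 minutes per company, 5,000 companies); the function name is ours, for illustration.

```python
def manual_update_hours(companies: int, minutes_per_company: float) -> float:
    """Hours needed to refresh every company record by hand, once."""
    return companies * minutes_per_company / 60


if __name__ == "__main__":
    hours = manual_update_hours(5_000, 3)
    print(f"{hours:.0f} hours")  # 5,000 x 3 = 15,000 minutes = 250 hours
```

By the same math, a single pass over the original 1,000-company database works out to 50 hours, and that cost recurs every time the data needs refreshing.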

Diffbot’s Knowledge Graph and corresponding API provide a wide-ranging ontology of relationships between — and facts about — hundreds of millions of organizations and people.

Data in the Knowledge Graph is sourced from the public web, with an average of over 20 “facts” per entity, and an average of 6 “origins” for each fact. This allows for greater data coverage and precision in a range of fields, with no manual extraction or custom-built scrapers.

Knowledge Graph data was able to provide Employbl with up-to-date data from across the web, including information on funding rounds, investors, number of employees, location, detailed industry data, and more. These are all details that individuals seeking to work in tech find important when job searching!

Key Takeaways

Though a great deal of firmographic data is public, manual and rule-based extraction is hard to scale.
Rich insights in firmographic data are a function of how up-to-date, structured, and comprehensive data sources can be.
Even a single field, if it is as broad as tech, can require web data extraction from a TON of domains.

What This Meant For Employbl

The major roadblock to Employbl’s focus on democratizing tech and startup hiring data was access to scalable structured web data on companies. By handing off the task of crawling and structuring the public web, Employbl was able to focus on more meaningful developments to their platform.

In addition to expanding their database by thousands of companies, Employbl was able to use Diffbot’s structured web data to widen the range of fields attached to each company record and to provide data provenance.

Roughly 250 hours is a ton of time for an early stage startup, and saving it allowed their team to move on to matters more important than data gathering, all while adding richness and longevity to the data they already offered.

Public Web Data For The Public

Employbl doesn’t charge for their platform. And a primary driver in the creation of the platform was helping those affected by Covid-19-related layoffs in Silicon Valley.

The conspicuous lack of enterprise data enrichment tools priced for a new company or an individual entrepreneur is telling. With many data enrichment platforms, you’re paying for shadily bought data rather than harnessing what’s already public on the web.

Diffbot structures “guilt-free” public web data so you can query the web like a database and provide public data back to the public in a more usable format.