Disclaimer: this article is about a very mundane consumer search. That said, the way knowledge work and fact accumulation are typically performed has wide-reaching implications for knowledge workflows.
The other day I was searching for coworking spaces.
As in many domains of knowledge, data coverage online was largely human curated. Lists with some undisclosed methodology provided the writer’s favorite coworking spots by city.
Sure, any major search engine will return a list of results plotted on a map. But I’m sure we’ve all run into the following.
- Load map…
- Pan slightly to surface more results…
- Zoom slightly to surface more results…
- Pan the opposite direction to try and find a result that had caught our eye…
- Try to recall the name that caught our eye in a new search…
Five steps to seek further data points on a single search result. Devoid of context, data provenance, and the ability to analyze at scale.
Sure, consumer search works in many, many cases. So do phone books.
If you’re a power user, a data hoarder, or a productivity buff, you can likely see the appeal of a search that actually returns comprehensive data. If you’re building an intelligent application or performing market intelligence, using search that won’t let you explore the underlying data is just a waste of time.
So after this predictable foray in which I ignored the advice of several articles, scrolled around a map, and got sidetracked once or twice, I decided to resort to a different sort of search: Diffbot’s Knowledge Graph.
- The title of our article may not make much sense if you haven’t been acquainted with Diffy, Diffbot’s web-reading bot
- You see the promise of external web data for many applications… if it were structured (or you have at least felt disappointed by consumer search engines keeping you from public web data)
Knowledge Graph Search
Opening the Knowledge Graph, it took all of 20 seconds to return data on over 4,000 coworking spaces. And sure, unless you’re selling a service to coworking spaces, you may wonder why anyone would need all this data as a personal consumer…
Maybe it’s simple curiosity. Maybe it’s the principle of it all; the fact that all of this information is publicly available online, but not in a structured format. Maybe this is just an analogy for non-consumer searches that also can’t be performed on major search engines. Any way you take it, search of the present is flawed for many uses, and it’s still our primary collective data source.
So what does search in the Knowledge Graph look like?
Well, it starts with entities.
Knowledge graphs are built around entities (think people, places, or things) and relationships between entities. The types of relationships that can occur between entities, and the types of facts attached to entities are prescribed by a schema. One of the major “selling points” for knowledge graphs is that they have flexible schemas. That is — more so than other types of databases — they can adapt to what types of facts matter out in the world.
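To make the idea of entities, relationships, and a flexible schema a little more concrete, here is a minimal Python sketch. It is a toy model rather than a description of how Diffbot's Knowledge Graph is implemented, and the class names, the `located_in` predicate, and the sample data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entity:
    """A node in a toy knowledge graph: a typed thing with free-form facts."""
    id: str
    type: str
    facts: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """A directed, labeled edge between two entities."""
    source_id: str
    predicate: str
    target_id: str


# Hypothetical sample data, invented purely for illustration.
space = Entity("e1", "Organization", {"name": "Example Cowork"})
city = Entity("e2", "Place", {"name": "Lisbon"})
edges = [Relationship(space.id, "located_in", city.id)]

# "Flexible schema" in miniature: a new fact type can be attached later
# without reshaping every entity that already exists.
space.facts["descriptors"] = ["coworking", "event space"]


def neighbors(entity_id: str, predicate: str, edges: list[Relationship]) -> list[str]:
    """Return the ids of entities reachable from entity_id via predicate."""
    return [
        edge.target_id
        for edge in edges
        if edge.source_id == entity_id and edge.predicate == predicate
    ]
```

Calling `neighbors("e1", "located_in", edges)` walks the single edge above and returns the id of the city entity. A production graph would also validate fact types against its schema, which this sketch leaves out.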
The Importance of Structured Web Data
At their core, knowledge graphs (the category of graphs) can be built from any underlying data set. In the case of Diffbot’s Knowledge Graph, that data set is the world’s largest structured feed of web data. Diffbot is one of only a handful of organizations to crawl the web. And using machine vision and natural language processing, we’re able to pull out mentions of entities as well as infer facts and relationships.
Why is this important?
The web is largely made up of unstructured or semi-structured data. This means you can’t easily filter, sort, or manipulate this data at scale. While the internet is our largest collective source of knowledge, it’s not organized for modern knowledge work.
Diffbot’s products center around organizing the world’s information, whether through our AI-enabled web scrapers, our Knowledge Graph, or our Natural Language API. The ability to source the information from the web in a structured way provides the bedrock for machine learning initiatives, market intelligence, news monitoring, as well as the monitoring of large ecommerce datasets.
The State of Coworking Spaces As Told By AI
So what can you learn from a coworking space dataset that’s much more explorable than consumer search?
It turns out a lot.
While each individual data point is available online, the data isn’t aggregated anywhere else in quite as explorable a format.
In our case we can start with a simple facet query. Faceted search provides a summary view of the value of one fact type attached to a set of entities. So with this sort of query we can quickly discover what locations have the most coworking spaces.
By simply adding facet:locations.city.name, we can turn over 4,000 unique results into an observation. While data found about these coworking spaces across the web would be in many different formats (and in many languages), knowledge graphs help to consolidate similar entities around standard fields.
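For readers who want to see what a facet over a dotted field path amounts to, the summary described above can be approximated locally. The sketch below is not Diffbot's implementation: the sample records are invented, and counting each value once per entity is our own assumption about how a facet should behave.

```python
from collections import Counter
from typing import Any, Iterable


def get_path(record: Any, path: str) -> list[Any]:
    """Collect every value found at a dotted path, fanning out over lists."""
    values = [record]
    for key in path.split("."):
        next_values = []
        for value in values:
            candidates = value if isinstance(value, list) else [value]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    # Flatten list-valued leaves so each element is counted separately.
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def facet(records: Iterable[dict], path: str) -> Counter:
    """Count how many records carry each distinct value at the given path."""
    counts: Counter = Counter()
    for record in records:
        # Assumption: a value is counted at most once per entity.
        counts.update(set(get_path(record, path)))
    return counts


# Hypothetical entities shaped loosely like the field path in the article.
spaces = [
    {"name": "Space A", "locations": [{"city": {"name": "Berlin"}}]},
    {"name": "Space B", "locations": [{"city": {"name": "Berlin"}},
                                      {"city": {"name": "Lisbon"}}]},
    {"name": "Space C", "locations": [{"city": {"name": "Lisbon"}}]},
    {"name": "Space D", "locations": [{"city": {"name": "Berlin"}}]},
]

city_counts = facet(spaces, "locations.city.name")
for city_name, count in city_counts.most_common():
    print(city_name, count)
```

With the four sample records, Berlin is counted three times and Lisbon twice. The same `facet` helper works for any other dotted path, which is the practical appeal of standard fields.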
An additional strength of knowledge graphs is that data points can be consolidated from many different sources, with data provenance intact, and then built upon. Using natural language processing and machine learning, fields can be computed or inferred from many underlying data sources. Our original query looked at organization entities with “coworking spaces” as part of their description. But an AI-generated field of “descriptors” allows for additional granularity. Let’s look at a facet view of the most common services offered by coworking spaces.
Depending on your experience with a range of coworking spaces, descriptors such as “expat,” “civil & social organization,” or “self improvement” may be novel. By amalgamating tens of thousands of online mentions, articles, and entries into this subset of org entities, the Knowledge Graph dramatically cuts down on the time spent accumulating facts.
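The consolidation of many mentions into one entity, with provenance preserved, can be sketched in a few lines of Python. This is a deliberately naive illustration: the mentions and URLs are invented, the `origin` key is a hypothetical name, and matching on an exact name stands in for the much harder entity resolution a real knowledge graph has to perform.

```python
from collections import defaultdict

# Hypothetical mentions of coworking spaces found on different pages.
mentions = [
    {"origin": "https://example.com/list", "name": "Example Cowork",
     "descriptors": ["coworking", "expat"]},
    {"origin": "https://example.org/review", "name": "Example Cowork",
     "descriptors": ["coworking", "event space"]},
    {"origin": "https://example.net/dir", "name": "Other Hub",
     "descriptors": ["coworking"]},
]


def consolidate(mentions):
    """Merge mentions by name, recording which origins support each descriptor."""
    entities = defaultdict(lambda: defaultdict(set))
    for mention in mentions:
        for descriptor in mention["descriptors"]:
            entities[mention["name"]][descriptor].add(mention["origin"])
    # Convert to plain dicts with sorted origin lists for stable output.
    return {
        name: {descriptor: sorted(origins) for descriptor, origins in facts.items()}
        for name, facts in entities.items()
    }


merged = consolidate(mentions)
for descriptor, origins in merged["Example Cowork"].items():
    print(descriptor, "<-", origins)
```

Each descriptor on the merged entity keeps the list of pages that support it, so a fact seen on two pages can be weighed differently from one seen on a single page.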
One final area in which consumer search is severely lacking (or just impractical) is market research. Industry-specific events such as funding rounds, openings of new offices, key executive hires or departures, or clues as to private organization revenue can be hard to pinpoint across the web. Softer signals like sentiment around topics or velocity of news coverage can also be informative.
Diffbot’s article index is roughly 50x the size of Google News. Unlike traditional content channels, you aren’t presented with content that’s gamed the system or paid to get your attention. Additionally, where consumer search engines are siloed by language or location, Diffbot’s article index is pan-lingual. Because articles are augmented with additional filterable fields, the underlying articles can become unique observations on sentiment, key happenings, and more. All underlying article data is returned as well, so you can dig in once you’ve found an interesting angle.
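As a rough illustration of how filterable article fields turn into observations, the sketch below groups a handful of articles by ISO week and reports coverage velocity (articles per week) alongside mean sentiment. The records are invented, and the `date` and `sentiment` keys are hypothetical stand-ins rather than a statement of Diffbot's article schema.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical article records; sentiment is assumed to lie in [-1, 1].
articles = [
    {"title": "Hypothetical headline A", "date": date(2021, 3, 1), "sentiment": 0.5},
    {"title": "Hypothetical headline B", "date": date(2021, 3, 3), "sentiment": -0.5},
    {"title": "Hypothetical headline C", "date": date(2021, 3, 9), "sentiment": 0.25},
]


def weekly_summary(articles):
    """Group articles by (ISO year, ISO week); report count and mean sentiment."""
    buckets = defaultdict(list)
    for article in articles:
        iso = article["date"].isocalendar()
        buckets[(iso[0], iso[1])].append(article["sentiment"])
    return {
        week: {"count": len(scores), "mean_sentiment": mean(scores)}
        for week, scores in sorted(buckets.items())
    }


summary = weekly_summary(articles)
for week, stats in summary.items():
    print(week, stats)
```

A sudden jump in the weekly count, or a swing in mean sentiment, is the kind of softer signal that is hard to extract from a consumer news search.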
For a deeper dive into creating custom news feeds around organizations and events, be sure to check out our Knowledge Graph news monitoring test drive.
Maybe you don’t buy the segue from what really is a consumer search (“coworking spaces near me”) to the copious coworking data available in the Knowledge Graph. But the fact of the matter is that a great deal of knowledge work still relies on human fact accumulation. Without automated ways to structure unstructured data, there’s a definite floor to the cost per fact.
Knowledge graphs provide a bedrock for knowledge workflows reengineered from the ground up. In particular:
- Knowledge graphs mirror what we care about “in the world” (entities and relationships)
- Knowledge graphs provide flexible schemas allowing for fact types attached to entities to change over time (as the world changes)
- Automated knowledge graphs provide one of the only feasible ways to structure market intel and news monitoring data that can be spread across the web
- Knowledge graphs that don’t expose their underlying data aren’t suitable for use in intelligent applications or machine learning use cases
- Knowledge graphs that provide additionally computed fields (sentiment, tags, inferences on revenue or events) provide additional value for market intelligence and news monitoring