We will get justice. We will get it. We will not let this door close.
– Philonise Floyd, Brother of George Floyd
News coverage this week centered on George Floyd, police, and Donald Trump. COVID-19 related news continues to dominate globally.
That’s the macro story from all Knowledge Graph articles published in the last week. But Knowledge Graph article entities give users many ways to traverse and dissect breaking news. By facet searching for the most common quotes in articles tagged “George Floyd,” you get a nuanced view of the voices being heard.
Hopefully this story begins to show the power of global news mentions that can be sliced and diced on many levels. Wondering how to gain these insights for yourself? Below we’ll work through how to perform these queries in detail.
How we got there: Diffbot’s Knowledge Graph holds hundreds of millions of article entities at any given moment. These articles are of truly global origins, and are parsed by our cutting-edge machine vision and natural language processing systems to take unstructured article data and transform it into structured, query-able entities.
Some of the most noteworthy fields through which one can traverse article entities include:
- PublisherCountry
- Sentiment Scores
- Date of Publication
- SiteName
- Tags
- Language
- Title
Because the Knowledge Graph preserves relationships between diverse entity types, you can even explore articles through the lens of organizations, locations, people, and more.
Faceted searching (read a general overview or find out more in the docs) allows you to gain a summary view of what are often thousands of results. This can be particularly valuable for tracking article sentiment over time, the number of articles published on a topic over time, or finding the most popular topics being discussed.
It’s worth noting that while queries using the above fields often return large numbers of articles, all article text, media, and discussion data is also preserved within article entities. This means that while you may initially explore articles to get a high-level view (or to feed a news monitoring dashboard), you can also read the articles that meet your criteria directly within the Knowledge Graph.
Knowledge Graph access is essentially access to the largest news index in the world.
In this story by DQL (Diffbot Query Language), we’ll walk through the queries used to create the following graphic. In particular, these queries show you how to…
- Return all global articles published in the last 7 days
- Rank the most popular tags (topics) for all articles published in the last 7 days
- Gain a high level view of where news is being published
- Explore what tags most often coincide with tags of interest
Note that while some of these queries can be accomplished through the visual query editor, many of the following must be entered as DQL (don’t worry, we’ll provide you with the query to get you started).
Let’s start generally (and at the top). As we progress through this story by DQL we’ll gain specificity and use additional query selectors.
Within the Knowledge Graph you can easily return all articles using either the visual query editor or a DQL query. Start like this:
type:Article
Every DQL query starts with the type of entity you want returned (or, in the case of a facet search, the type of entity whose field you want to summarize). To bound by date, simply add a date selector:
type:Article date<7d
You can also set a range, as follows. This query returns a date range relative to today, but you can also specify a definite time range using UNIX epoch time.
//articles from the last 7 days, excluding the most recent day
type:Article date<7d date>1d
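To specify a definite range you first need the epoch values for its endpoints. Here is a minimal Python sketch of that conversion. Python is simply our choice for illustration, and the helper name to_epoch_seconds and the example dates are hypothetical. How the resulting numbers are written into a date selector isn’t covered in this story, so check the DQL docs for the exact syntax.

```python
from datetime import datetime, timezone


def to_epoch_seconds(year, month, day):
    """Return the UNIX epoch time (in seconds) for midnight UTC on a date."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int(moment.timestamp())


# Example endpoints (arbitrary dates chosen for illustration): a 7-day window.
start = to_epoch_seconds(2020, 5, 25)
end = to_epoch_seconds(2020, 6, 1)

print(start, end)
print("window length in days:", (end - start) // 86400)
```

Pinning the timezone to UTC keeps the result the same no matter where the script runs.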
At the time of this publication, over 700,000 global articles had been indexed in the Knowledge Graph over the preceding 7 days.
An impressive result, but what are they all about? Let’s gain a high-level view of the most popular tags. In the Knowledge Graph, tags can be taken from literal categories provided by publishers, or generated as our natural language processing parses articles to discern what they’re about.
type:Article date<7d facet:tags.label
The above DQL query returns a list of the most common tags (topics) among articles inside the Knowledge Graph that were published in the last week. As of the writing of this post, these were largely related to George Floyd protests and Covid-19. Potentially interesting outliers include Solar City Corporation, rosters (in relation to sports teams), and posts about Instagram.
Tackling another high-level view of the same articles, we can see where they were primarily published.
type:Article date<7d facet:publisherCountry
The above query powered the donut chart in the graphic. While macro-level news numbers may be too coarse to act on by themselves, note that the queries we’re working through can be used in tandem. One can facet search for the most common tags published in a given location, or return sentiment or discussions by country or publisher. While it takes effort to learn DQL, a world of contextual news mention data is available to those who do.
On the right of the graphic you may notice you can toggle the word cloud of tags to show tags that most often coincided with the two largest topical clusters within news this week: George Floyd and Covid-19.
To utilize this particularly powerful query, you simply add one additional phrase to our earlier tag facet search:
type:Article tags.label:"George Floyd" date<7d facet:tags.label
Here we’ve constrained the initially returned articles to those with tags labelled “George Floyd.” Note you can replace “George Floyd” with any other tag to find a list of tags that most commonly coincide.
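If it helps to picture what this filter-then-facet step is doing, here is a small conceptual sketch in Python. It is not how the Knowledge Graph is implemented. The sample records and the function name cooccurring_tags are made up for illustration, and real article entities carry many more fields.

```python
from collections import Counter

# Hypothetical sample records standing in for article entities.
articles = [
    {"title": "A", "tags": ["George Floyd", "Protest", "Police"]},
    {"title": "B", "tags": ["George Floyd", "Police"]},
    {"title": "C", "tags": ["Covid-19", "Vaccine"]},
    {"title": "D", "tags": ["George Floyd", "Protest", "Donald Trump"]},
]


def cooccurring_tags(records, tag_of_interest):
    """Count every tag on records that carry tag_of_interest, most common first."""
    counts = Counter()
    for record in records:
        if tag_of_interest in record["tags"]:  # the filter step
            counts.update(record["tags"])      # the facet (counting) step
    return counts.most_common()


for tag, count in cooccurring_tags(articles, "George Floyd"):
    print(tag, count)
```

In this toy version the tag of interest tops its own list, since every matching record carries it.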
Finally, faceting by quotes. This query, with which we started this story by DQL, is particularly powerful for gaining insights into which voices are being heard on a given topic. It’s like a retweet count for the news!
//facet by most common quotes found within articles tagged "George Floyd" from the last week
type:Article tags.label:"George Floyd" date<7d facet:quotes.quote
At this point you’ve likely thought up a few queries of your own you would like to explore. By signing up for a free 14-day trial, you’ll get immediate access to the Knowledge Graph and can start exploring global news data right away.
Need some ideas? Try out some of the related queries below, check out the Knowledge Graph Documentation, or other stories by DQL.
//what tags have coincided with articles mentioning Donald Trump most often in the last 30 days
type:Article tags.label:"Donald Trump" date<30d facet:tags.label
//what tags have coincided with articles tagged 'disaster' and published in Spain in the last 60 days
type:Article publisherCountry:"Spain" tags.label:"disaster" date<60d facet:tags.label
//all articles with Kamala Harris in the title sorted from newest to oldest
type:Article title:"Kamala Harris" revSortBy:date
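Because each of the facet queries in this story is a space-separated list of selectors, variations are easy to generate programmatically. The following Python sketch assembles query strings using only the selectors shown in this story. The function name build_article_query and its parameters are hypothetical conveniences, not part of DQL or any Diffbot library.

```python
def build_article_query(days, tag=None, country=None, facet=None):
    """Assemble a DQL string for articles from the last `days` days.

    Only selectors that appear in this story are used; quoting of values
    follows the examples above and is not otherwise validated.
    """
    parts = ["type:Article"]
    if country is not None:
        parts.append(f'publisherCountry:"{country}"')
    if tag is not None:
        parts.append(f'tags.label:"{tag}"')
    parts.append(f"date<{days}d")
    if facet is not None:
        parts.append(f"facet:{facet}")
    return " ".join(parts)


# Rebuild two of the related queries, then try a variation.
print(build_article_query(30, tag="Donald Trump", facet="tags.label"))
print(build_article_query(60, country="Spain", tag="disaster", facet="tags.label"))
print(build_article_query(7, tag="George Floyd", facet="publisherCountry"))
```

The resulting strings can be pasted into the query editor like any of the queries above.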