Diffbot’s New Product API Teaches Robots to Shop Online

Diffbot’s human wranglers are proud today to announce the release of our newest product: an API for… products!

The Product API can be used for extracting clean, structured data from any e-commerce product page. It automatically makes available all the product data you’d expect: price, discount/savings amount, shipping cost, product description, any relevant product images, SKU and/or other product IDs.

Announcing Crawlbot: Smart Site Spidering and Extraction

Today we’re happy to announce the public availability of Crawlbot, our computer-vision-powered site crawler and extractor.

If you want structured data from an entire site, Crawlbot will fully spider a domain and hand off the right pages to Diffbot APIs. The result? A queryable index of the entire site’s data, or a complete download of the site’s structured data in easy-to-read — for a robot — JSON.

New Feature: Correct and *Concatenate* Multi-Page Articles

Our Article API automatically joins multiple-page articles into a single “text” or “html” field.

On some sites, though, our algorithm is unable to concatenate pages, typically because of a non-standard pagination design. Furthermore, any site with an overridden “text” field (via a Custom API rule) will no longer automatically concatenate multiple pages.

We’re happy to introduce an oft-requested fix for this. From now on, if you create a ‘nextPage’ rule in our Custom API Toolkit (developer login required), we will automatically follow the specified link, and any subsequent links, up to ten pages, and concatenate the pages into a single result. Moreover, you’ll only be charged for a single API call.
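To make the follow-and-concatenate behavior concrete, here is a minimal sketch in Python. It is not Diffbot’s implementation: the `fetch` callable, the `concatenate_pages` name, and the in-memory `FAKE_SITE` are all hypothetical stand-ins, and we assume "up to ten pages" means a cap of ten pages in total.

```python
from typing import Callable, Optional, Tuple

# Assumption: "up to ten pages" is read here as ten pages in total.
MAX_PAGES = 10

# Hypothetical fetcher: given a URL, return (page_text, next_page_url or None).
Fetcher = Callable[[str], Tuple[str, Optional[str]]]


def concatenate_pages(start_url: str, fetch: Fetcher,
                      max_pages: int = MAX_PAGES) -> str:
    """Follow next-page links from start_url and join the page texts."""
    parts = []
    seen = set()  # guards against pagination loops
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(parts) < max_pages:
        seen.add(url)
        text, url = fetch(url)
        parts.append(text)
    return "\n\n".join(parts)


# A made-up three-page article standing in for a real site.
FAKE_SITE = {
    "/story?page=1": ("Part one.", "/story?page=2"),
    "/story?page=2": ("Part two.", "/story?page=3"),
    "/story?page=3": ("Part three.", None),
}


def fake_fetch(url: str) -> Tuple[str, Optional[str]]:
    return FAKE_SITE[url]


article = concatenate_pages("/story?page=1", fake_fetch)
print(article)
```

In this sketch the link-following stops at whichever comes first: a page with no next link, a link that was already visited, or the page cap.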

For more information check out our overview in Diffbot Support, or have a go in our Custom API Toolkit.

Diffbot’s HackerNews Trend Analyzer

Like any good developer service, we’re fans of Hacker News. Making the vaunted Frontpage is a, well, vaunt-worthy accomplishment (we’ve been there once), so we thought we’d use our APIs to analyze and identify any trends in what content makes the Frontpage.

The result is Diffbot’s HackerNews Trend Analyzer. Feel free to click that link and play around, or read more here for details on how we did it.

New Feature: Custom Timeouts

The slowest part of any Diffbot API request is the call and response to fetch third-party content. Depending on the third-party server’s responsiveness and location, it could be anywhere from a third of a second to tens of seconds before we receive content to process. (Diffbot internal rendering and processing, by comparison, averages just over 100 milliseconds.)