From the Changelog: New Status Page, Crawlbot and Product API Updates

We’ve been so busy in 2016 that we’ve barely had time to announce what we’ve done. We have, however, tracked it all in our riveting Changelog, which we invite you to peruse as soon as humanly (or, if you choose, robotically) possible.

Some choice new features and updates from the past few months include:

  • Crawlbot can now optionally spider across multiple domains, can have repeat settings adjusted in between crawl rounds, and can be supplied with custom headers for sending unique cookie, referrer, or other values to sites you’re crawling.
  • Crawlbot and our Bulk Processing Service now feature automated, intelligent retries to help your crawls and bulk jobs achieve the highest processing success rate.
  • The Product API has seen significant improvements to product specification extraction (and normalization), the addition of our automatic categorization (in the form of inferredCategory), and updated image extraction.
  • And too many more to list here. See the whole shebang at http://www.diffbot.com/dev/docs/changelog.

John Davi

John runs everything product for Diffbot. Drop him a line at john at diffbot if you have questions.