One of our most common feature requests: can Diffbot APIs access content behind a login or firewall? Until recently, the answer was mostly “no.”
We’ve now added new features to all of our APIs, both Automatic and Custom, that should allow much broader access to content that isn’t publicly available:
All Diffbot APIs now support passing custom HTTP headers, including cookie, user-agent and referer. If you include these custom header fields in your API request, Diffbot will use them in place of its default headers when fetching the third-party content.
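As a rough illustration of the custom-header feature, the sketch below assembles (but does not send) an API request. The endpoint URL, the token and url query parameters, and the X-Forward- prefix used to mark headers for forwarding are assumptions for this example, not details stated in this post; check the API documentation for the exact names.

```python
from urllib.parse import urlencode

# Assumed endpoint; substitute the API you are actually calling.
API_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_header_request(token, page_url, custom_headers, prefix="X-Forward-"):
    """Return (request_url, headers) for a call that forwards custom headers.

    `prefix` is an assumed naming convention for headers that should be
    passed through to the third-party site; it is a parameter so it can be
    changed if the real convention differs.
    """
    query = urlencode({"token": token, "url": page_url})
    headers = {prefix + name: value for name, value in custom_headers.items()}
    return f"{API_ENDPOINT}?{query}", headers


request_url, headers = build_header_request(
    "YOUR_TOKEN",
    "https://example.com/members/article",
    {"Cookie": "session=abc123", "User-Agent": "MyCrawler/1.0"},
)
```

The returned pair can be handed to any HTTP client, for example urllib.request.Request(request_url, headers=headers).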
If you’re trying to process content that requires basic HTTP authentication, simply include the username and password in the url field of your Diffbot request, and Diffbot will send the authentication information along when retrieving the third-party URL.
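One common way to carry a username and password inside a URL is the scheme://user:password@host/path form. The helper below is a minimal sketch under the assumption that this is the form expected in the url field; the function name and the sample host are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(page_url, username, password):
    """Embed credentials in a URL as scheme://user:pass@host/path.

    Both values are percent-encoded so characters such as '@', ':' or '/'
    in a password cannot be mistaken for URL delimiters.
    """
    parts = urlsplit(page_url)
    # Drop any credentials already present, keeping host and port.
    host = parts.netloc.rpartition("@")[2]
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{host}"))


authed_url = with_basic_auth(
    "http://intranet.example.com/report", "alice", "p@ss/word"
)
```

The resulting string is what would go in the url field; like any URL passed as a query parameter, it should itself be URL-encoded when the API request is built.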
POST the Content to Diffbot
If you already have access to the content yourself (for example, on an intranet or in local files), you can now POST the markup or text (for the Article API) directly to any Diffbot API. Diffbot will render and process the page content and return the structured information, just as if it had fetched the page itself.
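A POST of local markup might be assembled along these lines. This is a sketch only: the endpoint, the token and url query parameters, the choice to still supply a url for the page, and the use of a text/html (or text/plain) Content-Type are assumptions rather than details given in this post.

```python
from urllib.parse import urlencode

# Assumed endpoint; substitute the API you are actually calling.
API_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_post_request(token, page_url, content, content_type="text/html"):
    """Return (request_url, headers, body) for POSTing content directly.

    `page_url` is assumed to identify the page the content came from.
    Use content_type="text/plain" when sending plain text instead of markup.
    """
    query = urlencode({"token": token, "url": page_url})
    headers = {"Content-Type": f"{content_type}; charset=utf-8"}
    body = content.encode("utf-8")
    return f"{API_ENDPOINT}?{query}", headers, body


markup = "<html><body><h1>Quarterly report</h1><p>Internal only.</p></body></html>"
post_url, post_headers, post_body = build_post_request(
    "YOUR_TOKEN", "http://intranet.example.com/report", markup
)
```

To send it, pass the three values to an HTTP client, for example urllib.request.Request(post_url, data=post_body, headers=post_headers, method="POST").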