Article API: Returning Clean and Consistent HTML

We’ve long offered HTML as a response element in our Article API (as an alternative to our plain-text text field). This is useful for maintaining inline images, text formatting, external links, etc.

Until recently, the HTML we returned was a direct copy of the underlying source, warts and all, which, if you work with web markup, you’ll know tilts heavily toward the “warts” side. Now, as many of our long-waiting customers have started to see, our html field returns normalized markup that follows our new HTML Specification.

What this means: you can count on a consistent set of elements and attributes, and a consistent overall markup structure, in all HTML returned by our Article API. Images and videos are returned inline, within figure elements; all block-level text is wrapped in paragraph tags; and script, style, and other ancillary markup is stripped completely.

Sample, Before:

<div class="entry-content">
<p class="body">We've long offered HTML as a response element in our Article API (as an alternative to our plain-text text field). This is useful for maintaining inline images, text formatting, external links, etc.<br /><br />
Until recently, the HTML we returned was a direct copy of the underlying source, warts and all.</p>
<div class="image floatLeft" id="mainImage"><img src="diffy.png" class="primary"></div>
</div>

Sample, After:

<p>We've long offered HTML as a response element in our Article API (as an alternative to our plain-text text field). This is useful for maintaining inline images, text formatting, external links, etc.</p>
<p>Until recently, the HTML we returned was a direct copy of the underlying source, warts and all.</p>
<figure>
    <img src="diffy.png">
    <figcaption>Diffy the robot makes a surprise appearance</figcaption>
</figure>

(Trust me, the “after” is much better!)
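
One practical payoff of predictable markup is that consuming it takes very little code. Here is a minimal sketch, using only Python's standard-library html.parser, that pulls paragraph text and figure data out of markup shaped like the "after" sample. The class name ArticleHTMLParser, the SAMPLE string, and the shortened paragraph text are our own illustrative assumptions, not part of the API or the HTML Specification, and the sketch assumes the input contains only the paragraph and figure structure shown above.

```python
from html.parser import HTMLParser


class ArticleHTMLParser(HTMLParser):
    """Hypothetical helper: collects <p> text and <figure> data from normalized HTML."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # text content of each <p>
        self.figures = []      # one dict per <figure>: {"src": ..., "caption": ...}
        self._figure = None    # the figure currently being read, if any
        self._target = None    # "p" or "figcaption" while inside one, else None
        self._buffer = []      # text fragments for the current target element

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self._figure = {"src": None, "caption": None}
        elif tag == "img" and self._figure is not None:
            # Only images inside a figure are recorded in this sketch.
            self._figure["src"] = dict(attrs).get("src")
        elif tag in ("p", "figcaption"):
            self._target = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "figure" and self._figure is not None:
            self.figures.append(self._figure)
            self._figure = None
        elif tag == self._target:
            text = "".join(self._buffer).strip()
            if tag == "p":
                self.paragraphs.append(text)
            elif self._figure is not None:
                self._figure["caption"] = text
            self._target = None

    def handle_data(self, data):
        # Keep text only while inside a <p> or <figcaption>.
        if self._target is not None:
            self._buffer.append(data)


# Illustrative input modeled on the "after" sample (paragraph text shortened).
SAMPLE = """<p>First paragraph.</p>
<p>Second paragraph.</p>
<figure>
    <img src="diffy.png">
    <figcaption>Diffy the robot makes a surprise appearance</figcaption>
</figure>"""

parser = ArticleHTMLParser()
parser.feed(SAMPLE)
parser.close()

print(parser.paragraphs)
print(parser.figures)
```

Doing the same against the "before" markup would mean special-casing wrapper divs, class names, and double line breaks for every source site, which is exactly the work the normalized output is meant to spare you.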

We like it so much that we’ve also made html a default response element in our Article API. Take a look at the HTML Specification, start parsing some markup, and let us know if you have any questions or issues.

John Davi

John runs everything product for Diffbot. Drop him a line at john at diffbot if you have questions.