Extracting Product Variant Data with DiffbotAPI

Diffbot API allows you to automatically gather ecommerce information such as images, description, brand, prices and specs from product pages, but what about when product pages contain mutiple variants of the product, being offered at different prices?

A product variant is when there are variations of a base product, such as mulitiple sizes, colors, or styles that may have their own pricing and availability. For many kinds of products–ranging from apparel, to home goods, to car parts, these product variants are crucial to understand. For example, you wouldn’t want to get kid-sized shoes sent to you for adult-sized feet. Product variants also give you clues as to which variations of a product are available from the merchant, and which might be sold-out.

Diffbot’s APIs might not always be able to extract variants automatically using AI, but thankfully Diffbot includes a powerful Custom API that allows you to both correct and augment what is extracted.

Let’s take a look at this product page – in this example a bedding sheets set from Macys – that has product variants. If we pass this URL to Diffbot API, Diffbot automatically extracts the base product’s title, text, price, sku, images, as well as the thread count and fabric. However, it does not extract the variants.

In this example, the sheets come in multiple sizes (from Twin to California King) and come in colors ranging from a classic white to Pomegrante (which unsurprisingly has plenty in stock). We can easily see as a human that the add-to-bag price depends on the size, and not the color.

Let’s make our AI see this too.

To do this we can use an X-Eval rule, essentially a Javascript function with our own custom scraping logic to augment what Diffbot already extracts. An X-eval can be specified when creating a custom rule using the Custom API.

function () {
  start();
  var variants = [];
  
  /* get sizes*/
  var sizes = $('li.swatch-itm').filter((i,e) => {
    return !$(e).hasClass('unavailable');
  });
  for (var i = 0; i < sizes.length; i++){
    var sizes = $('li.swatch-itm').filter((i,e) => {
        return !$(e).hasClass('unavailable');
    });
    var sizeEl = sizes[i];
    sizeEl.click();
    /* get colors. click first */
    var colors = $('li.color-swatch').filter((i,e) => {
      return !$(e).hasClass('unavailable');
    });
    if (colors.length > 0) {
      colors[0].click();
    }
    var price = $('div.price').text().match(/([0-9.]+)/)[1]; 
    for(var j = 0; j < colors.length; j++) {
      var colorEl = colors[j];
      variants.push({
      'size': sizeEl.textContent.trim(),
      'color': $(colorEl).find('.color-swatch-div').attr('aria-label'),
      'offerPrice': price
      }); 
    }
  }
  save ("variants", variants);
  end();
}

All X-eval functions start with a start(); invocation and end with end(); to signal that the function is complete (important when there are callbacks that execute after function return).

We proceed by enumerating the list of available sizes using Jquery, which is supported in X-eval functions. We then click on the DOM element corresponding to each size, and then use another Jquery selector to select the list of available colors. Finally, we use a third Jquery selector to select the offer price, and save this combination of (size, color, price) to a variants array.

The last step is calling save() on variants, which saves the variants array as a property of the product JSON that is returned by Diffbot. Our final extracted product now has these variants captured.