3 Challenges to Getting Product Data from Ecommerce Websites

If you run an online store or ecommerce business, you know that there’s nothing more important than your product and your customer (and how your customer relates to your product).

Making sure that product information on your site is accurate and up-to-date is essential to that customer relationship. In fact, 42% of shoppers admit to returning products that were inaccurately described on a website, and more often than not, disappointment in incorrectly listed information results in lost loyalty.

That’s where access to high-quality product data comes in handy. Product feeds help keep that data organized and available for review, so you can easily see whether your site is missing information that may be invaluable to your customer.

But aside from keeping your own product information up to date, product data is also valuable for many other facets of your business. It can help you purchase or curate products, compare competitor offerings, and even drive your marketing decisions.

The trouble, however, is that product data can be notoriously difficult to collect, and unless you can gather it quickly and comprehensively, it may not do you any good. Here’s what you should know.

Don’t miss: 5 More Things Retailers Can Do With Product Data

Why Product Data Is So Useful

Product data from ecommerce sites, whether it comes from internal or external sources, can be used for a variety of purposes throughout your company. Here are just a few areas where you can use product data to drive sales.

Sales strategy. Understanding your competitors’ strategies is important when developing your own. What are other brands selling that you’re not? What areas of the market are you covering that they’re not? Knowing what products are selling elsewhere helps you get a leg up on the competition and improve your product offering for better sales.

Pricing data. Product data allows you to find the cheapest sources of a product on the web, and then resell that product or adjust your own prices to stay competitive.

Curating other products. Many sites collect products from other retailers, either to feature them on their own pages (subscription boxes or resellers, for example) or to increase the number of products they sell. Because each of those sites has its own suppliers, retailers and product data, curating products from several of them can make the whole process rather complex.

Affiliate marketing. Some sites might embed affiliate links in product reviews, monetize user-generated content with those links and then build product-focused inventories based on consumer response. In order to do all of that, you need product data. Product data can help build any affiliate sites or networks and help give the most accurate inventory information to marketers.

Product inventory management. Many ecommerce sites rely on manufacturers to provide data sets with specific product information, but collecting, organizing and managing that data can be difficult and time-consuming. APIs and product data scraping tools can help collect the most accurate data from suppliers and manufacturers to ensure that databases are complete.

There are plenty more things you can do with data once it’s collected, but the trick is that you need access to that data in the first place. Unfortunately, that data can be harder to gather than you might think.

Challenges of Scraping Product Data

There are a few challenges that may hinder your ability to use product data to inform your decisions and improve your own product offerings.

Challenge #1: Getting High-Quality Data

High-quality data drives business, from customer acquisition to sales, marketing and almost every other touchpoint in the customer journey. Poor data can distort the decisions you make about your brand, your competition, and even your product offerings. The more comprehensive and accurate the data is, the higher the quality.

Quality data should contain all relevant product attributes for each individual product, including data fields like price, description, images, reviews, and so on.
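One way to make "all relevant product attributes" measurable is to score each record for completeness. The sketch below is only an illustration: the Product layout and the helper names are hypothetical, and the field list simply mirrors the attributes mentioned above.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


# Hypothetical record layout; the fields mirror the attributes named above.
@dataclass
class Product:
    name: str = ""
    price: Optional[float] = None
    description: str = ""
    images: List[str] = field(default_factory=list)
    reviews: List[str] = field(default_factory=list)


def missing_attributes(product: Product) -> List[str]:
    """Return the names of attributes that are unset or empty."""
    missing = []
    for f in fields(product):
        value = getattr(product, f.name)
        if value is None or value == "" or value == []:
            missing.append(f.name)
    return missing


def completeness(product: Product) -> float:
    """Fraction of attributes that are filled in, from 0.0 to 1.0."""
    total = len(fields(product))
    return (total - len(missing_attributes(product))) / total


item = Product(name="Canvas tote", price=19.99, description="Sturdy cotton bag")
print(missing_attributes(item))  # ['images', 'reviews']
print(completeness(item))        # 0.6
```

A score like this says nothing about accuracy, only about coverage, but it is a quick way to flag records that need another pass.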

When it comes to pulling product feeds or crawling ecommerce sites for product data, there are several obstacles that you might face. Websites may have badly formatted HTML code with little or no structural information, which may make it difficult to extract the exact data you want.
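As a rough illustration of the formatting problem, the sketch below pulls a price out of markup with unclosed tags and an unquoted attribute, using only Python's standard html.parser module. The sample markup, the class names and the extract_text_by_class helper are all made up for this example; real pages will need their own rules.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text that directly follows the first tag with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.capturing = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.capturing:
            # Unclosed tags are common in messy HTML, so any new tag ends capture.
            self.capturing = False
            self.done = True
        classes = (dict(attrs).get("class") or "").split()
        if not self.done and self.target_class in classes:
            self.capturing = True

    def handle_endtag(self, tag):
        if self.capturing:
            self.capturing = False
            self.done = True

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)


def extract_text_by_class(html, target_class):
    """Return the stripped text for the first matching element, or ''."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Made-up sample: <b> and both <p> tags are never closed, one class is unquoted.
messy_html = """
<div class="item"><b>Canvas tote
<p class=price>$19.99
<p class="desc">Sturdy cotton bag</div>
"""

print(extract_text_by_class(messy_html, "price"))  # $19.99
```

This only captures text up to the next tag, so a price wrapped in nested markup would come back empty. It is meant to show why each site tends to need its own extraction logic, not to replace a full parser.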

Authentication systems may also keep your web scraper from accessing complete product feeds, and sites may tuck important information behind paywalls, CAPTCHAs or other barriers, leaving your results incomplete.

Additionally, some websites may be hostile to web scrapers and prevent you from extracting even basic data from their site. In this instance, you need advanced scraping techniques.

Challenge #2: Getting Properly Structured Data

Merchants may receive incomplete product information from suppliers and fill it in later, after you’ve already scraped their site. You would then need to re-scrape and reformat the data for each unique site.

If you wanted to pull data from multiple channels, your web scraper would need to identify product information on every one of those sites and convert it into readable data. Unfortunately, not all scrapers are up to the challenge.
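To make the conversion step concrete, here is a minimal sketch that maps records from two sites onto one shared layout. The site names, the field mappings and the price format (dot as decimal separator, comma for thousands) are all assumptions for illustration.

```python
import re

# Hypothetical mappings from each site's field names to one shared schema.
FIELD_MAPS = {
    "shop_a": {"title": "name", "cost": "price", "desc": "description"},
    "shop_b": {"product_name": "name", "price_usd": "price", "summary": "description"},
}


def parse_price(raw):
    """Convert values such as '$1,299.00' or 19.5 to a float.

    Assumes a dot decimal separator; raises ValueError if no digits are found.
    """
    if isinstance(raw, (int, float)):
        return float(raw)
    cleaned = re.sub(r"[^\d.]", "", raw)
    return float(cleaned)


def normalize(site, record):
    """Rename a site-specific record's fields to the shared schema."""
    out = {"source": site}
    for src_key, dst_key in FIELD_MAPS[site].items():
        if src_key in record:
            out[dst_key] = record[src_key]
    if "price" in out:
        out["price"] = parse_price(out["price"])
    return out


print(normalize("shop_a", {"title": "Canvas tote", "cost": "$1,299.00"}))
# {'source': 'shop_a', 'name': 'Canvas tote', 'price': 1299.0}
```

Keeping the per-site differences in a mapping table rather than in code means a new site mostly requires a new entry, although real sites usually differ in more than field names.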

Product prices can also change frequently, which results in stale data. This means that in order to get fresh data, you would need to scrape thousands of sites daily.
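A simple way to keep track of staleness is to store a timestamp with every record and re-scrape anything older than a chosen window. The sketch below assumes a 24-hour window and a hypothetical record layout with "url" and "scraped_at" keys.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # assumed freshness window; adjust to your needs


def is_stale(scraped_at, now=None, max_age=MAX_AGE):
    """True if the record was scraped longer ago than max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - scraped_at > max_age


def urls_to_refresh(records, now=None):
    """Return the URLs of records that are due for another scrape."""
    return [r["url"] for r in records if is_stale(r["scraped_at"], now)]


# Fixed clock so the example is reproducible.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"url": "https://example.com/p/1", "scraped_at": now - timedelta(hours=30)},
    {"url": "https://example.com/p/2", "scraped_at": now - timedelta(hours=2)},
]
print(urls_to_refresh(records, now=now))  # ['https://example.com/p/1']
```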

Challenge #3: Scaling Your Web Scraper

If you were going to pull data from multiple sites, or even thousands of sites at once (or from a catalog as massive as Amazon’s), you would need to either build a scraper for each specific site or build one scraper that can handle multiple sites.

The problem with the first option is that it can be time-consuming to build and maintain tens or even a hundred scrapers. Even Amazon, with its hefty development team and budget, doesn’t do that.

Building a robust scraper that can pull from multiple sources can also be difficult for many companies, however. In-house developers already have important tasks to handle and shouldn’t be burdened with creating and maintaining a web scraper on top of their responsibilities.

How Do You Overcome These Challenges?

To get the most comprehensive data, you need to gather product data from more than one source – data feeds, APIs, and screen scraping from ecommerce sites. The more places you can pull data from, the more complete your data will be.

You will also need to be able to pull information frequently. The longer you wait to gather data, the more that data will change, especially in ecommerce.

Prices change, and products sell out and are added on a daily basis, which means that if you want the highest quality data, you will need to pull that information as often as possible (ideally at least once a day).
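Frequent pulls are most useful when you compare each one with the last. The sketch below diffs two snapshots, assuming each snapshot is a plain dictionary that maps a product ID to its price; the IDs and prices are invented.

```python
def diff_snapshots(old, new):
    """Compare two {product_id: price} snapshots from consecutive pulls."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    price_changed = {
        pid: (old[pid], new[pid])
        for pid in sorted(old.keys() & new.keys())
        if old[pid] != new[pid]
    }
    return {"added": added, "removed": removed, "price_changed": price_changed}


yesterday = {"sku-1": 19.99, "sku-2": 5.00, "sku-3": 42.00}
today = {"sku-1": 17.99, "sku-3": 42.00, "sku-4": 9.50}

print(diff_snapshots(yesterday, today))
# {'added': ['sku-4'], 'removed': ['sku-2'], 'price_changed': {'sku-1': (19.99, 17.99)}}
```

A product missing from today's pull may be sold out or may simply have failed to scrape, so treat "removed" as a prompt to check rather than a fact.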

You will also need to determine the best structure for your data (typically JSON or CSV, but it can vary) based on what your team needs. Whatever format you choose should be organized efficiently in case updates need to be made from fresh data pulls or you need to integrate your data with other software or programs.
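For a feel of the trade-off between the two formats, here is a small sketch that writes the same records as JSON and as CSV with Python's standard library. The sample records and column choice are made up.

```python
import csv
import io
import json

# Made-up sample records; "images" is a nested list.
products = [
    {"name": "Canvas tote", "price": 19.99, "images": ["front.jpg", "back.jpg"]},
    {"name": "Mug", "price": 8.0, "images": []},
]


def to_json(records):
    """JSON keeps nested fields such as image lists intact."""
    return json.dumps(records, indent=2)


def to_csv(records, columns):
    """CSV is flat, so only the listed columns are written; others are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_json(products))
print(to_csv(products, ["name", "price"]))
```

JSON tends to suit integrations with other software, while CSV is easier to open in a spreadsheet. Nested fields would need to be flattened first if they have to appear in the CSV.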

The best way to handle each of these issues is either to build a robust web scraper that can handle all of them at once or to find a third-party developer that has one available to you (which we do here). Otherwise you will need to address each of these issues individually to ensure you’re getting the best data available.

Final Thoughts

Unless you have high-quality data, you won’t be able to make the best decisions for your customers, but in order to get the highest quality data, you need a robust web scraper that can handle the challenges that come along the way.

Look for tools that let you refresh your product data feeds frequently (at least once a day), that give you structured data you can integrate quickly with other resources, and that give you access to as many sites as you need.