Custom Extraction with SEO Crawler to Optimize Your E-commerce Site

An SEO Crawler is a service used for technical SEO audits. It checks whether a website has visibility problems for search robots and whether the chosen architecture is effective. It also reports existing client and server errors. You can check a website online, right in your browser.

For more information on this topic, see What is the Technical SEO Audit. You can also try our new Technical SEO Checklist.

However, an SEO Crawler can tell you about more than technical bugs. Experienced SEO professionals often use a custom web crawler to improve conversions and make the site structure more user-friendly.

Different tools are used for SEO crawling. Among the most popular are the SEO Crawler and the Screaming Frog SEO Spider. Their descriptions alone make it clear that these services are useful for more than technical audits. The intuitive SEO Crawler theme immediately catches your eye.

Extraordinary possibilities of SEO Crawler services. What is custom extraction

Any SEO Crawler tool lets you extract a wide range of information, including non-standard data. This is called custom extraction: pulling a certain type of data from the HTML code so that it can be exported to a CSV file or an Excel spreadsheet. In effect, the crawler collects technical and commercial information from the specified web resource and returns it in a convenient, structured form.

This is legal since only open information is collected and analyzed.

What data can be extracted with SEO Crawler?

  • Headings – the XPath query language lets you extract the h1, h2, and h3 headings. Related article – H1 Tag.
  • Data with the hreflang attribute – the SEO Spider collects this information by default and displays the language and region codes for each URL.
  • Structured Data – tools scan pages for the presence of structured data and the quality of its implementation. Find out more in the What is Structured Data? article.
  • Meta Tags – you can quickly and easily view the titles and descriptions of your competitors' pages. You can find more information on this topic in the articles Title Tags and How to Build a Nice Meta Description.
  • Social media tags – you can extract Open Graph (used by Facebook) and Twitter tags, which help social media better understand the content of the pages. Read more in the article Open Graph Meta Tags.
  • Interactive phone numbers – the service finds all the links that are used for instant dialing.
  • Email addresses – the program lets you collect and structure all email addresses found on the pages.
  • Iframe elements – different types of information can be retrieved from this element, such as embedded YouTube videos.
  • AMP – the output after crawling will be a list of AMP URLs. For more information, see What You Need to Know About Accelerated Mobile Pages.
  • Keywords – services indicate which keywords each page is optimized for.
  • Links – you can extract links to a specific domain with specific anchor text, or collect all the links on the site. There are many options.
  • Custom data – one of the main advantages of all crawlers. They let you collect specific information from your own site or from competitors’ sites, such as prices for specific products or the number of comments on blog articles.
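
To make the phone and email items above more concrete, here is a minimal Python sketch of the underlying idea: instant-dial links are ordinary `a` elements whose `href` starts with `tel:`, and email links start with `mailto:`. This is not how any particular crawler is implemented; the `ContactLinkParser` class and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class ContactLinkParser(HTMLParser):
    """Collects phone numbers and emails from tel: and mailto: links."""

    def __init__(self):
        super().__init__()
        self.phones = []
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("tel:"):
            self.phones.append(href[len("tel:"):])
        elif href.startswith("mailto:"):
            # Drop an optional "?subject=..." part after the address.
            self.emails.append(href[len("mailto:"):].split("?")[0])


# Hypothetical page fragment used only for illustration.
sample_html = """
<footer>
  <a href="tel:+15550100">Call us</a>
  <a href="mailto:sales@shop.example?subject=Order">Email sales</a>
  <a href="/contacts">All contacts</a>
</footer>
"""

parser = ContactLinkParser()
parser.feed(sample_html)
print(parser.phones)
print(parser.emails)
```

Ordinary links such as `/contacts` are ignored, so only the contact data ends up in the two lists.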

How to extract custom data. Screaming Frog SEO Spider Service Example

Among all the tabs that the Screaming Frog SEO Spider service produces when analyzing a site, the custom data tab is the most valuable if there is a need to obtain certain information from your own or someone else’s web resource.

Custom Extraction allows you to extract data using XPath, CSS, or Regex.

XPath selects HTML elements and their attributes with path expressions and returns every value that matches the selector. It is the most powerful method for obtaining custom data, but composing correct queries requires experience and knowledge.

CSS is a simple and efficient way to extract data using CSS selectors. For example, the selector `a` returns a list of all the links on the page.

Regex retrieves values that match the specified regular expression. The method suits advanced tasks, but to get the full benefit of it you need to be comfortable working with regular expressions.
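
The difference between a selector-based query and a regular expression can be sketched in Python. The example below collects the same links twice: once with an XPath-style query over the parsed document and once with a regex over the raw text. The sample markup is invented, and the standard library's `xml.etree.ElementTree` only accepts well-formed markup and a subset of XPath, so treat this as an illustration rather than a replacement for a crawler.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment.
sample_html = (
    "<html><body>"
    '<a href="/laptops">Laptops</a>'
    '<a href="/phones">Phones</a>'
    "</body></html>"
)

# XPath-style: parse the document, then select every <a> element.
root = ET.fromstring(sample_html)
xpath_links = [a.get("href") for a in root.findall(".//a")]

# Regex: match the href value directly in the raw text.
regex_links = re.findall(r'<a\s[^>]*href="([^"]*)"', sample_html)

print(xpath_links)
print(regex_links)
```

Both approaches return the same two URLs here. The regex depends on the exact way the attribute is written (double quotes, for instance), which is one reason selector-based methods are usually the first choice.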

Most of the time SEO professionals use XPath and CSS to retrieve information from a website.

At the first stage of working with the crawler, you need to specify what type of data you want to receive:

  1. The selected HTML element and its content
  2. The inner content of a specific HTML element
  3. Text content of the selected element and all nested elements
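
The three output types listed above can be illustrated on a single element. The following Python sketch is an approximation built on the standard library, with an invented `div` of class `desc`; the labels a given crawler uses for these modes may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment with one nested element.
sample_html = (
    "<html><body>"
    '<div class="desc">Light <b>14-inch</b> laptop</div>'
    "</body></html>"
)

root = ET.fromstring(sample_html)
element = root.find(".//div[@class='desc']")

# 1. The selected HTML element and its content.
outer_html = ET.tostring(element, encoding="unicode")

# 2. The inner content only: leading text plus serialized children.
inner_html = (element.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in element
)

# 3. Text of the element and all nested elements, without markup.
text_only = "".join(element.itertext())

print(outer_html)
print(inner_html)
print(text_only)
```

The third mode is usually the one to pick when the data is going straight into a spreadsheet, since it carries no tags.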

Next, you need to enter a specific query in the corresponding field of the extractor. How do you form it? The fastest way is to go to the page and view the HTML code of the site element you are interested in. This can be done by right-clicking on it and selecting Inspect Element.

Let’s see how much a laptop costs at Walmart.

In the page's HTML code, the product price is placed in a <span> tag with the price-characteristic class. If you want to collect all prices from the site, enter an XPath query that targets this element in the Configuration > Custom > Extraction section.
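
The exact query from the original walkthrough is not reproduced here. For the markup described (a `span` whose class is `price-characteristic`), a query of the form `//span[@class='price-characteristic']` would be a reasonable candidate. The Python sketch below applies the equivalent query to an invented page fragment; the product name and the price are made up.

```python
import xml.etree.ElementTree as ET

# ElementTree expects a leading "." for a document-wide search; in a
# crawler's XPath field the same query would start with "//".
PRICE_XPATH = ".//span[@class='price-characteristic']"

# Hypothetical, well-formed product page fragment.
sample_html = (
    "<html><body>"
    "<h1>Example Laptop 14</h1>"
    "<span class='price-characteristic'>499</span>"
    "</body></html>"
)

root = ET.fromstring(sample_html)
prices = [span.text for span in root.findall(PRICE_XPATH)]
print(prices)
```

An exact `@class='...'` comparison only matches when the element carries that single class. If the element has several classes, a `contains(@class, 'price-characteristic')` test is the usual XPath alternative, although the standard-library parser used in this sketch does not support it.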


To make it easier to view the cost, rename the corresponding column.

The service itself tells you whether the query you are using is valid. If it is valid, there will be a green checkmark at the end of the query line. If not, there will be a red cross.

Next, enter the URL of the site you want to explore in the special line of the service. Click Start.

As your personal price scanner runs, the extracted data will begin to appear on the Custom Extraction tab. Wait until the indicator shows that the scan is complete, after which you can export the data to a table for easy viewing.

If you want to simultaneously explore a list of URLs, the service allows you to select the appropriate option.

4 use cases for custom extraction for e-commerce

Any data that can be extracted from a site's HTML code can, when used correctly, improve marketing performance. Below we look at a few illustrative scenarios that demonstrate what custom crawler tools can do for e-commerce sites.

Scenario 1: Retrieving Product Prices

Let’s say you want to open an online store in a certain niche and don’t know yet what influences pricing in it. After analyzing the market, you have determined that there are several strong competitors. Instead of checking the cost of every product manually, you can use the SEO Crawler service to get this data in a convenient Excel spreadsheet.

To do this, you need to do the following:

  1. See how the cost of a product is indicated in the HTML code.
  2. Create a corresponding query.
  3. Select the pages to be analyzed. This can be done using an XML sitemap.
  4. Export the extracted data to an XLSX or CSV file.

Crawling services let you export a table with a list of URLs, product names (the h1 heading), and their prices. Repeat the process for each competitor.
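
The four steps above can be sketched end to end in Python. Everything in this example is hypothetical: the `shop.example` URLs, the `pages` dictionary that stands in for downloaded HTML, and the `extract_row` helper. No network requests are made, and the standard-library parser used here requires well-formed markup.

```python
import csv
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Step 3: a hypothetical XML sitemap listing the pages to analyze.
sitemap_xml = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://shop.example/laptop-a</loc></url>"
    "<url><loc>https://shop.example/laptop-b</loc></url>"
    "</urlset>"
)

# Stand-in for downloaded pages (assumed markup, steps 1 and 2).
pages = {
    "https://shop.example/laptop-a":
        "<html><body><h1>Laptop A</h1>"
        "<span class='price-characteristic'>499</span></body></html>",
    "https://shop.example/laptop-b":
        "<html><body><h1>Laptop B</h1>"
        "<span class='price-characteristic'>649</span></body></html>",
}


def extract_row(url, html):
    """Return [url, product name from h1, price] for one page."""
    root = ET.fromstring(html)
    name = root.findtext(".//h1")
    price = root.findtext(".//span[@class='price-characteristic']")
    return [url, name, price]


sitemap_root = ET.fromstring(sitemap_xml)
urls = [loc.text for loc in sitemap_root.findall("sm:url/sm:loc", SITEMAP_NS)]

# Step 4: write the rows as CSV (to memory here, to a file in practice).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["url", "product", "price"])
for url in urls:
    writer.writerow(extract_row(url, pages[url]))

csv_text = buffer.getvalue()
print(csv_text)
```

Running the same script against each competitor's sitemap would give one comparable table per store.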

Scenario 2: Extract Photos

Let’s say you are an official distributor of Apple products and want to post a photo of the product on your website. Custom crawlers can also help with this.

To extract photos, follow these instructions:

  1. Learn how the HTML code specifies the properties and URLs for images.
  2. Form a query in the crawling service with the appropriate ID. Remember to include the img tag whose src attribute (that is, the URL of the image) you want to extract.
  3. If you crawled the entire site, keep only the data that corresponds to the product cards. Alternatively, you can upload a list of URLs to parse.

As output, you get the URLs of all images linked to specific product cards. You can download the photos yourself or upload them to your site directly by specifying the links in the CSV file used for import and export. Optionally, you can configure the table to display the corresponding product names.
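
As a rough illustration of these steps, the Python sketch below selects a container by its ID, reads the `src` attribute of each `img` inside it, and converts relative paths to absolute URLs. The `product-gallery` ID, the page URL, and the file names are invented for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical product page; the banner sits outside the gallery block.
page_url = "https://shop.example/products/phone-x"
sample_html = (
    "<html><body>"
    '<div id="product-gallery">'
    '<img src="/img/phone-x-front.jpg" alt="Phone X front" />'
    '<img src="/img/phone-x-back.jpg" alt="Phone X back" />'
    "</div>"
    '<img src="/img/banner.jpg" alt="Sale banner" />'
    "</body></html>"
)

root = ET.fromstring(sample_html)

# Select the container by its ID, then read src from each img inside it.
gallery = root.find(".//div[@id='product-gallery']")
image_urls = [urljoin(page_url, img.get("src")) for img in gallery.iter("img")]
print(image_urls)
```

Restricting the query to the gallery container keeps unrelated images, such as banners, out of the result.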

Scenario 3: Retrieving Characteristics

If you are distributing goods, you need to transfer their technical characteristics to the site. What needs to be done for this?

  1. Examine the HTML code to find out which element contains each of the characteristics and what class it has.
  2. Form a query specifying the found class in quotes.
  3. Specify in the settings that you only need Extract Text, without markup and other elements.
  4. Insert the addresses of the pages from which you want to collect characteristics.
  5. Export the data to an Excel spreadsheet.

As a result, each characteristic ends up in its own cell of the table.
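
A minimal Python sketch of the same idea: find each element by its class and keep only the text, dropping any nested markup. The class names `spec`, `spec-name`, and `spec-value` are assumptions; real product pages will use their own.

```python
import xml.etree.ElementTree as ET

# Hypothetical characteristics block; one value contains nested markup.
sample_html = (
    "<html><body>"
    '<div class="spec"><span class="spec-name">RAM</span>'
    '<span class="spec-value">16 <b>GB</b></span></div>'
    '<div class="spec"><span class="spec-name">Screen</span>'
    '<span class="spec-value">14 inches</span></div>'
    "</body></html>"
)


def text_of(element):
    """Text of an element and its descendants, without markup."""
    return "".join(element.itertext()).strip()


root = ET.fromstring(sample_html)
specs = {}
for row in root.findall(".//div[@class='spec']"):
    name = text_of(row.find("span[@class='spec-name']"))
    value = text_of(row.find("span[@class='spec-value']"))
    specs[name] = value

print(specs)
```

Each key of `specs` corresponds to one column of the exported table, and each value to the cell for this product.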

Scenario 4: Retrieving Site Structure

Before launching or when optimizing an online store, it is very important to work out an effective structure of the website. At the same time, it is not necessary to start building it from scratch; you can use the experience of already successful projects. Just follow the breadcrumbs.

  1. Right-click on the first link in the breadcrumbs and open the HTML code of the selected fragment. Note which element holds the link, and pay attention to its attribute and value.
  2. Specify the element, attribute, and its value in the XPath query for the crawling service.
  3. Analyze all the pages of the site and export the data to an Excel spreadsheet.

In the columns of the resulting table, you will see the names of sections and subsections. This way, you can study the hierarchy and structure of any online store or directory.
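
The breadcrumb approach can be sketched in Python as follows. The `nav` element with class `breadcrumbs`, the URLs, and the section names are hypothetical, and the `pages` dictionary stands in for pages a crawler has already downloaded.

```python
import xml.etree.ElementTree as ET

# Stand-in for crawled pages with an assumed breadcrumb markup.
pages = {
    "https://shop.example/laptops/ultrabooks/model-a":
        "<html><body><nav class='breadcrumbs'>"
        "<a href='/'>Home</a><a href='/laptops'>Laptops</a>"
        "<a href='/laptops/ultrabooks'>Ultrabooks</a>"
        "</nav><h1>Model A</h1></body></html>",
    "https://shop.example/phones/model-b":
        "<html><body><nav class='breadcrumbs'>"
        "<a href='/'>Home</a><a href='/phones'>Phones</a>"
        "</nav><h1>Model B</h1></body></html>",
}


def breadcrumb_trail(html):
    """Return the breadcrumb link texts in page order."""
    root = ET.fromstring(html)
    links = root.findall(".//nav[@class='breadcrumbs']/a")
    return [link.text.strip() for link in links]


structure = {url: breadcrumb_trail(html) for url, html in pages.items()}
for url, trail in structure.items():
    print(url, " > ".join(trail))
```

Each trail becomes one row of the table, with the section in one column and the subsection in the next.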

How to use custom extraction to optimize conversions and UX

SEO Crawler tools let you extract almost any available information from any site: prices, product availability, reviews, descriptions, contacts, and in general everything that is in the HTML code. This gives you broad possibilities if you want to gain a competitive edge, and it also helps you achieve higher conversions and better UX.

Let’s start with conversion. By extracting data on prices, reviews, overall rating, meta tags, and keywords, you can already estimate which product is successful and which is not. Run a crawler to collect the same information from a competitor’s website. Perhaps the competitor has better-optimized product pages, lower prices, and a more convenient structure. Compare, analyze the differences, and improve your site. Results will not take long to appear.

You can also see which of the competitor's products are currently out of stock. If you have them while they are in high demand, run an appropriate ad. Nothing stops you from raising prices for such scarce products.
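
Once availability has been extracted from both sites, the comparison itself is simple. The sketch below assumes two hypothetical exports that map product names to availability text; the product names and the "In stock" / "Out of stock" wording are invented.

```python
# Hypothetical crawl exports: product name -> availability text.
competitor_stock = {
    "Laptop A": "Out of stock",
    "Laptop B": "In stock",
    "Phone C": "Out of stock",
}
own_stock = {
    "Laptop A": "In stock",
    "Laptop B": "In stock",
    "Phone C": "Out of stock",
}


def is_available(status):
    """Treat only the exact text 'in stock' as available."""
    return status.strip().lower() == "in stock"


# Products you can sell right now that the competitor cannot.
ad_candidates = sorted(
    name
    for name, status in own_stock.items()
    if is_available(status)
    and name in competitor_stock
    and not is_available(competitor_stock[name])
)
print(ad_candidates)
```

In practice the two exports rarely share identical product names, so matching on an SKU or a model number is likely to be more reliable.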

Here’s another idea. People pay more attention to the products they see in the recommendations. See which products are recommended by competitors and on what basis. What data do you use to make recommendations? How accurate and in line are they with user expectations? Quality recommendations are a powerful conversion tool. Try not to neglect them.

There are many more ideas like these. Learn more in our SEO Competitive Analysis article.

What about UX?

Well, first of all, we’ve already talked about structure. It is difficult to overestimate its importance since sites with a poorly thought out hierarchy, confusing search system, and vague design will not interest users. A good structure will make the site intuitive and have a beneficial effect on the sales funnel. Therefore, make sure that the site becomes user friendly.

The second thing to work on is filters. If you have a large online store, then you most likely have a system for selecting suitable products from each category. Track which filters are at the top on competitors’ sites and which ones buyers use most often, then focus on them. If you don’t have filters yet, use the platforms of your niche colleagues to learn from their experience. Filter data is retrieved in the same way as other custom data.

Similarly, you can collect and analyze other types of sales-related data to improve the performance of your site.

Pros and cons of website crawling

SEO Crawler services have solid benefits:

  • They allow you to conduct a full technical audit of your site.
  • They let you analyze and download useful data from competitors’ websites. Even when crawling web resources with thousands of pages, the top services are very fast.
  • You can easily customize what information you want to retrieve. The obtained data is presented in a convenient form with a breakdown by tabs.
  • You legally get the most complete information about the website you are interested in and can use it to gain competitive advantages.

There is only one drawback. Almost all useful and convenient tools for crawling are paid. But believe us, it’s worth it.

About author
Eugenia Pasichnyk is a content creator with 10-years’ experience in the profession. Worked as a Project Manager at a Marketing Agency. Has been studying SEO processes for over 5 years. Interested in PR and other promotion methods.