How and Why Search Engines Render Pages
Web Content: Rendering vs. Indexing
Progress makes it possible to use a much wider range of tools by executing scripts both on the server side and in browser rendering engines. On the other hand, whereas earlier a website’s content could be understood from its hypertext markup and source code alone, the ubiquitous use of scripts makes this task much more difficult.
What does rendering mean? Let’s say you want to maintain a small database of invoices that are issued to your customers. Since the scope and structure of the task are quite simple, it is enough to use a solution like Ivory Search and load new data into the database through the application. The applet will make sure that the data is available to the users of your site. When a client enters their invoice number or selects the desired value from a drop-down list, they will see the corresponding page. What is actually contained within the code of your page? Depending on the platform, the most common options are:
- An external script reference (<script>)
- An <object> or <embed> element
- A shortcode, for WordPress users
Instead of entering information about each item into your page by hand, you use the appropriate customer service interface to create a small database. All you have to do is add the shortcode to the desired section of the rendered pages.
After that, when the script is executed, a search field and the matching results (for example, the customer’s invoice page) will appear on the page.
However, from the point of view of a search bot, this content is not on the page until the script code is executed, which makes it extremely difficult to determine what the page actually contains.
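As a hypothetical illustration (the element IDs, file names, and plugin output below are invented for this sketch), the source HTML the bot first downloads may contain only an empty container and a script reference, while the rendered page contains the actual content:

```html
<!-- What the crawler downloads: no invoice data in the source -->
<div id="invoice-search"></div>
<script src="/js/invoice-search.js"></script>

<!-- What the browser shows after the script runs (the rendered DOM) -->
<div id="invoice-search">
  <input type="search" placeholder="Invoice number">
  <ul><li><a href="/invoices/1042">Invoice #1042</a></li></ul>
</div>
```

Until the script executes, a bot that reads only the source sees an empty <div>.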
Quite often, external scripts are used for analytics or in attempts to expand the possibilities for search engine optimization of content. For example, a page may load reviews or other information about a product from Google or other sources at display time. While such a decision can have a positive effect on the page’s conversion rate, there will most likely be no significant positive changes from an SEO point of view.
There are several reasons for this. Although indirect signals such as purchases, bounce rate, and time on page may improve, the search bot cannot relate them to the content that users see, because from the bot’s point of view the code does not contain the corresponding sections or elements.
Therefore, taking into account how modern web pages are processed, rendering takes place between two main stages: the first is the original HTML, and the second is the final form of the page.
When rendering a page, Google does not waste resources and time recreating the visual display. As a reminder, its main task is to figure out what the page is about, and to do this it needs to analyze the content. This means it is enough to obtain the final structure of the web document for further processing. Martin Splitt from Google spoke about this in detail in his talk at TechSEO Boost in 2019.
The reason Google is so careful with computing resources is very simple: the constant growth of volumes. According to Martin Splitt, about 160 trillion documents are processed every day, so resources must be allocated as deliberately as possible so that productivity does not suffer while as many requests as possible are processed continuously and sequentially.
For indexing purposes, Google divides web pages into two categories: those that require content to be rendered before indexing and those that do not. Martin Splitt described this process in more detail during the traditional Google Webmaster Central office hours on August 23, 2019.
Pages whose scripts do not affect the nature and logic of the content are indexed in the normal queue. However, if external or internal scripts generate a significant part of the document, the page is sent for two-stage indexing involving specialized rendering engines. The staging means that the final structure of the page must be obtained before indexing: the page is rendered first, and that final structure, not the original HTML, is then sent on for further processing and indexing. Accordingly, pages in the two-stage indexing queue are processed only when sufficient resources are available to perform all the necessary calculations continuously. John Mueller also touched on queuing and prioritization for two-stage indexing on his Twitter.
While handling dynamic content is mostly a matter of technical optimization, there are a few simple steps that can help your site avoid common indexing issues.
Allow indexing of external resources
Back in 2014, the Google Webmaster Central Blog mentioned in one of its publications the need to allow search bots access to all the resources necessary to display the page.
John Mueller also reminded us of this in his posts on Google+. The easiest way to achieve this is to make sure your robots.txt file does not block script and style resources.
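A minimal sketch of such robots.txt rules (the patterns below simply permit crawling of script and style files; adjust the paths to match your site):

```
User-agent: Googlebot
Allow: /*.js
Allow: /*.css
```

Equally important is removing any existing Disallow rules that cover the directories where your scripts and stylesheets live.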
Use dynamic links correctly
Typically, Google follows “the link is a link” principle and treats all relevant elements in a document according to their attributes.
However, links that the search bot finds inside script code are an exception. In this case, it will try to index the found link but will automatically treat it as nofollow.
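A small sketch of the difference (the URLs and the crude source scan below are invented for illustration): a crawler reading only the static source sees the markup link, while a link assembled in script code does not exist until rendering:

```javascript
// Static HTML as a crawler downloads it: one real <a> element,
// plus a script that would create a second link only after rendering.
const staticHtml = `
  <a href="/pricing">Pricing</a>
  <script>
    const a = document.createElement('a');
    a.href = '/special-offer'; // exists only in the rendered DOM
    document.body.appendChild(a);
  </script>`;

// A crude scan of the raw source (roughly what a non-rendering bot sees):
const visibleLinks = [...staticHtml.matchAll(/<a\s+href="([^"]+)"/g)]
  .map((m) => m[1]);

console.log(visibleLinks); // only '/pricing'; '/special-offer' is absent
```

Links that matter for crawling should therefore appear as ordinary <a href> elements in the markup, not be built up in script.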
Place important content in the user’s field of vision
The rendering engine does not scroll or click the way a real user does, so content that appears only after such interactions may never be generated. This, in turn, leads to hidden items not being indexed and therefore not being counted in the ranking.
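As a sketch (the element IDs and endpoint are invented), content that is injected only after a click never exists for a bot that does not interact with the page:

```html
<!-- The reviews block is filled in only when the user clicks -->
<button id="show-reviews">Show reviews</button>
<div id="reviews"></div>
<script>
  document.getElementById('show-reviews').addEventListener('click', async () => {
    const res = await fetch('/api/reviews'); // hypothetical endpoint
    document.getElementById('reviews').textContent = await res.text();
  });
  // Googlebot does not click, so #reviews stays empty in the rendered snapshot.
</script>
```

Content you want indexed should be present in the rendered page without requiring user interaction.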
Use only the Scripts You Need
Search engine optimization includes many elements, and one of the main tasks is to achieve optimal web page load time. With the abuse of analytics services, various add-ons, and plugins, the site may start to load slowly in the rendering engine despite all your efforts. Use the Chrome Developer Tools to identify any unused scripts.
Don’t forget about redirects
Laziness is the engine of progress
A technique known as lazy loading allows images or other content to be loaded only when the user needs them, which saves resources and time. One important rule: do not apply lazy loading to the main text content, to avoid problems with indexing the page. To activate it, it is enough to add the loading="lazy" attribute to the element. Also remember to load the scripts on your pages asynchronously; the easiest way is to add the async or defer attribute to the script tag.
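A minimal sketch of both techniques (the file names are placeholders):

```html
<!-- Lazy-load below-the-fold images; never the main text content -->
<img src="/img/store-map.png" loading="lazy" alt="Store map">

<!-- async: fetch and execute the script without blocking parsing -->
<script src="/js/analytics.js" async></script>

<!-- defer: fetch in parallel, execute after parsing, in document order -->
<script src="/js/widgets.js" defer></script>
```

Use async for independent scripts like analytics, and defer for scripts that depend on the DOM or on each other.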
Delegate and Share
Time to First Byte (TTFB) and Time to Interactive (TTI) are among the main performance metrics affected by on-page scripts. If a script grows beyond 50 or even 100 kilobytes, it may be worth dividing it into several smaller pieces. Many small scripts can complete more easily and quickly than one large one, especially when they are able to load asynchronously.
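One common way to split a large script is to load rarely used parts on demand with a dynamic import() (the module names and element IDs here are placeholders):

```html
<script type="module">
  // The small core bundle loads immediately; the heavy review widget
  // is fetched only when the user actually opens the reviews tab.
  document.getElementById('reviews-tab')?.addEventListener('click', async () => {
    const { renderReviews } = await import('/js/reviews-widget.js');
    renderReviews(document.getElementById('reviews'));
  });
</script>
```

Bundlers can emit these on-demand chunks automatically, so the initial payload stays small while the full functionality remains available.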
Consider the Features of Different Search Services