SEO, Online Marketing, and Artificial Intelligence

With the growth of computing capacity, the amount of data stored in computers, and the bandwidth of data transmission channels, we hear artificial intelligence, machine learning, and neural networks mentioned more and more often in various professional fields. Online marketing is no exception. We will not go into technical details, but rather try to explain simply and comprehensively how you can use these technologies to carry out search engine optimization and generate data.

Let’s start with what artificial intelligence (AI) is. Broadly speaking, it is an algorithm able to effectively imitate the cognitive functions of the human mind or, more simply, make decisions based on abstract data. It won’t surprise anyone that Google has had its own AI unit for a long time — Google AI. The most famous projects the Google AI team is involved in are Google AutoML, Google Assistant, TensorFlow, and DeepMind.

Machine learning (ML) is a more complex concept. In practice, we most often come across some form of ML as predictive input and other smart recommendations based on analysis of our activity. The essence of ML, however, is not accumulating experience but self-improvement: the ability of an algorithm to independently increase its effectiveness based on the data it obtains and how that data can be applied. ML is widely used in all major Google services, from Gmail, Search, Maps, and Voice Assistant to Google AdSense and Google AdWords.

What is machine vision (MV)? It is a field of computer science dedicated to teaching programs and algorithms to recognize and process visual images. A machine vision system is able to recognize not only an image as a whole but also its parts; it can analyze not only its components but also the connections between them. For example, it understands where the outlines of objects begin and end and what their sizes and shapes are, and it takes into account numerous external factors, such as motion vectors. It is MV that allows people to develop programs for self-driving cars and implement functions such as automatic parallel parking.

The achievements in AI and ML are directly related to search engine optimization. You use the results of years of research on a daily basis: from pixel learning to recommendations for increasing the effectiveness of PPC campaigns, to smart selections of relevant interests and the way your content is processed for indexing. Still, you may often face a task that is not easy to automate or speed up. For example, Google still relies on people to judge content quality, even though research into semantic analysis and natural language processing has been going on for decades.

One of these tasks is marking up web pages according to the principles of structured data (SD). As you know, SD is an attempt to standardize how different types of information are represented on a web page. Typically, SD markup does not affect how users see your page. At the same time, it completely changes the way search bots perceive hypertext markup. It is thanks to this schema standard that we have rich search results and Google snippets.
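For context, structured data markup usually takes the form of a JSON-LD block embedded in the page. A minimal sketch of generating a schema.org Product snippet from Python, with all field values invented for illustration, might look like this:

```python
import json

def product_jsonld(name, price, currency, description):
    """Build a minimal schema.org Product markup block as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": str(price),
            "priceCurrency": currency,
        },
    }
    # Wrap in the script tag search bots look for; users never see it.
    return ('<script type="application/ld+json">'
            + json.dumps(data) + "</script>")
```

A real Product entry usually carries more fields (image, SKU, availability), but even this skeleton is enough for a bot to understand what the page describes.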

Despite the obvious advantages of using SD, one of the main challenges that a webmaster will face will be the need for manual processing and data entry. Fortunately, this process can be partially automated with AI. For example, you can use Python to create your own structured data generator capable of generating text from images, and our article will show you how this can be used for search optimization.

How Structured Data and Computer Vision Work Together

Since the process is based on image analysis, we’ll explain in detail how you can use its results to automate the generation of SD. It’s no secret that basic elements such as the description, price, or size can be found on any product description page. Likewise, the contacts section often contains links to company profiles on social media, and faculty websites provide information on the employees working there. For each type of task, it is possible to select a number of parameters that will be located in a similar way across a number of sites or pages. Thus, if you need to analyze the website of a manufacturer whose goods you would like to include in your catalog, it is highly likely that the description pages will share the same structure.

This is where MV comes into play, thanks to which you can train an algorithm to recognize specific elements of a page screenshot, process them, and extract the data in the desired format. Moreover, thanks to automation, you can learn how to generate data more efficiently. For example, it is possible to create texts for use in image ALT tags by automatically recognizing what is shown in the photo. 

Thus, the first task will be to train the algorithm, which requires some preparation and knowledge. In particular, it is necessary to prepare training materials in the form of ready-made images with markup added to them. Image markup resembles the autofocus that detects people’s faces when you take a photo on your phone. In this case, however, you must select the areas of the image that the algorithm needs to pay attention to and label them in a way that will make the analysis give suitable results.

After the training phase, we will be able to run the ready-made program on unfamiliar images, for example, new screenshots, including automatically generated ones, and process them more efficiently than we could manually. Based on the obtained results, we will use Python to produce the markup in a structured data format.


Perhaps the most famous application of MV is autopilot software for cars. Its primary task is to identify objects in the images taken by onboard cameras. When the camera takes a photo, the algorithm processes it and marks each recognized object with a corresponding frame. Then, the recognized objects are labeled according to their type: a person, an animal, a car, a curb, a hydrant, a tree, and so on.

This allows you to process each object more accurately, i.e., to assign it suitable unique properties without wasting time on unnecessary analysis. In our example, something similar happens. Since the user, with the screenshots of pages that user makes, acts as the sensor that supplies images, all that remains is to add types and specify the properties of objects.

One of the most popular real-time image recognition algorithms is You Only Look Once (YOLOv3). To get a better idea of how the algorithm works in practice, you can watch this short presentation video.

The next task will be to create images for training. The larger the sample, and the more accurately it reflects the final task of the algorithm, the better. To complete this step, you can use one of the numerous tools that automatically create screenshots of web pages. How you process the results and how correctly you label the desired objects will be important. One of the simplest programs for marking up images and saving the results in XML format is tzutalin/labelImg.


Images saved in different formats can be used for further processing, for example, with DetectNet. When using this format, a text file is generated for each picture, containing the names of the labels and their coordinates. If you will be processing sites on which structured data is already implemented, the task of calculating coordinates for each label is simplified: you can use the existing structured data markup to generate the correct coordinates for objects, which will save you time and effort. At this stage of processing, the text containing the label coordinates will look as follows:

quantity 0.0 0 0.0 836 198 1303 253 0.0 0.0 0.0 0.0 0.0 0.0 0.0
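As an illustration, the XML annotations that labelImg saves (Pascal VOC format) can be converted into this line format with a short Python routine. The trailing zeros fill the DetectNet fields (truncation, occlusion, 3-D dimensions, and so on) that do not apply to screenshots; the element names below follow labelImg’s output:

```python
import xml.etree.ElementTree as ET

def voc_to_detectnet(xml_text):
    """Convert labelImg (Pascal VOC) annotations to DetectNet label lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        xmin = box.findtext("xmin")
        ymin = box.findtext("ymin")
        xmax = box.findtext("xmax")
        ymax = box.findtext("ymax")
        # label, truncated, occluded, alpha, bounding box,
        # then seven unused 3-D fields
        lines.append(f"{name} 0.0 0 0.0 {xmin} {ymin} {xmax} {ymax} "
                     "0.0 0.0 0.0 0.0 0.0 0.0 0.0")
    return lines
```

Running it on the annotation for the quantity field above would reproduce the line shown.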


The next step is to convert the images and annotation coordinates into a format supported by AutoML. It is worth noting that before the release of AutoML, such tasks could be solved with the TensorFlow API. The problem was that the learning and recognition process took much more time: it was necessary to prepare hundreds of training examples and spend many hours checking and manually adjusting the results. For AutoML to function properly, it is enough to provide as few as 10 suitable images.

To successfully convert the coordinates of your labels into a format AutoML understands, we recommend reading this manual. You will also need to upload your images to an appropriate Google Cloud Storage location. One of the main differences in how AutoML organizes label coordinate data is that the coordinates must be converted to relative decimal values in order to work properly. The label coordinates from the example above will, at this stage, be expressed as fractions of the image width and height.
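A sketch of that conversion, assuming the two-vertex CSV row layout AutoML accepts (data split, image URI, label, then normalized corner coordinates, with the unused vertex columns left empty); the bucket path is a placeholder:

```python
def automl_row(image_uri, label, x1, y1, x2, y2, img_w, img_h, split="TRAIN"):
    """Build one AutoML Vision object-detection CSV row.

    Pixel coordinates are divided by the image dimensions to get the
    relative decimal values (0.0-1.0) that AutoML expects.
    """
    nx1, ny1 = round(x1 / img_w, 4), round(y1 / img_h, 4)
    nx2, ny2 = round(x2 / img_w, 4), round(y2 / img_h, 4)
    return f"{split},{image_uri},{label},{nx1},{ny1},,,{nx2},{ny2},,"
```

For the quantity label above, taken from a 1920x1080 screenshot, this yields coordinates of roughly 0.4354, 0.1833 and 0.6786, 0.2343.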


When all of the data is ready, it is possible to combine it into one CSV file and upload it to the appropriate Google Cloud storage. Now you can proceed directly to training the algorithm to recognize data based on the parameters you need. To do this, you need to create a new dataset.


After that, in the dataset settings, go to the Import tab and select the prepared CSV file with coordinates and labels. On the Images tab, select the corresponding screenshots that you prepared in advance. A visual editor simplifies this task, allowing you to see in real time how the labels are positioned on the images.

After that, you can go to the Training tab and start the process. It usually takes several hours. You can keep track of the process thanks to the interactive diagram of the recognition results. Now, click on the Test & Apply tab. You can try out the already trained model on any image you want to load. To the right of the selected image, you will see an information panel with all of the recognized objects, as well as an indicator of recognition accuracy.

One of the main advantages of this approach is that if the model processes 5 out of 7 objects well—for example, it skips a discounted price—then you can return to training without losing the obtained results. To do this, simply load additional examples, specifying only those objects that have recognition problems, and let the model process them for a few more hours.

As we mentioned above, the training phase is one of the most important stages of the entire process, so you need to pay close attention to preparing and marking up your materials. It is necessary to provide high-quality examples to give the model a strong foundation to build on. One sign of quality training is that the algorithm can detect the markup even when there are extraneous elements on the pages, such as pop-ups, tooltips, and menu items.

The Google AutoML output for each specified and recognized parameter consists of several basic elements: the name of the recognized label, the accuracy of the match expressed as a percentage, and the coordinates of the element in the image. It looks as follows:

Predicted label and score: quantity 99.40%; 

Predicted x1, y1: (62, 450); and

Predicted x2, y2: (300, 495). 

Since the coordinates will be the main tool for further work, it is very important that you do not edit or crop the original screenshots. This way, the coordinate system will coincide with the loaded page.
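Getting from the predicted bounding box to a single point you can pass onward is just a midpoint calculation. The numbers below are the quantity box from the output above:

```python
def box_center(x1, y1, x2, y2):
    """Return the center of a bounding box as (x, y)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2)

# Center of the predicted quantity box (62, 450)-(300, 495)
x, y = box_center(62, 450, 300, 495)  # (181.0, 472.5)
```

This center point is what you feed to the DOM lookup described next.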

Based on the HTML DOM, you can locate the required element using the obtained coordinates. To do this, pass the coordinates of the element as arguments to the document.elementFromPoint() function. Depending on the desired output, specify the information you are interested in through the element’s properties. For example, the el.tagName property returns the name of the containing tag, and el.innerText returns its text content. If you use these parameters in an automation script such as Puppeteer, the processing result may look like this:

{'x': 181, 'y': 472.5, 'HTMLtagType': 'DIV', 'HTMLtagText': '2', 'ObjWidth': 238, 'ObjHeight': 45, 'ScaleFactor': 1}

The script returns the coordinates by which the object was located; the type and content of the HTML tag; the size of the object; and the scaling factor, taking into account the device used. However, the script needs to be slightly modified, depending on the information source, if you want it to return the structure of an entire element, such as a list, instead of a single item.

From Image to Schema

Another way to automate image processing and recognition is based not on extracting already defined content from a web page for subsequent use, but on generating missing content, for example, auto-generating product descriptions. Nor do you have to limit your tool to that one task: it is quite common to find pages with many images that lack ALT or Title attributes. To cope with this task, you will need to turn to MV.

To implement this plan, you need a ready-to-run script from Pythia, namely the demo version based on the BUTD (bottom-up and top-down attention) analysis, already prepared for launch. For its work, Pythia uses a recurrent neural network that performs well on tasks of selecting descriptions for images. The processing principle at the base of the algorithm is called neural attention: the ability of a network to focus on individual elements of an image and establish relationships between them. It is thanks to the widespread implementation of neural attention that the transformer architecture became possible, on which the revolutionary BERT algorithm is based. We wrote more about this in the article Google BERT Update: AI Machine Learning in the Semantic Search Service.

To get started with Pythia, create a copy of the notebook in your cloud storage, run all of the cells, and wait until the required components are downloaded. Then pass the URL of a test image in a new cell in the following way:

image_text = init_widgets("ABSOLUTE_IMAGE_ADDRESS")

After processing, click the Name this image button in the displayed widget window and enjoy the results generated by the neural network. In most cases, the quality of the descriptions will be quite high. By adding some code, you can configure the script to process many images at once.
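Batch processing can be sketched as a simple loop around the notebook’s single-image helper. The captioning function is passed in explicitly, since the exact interface of the demo’s entry point (init_widgets above) is an assumption; in the notebook it renders a widget rather than returning text:

```python
def caption_images(image_urls, caption_fn):
    """Run a single-image captioning helper over many image URLs.

    caption_fn is whatever entry point your notebook exposes for one
    image; passing it in keeps this loop decoupled from the demo code.
    """
    captions = {}
    for url in image_urls:
        try:
            captions[url] = caption_fn(url)
        except Exception as exc:  # keep going if one image fails
            captions[url] = f"ERROR: {exc}"
    return captions
```

The resulting dictionary maps each image URL to a generated description, ready to be written into ALT attributes or a structured data file.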

The processing results do not have to be limited to the model shown here. As practice shows, if you use advertising texts for preliminary training of an algorithm, you can get readable descriptions of goods based solely on their photos. Depending on your capabilities and level of training, the use of neural networks and MV opens up tremendous opportunities not only for automating processes but also for organizing processing results in the form of structured data files or database elements.

About the author
George is a freelance digital analyst with a business background and over 10 years of experience in different fields: from SMM to SEO and development. He is the founder of Quirk and a member of the Inspiir team, where he is working closely with stakeholders from many popular brands, helping businesses grow and nurturing meaningful projects. George writes regularly on topics including the technical side of SEO, ranking factors, and domain authority.