
What is Structured Data?
The simplest definition of structured data is information that is convenient to work with, mainly because it can be searched effectively. This is made possible by clearly defined categories, types, and hierarchy, as well as by the ability to highlight trends. Unstructured data is essentially everything else: information without a well-defined structure, attributes, or classification. To better understand what SEO structured data is and what role structure plays in search engine optimization, we will explain in detail what structured and unstructured data are, with examples.
History
The earliest studies in the field of business analysis focused on methods for extracting and classifying data from unstructured text rather than on numerical data. Although the first attempts began in 1958, technology that could move the research forward only became available at the beginning of the 21st century. One of the most significant breakthroughs was the creation of the Text Miner algorithm by SAS. Using singular value decomposition, which we mentioned in Latent Semantic Indexing and Keywords, Text Miner made it possible to split a single text space into elements, greatly increasing the efficiency of machine analysis.
In turn, the development of machine text analysis gave rise to research in the interests of commercial organizations. The results were implemented everywhere, from voice-of-the-customer mining to the automation of call centers. Although analyzing unstructured data is a more complex task, the development of big data in the early 2000s revived interest in it, especially in predictive and causal analysis.
Basic Parameters: the Processing, Storage, and Database
Most sites handle a large amount of data, from user information to content. This data has to be organized to ensure reliable storage and fast access. The task is performed by dedicated systems called databases. Web applications most commonly use one of two kinds:
- SQL is a structured query language developed by IBM in the 1970s. Databases built around it store data according to a specified model and preserve the relational links between records, while the records themselves remain available in their original form. The most commonly used are MySQL, its offshoot MariaDB, and PostgreSQL.
- NoSQL is a newer family of formats that has become popular. The main difference is speed, gained by dropping relational connections. Examples include MongoDB and Redis.
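The contrast between the two approaches can be sketched with Python's standard library. Here `sqlite3` stands in for a relational database and a plain dictionary of JSON strings stands in for a document store; the table names, keys, and values are made up for illustration.

```python
import json
import sqlite3

# Relational model: data is split into tables that are linked by keys.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id), "
    "item TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 'Keyboard')")

# A join follows the relational link between the two tables.
sql_rows = conn.execute(
    "SELECT users.name, orders.item "
    "FROM orders JOIN users ON users.id = orders.user_id"
).fetchall()

# Document model (dictionary stand-in): one self-contained record, no join.
document_store = {
    "order:10": json.dumps({"user": {"name": "Alice"}, "item": "Keyboard"})
}
doc = json.loads(document_store["order:10"])

print(sql_rows)
print(doc["user"]["name"], doc["item"])
```

Both lookups return the same facts. The relational version keeps the user in one place and links to it, while the document version copies what it needs into a single record.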
The growing amount of data and the need to organize it are among the main reasons why structured data is so important. A structure, or model, matters primarily because it allows information to be used effectively. Data parameters can thus be divided into two categories: those associated with storage and those associated with processing. Let’s look at simple structured data examples to get a better understanding of this. In transactional processing, information is often structured so that the desired values are in the same row or sequence. In other words, when displaying order information, the application requests the corresponding row in the database table and finds the desired column there. The other approach is analytical. It covers tasks where structured information needs to be used selectively. A typical application is ticket sales, where the system can offer not only the places available on the given date but also options with partially similar parameters, such as the nearest dates or routes with transfers.
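The two approaches can be illustrated with a small, hypothetical ticket table in SQLite. The schema, city names, and dates below are assumptions made for this sketch only.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips ("
    "id INTEGER PRIMARY KEY, origin TEXT, dest TEXT, day TEXT, seats INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kyiv", "Vienna", "2020-06-10", 0),
        (2, "Kyiv", "Vienna", "2020-06-11", 5),
        (3, "Kyiv", "Vienna", "2020-06-14", 2),
        (4, "Kyiv", "Berlin", "2020-06-10", 9),
    ],
)

# Transactional: fetch one known row and read the desired column.
seats = conn.execute("SELECT seats FROM trips WHERE id = ?", (1,)).fetchone()[0]

# Analytical: use the data selectively and look for trips on nearby dates.
wanted = date(2020, 6, 10)
earliest = (wanted - timedelta(days=2)).isoformat()
latest = (wanted + timedelta(days=2)).isoformat()
alternatives = conn.execute(
    "SELECT id, day FROM trips "
    "WHERE origin = ? AND dest = ? AND seats > 0 AND day BETWEEN ? AND ? "
    "ORDER BY day",
    ("Kyiv", "Vienna", earliest, latest),
).fetchall()

print(seats)
print(alternatives)
```

The first query answers "what is in this row?", while the second answers "what else is close enough?", which is only possible because the date, route, and seat count are stored as separate, comparable columns.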
Information processing is just as important as its storage. When developing websites, there are three main ways to request data: REST; GraphQL, which was developed by Facebook; and Falcor, which was created by Netflix. The main advantage of the latter two is the increased autonomy they give front-end developers, who can shape the data they need on the client side without changing the backend.
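To make the difference in request style concrete, here is a rough sketch that builds, but does not send, a REST request and a GraphQL request with Python's `urllib`. The host `api.example.com`, the `/orders/42` path, and the `order` query with its fields are all hypothetical.

```python
import json
from urllib.request import Request

API = "https://api.example.com"  # hypothetical host, used for illustration only

# REST: the URL identifies the resource and the server decides what to return.
rest_request = Request(f"{API}/orders/42", method="GET")

# GraphQL: a single endpoint, and the client lists the fields it wants.
graphql_query = """
query {
  order(id: 42) {
    status
    total
  }
}
"""
graphql_request = Request(
    f"{API}/graphql",
    data=json.dumps({"query": graphql_query}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(rest_request.get_method(), rest_request.full_url)
print(graphql_request.get_method(), graphql_request.full_url)
```

Changing which fields the front end receives means editing the query string on the client, which is the autonomy described above.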
Therefore, based on the expected tasks and results, you need to set three main parameters for the future database. The first is volume, which is a choice between optimal performance and storage reliability. The second is speed, or the rate of exchange, which determines how quickly information can be written and how fast it needs to be processed. For applications where data is entered manually by people, this indicator is low; when information is exchanged between two Internet of Things devices, it is very high. The third is the flexibility of the organizational model, also known as variability. Automated reports have low flexibility because they are created at a specified time and always follow a given structure. Notes about a meeting, by contrast, can be an audio or video recording, a photo of a sheet of notes, or a text document; that is, their structure is highly flexible.
Databases are one of the main technologies for working with large amounts of information, and it is difficult to imagine a modern web application without them. As an integral part of almost every website, the database is therefore directly related to search engine optimization. Understanding information, implementing its storage correctly, and handling it correctly directly affect the speed and reliability of a website, which in turn affect ranking.
Given the main tasks, the methods of application and processing, and the storage and organization requirements they dictate, the data itself can be divided into:
- Structured: almost any information that is organized and stored in a non-proprietary system. Most often, this is text information organized in an ACID-compatible database.
- Partially structured: any information organized according to certain rules. The system is allowed to be proprietary and, in this case, following ACID is not required.
- Unstructured, well-defined: data that has some level of primary organization and can follow an arbitrarily defined standard.
- Unstructured: proprietary binary data.
In practice, information rarely fits within the limitations of a particular definition. For example, is the hypertext markup language for documents (HTML) structured or not? In theory, it is related to the extensible markup language (XML), but HTML:
- does not follow the XML syntax fully;
- tolerates mistakes;
- is not case sensitive.
That is, a plain text file could very well become a working HTML document, even if it contains errors. At the same time, even a single syntax error makes an XML document malformed, and a conforming parser will refuse to process it. Thus, the question of whether information belongs to a particular type is not a question of theoretical compliance with a definition, but of the structure and the specific tasks that must be performed.
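The difference in strictness is easy to observe with Python's standard parsers. This is a minimal sketch: `html.parser` is a tolerant tokenizer rather than a full browser engine, and the `broken` snippet is an invented example of sloppy markup.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Mixed-case tags, an unclosed <p>, and an unclosed <b>.
broken = "<P>First paragraph<p>Second <b>bold</P>"


class TagCollector(HTMLParser):
    """Records every start tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(broken)
collector.close()

# The XML parser rejects the same text because the tags do not match.
try:
    ET.fromstring(broken)
    xml_accepted = True
except ET.ParseError:
    xml_accepted = False

print(collector.tags)   # tag names are reported in lowercase
print(xml_accepted)
```

The HTML parser reads through the errors and treats `<P>` and `<p>` as the same tag, while the XML parser stops at the first mismatch.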
The same goes for storage. For example, if a small video file is placed in a relational database, then, on the one hand, the file itself is a set of binary data that meets one of the existing standards, but on the other hand, it is a point of information inside a structured system. In other words, the contents of the file and the database that holds the file can represent two completely different systems: structured and unstructured. Let’s take one image and save it with different levels of compression.
On the one hand, we did not interfere with the image; it is the same photo with the same subject, size, and angle. But in terms of data, these are three different photos that have almost nothing in common.
Structured Data
Structured data is the most traditional form of organization and has been supported since the first versions of Database Management Systems (DBMS). Information can be entered and generated both by people and by automated tools. In either case it follows a pre-defined model, which allows analysis and comparison with exhaustive results. To give a clearer definition of structured data: when working with structured information, you do not use the categories “like” or “unlike” but the percentage of similarity for a given number of defined parameters. For instance, enter two similar addresses, each in a single cell.
If the task is to compare the two addresses, we cannot do it in this form. We can say that the address in cell A4 is 86.67% longer than the address in cell A3, but this does not answer the question of whether the addresses are similar. To answer it, we need to decompose the address into smaller values: the street, house number, city, and so on.
Now we can see that the city, street, and house number coincide, while the values of the state and country differ significantly. A person will immediately understand that the addresses correspond to each other. But to reflect this similarity in an information management system, you need to describe the possibility that the state and country have both abbreviated and full spellings, and to set the rules for matching and comparing them. Only then does comparison become possible. This set of rules is the data model.
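A minimal sketch of such a data model in Python follows. The addresses, the `Address` fields, and the `ALIASES` table are hypothetical; a real model would need far more complete matching rules.

```python
from dataclasses import dataclass, fields

# Hypothetical abbreviation table; a real model would need a much fuller one.
ALIASES = {"il": "illinois", "usa": "united states"}


def normalize(value: str) -> str:
    """Lowercase a value and expand known abbreviations."""
    key = value.strip().lower()
    return ALIASES.get(key, key)


@dataclass(frozen=True)
class Address:
    house: str
    street: str
    city: str
    state: str
    country: str


def similarity(a: Address, b: Address, use_model: bool) -> float:
    """Share of fields that match, with or without the matching rules."""
    prepare = normalize if use_model else str
    matched = sum(
        prepare(getattr(a, f.name)) == prepare(getattr(b, f.name))
        for f in fields(Address)
    )
    return matched / len(fields(Address))


short = Address("12", "Main St", "Springfield", "IL", "USA")
full = Address("12", "Main St", "Springfield", "Illinois", "United States")

print(similarity(short, full, use_model=False))  # 0.6
print(similarity(short, full, use_model=True))   # 1.0
```

Without the matching rules only three of the five fields agree; with them, the two records are recognized as the same address. The result is expressed as a share of matching parameters rather than as "like" or "unlike".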
The data model is vital because it determines how information should be stored, processed, and evaluated. A well-defined model allows you to store information in separate files without fear of overwriting or mutual interference. At the same time, this data can be compared and analyzed inside the database.
Most often, structured data uses a table format with mandatory relationships between rows and columns in the form of special keys that indicate the level of interaction. The simplest examples are Google Sheets or SQL databases.
Unstructured Data
The traditional approach defines unstructured data as binary data, but in practice a broader definition is more accurate. Any data that does not meet a strictly defined and fixed mathematical standard for storage and processing can be broadly classified as unstructured. It is also more accurate to say that there are different degrees of information structure, and they often overlap with each other.
However, unstructured data can be entered and stored in structured or strict formats. For example, data in NoSQL or XML can be considered partially structured. XML has clear rules for how data is written and read, but the information itself and its structure may not follow a specific model or standard. To illustrate, let’s again look at the cells containing address data.
The cell on the left contains unstructured data saved as an address. However, the same data in the same cell can also be partially structured. This is the case if the structure stays the same as more entries are added: the house number is always specified first, then the street name, and so on. Still, information in this format is difficult to process, filter, and compare. The cells on the right are a structured representation of almost the same data.
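When every entry keeps the same order, a single-cell address can be split into fields mechanically. The sketch below assumes a hypothetical comma-separated layout of "house street, city, state, country"; any entry that deviates from it is rejected rather than guessed at.

```python
def parse_address(line: str) -> dict:
    """Split a consistently ordered address string into named fields."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 4:
        raise ValueError(f"unexpected address layout: {line!r}")
    street_part, city, state, country = parts
    house, street = street_part.split(" ", 1)
    return {
        "house": house,
        "street": street,
        "city": city,
        "state": state,
        "country": country,
    }


record = parse_address("12 Main St, Springfield, IL, USA")
print(record)
```

The function only works because of the fixed order, which is exactly what makes the single cell partially structured rather than unstructured.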
Unstructured data often contains an array of texts, which in turn can include information organized in a specific way, such as numbers, dates, coordinates, or, for example, some facts. Thus, despite the usefulness of the content, it is more complicated to process it if you need to access a specific segment of information.
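As a small illustration, regular expressions can pull such organized fragments out of free text. The sentence and both patterns are made up for this sketch, and patterns like these only catch the formats they were written for.

```python
import re

text = (
    "Order 1042 was shipped on 2020-05-01 and delivered on 2020-05-04 "
    "to the pickup point at 50.45, 30.52."
)

# ISO-style dates: four digits, two digits, two digits.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

# A latitude/longitude pair written as two decimal numbers.
coordinates = re.findall(r"(-?\d{1,3}\.\d+),\s*(-?\d{1,3}\.\d+)", text)

print(dates)
print(coordinates)
```

Each new kind of fact needs its own pattern, which is why access to a specific segment of unstructured text is more expensive than reading a column.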
Data lakes and NoSQL databases and applications are more suitable for storing unstructured data than tables and standard databases. And while the nature of unstructured data used to be a hindrance, its value has increased dramatically with the development of artificial intelligence and machine learning. More and more platforms for storing and managing unstructured information are becoming available. For example, MongoDB is well suited to documents. Along with this, the number of options for processing unstructured data in information systems is growing, which is especially valuable for business analysis and research.
Examples include audio, video files, NoSQL databases, photos, texts, social media content, satellite images, presentations, answers to open questions in questionnaires and call transcripts or video subtitles.
Partially Structured Data
The third category is a manifestation of practical uncertainty, which we discussed above. Partially structured data can combine the properties of both structured and unstructured data at the same time. For example, they do not follow a specific model associated with relational databases, forms, or tables. However, such information contains tags or other identifying elements, often organized as metadata, that allow semantic units to be separated. There is more information on this topic in the article: Meta Tags for SEO. It is also important because, in some ways, the hierarchy is still supported, but there is no exhaustive uniformity in the data. This form of organization is also called a self-described structure.
The main reason this category exists is the simple fact that partially structured data is much easier to process than unstructured data. A fairly simple example would be buying an Apple Watch on apple.com and on eBay.
In the first case, the product type is clearly defined, there is an exhaustive and understandable filter system, and the presentation, format, and order of information are unified. In the second case, there are more than 1,159 options from different sellers, with product descriptions in varying forms. Even a filter system does not make searching easier, since not all products and sellers follow the same data model.
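The marketplace situation can be mimicked with a few JSON records. The listings below are invented: each one labels its own fields, which makes it self-describing, but no two follow the same set of fields or value formats.

```python
import json

raw = """
[
  {"title": "Apple Watch Series 5", "price": 399, "color": "Space Gray"},
  {"title": "apple watch 5 44mm", "price": "about 350", "seller": "shop123"},
  {"title": "Watch band", "tags": ["accessory"]}
]
"""
listings = json.loads(raw)

# The keys describe each record, but they are not uniform across records.
all_keys = sorted({key for item in listings for key in item})

# A price filter only works for records that follow the expected model.
numeric_prices = [
    item["price"]
    for item in listings
    if isinstance(item.get("price"), (int, float))
]

print(all_keys)
print(numeric_prices)
```

The tags make the records far easier to process than raw text, yet a filter by price still misses two of the three listings.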
Quite often, what ordinary people refer to as unstructured data, guided by semantics rather than logic, is partially structured, since it contains classifiable characteristics. A good example of a partially structured database is the picture archiving and communication system that is often used in healthcare. The content of the database includes x-ray images, the graphic results of tomography, and other types of examination. It is essentially unstructured data. However, due to the tag system, metadata, and flexible information model, most often organized based on SQL or Oracle, hierarchy and fast processing are supported. At the same time, the ratio of the volume of the structured part, the database itself, and the unstructured part of images and other files may differ in favor of the latter.
Meta-Data
Metadata is also known as information about the data. From a technical perspective, it is not customary to treat metadata as a separate structure, but because of its significance for analysis as a whole, it would be wrong not to mention it. Its main purpose is to provide context and additional information about the data it describes.
For example, it is metadata that allows you to find out where and how a particular photo was taken on your phone, from the latitude and longitude to the exact local time and aperture value. Metadata is also responsible for information about artists, albums, and song titles on a smartphone. If you can open song lyrics or arrange albums by composer, this is also metadata.
Since metadata follows a pre-defined model that provides a specific set of fields to fill in, its elements are commonly regarded as structured data. One of the main areas of application, in addition to the classification of media files, is primary selection when working with big data.
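Primary selection by metadata can be sketched as filtering a catalog without opening any of the files. The `PhotoMeta` fields, file names, and coordinates below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PhotoMeta:
    filename: str
    taken_at: datetime
    latitude: float
    longitude: float
    aperture: float


catalog = [
    PhotoMeta("IMG_001.jpg", datetime(2020, 5, 1, 9, 30), 50.45, 30.52, 1.8),
    PhotoMeta("IMG_002.jpg", datetime(2020, 5, 2, 18, 5), 48.85, 2.35, 2.4),
    PhotoMeta("IMG_003.jpg", datetime(2020, 5, 3, 12, 0), 50.46, 30.51, 1.8),
]


def taken_near(photos, lat, lon, tolerance=0.1):
    """Select photos by coordinates alone; image contents are never read."""
    return [
        photo.filename
        for photo in photos
        if abs(photo.latitude - lat) <= tolerance
        and abs(photo.longitude - lon) <= tolerance
    ]


nearby = taken_near(catalog, 50.45, 30.52)
print(nearby)
```

Because every record has the same fields, the selection is a plain comparison, even though the photos themselves are unstructured binary data.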
Comparisons
Structured Data in Search Engine Optimization
Microformats are becoming key to the success of a modern website that aims for high search rankings. Microformats are data types based on open-source principles and on existing languages or solutions. Microformat tags can be placed inside documents that support hypertext markup standards. They allow search bots to structure and process the information on a page better and faster, which in turn enables features such as snippets and extended search results.
Among the main advantages of using structured data in search engine optimization are:
- High ranking efficiency.
- Good CTR indicators.
- Conversion and credibility increase.
Schema Markup
Schema is a general dictionary that helps you structure and organize the data hosted on websites. The dictionary defines more than 829 data types that support more than 1,300 properties, all organized in a hierarchy.
The project was launched on June 2, 2011 thanks to the joint efforts of the largest search services: Google, Yahoo, and Bing. Yandex joined the initiative in November of the same year.
It should also be noted that schema is not the first Google project of this kind. Its predecessor, data-vocabulary.org, was the first in the field of wide popularization of structured data. The project was launched in 2009 and was supposed to end support in April 2020. But due to the coronavirus pandemic, on April 6th, Google announced that it was temporarily suspending the decision to disable support for the project until a reevaluation in June.
The schema dictionary supports multiple languages to provide maximum freedom and variety for implementing data markup on a wide array of sites. Google supports schema markup implemented in the following languages: JSON-LD, Microdata, and RDFa.
JSON-LD is the recommended option for most implementation cases. The code is embedded in the <script> tag inside the head or body element of the HTML page. It also supports dynamically-added markup, such as using JavaScript code or built-in widgets in the content management system. Microdata is a part of the hypertext markup language specification based on adding additional attributes to existing tags. It can also be implemented in both the body and head tags. RDFa is an HTML5 extension that applies to most supported document formats and also works based on hypertext markup tag attributes.
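As a rough sketch of the JSON-LD option, the snippet below builds the `<script>` element from a Python dictionary. The `jsonld_script` helper, the publication date, and the publisher name are assumptions for illustration; `Article`, `headline`, `datePublished`, and `author` are existing schema.org terms.

```python
import json


def jsonld_script(data: dict) -> str:
    """Serialize a dictionary as a JSON-LD script element."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so that text values cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What is Structured Data?",
    "datePublished": "2020-06-01",  # hypothetical date
    "author": {"@type": "Organization", "name": "Example Publisher"},
}

snippet = jsonld_script(article)
print(snippet)
```

The resulting string can be placed in the head or body of the page. Generating it from the same data that renders the visible content helps keep the markup and the page in agreement.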
Updates
In 2020 alone, the schema markup received seven significant updates and went through three versions. The current eighth version was introduced on May 1st and brought several important innovations that are still being implemented. The previous update changed the attributes of educational organizations. Initially, the question arose in a tweet sent to Danny Sullivan.
Although John Mueller did not directly respond to this message, Lizzi Harvey immediately addressed the project team.
As a result, the discussion was moved to the corresponding GitHub branch and was implemented in the April update.
Partners and Support
Many of the schema project partners have developed and published detailed guidelines for implementing structured data. Such documents are available from Bing, from Yandex, one of the most popular search services in Eastern Europe, and in Yahoo articles. The social network Twitter has also introduced support for structured data as a way to improve the display of quoted records, which is discussed in detail in its developer’s guide.
The social effect that structured data can bring should also be noted. For example, on March 16, 2020, the official blog of the schema project announced the introduction of the special types SpecialAnnouncement and CovidTestingFacility in connection with the pandemic.
Advanced Search Results
As we mentioned earlier in the article What is the SERP in 2020?, there are three main features of Google’s SERP:
- Rich Snippets.
- Universal Results.
- Knowledge Panels.
Using structured page markup following the current schema dictionary in one of the three supported languages will allow your content to be displayed in the desired categories.
For example, in the form of reviews and ratings of a store or cafe.
Additional short facts from the biographies of celebrities or employees.
Or the ability to purchase a ticket to an event of interest.
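For the reviews and ratings case, the markup might look like the sketch below. The cafe name and star ratings are invented; `CafeOrCoffeeShop`, `AggregateRating`, `ratingValue`, and `reviewCount` are existing schema.org terms, but whether a rich result is actually shown depends on the search engine's own requirements.

```python
import json

reviews = [5, 4, 5, 4]  # hypothetical star ratings collected by the site

cafe = {
    "@context": "https://schema.org",
    "@type": "CafeOrCoffeeShop",
    "name": "Example Cafe",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": round(sum(reviews) / len(reviews), 1),
        "reviewCount": len(reviews),
    },
}

print(json.dumps(cafe, indent=2))
```

Computing the rating from the stored reviews, rather than typing it in by hand, keeps the markup consistent with what visitors see on the page.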
Mobile Search
As with advanced search results, there are several ways in which structured data can help improve the performance of the mobile version of your website. More information is in the article on this topic: Mobile Search. The first and leading argument is support for accelerated mobile pages, or AMP. Read more on the subject in the article: What You Need to Know About Accelerated Mobile Pages.
The main areas of application:
- eCommerce;
- news and articles;
- the pages of brands and performers.
One of the most striking examples of AMP implementation is the inclusion of news in a selection of popular stories or in a news carousel for a given topic.
Other advantages include rich snippets in mobile search, as well as extended cards, which are currently available for movies, recipes, courses, and restaurants. For more information, see the article What are Rich Snippets? Structured data will also allow your products to be displayed in search results even if the user did not enter the brand name or any specific characteristics. Google launched a style assistant and related recommendations back in 2017. Interestingly, these features also extend to image search.
In any case, do not forget that the mobile version of the page now plays a huge role in ranking search results, since most queries occur on smartphones and mobile devices.
Voice Search
According to a recent Microsoft study, more than 69% of users use voice assistants on their devices. By the end of 2020, it is projected that two-thirds of households will have at least one device with a voice assistant function. A third of users actively use voice functions while driving, or in entertainment or other home devices. More than 30% of users own at least one smart speaker, and more than half of these devices actively receive voice commands.
With the introduction of the speakable type in schema markup, the role of voice search and voice-controlled smart devices has changed significantly. Now structured data helps you find news with the ability to voice a fragment or excerpt. When the corresponding search query is made, the Google Home smart device will offer three options for suitable articles and read out the selected one. The link and full text will also be sent to the user’s nearest device.
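A minimal sketch of speakable markup is shown below. The page name, URL, and CSS selectors are hypothetical; `speakable`, `SpeakableSpecification`, and `cssSelector` are existing schema.org terms.

```python
import json

news_page = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "Example news story",
    "url": "https://example.com/news/story",  # hypothetical URL
    "speakable": {
        "@type": "SpeakableSpecification",
        # Hypothetical selectors for the parts suited to being read aloud.
        "cssSelector": [".headline", ".summary"],
    },
}

print(json.dumps(news_page, indent=2))
```

The selectors point at the fragments of the page that a voice assistant may read, so they should match short, self-contained passages such as the headline and summary.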
Also, almost half of the responses to voice search queries are based on snippets. Read more in the article: How to Optimize Featured Snippets.
Implementation and Verification
As you know, the use of structured data does not directly affect the results of indexing or ranking. The main role is still played by the quality and usefulness of the content that you publish. According to Google’s recommendations, compliance with the content quality level stated in the guidelines for webmasters is the priority for success in implementing micro-markup. Learn how to analyze content from the article: Website Content Audit.
Other requirements include relevance, meaning that structured data matches the content on the page itself, as well as the completeness of content and placement. The content must match the selected schema markup type as closely as possible, which is why a separate guide has been compiled. To learn more about the process of implementing structured data markup on the site, we have compiled a special presentation with practical examples from Google.
To help you generate code for the required data type, there is the Structured Data Markup Helper tool, as well as equally popular analogs such as JSON-LD Schema Markup Generator – JD Flynn and JSON-LD Playground. If you use the WordPress platform, pay attention to plugins that support microdata, namely: Yoast SEO, WP SEO Structured Data Schema, and Schema App. The Shopify eCommerce platform also offers several plugins for managing structured data, such as SEO Manager and Smart SEO. You can check whether the implementation is correct using structured data testing tools. We recommend the Google Structured Data Testing Tool, but similar tools are also available from Bing and Yandex.
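Before turning to those tools, a quick local sanity check can catch obvious omissions. The sketch below extracts JSON-LD blocks from a page with Python's `html.parser` and reports missing properties. The `REQUIRED` table is an assumption made for illustration, not an official list, and the code assumes each block is a single object with a string `@type`. It does not replace the testing tools mentioned above.

```python
import json
from html.parser import HTMLParser

# Assumed minimum per type, for illustration only; check the official docs.
REQUIRED = {"Article": ["headline", "datePublished"]}


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._inside = False


def missing_properties(html: str) -> list:
    """Return (type, property) pairs that REQUIRED expects but the page lacks."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return [
        (block.get("@type"), prop)
        for block in extractor.blocks
        for prop in REQUIRED.get(block.get("@type"), [])
        if prop not in block
    ]


page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "What is Structured Data?"}
</script>
</head><body><h1>What is Structured Data?</h1></body></html>"""

problems = missing_properties(page)
print(problems)
```

For the sample page the check reports that `datePublished` is absent. A check like this fits naturally into a build step, with the online testing tools used for the final verification.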