Latent Semantic Indexing and Keywords

What Are LSI and LSA?

Latent Semantic Indexing (LSI) is a mathematical method of analyzing information developed in the 1980s. Its main task was the fast, high-quality processing of information contained in static documents, including unstructured data such as collections of documents. Semantic units were combined into a matrix, which was then processed using the mathematical method of singular value decomposition (SVD). This approach made it possible to process data faster and to better determine the relationships between the concepts the documents contain.

Latent Semantic Analysis (LSA) was developed a little later, on the basis of LSI. The main task addressed by this type of analysis was natural language processing, especially in terms of semantic distribution. The method has also been used to study cognitive models of human lexical perception. The analysis was carried out by processing arrays of documents to identify the concepts they contain through shared terms. Unlike LSI, which analyzes the semantic units of a document as a whole, LSA focuses on the meaning of a term within each section or article, aiming to find more accurate relationships.
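To make the matrix idea concrete, here is a minimal pure-Python sketch; the corpus and vocabulary are invented for illustration. A real LSI pipeline would go on to factor this matrix with SVD; the sketch stops at building the term-document matrix and comparing raw term vectors with cosine similarity:

```python
from math import sqrt

# Toy corpus; documents and vocabulary are invented for illustration.
docs = [
    "sport fitness training gym",
    "fitness gym workout health",
    "sport team match score",
    "recipe kitchen cooking health",
]

# Build the term-document matrix: one row of counts per term,
# one column per document.
vocab = sorted({w for d in docs for w in d.split()})
matrix = {t: [d.split().count(t) for d in docs] for t in vocab}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Terms that tend to appear in the same documents get similar vectors.
print(round(cosine(matrix["sport"], matrix["fitness"]), 2))  # → 0.5
print(round(cosine(matrix["sport"], matrix["recipe"]), 2))   # → 0.0
```

Applying SVD to such a matrix is what lets LSI surface relationships that do not show up in the raw counts.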

The use of latent semantic relationships formed the basis of the information processing method patented in 1988 by Bell Communications Research, Inc. One of the co-authors was Susan Dumais, who is also known for her significant contributions to the development and optimization of Microsoft's search algorithms.


Latent semantic indexing is not the only way to find groups of terms semantically related to keywords of interest (LSI keywords). Term frequency–inverse document frequency (TF-IDF) is a method of statistically estimating the importance of a term in context. To assess the weight of a word, first determine how often it is used in the document; this frequency is proportional to its significance. Next, the frequency of use in one document is compared with the frequency of use across all documents in the sample. Thus, terms important to one document are separated from those that are common everywhere and therefore less significant.
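The weighting scheme described above can be sketched in a few lines of Python. The mini-corpus is invented; a production system would use an optimized library implementation:

```python
from math import log

# Hypothetical mini-corpus; each document is a list of tokens.
docs = [
    ["wine", "red", "wine", "glass"],
    ["wine", "color", "palette"],
    ["paint", "color", "palette", "red"],
]

def tf_idf(term, doc, corpus):
    # Term frequency: share of the document taken up by the term.
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: terms rare across the corpus weigh more.
    df = sum(1 for d in corpus if term in d)
    idf = log(len(corpus) / df)
    return tf * idf

# "wine" occurs twice in doc 0 but also appears in other documents,
# so the rarer "glass" ends up with the higher weight.
print(round(tf_idf("wine", docs[0], docs), 3))   # → 0.203
print(round(tf_idf("glass", docs[0], docs), 3))  # → 0.275
```

Note how the corpus-wide comparison demotes "wine" despite its high in-document frequency: that is exactly the separation of document-specific terms from common ones described above.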

What Is LSI in SEO

In professional communities, one can often hear that search engines use latent semantic indexing in one form or another. Its main task is to find hidden or implied connections between the meanings of words and to improve indexing, the processing of information. Simply put, its role is to help find a connection between terms and content when there are no common keywords or synonyms that clearly point to it.

Search engine ranking is a complex, multifactorial process whose main task is to match links to user requests as accurately as possible. Accordingly, it is not only the ranking algorithm that matters but also the criteria by which ranking takes place. Keywords are no longer used as the main source of information about pages, because such information was easy to manipulate. Now the algorithms independently determine the topic and the types of queries a page suits based on its content and context; keywords play only an auxiliary role.

Nevertheless, it is worth taking the principles of latent semantics into account when developing a search engine promotion strategy. This will significantly improve your basic keyword sets and maximize audience reach with minimal risk of drifting into unrelated queries.

Short History

The need to consider the context and structure of natural language arose out of abuse, like many other things in SEO. Earlier search algorithms focused on calculating the density of word mentions on a page, which led to the practice of keyword stuffing. Google defines this type of violation as adding a list or block of keywords to a page, outside the context of its contents, in order to manipulate the site’s ranking in the search results. This also contradicts content quality requirements, since text excessively filled with keywords often sounds unnatural, and the result is practically useless for real users. You’ll learn more about algorithms and sanctions in the Google Algorithms That Affect SEO article.

Semantic indexing technology was developed in the late 1980s to help systematize research results. This approach to information analysis is based on the structure of meanings in a language: a hierarchy is built from that structure, including both terms and concepts. The main objective of LSI was to expand and improve search results beyond literal, exact matches.

In practice, latent semantic keywords are often confused with synonyms. This is a misconception, since the task of LSI is to look for connections between the meanings of words in the absence of a direct synonymic series. For example, a search for the optimal window size can refer to architecture and construction as well as to the development of applications or sites. Similarly, the word “tool” can refer to both a screwdriver and an application. The simplest example of LSI is a grouping of words that are often found together, such as “sport” and “fitness”.

In search engine optimization, the principles of LSI can be applied when forming the semantic core: you need to determine not only well-known, previously used keywords but also analogues with a similar meaning. Excellent results are often achieved when you pay close attention to the analysis of competitors. Detailed instructions on how to conduct such an analysis can be found in the article SEO Competitive Analysis.
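The “sport”/“fitness” grouping mentioned above can be reproduced with a simple co-occurrence count, sketched here in pure Python (the text snippets and stopword list are invented):

```python
from collections import Counter
from itertools import combinations

# Invented snippets standing in for crawled page text.
texts = [
    "sport and fitness tips for beginners",
    "fitness trackers for every sport",
    "home workout and fitness routine",
]
stopwords = {"and", "for", "every", "a"}

pair_counts = Counter()
for text in texts:
    words = set(text.split()) - stopwords
    # Count every unordered pair of distinct words in the same snippet.
    pair_counts.update(combinations(sorted(words), 2))

# Pairs that appear together most often form candidate LSI groupings.
print(pair_counts.most_common(1))  # → [(('fitness', 'sport'), 2)]
```

Real tools work at a far larger scale and with smarter weighting, but the underlying signal is the same: words that keep travelling together are probably related in meaning.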

Google, LSI and Polysemy


It might seem evident that search services like Google actively use LSI technology. However, on July 30, 2019, John Mueller tweeted that this is not the case. The exchange concerned the use of LSI-matched keywords as a ranking factor. In the correspondence, Mueller denied the existence of such keywords; the use of LSI technology was out of the question.

There are two main factors that indirectly confirm Mueller’s words. Latent semantic indexing was developed before the creation of the World Wide Web and was not adapted for hypertext markup or dynamic documents of any kind: LSI is based on the analysis of static documents, and the analysis must be carried out again whenever significant changes are made. The second factor is that LSI was conceived for the analysis of small volumes of documents. Considering the huge amount of information processed by Google’s algorithms, scaling problems would most likely prevent the implementation of LSI.

Also, LSI is a patented technology. The original patent was issued to Bell Communications Research in 1988, and the duration of a patent in the United States is usually 20 years, so until 2008 it was impossible to apply the technology without the patent holder’s consent. By 2008, Google’s market share was already estimated at 57% to 72%, and the search algorithms of that time coped so well with interpreting user requests that the likelihood of introducing LSI after 2008 seems even lower than before.

Nevertheless, interest in hidden semantics has not lost relevance for processing arrays of unstructured data. For example, in 2016, Google published the results of a study on improving the thematic clustering of search queries using the frequency of mentions and other factors.

Common Misconceptions about LSI

When working on an article or other material, most of the effort is concentrated on the main keywords and expressions, often called long-tail keywords. Related article – Long Tail Keywords. However, a poorly developed context makes the material look unnatural and hurts its informativeness. Therefore, it is important to use not only direct derivatives of the main keys but also a broad synonymic series and adjacent constructions.

Long-tail keywords are often confused with latent semantic ones, so we have compiled a small comparison table for your convenience.

We can see the difference in this example. When we enter the query “wine” in Google Trends, we are once again faced with polysemy, as we are offered a choice of what we mean: a drink, a program, or a color. Related article – How to use Google Trends for SEO.

To find out about the popularity of this color, enter the appropriate query in Google.

Let’s create a long-tail query to narrow and refine the results.

The difference is huge! However, we can try to refine the query a little more.

Now, to choose a latent semantic analogue, we will use the same search. Set the exact phrase – wine color.

Let’s review the list of proposed analogues and choose the appropriate color, for example, burgundy. Add it to the final version of the previous search using search operators. More information about them can be found in the Google Search Operators article.

As you can see, the number of results has doubled. Now we can refine the search according to our ultimate goal.

Thus, both latent semantic and long-tail queries can be used in both directions: to expand the possible number of matches or to refine the results. But each tool has its own task and copes well with it.

Another topic of controversy is Google’s content quality factors. For example, there was a fairly common opinion that the Panda algorithm update introduced latent semantic elements as one of the key criteria of text quality. However, this information was never officially confirmed.

The Benefits of Latent Semantic Analysis when Selecting Keywords

Using LSI to select additional keywords can bring a number of unexpected benefits. It is a good way not only to improve organic ranking but also to increase the uniqueness of texts. A context rich not only in synonyms but also in semantically similar words enhances the positive effect for both search engine optimization and user experience.

A more accurate and diverse context will refine and improve ranking, reducing the likelihood of accidentally falling into adjacent samples. As a result, your site will be shown to a more interested audience, which will improve time on site, session depth, and the bounce rate.

Users will also respond to better, more informative content. Let’s admit it: no one likes pages full of synonyms and repeated phrases, especially when looking for something specific. By following the principles of LSI, you can provide more natural and useful text without sacrificing its effectiveness for ranking.


How to Find Keywords Using LSI

  • Step 1. Begin with a starting phrase.
  • Step 2. Find expressions with similar meanings.
  • Step 3. Reasonably expand your range of key phrases with the semantic analogues you found when compiling your search engine promotion strategy.
  • Step 4. Use semantically comparable constructions in the content, for example, answering similar or frequently asked questions in your articles.

Tip: Do not focus on synonyms only.
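Steps 2 and 3 can be sketched as a simple co-occurrence lookup. This is only an illustration under stated assumptions: the corpus of page titles, the stopword list, and the `related_terms` helper are all invented.

```python
from collections import Counter

# Hypothetical corpus of page titles gathered for the topic.
pages = [
    "healthy eating and balanced diet plan",
    "balanced diet tips for weight loss",
    "meal prep ideas for a balanced diet",
    "car insurance quotes online",
]

def related_terms(seed, corpus, stopwords=("and", "for", "a")):
    # Step 2: collect words from documents that mention the seed phrase.
    counts = Counter()
    for page in corpus:
        words = [w for w in page.split() if w not in stopwords]
        if seed in words:
            counts.update(w for w in words if w != seed)
    # Step 3: the most frequent companions become candidate expansions.
    return [term for term, _ in counts.most_common(5)]

# "balanced" co-occurs with the seed most often, so it ranks first.
print(related_terms("diet", pages))
```

Real keyword tools draw on much larger corpora and query logs, but the shape of the workflow is the same: start from a seed, gather its frequent companions, and keep the ones that fit your strategy.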

Sources of Keywords for LSI

  • Related Google searches: Enter the phrase you are interested in into the search bar and see which of the suggested queries may suit you.

  • Google AutoComplete: Start typing in your original keywords or semantically similar terms you’ve already chosen, and see what Google has to offer.

  • Google Trends: Try entering one or more searches, check the geography, and see which of the related topics or searches may come in handy.

  • Google Keyword Planner: This is also a great tool to find some inspiration. Enter a keyword or phrase of interest, click on the Get Results button, and you will have a number of tips.

  • Specialized services, such as LSIGraph: Many of them offer a free trial. Just enter the original keyword or phrase and follow the instructions.


  • SEMrush: Go to the Keywords tab and select the appropriate tool.


Enter the original keyword and wait while the service selects the results. The Similar tab contains recommendations found for the keyword phrase or for its individual elements.

Look for more useful services for SEO specialists in the article Top SEO Tools for Businesses in 2020.

About author
George is a freelance digital analyst with a business background and over 10 years of experience in different fields: from SMM to SEO and development. He is the founder of Quirk and a member of the Inspiir team, where he is working closely with stakeholders from many popular brands, helping businesses grow and nurturing meaningful projects. George writes regularly on topics including the technical side of SEO, ranking factors, and domain authority.