How search query niche determines the behavior of Google SERP

Yury Bondyrev
Seoquake.com, Seodigger.com, Serparchive.org
Tel: +7 (812) 923-35-77
seoquake@gmail.com

Analyzing Google SERP with keywords from different niches.
This study examines how SERP behavior is determined by the frequency of queries and by their niche.

Contents

1. Introduction
2. How Google algorithms evolved
3. Definitions and concepts
4. Target setting and input data
5. Software and services used to process the data
6. Edge effects
7. Input data obtained during the experiment
8. Site stats for every group
     8.1 Distribution of doorways in different niches
     8.2 Doorway quantity in each niche
9. Defining biggest players
     9.1 Analyzing biggest players among "white hat" sites
10. Conclusions
11. Final words

1. Introduction

Traffic from search engines has been considered the most valuable kind ever since search engines emerged on the web. Thousands of webmasters work hard every day to optimize their sites for better search engine rankings. It's no secret that the amount of work and money needed to achieve the desired results largely depends on the niche of the search queries a site is optimized for, and on their search volume. Moreover, sites in different niches require different promotion techniques. Naturally, competition between market players drives optimization technology forward.

New mechanisms of converting traffic into money have appeared, and with that the borderline between commercial and non-commercial keywords became less and less visible. Still, we can say that even now there are niches with heavy competition, while in some other niches search engine traffic is less valuable.

Additionally, search engines never stop developing. Ranking algorithms are constantly improved, server capacity is increased, SERP (search engine results page) moderation has been introduced, and so on. As a result, optimization techniques which brought guaranteed results yesterday may not work at all today.

Still, as more and more people start using the web and the amount of information offered by all kinds of sites keeps growing, it is difficult to say when the search engines' processing power will be great enough to deliver relevant results for all the keyword niches and topics. Currently, when you study search engine rankings, you can see legitimate, content-packed sites, as well as doorways, a form of search engine spam. The proportion between these is different from niche to niche. We won't touch upon the ethics and economics of search engine spam. This topic deserves a discussion of its own.

This report focuses on analyzing Google's SERP to spot the variations observed in different niches and with keywords of different search volume.

2. How Google algorithms evolved

Google, the world-famous search engine, quickly became popular thanks to the quality of its search results, backed by its revolutionary PageRank technology. Soon Google was getting more visitors than any other search engine and became a powerful source of targeted traffic for websites. Naturally, many webmasters started specializing in optimizing sites for Google.

Black hat SEO (search engine optimization), or spamdexing, became widely popular. The industry of spamdexing gained even more popularity when a multitude of affiliate programs and pay-per-click systems emerged.

Spamdexing, that is, illicit optimization and promotion methods, is among the most critical problems for any search engine. Google is known to have taken drastic measures against it. Many webmasters remember widely discussed technologies such as Florida, Hilltop, and Trust Rank. In general, as new algorithms were introduced, fewer and fewer spam sites could be seen in SERPs. Then, after some time, the "black hat" webmasters upgraded their techniques, and doorways started showing up in rankings again.

To be fair to Google, with all these newly introduced algorithms it became a much better search engine, while spamdexing grew less and less profitable. It is in the highly commercialized and competitive niches that filter-evading technologies are adopted fastest. Consequently, Google pays more attention to these niches.

Having spent some time studying these niches, one can notice that Google SERPs behave differently from niche to niche.

3. Definitions and concepts

Before we start our analysis, certain notions need to be defined to avoid misinterpretation of the results. In fact, this is not an easy thing to do, because there is no agreement within the online community concerning the criteria differentiating between legitimate sites with content and spamdexing pages.

Let us start with search engine spam, or doorways. Many definitions have been offered; here is one of them:

Doorways are a technology often used in spamdexing. A doorway basically is a site page optimized for one or several keyphrases aimed to get ranked high in the SERPs. An automatically generated doorway contains random text which includes necessary keyphrases. Thus, a doorway is useless to a surfer. A manually made doorway may contain narrow-niche information valuable for internet users.

Yet, these definitions do not provide us with exact criteria to differentiate doorways from other sites. Alas, there are no simple, unambiguous definitions today. What is more, as technologies develop, artificially made sites look more and more like regular, legitimate, web pages filled with valuable content. Sometimes only an online marketing professional can tell a quality doorway from a content-based site.

Defining quality content-based sites is even more difficult. Sometimes, a plain HTML page with pure text has more SE value than a site produced by major developers and promoters.

With all this taken into account, we may conclude that if somebody aims to detect doorways in SERPs, he or she will use subjective guidelines rather than objective characteristics.

However, such subjective evaluations are of no use to search engines. Search engines separate spamdexing pages from content-based sites using a wide range of parameters. The exact combination of these parameters and their relative importance are kept secret. Moreover, spam detection techniques never stop evolving, and new evaluation methods are introduced all the time.

Taking all the above mentioned facts into account, we find it appropriate to distance ourselves from popular definitions of "white hat" and "black hat" sites. For our analysis it will be much more convenient to use the Google SERPs and introduce new definitions with certain approximations.

We will refer to sites which stay in Google's SERP for a long time relative to the duration of the experiment (18 days or more) as "white hat" sites.

On the other hand, sites which lived in the SERPs for less than a week will be referred to as spamdexing, or doorways.

These definitions should be read as probabilistic. Evidently, you are far less likely to find doorways in the first group than in the second one.
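As a minimal sketch of these working definitions, the function below labels a page by the number of days it was seen in the SERP. The thresholds (less than a week, 18 days or more) come from the definitions above; the function name, the label strings, and the "unclassified" middle band are our illustrative assumptions.

```python
def classify_by_serp_life(days_in_serp: int) -> str:
    """Label a page by how many days it appeared in the SERP.

    The labels are probabilistic shorthand, not a verdict on any single site.
    """
    if days_in_serp < 7:        # lived in the SERP for less than a week
        return "doorway"
    if days_in_serp >= 18:      # long-lived relative to the experiment
        return "white hat"
    return "unclassified"       # assumed label for the band in between


labels = [classify_by_serp_life(d) for d in (3, 10, 36)]
```

For the three sample lifetimes, `labels` holds "doorway", "unclassified", and "white hat" respectively.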

Additionally, these definitions do not describe legitimate sites and spamdexing with 100% accuracy. Sometimes, when ranking algorithms change, quality sites disappear from the SERP. For example, there are news sites which publish articles on the topics and niches we address here. When the content of such a page changes, the page eventually disappears from the SERPs for these keyphrases. Also, elaborately made doorways which have been in the SERP for quite a while can sometimes be found among the "white hat" sites.

Still, we can safely say that these exceptions do not distort the overall picture. Moreover, sites cannot be examined individually when processing SERPs that contain hundreds of thousands of links.

4. Target setting and input data

To study the way Google behaves in different keyword spheres, we took 6 niches.

For every niche, we built databases of one-, two-, and three-word queries from wordtracker.com.
Each database contained 30,000 queries.
The total database analyzed contained 180 thousand queries.
The first 20 SERP positions for every query were saved and analyzed every day.
The experiment started on July 12.
The experiment ended on August 19.

Experiment objective:

5. Software and services used to process the data

These services were used to analyze the SERPs obtained during the experiment: Seodigger.com, Serparchive.org and Seoquake.com.

Seodigger.com is a tool which shows for which keywords and phrases a site is ranked high by Google.
Concept: The service saves Google's first 20 results for 44 million popular keywords. After that, a database of correspondences is built:

Serparchive.org is a tool which saves the first 100 SERP results for given keywords in a number of search engines on a daily basis. It helps to monitor how site rankings change with time.

Seoquake.com is an extension for the Firefox browser. It quickly shows a site's parameters on the SERPs of leading search engines and on any other pages (documents).

6. Edge effects

To make sure our analysis is as accurate as possible, edge effects have to be considered.

7. Input data obtained during the experiment

The experiment lasted for 36 days. During this period we used Serparchive.org to save the SERPs for every keyphrase on a daily basis. Then, Seodigger.com calculated the positions of a page for these queries.

All the materials and analysis attempts made below are nothing but statistical processing of the data obtained.

8. Site stats for every group

Using previously defined notions of "white hat" and spamdexing sites, we will evaluate the quantity of sites of both types for every group of keyphrases we selected.

To do this, we need to calculate the quantity of page links which were in the SERPs for 1, 2, 3, and up to 36 days. To display this information in a more understandable way, let's split the entire experiment duration into 6 equally long periods. For our research, periods 1 and 6 are the most interesting. The first one will contain spamdexing sites, according to our definition, while the last one will comprise "white hat" sites.
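The counting step can be sketched as follows. This is only an illustration under stated assumptions: the input format (one set of URLs per day) and the function name are hypothetical, and "SERP life" is taken to mean the total number of days a page was seen, whether consecutive or not.

```python
from collections import Counter


def serp_life_histogram(daily_serps, period_len=6, n_periods=6):
    """Count unique pages per SERP-life period (1-6, 7-12, ... days).

    daily_serps: an iterable with one collection of URLs per day
    (hypothetical input format).
    """
    days_seen = Counter()
    for urls in daily_serps:
        days_seen.update(set(urls))   # count each page at most once per day
    histogram = [0] * n_periods
    for days in days_seen.values():
        index = min((days - 1) // period_len, n_periods - 1)
        histogram[index] += 1
    return histogram


# Toy data: one page present on all 36 days, one present on 3 days only.
toy_days = []
for day in range(36):
    urls = {"http://long-lived.example/"}
    if day < 3:
        urls.add("http://short-lived.example/")
    toy_days.append(urls)

histogram = serp_life_histogram(toy_days)
```

With the toy data, the short-lived page falls into the first period and the long-lived page into the last one.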

Group*      1 - 6     7 - 12    13 - 18   19 - 24   25 - 30   31 - 36
Adults 1    838**     241       159       127       152       402
Adults 2    503250    58910     29515     22410     24176     84909
Adults 3    1947145   156474    13963     14883     25320     116502
Cars 1      5616      1810      939       651       699       2896
Cars 2      281164    70326     37187     27751     29255     154664
Cars 3      647145    145474    73969     53883     57320     286902
Casino 1    9955      3311      1834      1293      1450      5067
Casino 2    196810    52148     28561     22326     23272     117990
Casino 3    538990    132745    74220     55579     57562     295333
Dating 1    1666      463       264       150       173       616
Dating 2    88139     21638     11826     8802      9368      43310
Dating 3    721208    128039    64727     46641     47790     200827
Gifts 1     573       203       113       71        87        426
Gifts 2     49843     12672     6821      5499      5875      35720
Gifts 3     635098    133249    69386     52307     54656     265006
Pills 1     1056      185       90        100       80        505
Pills 2     234692    23225     10795     8049      7414      35713
Pills 3     303830    29824     13618     10018     9660      43894
            <<<< Doorways                       White hat sites >>>>

Table 1. Site statistics per their Google SERP life

* - number near the group name means one-, two-, and three-word queries respectively.
** - this number shows the total quantity of unique pages which were in Google's SERP for the specified amount of days.

Table 1 shows how sites are distributed according to their SERP life during the experiment. However, we cannot compare these figures directly yet, because the groups of one-, two-, and three-word queries contain different quantities of keyphrases, which means there will be different quantities of sites in every group during the experiment.

To compare these accurately, we have to normalize the data obtained by keyword quantity in every group. Let's normalize all the results for 1,000 keyphrases.

Group       1 - 6     7 - 12    13 - 18   19 - 24   25 - 30   31 - 36
Adults 1    20950*    6025      3975      3175      3800      10050
Adults 2    58456     6843      3428      2603      2808      9863
Adults 3    85298     6520      582       537       847       4438
Pills 1     27077     4744      2308      2564      2051      12949
Pills 2     67421     6672      3101      2312      2130      10259
Pills 3     69288     6801      3106      2285      2203      10010
Dating 1    27767     7717      4400      2500      2883      10267
Dating 2    24161     5931      3242      2413      2568      11872
Dating 3    38864     6900      3488      2513      2575      10822
Cars 1      21683     6988      3625      2514      2699      11181
Cars 2      22567     5645      2985      2227      2348      12414
Cars 3      26964     6061      3082      2245      2388      11954
Gifts 1     16853     5971      3324      2088      2559      12529
Gifts 2     18731     4762      2563      2067      2208      13424
Gifts 3     28525     5985      3116      2349      2455      11902
Casino 1    20071     6675      3698      2607      2923      10216
Casino 2    20600     5458      2989      2337      2436      12350
Casino 3    22458     5531      3093      2316      2398      12306
            <<<< Doorways                       White hat sites >>>>

Table 2. Normalized statistics for sites per their life in SERPs (obtained by normalizing Table 1)

* - The niches in the table are sorted in descending order of doorway quantity.

Thus, every number in the table corresponds to a relative number of sites normalized per 1,000 keyphrases. So we can say, for instance, that for group Adult 1 the number of doorways per 1,000 keyphrases was 20,950 for the entire experiment duration (left column).
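The normalization itself is a simple proportion. The sketch below is illustrative only: the function name is ours, and the keyphrase count of 40 for Adults 1 is not stated in the report but inferred from Tables 1 and 2 (838 pages correspond to 20,950 per 1,000 keyphrases).

```python
def normalize_per_1000(page_count: int, keyphrase_count: int) -> float:
    """Scale a raw page count to a count per 1,000 keyphrases."""
    return page_count * 1000 / keyphrase_count


# Adults 1, period 1-6 days: 838 unique pages (Table 1).
# The 40 keyphrases are an inferred value, not a figure from the report.
adults_1_doorways = normalize_per_1000(838, 40)
```

Under that assumption the result matches the first cell of Table 2.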

Now let us try to look into the data collected.

8.1 Distribution of doorways in different niches

With the data normalized, we can now accurately compare the amounts of spamdexing pages and "white hat" sites in different niches.

Let us look into the relative distribution of doorways per 1,000 keyphrases among one-, two-, and three-word queries in different niches.

Words in query  Adult     Pills     Dating    Cars      Gifts     Casino
1               20950     27077     27767     21683     16853     20071
2               58456     67421     24161     22567     18731     20600
3               85298     69288     38864     26964     28525     22458

Table 3. Quantity of doorways in different niches among one-, two-, and three-word requests

* - the data in the table were obtained taking normalization into account.

Table 3 shows that there are niches where one-word queries have far fewer doorways than two-word and three-word ones (Adult and Pills), and niches where the quantity of doorways stays roughly the same (Gifts, Cars).

This effect can be explained in a number of ways:

  1. Traditionally, Adult and Pills are the niches where spamdexing is used on a massive scale. Despite all the efforts of search engines, there are still plenty of doorways. The lower the search volume of a keyword, the greater the chance of finding doorways in the SERPs for it, and longer queries generally have lower search volume. This is why we see this picture.
  2. The Casino niche is also highly commercialized and competitive, so plenty of webmasters want to obtain search engine traffic in it. However, as we see, the doorway distribution in this niche differs from that in Adult and Pills. Perhaps this is the result of high competition: niche sites fight for every single niche query, and site owners monitor their competitors and send abuse complaints once spamdexing is detected.
  3. It is possible that the sites in the SERPs for the specified groups are rotated with different periodicity.

To get the entire picture, it will be useful for us to compare the doorway distribution in subgroups. See the chart.

Figure 1. Quantity of doorways for one-, two-, and three-word search queries

We should point out that the niches differ not only in absolute doorway quantity (Y axis), but also in the angle of the envelope. If we build an envelope for all the niches (as we did for Adult and Pills on the chart), we will see that the angles are different. This angle is an indirect indicator of competition in the niche.
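One way to put a number on this angle is to fit a straight line through the three points of each niche and compare the slopes. This is our interpretation, not necessarily how the chart envelope was drawn: the least-squares fit and the function name are assumptions, while the data are the values from Table 3.

```python
# Doorways per 1,000 keyphrases for one-, two-, and three-word queries (Table 3).
DOORWAYS_PER_1000 = {
    "Adult":  [20950, 58456, 85298],
    "Pills":  [27077, 67421, 69288],
    "Dating": [27767, 24161, 38864],
    "Cars":   [21683, 22567, 26964],
    "Gifts":  [16853, 18731, 28525],
    "Casino": [20071, 20600, 22458],
}


def envelope_slope(values):
    """Least-squares slope of values against query length 1, 2, 3, ..."""
    n = len(values)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


slopes = {niche: envelope_slope(v) for niche, v in DOORWAYS_PER_1000.items()}
steepest_first = sorted(slopes, key=slopes.get, reverse=True)
```

With this measure Adult has the steepest envelope and Casino the flattest. For three equally spaced points the fitted slope reduces to half the difference between the three-word and one-word values.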

If we describe the same values for "white hat" sites, we will see the reverse picture. Naturally, there will be more legitimate sites in groups where there are fewer doorways.

8.2 Doorway quantity in each niche

Calculating the total quantity of doorways in every niche is relatively easy. Let us build a table with all the niches.

Niche     Total doorways in the niche per 1,000 requests
Adult     164704
Pills     163786
Dating    90792
Cars      71214
Gifts     64109
Casino    63129

Table 4. Total doorways in every niche.

* - The total doorway quantity was calculated by adding up the doorway quantities for the one-, two-, and three-word subgroups.
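The totals can be reproduced from the normalized doorway quantities in Table 3. The short sketch below uses those values; the variable names are ours.

```python
# Doorways per 1,000 keyphrases for one-, two-, and three-word queries (Table 3).
DOORWAYS_PER_1000 = {
    "Adult":  [20950, 58456, 85298],
    "Pills":  [27077, 67421, 69288],
    "Dating": [27767, 24161, 38864],
    "Cars":   [21683, 22567, 26964],
    "Gifts":  [16853, 18731, 28525],
    "Casino": [20071, 20600, 22458],
}

# Sum the three subgroups of every niche, then sort in descending order.
totals = {niche: sum(values) for niche, values in DOORWAYS_PER_1000.items()}
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The resulting `ranking` lists the niches in the same order as Table 4, from Adult (164704) down to Casino (63129).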

This table offers enough information to draw indirect conclusions about the competition between spamdexing sites in these niches. We can also estimate how "easily" sites can get into SERPs. Most likely, obtaining search engine traffic will be easier in the niches with "flexible" search engine behavior. Yet this cannot be guaranteed, because the situation depends on many factors, including the number of players in the niche, niche key phrase size, and others.

9. Defining biggest players

Working with the entire volume of data accumulated during the experiment, we can identify the main players among "white hat" sites for every niche. We can also identify the most typical spamdexing strategies for each individual niche.

Let us define the main "white hat" players as sites in the last time span of Table 1 (31-36 days) which can be found through many search queries relevant to the niche.

For the sake of convenience, we will not be considering knowledge base sites like wikipedia.org and answers.com which are widely represented in all the niches that we study.
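A possible way to pick such players from the saved SERPs is sketched below. The input format, the function name, and the sample data are hypothetical; the sketch also keeps subdomains as they are instead of reducing them to a registered domain, which is a simplification.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Knowledge-base sites left out of the comparison.
EXCLUDED_DOMAINS = ("wikipedia.org", "answers.com")


def main_white_hat_players(query_url_pairs, top_n=4):
    """Rank domains by the number of distinct queries they were found through.

    query_url_pairs: (query, url) tuples for pages from the longest-lived
    time span (hypothetical input format).
    """
    queries_by_domain = defaultdict(set)
    for query, url in query_url_pairs:
        host = urlsplit(url).hostname or ""
        domain = host[4:] if host.startswith("www.") else host
        if any(domain == d or domain.endswith("." + d) for d in EXCLUDED_DOMAINS):
            continue
        queries_by_domain[domain].add(query)
    ranked = sorted(queries_by_domain.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(domain, len(queries)) for domain, queries in ranked[:top_n]]


# Made-up sample data, for illustration only.
sample = [
    ("used cars", "http://www.edmunds.com/used"),
    ("car prices", "http://edmunds.com/prices"),
    ("used cars", "http://kbb.com/"),
    ("used cars", "http://en.wikipedia.org/wiki/Car"),
]
players = main_white_hat_players(sample)
```

For the sample, edmunds.com is found through two queries and kbb.com through one, while the wikipedia.org page is skipped.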

Niche     Main "white hat" players
Adult     pichunter.com, youngerbabes.com, 3pic.com, penisbot.com
Cars      edmunds.com, kbb.com, utotrader.com, nadaguides.com
Casino    harrahs.com, casino.com, gonegambling.com, alottery.com
Dating    adultfriendfinder.com, swinglifestyle.com, swingtowns.com, match.com
Gifts     patagoniagifts.com, gifts.com, antiquingonline.com, bernardine.com
Pills     drugs.com, druginfonet.com, crazymeds.org, coreynahman.com

Table 5. Main 'white hat' players in every niche

We will use a slightly different method when pointing out the main players among spamdexing sites:

  1. Sites of this type are in the first time span (1-6 days, see Table 1).
  2. We consider individual pages, not domains like with the "white hat" sites. Considering domains is pointless because most doorways are separate pages located within privileged sites; we will find evidence to support this later on.
  3. Leading pages are those which can be found through the biggest number of search queries.
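The three steps above can be sketched as a count of distinct queries per individual page. The input format, the function name, and the example URLs are hypothetical, and the pairs are assumed to be already restricted to pages from the 1-6 day span.

```python
from collections import defaultdict


def leading_doorway_pages(query_url_pairs, top_n=4):
    """Rank individual pages (not domains) by distinct queries leading to them."""
    queries_by_page = defaultdict(set)
    for query, url in query_url_pairs:
        queries_by_page[url].add(query)   # a repeated query is counted once
    ranked = sorted(queries_by_page.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(url, len(queries)) for url, queries in ranked[:top_n]]


# Made-up sample data, for illustration only.
sample = [
    ("query a", "http://example.edu/files/page1.html"),
    ("query b", "http://example.edu/files/page1.html"),
    ("query a", "http://example.edu/files/page2.html"),
    ("query a", "http://example.edu/files/page1.html"),  # seen again on another day
]
leaders = leading_doorway_pages(sample)
```

Here page1.html leads with two distinct queries even though it appears three times in the input.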

Niche     Search requests   Leading doorway pages
Adult     423*              http://hgfkjhg.blog.drecom.jp/archive/1
          285               http://php.scripts.psu.edu/juw107/seminars/php-may2006/uploadedfiles/hardcore.html
          261               http://newmedia.cdws.ucf.edu/wiki/img/amateur-girls.html
          251               http://jabsom.hawaii.edu/images/amateur-teens.html
Cars      22                http://theframegw.iifree.net/index-auto-parts.html
          22                http://aivt.1sweethost.com/index-auto-parts.html
          20                http://www.2000twe.happyhost.org/index-auto-parts.html
          18                http://2000bns.free-site-host.com/index-auto-parts.html
Casino    130               http://www.mathematics.pitt.edu/?2:12
          57                http://www.umc.pitt.edu/tour/tour1-12.html
          38                http://alison73.wordpress.com
          19                http://baccaratnew.blogspot.com
Dating    601               http://php.scripts.psu.edu/juw107/seminars/php-may2006/uploadedfiles/amateur.html
          513               http://reddot.uark.edu/UserFiles/File/amateur.html
          507               http://mcobit.business.nd.edu/kb/images/Research/amateur.html
          451               http://eclassrooms.coe.uh.edu/attachments/amateur.html
Pills     133               http://pills.hornbeckboats.com/zoloft
          48                http://smallschools.ischool.washington.edu:8000/d_www/buy-soma.html
          43                http://web.cfa.arizona.edu:8082/d_www/buy-valium-online.html
          41                http://ccgb.umn.edu:8002/d_www/buy-valium-online.html

Table 6. Main spamdexing players in every niche

* - number of niche search queries through which a page could be found in SERPs during the experiment

If we take a closer look, we see similarities in spamdexing techniques across niches. Most search engine traffic is concentrated around doorways placed on .edu and .gov domains.

We would also like to mention that in highly competitive niches, like Adult or Dating, a great percentage of traffic is accumulated by doorway pages. In Cars and Gifts there is a lot less spamdexing traffic (Gifts is not included in the table because no visible doorways could be found in this niche). These conclusions are also indirectly supported by Table 1: the ratio between "white hat" and spamdexing sites (the 31-36 and 1-6 day columns respectively) provides evidence of this.
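The ratio can be worked out directly from Table 1. In the sketch below the raw counts of the one-, two-, and three-word groups are simply added up per niche before dividing; that aggregation is our simplifying assumption, and the names are illustrative.

```python
# (pages living 1-6 days, pages living 31-36 days) for the one-, two-,
# and three-word groups of every niche, copied from Table 1.
TABLE_1 = {
    "Adults": [(838, 402), (503250, 84909), (1947145, 116502)],
    "Cars":   [(5616, 2896), (281164, 154664), (647145, 286902)],
    "Casino": [(9955, 5067), (196810, 117990), (538990, 295333)],
    "Dating": [(1666, 616), (88139, 43310), (721208, 200827)],
    "Gifts":  [(573, 426), (49843, 35720), (635098, 265006)],
    "Pills":  [(1056, 505), (234692, 35713), (303830, 43894)],
}


def white_hat_to_doorway_ratio(rows):
    """Ratio of long-lived pages to short-lived pages across all groups."""
    doorways = sum(short for short, _ in rows)
    white_hat = sum(long_lived for _, long_lived in rows)
    return white_hat / doorways


ratios = {niche: white_hat_to_doorway_ratio(rows) for niche, rows in TABLE_1.items()}
```

Computed this way, the ratio is lowest for Adults (about 0.08) and highest for Casino (about 0.56), with Cars and Gifts well above Adults, Pills, and Dating.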

9.1 Analyzing biggest players among "white hat" sites

As we found out in the previous section, doorway technologies are now centered on governmental and educational sites. The scheme consists of placing content-containing pages on .edu and .gov sites. These pages then get plenty of incoming links, which quickly brings them into the search rankings. It has to be said that the SERP life of such pages is normally quite short.

Finding out how the main "white hat" players differ in highly competitive niches is far more interesting. To do this, we will use Seoquake.com.

Let us compare the biggest players by a number of parameters:

URL                            Google PageRank  Google index  Google links  Dmoz  Webarchive age
Adult
http://pichunter.com/          5                36600         947           No    Apr 10 2001
http://youngerbabes.com/       3                101           1             No    Nov 28 1999
http://3pic.com/               5                15            291           No    Mar 03 2000
http://penisbot.com/           5                27400         1290          No    Aug 2000
Casino
http://harrahs.com/            6                6020          638           20    Feb 05 1997
http://casino.com/             5                2190          67            36    May 30 1997
http://gonegambling.com/       1                22700         0             1     Nov 11 1998
http://alottery.com/           3                86            23            1     Apr 11 2000
Dating
http://adultfriendfinder.com/  7                131000        2050          No    Aug 1998
http://swinglifestyle.com/     4                338000        122           No    Sep 24 2001
http://swingtowns.com/         0                82400         0             No    Feb 20 2001
http://match.com/              7                1170000       11300         152   Jan 12 1998
Pills
http://drugs.com/              6                336000        6830          13    Dec 23 1996
http://druginfonet.com/        6                1990          282           12    Dec 22 1996
http://crazymeds.org/          4                13600         135           2     Nov 18 2003
http://coreynahman.com/        6                219           2320          9     May 11 2000

Table 7. Analyzing main "white hat" players

Google search ranking positions may change every day, and sometimes the differences may be drastic. So, this table should be considered carefully. Still, some dependencies are clear.

  1. Most leading sites are multi-page portals. This is natural: sites containing only a few pages are not expected to accumulate much traffic.
  2. There are no young, newly launched sites in this table. The youngest site was founded in late 2003, while most of the sites were started before 2001.
  3. Being listed in Dmoz (and most likely in the Google catalog too) is no longer a necessary condition for being a niche leader [*].

Other possible conclusions require broad assumptions and are not quite evident, so we leave the opportunity to make these conclusions to our respected readers.

10. Conclusions

Analyzing the Google search result pages statistically, we can obtain a proportion between legitimate "white hat" sites and spamdexing pages for every niche. Moreover, we can see the most active players in every niche, among regular content-based sites and doorway pages as well.

Studying search engine spam, one can see that in highly competitive niches like Adult or Dating, a noticeable percentage of traffic belongs to doorway pages (see Table 6). In less competitive niches there is less doorway traffic.

Yet, having looked at the casino and online gambling niches, we came across a different ratio between legitimate sites and doorways. As we see it, the main reason behind it is the highly competitive nature of this niche. Spamdexing techniques widely used in other niches may not show the same efficiency when optimizing casino sites.

Our analysis was carried out on a large number of search queries for every niche. Naturally, one does not have to study a range of separate niches and process large amounts of data. By narrowing the niches, you can get more detailed information for every group of keywords.

11. Final words

All the data for this report was gathered and processed with Seodigger.com, Serparchive.org, and Seoquake.com. Anyone can experiment with search engine rankings using these services.

The above mentioned methods offer wide opportunities for research. Here are some of them:

24.08.2007

You can discuss this article here: http://blog.seoquake.com/?p=76.

[*] There is an error in the part of Table 7 that concerns Dmoz listings. Sites with Adult and Dating content have to be searched for in Dmoz using special queries, and these sites are in fact listed there. Therefore, the general conclusion about leaders not needing to be listed in the catalog may be regarded as false or, more precisely, partially false.