Duplicate Content: A Complete Guide
One of the important components of successful website promotion through SEO is on-page optimization. Content is often duplicated within a site or across other web resources. This makes promotion less successful and brings the risk of the site being caught by search engine filters. You can read more about on-page optimization in the article On-Page SEO.
Duplicate Content: Definition
Duplicate content in SEO refers to pages whose content (text, visuals, etc.) partially or completely coincides with content on other pages of the same site or on other sites. Typically, these are copies available on the site at several URLs:
- Duplicate home page on index.html/htm or index.php;
- Duplicate content caused by inbound links; and
- Navigation, where sorting, attributes, display and other parameters are available to crawlers and indexed by a search engine.
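As a rough illustration of how such URL variants can be recognized as the same page, here is a minimal Python sketch that collapses them into one comparable form. The parameter names in PRESENTATION_PARAMS and the example.com URLs are assumptions chosen for illustration; a real site needs its own list of parameters that change only presentation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters that change only sorting or display, not content.
PRESENTATION_PARAMS = {"sortby", "showinstockonly", "price"}
# Default document names that usually duplicate the directory URL.
INDEX_FILES = {"index.html", "index.htm", "index.php"}


def normalize_url(url: str) -> str:
    """Collapse common duplicate-URL variants into one comparable form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # /index.html, /index.htm and /index.php map to the directory URL.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"
    # Drop presentation-only parameters and keep the rest in sorted order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PRESENTATION_PARAMS
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )
```

Two crawled URLs that normalize to the same string are candidates for a redirect or a canonical tag. For example, normalize_url("http://www.example.com/shoes/index.html?sortBy=price&page=2") gives "http://www.example.com/shoes/?page=2".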
Reasons Why Duplicate Content Appears
SEO duplicate content does not always indicate technical SEO problems (more information on this topic can be found in the article Technical SEO Checklist For 2020). Often it is the result of a poorly executed content strategy:
- Texts on similar topics; and/or
- Replacement of obsolete texts, images, or categories without removing the old ones.
Online stores often face the problem of duplicates. It is connected with:
- Blank category pages;
- Reusing the same review text;
- Duplicates on Print and Download pages;
- Copies of the product posted on partner sites and platforms such as Etsy, eBay or Amazon;
- Copies in the sections of payment, delivery and exchange;
- Duplicates of descriptions and headings;
- The availability of the product in several categories or sale URLs; and
- Reviews across multiple pages.
News resources often suffer from duplicate content as well, because article announcements are placed on different pages of a site: instead of a description of the article, its first few sentences and title image are used. Sometimes content is deliberately duplicated between domains in an attempt to increase traffic to the site. Such actions can worsen behavioral factors, as the visitor will see identical content in the search results.
Types of Duplicate Content
Content may coincide with the original fully, when the whole page is identical, or partially, when only some blocks (such as a product description or an article announcement) are repeated.
Webmasters also use an informal classification of duplicate content:
- Internal technical duplicate content. This is a copy of content available at several URLs on the same web resource, such as a home page duplicated at index.html/htm or index.php.
- Duplicate content caused by incorrect server settings. There is no redirect from http to https, or an alternative subdomain such as www3 is open for indexing. See more information on this in the article HTTPS vs. HTTP.
- Third-party external duplicate content. This consists of full or partial copies of your content on external domains owned by third parties: sites that copied content from your web resource and published it as their own. This also includes partially non-original texts copied from your web resource, as well as duplicate content on different pages of your own site.
Is Duplicate Content Bad For SEO?
Novice SEO professionals and experienced Internet marketers alike often ask: “Does duplicate content hurt SEO? Is there a duplicate content penalty?”
Google webmasters states:
Note that there is a risk of the website getting lower in the ranking in the search results.
It is important to know that the position of a site in the SERP is ever-changing. After the site is launched, its pages may be ranked without some factors being taken into account. Duplicate pages or content copied from other sites may not be detected immediately on a new site, but later the content issue and other factors will affect the initial ranking.
Why Duplicate Content is Bad for SEO
Duplicates themselves do not prevent the visitor from finding the necessary information and completing the target action, but they can make the resource harder to find on the Web. Duplicate content could result in:
- Indexing issues. This is especially important for sites with a large number of pages, such as e-commerce sites (read more in the article Ecommerce SEO). Duplicate pages increase the overall size of the web resource, and bots that index the “extra” pages use up the site’s crawl budget.
- The site being caught by search engine filters, even if the other optimization and promotion methods used are permitted. Get more information on this in the article Google Algorithms That Affect SEO.
- Loss of link weight for the pages being promoted. The search engine may send visitors to duplicate pages rather than to the originals.
- Reduced relevance of pages to search queries. Often, because duplicate pages compete with each other, none of them appears in the search results.
- Decrease in organic traffic.
Are There Penalties For Duplicate Content?
Does Google penalize duplicate content? Google’s algorithms analyze website content as it is crawled. If the web resource has both a “regular” and a “printed” version of a page and neither is blocked by the noindex meta tag, the search engine will select one of the versions for the results. If Google’s algorithms determine that duplicate content is being used to manipulate rankings, adjustments will be made to the indexing and ranking of the site. As a result, the web resource’s position in the search results will drop. In more serious cases, the web resource may be removed from the Google index and will no longer appear in the search results.
Has your website already been removed from Google’s search results? Google has published detailed webmaster recommendations that help SEO professionals review a site’s indexing and optimization. Once you have made your changes and are sure that the website no longer violates Google’s rules, submit it for review.
Learn more: How does Google handle duplicate content? A video by Google’s Matt Cutts on how to work with duplicates on a site.
How to Check if There is Duplicate Content On And Off of Your Site
You can check for duplicate content using search queries and dedicated tools that detect duplicates, help find their cause, and help eliminate them from the website. Such a tool is typically called a duplicate content checker.
Google’s webmaster tools can show whether duplicate content is present. Searching the index also helps to identify it: in Google, the query is site:domain.com “keyword”. Just enter the site address and a piece of text in quotation marks to search for matches.
To identify the use of the same images on different sites, you can use reverse image search services. Similar tools are available for texts. They will tell you whether the content is copied and, if so, how much of it.
If you doubt the uniqueness of several pages of your website, use these services to check the similarity of the pages.
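To show what such a similarity check measures, here is a minimal Python sketch that compares two texts by their overlapping three-word sequences (shingles) and reports the Jaccard similarity. This is one common technique, not necessarily what any particular service uses; the function names and the shingle size of 3 are assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Short texts are treated as a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts: 1.0 is identical, 0.0 is disjoint."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)
```

A score close to 1.0 suggests a full duplicate, while a mid-range score points to partial duplication such as a shared description block. The threshold at which pages count as duplicates is a judgment call for each site.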
Website analysis services will also help you check for duplicate content. Netpeak Spider and Siteliner detect duplicate titles, descriptions, and H1 headers, fragments of duplicate text, and pages that match completely.
Additional recommendations for improvement:
- Use 301 redirects. 301 redirects (the RedirectPermanent directive) can be configured in the .htaccess file to redirect users, Googlebot, and bots from other search engines. For more information on this topic, see the article 301 vs 302 redirects.
- Add a link to the original source if you publish page content on external resources. For pages with copied content, you can also use the noindex meta tag. Bots will then index the pages correctly, without affecting how the original web resource matches search queries. More about meta tags here: Meta Tags for SEO.
- Use the canonical tag. Read more in the article: Rel=Canonical Tag.
- Reduce repetition of text content. Do not post lengthy copyright text at the bottom of each page. Instead, add a short summary with a link to a details page. Parameter handling tools are also useful: they let you tell Google how to process URL parameters.
- Understand how your site manages text and images. Do you know exactly how content is displayed on your site? Check whether a blog entry also appears, with the same content, on the pages of the entry archive.
- Replace non-unique images, preferably with your own content.
- Make duplicate content unique. Avoid using the first sentences of the text instead of short descriptions of articles.
A Case for An Online Cosmetics Store
The site contains separate pages for two manufacturers of makeup remover, but with similar information on both pages. You can combine these pages into one, “Makeup Remover.” Or, fill each of them with unique content about the manufacturers of makeup removers, describing the benefits of their products.
- Use noindex on tag or category pages for WordPress sites. The site builder automatically generates tag and category pages that are sources of duplicate content.
- Use top level domains. Using top-level domains to process country-specific content (if possible) might help with your duplicate content issue. Use the http://www.example.pl domain to display Poland-oriented content instead of the http://www.example.com/pl or http://pl.example.com domains.
- Block duplicate content from being indexed by Google, as it is harmful to SEO. To do this, use the noindex robots meta tag.
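The 301 redirect recommendation in the list above is normally implemented in server configuration such as .htaccess. The decision logic behind it can be sketched in Python as follows. The host name in CANONICAL_HOST and the function name are assumptions for illustration, and the sketch assumes every host that reaches it belongs to the same site.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the single preferred host name of the site.
CANONICAL_HOST = "www.example.com"


def redirect_location(url: str):
    """Return the URL to answer with a 301, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.netloc.lower() == CANONICAL_HOST:
        return None  # already the preferred scheme and host
    # Keep path and query, but force https and the preferred host.
    return urlunsplit(
        ("https", CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
```

With this rule, http://example.com/catalog?page=2 and http://www3.example.com/catalog?page=2 both resolve to https://www.example.com/catalog?page=2, so only one version of each page remains available to crawlers.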
Where Should You Start?
Go to the Google Search Console. Here, you can specify how Google’s crawlers should treat the sorting, attribute, and display parameters of your pages. This tool is essential for sites with a large number of sections and search queries (real estate sites, car sales sites, or job search sites).
You can select various types of parameters used on the website pages:
- SortBy=;
- ShowInStockOnly=; and
- Price= and others.
If pages and their contents have duplicates, use a query in Google:
site:www.nnn-nnn.com inurl:sortBy= (or try other types of parameters).
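If you need to run such checks for many parameters, the query strings can be assembled with a small helper. This Python sketch only builds the text of the query using the site: and inurl: operators; the function name is hypothetical, and it does not contact Google.

```python
def site_query(domain: str, inurl: str = "", phrase: str = "") -> str:
    """Build a Google query restricted to one site."""
    parts = [f"site:{domain}"]
    if inurl:
        parts.append(f"inurl:{inurl}")  # match a fragment of the URL
    if phrase:
        parts.append(f'"{phrase}"')  # exact-match a piece of page text
    return " ".join(parts)


# One query per parameter worth checking.
queries = [
    site_query("www.nnn-nnn.com", inurl=p)
    for p in ("sortBy=", "ShowInStockOnly=", "Price=")
]
```

Note that the operators are written without a space after the colon.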
Easy Content Theft Protection
You can reduce the damage from copied content by adding a rel=canonical link to the pages of your web resource.
Rel=canonical link: the URL given in the rel=canonical tag matches the URL of the current page. If the page markup is copied together with this tag, the tag signals to search engines that the version on your website should be considered the “original” content.
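To verify that a page carries these signals, you can parse its HTML and read the rel=canonical link and the robots meta tag. The following Python sketch uses only the standard library; the class and function names and the example.com URL are assumptions for illustration.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the rel=canonical URL and the robots meta directives of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.robots |= {d.strip().lower() for d in content.split(",")}


def check_page(html: str, page_url: str) -> dict:
    """Report the canonical URL, whether it is self-referencing, and noindex."""
    parser = HeadSignals()
    parser.feed(html)
    return {
        "canonical": parser.canonical,
        "self_canonical": parser.canonical == page_url,
        "noindex": "noindex" in parser.robots,
    }
```

For a page at https://www.example.com/makeup-remover/ whose head contains a canonical link to the same address, check_page reports self_canonical as True. A page with no canonical link returns None for the canonical URL, which marks it as a candidate for adding the tag.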
Duplication of site content worsens the user experience, reduces the amount of organic traffic, and leads to additional costs. Duplicates can also bring your web property under a Google filter.
You can fix the problem with a thorough analysis of the site using special tools and services. Choose them based on the kind of duplicate content you are dealing with. White-hat SEO will not “save” a resource if it contains duplicate images, texts, and entire pages, so publish unique content. This will help you to avoid problems with filters and a drop in rankings. Good luck with your promotion!