Why Google does not index your site: TOP-7 reasons your resource is missing from the search engine
SEO specialists often face situations where work on a site is in full swing: unique, high-quality content is added, the structure expands, new useful sections and blog articles appear, yet the number of pages in the index does not grow. Worse, pages start dropping out of the index, or the website is not indexed at all: it does not appear in search results, is invisible to the search engine, and receives no organic traffic.
Experience shows that repeated checks alone change nothing. The only rational way out is to find the reason why the site is not indexed in Google. Let's not waste time and get straight to solving the problem.
In this article you will learn:
- What ranking, indexing and crawling are
- How to check whether your pages are in the Google index, and how many
- How to optimize your robots.txt, sitemap.xml and .htaccess files
- What to look for in the meta name="robots" tag and rel="canonical"
- How to check the speed and correctness of server responses
- How to make sure a web page is unique, useful, and neither an orphan nor a duplicate
- How to rule out causes on the search engine's side: filters and technical problems
- How to check the domain name: young site or domain with history
- How to analyze links, both external and internal
What is ranking, site indexing and crawling?
First, let's go over the basic terminology and see how a search engine works. The Google search bot crawls the World Wide Web like a spider, finding new web pages and adding them to the database. After the user enters a query, the search engine sorts the resources so that the pages whose content most closely matches the search phrase end up in the TOP.
Now in terms:
- Indexing is adding new pages to the database and storing them there.
- Crawling is the process of following hyperlinks in order to find new content.
- Ranking is the ordering of pages by relevance to a search query.
By analogy, indexing is registration for a tournament, and ranking is the awarding of prizes. You cannot take part, let alone win, without registering.
How to check the availability and number of pages in the Google index?
If you have access to Search Console, check the site's indexing on Google there. The Coverage section shows the number of pages in the index, including those with errors and warnings.
If you see non-zero numbers in the first three fields, the site is indexed by Google and at least part of the content is in the database. A certain number of pages in the "Excluded" column is nothing critical. However, if you see 0 everywhere and only the last field is non-zero, the search engine has trouble processing the resource. That means it is time to find the reason why Google does not index the site and eliminate it.
In addition to the total number of web pages in the index, Search Console lets you check the index status of any individual page. You will see a notification that depends on its status.
For example, everything is all right here
And here indexing failed
There is probably no better way to check the number of pages in the Google index. If you care about the fate of your web resource and are interested in search traffic, we recommend registering with Search Console: it is necessary and free.
There is a second option for those who do not have access to the console: putting the site: operator in front of the URL shows whether the resource is in the index and to what extent. How do you check pages in the Google index this way?
Let's say we enter site: in front of our URL.
We see the approximate number of pages in the search engine index. By the same principle, you can check an individual page by putting its full URL after the operator.
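As a rough illustration, here is a minimal Python sketch that builds such queries. The helper names site_query and search_url, as well as the /blog/some-article path, are hypothetical and used only for the example; the domain is the placeholder already used in this article.

```python
from urllib.parse import quote_plus


def site_query(target: str) -> str:
    """Return a search query that uses the site: operator."""
    # Drop the scheme so the operator is followed by the host and optional path.
    for scheme in ("https://", "http://"):
        if target.startswith(scheme):
            target = target[len(scheme):]
    return f"site:{target.rstrip('/')}"


def search_url(target: str) -> str:
    """Build a Google search URL for the site: query."""
    return "https://www.google.com/search?q=" + quote_plus(site_query(target))


# Whole site, then a single (hypothetical) page.
print(site_query("https://sitenametest.ua/"))
print(site_query("https://sitenametest.ua/blog/some-article"))
```

Keep in mind that the number Google reports for such a query is only an estimate, so Search Console remains the more reliable source.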
The site is not indexed by Google: how to “feed” it to the search engine
If you find that Google does not index a site or a separate web page, we recommend that you proceed according to the following algorithm:
- Open the Search Console webmaster panel;
- Go to the URL Inspection section;
- Paste the link you want to check into the search box;
- Request a check and wait for the result;
- If the URL is not in the Google database, click the "Request indexing" button.
Alas, if several hundred or several thousand landing pages of your online resource are missing from the index, this method is too time-consuming to be effective. Poor indexing by Google points to deep-seated problems that indexing requests alone cannot solve. But you should not forget about the requests either. They are useful when you publish new content, expand the structure, or update existing pages: with them you tell the search engine that the website has changed and the bot should pay attention.
Let's simulate a situation. A young website was indexed in early February of this year, shortly after development. Google is understandably suspicious of new resources and is in no hurry to reward them with good rankings. A couple of months later it turned out that some product cards, even ones with optimized content and metadata, were missing from the search results simply because the bot had visited them once, say on January 31, seen a noindex in the robots meta tag, and not returned for the next two months.
In this situation, the likely reason is that after the product card was updated and optimized, nobody sent a request through the webmaster's console to notify Google that the content had changed.
What are the reasons for the absence of a site in the index and what to do to fix them
As for why Google does not index the site as a whole, each case has to be studied carefully so that the source of the problem is found as soon as possible. In this article we consider the TOP-7 most common reasons why websites are missing from the search engine database and describe how to eliminate them.
Optimize your robots.txt, sitemap.xml and .htaccess files
One of the most common webmaster mistakes is an incorrectly written robots.txt. Blocking Disallow directives in it can prevent the bot from processing the entire resource or individual pages. At the same time, every website has pages that should be closed from indexing: technical pages, internal search results, URLs with GET parameters, login and admin pages, shopping carts, "trash" pages, etc.
If robots.txt states:
User-agent: *
Disallow: /
this means that the resource is completely closed to search engines. In Search Console, under the "Coverage" category, you can see all URLs blocked by robots.txt. This most often happens with new websites, when developers kept test versions on a subdomain and closed them from indexing. When the site is moved to production, updating robots.txt is often forgotten, and the finished resource remains inaccessible to bots.
For WordPress sites, also remember to uncheck the "Discourage search engines from indexing this site" checkbox in the Settings → Reading section.
The robots.txt file is available at https://sitenametest.ua/robots.txt. You can check it with the tools of the webmaster's panel, with online services, or with crawler programs such as Screaming Frog or Netpeak Spider. An indexing ban is easy to fix: remove the blocking directives and send a request to update robots.txt.
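For a quick local check, the rules can also be tested with Python's standard urllib.robotparser module. The sketch below is only an illustration: the is_blocked helper and the sample rule sets are assumptions made for the example, and the standard-library parser does not reproduce every detail of how Googlebot interprets robots.txt (wildcards, for instance).

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text forbids user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# A robots.txt that closes the whole site (typical leftover from a test version).
closed = "User-agent: *\nDisallow: /\n"

# A robots.txt that closes only service sections (hypothetical paths).
selective = "User-agent: *\nDisallow: /admin/\nDisallow: /cart/\n"

print(is_blocked(closed, "https://sitenametest.ua/"))
print(is_blocked(selective, "https://sitenametest.ua/blog/"))
print(is_blocked(selective, "https://sitenametest.ua/admin/login"))
```

To test a live file instead of a string, RobotFileParser also offers set_url() and read(), which download robots.txt before can_fetch() is called.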
The sitemap.xml file helps your online resource interact with a search engine. It tells the engine which pages are important and how often they should be recrawled. A search bot can of course crawl a website without a sitemap.xml, but having one is considered good practice and simplifies the work of Google, Yandex, Bing and other search engines. You can check the sitemap through the webmaster's panel or by following the link https://sitenametest.com/sitemap.xml. In Search Console, you can also submit the sitemap for crawling in the Sitemap section.
This way you can get your site indexed on Google faster. The sitemap.xml is most often generated by a plugin that updates it automatically, so that every new post is added right away. If a web resource has no sitemap, this can negatively affect indexing, so we recommend adding one to your hosting.
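To see which URLs a sitemap actually declares, the file can be parsed with a few lines of Python. This is a minimal sketch that assumes a plain urlset sitemap in the standard sitemaps.org namespace (not a sitemap index); the sitemap_urls helper and the sample content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value listed in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content for the placeholder domain.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://sitenametest.com/</loc><lastmod>2021-05-25</lastmod></url>
  <url><loc>https://sitenametest.com/blog/</loc></url>
</urlset>"""

for url in sitemap_urls(sample):
    print(url)
```

The resulting list can then be compared with the pages reported in Search Console to spot URLs that are declared in the sitemap but are not indexed.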
The .htaccess file is intended for the server and contains rules for its operation. It is usually placed in the root directory:
- of the server: /public_html
- of the site: /public_html/sitenametest.ua/
Htaccess rules can be used to close the site from indexing, either by denying access to everyone except a specific IP:
Order deny,allow
Deny from all
Allow from IP
or by allowing access to all users except a specific IP:
Order allow,deny
Allow from all
Deny from UndesiredIP
We recommend analyzing .htaccess: it may contain rules that prevent indexing.
Look at the meta name="robots" tag and rel="canonical"
Is the site not indexed on Google? The problem may lie in wrong robots tags or incorrectly specified canonical pages. If you find <meta name="robots" content="noindex, nofollow"> in the head section, the page is closed: the search engine is told not to index it and not to follow its links. You can check the robots tag in the webmaster's panel under the "Coverage" category or with crawler programs such as Screaming Frog.
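If you already have the HTML of a page, the robots meta tag can also be extracted with Python's standard html.parser module. This is only a sketch under simple assumptions: the RobotsMetaParser class and the robots_directives helper are illustrative names, and the parser looks only at meta tags named robots or googlebot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> set[str]:
    """Return the set of robots directives found in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print("noindex" in robots_directives(page))
```

A page whose directive set contains noindex (or none, which combines noindex and nofollow) should not be expected to appear in the index.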
The solution is simple and logical: remove the noindex, nofollow meta tags where they are not needed. If that does not help, we recommend looking at the server and checking the HTTP responses for the X-Robots-Tag header.
If you find a directive like this there:
HTTP/1.1 200 OK
Date: Tue, 25 May 2021 23:31:10 GMT
X-Robots-Tag: noindex
then the resource is closed and will not be indexed.
Therefore, URLs that need to get into the index must not return a "noindex" directive in the HTTP response. On Apache this directive is added via .htaccess, on Nginx in the server configuration file. HTTP responses are checked in the same way as robots tags: through the URL check report in the webmaster's panel or with crawlers.
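The header can also be inspected with a short Python script. In the sketch below, the function names x_robots_tag and blocks_indexing are hypothetical. The check is deliberately simplified: it treats any noindex or none directive as blocking, even if the header addresses it to a specific bot.

```python
from urllib.request import Request, urlopen


def x_robots_tag(url: str) -> list[str]:
    """Fetch the headers of url and return all X-Robots-Tag values (may be empty)."""
    request = Request(url, method="HEAD")
    # Note: urlopen raises HTTPError for 4xx/5xx responses.
    with urlopen(request, timeout=10) as response:
        return response.headers.get_all("X-Robots-Tag") or []


def blocks_indexing(header_values: list[str]) -> bool:
    """Return True if any X-Robots-Tag value contains noindex or none."""
    for value in header_values:
        # A value may carry a bot prefix, e.g. "googlebot: noindex, nofollow";
        # this simplified check does not look at which bot is addressed.
        directives = value.lower().replace(":", ",").split(",")
        if any(d.strip() in ("noindex", "none") for d in directives):
            return True
    return False


# Offline example with the header value from a response like the one above.
print(blocks_indexing(["noindex"]))
```

For a live page, call blocks_indexing(x_robots_tag(url)) with the URL you want to check.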
The rel="canonical" attribute is another aspect that affects indexing. Usually a self-referential canonical is used, telling search engines that the web page is the main (canonical) one and exists in a single version. In SEO, the link rel="canonical" tag is used to eliminate duplicate content. In some cases, however, it can interfere with indexing, namely when a different or nonexistent page is specified as canonical for the URL being checked. In this case, the status will be Canonicalised, Non-Indexable.
Correct use of the "canonical" attribute therefore helps avoid poor indexing. Check that the non-indexed pages you found have a canonical and that it is correct, and if you want to "feed" them to the search bot, set the tag up properly.
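A canonical check can be sketched in Python as well. The CanonicalParser class, the canonical_status helper and its three return labels are assumptions made for this example; the comparison ignores only a trailing slash and does not verify that the canonical target actually exists.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> on the page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_status(page_url: str, html: str) -> str:
    """Classify the canonical of a page: missing, self-referential or points elsewhere."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonical:
        return "missing"
    # Resolve relative hrefs against the page URL before comparing.
    target = urljoin(page_url, parser.canonical)
    if target.rstrip("/") == page_url.rstrip("/"):
        return "self-referential"
    return "points elsewhere"


url = "https://sitenametest.ua/blog/post/"  # hypothetical page
html = '<head><link rel="canonical" href="https://sitenametest.ua/blog/post/"></head>'
print(canonical_status(url, html))
```

Pages reported as "points elsewhere" are the ones to review first if they are expected to be in the index.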
Check the speed and correctness of server responses
Google or Yandex does not index the site even though the previous causes have been checked and eliminated? Check how fast your content is served. Pages sometimes fail to get into the search engine database because they load slowly, which may be caused by poor code optimization or unstable hosting. Bots do not get a response to their request in time and leave the page before the main content has loaded. What to do in this case? Aim to reduce the server's time to first byte (TTFB) to 200 ms. In general, the lower the better, and in any case no more than 500 ms: a TTFB above 500 ms is a problem.
There are many services to check the speed, choose the most convenient for you:
- web site optimization,
- Pingdom Tools,
- PageSpeed Insights.
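For a rough first estimate without any external service, the response time can also be measured with Python's standard library. This is an approximation, not a replacement for the tools above: the measured value includes DNS lookup, connection setup and TLS handshake, and the function names and labels are chosen for this example. The 200 ms and 500 ms thresholds are the ones mentioned earlier in this section.

```python
import http.client
import time
from urllib.parse import urlsplit


def classify_ttfb(ms: float) -> str:
    """Map a TTFB value in milliseconds to the thresholds used in this article."""
    if ms <= 200:
        return "good"
    if ms <= 500:
        return "acceptable"
    return "problem"


def measure_ttfb_ms(url: str, timeout: float = 10.0) -> float:
    """Approximate time to first byte for url, in milliseconds."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        start = time.perf_counter()
        conn.request("GET", path)        # connects on first use, then sends the request
        response = conn.getresponse()    # returns once status line and headers arrive
        elapsed_ms = (time.perf_counter() - start) * 1000
        response.read()                  # drain the body before closing
        return elapsed_ms
    finally:
        conn.close()


# Example (requires network access):
# print(classify_ttfb(measure_ttfb_ms("https://sitenametest.ua/")))
```

Because a single measurement is noisy, it is worth repeating the request several times and looking at the median.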
The most common reasons for a slow server response are the following:
- Page loading is not optimized: images are not compressed, render-blocking resources are not eliminated, CSS/JS files are not minified.
- The server software is not optimized, databases are not tuned.
- There is no caching.
- The server is underpowered: low memory, a slow processor, poor-quality hosting.
As for the server response, every accessible page should return code 200, while 5xx codes indicate server-side problems. Response codes can be checked with crawlers such as Screaming Frog, with third-party online services such as httpstatus.io, through the webmaster's panel, or in the browser by pressing F12 and opening the Network tab.
Make sure the web page is unique, useful, not an orphan, not a duplicate
When no technical indexing problems are found, the cause may be a lack of unique, valuable, useful content. Google prefers to show high-quality content that is as relevant as possible to users' queries and meets the E-A-T principles (expertise, authoritativeness, trustworthiness). We recommend making sure that the page you are checking provides value to visitors. If it does not, update the content: make it fresher, add expertise, increase its uniqueness, and distribute key queries organically throughout the text.
As a rule, a web page is useless if it contains:
- automatically generated text;
- duplicate content;
- content stuffed with outgoing links;
- text with no real meaning that is overloaded with keywords.
Doorways used for promotion in black-hat SEO are also considered low-quality web pages.
Orphan pages are pages with no incoming links at all, neither from within your own resource nor from third-party sites. Since search bots discover new content by moving from page to page along links, they, just like users, cannot find orphan pages. You can detect both "orphans" and duplicates with crawlers. For orphans, you need to compare the list of crawled pages with the list of URLs exported from the CMS: pages that the crawl did not find are the "orphans" we are looking for.
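The comparison itself is a simple set difference. Here is a minimal Python sketch, assuming you already have both lists as plain URL strings; the find_orphans helper and the sample URLs are hypothetical.

```python
def find_orphans(cms_urls: list[str], crawled_urls: list[str]) -> list[str]:
    """Return CMS URLs that the crawler never reached via links."""

    def normalize(url: str) -> str:
        # Ignore surrounding whitespace and a trailing slash when comparing.
        return url.strip().rstrip("/")

    crawled = {normalize(u) for u in crawled_urls}
    return sorted(u for u in cms_urls if normalize(u) not in crawled)


# Hypothetical export from the CMS and result of a crawl.
cms = [
    "https://sitenametest.ua/",
    "https://sitenametest.ua/blog/",
    "https://sitenametest.ua/old-landing/",
]
crawl = [
    "https://sitenametest.ua",
    "https://sitenametest.ua/blog/",
]

print(find_orphans(cms, crawl))
```

Once the orphans are known, linking to them from relevant pages makes them reachable for the bot.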
Rule out causes on the search engine's side: filter, technical problems
With a wrong promotion strategy, a website can be penalized by a search engine. In that case Google does not index the site, which is one of the reasons for its absence from organic search results. Search engines tend to penalize online resources in three cases:
- spamming with key phrases,
- useless content or doorway,
- low quality backlinks.
You can check for sanctions in Search Console under Security Issues and Manual Actions. If there is no filter, poor indexing may be caused by technical problems on the search engine's side. For example, in May 2020 there was a failure that caused problems with indexing new content. Such situations are usually identified and fixed promptly by the search engine's technical support.
Check the domain name: young site or domain with history
New online resources created less than six months ago are not yet well known to search engines, and unless you add them to the index through the webmaster's panel, bots will take a long time to find them. The sooner you notify Google about your web resource and submit it for crawling, the sooner it will appear in search results.
The second option is to use a domain with history. Buying dropped domains is generally a risky business. If the previous owner of the domain name left a bad track record and brought the resource under search engine filters, you will reap the fruits of those efforts, and the new site will be poorly indexed. When buying an old domain, check it carefully by studying its history and the resources previously hosted there.
Analyze links, both external and internal
The backlink profile is one of the key ranking factors: it tells a search bot that a web resource is popular and that other sites link to it. The index does contain many pages without backlinks, but the search engine still gives preference to resources with a good link profile. As a result, pages with quality backlinks are indexed faster.
For internal linking, avoid using the rel="nofollow" attribute on URLs within your website. First, Google does not follow such links. Second, the attribute prevents PageRank from being passed to the target URL. As a result, a page that is linked only through nofollow links can get into the search engine database only if a link from a third-party resource without this attribute leads to it.
To avoid problems, set up internal linking correctly by placing several links to the landing page. For example, in blog articles about website development, place a link to the corresponding service page to increase its internal weight. In turn, rel="nofollow" is recommended for outbound links, for example to social networks, so as not to pass your site's weight to them.
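Internal links that carry rel="nofollow" can be found with a short Python script. The sketch below is illustrative only: the class and function names are hypothetical, and "internal" is taken to mean a link that resolves to the same host as the page, which is an assumption that ignores subdomains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class NofollowLinkParser(HTMLParser):
    """Collect internal links that are marked rel="nofollow"."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.host = urlsplit(page_url).netloc
        self.internal_nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        rels = (attr.get("rel") or "").lower().split()
        target = urljoin(self.page_url, href)  # resolve relative links
        if "nofollow" in rels and urlsplit(target).netloc == self.host:
            self.internal_nofollow.append(target)


def internal_nofollow_links(page_url: str, html: str) -> list[str]:
    """Return absolute URLs of internal links on the page that use nofollow."""
    parser = NofollowLinkParser(page_url)
    parser.feed(html)
    return parser.internal_nofollow


page_url = "https://sitenametest.ua/blog/"
html = (
    '<a href="/services/" rel="nofollow">Services</a>'
    '<a href="https://social.example/profile" rel="nofollow">Social</a>'
    '<a href="/blog/post/">Post</a>'
)
print(internal_nofollow_links(page_url, html))
```

In this sample only the /services/ link is reported: the external link is expected to be nofollow, and the last internal link has no rel attribute.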
Summing up what has been said
We got a bit carried away and ended up with a real longread on how to check whether a page is in the Google index and on the possible reasons why it is not indexed. Instead of a summary, here is a checklist of what to review on a site with poor indexing:
- robots.txt, sitemap.xml and .htaccess files;
- meta name="robots" tag and rel="canonical";
- server speed and response;
- content: uniqueness, usefulness, expertise;
- filter and technical problems on the search engine's side;
- domain quality;
- links: external and internal.
If you do not have time to check, or some aspects cause you difficulties, you can contact our digital agency at any time and order a comprehensive audit with an emphasis on finding problems with site indexing.