Sitemaps & Google Crawls
A sitemap is a file in which you list the pages of your site to tell Google and other search engines how your site content is organized. Search engine web crawlers like Googlebot read this file to crawl your site more intelligently. No SEO specialist would dispute that having a sitemap is desirable, but Google will crawl your site regardless of whether a sitemap is present or how well it is maintained.
Important points to consider regarding the impact of a sitemap:
- A sitemap is not essential to the inclusion of your website pages on Google searches.
- Google will crawl your site whether or not you have submitted an XML sitemap via Webmaster Tools.
- XML sitemaps are most helpful in assisting Google's crawlers to find deeply nested inner pages and content you have recently added.
- Only specific technical instructions, via a robots.txt file or other deliberate “do not crawl” mechanisms, would prevent Googlebot from accessing your site and pages.
- Regardless of a sitemap, if your site’s pages are properly linked through your information architecture (IA), web crawlers will discover most of your site without the assistance of a sitemap.
- Note that using a sitemap doesn't guarantee that every page listed in it will be crawled or indexed, as Google's processes rely mainly on complex algorithms.
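As a rough illustration of the robots.txt point above, the following sketch uses Python's standard urllib.robotparser module to check whether a given URL would be blocked for a crawler. The function name, the sample rules and the example.com URLs are assumptions made for illustration only, and Python's parser does not necessarily match Googlebot's own rule handling in every detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for candidate in (
        "https://www.example.com/products/",
        "https://www.example.com/private/report.html",
    ):
        print(candidate, is_crawlable(EXAMPLE_ROBOTS, candidate))
```

A check like this only tells you whether crawling is blocked by robots.txt; it says nothing about whether a page will actually be indexed.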
A sitemap is of heightened importance if your site meets one of the following criteria:
- Your site is new or has just been launched and has few external links to it. Googlebot and other web crawlers crawl the web by following links from one page to another. As a result, Google might not discover your pages if no other sites link to them.
- Your site is very large, with many new pages being added regularly. As a result, it’s more likely that Google's web crawlers will overlook some of your new or recently updated pages.
- Your site has a large archive of content pages that are isolated or not linked to each other. If your site pages do not naturally reference each other, you can list them in a sitemap to ensure that Google does not overlook some of your pages.
- Your site uses rich media content, is shown in Google News, or uses other sitemaps-compatible annotations. Google can take additional information from sitemaps into account for search, where appropriate.
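For readers who maintain a sitemap by hand or by script, here is a minimal sketch of generating one with Python's standard library. It assumes the basic sitemaps.org format (a urlset element containing url entries with loc and optional lastmod); the function name and the example.com URLs are hypothetical, and a CMS-generated sitemap may include more fields.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from an iterable of (url, lastmod) pairs.

    lastmod is a 'YYYY-MM-DD' string, or None to omit the element.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
        if lastmod:
            ET.SubElement(url_el, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    print(build_sitemap([
        ("https://www.example.com/", "2024-01-15"),
        ("https://www.example.com/archive/old-article", None),
    ]))
```

Listing isolated archive pages this way gives crawlers a route to them even when no other page links to them.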
Common reasons for problems with sitemaps:
- The CMS that generates the XML sitemap for automatic submission to Webmaster Tools (WMT) experiences a problem and stops producing the automatically updated file.
- The webmaster creates and submits sitemaps manually and forgets or fails to keep the file updated.
- A switch from HTTP to HTTPS... the website is moved from a non-secure to a secure domain and the proper guidelines within WMT are not followed. See more on this below.
HTTP to HTTPS move:
In theory, a move to HTTPS should benefit organic search rankings rather than harm them. Google is rewarding sites that make the effort to provide more secure platforms for consumers.
The SSL certificate you use has no impact, so there are no issues there. As long as you set up a separate profile in WMT for the new HTTPS domain (recommended but not essential), the outcome should be beneficial rather than negative. To be safe, consider the following checklist:
SEO checklist to ensure you preserve your rankings during an HTTPS move (best practice guidelines):
- Make sure every element of your website uses HTTPS, including widgets, JavaScript, CSS files, images and your content delivery network.
- Use 301 redirects to point all HTTP URLs to HTTPS. You'd be surprised how often a 302 (temporary) redirect finds its way in by accident.
- Make sure all canonical tags point to the HTTPS version of the URL (not the HTTP).
- Use relative URLs whenever possible.
- Rewrite hard-coded internal links (as many as possible) to point to HTTPS. This is superior to pointing to the HTTP version and relying on 301 redirects, but it is not essential.
- Register the HTTPS version as a new profile within Google Webmaster Tools.
- Use the Fetch and Render function in Webmaster Tools to ensure Google can properly crawl and render your site.
- Update your sitemaps to reflect the new URLs. Submit the new sitemaps to Webmaster Tools.
- Leave your old (HTTP) sitemaps in place for 30 days so search engines can "process" your 301 redirects.
- Update your robots.txt file. Add your new sitemaps to the file. Make sure your robots.txt doesn't block any important pages.
- If necessary, update your analytics tracking code. Most modern Google Analytics tracking snippets already handle HTTPS, but older code may need a second look.
- Implement HTTP Strict Transport Security (HSTS). This response header tells user agents to access only HTTPS pages even when directed to an HTTP page. For browsers that have already received the header, this avoids the redirect round trip, speeds up response time, and provides extra security.
- If you have a disavow file, be sure to transfer over any disavowed URLs into a duplicate file in your new Webmaster Tools profile.
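The redirect item in the checklist above can be spot-checked with a short script. The sketch below assumes you want to confirm that the first response for an HTTP URL is a permanent redirect to an HTTPS address. Both function names are hypothetical, and treating 308 as equivalent to 301 is an assumption of this sketch; the checklist itself only names 301.

```python
import http.client
from urllib.parse import urlsplit

PERMANENT = {301, 308}        # 308 counted as permanent: an assumption of this sketch
TEMPORARY = {302, 303, 307}


def classify_redirect(status, location):
    """Classify the first response received for an http:// URL."""
    if status not in PERMANENT | TEMPORARY:
        return "no redirect"
    # A relative Location has no scheme, so it is reported as not HTTPS.
    if urlsplit(location or "").scheme != "https":
        return "redirect target is not HTTPS"
    if status in TEMPORARY:
        return "temporary redirect (should be 301)"
    return "ok"


def fetch_first_hop(http_url, timeout=10.0):
    """Request the URL without following redirects; return (status, Location)."""
    parts = urlsplit(http_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    # Offline examples; call fetch_first_hop() on your own URLs for a live check.
    print(classify_redirect(301, "https://www.example.com/"))
    print(classify_redirect(302, "https://www.example.com/"))
```

Running a check like this over the URLs in your old HTTP sitemap is one way to catch a stray 302 before search engines do.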
Site / Pages removed from the Google index
Google may temporarily or permanently remove sites from its index and search results if:
- It believes it is obligated to do so by law.
- The site does not meet Google's quality or technical guidelines.
- There are other reasons, such as the site detracting from users' ability to locate relevant information.
Google will not always comment on the individual reasons a page may be removed. However, certain actions are very commonly listed as red flags (more on this below).
If your site is blocked from the Google index because it violates Google's quality guidelines, Google will alert you about this using WMT.
In some cases you may believe that your site has been removed when it has only experienced a temporary rankings drop... meaning your site dropped from its historic ranking to a lower position, leading you to believe it was fully “removed”. In most cases these temporary drops last only a matter of hours or a few days before rankings return to normal.
The following are reasons why a website may be “removed” by Google:
- Automatically generated site content.
- Participating in link schemes or dubious inbound link building.
- Creating pages with little or no original content for blatant SEO purposes.
- Cloaking & sneaky redirects... sending users to pages that differ from those listed.
- Hidden text or links... hiding blocks of text inserted on your pages only for crawlers.
- Doorway pages.
- Scraped content.
- Participating in affiliate link programs without adding sufficient value.
- Loading pages with irrelevant keywords.
- Creating pages with malicious behavior, such as phishing or installing viruses, trojans, or other badware.
- Abusing rich snippets markup.
- Sending automated queries to Google.
A change to the sitemap, an error in the sitemap, the removal of a page from the sitemap or a switch from HTTP to HTTPS would not under any circumstances cause a website to be removed from Google search. These changes may lead to a temporary rankings drop, but that is just the nature of dealing with the complex Google algorithm.
The following are reasons why a site might suffer a “sudden” temporary or short-term loss of ranking:
- 301 redirects... a web page that is 301 redirected will retain the vast majority of its historic ranking power and rankings on search engines; however, it will often experience a temporary drop after the redirect goes through. In almost all cases the page returns to its original ranking position after a couple of days. In some cases the ranking might drop slightly (for example, from position 4 to 5) and not fully recover after a 301 redirect. A 301 redirect of a website's homepage could have a dramatic short-term impact in some cases.
- Switch from HTTP to HTTPS... in some cases where mass 301 redirects take place to facilitate a switch to HTTPS a site may experience short term ranking drops.
- Hacking... if the site is hacked and hacked content is put live.
- User-generated spam... spam detected by crawlers in user-generated content (UGC) posts and similar areas.
- Slow page download speeds... extremely slow page download speeds for a period of time.
- Crawl errors... large volumes of crawl errors such as mass 404 errors, server or site downtime.
In our opinion, a corrupted or altered sitemap that omitted some pages would not have a noticeable or significant ranking impact. Certainly, once rectified within a reasonable period of time, it would not explain the homepage dropping off the rankings.
It’s possible that the switch to HTTPS and the 301 redirects involved led to a sudden drop in rankings for the site, which then recovered quickly, as is common. We’ve seen sites drop off slightly with the switch to HTTPS and then come back as strongly as before... this is just a normal part of SEO, and the change to HTTPS is highly recommended and beneficial in the long run, so it is worth any short-term pain.
A case could be made for any number of reasons as to why search rankings would drop, and in our opinion it would be almost impossible legally to prove a direct “cause and effect” in this area... especially considering the level of mystery Google maintains, the complexity of the algorithm and the enormous array of possible contributing factors.
Report compiled by Adrian Feane of Coalface Digital.
Coalface Digital is a specialist provider of search engine marketing and analytics services to leading Irish agencies, delivering expert consultancy, strategic guidance and implementation services to support the needs of clients. Our team consists of SEO, paid search and analytics experts with over 10 years of experience working with dozens of Ireland’s leading brands.
Adrian Feane is a senior digital strategist and search marketing specialist who has held senior management positions across the industry spectrum. He has lectured on SEO for the Digital Marketing Institute, SureSkills, the Irish Internet Association and the Marketing Institute of Ireland, among others.