Google has never published a list of IPs they crawl from, as those IPs can change at any time. They also have bots used for specific products as well, such as the ones for Google AdSense and Google AdWords.
One of the issues is Google the Local-aware crawling by Googlebot, which not only come from completely new IPs, but also from countries and IPs based outside of the US, which seems to be triggering false positives in bot blocking scripts. If you are unsure if the Googlebot visiting is a real one or not, you can do a reverse DNS lookup to confirm.
Why do people block bots? A variety of reasons – to block server load, to block attacks, to prevent fake referrals. Generally users will whitelist with a list of known Googlebot IPs, as many people will spoof Googlebot, but when Google switches up the IPs and a user inadvertently blocks Googlebot, it can take quite some time to rectify this situation, as noted in this thread from WebmasterWorld.
I have a system that prevents ‘bots from crawling my site. It has a whitelist, to which I add Google IPs. I had always added them manually because new IPs didn’t come up too often, and I wanted to make sure that no one was spoofing Google. About 10 days ago, Google apparently switched to crawling from about a dozen new IPs. I was not paying close attention to my system and those IPs got blocked. They were blocked for about 3 or 4 days.
The traffic picked up a little bit, but slowly. Google wasn’t adding the pages back even though they had recrawled them. Some pages came back, but some of my top pages (for example, Connor McDavid) were nowhere to be found in Google – even when I searched with my site’s name (as many users do). I tried asking Google to recrawl multiple times, but after a week they still aren’t adding back pages for which I request a recrawl.
John Mueller also commented today on the Google Webmaster Help forums with the same situation, where a site is blocking Googlebot.
Wordfence, a popular WordPress plugin for blocking bots, is one that repeatedly comes up, with both the free and paid versions having issues. (Added: Wordfence has some documentation on the specific Googlebot feature so webmasters can ensure they don’t accidentally block Google.)
Hosting companies can also block Googlebot to save server resources. Many, many years ago, GoDaddy hosting blocked Googlebot from crawling all the sites they were hosting for their hosting clients.
Bottom line, if you are using any kind of bot blocking script, you will want to check Google Webmaster Tools daily (if not more than once a day) to check on any issues with Googlebot being blocked.
Latest posts by Jennifer Slegg (see all)
- Google Treats Hreflang in Sitemaps and HTML the Same - June 13, 2018
- Google Dropping Meta Tag Description Length Warnings from Search Console - June 7, 2018
- Google Search Console Crawl Stats to Update Soon - June 6, 2018
- Google: Don’t Block Googlebot from Slow Resources on Page - June 5, 2018
- Google’s Googlebot Crawling, Search Visibility and Rankings - June 4, 2018