What is a Googlebot?

Article by Daniel B (2,209 pts), published Jul 12, 2009

The objective of Googlebot, Google's Web-crawling robot, is to search the Web for pages and fetch them for Google's index database.

Defining Googlebot

Googlebot is the user-agent name of Google's Web crawler. It is the Web crawling software Google uses to find new Web pages and add them to its index.

The function of Googlebot

Googlebot functions as a search bot (i.e., a Web-crawling robot used by a search engine): it collects documents from the Web so that Google's indexer can build a searchable index.

The bots (robots) work by reading Web pages; they then make the content of those pages available to all Google services (done by Google's caching proxy).

Googlebot's requests to Web servers carry a user-agent string containing "Googlebot," and they come from host addresses containing "googlebot.com."
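As a rough illustration of the two signals just described, the Python sketch below checks a request's user-agent string and the host name of the requesting machine. The function name `looks_like_googlebot` and its parameters are hypothetical, and the sketch assumes the host name has already been looked up elsewhere (for example, through a reverse DNS query).

```python
def looks_like_googlebot(user_agent: str, host_name: str) -> bool:
    """Return True only when both signals are present.

    user_agent: the User-Agent header sent with the request.
    host_name:  the host name of the requesting address
                (assumed to be resolved by the caller).
    """
    ua_matches = "googlebot" in user_agent.lower()

    host = host_name.lower().rstrip(".")
    # Accept the domain itself or any subdomain of it,
    # but not look-alikes such as "evilgooglebot.com".
    host_matches = host == "googlebot.com" or host.endswith(".googlebot.com")

    return ua_matches and host_matches
```

A user-agent string on its own is easy to fake, which is why this sketch requires both checks to pass.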

How to use Googlebot

Current version: Googlebot 2.1

Tag: Googlebot/2.1 (+http://www.googlebot.com/bot.html)

Switching your browser's user-agent to Googlebot: use a Firefox extension (User Agent Switcher)
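Outside the browser, the same idea can be sketched in Python: build a request that carries the Googlebot tag as its user-agent. This is only an illustration; the helper name `build_request_as_googlebot` and the URL `http://example.com/` are placeholders, and the request is constructed but not sent.

```python
import urllib.request

# The user-agent tag quoted in the article.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def build_request_as_googlebot(url: str) -> urllib.request.Request:
    """Create (but do not send) a request that identifies itself as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


request = build_request_as_googlebot("http://example.com/")
# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```

Passing the request to `urllib.request.urlopen` would actually fetch the page. A site that verifies Googlebot by address will still recognize such a request as coming from somewhere else.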

Verifying Googlebot

IP address range:

  • from 66.249.64.0 to 66.249.95.255 (googlebot.com)
    (as of May 2008)
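A minimal sketch of checking an address against the range listed above, using Python's standard `ipaddress` module. The function name `in_listed_googlebot_range` is hypothetical, and the range is the May 2008 one quoted in this article, so it may no longer be complete or current.

```python
import ipaddress

# Range as listed in the article (as of May 2008).
FIRST = ipaddress.IPv4Address("66.249.64.0")
LAST = ipaddress.IPv4Address("66.249.95.255")


def in_listed_googlebot_range(ip: str) -> bool:
    """Return True if the IPv4 address lies within the listed range."""
    address = ipaddress.ip_address(ip)
    if not isinstance(address, ipaddress.IPv4Address):
        return False  # the listed range is IPv4 only
    return FIRST <= address <= LAST


# The same range written in CIDR notation.
CIDR_BLOCKS = [str(net) for net in ipaddress.summarize_address_range(FIRST, LAST)]
print(CIDR_BLOCKS)
```

The CIDR form can be handy for firewall or server configuration that expects network blocks rather than a first and last address.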

Tip: For Googlebot to index your site fully, allow the bots (spiders) to access every page you want indexed.

Reminder: If your site software has a Prevent Spiders option in its admin session settings, ensure it is set to true.

Updates/changes to Googlebot's access: check the content of your site's "robots.txt" file.

How to Allow/Disallow Googlebot:

  • To Allow Googlebot
      1. User-agent: Googlebot
      2. Allow: / (or list a directory or page that you want to allow)
  • To Block Googlebot
      1. User-agent: Googlebot
      2. Disallow: / (or list a directory or page that you want to disallow)
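To see how such rules are interpreted, here is a small sketch using Python's standard `urllib.robotparser` module. The helper name `can_googlebot_fetch`, the sample rule sets, and the `example.com` URLs are illustrative assumptions. Python's parser is not Google's own, so treat the results as an approximation of how Googlebot would read the file.

```python
from urllib.robotparser import RobotFileParser


def can_googlebot_fetch(robots_txt: str, url: str) -> bool:
    """Parse robots.txt content and ask whether Googlebot may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Sample rule sets mirroring the allow and block examples in the list.
ALLOW_ALL = "User-agent: Googlebot\nAllow: /\n"
BLOCK_ALL = "User-agent: Googlebot\nDisallow: /\n"
# Hypothetical directory, used to show blocking only part of a site.
BLOCK_PRIVATE = "User-agent: Googlebot\nDisallow: /private/\n"

print(can_googlebot_fetch(ALLOW_ALL, "http://example.com/page.html"))
print(can_googlebot_fetch(BLOCK_ALL, "http://example.com/page.html"))
print(can_googlebot_fetch(BLOCK_PRIVATE, "http://example.com/private/a.html"))
```

Checking a draft robots.txt this way before uploading it can catch a rule that blocks more (or less) than intended.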

Note: Through Googlebot, users can check out their own Web site as seen by Google. See how it works: View a Web Page as 'Googlebot'

Pros and Cons of Googlebot

- It only follows HREF links and SRC links

- It can quickly build a list of links from across the Web

- It takes up an enormous amount of bandwidth

- Some pages may take longer to find, so crawling may occur once a month instead of daily

- It must be set up/programmed to function properly

- It recrawls popular, frequently changing Web pages to keep the index current

Other Googlebots

  • Googlebot-Mobile

- crawls pages for Google's mobile index

  • Googlebot-Image

- crawls pages for Google's image index

  • Mediapartners-Google

- crawls pages for AdSense content/ads

  • AdsBot-Google

- crawls pages to check the quality of Google AdWords landing pages
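As a sketch of how a server log script might tell these crawlers apart, the Python snippet below maps each user-agent token from the list to its purpose. The function name `identify_google_crawler` and the dictionary are illustrative, not an official list, and a matching user-agent string alone does not prove the request really came from Google.

```python
from typing import Optional

# User-agent tokens and purposes as listed in the article.
CRAWLER_PURPOSES = {
    "Googlebot": "Google's Web index",
    "Googlebot-Mobile": "Google's mobile index",
    "Googlebot-Image": "Google's image index",
    "Mediapartners-Google": "AdSense content/ads",
    "AdsBot-Google": "Google AdWords checks",
}


def identify_google_crawler(user_agent: str) -> Optional[str]:
    """Return the matching crawler token, or None if no token matches."""
    ua = user_agent.lower()
    # Try longer tokens first so "Googlebot-Mobile" is not
    # reported as plain "Googlebot". Matching is case-insensitive.
    for token in sorted(CRAWLER_PURPOSES, key=len, reverse=True):
        if token.lower() in ua:
            return token
    return None


token = identify_google_crawler("Googlebot-Image/1.0")
print(token, "->", CRAWLER_PURPOSES[token])
```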

 