The objective of Googlebot is to crawl the Web (as a robot) to find and fetch Web pages for Google's index database.
Defining Googlebot
Googlebot is one of Google's user-agents: Google's Web-crawling software, which finds new web pages and fetches them so they can be added to Google's index.
The function of Googlebot
Googlebot functions as a search bot (a Web-crawling robot used by a search engine): it collects documents from the Web so that Google's indexer can build a searchable index.
These bots (robots) work by reading Web pages and then making the pages' content available to all Google services, a job handled by Google's caching proxy.
Googlebot identifies itself to Web servers with a user-agent string containing "Googlebot", and its requests come from host names that end in "googlebot.com".
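The two markers described above can be checked with a short function. This is a minimal sketch, assuming you already have the request's User-Agent header and the host name its IP address resolves to; the function name `looks_like_googlebot` is hypothetical. Because any client can send a Googlebot user-agent string, the host-name check is the part that carries weight.

```python
def looks_like_googlebot(user_agent: str, hostname: str) -> bool:
    """Return True if a request carries both Googlebot markers.

    user_agent: the User-Agent header sent with the request.
    hostname:   the host name the client's IP resolves to (reverse DNS).
    Hypothetical helper for illustration only.
    """
    # Marker 1: the user-agent string contains the token "Googlebot".
    has_agent_token = "googlebot" in user_agent.lower()
    # Marker 2: the host is googlebot.com or a subdomain of it
    # (a trailing dot from a fully qualified name is ignored).
    host = hostname.lower().rstrip(".")
    has_googlebot_host = host == "googlebot.com" or host.endswith(".googlebot.com")
    return has_agent_token and has_googlebot_host


if __name__ == "__main__":
    ua = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
    print(looks_like_googlebot(ua, "crawl-66-249-66-1.googlebot.com"))  # True
    print(looks_like_googlebot(ua, "evil-googlebot.com.example.net"))   # False
```

Matching on the ".googlebot.com" suffix, rather than on "googlebot.com" anywhere in the name, stops look-alike hosts such as the second example from passing.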
How to use Googlebot
Current version: Googlebot 2.1
User-agent string: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Switching your browser's user-agent to Googlebot: Firefox extension (User-agent switcher)
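A browser extension is one way to switch user-agents; a script can do the same by setting the User-Agent header itself. The sketch below uses Python's standard `urllib.request` and the user-agent string quoted in this article; the helper names `build_googlebot_request` and `fetch_as_googlebot` are hypothetical. A server that identifies Googlebot by IP address rather than by user-agent will still treat such a request as an ordinary visitor.

```python
import urllib.request

# User-agent string quoted in this article (Googlebot 2.1).
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def build_googlebot_request(url: str) -> urllib.request.Request:
    """Build a request that identifies itself with the Googlebot user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_as_googlebot(url: str, timeout: float = 10.0) -> str:
    """Fetch a page while sending the Googlebot user-agent (needs network)."""
    req = build_googlebot_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    req = build_googlebot_request("https://example.com/")
    # urllib stores header names capitalized, hence "User-agent".
    print(req.get_header("User-agent"))
```

Calling `fetch_as_googlebot` on your own pages shows whatever your server returns to that user-agent string.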
Verifying Googlebot
IP address range (as of May 2008):
- from 66.249.64.0 to 66.249.95.255, i.e. 66.249.64.0/19 (googlebot.com)
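The listed range corresponds to the CIDR block 66.249.64.0/19, so a membership test needs only Python's `ipaddress` module. Because IP ranges change over time, the sketch also includes a reverse-then-forward DNS check: the IP must resolve to a host under googlebot.com, and that host must resolve back to the same IP. This is a sketch under those assumptions, not an official verifier; the function names are hypothetical, and the resolvers are passed in as parameters only so the logic can be tested without network access.

```python
import ipaddress
import socket

# 66.249.64.0 - 66.249.95.255 as a CIDR block (range as of May 2008).
GOOGLEBOT_RANGE_2008 = ipaddress.ip_network("66.249.64.0/19")


def in_documented_range(ip: str) -> bool:
    """True if ip falls inside the 2008 Googlebot range listed above."""
    return ipaddress.ip_address(ip) in GOOGLEBOT_RANGE_2008


def verify_by_dns(ip: str,
                  reverse=socket.gethostbyaddr,
                  forward=socket.gethostbyname_ex) -> bool:
    """Reverse-then-forward DNS check (hypothetical helper).

    1. Reverse lookup: the IP must resolve to a host under googlebot.com.
    2. Forward lookup: that host must resolve back to the same IP,
       so a forged reverse-DNS record alone is not enough.
    """
    try:
        host = reverse(ip)[0].lower().rstrip(".")
        if not host.endswith(".googlebot.com"):
            return False
        # gethostbyname_ex returns (hostname, aliases, ip_addresses).
        return ip in forward(host)[2]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False


if __name__ == "__main__":
    print(in_documented_range("66.249.66.1"))  # True
    print(in_documented_range("66.249.96.1"))  # False
```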
Tip: For Googlebot to crawl your site fully, give the bots (spiders) access to every page you want indexed.
Reminder: If your site's admin panel has a Prevent Spiders option in its session settings, make sure it is set to true.
Controlling Googlebot's access: check the contents of your site's robots.txt file, which Googlebot reads before it crawls the site.
How to Allow/Disallow Googlebot (robots.txt entries):
- To allow Googlebot:
  User-agent: Googlebot
  Allow: / (or list a directory or page that you want to allow)
- To disallow Googlebot:
  User-agent: Googlebot
  Disallow: / (or list a directory or page that you want to disallow)
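Rules like these can be tested before publishing them. The sketch below uses Python's standard `urllib.robotparser` with a hypothetical robots.txt that lets Googlebot crawl everything except /private/ and blocks all other crawlers. One caveat: Python's parser applies the first rule that matches in file order, while Google documents using the most specific (longest) matching rule, so files with overlapping Allow and Disallow lines can be evaluated differently by the two.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# and every other crawler is blocked from the whole site.
ROBOTS_TXT = [
    "User-agent: Googlebot",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Disallow: /",
]


def make_parser(lines):
    """Parse robots.txt lines without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(lines)  # parse() also records the "last checked" time
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "https://example.com/index.html"),           # allowed
        ("Googlebot", "https://example.com/private/report.html"),  # blocked
        ("OtherBot", "https://example.com/index.html"),            # blocked
    ]
    for agent, url in checks:
        print(agent, url, rp.can_fetch(agent, url))
```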
Note: By viewing pages as Googlebot, you can check your own Web site as Google sees it. See how it works: View a Web Page as 'Googlebot'
Pros and Cons of Googlebot
- It follows only HREF links and SRC links
- It can quickly build a list of links found across the Web
- It can consume an enormous amount of bandwidth
- Some pages take longer to find, so they may be crawled once a month rather than daily
- It must be set up (programmed) to function properly
- It recrawls popular, frequently changing Web pages to keep the index current
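To illustrate what following only HREF and SRC links means, here is a toy link extractor built on Python's standard `html.parser`. It is not Googlebot's code; the class `LinkCollector` and function `extract_links` are hypothetical. It collects URLs from href and src attributes and resolves them against the page's address, so a link that exists only inside JavaScript (such as an onclick handler) is never found.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect URLs from href and src attributes only (toy crawler step)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<a href="/about">About</a> <img src="logo.png"> <a onclick="go()">JS link</a>'
    print(extract_links(page, "https://example.com/dir/"))
```

A real crawler would add the collected URLs to its queue of pages to fetch; the onclick link in the example is skipped because it has no href or src.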
Other Googlebots
- crawls pages for Google's mobile index
- crawls pages for Google's image index
- crawls pages for AdSense content/ads
- crawls pages to check for Google AdWords