
Google Robots And Why They Are Important

27 Nov, 2009 Phil SEO

Robots? No, this post isn’t about sci-fi. From my own research, and the work that I do at my web design agency OwenDevelopment, I have found that Google robots are very real and important in helping your website get indexed in search results.

Also known as ‘spiders’, ‘crawlers’ and, in Google’s case, the ‘Googlebot’, these programs crawl the internet constantly, from page to page and site to site, reading the content on each page and reporting back to Google and other search engines about what your site is about and which keywords or phrases would make it relevant to display in their results.
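To make the “page-to-page” idea a little more concrete, here is a rough sketch of the reading step for a single page, written in Python. It is only an illustration and not how the Googlebot is actually built: the class name LinkAndTextCollector and the sample page are made up for this post, and a real crawler would download the page rather than read it from a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextCollector(HTMLParser):
    """Hypothetical helper: collects links and visible words from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # pages the robot could visit next
        self.words = []   # text the robot could report back

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Whitespace-only text produces no words.
        self.words.extend(data.split())


# Made-up page content, standing in for a downloaded page.
SAMPLE_PAGE = """
<html><body>
<h1>Web design in Manchester</h1>
<a href="/services.html">Services</a>
<a href="http://www.yourdomain.com/contact.html">Contact</a>
</body></html>
"""

collector = LinkAndTextCollector("http://www.yourdomain.com/index.html")
collector.feed(SAMPLE_PAGE)
print(collector.links)  # the next pages to crawl
print(collector.words)  # the words found on this page
```

The links it finds are where the robot would go next, and the words are the raw material a search engine would use to decide what the page is about.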

Every day hundreds of them go out and scour the web, whether it’s Google trying to index the entire web or a spam bot collecting any email address it can find for less than honorable intentions. As site owners, what little control we have over what robots are allowed to do when they visit our sites exists in a magical little file called “robots.txt”.

“Robots.txt” is a regular text file, placed at the root of your site, that through its name has special meaning to the majority of “honorable” robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl certain files or directories within your site, or not to crawl the site at all. For example, you may not want Google to crawl the /images directory of your site, as it’s both meaningless to you and a waste of your site’s bandwidth. “Robots.txt” lets you tell Google just that.
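As a small sketch of the /images example, the Python snippet below holds a two-line robots.txt in a string and uses the standard library’s urllib.robotparser to check which URLs a well-behaved robot would be allowed to fetch. The domain www.yourdomain.com and the file names are placeholders, and the helper is_allowed is just a convenience name for this post.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: ask Googlebot to stay out of /images/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent, url):
    """Return True if the rules above let this robot fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Placeholder URLs on a placeholder domain.
print(is_allowed("Googlebot", "http://www.yourdomain.com/images/logo.png"))
print(is_allowed("Googlebot", "http://www.yourdomain.com/index.html"))
```

Bear in mind this only describes what a compliant robot should do. A spam bot is free to ignore the file entirely.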

 

Imposing Restrictions
You may impose restrictions on which web pages robots are allowed to index. By default, most users will want to allow all directories except their /cgi-bin/ directory, which commonly holds scripts, and their images directory, /images/. To allow everything, select Yes under “Enable All Webpages”. To exclude pages, select “No – Exclude These URL’s” instead, then enter each web page or directory path in the exclusion box, one per line.

Example: “http://www.yourdomain.com/cgi-bin/” (Excludes the /cgi-bin/ directory)
Example: “http://www.yourdomain.com/images/” (Excludes the /images/ directory)
Example: “http://www.yourdomain.com/welcome.html” (Excludes the /welcome.html web page)
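If you would rather see what such a file looks like than fill in a form, the sketch below turns the three example URLs into robots.txt rules. It is my own minimal approximation in Python, not the code behind any particular generator, and the function name build_robots_txt is made up for illustration. It assumes each excluded entry is a full URL and keeps only its path, since robots.txt rules are written as paths relative to the site root.

```python
from urllib.parse import urlparse


def build_robots_txt(excluded_urls, user_agent="*"):
    """Hypothetical helper: build robots.txt text from a list of URLs."""
    lines = [f"User-agent: {user_agent}"]
    if not excluded_urls:
        # An empty Disallow line means nothing is blocked.
        lines.append("Disallow:")
    for url in excluded_urls:
        # Keep only the path part, e.g. /cgi-bin/
        path = urlparse(url).path or "/"
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


excluded = [
    "http://www.yourdomain.com/cgi-bin/",
    "http://www.yourdomain.com/images/",
    "http://www.yourdomain.com/welcome.html",
]
print(build_robots_txt(excluded))
```

For the three examples this prints a “User-agent: *” line followed by one Disallow line each for /cgi-bin/, /images/ and /welcome.html. Save the output as robots.txt and upload it to the root of your site.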

 

For my readership (because I love each and every one of you), I have included below a generator to create a robots.txt file to upload and use on your own sites:

   
[Robots.txt Generator Tool © SEO Chat™: an embedded form with three fields. “Allowed User Agent” (select a user agent or use the default for all agents), “Enable All Webpages” (Yes, or “No – Exclude These URL’s”), and a box for the URLs you wish to exclude.]

 Generator and info are courtesy of www.seochat.com

 


About Phil

Phil is creative director at PSM Digital but also freelances with web design and SEO in Manchester, UK. He researches and studies online business, along with the latest technological advances and development in design, SEO and social media.
