Creating a Robots.txt Page

by Sean Rasmussen on May 21, 2009

in SEO

Do you need a robots.txt file on your website? As with nearly anything, there are pros and cons to adding this page.

Some people feel it is absolutely necessary while others will point to the fact that what you are trying to do with robots.txt can easily be bypassed. Take a look at this information and then decide for yourself.

What Is A Robots.txt Page?

A robots.txt file is simply a way to give search engine robots information about what areas of your site should be accessed and which should not.

Imagine that Yahoo! or another big name has sent out its bots in search of certain websites. It is looking for www.thisisanexample.com.

Supposedly, the bot will first check to see if there is a robots.txt page to visit. The code on this page gives instructions to the bot. It is placed on your web server in the top-level directory (the same place your index file resides).
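As a rough sketch of where a bot looks, the short Python example below works out the robots.txt address for any page on a site. The function name robots_url and the page address are made up for illustration; the only assumption is the one stated above, that the file sits in the top-level directory.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the address where a bot would look for robots.txt.

    Hypothetical helper for illustration: it keeps the scheme and host
    of the page and swaps the path for /robots.txt at the top level.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The blog path below is invented for the example.
print(robots_url("http://www.thisisanexample.com/blog/post.html"))
# http://www.thisisanexample.com/robots.txt
```

Whichever page the bot is after, the file it checks first is always the one at the root of the site.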

Keep in mind that following robots.txt is voluntary, and a robot can be programmed to ignore this page. Malware bots and email harvesting programs are two examples of bots that will purposely not look for a robots.txt file.

Your Robots.txt may not be visible to visitors, but it is publicly available. Anyone can access the contents and determine what pages you are requesting bots not to visit.

Elements Of A Robots.txt File

Let’s take a look at the common elements of a robots.txt file and what each line means to the bot. There is no strictly enforced standard; however, every file is built from two kinds of line – User-agent and Disallow.

  • User-agent: * – the section applies to every robot.
  • User-agent: MalBot – the section applies only to the bot named MalBot.
  • Disallow: / – instructs the bot not to visit any of the website pages.
  • Disallow: /tmp/ – means that the bot is not supposed to visit this particular directory.
  • Disallow: (left blank) – means just the opposite of Disallow: /, so the bot may visit every page.
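If you would like to see the difference between Disallow: / and a blank Disallow for yourself, here is a minimal sketch using the robots.txt parser that ships with Python (urllib.robotparser). The bot name ExampleBot, the helper allowed and the products page are invented for the example, and other parsers may differ in small details.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical page used only for this illustration.
PAGE = "http://www.thisisanexample.com/products.html"

BLOCK_ALL = "User-agent: *\nDisallow: /"  # slash: every page is off limits
ALLOW_ALL = "User-agent: *\nDisallow:"    # blank: nothing is off limits

print(allowed(BLOCK_ALL, "ExampleBot", PAGE))  # False
print(allowed(ALLOW_ALL, "ExampleBot", PAGE))  # True
```

One missing slash flips the meaning completely, so it is worth double-checking this line before you upload the file.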

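You will need a separate Disallow line for each file or directory you specifically want to keep bots away from. Do not put blank lines between the lines of one section, because a blank line marks the end of that section.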

Basically, this is it. You will use combinations of the above two lines to determine which bots you are targeting and which directories or files they are asked to stay out of.
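To show how the two lines combine, the sketch below puts together a small file with one section for MalBot and one for every other bot, then asks Python's built-in parser what each bot may visit. The file contents, the /private.html page, the bot name ExampleBot and the helper may_visit are all assumptions made up for illustration, not a recommended setup for your site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file only: MalBot is shut out completely, while every
# other bot is asked to skip one directory and one (hypothetical) page.
ROBOTS_TXT = """\
User-agent: MalBot
Disallow: /

User-agent: *
Disallow: /tmp/
Disallow: /private.html
"""

SITE = "http://www.thisisanexample.com"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_visit(agent: str, path: str) -> bool:
    """Report whether `agent` is allowed to fetch `path` on the example site."""
    return parser.can_fetch(agent, SITE + path)


print(may_visit("MalBot", "/index.html"))          # False
print(may_visit("ExampleBot", "/index.html"))      # True
print(may_visit("ExampleBot", "/tmp/cache.html"))  # False
print(may_visit("ExampleBot", "/private.html"))    # False
```

Remember that this only shows what a well-behaved bot would do. A bot that ignores robots.txt will not be stopped by any of these lines.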

Where To Get A Free Download For Your Site

There are many sites where you can get a Robots.txt file created for your site for free if you are not comfortable writing the code yourself.

McAnerin has an easy-to-fill-out form with a fairly comprehensive listing of search engines. Web Tools has one that is simpler but has fewer options, and the SEOChat website also has a generator. You can find numerous sites just by doing a search for “create robots.txt file”.

When you’ve created your file you can visit Google’s robots.txt analysis tool. It is located under webmasters’ help and requires that you have a Google Account set up.

Adding a robots.txt file to your website can be advantageous. There are many instances when you would not want a search engine robot crawling a particular page. If you allow access only to those pages that are optimised, your search engine ranking should climb higher.

Have a most outstanding day.

Sean Rasmussen
Aussie Internet Marketing
www.SeanSEO.com © 2008 - 2010

 

1 Gee March 19, 2010 at 4:30 pm

This is pretty much over my head. I will need a simpler explanation and my site checked.

2 Jazz Salinger March 22, 2010 at 2:26 pm

Hi Sean,

I’m with Gee. What is the point of blocking certain pages from being crawled? I see where you said that if you only allow access to pages that are optimized, your rankings should increase. Shouldn’t they all be optimized?

Or are you referring to the Contact, About Us pages etc?

3 Sean Rasmussen March 22, 2010 at 3:16 pm

It’s hard to go into detail there, Jazz. In a nutshell, the best way to use a robots.txt is to block pages that you do not want to be ranked for and those that provide no SEO value to your website. There is more to it than that, but I will leave that up to your own research ;-)

4 Jody Chambers July 11, 2010 at 5:12 pm

As I am new to SEO, is this what the article means: ‘Having control over which website pages the search engine robot can crawl gives you more control over your search engine ranking’?

5 Jill Brown July 14, 2010 at 11:20 am

Hi Sean,
I hope my understanding is correct, Sean. I see a benefit in using the robots.txt file in that I can direct search engine spiders to bypass certain pages for indexing.

I may want this because I have duplicate content or weak content that could bring my page ranking down. There could be other reasons why I want a page to be private and not indexed for optimization by search engines. (?)

6 Sean Rasmussen July 14, 2010 at 1:13 pm

You are on the right track there, Jill. Another example would be if you have a private members area of your site that you don’t want to be listed in search results.
