Writing a Robots.txt File and Using a Robots.txt Generator

A robots.txt file is a plain text file that is always located in your web server's root directory. It contains restrictions for web spiders, telling them which parts of the site they have permission to crawl. In effect, it defines rules for search engine spiders (robots) about what to follow and what not to. Note that web robots are not required to respect robots.txt files, but most well-written web spiders follow the rules you define. Create the file with a plain text editor such as Windows Notepad. Do not use a word processor, which can add formatting that robots cannot read.
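Because the file always sits in the root directory, its address can be derived from any page on the same site. The sketch below shows one way to do that with Python's standard library; the function name robots_url and the example.com address are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://www.example.com/blog/post.html?page=2"))
```

Whatever page the spider starts from, the file it looks for is the one at the top level of that host.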

A robots.txt file is where you write instructions telling search engine spiders which web pages they may open and which they may not. It is written as plain text, not HTML, in an editor such as Notepad. Treat it as a list of recommendations: by including one, you are asking the spiders that visit your site to ignore certain things that you would prefer not to be indexed, but they are not obliged to pay attention. If you really do not want something indexed, it is far better to block access with server-side programming than to rely on robots.txt.
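As a rough illustration of server-side blocking, the sketch below refuses every request for a path under a private prefix, so the content is never served whether or not a spider honours robots.txt. It uses Python's built-in http.server module; the prefixes, the helper is_private, and the port number are assumptions, and a real site would normally check credentials rather than refuse everyone.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical private areas of the site.
PRIVATE_PREFIXES = ("/email/", "/private/")


def is_private(path: str) -> bool:
    """True when the requested path lies under a private prefix."""
    return path.startswith(PRIVATE_PREFIXES)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_private(self.path):
            # Refuse the request outright instead of trusting the robot.
            self.send_error(403, "Forbidden")
            return
        body = b"public page\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the demo server until interrupted (call this manually)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Unlike a robots.txt rule, the 403 response applies to every client, including robots that ignore the file.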

Here is the basic syntax of the robots.txt file:

User-Agent: [Spider Name]
Disallow: [File Name]

The User-agent line specifies the robot that the rules apply to. For example:

User-agent: googlebot

You may also use the wildcard character "*" to specify all robots (all search engines):

User-agent: *
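To see how a named robot and the wildcard interact, the sketch below feeds two rule groups to Python's standard urllib.robotparser module. The rules, the paths, and the robot name otherbot are assumptions for illustration. In this parser a robot uses the group that names it, and falls back to the "*" group only when no named group matches; other crawlers may resolve groups differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for googlebot, one for everyone else.
RULES = """\
User-agent: googlebot
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""


def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether the named robot may fetch the path."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("googlebot", "otherbot"):
        for path in ("/drafts/a.html", "/private/b.html"):
            print(agent, path, allowed(agent, path))
```

Here googlebot is kept out of /drafts/ only, while every other robot is kept out of /private/ only.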

Disallow is the directive line that tells search engine spiders (robots) which paths they should not visit. For example, to hide a page named 'email' on your website, you can write:

User-Agent: *
Disallow: /email.html

Or, to block an entire directory, you may write:

User-Agent: *
Disallow: /email/
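The difference between blocking a single file and blocking a directory can be checked with a short sketch. It uses Python's standard urllib.robotparser, which treats each Disallow value as a prefix of the URL path measured from the site root. The rule strings, the helper can_crawl, and the robot name examplebot are assumptions for illustration. Note that in this parser a value written without the leading "/" matches nothing.

```python
from urllib.robotparser import RobotFileParser

FILE_RULE = "User-agent: *\nDisallow: /email.html\n"
DIR_RULE = "User-agent: *\nDisallow: /email/\n"
NO_SLASH = "User-agent: *\nDisallow: email.html\n"  # missing leading "/"


def can_crawl(rules: str, path: str, agent: str = "examplebot") -> bool:
    """Parse the rules and report whether the robot may fetch the path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(can_crawl(FILE_RULE, "/email.html"))       # blocked file
    print(can_crawl(DIR_RULE, "/email/list.html"))   # blocked directory
    print(can_crawl(DIR_RULE, "/email.html"))        # file outside the directory
    print(can_crawl(NO_SLASH, "/email.html"))        # rule has no effect here
```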

However, if you leave the Disallow line blank, it indicates that all files can be retrieved. Likewise, if a website has no robots.txt file, all files can be retrieved. Once you have finished creating the robots.txt file, check that you have not made an error anywhere. A small error can have serious consequences: a search engine may spider files that were not meant for it, exposing content you wanted kept out of its results, or it may not spider any files at all, in which case your pages cannot rank in that search engine.
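One way to catch simple mistakes before uploading the file is a small checker like the sketch below. It is not an official validator: the function name lint_robots and the set of recognised directives are assumptions, and it only flags a few common slips (a missing colon, a misspelled directive, a rule placed before any User-agent line, a path without a leading "/").

```python
KNOWN = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text: str) -> list[str]:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':'")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name not in KNOWN:
            problems.append(f"line {number}: unknown directive '{field}'")
        elif name == "user-agent":
            seen_agent = True
            if not value:
                problems.append(f"line {number}: User-agent has no value")
        elif name in ("allow", "disallow"):
            if not seen_agent:
                problems.append(
                    f"line {number}: rule appears before any User-agent line"
                )
            # A blank Disallow is valid and means nothing is blocked.
            if value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/'")
    return problems


if __name__ == "__main__":
    for problem in lint_robots("User-agent: *\nDisalow: /email/\n"):
        print(problem)
```

An empty list means none of these particular mistakes were found; it does not guarantee that every spider will read the file the way you intended.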

Why use robots.txt?

For robots to index your site properly, they need instructions on which folders or files not to crawl or index, as well as which ones you do want indexed. Another good reason to use a robots.txt file is that many search engines, Google among them, publicly recommend using one. Keep in mind that human visitors rarely look at this file, yet well-behaved robots typically request it before any other page on the site.

What are the advantages of using a robots.txt file or a robots.txt generator?

  • A robots.txt file helps keep out unwanted spiders such as email harvesters and image strippers, provided they respect the file.
  • A robots.txt file defines which paths are off limits for spiders to visit. This is useful if you want to keep personal information or private files out of search results.
  • A missing robots.txt file produces a 404 error when a robot requests it, which may send the robot to your default 404 error page. This matters most on websites that serve a customized 404 error page.

Use robots.txt to keep spiders out of any part of your website that you do not want them to visit. For further help writing a robots.txt file, request a free quote now to see how LSF Network can help you write one and keep your private content private.

© LSF Network, Inc. All rights reserved.