A robots.txt file is a plain text file that is always located in your web server's root directory. It contains restrictions for web spiders, telling them which parts of the site they have permission to crawl. In effect, it defines rules for search engine spiders (robots): what to follow and what to skip. Note that web robots are not required to respect robots.txt files, but most well-written spiders follow the rules you define. Create the file with a plain text editor such as Windows Notepad. Don't use a word processor, which may add formatting that robots cannot read.
Robots.txt is where you write instructions for search engine spiders about which web pages to open and which to skip. The file is plain text, not HTML, and can be written in Notepad. Treat a robots.txt file as a list of recommendations. By including one, you are asking the spiders that visit your site to ignore certain things you would prefer not to have indexed, but they are not obliged to comply. If you really do not want something indexed, it is far better to block access with server-side programming than to rely on robots.txt.
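To illustrate what blocking access on the server side can look like, here is a minimal sketch using Python's built-in http.server module. The path prefixes, the port, and the names PRIVATE_PREFIXES, status_for and run are made up for this example. A real site would normally do this in its web server configuration or behind a login, so treat this as an illustration of the idea rather than a recommended setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical path prefixes that should never be served to anonymous clients.
PRIVATE_PREFIXES = ("/email/", "/private/")


def status_for(path: str) -> int:
    """Return 403 (Forbidden) for private paths and 200 (OK) for everything else."""
    return 403 if path.startswith(PRIVATE_PREFIXES) else 200


class Handler(BaseHTTPRequestHandler):
    """Refuses private paths for every client, whether or not it reads robots.txt."""

    def do_GET(self):
        status = status_for(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        body = "Forbidden\n" if status == 403 else "Public page\n"
        self.wfile.write(body.encode("utf-8"))


def run(port: int = 8000) -> None:
    """Start the demo server on localhost; call run() yourself to try it."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Unlike a Disallow line, the 403 response is enforced by the server itself, so a spider that ignores robots.txt still cannot retrieve the page.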
Here is the basic syntax of the robots.txt file:
User-Agent: [Spider Name]
Disallow: [File Name]
The User-agent line specifies the robot that the rules apply to. For example:
User-agent: googlebot
You may also use the wildcard character "*" to specify all robots (all search engines):
User-agent: *
Disallow is the directive line that tells search engine spiders (robots) which files or directories not to retrieve. For example, to hide a page called 'email' on your website, you can write:
User-Agent: *
Disallow: /email.html
Or, to block an entire directory, you may write:
User-Agent: *
Disallow: /email/
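To see how a well-behaved spider might interpret rules like these, here is a small sketch using Python's standard urllib.robotparser module. The domain www.example.com and the helper name build_parser are assumptions made for the example, and the paths are written with a leading slash because this parser matches them from the site root. Other crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Rules like the ones above, written with paths that start at the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /email.html
Disallow: /email/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "http://www.example.com"  # placeholder domain for the example

# Ask the parser what a robot called "googlebot" may retrieve.
for path in ("/index.html", "/email.html", "/email/contacts.html"):
    allowed = parser.can_fetch("googlebot", base + path)
    print(path, "allowed" if allowed else "disallowed")
```

Because the rules are listed under User-agent: *, the parser applies them to any robot name passed to can_fetch.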
However, if you leave the Disallow line blank, it indicates that all files can be retrieved. Likewise, if a website doesn't have a robots.txt file, all files can be retrieved. Once you have finished creating the robots.txt file, check that you have not made an error anywhere. A small error can have serious consequences: a search engine may spider files that were not meant for it, in which case it can penalize your site for spamming, or it may not spider any files at all, in which case you won't get top rankings in that search engine.
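One way to check a finished file for errors before uploading it is to list the URLs you expect to be hidden and ask a parser whether each is still retrievable. The sketch below does this with Python's urllib.robotparser. The function name still_fetchable, the sample rule sets and the domain are assumptions for illustration, and the result reflects how this particular parser reads the file, not how every search engine will.

```python
from urllib.robotparser import RobotFileParser


def still_fetchable(robots_text, must_block, agent="*"):
    """Return the URLs from must_block that the rules fail to disallow."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in must_block if parser.can_fetch(agent, url)]


URLS = ["http://www.example.com/email.html"]  # placeholder URL to keep hidden

# Three hypothetical versions of the same file.
missing_slash = "User-agent: *\nDisallow: email.html\n"   # path lacks the leading "/"
with_slash = "User-agent: *\nDisallow: /email.html\n"     # path starts at the site root
blank = "User-agent: *\nDisallow:\n"                      # blank Disallow permits everything

for name, text in (("missing slash", missing_slash),
                   ("with slash", with_slash),
                   ("blank Disallow", blank)):
    leaked = still_fetchable(text, URLS)
    print(name, "->", "OK" if not leaked else "still retrievable: " + ", ".join(leaked))
```

With this parser, the rule missing its leading slash blocks nothing, which is the kind of small error that is easy to overlook when reading the file by eye.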
Why use robots.txt?
For robots to index your site properly, they need instructions on which folders or files not to crawl or index, as well as which ones you do want indexed. Another good reason to use a robots.txt file is that many search engines, Google among them, publicly recommend using one on your website. Keep in mind that no human visitor looks at this file, yet it ranks better than a lot of human-visited pages.
What are the advantages of using robots.txt file or robots.txt generator?