Hundreds of search engines send out their robots every day to crawl the web. Whether they crawl for indexing or for spam purposes, you may not want them to visit some of your files
or folders, such as the /images folder of your site: crawling it brings you no benefit and wastes your bandwidth.
The robots.txt file contains directives that tell these robots what to exclude from their visits. This file has to be named exactly 'robots.txt'
and has to be placed in your root directory (no other place or folder). If, for example, your web site address is www.mydomain.com, then the web address of this file should
be:
http://www.mydomain.com/robots.txt
Configuring the robots.txt file
The syntax used in this file is simple and uses two different keywords:
User-agent:
Disallow:
The keyword 'User-agent' is used to specify the name of a robot (e.g. User-agent: Googlebot). The keyword 'Disallow' is used to provide the name of a file or folder that the search engine spider should not crawl.
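The way these two keywords combine can be checked with Python's standard urllib.robotparser module. This is only a minimal sketch: the rules in EXAMPLE_RULES, the robot name OtherBot, the file private.html and the helper is_allowed are made up for illustration and are not the robots.txt file delivered with the store.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only (not the store's real robots.txt).
EXAMPLE_RULES = """\
User-agent: Googlebot
Disallow: /images/

User-agent: *
Disallow: /private.html
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if the robot named `agent` may crawl `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://www.mydomain.com/images/logo.png"),
        ("Googlebot", "http://www.mydomain.com/index.html"),
        ("OtherBot", "http://www.mydomain.com/private.html"),
    ]:
        print(agent, url, is_allowed(EXAMPLE_RULES, agent, url))
```

A robot uses the block whose User-agent line names it, and falls back to the 'User-agent: *' block only when no block names it.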
The following robots.txt file was designed to suit any of our standard stores. Stores with custom modifications
may require additional changes. If your store does not have any major custom changes or additional folders, then the following robots.txt file should be used. Please note that a copy of this file was already included with your store at delivery time.
Because your store can also be accessed via HTTPS, search engine robots that access your site via both HTTP and HTTPS may think that you are providing duplicate content. To
correct this issue, you need to add the following lines to your '.htaccess' file:
RewriteEngine on
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ robots-https.txt
Then you need to create a second robots.txt file but with a different name. It should be called robots-https.txt and have the following content:
User-agent: *
Disallow: /
This prevents all search engine robots from crawling your site via HTTPS.
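The effect of this setup can be sketched in Python. The helper robots_file_for is hypothetical and only mimics the rewrite described above; it does not read your .htaccess file. The check itself uses the standard urllib.robotparser module and assumes the robots-https.txt content given above.

```python
from urllib.robotparser import RobotFileParser

# Content of robots-https.txt as given in the text.
HTTPS_RULES = "User-agent: *\nDisallow: /\n"


def robots_file_for(scheme: str) -> str:
    """Hypothetical helper: which file answers a request for /robots.txt."""
    return "robots-https.txt" if scheme.lower() == "https" else "robots.txt"


def blocks_everything(rules: str, agent: str, urls: list[str]) -> bool:
    """Return True if `rules` forbid `agent` from crawling every URL in `urls`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not any(parser.can_fetch(agent, url) for url in urls)


if __name__ == "__main__":
    urls = [
        "https://www.mydomain.com/",
        "https://www.mydomain.com/images/logo.png",
    ]
    print(robots_file_for("https"), blocks_everything(HTTPS_RULES, "Googlebot", urls))
```

Robots that reach the site via HTTP still receive the regular robots.txt file, so normal crawling is unaffected.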
Please note that both changes, the addition to the .htaccess file and the new robots-https.txt file, were already made to your store before it was first delivered to you.