Robots.txt, the file at the heart of the Robots Exclusion Protocol, plays a key role in preventing search engine robots from crawling restricted areas of your site.
You may want to block robots from indexing private photos, expired special offers, or other pages that you're not ready for users to access. Blocking pages can also help your SEO: robots.txt can address issues with duplicate content (though there may be better ways to do this, which we will discuss later). When a robot begins crawling, it first checks whether a robots.txt file is in place that would prevent it from viewing certain pages.
When should I use a Robots.txt file?
You only need a robots.txt file if you don't want search engines to index certain pages or content. If you want search engines (like Google, Bing, and Yahoo) to access and index your entire site, you don't need one at all (though some sites use robots.txt simply to point crawlers to a sitemap).
However, if other sites link to pages on your website that are blocked by robots.txt, search engines may still index those URLs, and they may appear in search results. To prevent this, use an X-Robots-Tag header, a noindex meta tag, or a rel="canonical" link pointing to the appropriate page.
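As a quick sketch, both of the following tell search engines not to index a page; the meta tag goes in the page's HTML head, while X-Robots-Tag is sent as an HTTP response header (how you set the header depends on your server configuration):

<meta name="robots" content="noindex">
X-Robots-Tag: noindex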
How do I create a Robots.txt file?
If you would like to set up a robots.txt file, the process is actually quite simple and involves two elements: the "user-agent" line, which names the robot the following rules apply to, and the "disallow" line, which lists the URL you want to block. These two lines are treated as a single entry in the file, which means one robots.txt file can contain several entries.
For the user-agent line, you can list a specific bot (such as Googlebot) or apply the block to all bots by using an asterisk. The following is an example of a user-agent line that applies to all bots:
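User-agent: *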
The second line in the entry, disallow, lists the specific pages you want to block. To block the entire site, use a forward slash. For all other entries, use a forward slash first and then list the page, directory, image, or file type. See the following examples:
- Disallow: / blocks the entire site.
- Disallow: /bad-directory/ blocks the directory and all of its contents.
- Disallow: /secret.html blocks a single page.
After making your user-agent and disallow selections, one of your entries may look like this:
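User-agent: *
Disallow: /bad-directory/

This entry tells all bots to stay out of /bad-directory/ and everything inside it. Because one file can hold several entries, you could also add a second entry aimed at a single bot (the paths here are just placeholders):

User-agent: Googlebot
Disallow: /secret.html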
View other example entries from Webmaster Tools.
Save your file as plain text with the name exactly matching "robots.txt", then upload it to the highest-level directory of your site so that it sits at the root of the domain.
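For example, if your site is www.example.com (a placeholder domain), the file needs to be reachable at:

http://www.example.com/robots.txt

A robots.txt file placed in a subdirectory will be ignored by crawlers; only the root location is checked.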
How do I know that it worked?
Test your site’s robots.txt file in Webmaster Tools to ensure that the bots are crawling parts of the site you want and staying away from areas you’ve blocked.
- Choose the site you want to test
- Click “Blocked URLs” under “Crawl”
- Select the “Test robots.txt” tab
- Paste your robots.txt content into the first box
- Enter the URLs you want to test in the "URLs" box
- Select the user-agents you want in the “User-agents” list
Keep in mind that this tool only tests Googlebot and other Google-related user-agents.
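You can also confirm the file is live by fetching it directly from the command line (swap in your own domain for the placeholder):

curl http://www.example.com/robots.txt

If the file is in place, this prints your user-agent and disallow entries exactly as you saved them.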