What is robots.txt in SEO? Robots.txt plays a vital role when you want crawlers to visit your site and index only the pages you choose. Robots.txt is a plain-text file of instructions placed in the root folder of your site. These instructions guide search engines on which pages you want them to visit and index, and which ones you do not. All major search engines, including Google, obey the instructions in the robots.txt file.

The robots.txt file has a simple structure to maintain: it can contain any number of user agents and disallowed files and directories. If your website doesn't have a robots.txt file, crawlers can crawl all of the website's pages (good and bad). Indexing pages with broken links is not good practice. Managing robots.txt for SEO comes under on-page SEO.

Let's start exploring more about the robots.txt file.

Basically, the syntax is as follows:

User-agent: [crawler name]

Disallow: [file or directory path]

Sitemap: http://www.yoursite.com/sitemap.xml

“User-agent” names the search engine crawler the rules apply to.

“Disallow” lists the files and directories to be excluded from indexing.

“Sitemap” defines the location of the XML sitemap file.
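Putting the three directives together, a minimal robots.txt might look like the sketch below (the /private/ path is just a placeholder):

```text
User-agent: *
Disallow: /private/

Sitemap: http://www.yoursite.com/sitemap.xml
```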

Alongside the “User-agent:” and “Disallow:” lines, you can add comment lines by putting a # sign at the beginning of the line:

# All user agents are disallowed to see the /temp directory.

User-agent: *

Disallow: /temp/

The syntax above tells search engines to crawl all areas of the website except the /temp/ directory.

Let me explain the robots.txt code above in detail.

User-agent: specifies the bot the instructions are written for; in our example, the “*” means the instructions apply to all bots.

Disallow: specifies which areas of the website should not be crawled; in our example, the /temp/ directory is not to be crawled.
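Before deploying rules like these, you can sanity-check them with `urllib.robotparser` from Python's standard library; the example.com URLs below are just placeholders:

```python
from urllib.robotparser import RobotFileParser

# Feed the parser the same rules as the example above.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /temp/",
])

# /temp/ is excluded for every bot; everything else is crawlable.
print(rp.can_fetch("*", "http://www.example.com/temp/page.html"))  # False
print(rp.can_fetch("*", "http://www.example.com/index.html"))      # True
```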

Some key benefits of robots.txt
  • It helps save some bandwidth by instructing bots to exclude media files such as images, style sheets, and JavaScript from indexing (keep in mind that Google recommends letting its crawler fetch the CSS and JavaScript needed to render your pages).
  • You can keep private pages, such as members-only sections of a banking site, out of search results. Note, however, that robots.txt is publicly readable and is not a security mechanism; truly sensitive pages still need real access control.
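As a sketch, a robots.txt that keeps media directories out of the index might look like this (the directory names are hypothetical, and remember the caveat above about blocking CSS and JavaScript):

```text
User-agent: *
Disallow: /images/
Disallow: /css/
Disallow: /js/
```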
A must-know point when dealing with the robots.txt file: check out the example below.
User-agent: *

Disallow: /temp/

Disallow: /cgi-bin/

User-agent: Googlebot

Disallow: /img/

Disallow: /css/

Disallow: /js/

Disallow: /temp/

In the example above, the first group tells all bots that they may crawl the site except the /temp/ and /cgi-bin/ directories (the second and third lines). The second group tells Googlebot to crawl the website without touching the /img/, /css/, /js/, and /temp/ directories. But there is a catch: a crawler obeys only the single group that most specifically matches its name. When Googlebot reads this robots.txt file and finds the group addressed to “Googlebot”, it follows only those four Disallow rules and ignores the generic “User-agent: *” group entirely, so /cgi-bin/ is not blocked for Googlebot unless you repeat that rule inside its own group.
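You can verify this group-matching behavior with `urllib.robotparser` from Python's standard library (again, example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The same two-group robots.txt as in the example above.
rules = """\
User-agent: *
Disallow: /temp/
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /img/
Disallow: /css/
Disallow: /js/
Disallow: /temp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot matches its own group, so /img/ is blocked for it...
print(rp.can_fetch("Googlebot", "http://www.example.com/img/a.png"))  # False
# ...but /cgi-bin/, listed only under "*", stays open to Googlebot.
print(rp.can_fetch("Googlebot", "http://www.example.com/cgi-bin/x"))  # True
# Bots with no group of their own fall back to the "*" group.
print(rp.can_fetch("SomeBot", "http://www.example.com/cgi-bin/x"))    # False
```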
