
Saturday, March 19, 2011

How robots.txt and humans.txt Work | Syntax, Types and Benefits

A robot, also known as a crawler or spider, is a search engine program that "crawls" the web: it collects data, follows links, makes copies of new and updated pages, and stores their URLs in the search engine's index. This allows search engines to provide faster and more up-to-date listings.
robots.txt is a plain-text file stored in the root directory of a website (for example http://www.example.com/robots.txt) that tells search engine spiders which areas of the site they should not crawl.
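Because robots.txt always sits at the root of the host, a crawler can work out where to look for it from any page URL on that site. A minimal sketch in Python (standard library only); the function name robots_txt_url and the example.com addresses are illustrative, not part of any standard:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    robots.txt lives at the root of the host, so the page's path,
    query string and fragment are dropped; scheme, host and port are kept.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/abc/123.html?x=1"))
# http://www.example.com/robots.txt
```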

First come, first served: search engine robots follow the first link they find to a particular page. They don't follow additional links to the same page.
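This behaviour comes from the crawler remembering which URLs it has already queued, so a second link to a known page adds nothing new. A toy Python sketch of that idea; the link graph LINKS and the crawl function are hypothetical stand-ins for real HTTP fetching and link extraction:

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to (stands in for real fetches).
LINKS = {
    "/": ["/about.html", "/abc/123.html"],
    "/about.html": ["/", "/abc/123.html"],
    "/abc/123.html": ["/about.html"],
}


def crawl(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl that fetches each page at most once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # "fetch" the page
        for target in links.get(page, []):
            if target not in seen:  # later links to a known page are ignored
                seen.add(target)
                queue.append(target)
    return order


print(crawl("/", LINKS))
# ['/', '/about.html', '/abc/123.html']
```

The sketch covers a single pass; revisiting known pages later to pick up updates is a separate concern it leaves out.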

1.   User-agent: *
     Disallow: /
[blocks all robots from crawling the whole site]
2.   User-agent: *
3.   User-agent: *
     Disallow: /abc/123.html
[blocks all robots from crawling 123.html in the abc directory]
4.    Robots meta tag
<meta name="robots" content="index, follow">
<meta name="robots" content="noindex, follow">
<meta name="robots" content="noindex">
<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">
5.    Add rel="nofollow" to a hyperlink's <a href> tag when we don't want search engines to follow that specific link, e.g. <a href="http://www.example.com/" rel="nofollow">link text</a>
6.    <meta name="robots" content="nosnippet">
[Search engines don't show the description below the title in the results, and don't show a cached copy of the page. This is used when we change the whole theme of our site on the same domain.]
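Python's standard library ships urllib.robotparser, which applies robots.txt rules like those in items 1 and 3. A minimal sketch that parses the rules from a string rather than downloading them; www.example.com and the Googlebot user-agent are only examples, and make_parser is a hypothetical helper:

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Item 1: keep every robot out of the whole site.
block_all = make_parser("User-agent: *\nDisallow: /\n")

# Item 3: keep every robot away from /abc/123.html only.
block_one = make_parser("User-agent: *\nDisallow: /abc/123.html\n")

for url in ("http://www.example.com/index.html",
            "http://www.example.com/abc/123.html"):
    print(url, block_all.can_fetch("Googlebot", url), block_one.can_fetch("Googlebot", url))
# http://www.example.com/index.html False True
# http://www.example.com/abc/123.html False False
```

urllib.robotparser matches Disallow paths as plain prefixes, as in the original robots exclusion rules; it does not implement search-engine extensions such as * wildcards inside paths. The meta tags in items 4 to 6 live in the page itself, so robots.txt parsing does not cover them.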

Meta Robots
The "NAME" attribute must be "ROBOTS".
Valid values for the "CONTENT" attribute are "INDEX", "NOINDEX", "FOLLOW" and "NOFOLLOW". Multiple comma-separated values are allowed, but only some combinations make sense. If there is no robots <META> tag, the default is "INDEX, FOLLOW", so there's no need to spell that out. Some search engines also recognize these extra values:
<META name="ROBOTS" content="NOARCHIVE">
<META name="ROBOTS" content="NOSNIPPET">
<META name="ROBOTS" content="NOODP">
<META name="ROBOTS" content="NOYDIR">

* FOLLOW – a command for the search engine crawler to follow the links in that webpage
* INDEX – a command for the search engine crawler to index that webpage
* NOFOLLOW – a command for the search engine crawler NOT to follow the links in that webpage
* NOINDEX – a command for the search engine crawler NOT to index that webpage

* NOARCHIVE - Useful if the content changes frequently: headlines, auctions, etc. The search engine may still store a copy of the page, but won't show a cached copy of it in the results.
* NOSNIPPET - Encourages search engines to show the title only, with no description snippet, and to suppress the "cache" link. Might be useful if the site has special plus box listings in search results, but otherwise, not so much. This is also used when we change the theme (content) of a website.
* NOODP - Tells search engines not to use the Open Directory Project (ODP/DMOZ) title and description for the page, and to use the page's title tag plus matching text from the page or its META description instead, because the ODP content may be misleading or outdated.
* NOYDIR - Yahoo! Slurp robot only; the same as NOODP, but for Yahoo! Directory titles and descriptions.
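The rules above (NAME must be ROBOTS, comma-separated CONTENT values, INDEX, FOLLOW when no tag is present) are simple enough to check mechanically. A minimal Python sketch using the standard html.parser module; RobotsMetaParser and robots_directives are hypothetical names, and the sketch only reports what the page asks for, not how any particular search engine honours it:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the values of every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        for value in (attributes.get("content") or "").split(","):
            value = value.strip().lower()
            if value:
                self.directives.add(value)


def robots_directives(html: str) -> dict[str, bool]:
    """Summarise what a page's robots meta tags ask crawlers to do."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = parser.directives
    return {
        # No robots meta tag (or no NOINDEX/NOFOLLOW) means INDEX, FOLLOW.
        "index": "noindex" not in found,
        "follow": "nofollow" not in found,
        "noarchive": "noarchive" in found,
        "nosnippet": "nosnippet" in found,
        "noodp": "noodp" in found,
        "noydir": "noydir" in found,
    }


page = '<head><META name="ROBOTS" content="NOINDEX, FOLLOW"></head>'
print(robots_directives(page))
```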

User-agent names of common search engine robots:
Google = Googlebot
MSN Search = msnbot
Yahoo = Slurp
Ask/Teoma = Teoma
DMOZ Checker = Robozilla
Baidu = Baiduspider
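These names go on the User-agent line of robots.txt, and a robot uses the group that names it before falling back to the * group. A sketch with Python's urllib.robotparser and a made-up robots.txt that treats Googlebot differently from every other robot:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/,
# keep every other robot out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/index.html"
for agent in ("Googlebot", "msnbot", "Baiduspider"):
    # A group naming the robot wins; the "*" group is only the fallback.
    print(agent, parser.can_fetch(agent, url))
# Googlebot True
# msnbot False
# Baiduspider False
```

urllib.robotparser compares names case-insensitively (it checks whether the lowercased User-agent value appears in the robot's lowercased name), so Googlebot and googlebot are treated the same here; individual search engines may differ in the details.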

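humans.txt is an initiative for knowing the people behind a website: a plain-text file, placed in the site's root directory next to robots.txt, that credits the people who built the site.
If possible, you can also add an author link tag to the <head> of the site: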
<link type="text/plain" rel="author" href="http://domain/humans.txt" />
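A tool that wants to find a site's humans.txt can read this link tag from the page's <head>. A minimal Python sketch using html.parser; requiring type="text/plain" (to skip other rel="author" links) is an assumption of this sketch, and AuthorLinkFinder and find_humans_txt are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AuthorLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="author" type="text/plain">."""

    def __init__(self) -> None:
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "author" in rels and (attributes.get("type") or "").lower() == "text/plain":
            self.href = attributes.get("href")


def find_humans_txt(html: str, page_url: str):
    """Return the absolute humans.txt URL declared in the page, or None."""
    finder = AuthorLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    # Resolve relative hrefs such as "/humans.txt" against the page URL.
    return urljoin(page_url, finder.href)


head = '<head><link type="text/plain" rel="author" href="/humans.txt" /></head>'
print(find_humans_txt(head, "http://www.example.com/blog/post.html"))
# http://www.example.com/humans.txt
```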

A sample humans.txt file:

/* TEAM */
    Chef: Baagdi Solutions
    Contact: info [at]
    Twitter: @baagdi

    CSS3 and HTML5 boy: Rashtra
    Twitter: @rashtra
    From: Sri Ganganagar

    Graphics boy: Sunil
    Twitter: @sunil

/* SITE */
    Last update: 2011/01/23
    Doctype: HTML5 with CSS3
    IDE: Baagdi Solutions and Notepad++
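humans.txt has no strict grammar, but files laid out like this sample (/* SECTION */ headings followed by Key: value lines) are easy to read programmatically. A minimal Python sketch under that assumption; parse_humans_txt is a hypothetical name, and lines without a colon are simply skipped:

```python
import re

# Matches a section heading such as "/* TEAM */" and captures "TEAM".
SECTION = re.compile(r"^/\*\s*(.+?)\s*\*/$")


def parse_humans_txt(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group "Key: value" lines under their /* SECTION */ headings."""
    sections: dict[str, list[tuple[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate people or sections
        heading = SECTION.match(line)
        if heading:
            current = heading.group(1).upper()
            sections[current] = []
        elif current is not None and ":" in line:
            # Split on the first colon only, so URLs in values survive.
            key, value = line.split(":", 1)
            sections[current].append((key.strip(), value.strip()))
    return sections


sample = """\
/* TEAM */
    Chef: Baagdi Solutions
    Twitter: @baagdi

/* SITE */
    Last update: 2011/01/23
    Doctype: HTML5 with CSS3
"""

print(parse_humans_txt(sample)["SITE"])
# [('Last update', '2011/01/23'), ('Doctype', 'HTML5 with CSS3')]
```

Each section keeps a list of (key, value) pairs rather than a dict because keys such as Twitter repeat for every team member.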
