CRAWLER



 

 

Ilial, Inc. is a Los Angeles-based Internet startup backed by blue-chip venture capitalists from Northern and Southern California. We have developed technologies that will redefine the $20B online advertising industry. After several years of stealth development, the company is poised to launch its service. Additional information is available in our Privacy Policy.
 
You may be wondering about the "ilial" agent and why it is visiting your site. The Ilial crawler (robot), which identifies itself as ilial in the HTTP "User-agent" header field, uses a web-wide crawl strategy: it starts with a list of known URLs from across the entire Internet and then fetches the local links it finds on each page as it goes.
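As a rough illustration of what "fetching local links" can mean, the sketch below collects the links on a page that stay on the same host. It is not Ilial's implementation; the names LinkCollector and local_links, and the reading of "local" as "same host", are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(page_url, html):
    """Return absolute URLs of links that stay on the same host as page_url.

    Assumption: "local" means "same host"; a real crawler may use another rule.
    """
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

A crawler built this way would add each returned URL to its queue of pages to visit, after checking the exclusion rules described in the sections that follow.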
 
We will not crawl anything you would like to remain private. You can tell us not to crawl your site by using the Standard for Robot Exclusion (SRE). Following the conventions published at www.robotstxt.org, we honor both the robots.txt file and the HTML META tag format. Under these conventions, if neither is present, the default is that we may crawl those pages.
 
There are two ways to keep Ilial from crawling your site or your page:
 

 

Using HTML META TAG


Place the following meta tag in the head of your HTML document:
 
<META NAME="ilial" CONTENT="nofollow">
 
To learn more about meta tags, please refer to http://www.robotstxt.org/wc/exclusion.html#meta; you can also read what the HTML standard has to say about these tags. Remember, changes to your site won't be reflected immediately; they will be discovered and propagated when ilial next crawls your site.
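If you want to confirm that a page actually carries the tag, a short script can read it back. The following is a minimal sketch using Python's standard html.parser module; the helper name meta_directives is hypothetical, and treating the CONTENT value as a comma-separated list is an assumption borrowed from the general robots META tag convention.

```python
from html.parser import HTMLParser


class MetaDirectiveParser(HTMLParser):
    """Gathers the CONTENT directives of META tags addressed to one agent."""

    def __init__(self, agent="ilial"):
        super().__init__()
        self.agent = agent.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == self.agent:
            # Assumption: directives are comma-separated, e.g. "noindex, nofollow".
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def meta_directives(html, agent="ilial"):
    """Return the set of directives a page declares for the given agent."""
    parser = MetaDirectiveParser(agent)
    parser.feed(html)
    return parser.directives
```

Calling meta_directives on the HTML of a page containing the tag above should return a set containing "nofollow"; an empty set means no tag addressed to ilial was found.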

 

Using "robots.txt" file


The Ilial crawler looks for a file called "robots.txt". Robots.txt is a file website administrators can place at the top level of a site to direct the behavior of web crawling robots. The Ilial crawler will always pick up a copy of the robots.txt file prior to its crawl of the Web.
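Because the file must sit at the top level of the site, its address can be derived from any page URL on that site. The sketch below shows one way to do that with Python's standard urllib.parse module; the function name robots_txt_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the top-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

For a page such as http://www.example.com/shop/item.html?id=3, this yields http://www.example.com/robots.txt. A robots.txt file placed in a subdirectory is not covered by this lookup.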
 
To exclude all robots, the robots.txt file should look like this:
 
User-agent: *
Disallow: /
 
To exclude just one directory (and its subdirectories), say, the /images/ directory, the file should look like this:
 
User-agent: *
Disallow: /images/
 
Web site administrators can allow or disallow specific robots from visiting part or all of their site. Ilial's crawler identifies itself as ilial, and so to allow ilial to visit (while preventing all others), your robots.txt file should look like this:
 
User-agent: ilial
Disallow:

User-agent: *
Disallow: /
 
To prevent ilial from visiting (while allowing all others), your robots.txt file should look like this:
 
User-agent: ilial
Disallow: /
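Before publishing a robots.txt file, you can check how a standards-following robot would read it. The sketch below uses Python's standard urllib.robotparser module as a stand-in; it is not the Ilial crawler's own parser, and the helper name is_allowed and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocks only ilial; robots without a matching record are allowed by default.
BLOCK_ILIAL_ONLY = """\
User-agent: ilial
Disallow: /
"""

# Blocks every robot from the /images/ directory and its subdirectories.
BLOCK_IMAGES = """\
User-agent: *
Disallow: /images/
"""

# Allows ilial everywhere; the second record blocks every other robot.
ALLOW_ILIAL_ONLY = """\
User-agent: ilial
Disallow:

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    page = "http://www.example.com/index.html"
    image = "http://www.example.com/images/logo.png"
    print(is_allowed(BLOCK_ILIAL_ONLY, "ilial", page))
    print(is_allowed(BLOCK_ILIAL_ONLY, "otherbot", page))
    print(is_allowed(BLOCK_IMAGES, "ilial", image))
    print(is_allowed(ALLOW_ILIAL_ONLY, "ilial", page))
```

Note that a record with an empty Disallow line permits everything for the named robot, and a robot with no matching record falls back to the "User-agent: *" record if one exists.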
 
For more information regarding robots, crawling, and robots.txt, visit the Web Robots Pages at www.robotstxt.org, an excellent source for the latest information on the Standard for Robots Exclusion. You may email us at crawladmin@ourdomainname.com with any other concerns regarding our crawler.

 


© Ilial, Inc. All rights reserved.