Tuesday, August 3, 2010

How does a robot decide where to visit?

This depends on the robot; each one uses a different strategy. In general, they start from a historical list of URLs, especially documents with many links elsewhere, such as server lists, "What's New" pages, and the most popular sites on the Web. Most indexing services also allow you to submit URLs manually, which are then queued and visited by the robot.
Sometimes other sources of URLs are used, such as scans of USENET postings, published mailing list archives, etc.
Given those starting points, a robot can select URLs to visit and index, and parse each visited document as a source of new URLs.
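
To make the idea concrete, here is a minimal sketch in Python of that loop: start from a list of seed URLs, visit each one, and queue any new links found on the page. It is only an illustration, not how any particular robot works. The names `crawl`, `LinkExtractor`, `fetch`, and `PAGES`, as well as the example.com URLs, are made up for this example, and the "web" is an in-memory dictionary so the code runs without network access. A real robot would also need to respect robots.txt, rate limits, and so on.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Visit pages breadth-first from the seeds; return URLs in visit order.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that returns the HTML for a URL, or None if it cannot be retrieved.
    """
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)   # a real robot would index the page here
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP requests.
PAGES = {
    "http://example.com/whats-new.html": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="b.html#top">B again</a>',
    "http://example.com/b.html": "no links here",
}

order = crawl(["http://example.com/whats-new.html"], PAGES.get)
```

Here the "What's New" page plays the role of the starting list: the robot visits it first, finds links to the other two pages, and visits each of them once, even though one is linked twice. Using a queue gives a breadth-first visit order; other robots may prioritize URLs differently.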

