A web crawler is one kind of bot, or software agent. Generally, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all of the hyperlinks in each page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
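The seed-and-frontier loop described above can be sketched as a short Python program. This is a minimal illustration, not a production crawler: the `fetch` callable is a hypothetical stand-in for a real HTTP client (a real crawler would also honor robots.txt, throttle requests, and apply the selection policies mentioned here), and the demo below feeds it an in-memory "web" instead of live pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: seeds start the frontier, discovered links extend it.

    `fetch` is a hypothetical callable mapping a URL to its HTML (or None);
    in practice it would wrap an HTTP client.
    """
    frontier = deque(seeds)      # the crawl frontier, seeded with the start URLs
    visited = set()
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        html = fetch(url)
        if html is None:
            continue
        visited.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)    # resolve relative hrefs
            if absolute not in visited:
                frontier.append(absolute)    # grow the frontier
    return visited

# Demo with an in-memory "web" standing in for real HTTP fetches.
pages = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/">home</a>',
}
print(sorted(crawl(["http://example.com/"], pages.get)))
# → ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

Using a deque gives breadth-first order, so pages closest to the seeds are visited first; swapping in a priority queue is one common way to implement the visiting policies a real crawler applies to the frontier.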