- Generally, a search engine uses a crawler to visit sites, copy their pages, and keep up-to-date data in its repository.
- To check both the external and internal links on the pages of those sites.
- To verify and validate the HTML code of those pages.
- To collect different types of information from different types of site pages.
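The link-checking step above can be sketched in a few lines. This is a minimal illustration, not a real crawler: it assumes the page has already been fetched as an HTML string, and the `classify_links` helper name is made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect href targets from anchor tags in an HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def classify_links(page_url, html):
    """Split a page's links into internal and external,
    resolving relative hrefs against the page URL."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(page_url).netloc
    internal, external = [], []
    for href in parser.links:
        absolute = urljoin(page_url, href)
        target = internal if urlparse(absolute).netloc == site else external
        target.append(absolute)
    return internal, external

html = '<a href="/about">About</a> <a href="https://example.org/x">Other</a>'
internal, external = classify_links("https://example.com/index.html", html)
```

A real crawler would then fetch each collected link in turn (respecting `robots.txt`) and report any that fail to resolve.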
Generally, a search engine applies several policies while crawling, such as:

- Focused policy: crawl only pages whose content is similar to a particular theme or topic.
- Selection policy: choose which pages of a site to download and index; search engines use it to cover more than 50% of a site's content, though on its own it is not a very effective policy.
- URL normalization policy: prevent crawling the same content more than once by reducing different URL forms to a single canonical one.
- Re-visit policy: return to the same site after a certain period of time to pick up probable updates or deletions of data.
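URL normalization can be illustrated with a short sketch. This is one simple set of canonicalization rules (lowercase scheme and host, drop default ports and fragments, collapse an empty path to `/`); real crawlers apply more rules, and the `normalize_url` name is just for this example.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Canonicalize a URL so syntactic variants map to one crawl entry."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""
    port = parts.port
    # Keep the port only when it is not the scheme's default.
    if port and not (scheme == "http" and port == 80) \
            and not (scheme == "https" and port == 443):
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Drop the fragment: it never changes the fetched resource.
    return urlunsplit((scheme, host, path, parts.query, ""))

# Two spellings of the same page collapse to one entry in the seen-set,
# so the crawler fetches the page only once.
seen = set()
for u in ["HTTP://Example.COM:80/index.html#top",
          "http://example.com/index.html"]:
    seen.add(normalize_url(u))
```

The seen-set is the piece that enforces the policy: before fetching, the crawler normalizes each candidate URL and skips it if the canonical form is already present.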
| Search Engine Crawler | Search Engine Name |
|---|---|
| Googlebot | Google |
| Bingbot | Bing |
| Slurp | Yahoo! |
| DuckDuckBot | DuckDuckGo |
| Baiduspider | Baidu |
These are the crawlers of the most popular search engines we use for our day-to-day purposes.