Saturday, July 11, 2009

How search engines work


Search engines keep a register of the sites they know about, so it is worth understanding how a search engine decides which site appears first in its results. A search engine has three important components.

The first component is called a spider or crawler. The spider accesses a website, reads its contents, and then follows the links on the site. It revisits the site periodically, perhaps every one or two months, to see whether anything on the site has changed.

The index is the second component of the search engine. The index is often compared to the catalog of a book: it contains a copy of every site the spider visits. When the spider finds that a website has changed, the search engine updates the information about that site in the catalog as soon as possible.
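To make the catalog idea a bit more concrete, here is a toy sketch in Python of an index that keeps a copy of each page and maps words to the pages that contain them; the Index class and its methods are invented for illustration and are not part of any real search engine.

# Toy index sketch: stores a copy of each page and an inverted index
# mapping each word to the URLs that contain it (illustrative names only).
from collections import defaultdict

class Index:
    def __init__(self):
        self.pages = {}                   # url -> stored page copy
        self.postings = defaultdict(set)  # word -> set of urls containing it

    def update(self, url, text):
        # Called when the spider fetches a page or detects a change.
        # (A real index would also remove stale entries for the old copy.)
        self.pages[url] = text
        for word in text.lower().split():
            self.postings[word].add(url)

    def search(self, word):
        return self.postings.get(word.lower(), set())

idx = Index()
idx.update("http://example.com", "Search engines crawl the web")
print(idx.search("crawl"))   # {'http://example.com'}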

PARTS OF THE SEARCH ENGINE
There are three main parts to every search engine:
-spider
-index
-web interface

SPIDER

A spider crawls the web. It follows links and scans web pages. All search engines have periods of deep crawl and quick crawl. During a deep crawl, the spider follows all the links it can find and scans web pages in their entirety.

During a quick crawl, the spider does not follow all links and may not scan pages in their entirety.

The job of the spider is to discover new pages and to collect copies of those pages, which are then analyzed in the index.
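As a rough illustration of that job, here is a minimal Python crawler sketch that fetches pages, keeps copies for the index, and follows either every link (deep crawl) or only a few (quick crawl). It assumes the requests and BeautifulSoup libraries; the function and variable names are invented for the example and real spiders are far more elaborate.

# Minimal crawler sketch: fetch a page, extract its links, and either
# follow all of them (deep crawl) or only the first few (quick crawl).
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def crawl(start_url, deep_crawl=True, max_pages=50):
    to_visit = [start_url]   # frontier of URLs still to fetch
    seen = set()             # URLs already fetched
    collected = {}           # url -> raw page copy destined for the index

    while to_visit and len(collected) < max_pages:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            page = requests.get(url, timeout=5)
        except requests.RequestException:
            continue
        collected[url] = page.text   # keep a copy for the index

        soup = BeautifulSoup(page.text, "html.parser")
        links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
        # Deep crawl: follow everything. Quick crawl: only a handful of links.
        to_visit.extend(links if deep_crawl else links[:5])

    return collected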

CRAWL RATE

Pages that are considered important get crawled frequently. For example, The New York Times may be crawled every hour or so to put new stories in the index.

Less authoritative sites with less PR (PageRank) are crawled less frequently, even as rarely as once a month. The crawl rate depends directly on link popularity and domain authority.

If many links point to a website, it may be an important site, so it makes sense to crawl it more often than a site with fewer links. This is also a money-saving issue.

If search engines were to crawl all sites at an equal rate, it would take more time overall and cost more as a result.
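One way to picture that trade-off is a crude scheduler that assigns each site a crawl interval based on how many links point to it. The thresholds below are invented for illustration and are not taken from any real search engine.

# Crude crawl-rate sketch: more inbound links -> shorter interval
# between crawls. The cutoffs are made up for the example.
from datetime import timedelta

def crawl_interval(inbound_links):
    if inbound_links > 10000:        # major site, e.g. a big news outlet
        return timedelta(hours=1)
    elif inbound_links > 100:        # moderately popular site
        return timedelta(days=1)
    else:                            # little-known site
        return timedelta(days=30)    # as rarely as once a month

print(crawl_interval(50000))  # 1:00:00
print(crawl_interval(3))      # 30 days, 0:00:00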

MORE SPIDER FEATURES

Spiders may check for duplicate content before passing page copy to the index, in order to keep the index clean (or at least cleaner).
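A simple sketch of such a check, again only an illustration: hash each page's text and skip it if the same content has already been seen. The index object here is assumed to be the toy Index from the earlier sketch.

# Duplicate-content sketch: hash the page text and only pass it to the
# index if that exact content has not been seen before.
import hashlib

seen_hashes = set()

def pass_to_index(url, text, index):
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False                 # exact duplicate, keep the index clean
    seen_hashes.add(digest)
    index.update(url, text)          # assumes the toy Index sketch above
    return True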
