How Search Engines Work
The term "search engine" is often used generically to
describe both crawler-based search engines and human-powered directories.
These two types of search engines gather their listings in radically different
ways.
Crawler-Based Search Engines
Crawler-based search engines, such as HotBot, create
their listings automatically. They "crawl" or "spider" the web, then people
search through what they have found.
If you change your web pages, crawler-based search engines
eventually find these changes, and that can affect how you are listed.
Page titles, body copy and other elements all play a role.
Human-Powered Directories
A human-powered directory, such as Yahoo, depends on
humans for its listings. You submit a short description to the directory
for your entire site, or editors write one for sites they review. A search
looks for matches only in the descriptions submitted.
Changing your web pages has no effect on your listing.
Things that are useful for improving a listing with a search engine have
nothing to do with improving a listing in a directory. The only exception
is that a good site, with good content, might be more likely to get reviewed
for free than a poor site.
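As a rough illustration of why changing a page has no effect on a directory listing, the sketch below searches only the submitted descriptions. The site names, descriptions, and the directory_search function are hypothetical, and a real directory would be far more elaborate.

```python
# Hypothetical directory: one short, human-written description per site.
directory = {
    "example-bikes.com": "Shop selling road and mountain bikes",
    "example-recipes.com": "Collection of vegetarian recipes",
}

# The live page text, which the directory never reads.
live_pages = {
    "example-bikes.com": "We now also sell skateboards and helmets",
}


def directory_search(query, listings):
    """Return sites whose description contains every word of the query."""
    words = query.lower().split()
    return [
        site
        for site, description in listings.items()
        if all(word in description.lower().split() for word in words)
    ]


found = directory_search("bikes", directory)
missed = directory_search("skateboards", directory)
```

Here the query "skateboards" finds nothing, even though the word appears on the live page, because only the description is searched.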
"Hybrid Search Engines" Or Mixed Results
In the web's early days, a search engine presented either
crawler-based results or human-powered listings.
Today, it is extremely common for both types of results to be presented. Usually,
a hybrid search engine will favor one type of listing over the other. For
example, Yahoo is more likely to present human-powered listings. However,
it does also present crawler-based results (as provided by Google), especially
for more obscure queries.
The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements.
First is the spider, also called the crawler. The spider visits a web page,
reads it, and then follows links to other pages within the site. This is
what it means when someone refers to a site being "spidered" or "crawled."
The spider returns to the site on a regular basis, such as every month
or two, to look for changes.
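The spider's visit, read, and follow-links loop can be sketched in a few lines. This is a minimal illustration, not how any particular search engine is built: the crawl_site function, the LinkExtractor class, and the in-memory FAKE_WEB pages are all assumptions made so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url, fetch):
    """Visit start_url, then follow links that stay on the same host.

    fetch is a caller-supplied function mapping a URL to HTML text,
    or None when the page is missing. Returns a dict of URL -> HTML.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html  # "read" the page
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# A tiny in-memory "web" (hypothetical pages).
FAKE_WEB = {
    "http://example.com/": (
        '<a href="/about.html">About</a> '
        '<a href="http://other.example/">Elsewhere</a>'
    ),
    "http://example.com/about.html": '<a href="/">Home</a>',
}

pages = crawl_site("http://example.com/", FAKE_WEB.get)
```

The link to other.example is skipped because it leaves the site. A spider that returns every month or two would simply run the same loop again and compare the results.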
Everything the spider finds goes into the second part
of the search engine, the index. The index, sometimes called the catalog,
is like a giant book containing a copy of every web page that the spider
finds. If a web page changes, then this book is updated with new information.
Sometimes it can take a while for new pages or changes
that the spider finds to be added to the index. Thus, a web page may have
been "spidered" but not yet "indexed." Until it is indexed -- added to
the index -- it is not available to those searching with the search engine.
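The gap between "spidered" and "indexed" can be pictured as a waiting area in front of the index. The sketch below is only an illustration under that assumption: the Index class and its method names are hypothetical.

```python
class Index:
    """Holds a copy of each page, plus pages spidered but not yet indexed."""

    def __init__(self):
        self.pages = {}    # URL -> page text, visible to searchers
        self.pending = {}  # URL -> page text, spidered but not yet indexed

    def record_spidered(self, url, text):
        # A newer copy of the same URL replaces the older pending copy.
        self.pending[url] = text

    def process_pending(self):
        # Move everything that is waiting into the searchable index.
        self.pages.update(self.pending)
        self.pending.clear()

    def is_searchable(self, url):
        return url in self.pages


index = Index()
index.record_spidered("http://example.com/", "Welcome to our bike shop")
before = index.is_searchable("http://example.com/")  # spidered only
index.process_pending()
after = index.is_searchable("http://example.com/")   # now indexed
```

Until process_pending runs, the page has been found by the spider but cannot be returned to anyone searching.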
Search engine software is the third part of a search engine.
This is the program that sifts through the millions of pages recorded in
the index to find matches to a search and rank them in order of what it
believes is most relevant.
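A toy version of this third part might look like the following. The scoring rule (count the query words, with title matches weighted three times as heavily as body matches) is an assumption chosen for illustration; real search engines use their own ranking methods. The INDEX contents and function names are hypothetical.

```python
def score(page, query_words):
    """Count query-word occurrences; title hits weigh more (assumed weight of 3)."""
    title_words = page["title"].lower().split()
    body_words = page["body"].lower().split()
    return sum(
        3 * title_words.count(word) + body_words.count(word)
        for word in query_words
    )


def search(index, query):
    """Return URLs of matching pages, highest score first."""
    query_words = query.lower().split()
    scored = [(score(page, query_words), url) for url, page in index.items()]
    matches = [(points, url) for points, url in scored if points > 0]
    # Sort by descending score, then by URL so ties are ordered predictably.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in matches]


# Hypothetical index contents.
INDEX = {
    "http://example.com/bikes": {
        "title": "Mountain bikes",
        "body": "We sell bikes and helmets",
    },
    "http://example.com/blog": {
        "title": "Shop news",
        "body": "New bikes arrive next week",
    },
    "http://example.com/contact": {
        "title": "Contact",
        "body": "Call or email us",
    },
}

results = search(INDEX, "bikes")
```

The page with "bikes" in both its title and body ranks first, the page that mentions it only in the body ranks second, and the contact page is not a match at all.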