
How Search Engines Find Documents
September 7, 2008

 by: Kamlesh Patel

Every document on the Web is associated with a URL (Uniform Resource Locator). In this context, we will use the terms "document" and "URL" interchangeably. This is an oversimplification, since some URLs return different documents depending on factors such as the user's location, browser type, or form input, but the terminology suits our purposes for now.

Finding every document on the Web would mean more than finding every URL on the Web. For this reason, search engines do not currently attempt to locate every possible unique document, although research in this area is ongoing. Instead, crawling search engines focus on unique URLs: even when a dynamic site displays different content at the same URL (via form inputs or other dynamic variables), the search engine treats that URL as a single page.
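Because a crawler keys pages by URL, it needs some way to tell when two spellings point to the same resource. The Python sketch below illustrates one common approach, URL normalization, under rules we have assumed for illustration (lowercase scheme and host, drop default ports and #fragments, treat an empty path as "/"). The names `normalize_url` and `unique_urls` are hypothetical, and no particular search engine is claimed to use exactly these rules.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical key (illustrative rules, not any engine's real policy)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the scheme's default.
    if port is not None and not (
        (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    ):
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Fragments (#section) are never sent to the server, so they don't name a different page.
    return urlunsplit((scheme, host, path, parts.query, ""))


def unique_urls(urls):
    """Keep the first occurrence of each normalized URL, preserving input order."""
    seen = set()
    result = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

Note that `?x=1` and `?x=2` remain distinct keys, while a form submitted by POST to the same address leaves the URL unchanged and so collapses to one key, which matches the point above that a search engine sees such a URL as a single page.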

The typical crawling search engine uses three main resources to build its list of URLs to crawl, although not every search engine uses all three:

Hyperlinks on existing Web pages

The bulk of the URLs found in the databases of most crawling search engines consists of links found on Web pages that the spider has already crawled. Finding a link to a document on one page implies that someone found that link important enough to add it to their page.
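Harvesting links from a page the spider has already crawled can be sketched with Python's standard-library `html.parser`. This is a minimal illustration that assumes the page's HTML has already been downloaded; `LinkExtractor` and `extract_links` are hypothetical names, and a production spider would likely also need to handle cases this sketch ignores, such as `<base>` tags or links generated by scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, then drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript:, and other non-Web schemes.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(base_url: str, html: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

The returned URLs would then be added to the list of URLs to crawl, which is how a crawler's database grows mostly from links found on pages it has already visited.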

Submitted URLs

All crawling search engines have some sort of process that lets users or Website owners submit URLs to be crawled. In the past, every search engine offered a free manual submission process, but many now accept only paid submissions. Google is a notable exception, with no apparent plans to stop accepting free submissions, although it is highly doubtful whether submitting actually does anything.

XML data feeds

Paid inclusion programs, such as the Yahoo! Site Match system, include trusted feed programs that allow sites to submit XML-based content summaries for crawling and inclusion. As the Semantic Web begins to emerge, and more sites begin to offer RSS (RDF Site Summary) news feed files, some search engines have begun to read these files in order to find fresh content.
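Reading an RSS file to discover fresh URLs can be sketched in a few lines. The example below is a minimal sketch using Python's `xml.etree.ElementTree`; it assumes the feed is well-formed and simply collects the `<link>` of each `<item>`, matching tags by local name so that both un-namespaced RSS 2.0 feeds and namespaced RSS 1.0 (RDF Site Summary) feeds are handled. It does not model the trusted-feed formats used by paid inclusion programs such as Site Match, whose details the article does not describe, and `feed_item_links` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # "{http://purl.org/rss/1.0/}link" -> "link"; un-namespaced tags pass through unchanged.
    return tag.rsplit("}", 1)[-1]


def feed_item_links(feed_xml: str) -> list[str]:
    """Return the <link> URL of every <item> in an RSS 2.0 or RSS 1.0 (RDF) feed."""
    root = ET.fromstring(feed_xml)
    links = []
    for element in root.iter():
        # Only items count; the channel's own <link> points at the site, not new content.
        if local_name(element.tag) != "item":
            continue
        for child in element:
            if local_name(child.tag) == "link" and child.text:
                links.append(child.text.strip())
    return links
```

Each URL returned this way is a candidate for fresh content, which a search engine could add to its crawl list alongside links found by spidering and URLs submitted by site owners.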

Search engines run multiple crawler programs, and each crawler program (or spider) receives instructions from the scheduler about which URL (or set of URLs) to fetch next. We will see how search engines manage the scheduling process shortly, but first, let's take a look at how the search engine's crawler program works.
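The division of labor between a scheduler and several crawler programs can be illustrated with a shared work queue. In this Python sketch, a `queue.Queue` stands in for the scheduler and each thread plays one spider; `run_crawlers` and the `fetch` callback are hypothetical, with `fetch` assumed to download a page and return the URLs it links to. It is a simplification: URLs are handed out in first-in, first-out order, without the prioritization or per-host politeness rules a real scheduler would be expected to apply.

```python
import queue
import threading


def run_crawlers(seed_urls, fetch, num_crawlers=3):
    """Let several crawler threads pull URLs from one shared scheduler queue.

    `fetch(url)` is a hypothetical callback: it should download the page and
    return the list of URLs found on it.
    """
    frontier = queue.Queue()   # stands in for the scheduler's list of URLs to fetch next
    seen = set()               # every URL ever scheduled, so nothing is queued twice
    fetched = []               # URLs the crawlers have processed
    lock = threading.Lock()    # guards `seen` and `fetched` across threads

    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            frontier.put(url)

    def crawler():
        while True:
            url = frontier.get()
            if url is None:                # sentinel from the scheduler: stop
                frontier.task_done()
                return
            try:
                new_links = fetch(url)
            except Exception:
                new_links = []             # this sketch simply skips pages that fail
            with lock:
                fetched.append(url)
                for link in new_links:
                    if link not in seen:   # schedule only URLs not seen before
                        seen.add(link)
                        frontier.put(link)
            # Marked done only after new links are queued, so join() cannot return early.
            frontier.task_done()

    threads = [threading.Thread(target=crawler) for _ in range(num_crawlers)]
    for t in threads:
        t.start()
    frontier.join()                        # returns once every scheduled URL is processed
    for _ in threads:
        frontier.put(None)                 # one stop signal per crawler
    for t in threads:
        t.join()
    return fetched
```

For example, passing a dictionary-backed stand-in such as `lambda url: fake_web[url]` as `fetch` (where `fake_web` is a hypothetical dict mapping each URL to its outgoing links) lets you watch the crawlers walk a tiny link graph without touching the network.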

Source: http://www.elitedatasolution.com

About The Author

Kamlesh Patel

I'm a freelance search engine optimization expert from India. We provide search engine optimization services, including link building, meta tags, etc.

info@elitedatasolution.com
