This page contains a very short introductory course on the basics (and trivia?)
of searching the web.
If you are familiar with the basics, you can access a very good web searching
course by the University at Albany, State University of New York (SUNY). This
course allows you to choose topics, such as:
- Boolean Searching on the Internet
- How to Choose a search engine or directory
- A quick reference guide to search engine syntax
When you click on the link below, you will see a list of places to go.
At the top of the page, click on "Search Engines and Subject Directories".
To access this course, click here. (You will be leaving albuterolotc.com.)
An Introduction to Web Searching
Why Web Search Can Be Difficult
The web is a repository of an enormous amount of information; the number of web
pages is increasing by thousands every day. Every imaginable topic seems to
be addressed somewhere on the web. Finding exactly
what you are looking for, however, can be a very time-consuming and frustrating
task. For example, a recent search of the web, using keywords "Airport and
pollution", produced over 40,000 hits!
The tools used to find information on the web are called "Search Engines".
Because of the gigantic number of web sites, these Search Engines simply cannot
dynamically peruse every web site and check for matches to every search
request. Instead, the Search Engines build internal information bases about the
web. Some build internal tables, called "directories", which help them match
web sites to search criteria. Other Search Engines create "Categories" within
the Search Engine's database. Still others use proprietary methods to store and
retrieve information.

These "information bases" are built from several sources. One of them is a
"Site Registration" process, where a webmaster registers the destination site
with the search engine. (albuterolotc.com has done this in most widely-used
search engines.) Another is the Search Engine's effort to "sniff out" new web
sites. This is done by sending out little programs, called "web-crawlers" or
"spiders", to find new web sites. The crawlers find the new sites and have them
entered into the Search Engine's information base. When a new web page is
created, it can take weeks before it is found by the Search Engines' web
crawlers.

Researching topics with a scientific slant, such as Environmental Health, can
be much more difficult than "shopping" on the Web. Many Search Engines are
actually web businesses, earning their revenue from advertisers. Accordingly,
they cater to Users and Businesses that want to participate in commerce, which
makes scientific research with these search engines harder. So, finding any
information can be difficult; finding NEW information can be very difficult.

Starting the Search
To find the web pages you are seeking, a few things need to happen:
1. The Search Engine must "know about" the target web page.
2. The builder of the Web Page (the "Webmaster") must have included
appropriate "search criteria": titles and keywords that allow the web site to
be "indexed". (These are not directly visible to web Users.)
3. The Search Engine must find and "register" the Web Page's
"indexing" information in its information base.
4. The Search Engine must be able to find the web page, in its own information
base, by matching User-supplied "Search arguments" (more below) to the
aforementioned builder-supplied "Search criteria" (also called "indexing
information").
5. The User of the Search Engine (that's us) must know how to communicate with
the Search Engine.

If any of the five "must happens" is missing, some frustrating things happen:
1. The Search Engine reports that there were "zero hits", or
2. The Search Engine reports what seem to be totally unrelated hits. These can
happen when the Webmaster "cheats" by including keywords just to get
visibility anywhere, anytime. These are called "Bogus Hits".
3. The Search Engine does as it's told and correctly finds many hits, but most
are completely unrelated to the User's intent.
An example: A search for "airport AND health", intended to find airports'
impact on health, instead gets a list of every airport-located health and
fitness club in the Western Hemisphere. Even though we asked for them, we
really didn't want them. These occur because of a lack of sophistication in
either the Search Engine, the User, or both. Let's be kind and call these
unwanted hits "Secondary Hits".

The Search Engine and the Webmaster take care of four of the five "must
happens" to various degrees. The fifth, formulating the "Search Argument", is
entirely up to the User.

Search Arguments: Keywords, Operators and Syntax
Search Engines work better when the User defines exactly what they are
looking for. This is done by entering "Search
Arguments". "Search Arguments", in turn, contain "Keywords"
and "Operators".
For example, suppose we are looking for information about "love songs". We
really can't expect to find a song directly from the search: very few love
songs contain the word "song", and a few don't even contain the word "love".
Instead, we hope that somebody has created a web site that contains the "love
song" information, and has used "love song" in defining the web site's "Search
Criteria", i.e. its title and/or keyword indexing information.
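To make "indexing information" a little more concrete, here is a minimal Python sketch of how an information base might map words from titles and keywords to web pages. It is only an illustration: the page addresses, titles, keywords, and the `build_index` helper are all invented here, and real Search Engines use their own (often proprietary) methods.

```python
from collections import defaultdict

# Hypothetical pages; the addresses, titles, and keywords are invented.
PAGES = {
    "example.org/ballads": {"title": "Classic Love Songs",
                            "keywords": ["love", "songs", "ballads"]},
    "example.org/airports": {"title": "Airport Noise",
                             "keywords": ["airport", "noise"]},
    "example.org/poems": {"title": "Love Poems",
                          "keywords": ["love", "poems"]},
}


def build_index(pages):
    """Map each lowercase title/keyword word to the set of pages using it."""
    index = defaultdict(set)
    for url, info in pages.items():
        words = info["title"].lower().split()
        words += [keyword.lower() for keyword in info["keywords"]]
        for word in words:
            index[word].add(url)
    return index


index = build_index(PAGES)
print(sorted(index["love"]))   # pages indexed under "love"
print(sorted(index["songs"]))  # pages indexed under "songs"
print("song" in index)         # the singular form was never indexed
```

Note that in this toy index a page is found only under the exact words its Webmaster supplied, which is one reason the choice of keywords matters so much.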
Now, we'll try to find the web site. One would expect that simply
entering the keywords <love songs> in the "Search For" box
would be enough. But what does this really mean? Does it mean:
1. We want any site whose "Indexing Information" includes either of the
words "love" OR "songs"?
2. We want only sites in which the words "love" and "songs"
are adjacent ("love songs"; words enclosed in double quotes are
interpreted as phrases)?
3. We want only sites that have both words, but they don't have to be adjacent
("love AND songs")?

If we specify only <love songs>, then we are at the mercy of the Search Engine
to decide what we want. Most search engines today would interpret the term
"love songs" to mean "love AND songs". A few interpret it as "love OR songs".
No wonder different search engines produce vastly different results!

To give control of the search argument to the User, the Search Engine looks not
only for keywords, but also for Operators. Operators are words like <AND>,
<OR>, <AND NOT>, <NOT> that are inserted before and between the keywords to
refine the search.

To complicate things, different Engines expect different "Syntax" for the
Operators. One engine could accept <love songs>, another would require
<love AND songs>, and another <+love +songs>. If you wanted to exclude
Engelbert's songs, you might enter <love AND songs NOT Engelbert> in one
engine, or <+love +songs -Engelbert> in another engine. (Note: the <> marks are
used here just to enclose the phrases. They are not used in searching.)

Search Arguments: Keywords, Operators and Syntax Example
| Engine | User Enters | Interpreted as | Notes |
| Engine A | <love songs> | <love AND songs> | |
| | <love songs NOT Engelbert> | <love AND songs and unpredictable results> | Can't recognize <NOT>, may even think it's a keyword. Will include Engelbert. |
| Engine B | <love songs> | <love OR songs> | |
| | <love songs NOT Engelbert> | <love OR songs and unpredictable results> | Can't recognize <NOT>, may even think it's a keyword. Will include Engelbert. |
| Engine C | <love AND songs> | Always interpreted as <love AND songs> | |
| | <love AND songs NOT Engelbert> | <love AND songs AND NOT Engelbert> | Recognizes operators <AND> and <NOT>. Excludes Engelbert. |
| Engine D | <+love +songs> | Always interpreted as <love AND songs> | |
| | <+love +songs -Engelbert> | <+love +songs -Engelbert> | Recognizes operators <+> and <->. Excludes Engelbert. |
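The table above can be sketched in a few lines of Python. This is only a rough model of the four made-up engines A through D, not the parser of any real Search Engine; the `interpret` function and its return format (required, optional, and excluded keywords) are assumptions made for this illustration.

```python
def interpret(query, engine):
    """Return (required, optional, excluded) keywords for a hypothetical engine."""
    words = query.split()
    required, optional, excluded = [], [], []
    if engine == "A":
        # Implied AND, no operator support: every word is a required keyword.
        required = [w.lower() for w in words]
    elif engine == "B":
        # Implied OR, no operator support: every word is an optional keyword.
        optional = [w.lower() for w in words]
    elif engine == "C":
        # Word operators: AND joins keywords, NOT excludes the next keyword.
        negate = False
        for w in words:
            if w == "AND":
                continue
            if w == "NOT":
                negate = True
                continue
            (excluded if negate else required).append(w.lower())
            negate = False
    elif engine == "D":
        # Sign operators: +word is required, -word is excluded.
        for w in words:
            if w.startswith("+"):
                required.append(w[1:].lower())
            elif w.startswith("-"):
                excluded.append(w[1:].lower())
            else:
                optional.append(w.lower())
    else:
        raise ValueError(f"unknown engine: {engine}")
    return required, optional, excluded


print(interpret("love songs NOT Engelbert", "A"))
print(interpret("love songs", "B"))
print(interpret("love AND songs NOT Engelbert", "C"))
print(interpret("+love +songs -Engelbert", "D"))
```

In this sketch Engine A treats "not" and "engelbert" as two more required keywords, which is exactly the "unpredictable results" row of the table, while Engines C and D end up with the same interpretation from different syntax.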
Boolean Search
When using Operators <AND>, <OR>, <NOT>, etc., you are performing a "Boolean"
search. (This is named after mathematician George Boole, whose pioneering work
contributed greatly to digital computing.) Whether the syntax requires you to
enter the word <AND> or the plus sign <+>, it's still a Boolean search.

A Boolean Search is actually the most precise of searches. It allows you to
decide not only on AND/OR combinations, but also to define MUST HAVE, NEAR, FAR
and other terms. It sounds complicated, but it really isn't. Some search
engines don't support Boolean search at all; they just assume that you want an
<AND> or <OR> between your keywords. No wonder two search engines will produce
vastly different results; one is "ANDing", the other "ORing" your keywords!
Many Engines offer Boolean Search in an "Advanced Search" mode. Still others
require it. Use the "Help" function of the individual Search Engine to
determine how to format your search arguments.

Basic vs. Advanced Search
Most Search Engines start out offering a "Basic Search". This will usually
allow a string of words to be entered. The "Basic Search" then automatically
inserts the <AND> or <OR> function between each word, depending on the engine,
and performs the search. If you know enough to enter the <OR> operator, or even
the <AND NOT>, it MAY recognize them.

When browsing for information in a complex field like Environmental Health, it
is usually better to go directly to the "Advanced Search" option. The "Clicker"
for Advanced Search could be anywhere on the Search Engine's page. If you do a
basic search first, the first "Results Page" almost always includes a place to
click to "Advanced Search". Look carefully; it could be at the end of the page.

Not all "Advanced Search" options are the same. Some offer pull-down menus,
offering choices like "Must Include All words" (another way of offering to
insert <AND> between each keyword). These are not very flexible, and will
produce a large number of Secondary Hits. Good "Advanced Search" functions are
accompanied by good "Help" functions. Always check out the "Search Help", and
you will quickly get a handle on the Operators and Syntax of the Search Engine.
If the Advanced Search offers "Boolean Search", use it.
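In set terms, the Boolean Operators are simple: AND keeps only the pages found under both keywords, OR keeps pages found under either one, and AND NOT removes pages found under the excluded keyword. The sketch below shows this with Python sets. The `POSTINGS` table and its page numbers are invented for illustration, and a real Search Engine will be far more elaborate.

```python
# Hypothetical information base: keyword -> set of page ids that use it.
POSTINGS = {
    "love": {1, 2, 3, 5},
    "songs": {2, 3, 4},
    "engelbert": {3},
}


def pages(word):
    """Pages indexed under the word; an unknown word matches nothing."""
    return POSTINGS.get(word.lower(), set())


love_and_songs = pages("love") & pages("songs")          # AND: intersection
love_or_songs = pages("love") | pages("songs")           # OR: union
without_engelbert = love_and_songs - pages("Engelbert")  # AND NOT: difference

print(sorted(love_and_songs))
print(sorted(love_or_songs))
print(sorted(without_engelbert))
```

The OR result is the largest and the AND NOT result the smallest, which is why an engine that quietly "ORs" your keywords returns so many more hits than one that "ANDs" them.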
Search the Search Engines; the Meta-Search
Some Search Engines actually search other search Engines. They collect the
information, and present it as one set of 'hits'. These are called
"Meta-Search" engines. One obvious advantage is that the meta-engine
saves the User the effort of trying multiple search engines. A possible
disadvantage is having one of the target Search Engines return a bunch of bogus
or secondary 'hits'. On the flip side, if two or more search engines
return the same information, the Meta-Search Engine will only present one of
them. The table on the "Search The Web" page
indicates which Search Engines are "Meta-Search" Engines.
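Here is a minimal sketch of what a Meta-Search engine does: pass the same search argument to several engines, then merge the hits and present each page only once. The two "engines" below are stand-in functions that return canned results; their names, the `meta_search` helper, and the addresses are assumptions for illustration, since a real Meta-Search engine would query live Search Engines over the network.

```python
def meta_search(query, engines):
    """Query each engine in turn and merge hits, keeping one copy of each page."""
    seen = set()
    merged = []
    for name, search in engines.items():
        for url in search(query):
            if url not in seen:
                seen.add(url)
                merged.append((url, name))  # remember which engine found it first
    return merged


# Stand-in engines with canned, invented results.
def engine_one(query):
    return ["example.org/a", "example.org/b"]


def engine_two(query):
    return ["example.org/b", "example.org/c"]


results = meta_search("airport AND pollution",
                      {"one": engine_one, "two": engine_two})
for url, source in results:
    print(url, "found by engine", source)
```

Both stand-in engines return "example.org/b", but it appears only once in the merged list. Note that the sketch passes the same search argument to every engine unchanged, so any differences in Syntax between the engines would still need to be handled.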
Narrowing the Search
No matter what type of search engine or syntax you are using, you will
undoubtedly receive many secondary hits, and sometimes bogus hits. To narrow
the search, it's usually easiest to do a "recursive" search: you refine the
search argument and run it again, over and over, until you get it right!

Example of a Successful Search
Here is an example using AOL's "Netfind" to do a recursive search:
The User wants information about airports' impact on air pollution and health.
| User Enters | Engine Reports |
| Using the basic search engine, the User enters <Airport AND Pollution> | 15,617 hits. Many of the hits deal with groundwater and other issues. Relevant, important hits are probably buried far down in the list. Who has the patience to look through a pile this big? |
| <airport AND air pollution> | The engine assumes that the user is looking for <airport AND air AND pollution>. Hit count is a little less, but includes any of the first set of hits that contain the word <Air>. |
| <Airport AND "air pollution"> (phrases are almost always enclosed in double quotes) | 10,920 hits. Fewer, more focused hits because the target must include the words "air" and "pollution" next to each other. |
| User finds the "Advanced Search" clicker at the bottom of the first Results Page. Enters <Airport AND "Air Pollution" AND Health> | 6,368 hits. |
| User notices hotels and fitness centers are in the list. Enters <Airport AND "Air Pollution" AND Health AND NOT business> | 1,426 hits. |
| User continues to eliminate unwanted hits. | A Results list of 20 is excellent. It will probably contain what you are looking for. |
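The same narrowing process can be imitated on a tiny scale. The sketch below runs five successively tighter search arguments against five invented "pages" and counts the hits after each step. Everything in it (the page texts, the `search` function, and its `all_of`, `phrases`, and `none_of` parameters) is an assumption made for illustration; it does not reproduce how Netfind or any other engine works.

```python
# Hypothetical mini web; the page texts are invented for illustration.
DOCS = {
    1: "airport groundwater pollution report",
    2: "airport air pollution and health study",
    3: "airport hotel business health club no air pollution nearby",
    4: "air traffic at the airport and noise pollution",
    5: "airport air pollution monitoring data",
}


def search(all_of=(), phrases=(), none_of=()):
    """Ids of pages with every word in all_of, every phrase, and no word in none_of."""
    hits = []
    for doc_id, text in DOCS.items():
        words = text.split()
        padded = f" {text} "  # padding lets a phrase match whole words only
        if (all(w in words for w in all_of)
                and all(f" {p} " in padded for p in phrases)
                and not any(w in words for w in none_of)):
            hits.append(doc_id)
    return hits


# Each step mirrors one row of the table, from broadest to narrowest.
steps = [
    dict(all_of=("airport", "pollution")),
    dict(all_of=("airport", "air", "pollution")),
    dict(all_of=("airport",), phrases=("air pollution",)),
    dict(all_of=("airport", "health"), phrases=("air pollution",)),
    dict(all_of=("airport", "health"), phrases=("air pollution",),
         none_of=("business",)),
]
counts = [len(search(**step)) for step in steps]
print(counts)
```

Each refinement can only keep or drop pages that matched the previous step, so the hit count never grows as the search argument gets more specific.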
Summary and Lesson Plan
1. Understand the Concepts. Use this Primer, and/or the SUNY Albany course.
2. Learn and use Boolean search techniques.
3. Experiment with two or three Search Engines. Learn how to format their search
arguments.
4. Try one of the Meta-Search Engines, for example "Copernicus", mentioned on
the "Search The Web" page.
5. Build your own set of search arguments for a particular subject.
Find out about the State University of New York at Albany Internet Tutorial.