Web Searching Primer - AlbuterolOTC

This page contains a very short introductory course on the basics (and trivia?) of searching the web.


If you are already familiar with the basics, you can take a very good web searching course offered by the University at Albany, State University of New York (SUNY). The course lets you choose topics such as:
- Boolean Searching on the Internet
- How to Choose a search engine or directory
- A quick reference guide to search engine syntax
The course's main page lists several places to go; at the top of that page, click "Search Engines and Subject Directories". (The course is hosted outside albuterolotc.com.)

Why Web Searching Can Be Difficult
The web is a repository of an enormous amount of information, and the number of web pages is increasing by thousands every day. Every imaginable topic seems to be addressed somewhere on the web. Finding exactly what you are looking for, however, can be a very time-consuming and frustrating task. For example, a recent web search using the keywords "Airport and pollution" produced over 40,000 hits!

The tools used to find information on the web are called "Search Engines". Because there are so many web sites, these "Search Engines" simply cannot visit every web site on the fly and check it for matches to every search request. Instead, the Search Engines build internal information bases about the web. Some build internal tables, called "directories", which help them match web sites to search criteria. Other Search Engines create "Categories" within their databases. Still others use proprietary methods to store and retrieve information.

These "information bases" are built from several sources. One is "Site Registration", in which a webmaster registers a site with the search engine. (albuterolotc.com has done this with most widely used search engines.) Another is the Search Engine's own effort to "sniff out" new web sites by sending out small programs, called "web-crawlers" or "spiders", that find new sites and have them entered into the Search Engine's information base. When a new web page is created, it can take weeks before the Search Engines' web crawlers find it.
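The "information base" described above can be pictured as an index that maps each keyword to the pages whose indexing information lists it. The short Python sketch below builds such a table from a few hypothetical pages (the example.org addresses and keyword lists are made up for illustration); real Search Engines use far larger and often proprietary structures.

```python
from collections import defaultdict

# Hypothetical pages and the indexing keywords their webmasters supplied.
PAGES = {
    "example.org/airport-air": ["airport", "air", "pollution", "health"],
    "example.org/airport-gym": ["airport", "health", "fitness", "club"],
    "example.org/wells": ["groundwater", "pollution"],
}

def build_index(pages):
    """Map each keyword to the set of pages whose indexing keywords include it."""
    index = defaultdict(set)
    for url, keywords in pages.items():
        for word in keywords:
            index[word.lower()].add(url)
    return dict(index)

index = build_index(PAGES)
print(sorted(index["pollution"]))  # ['example.org/airport-air', 'example.org/wells']
print(sorted(index["health"]))     # ['example.org/airport-air', 'example.org/airport-gym']
```

A page only becomes findable once its keywords have been added to a table like this, which is why a freshly created page may not show up in search results for some time.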

Researching topics with a scientific slant, such as Environmental Health, can be much more difficult than "shopping" on the Web. Many Search Engines are actually web businesses that earn their revenue from advertisers. Accordingly, they cater to Users and Businesses that want to take part in commerce, which can make scientific research with these search engines very difficult.

So, finding any information can be difficult; finding NEW information can be very difficult.

Starting the Search
To find the web pages you are seeking, a few things need to happen:
1. The Search Engine must "know about" the target web page.
2. The builder of the Web Page (the "Webmaster") must have included appropriate "search criteria": titles and keywords that allow the web site to be "indexed". (These are not directly visible to web Users.)
3. The Search Engine must find and "register" the Web Page's "indexing" information in its information base.
4. The Search Engine must be able to find the web page in its own information base by matching the User-supplied "Search Arguments" (more below) to the aforementioned builder-supplied "Search Criteria" (also called "indexing information").
5. The User of the Search Engine (that's us) must know how to communicate with the Search Engine.

If any of the five "must happens" is missing, some frustrating things happen:
1. The Search Engine reports that there were "zero hits", or
2. The Search Engine reports what seem to be totally unrelated hits. These can happen when the Webmaster "cheats" by including keywords just to get visibility anywhere, anytime. These are called "Bogus Hits".
3. The Search Engine does as it's told and correctly finds many hits, but most are completely unrelated to the User's intent. An example: a search for "airport AND health", intended to find airports' impact on health, instead returns every health-fitness club located at an airport in the Western Hemisphere. Even though we asked for them, we really didn't want them. These occur because of a lack of sophistication in the Search Engine, the User, or both. Let's be kind and call these unwanted hits "Secondary Hits".

The Search Engine and the Webmaster take care of four of the five "must happens" to various degrees. The fifth, formulating the "Search Argument", is entirely up to the User.

Search Arguments: Keywords, Operators and Syntax
Search Engines work better when the User defines exactly what they are looking for. This is done by entering "Search Arguments". "Search Arguments", in turn, contain "Keywords" and "Operators".
For example, suppose we are looking for information about "love songs". We really can't expect to find a song directly from the search; for one thing, very few love songs contain the word "song", and a few don't even contain the word "love". Instead, we hope that somebody has created a web site that contains the "love song" information and has used "love song" in defining the web site's "Search Criteria", i.e. its title and/or keyword indexing information.
Now, we'll try to find the web site. One would expect that simply entering the keywords <love songs> in the "Search For" box would be enough. But what does this really mean? Does it mean:
1. We want any site whose "Indexing Information" includes either of the words "love" OR "song"?
2. We want only sites in which the words "love" and "song" are adjacent, i.e. the phrase "love song"? (Words enclosed in double quotes are interpreted as a phrase.)
3. We want only sites that have both words, but they don't have to be adjacent ("love AND song")?

If we enter only <love songs>, then we are at the mercy of the Search Engine to decide what we want. Most search engines today would interpret <love songs> to mean <love AND songs>; a few interpret it as <love OR songs>. No wonder different search engines produce vastly different results!
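To see why the choice matters, here is a minimal Python sketch that applies the three readings listed above (OR, adjacent phrase, and AND) to a handful of hypothetical page titles. It assumes exact whole-word matching with no handling of plurals, which real engines may treat differently.

```python
import re

# Hypothetical page titles standing in for each site's indexing information.
TITLES = [
    "Classic love songs of the 1960s",
    "Songs for children",
    "Love letters and poems",
    "Songs about love and loss",
]

def words(text):
    """Lower-case the text and split it into a list of words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_or(title, terms):
    """Reading 1: any keyword appears."""
    return any(t in words(title) for t in terms)

def match_phrase(title, terms):
    """Reading 2: the keywords appear next to each other, in order."""
    w = words(title)
    n = len(terms)
    return any(w[i:i + n] == terms for i in range(len(w) - n + 1))

def match_and(title, terms):
    """Reading 3: every keyword appears, in any position."""
    return all(t in words(title) for t in terms)

terms = ["love", "songs"]
for name, rule in [("OR", match_or), ("AND", match_and), ("phrase", match_phrase)]:
    hits = [t for t in TITLES if rule(t, terms)]
    print(name, len(hits))
```

The same two keywords match 4, 2 or 1 of the titles depending on which reading the engine applies.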

To give the User control of the search argument, the Search Engine looks not only for keywords but also for Operators. Operators are words like <AND>, <OR>, <AND NOT> and <NOT> that are inserted before and between the keywords to refine the search. To complicate things, different Engines expect different "Syntax" for the Operators. One engine could accept <love songs>, another would require <love AND songs>, and another <+love +songs>. If you wanted to exclude Engelbert's songs, you might enter <love AND songs NOT Engelbert> in one engine, or <+love +songs -Engelbert> in another. (Note: the <> marks are used here just to enclose the phrases. They are not used in searching.)

Engine	User Enters	Interpreted as	Notes
Engine A	<love songs>	<love AND songs>
Engine A	<love songs NOT Engelbert>	<love AND songs>, with unpredictable results	Does not recognize <NOT>; may even treat it as a keyword. Will include Engelbert.
Engine B	<love songs>	<love OR songs>
Engine B	<love songs NOT Engelbert>	<love OR songs>, with unpredictable results	Does not recognize <NOT>; may even treat it as a keyword. Will include Engelbert.
Engine C	<love AND songs>	<love AND songs>	Explicit operator; always interpreted this way.
Engine C	<love AND songs NOT Engelbert>	<love AND songs AND NOT Engelbert>	Recognizes the operators <AND> and <NOT>. Excludes Engelbert.
Engine D	<+love +songs>	<love AND songs>	Explicit operators; always interpreted this way.
Engine D	<+love +songs -Engelbert>	<love AND songs AND NOT Engelbert>	Recognizes the operators <+> and <->. Excludes Engelbert.
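Engine D's plus/minus syntax can be sketched in a few lines of Python: split the query into required (+) and excluded (-) terms, then keep only titles that contain every required term and none of the excluded ones. The titles are hypothetical, and treating an unmarked word as required is an assumption; some engines treat unmarked words as optional.

```python
import re

def parse_plus_minus(query):
    """Split a query like '+love +songs -Engelbert' into (required, excluded) term sets."""
    required, excluded = set(), set()
    for token in query.lower().split():
        if token.startswith("-"):
            excluded.add(token[1:])
        else:
            # Assumption: '+word' and a bare 'word' are both treated as required.
            required.add(token.lstrip("+"))
    return required, excluded

def matches(title, required, excluded):
    """True if the title has every required term and no excluded term."""
    w = set(re.findall(r"[a-z0-9]+", title.lower()))
    return required <= w and not (excluded & w)

# Hypothetical page titles.
TITLES = [
    "Engelbert sings love songs",
    "Great love songs of the 1950s",
    "Love poems",
]

required, excluded = parse_plus_minus("+love +songs -Engelbert")
print(sorted(required), sorted(excluded))  # ['love', 'songs'] ['engelbert']
print([t for t in TITLES if matches(t, required, excluded)])  # ['Great love songs of the 1950s']
```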

Boolean Search
When using the Operators <AND>, <OR>, <NOT>, etc., you are performing a "Boolean" search. (It is named after the mathematician George Boole, whose pioneering work contributed greatly to digital computing.) Whether the syntax requires you to enter the word <AND> or the plus sign <+>, it's still a Boolean search. A Boolean Search is actually the most precise kind of search. It lets you decide not only on AND/OR combinations, but also, in engines that support them, on MUST HAVE, NEAR, FAR and other terms. It sounds complicated, but it really isn't. Some search engines don't support Boolean search at all; they just assume that you want an <AND> or an <OR> between your keywords. That is why two search engines can produce vastly different results: one is "ANDing", the other "ORing" your keywords! Many Engines offer Boolean Search in an "Advanced Search" mode; others require it. Use the "Help" function of each Search Engine to learn how to format your search arguments.
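Behind the scenes, Boolean operators correspond to simple set operations on an index: <AND> keeps pages found under both keywords, <OR> keeps pages found under either, and <AND NOT> removes pages found under the second keyword. Here is a minimal Python sketch, using a hypothetical index of page numbers, that filters out a health-club "Secondary Hit" of the kind described earlier. The function names are made up for illustration.

```python
# Hypothetical inverted index: keyword -> set of page numbers listing that keyword.
INDEX = {
    "airport": {1, 2, 3, 4},
    "health": {2, 3, 5},
    "pollution": {1, 2, 6},
    "fitness": {3},
}

def lookup(word):
    """Return the pages indexed under a keyword (empty set if none)."""
    return INDEX.get(word.lower(), set())

def op_and(a, b):
    return a & b  # pages found under both keywords

def op_or(a, b):
    return a | b  # pages found under either keyword

def op_and_not(a, b):
    return a - b  # pages found under the first keyword but not the second

# <airport AND health>: page 3 is the airport health club (a "Secondary Hit").
print(sorted(op_and(lookup("airport"), lookup("health"))))  # [2, 3]
# <airport AND health AND NOT fitness> drops it.
print(sorted(op_and_not(op_and(lookup("airport"), lookup("health")), lookup("fitness"))))  # [2]
# <airport OR health> is much broader.
print(sorted(op_or(lookup("airport"), lookup("health"))))  # [1, 2, 3, 4, 5]
```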

Basic vs. Advanced Search
Most Search Engines start out offering a "Basic Search". This usually allows a string of words to be entered. The Basic Search then automatically inserts <AND> or <OR> between each pair of words, depending on the engine, and performs the search. If you know enough to enter the <OR> operator, or even <AND NOT>, it MAY recognize them. When browsing for information in a complex field like Environmental Health, it is usually better to go directly to the "Advanced Search" option. The "Clicker" (link) for Advanced Search could be anywhere on the Search Engine's page. If you do a basic search first, the first "Results Page" almost always includes a link to "Advanced Search". Look carefully; it could be at the end of the page.

Not all "Advanced Search" options are the same. Some offer pull-down menus with choices like "Must Include All Words" (another way of inserting <AND> between each keyword). These are not very flexible and will produce a large number of Secondary Hits. Good "Advanced Search" functions are accompanied by good "Help" functions. Always check the "Search Help", and you will quickly get a handle on the Operators and Syntax of the Search Engine. If the Advanced Search offers "Boolean Search", use it.

Search the Search Engines; the Meta-Search
Some Search Engines actually search other Search Engines. They collect the results and present them as one set of 'hits'. These are called "Meta-Search" engines. One obvious advantage is that the meta-engine saves the User the effort of trying multiple search engines. A possible disadvantage is that one of the target Search Engines may return a bunch of bogus or secondary 'hits'. On the plus side, if two or more search engines return the same hit, the Meta-Search Engine presents it only once. The table on the "Search The Web" page indicates which Search Engines are "Meta-Search" Engines.
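A meta-search can be sketched as merging several ranked result lists and dropping repeats. The Python example below interleaves the engines' lists rank by rank and keeps each address only once; the interleaving order, the function name and the example.org addresses are assumptions for illustration, since each real Meta-Search Engine uses its own ranking.

```python
def meta_search(result_lists):
    """Merge ranked result lists from several engines, keeping each URL once.

    Results are interleaved rank by rank (every engine's first hit, then every
    engine's second hit, and so on); a URL already seen is skipped.
    """
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged

engine_a = ["example.org/a", "example.org/b", "example.org/c"]
engine_b = ["example.org/b", "example.org/d"]
print(meta_search([engine_a, engine_b]))
# ['example.org/a', 'example.org/b', 'example.org/d', 'example.org/c']
```

Here "example.org/b" is returned by both engines but appears only once in the merged list.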

Narrowing the Search
No matter what type of search engine or syntax you are using, you will undoubtedly receive many secondary hits, and sometimes bogus hits. To narrow the search, it's usually easiest to do a "recursive" search; you just do it over and over until you get it right!

Example of a Successful Search
Here is an example using AOL's "Netfind" to do a recursive search:
The User wants information about the impact of airports on air pollution and health.

User Enters:	Engine Reports:
Using the basic search engine, the User enters <Airport AND Pollution>	15,617 hits. Many of the hits deal with groundwater and other issues. Relevant, important hits are probably buried far down in the list. Who has the patience to look through a pile this big?
<airport AND air pollution>	The engine assumes that the User is looking for <airport AND air AND pollution>. The hit count is a little lower, but the list still includes any of the first set of hits that contain the word <Air>.
<Airport AND "air pollution"> (phrases are almost always enclosed in double quotes)	10,920 hits. Fewer, more focused hits, because the target must include the words "air" and "pollution" next to each other.
The User finds the "Advanced Search" clicker at the bottom of the first Results Page and enters <Airport AND "Air Pollution" AND Health>	6,368 hits.
The User notices that hotels and fitness centers are in the list, and enters <Airport AND "Air Pollution" AND Health AND NOT business>	1,426 hits.
The User continues to eliminate unwanted hits.	A Results list of 20 is excellent. It will probably contain what you are looking for.
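The recursive narrowing in the table above can be mimicked in Python: each round adds one more <AND NOT> term to the query and reports how many hits remain. This sketch filters a short list of hypothetical result titles locally, whereas a real search would resubmit the refined query to the engine; the titles and the `narrow` helper are made up for illustration.

```python
import re

# Hypothetical result titles for <Airport AND "Air Pollution" AND Health>.
RESULTS = [
    "Airport air pollution and community health study",
    "Airport business hotel with health club, away from air pollution",
    "Health effects of air pollution near the airport",
    "Airport business travel: health tips and air pollution news",
]

def words(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def narrow(base_query, results, excluded_terms):
    """Add one 'AND NOT term' per round and record (query, hit count) after each."""
    query, current = base_query, list(results)
    history = [(query, len(current))]
    for term in excluded_terms:
        query = f"{query} AND NOT {term}"
        current = [r for r in current if term.lower() not in words(r)]
        history.append((query, len(current)))
    return current, history

final, history = narrow('Airport AND "Air Pollution" AND Health', RESULTS, ["business", "hotel"])
for query, count in history:
    print(count, query)
```

Each round shows whether the new exclusion actually helped: here "business" halves the list, while "hotel" removes nothing further.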

Summary and Lesson Plan
1. Understand the Concepts. Use this Primer, and/or the SUNY Albany course.
2. Learn and use Boolean search techniques.
3. Experiment with two or three Search Engines. Learn how to format their search arguments.
4. Try one of the Meta-Search Engines, for example "Copernicus", listed on the "Search The Web" page.
5. Build your own set of search arguments for a particular subject.