[Western Oregon University]


Search services index entire Web sites and their pages. These services use programs called robots or spiders to comb World Wide Web, Gopher, and FTP sites each day, adding new resources to the service's database and updating resources that have changed names or addresses. Because these databases are always adding new resources, repeating a search later may turn up new information. Each site address in the database is listed along with information about what the site contains, an outline of the information, and the number of times the site is referenced by other addresses.


A search engine is a software program that queries the search service database and returns a list of "hits" or documents which contain the terms used in the query. Many search engines sort the hits in order of relevance. Relevancy scores range from 0 to 100 and are determined by the number of times the query words or phrases occur on the page.

Relevancy or results ranking is based on a computer algorithm that identifies items which use your search term or terms more frequently than others. The idea is that a document that is more relevant to your topic probably contains your keywords more often than a less relevant document. The algorithm can also consider the position of the terms within the document (whether the words are in the title or header, in the abstract, or at the "top" or the "bottom" of the document) as well as the length of the document. Thus, a document that uses your search term five times in 50 lines is assumed to be of more value than a document that uses the term five times in 100 lines. Typically, documents are listed from high to low relevancy scores.

Relevancy scores can be a useful tool for dealing with the large data sets returned by some search engines, but do not automatically assume that a site with a low score has nothing valuable to offer or that one with a high score must be important. The ranking assumes that the author of a given document used language consistently, frequently repeating a key term rather than using synonyms for it. For example, in this section the term relevancy has been used several times while results ranking has been used only once. Thus, a search using the phrase results ranking would give this document a lower relevancy score than a search using the term relevancy. An author who wants to target a certain audience (perhaps for a commercial endeavor) can also push a document toward the top of a search list by loading it with a search term that searchers interested in the author's topic would logically use.
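The frequency idea described above can be sketched in a few lines of Python. This is only an illustration, not the formula used by any particular search engine: the function name relevancy_score is hypothetical, the score is simply the share of words that match the term scaled to 0-100, and words are counted instead of lines.

```python
import re

def relevancy_score(text, term):
    """Return a 0-100 score: the percentage of words in `text` equal to `term`.

    Assumption: real engines also weigh position (title, top of page, etc.);
    this sketch looks only at frequency relative to document length.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == term.lower())
    return 100.0 * hits / len(words)

# The term appears five times in both documents, but the second is twice as long.
short_doc = "relevancy " * 5 + "filler " * 45   # 50 words
long_doc = "relevancy " * 5 + "filler " * 95    # 100 words

short_score = relevancy_score(short_doc, "relevancy")
long_score = relevancy_score(long_doc, "relevancy")
print(short_score, long_score)
```

Under these assumptions the shorter document scores higher, which matches the reasoning that five uses in 50 lines outweigh five uses in 100 lines.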

No two search engines index exactly the same collection of sites and pages, so the same search run on two different engines will probably not generate identical results. The following entries describe good general-purpose search engines, including the Boolean operators that can be used in designing searches with each engine.


AltaVista
http://www.altavista.com

Boolean Operators Used By AltaVista
Description                Operator
To include words           +
To exclude words           -
To search for wildcards    *
Boolean operators          AND, OR, NEAR, NOT
Grouping the query         ( )

Note: Parentheses indicate the order in which the search engine should perform the query.
platinum or (nickel and palladium)
searches for documents that include the word platinum, or that include both nickel and palladium

(platinum or nickel) and palladium
searches for documents that include either platinum or nickel, and that also include palladium
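The effect of the parentheses can be checked with a small Python sketch. The four sample documents and the function names are made up for illustration; each document is reduced to the set of words it contains.

```python
# Hypothetical documents, each represented by the set of words it contains.
docs = {
    "A": {"platinum"},
    "B": {"nickel", "palladium"},
    "C": {"nickel"},
    "D": {"platinum", "palladium"},
}

def query_one(words):
    # platinum or (nickel and palladium)
    return "platinum" in words or ("nickel" in words and "palladium" in words)

def query_two(words):
    # (platinum or nickel) and palladium
    return ("platinum" in words or "nickel" in words) and "palladium" in words

first = sorted(name for name, words in docs.items() if query_one(words))
second = sorted(name for name, words in docs.items() if query_two(words))
print(first)   # documents matching the first grouping
print(second)  # documents matching the second grouping
```

Document A, which mentions only platinum, satisfies the first query but not the second, because the second grouping always requires palladium.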

AltaVista allows researchers to search only a certain element of a Web page (a constrained search). The constraining term is typed in lowercase and is followed by a colon. Thus, host:www.wou.edu will search only for pages with this host name.

Constraining Search Terms
Term          Function
For Web Pages
host:         matches pages with a particular host name
image:        searches only for images
link:         matches pages with at least one link to the URL specified
title:        searches only the text in a Web page's title
For Usenet Posts
subject:      searches only subject headers
from:         searches the from header only
newsgroup:    searches newsgroups only
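As a rough illustration of how a constrained term might be applied, the Python sketch below splits a term such as host:www.wou.edu at the first colon and tests it against a page record. The page dictionary, its keys, and both function names are assumptions made for this example; they do not describe how AltaVista stores or matches pages.

```python
def parse_constraint(term):
    """Split 'host:www.wou.edu' into ('host', 'www.wou.edu')."""
    field, sep, value = term.partition(":")  # splits at the first colon only
    if not sep or not value:
        raise ValueError(f"not a constrained term: {term!r}")
    return field, value

def page_matches(page, term):
    """Check one constrained term against a hypothetical page record."""
    field, value = parse_constraint(term)
    if field == "host":
        return page["host"] == value
    if field == "title":
        return value.lower() in page["title"].lower()
    if field == "link":
        return value in page["links"]
    raise ValueError(f"unsupported constraint in this sketch: {field}")

# A made-up page record for illustration.
page = {
    "host": "www.wou.edu",
    "title": "Searching the Internet",
    "links": ["http://www.altavista.com"],
}

print(page_matches(page, "host:www.wou.edu"))
print(page_matches(page, "title:chemistry"))
```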



Excite
http://www.excite.com
Excite is a good place to search for rare resources because it indexes more than 50 million Web pages.

Boolean Operators Used By Excite
Description         Operator
To include words    +
To exclude words    -
To group words      ""
To weight words     ^number

Weighting search words increases their importance. Normally, the search engine assumes that all the words in a search are equally important. Attaching a weight to a word raises its importance in the search: Excite looks for that word first and gives pages that contain it a high priority. For example, nuclear^9 reactor waste^3 tells Excite that nuclear is the most important search term, followed by waste and then reactor.
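The ordering in the example can be reproduced with a short Python sketch that reads the ^number suffixes. The function name is hypothetical, and giving unweighted words a default weight of 1 is an assumption; the source says only that unweighted words are treated as equally important.

```python
def parse_weighted_query(query, default_weight=1):
    """Map each word to its weight; words without ^number get default_weight.

    Assumption: the default of 1 is chosen for illustration only.
    """
    weights = {}
    for token in query.split():
        word, sep, weight = token.partition("^")
        weights[word] = int(weight) if sep else default_weight
    return weights

weights = parse_weighted_query("nuclear^9 reactor waste^3")
# Order the words from most to least important.
ranked = sorted(weights, key=weights.get, reverse=True)
print(ranked)
```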
Excite's Find Similar feature will show you pages similar to the page your search located.


HotBot
http://www.hotbot.com
Simple searching in HotBot is done by using pull-down menus instead of typing in Boolean operators. You may choose the following search methods:
The HotBot screen has a plus and minus sign on either side of the word "Modify" which can be used to add or subtract words from your search by clicking on the symbol.


Infoseek
http://www.infoseek.com

Boolean Operators Used By Infoseek
Description                                     Operator
To include words                                +
To exclude words                                -
To group words                                  ""
To find words between the words                 -
To find words within 100 words of each other    []

The search Hans-Andersen will find the entries Hans Andersen and Hans Christian Andersen.

A pull-down menu allows you to choose which Internet resources you wish to search, such as the Web, Usenet Newsgroups, Infoseek Select Sites, Web FAQs, etc.


Lycos
http://www.lycos.com

Boolean Operators Used By Lycos
Description                     Operator
To include words                +
To exclude words                -
For an exact match              .
For searching word fragments    $

wave. will return wave but not wavelength
wave$ will get wavelength as well
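The difference between the two suffixes can be sketched in Python as follows. The function lycos_match is hypothetical and handles only the two forms shown above; how Lycos treats a word with no suffix is not covered here, so the sketch rejects such patterns instead of guessing.

```python
def lycos_match(pattern, word):
    """Apply the '.' (exact) and '$' (word fragment) suffixes to one word."""
    if pattern.endswith("."):
        return word == pattern[:-1]            # exact match only
    if pattern.endswith("$"):
        return word.startswith(pattern[:-1])   # any word beginning with the fragment
    raise ValueError("this sketch handles only patterns ending in '.' or '$'")

words = ["wave", "wavelength", "waves"]
exact = [w for w in words if lycos_match("wave.", w)]
fragment = [w for w in words if lycos_match("wave$", w)]
print(exact)
print(fragment)
```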


Open Text
http://www.opentext.com

Simple searching with Open Text involves typing in search terms then using pull-down menus to indicate if the words can be used in any order or should be used as a phrase. You can also use the power search option for more flexibility. The power search option uses a blank box for typing search words or phrases, a pull-down box for indicating where the search should be conducted, and a box for the Boolean operators AND, OR, BUTNOT, NEAR and FOLLOWED BY.

NEAR looks for the search word within 80 characters before or after the search word entered on the previous line. FOLLOWED BY looks only within the 80 characters after the word on the previous line.
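A minimal Python sketch of the two proximity operators is given below. It assumes that distance is measured between the starting positions of the two words and that matching is a simple case-insensitive substring search; Open Text's actual rules may differ, and the function names are hypothetical.

```python
def positions(text, word):
    """Start offsets of every occurrence of `word` in `text` (case-insensitive)."""
    text, word = text.lower(), word.lower()
    found = []
    start = text.find(word)
    while start != -1:
        found.append(start)
        start = text.find(word, start + 1)
    return found

def near(text, first, second, window=80):
    """True if `second` starts within `window` characters before or after `first`."""
    return any(abs(b - a) <= window
               for a in positions(text, first)
               for b in positions(text, second))

def followed_by(text, first, second, window=80):
    """True if `second` starts within `window` characters after `first`."""
    return any(0 < b - a <= window
               for a in positions(text, first)
               for b in positions(text, second))

close = "Platinum is often alloyed with palladium."
far = "platinum " + "x " * 50 + "palladium"   # the words start 109 characters apart

print(near(close, "palladium", "platinum"))         # either order is accepted
print(followed_by(close, "palladium", "platinum"))  # order matters here
print(near(far, "platinum", "palladium"))
```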

There are limitations on how you can conduct Boolean searches: you can't use parentheses to group Boolean expressions, and expressions are read in order from left to right.
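To see what strict left-to-right reading means in practice, consider the Python sketch below. It is an illustration under stated assumptions: the function name is hypothetical, only AND, OR, and BUTNOT are handled (NEAR and FOLLOWED BY are left out), and a document is reduced to the set of words it contains.

```python
def evaluate_left_to_right(query, doc_words):
    """Evaluate 'term OP term OP term ...' strictly from left to right."""
    tokens = query.split()
    result = tokens[0].lower() in doc_words
    # Walk through (operator, term) pairs in the order they were typed.
    for op, term in zip(tokens[1::2], tokens[2::2]):
        present = term.lower() in doc_words
        if op == "AND":
            result = result and present
        elif op == "OR":
            result = result or present
        elif op == "BUTNOT":
            result = result and not present
        else:
            raise ValueError(f"operator not handled in this sketch: {op}")
    return result

doc = {"platinum"}

# Read left to right, this behaves like (platinum OR nickel) AND palladium.
left_to_right = evaluate_left_to_right("platinum OR nickel AND palladium", doc)

# Python's own precedence binds AND tighter: platinum OR (nickel AND palladium).
and_first = "platinum" in doc or ("nickel" in doc and "palladium" in doc)

print(left_to_right, and_first)
```

The same typed query gives different answers under the two readings, so with no parentheses available the order in which you enter the terms matters.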


WebCrawler
http://www.webcrawler.com

WebCrawler was one of the best of the early search engines. Unfortunately, it has remained relatively small and will return a relatively small number of hits. This can be useful if you are searching for a popular topic and don't want a huge number of results to review. WebCrawler is easy to use.

Boolean Operators Used By WebCrawler
Description           Operator
Boolean operators     AND, OR, NEAR/X, NOT, ADJ
Grouping the query    ( )
Phrases               ""

ADJ searches for words next to each other, in the order given. Carbon ADJ copy would find carbon copy but not copy carbon.
NEAR/X, where X is a number, finds words within X words of each other.
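The two word-proximity operators can be sketched in Python like this. The function names are hypothetical, and counting distance as the difference between word positions is an assumption; WebCrawler may count the gap differently.

```python
def adj(words, first, second):
    """True if `first` is immediately followed by `second`."""
    return any(a == first and b == second for a, b in zip(words, words[1:]))

def near_x(words, first, second, x):
    """True if the two words occur within x positions of each other, in either order."""
    first_at = [i for i, w in enumerate(words) if w == first]
    second_at = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= x for i in first_at for j in second_at)

forward = "a carbon copy of the memo".split()
backward = "copy carbon onto the sheet".split()

print(adj(forward, "carbon", "copy"))
print(adj(backward, "carbon", "copy"))
print(near_x(forward, "carbon", "memo", 4))
print(near_x(forward, "carbon", "memo", 2))
```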

Check out these sites for more information on search engines:

How to Search the Web - A Guide to Search Tools. This document gives details, with examples, of how to word simple searches, how rankings are determined, and how to conduct advanced searches with Alta Vista, Excite, WebCrawler, Lycos, Opentext, Infoseek, and Yahoo.

Chemical Information Resources on the World Wide Web. This paper compares some search engines and also gives some specific Chemical Information Sites.

Introduction to Search Engines. This document from the Kansas City Public Library reviews Alta Vista, Excite, HotBot, Infoseek, Lycos, OpenText, and WebCrawler. This page contains a chart comparing the features of these search engines. It also compares the types of documents besides web pages that the different engines index (gopher, binary files, FTP, Telnet, Newsgroups).

METASEARCH ENGINES

A metasearch engine searches other search engines by formatting your query in the proper form for each individual search engine and then submitting the query. Some metasearch engines report the results as a unified list, while others generate a separate list for each search engine. Metasearches take longer than single-engine searches but can be useful because they bring back results from many different sources. When using a metasearch engine, it is a good idea to check the help or info section before searching, because metasearch engines do not always support all of the advanced search options available on the search engines they query.
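The query-formatting step can be pictured with the Python sketch below, which rewrites one request into two operator styles taken from the tables earlier on this page. It is a simplified illustration: the function names are hypothetical, nothing is actually submitted to any engine, and the way NOT is joined to the rest of the Boolean-style query is an assumption.

```python
def plus_minus_style(include, exclude):
    """Format for engines listed on this page with the + and - operators."""
    return " ".join([f"+{w}" for w in include] + [f"-{w}" for w in exclude])

def boolean_style(include, exclude):
    """Format for engines listed with AND and NOT keywords.

    Assumption: excluded words are appended as 'NOT word'; the exact rule
    for combining NOT with other operators is not given in this guide.
    """
    query = " AND ".join(include)
    for word in exclude:
        query += f" NOT {word}"
    return query

# Which style each engine gets, based on the operator tables above.
FORMATTERS = {
    "Excite": plus_minus_style,
    "Lycos": plus_minus_style,
    "WebCrawler": boolean_style,
}

def format_for_all(include, exclude):
    """Return the query text that would be sent to each engine."""
    return {engine: fmt(include, exclude) for engine, fmt in FORMATTERS.items()}

queries = format_for_all(["nickel", "palladium"], ["platinum"])
for engine, text in queries.items():
    print(engine, "->", text)
```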


ALL-IN-ONE Search Page is a compilation of many forms-based tools found on the Internet. You can select from the following search categories: Each of these categories offers many different search engines.


CUSI (Configurable Unified Search Engine) is a search interface which allows you to quickly check a series of related resources without having to retype your search words.


Metacrawler sends your queries to several search engines such as AltaVista, Excite, Galaxy, HotBot, Lycos, WebCrawler, and Yahoo. It performs the search, throws out duplicate entries, and organizes the results into a unified list. Metacrawler searches sometimes take several minutes, and no status report is shown on your screen as the search proceeds. You can limit your search to sites on a specific continent, country, or domain to decrease the search time. In addition, you may specify a search time limit (1, 2, or 5 minutes). There is an advanced searching syntax which allows searching for phrases, all the words in the query, or any of the words in the query. To use the advanced features, choose Configuration, select your choices, and Save Configuration.
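The step of throwing out duplicates and building a unified list might look something like the Python sketch below. The function name, the sample URLs, and the rule for spotting duplicates (ignoring a trailing slash) are all assumptions made for illustration, not a description of how Metacrawler works internally.

```python
def merge_results(results_by_engine):
    """Combine per-engine result lists into one list without duplicates."""
    seen = set()
    unified = []
    for urls in results_by_engine.values():
        for url in urls:
            key = url.rstrip("/")  # assumption: a trailing slash does not make a new page
            if key not in seen:
                seen.add(key)
                unified.append(url)
    return unified

# Made-up results from two engines; one page is reported by both.
results = {
    "AltaVista": ["http://example.com/platinum", "http://example.com/nickel/"],
    "Lycos": ["http://example.com/nickel", "http://example.com/palladium"],
}

unified = merge_results(results)
print(unified)
```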

Operators Allowed By Metacrawler
Description         Operator
To include words    +
To exclude words    -
To group words      ""



SavvySearch is a simultaneous search engine that can be queried in 20 different languages. From the keyword in your query, it will select search engines that should be appropriate to the search. SavvySearch includes the following sources and types of information:

You can limit the number of documents that you want returned from each search engine (10, 20, 30, 40, 50) and also how much is displayed on the screen (brief, normal, verbose). When verbose is chosen, URLs will be displayed in the report. The results can be displayed either by search engine or as a uniform list.


Search.com is c/net's clearinghouse for the top 250 search engines for designated subject categories. There are several search options available. The Express Search option is a quick and dirty search tool. It gives quick hits but doesn't cover every category. In this mode, you type in your query and then choose your search engine (all the major ones described above are available). You can do a straightforward search where you type in your query and then specify Search the Web or Usenet. You can also search the categories listed on the yellow bar that runs down the left-hand side of the search window. For example, one entry is Search Subjects. When you click on this, a menu of choices appears. One interesting menu item is Find a Search, where you type in a keyword and get a list of suggested Web searches that should fit your query.


Yahoo's List of All-In-One Search Pages allows you to search all of Yahoo or only the All-in-One Search Pages. From here, you can also travel via hyperlink to another large list of search engines. You may tailor your search via the options feature: search Yahoo, Usenet, or e-mail addresses; search various Yahoo categories; search only new listings (the default is the past 3 years; the other choices are one day, one week, or one month); use the Boolean AND and OR; etc.

Copyright © 1997 Western Oregon University
Direct suggestions, comments, and questions about this page to Arlene Courtney, courtna@wou.edu.
Last Modified January 20, 1999