Search engine

From Citizendium

Applications for retrieving bibliographic information fall into two broad categories: those with a controlled search vocabulary and expert-friendly command languages (e.g., MEDLINE), which return a small number of items very likely to be relevant, and those that are user-friendly and sensitive but more likely to return false positives, that is, matches irrelevant to the question. The data being searched can be on one server, a federation of servers, or an arbitrary and ever-changing set of servers.

The servers may or may not have direct access to the documents being retrieved.

Controlled vs. Free Vocabulary

A controlled vocabulary system uses a well-defined and public set of keywords, such as Medical Subject Headings (MeSH) for MEDLINE, or formal chemical notation for systems such as Chemical Abstracts. To use this sort of tool effectively, the user has to have a good idea of the content and structure of the descriptor database.

In addition to keywords, which are apt to be nouns, a search engine using Semantic Web or equivalent techniques also uses verbs and prepositions in the query (e.g., "display drugs that treat a given condition" or "drugs that cause a given side effect").
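One way to picture a query that carries a verb as well as nouns is to store facts as subject, relation, object triples and filter on the relation. The following is only a minimal sketch under that assumption; the `facts` list, the `subjects` helper, and every name in the data are hypothetical placeholders, not real pharmacology or any particular Semantic Web API.

```python
# Hypothetical facts stored as (subject, relation, object) triples.
# All names are placeholders invented for illustration.
facts = [
    ("drug_x", "treats", "condition_1"),
    ("drug_y", "treats", "condition_1"),
    ("drug_y", "causes", "side_effect_9"),
    ("drug_z", "causes", "side_effect_9"),
]

def subjects(relation, obj):
    """Return the sorted subjects linked to obj by the given relation."""
    return sorted(s for s, r, o in facts if r == relation and o == obj)

# "Drugs that treat a given condition"
treating = subjects("treats", "condition_1")

# "Drugs that cause a given side effect"
causing = subjects("causes", "side_effect_9")
```

The relation plays the role of the verb in the query: the same two nouns give different answers depending on whether the relation is `treats` or `causes`.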

Proximity searches

This kind of search not only matches multiple-word strings, but also takes qualifiers specifying that a match of multiple terms is valid only if they are in the same sentence, the same paragraph, or the same page, or are separated by no more than a certain number of words.
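The two most common qualifiers, a word-distance limit and a same-sentence requirement, can be sketched in a few lines. This is an illustrative sketch only: the function names, the simple regular-expression tokenizer, and the sample text are assumptions, and a production engine would normally use a positional index rather than rescanning the text.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def within_n_words(text, term_a, term_b, max_gap):
    """True if the terms occur with at most max_gap words between them."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) - 1 <= max_gap
               for i in pos_a for j in pos_b if i != j)

def in_same_sentence(text, term_a, term_b):
    """True if both terms occur in one sentence (split naively on . ! ?)."""
    for sentence in re.split(r"[.!?]+", text):
        words = set(tokenize(sentence))
        if term_a.lower() in words and term_b.lower() in words:
            return True
    return False

# Hypothetical sample text for illustration.
text = "The engine indexes every document. A user submits a query later."

near = within_n_words(text, "engine", "document", 2)          # two words apart
too_far = within_n_words(text, "engine", "document", 1)       # limit too tight
crosses = within_n_words(text, "document", "query", 4)        # spans a sentence break
same_sentence = in_same_sentence(text, "document", "query")   # different sentences
```

Note that the word-distance test ignores sentence boundaries, so "document" and "query" satisfy a four-word limit even though they fail the same-sentence test. The qualifiers are independent and can be combined.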

Search operations

The more powerful, expert-friendly search engines allow the user to combine the result sets of intermediate searches. For example, the user could create set 1, defined as all records containing keywords A and B, either keyword C or D, and not keyword E. Set 2 might have all records that contain B and C, but not F. As a second step, the user could ask for the entries that are in both sets.
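The two-step example above maps directly onto set operations. The sketch below assumes a tiny, hypothetical collection in which each record id is associated with its set of keywords; the `records` data and the `matching` helper are invented for illustration and do not reflect any particular engine's command language.

```python
# Hypothetical collection: record id -> keywords assigned to that record.
records = {
    1: {"A", "B", "C"},
    2: {"A", "B", "D", "E"},
    3: {"B", "C"},
    4: {"A", "B", "C", "F"},
    5: {"A", "B", "D"},
}

def matching(predicate):
    """Return the ids of all records whose keyword set satisfies predicate."""
    return {rid for rid, kws in records.items() if predicate(kws)}

# Set 1: A and B, and (C or D), and not E.
set1 = matching(lambda k: {"A", "B"} <= k and bool(k & {"C", "D"}) and "E" not in k)

# Set 2: B and C, and not F.
set2 = matching(lambda k: {"B", "C"} <= k and "F" not in k)

# Second step: entries that are in both sets.
both = set1 & set2

print(sorted(set1), sorted(set2), sorted(both))
```

Keeping the intermediate sets around lets the user refine a search step by step, reusing `set1` or `set2` in later combinations without rerunning the underlying queries.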