Information retrieval

From Citizendium
imported>Robert Badgett

Revision as of 12:05, 5 April 2009

Information retrieval is defined as "a branch of computer or library science relating to the storage, locating, searching, and selecting, upon demand, relevant data on a given subject."[1] As noted by Carl Sagan, "human beings have, in the most recent few tenths of a percent of our existence, invented not only extra-genetic but also extrasomatic knowledge: information stored outside our bodies, of which writing is the most notable example."[2] The benefits of enhancing personal knowledge with retrieval of extrasomatic knowledge have been shown in a controlled comparison with rote memory.[3]

Although information retrieval is usually thought of as being done by computer, retrieval can also be done by humans for other humans.[4]

Classification by user purpose

Information retrieval can be divided into information discovery, information recovery, and information awareness.[5]

Information discovery

Information discovery is searching for information that the searcher has not seen before and does not know for sure to exist. It includes searching to answer a question at hand, and searching a topic without a specific question in order to improve knowledge of that topic.

Information recovery

Information recovery is searching for information that the searcher has seen before and knows to exist.

Information awareness

Information awareness has also been described as "'systematic serendipity' - an organized process of information discovery of that which he [the searcher] did not know existed".[5] Examples of this prior to the Internet include reading print and online periodicals. With the Internet, new methods include email newsletters[6], email alerts, and RSS feeds.[7]

Classification by indexing methods used

Document retrieval

  • Boolean
  • Vector space model (relevancy)
  • Probabilistic (Bayes)
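
The first two models in the list above can be contrasted with a small sketch. The corpus, document ids, and function names below are hypothetical, and the vector space scoring uses raw term counts with cosine similarity, which is only one of several possible weightings. The probabilistic model is not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus; ids and texts are illustrative only.
DOCS = {
    "d1": "aspirin reduces risk of heart attack",
    "d2": "aspirin may cause stomach bleeding",
    "d3": "exercise reduces risk of heart disease",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def boolean_and(query, docs):
    """Boolean model: return ids of documents containing every query term."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def vector_rank(query, docs):
    """Vector space model: rank documents by similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), d) for d, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [d for score, d in scored if score > 0]

print(boolean_and("aspirin heart", DOCS))
print(vector_rank("aspirin heart", DOCS))
```

In this sketch the Boolean query returns only the document that contains both terms, while the vector space query returns every document sharing at least one term, ordered by relevance score.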

Factors associated with unsuccessful retrieval

The field of medicine provides much research on the difficulties of information retrieval. Barriers to successful retrieval include:

  • Lack of prior experience with the information retrieval system being used[8][3]
  • Low visual spatial ability[8]
  • Poor formulation of the question to be searched[9]
  • Difficulty designing a search strategy when multiple resources are available[9]
  • "Uncertainty about how to know when all the relevant evidence has been found so that the search can stop"[9]
  • Difficulty synthesizing an answer across multiple documents[9]

Factors associated with successful retrieval

Characteristics of how the information is stored

For storage of text content, the quality of the index to the content is important. For example, stemming, which truncates words by removing suffixes, may help.[10]
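
As a rough illustration of why stemming can help an index, the sketch below strips a few common English suffixes before indexing. It is not the Porter algorithm cited above, which applies several ordered rule steps with conditions. The suffix list, the minimum stem length, and the function names are assumptions made for this example.

```python
# Hypothetical suffix list, checked in order; NOT the Porter rule set.
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LENGTH = 3  # assumed threshold to avoid over-truncating short words

def naive_stem(word):
    """Remove the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(naive_stem(token), set()).add(doc_id)
    return index

# Illustrative documents: different word forms end up under the same stem.
index = build_index({"a": "searching documents", "b": "searched document"})
print(sorted(index["search"]))
```

With stemming, a query for "search" matches both documents even though neither contains that exact word form.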

Display of information

Structured information was found to be more effective in a controlled study.[11] In addition, the structure should be layered, with a summary of the content being the first layer that the reader sees.[12] This allows the reader to take only an overview, or to choose more detail.

Regarding display of results from search engines, an interface designed to reduce anchoring and order bias may improve decision making.[13]

Characteristics of the search engine

John Battelle has described features of the perfect search engine of the future.[14] For example, Boolean searching may be an inefficient basis for a retrieval system.[15]

Characteristics of the searcher

In healthcare, searchers are more likely to be successful if their answer is correct before searching, they have experience with the system they are searching, and they have a high spatial visualization score.[8] Also in healthcare, physicians with less experience are more likely to want more information.[16] Physicians who report stress when uncertain are more likely to search textbooks than source evidence.[17]

In healthcare, using expert searchers on behalf of physicians led to increased satisfaction by the physicians with the search results.[18]

Evaluation of the quality of information retrieval

Various methods exist to evaluate the quality of information retrieval.[19][20][21]

Precision and recall

Recall is the fraction of relevant documents that are successfully retrieved. This is the same as sensitivity.

Precision is the fraction of retrieved documents that are relevant to the search. This is the same as positive predictive value.

F1 is the unweighted harmonic mean of the recall and precision.[21]
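
The three measures above can be computed from the set of retrieved documents and the set of relevant documents. The following is a minimal sketch. The function name and the document ids are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative search: 4 documents retrieved, 8 relevant in total, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5", "d6", "d7", "d8", "d9", "d10"]
precision, recall, f1 = precision_recall_f1(retrieved, relevant)
print(precision, recall, round(f1, 3))
```

Here precision is 2/4 and recall is 2/8. Because the harmonic mean is pulled toward the smaller value, F1 (about 0.33) falls below the arithmetic mean of the two (0.375).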

References

  1. National Library of Medicine. Information Storage and Retrieval. Retrieved on 2007-12-12.
  2. Sagan, Carl (1993). The Dragons of Eden: Speculations on the Evolution of Human Intelligence. New York: Ballantine Books. ISBN 0-345-34629-7. 
  3. de Bliek R, Friedman CP, Wildemuth BM, Martz JM, Twarog RG, File D (1994). "Information retrieved from a database and the augmentation of personal knowledge". J Am Med Inform Assoc 1 (4): 328–38. PMID 7719819.
  4. Mulvaney, S. A., Bickman, L., Giuse, N. B., Lambert, E. W., Sathe, N. A., & Jerome, R. N. (2008). A randomized effectiveness trial of a clinical informatics consult service: impact on evidence-based decision-making and knowledge implementation, J Am Med Inform Assoc, 15(2), 203-211. doi: 10.1197/jamia.M2461.
  5. Garfield, E. “ISI Eases Scientists’ Information Problems: Provides Convenient Orderly Access to Literature,” Karger Gazette No. 13, pg. 2 (March 1966). Reprinted as “The Who and Why of ISI,” Current Contents No. 13, pages 5-6 (March 5, 1969), which was reprinted in Essays of an Information Scientist, Volume 1: ISI Press, pages 33-37 (1977). http://www.garfield.library.upenn.edu/essays/V1p033y1962-73.pdf
  6. Roland M. Grad et al., “Impact of Research-based Synopses Delivered as Daily email: A Prospective Observational Study,” J Am Med Inform Assoc (December 20, 2007), http://www.jamia.org/cgi/content/abstract/M2563v1 (accessed December 21, 2007).
  7. Koerner B (2008). Algorithms Are Terrific. But to Search Smarter, Find a Person. Wired Magazine. Retrieved on 2008-04-04.
  8. Hersh WR, Crabtree MK, Hickam DH, et al (2002). "Factors associated with success in searching MEDLINE and applying evidence to answer clinical questions". J Am Med Inform Assoc 9 (3): 283–93. PMID 11971889. PMC 344588.
  9. Ely JW, Osheroff JA, Ebell MH, et al (March 2002). "Obstacles to answering doctors' questions about patient care with evidence: qualitative study". BMJ 324 (7339): 710. PMID 11909789. PMC 99056.
  10. Porter MF. An algorithm for suffix stripping. Program. 1980;14:130–7.
  11. Beck AL, Bergman DA (September 1986). "Using structured medical information to improve students' problem-solving performance". J Med Educ 61 (9 Pt 1): 749–56. PMID 3528494.
  12. Nielsen J (1996). Writing Inverted Pyramids in Cyberspace (Alertbox). Retrieved on 2007-12-12.
  13. Lau AY, Coiera EW (October 2008). "Can cognitive biases during consumer health information searches be reduced to improve decision making?". J Am Med Inform Assoc. DOI:10.1197/jamia.M2557. PMID 18952948.
  14. John Battelle. The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. Portfolio Trade. ISBN 1-59184-141-0. 
  15. Verhoeff J (1961). Inefficiency of the use of Boolean functions for information retrieval system. Communications of the ACM 4: 557. DOI:10.1145/366853.366861
  16. Gruppen LD, Wolf FM, Van Voorhees C, Stross JK (1988). "The influence of general and case-related experience on primary care treatment decision making". Arch. Intern. Med. 148 (12): 2657–63. PMID 3196128.
  17. McKibbon KA, Fridsma DB, Crowley RS (2007). "How primary care physicians' attitudes toward risk and uncertainty affect their use of electronic information resources". J Med Libr Assoc 95 (2): 138–46, e49–50. DOI:10.3163/1536-5050.95.2.138. PMID 17443246.
  18. Shelagh A. Mulvaney et al., “A Randomized Effectiveness Trial of a Clinical Informatics Consult Service: Impact on Evidence Based Decision-Making and Knowledge Implementation,” J Am Med Inform Assoc (December 20, 2007), http://www.jamia.org/cgi/content/abstract/M2461v1 (accessed December 21, 2007).
  19. Lancaster, Frederick Wilfrid; Warner, Amy J. (1993). Information retrieval today. Arlington, Va: Information Resources Press. ISBN 0-87815-064-1. 
  20. Hersh, William R. (2008). Information Retrieval: A Health and Biomedical Perspective (Health Informatics). Berlin: Springer. ISBN 0-387-78702-X.
  21. Trevor Strohman; Croft, Bruce; Donald Metzler (2009). Search Engines: Information Retrieval in Practice. Harlow: Addison Wesley. ISBN 0-13-607224-0.