Information retrieval: Difference between revisions

From Citizendium
imported>Robert Badgett
{{subpages}}
{{TOC|right}}
'''Information retrieval''' is defined as "a branch of computer or library science relating to the storage, locating, searching, and selecting, upon demand, relevant data on a given subject."<ref name="title">{{cite web |url=http://www.nlm.nih.gov/cgi/mesh/2007/MB_cgi?term=Information+Storage+and+Retrieval |title=Information Storage and Retrieval |accessdate=2007-12-12 |author=National Library of Medicine |authorlink= |coauthors= |date= |format= |work= |publisher= |pages= |language= |archiveurl= |archivedate= |quote=}}</ref> As noted by [[Carl Sagan]], "human beings have, in the most recent few tenths of a percent of our existence, invented not only extra-genetic but also extrasomatic knowledge: information stored outside our bodies, of which writing is the most notable example."<ref name="isbn0-345-34629-7">{{cite book |author=Sagan, Carl |authorlink= |editor= |others= |title=The Dragons of Eden: Speculations on the Evolution of Human Intelligence |edition= |language= |publisher=Ballantine Books |location=New York |year=1993 |origyear= |pages= |quote= |isbn=0-345-34629-7 |oclc= |doi= |url= |accessdate=}}</ref> The benefits of enhancing personal knowledge with retrieval of extrasomatic knowledge or transactive memory have been shown in comparisons with rote memory.<ref name="pmid7719819">{{cite journal |author=de Bliek R, Friedman CP, Wildemuth BM, Martz JM, Twarog RG, File D |title=Information retrieved from a database and the augmentation of personal knowledge |journal=J Am Med Inform Assoc |volume=1 |issue=4 |pages=328–38 |year=1994 |pmid=7719819 |doi=}}</ref><ref>{{Cite journal
| doi = 10.1126/science.1207745
| last = Sparrow
| first = Betsy
| coauthors = Jenny Liu, Daniel M. Wegner
| title = Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips
| journal = Science
| accessdate = 2011-07-16
| date = 2011-07-14
| url = http://www.sciencemag.org/content/early/2011/07/13/science.1207745.abstract
}}</ref>


==Classification==
Although information retrieval is usually thought of as being done by computer, retrieval can also be done by humans for other humans.<ref> Mulvaney, S. A., Bickman, L., Giuse, N. B., Lambert, E. W., Sathe, N. A., & Jerome, R. N. (2008). [http://www.jamia.org/cgi/content/full/15/2/203 A randomized effectiveness trial of a clinical informatics consult service: impact on evidence-based decision-making and knowledge implementation], J Am Med Inform Assoc, 15(2), 203-211. doi: 10.1197/jamia.M2461.</ref> In addition, some Internet search engines such as [http://mahalo.com mahalo.com] and http://www.chacha.com/ may have human supervision or editors.
 
Some Internet search engines, such as http://www.deeppeep.org and http://www.deepdyve.com/, attempt to index the Deep Web, which consists of web pages that are not normally public.<ref>Wright A. (2009) [http://www.nytimes.com/2009/02/23/technology/internet/23search.html Exploring a ‘Deep Web’ That Google Can’t Grasp]. New York Times.</ref>
 
The usefulness of a search engine has been proposed to be:<ref name="pmid7964548">{{cite journal |author=Shaughnessy AF, Slawson DC, Bennett JH |title=Becoming an information master: a guidebook to the medical information jungle |journal=J Fam Pract |volume=39 |issue=5 |pages=489–99 |year=1994 |month=November |pmid=7964548 |doi= |url= |issn=}}</ref>
 
:<math> \mbox{Usefulness}=\frac{\mbox{Relevance} * \mbox{Validity}}{\mbox{Work}} </math>
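
The formula can be illustrated with a short sketch. The numeric scores below are hypothetical values chosen only for illustration (relevance and validity on a 0 to 1 scale, work in arbitrary units such as minutes); the cited source does not prescribe these scales.

```python
def usefulness(relevance: float, validity: float, work: float) -> float:
    """Usefulness = (Relevance * Validity) / Work, as in the formula above."""
    if work <= 0:
        raise ValueError("work must be positive")
    return relevance * validity / work


# Hypothetical comparison: a quick but less rigorous resource versus a
# rigorous resource that takes five times as much work to consult.
quick_resource = usefulness(relevance=0.9, validity=0.6, work=2.0)
rigorous_resource = usefulness(relevance=0.9, validity=0.9, work=10.0)
print(quick_resource, rigorous_resource)
```

Under these assumed scores the quicker resource comes out as more useful, because the extra validity of the rigorous resource does not offset the additional work.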
 
==Classification by user purpose==
Information retrieval can be divided into information discovery, information recovery, and information awareness.<ref name="garfiled1966">Garfield, E. “ISI Eases Scientists’ Information Problems: Provides Convenient Orderly Access to Literature,” Karger Gazette No. 13, pg. 2 (March 1966). Reprinted as “The Who and Why of ISI,” Current Contents No. 13, pages 5-6 (March 5, 1969), which was reprinted in Essays of an Information Scientist, Volume 1: ISI Press, pages 33-37 (1977).  http://www.garfield.library.upenn.edu/essays/V1p033y1962-73.pdf</ref>




===Information awareness===
Information awareness has also been described as "'systematic serendipity' - an organized process of information discovery of that which he <nowiki>[the searcher]</nowiki> did not know existed".<ref name="garfiled1966"/> Information awareness can be further divided into:<ref name="pmid21099399">{{cite journal| author=Tanna GV, Sood MM, Schiff J, Schwartz D, Naimark DM| title=Do E-mail Alerts of New Research Increase Knowledge Translation? A "Nephrology Now" Randomized Control Trial. | journal=Acad Med | year= 2011 | volume= 86 | issue= 1 | pages= 132-138 | pmid=21099399 | doi=10.1097/ACM.0b013e3181ffe89e | pmc= | url= }} </ref>
* Information familiarity
* Knowledge acquisition (also called recollection), which is the ability to apply the new knowledge.
 
Examples of information awareness prior to the Internet include reading print and online periodicals. With the Internet, new methods include email newsletters<ref name="pmidpendingGrad">Roland M. Grad et al., “Impact of Research-based Synopses Delivered as Daily email: A Prospective Observational Study,” J Am Med Inform Assoc (December 20, 2007), http://www.jamia.org/cgi/content/abstract/M2563v1 (accessed December 21, 2007).</ref>, email alerts, and RSS feeds.<ref name="titleAlgorithms Are Terrific. But to Search Smarter, Find a Person">{{cite web |url=http://www.wired.com/techbiz/it/magazine/16-04/bz_curator |title=Algorithms Are Terrific. But to Search Smarter, Find a Person |accessdate=2008-04-04 |author=Koerner B |authorlink= |coauthors= |date=2008 |format= |work= |publisher=Wired Magazine |pages= |language= |archiveurl= |archivedate= |quote=}}</ref>
 
These methods may increase information familiarity.<ref name="pmid21099399"/>


==Classification by indexing methods used==
===Document retrieval===
Models for information retrieval of documents are based on either the text of the document or links to and from the document and other documents.<ref name="isbn0321416910">{{cite book |author=Berthier Ribeiro-Neto; Ricardo Baeza-Yates; Ribeiro, Berthier de Araújo Neto |authorlink= |editor= |others= |title=Modern information retrieval |edition= |language= |publisher=Addison-Wesley |location=Boston |year=2009 |origyear= |pages= |quote= |isbn=0-321-41691-0 |oclc= |doi= |url= |accessdate=}}</ref>
 
Models based on analysis of the text include the Boolean, vector space, and probabilistic models.<ref name="isbn0321416910"/>
 
Models based on analysis of the links include [[PageRank]], HITS, and [[impact factor]].<ref name="isbn0321416910"/>
 
====Boolean (set theoretic, exact matching)====
Variants of the boolean model include:<ref name="isbn0321416910"/>
* Fuzzy logic (used with thesauri)
* Extended boolean
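
Exact-match Boolean retrieval can be sketched with an inverted index and set operations. This is a minimal illustration, not taken from the cited text; the documents, the whitespace tokenizer, and the function name <code>build_index</code> are assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical three-document collection.
docs = {
    1: "aspirin reduces risk of stroke",
    2: "aspirin and bleeding risk",
    3: "statins reduce stroke risk",
}
index = build_index(docs)

and_result = index["aspirin"] & index["stroke"]  # aspirin AND stroke
or_result = index["aspirin"] | index["stroke"]   # aspirin OR stroke
not_result = index["risk"] - index["aspirin"]    # risk NOT aspirin
print(and_result, or_result, not_result)
```

A document either satisfies the query or it does not, so the result is an unranked set. That is the main contrast with the vector space model.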
 
====Vector space model (relevancy, algebraic, partial match, ranking)====
Relevancy is determined by weighting concept ''i'' in a document ''j'' by (tf-idf weighting):<ref>{{cite book |editor=Fagan, Lawrence Marvin; Shortliffe, Edward Hance; Perreault, Leslie E.; Wiederhold, Gio |title=Medical Informatics: Computer Applications in Health Care and Biomedicine |publisher=Springer |location=Berlin |year=2001 |pages=549 |isbn=0-387-98472-0 |oclc= |doi= |accessdate=|chapter=Information Retrieval Systems|author=Hersh, William R}}</ref>
<br/>
<math>\text{Weight}_{ij} = \text{Term frequency}_{ij} \times \text{Inverse document frequency}_{i}</math>
<div>Where</div>
<math>\text{Inverse document frequency}_{i} =  \text{log} \left ( \frac{\text{number of documents}}{\text{number of documents with term }i} \right ) + 1 </math>
<div>and</div>
<math>\text{Term frequency}_{ij} = \text{log}\left ( \text{frequency of term } i \text{ in document } j\right )</math>
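
The weighting above can be sketched in a few lines. This is an illustrative sketch only: the natural logarithm, the whitespace tokenizer, the sample documents, and the function name <code>tf_idf_weights</code> are assumptions, since the formulas do not specify a log base or a tokenization method.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per document, following the formulas above."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Number of documents that contain each term at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # log(frequency in document) * (log(N / n_i) + 1)
            term: math.log(freq) * (math.log(n_docs / doc_freq[term]) + 1)
            for term, freq in counts.items()
        })
    return weights


# Hypothetical collection of three short documents.
docs = ["aspirin aspirin stroke", "stroke risk", "aspirin bleeding"]
weights = tf_idf_weights(docs)
print(weights[0])
```

Note that, with the term frequency defined as a plain logarithm, a term that occurs exactly once in a document receives a weight of zero, because log(1) = 0.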
Variants of the vector space model include:<ref name="isbn0321416910"/>
* Generalized vector space model (allows for correlated search terms)
* Latent semantic indexing model (allows for search for synonymous concepts rather than literal search terms)
* Neural network model
 
====Probabilistic (Bayes)====
Variants of the probabilistic model include:<ref name="isbn0321416910"/>
* Inference network
* Belief network
 
====Analysis of links====
{{main|PageRank|Impact factor}}
 
==Factors associated with ''unsuccessful'' retrieval==
The field of [[medicine]] provides much research on the difficulties of information retrieval. Barriers to successful retrieval include:
* Lack of prior experience with the information retrieval system being used<ref name="pmid11971889">{{cite journal |author=Hersh WR, Crabtree MK, Hickam DH, ''et al'' |title=Factors associated with success in searching MEDLINE and applying evidence to answer clinical questions |journal=J Am Med Inform Assoc |volume=9 |issue=3 |pages=283–93 |year=2002 |pmid=11971889 |pmc=344588 |doi= |url=http://www.jamia.org/cgi/pmidlookup?view=long&pmid=11971889 |issn=}}</ref><ref name="pmid7719819"/>
* Low visual spatial ability<ref name="pmid11971889"/>
* Poor formulation of the question to be searched<ref name="pmid11909789">{{cite journal |author=Ely JW, Osheroff JA, Ebell MH, ''et al'' |title=Obstacles to answering doctors' questions about patient care with evidence: qualitative study |journal=BMJ |volume=324 |issue=7339 |pages=710 |year=2002 |month=March |pmid=11909789 |pmc=99056 |doi= |url=http://bmj.com/cgi/pmidlookup?view=long&pmid=11909789 |issn=}}</ref>
* Difficulty designing a search strategy when multiple resources are available<ref name="pmid11909789"/>
* "Uncertainty about how to know when all the relevant evidence has been found so that the search can stop"<ref name="pmid11909789"/>
* Difficulty synthesizing an answer across multiple documents<ref name="pmid11909789"/>
 
==Factors associated with ''successful'' retrieval==
===Characteristics of how the information is stored===
For storage of text content, the quality of the index to the content is important. For example, stemming (truncating words by removing suffixes) may help.<ref>Porter MF. An algorithm for suffix stripping. Program.  1980;14:130–7.</ref>
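
The idea of suffix stripping can be sketched as follows. This is a deliberately crude illustration and not the Porter algorithm, which applies several ordered rule phases with conditions on the remaining stem; the suffix list, the minimum stem length, and the function name <code>crude_stem</code> are assumptions.

```python
# Illustrative suffix list, checked in order; not the Porter rule set.
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LENGTH = 3  # assumed guard so that short words are left alone


def crude_stem(word: str) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


stems = [crude_stem(w) for w in ["searching", "searched", "searches", "search"]]
print(stems)
```

Indexing the stem rather than the surface form lets a query for one variant of a word match documents that contain the others.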
====Display of information====
Information that is structured may be more effective according to controlled studies.<ref name="pmid17873258">{{cite journal| author=Schwartz LM, Woloshin S, Welch HG| title=The drug facts box: providing consumers with simple tabular data on drug benefit and harm. | journal=Med Decis Making | year= 2007 | volume= 27 | issue= 5 | pages= 655-62 | pmid=17873258 | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=17873258 | doi=10.1177/0272989X07306786 }} </ref><ref name="pmid3528494">{{cite journal |author=Beck AL, Bergman DA |title=Using structured medical information to improve students' problem-solving performance |journal=J Med Educ |volume=61 |issue=9 Pt 1 |pages=749–56 |year=1986 |month=September |pmid=3528494 |doi= |url= |issn=}}</ref> In addition, the structure should be layered, with a summary of the content being the first layer that the reader sees.<ref name="titleWriting Inverted Pyramids in Cyberspace (Alertbox)">{{cite web |author=Nielsen J | date=1996| url=http://www.useit.com/alertbox/9606.html |title=Writing Inverted Pyramids in Cyberspace (Alertbox) |accessdate=2007-12-12 |format= |work=}}</ref> This allows the reader to take only an overview, or choose more detail. Some Internet search engines such as http://www.kosmix.com/ try to organize search results beyond a one-dimensional list of results.
 
Regarding display of results from search engines, an interface designed to reduce anchoring and order bias may improve decision making.<ref name="pmid18952948">{{cite journal |author=Lau AY, Coiera EW |title=Can cognitive biases during consumer health information searches be reduced to improve decision making? |journal=J Am Med Inform Assoc |volume= |issue= |pages= |year=2008 |month=October |pmid=18952948 |doi=10.1197/jamia.M2557 |url=http://www.jamia.org/cgi/pmidlookup?view=long&pmid=18952948 |issn=}}</ref>
 
===Characteristics of the search engine===
John Battelle has described features of the perfect search engine of the future.<ref name="isbn1-59184-141-0">{{cite book |author=John Battelle |title=The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture |publisher=Portfolio Trade |location= |year= |pages= |isbn=1-59184-141-0 |oclc= |doi=}}</ref> For example, the use of Boolean searching may be inefficient.<ref>Verhoeff, J.  Inefficiency of the use of Boolean functions for information retrieval system. Communications of the ACM. 1961;4:557 {{doi|10.1145/366853.366861}}</ref> Meta-searching and task-based searching may improve decision velocity.<ref name="pmid18579828">{{cite journal| author=Coiera E, Westbrook JI, Rogers K| title=Clinical decision velocity is increased when meta-search filters enhance an evidence retrieval system. | journal=J Am Med Inform Assoc | year= 2008 Sep-Oct | volume= 15 | issue= 5 | pages= 638-46 | pmid=18579828 | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=18579828 | doi=10.1197/jamia.M2765 | pmc=PMC2528038 }} </ref>
 
====Meta-search====
Meta-search engines search multiple resources and integrate the results for the user. Examples in health care include [http://www.tripdatabase.com/ Trip Database], [http://plus.mcmaster.ca/macplusfs/ MacPLUS], and [http://www.chi.unsw.edu.au/chiweb.nsf/page/QuickClinical QuickClinical].
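
How a meta-search engine might integrate results can be sketched as a round-robin merge with duplicate removal. This is one simple strategy among many and is not a description of how the systems named above work; the source names, the stand-in search functions, and the document ids are hypothetical.

```python
from itertools import zip_longest


def meta_search(query, sources):
    """Query every source and interleave the ranked lists, dropping duplicates.

    `sources` maps a source name to a callable that returns a ranked list
    of document ids for the query.
    """
    ranked_lists = [search(query) for search in sources.values()]
    merged, seen = [], set()
    # Take the first hit from every source, then the second, and so on.
    for tier in zip_longest(*ranked_lists):
        for doc in tier:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical stand-ins for calls to two real search services.
sources = {
    "source_a": lambda q: ["doc1", "doc2", "doc3"],
    "source_b": lambda q: ["doc2", "doc4"],
}
results = meta_search("aspirin stroke", sources)
print(results)
```

Round-robin merging treats every source as equally trustworthy. A real system would likely weight sources or rescore the combined results.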


===Characteristics of the searcher===
In healthcare, searchers are more likely to be successful if their answer is correct before searching, they have experience with the system they are searching, and they have a high spatial visualization score.<ref name="pmid11971889"/> Also in healthcare, physicians with less experience are more likely to want more information.<ref name="pmid3196128">{{cite journal |author=Gruppen LD, Wolf FM, Van Voorhees C, Stross JK |title=The influence of general and case-related experience on primary care treatment decision making |journal=Arch. Intern. Med. |volume=148 |issue=12 |pages=2657–63 |year=1988 |pmid=3196128 |doi=}}</ref> Physicians who report stress when uncertain are more likely to search textbooks than source evidence.<ref name="pmid17443246">{{cite journal |author=McKibbon KA, Fridsma DB, Crowley RS |title=How primary care physicians' attitudes toward risk and uncertainty affect their use of electronic information resources |journal=J Med Libr Assoc |volume=95 |issue=2 |pages=138–46, e49–50 |year=2007 |pmid=17443246 |doi=10.3163/1536-5050.95.2.138}}</ref>
 
In healthcare, using expert searchers on behalf of physicians led to increased satisfaction by the physicians with the search results.<ref name="PMIDPending_Shelagh">Shelagh A. Mulvaney et al., “A Randomized Effectiveness Trial of a Clinical Informatics Consult Service: Impact on Evidence Based Decision-Making and Knowledge Implementation,” J Am Med Inform Assoc (December 20, 2007), http://www.jamia.org/cgi/content/abstract/M2461v1 (accessed December 21, 2007).</ref>
 
Use of term overlap is associated with success.<ref name="pmid9794316">{{cite journal| author=Hersh WR, Hickam DH| title=How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review. | journal=JAMA | year= 1998 | volume= 280 | issue= 15 | pages= 1347-52 | pmid=9794316 | doi= | pmc= | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=9794316  }} </ref>
 
==Impact of information retrieval==
The benefits of enhancing personal knowledge with retrieval of extrasomatic knowledge have been shown in a controlled comparison with rote memory.<ref name="pmid7719819"/>
Various before-and-after comparisons are summarized in the tables.
 
{| class="wikitable"
|+ Impact of medical searching by physicians and medical students<ref name="pmid16929042">{{cite journal |author=McKibbon KA, Fridsma DB |title=Effectiveness of clinician-selected electronic information resources for answering primary care physicians' information needs |journal=J Am Med Inform Assoc |volume=13 |issue=6 |pages=653–9 |year=2006 |pmid=16929042 |pmc=1656967 |doi=10.1197/jamia.M2087 |url=http://www.jamia.org/cgi/pmidlookup?view=long&pmid=16929042 |issn=}}</ref><ref name="pmid15800302">{{cite journal |author=Westbrook JI, Gosling AS, Coiera EW |title=The impact of an online evidence system on confidence in decision making in a controlled setting |journal=Med Decis Making |volume=25 |issue=2 |pages=178–85 |year=2005 |pmid=15800302 |doi=10.1177/0272989X05275155 |url=http://mdm.sagepub.com/cgi/pmidlookup?view=long&pmid=15800302 |issn=}}</ref><ref name="pmid18579828">{{cite journal |author=Coiera E, Westbrook JI, Rogers K |title=Clinical Decision Velocity is Increased when Meta-search Filters Enhance an Evidence Retrieval System |journal=J Am Med Inform Assoc |volume=15 |issue=5 |pages=638–46 |year=2008 |pmid=18579828 |pmc=2528038 |doi=10.1197/jamia.M2765 |url=http://www.jamia.org/cgi/pmidlookup?view=long&pmid=18579828 |issn=}} [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=citizendium&pubmedid=18579828 Pubmed Central]</ref><ref name="pmid11971889"/>
! rowspan="2"|Search engine!!rowspan="2"| Users!!rowspan="2"| Questions!! colspan="2"|Portion of answers correct||rowspan="2"|Portion of answers that moved from correct to incorrect
|-
! Before searching!!After searching
|-
| Quick Clinical<ref name="pmid15800302"/><ref name="pmid18579828"/><br/>(federated search)|| 73 practicing doctors and clinical nurse consultants||[http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2528038&rendertype=table&id=tbl2 Eight clinical questions]<br/>600 total responses||align="center"|37%||align="center"|50%||align="center"|7%
|-
| User's own choice<ref name="pmid16929042"/>||23 primary care physicians|| 2 questions from a pool of [http://www.jamia.org/cgi/content/full/13/6/653/FOO1 23 clinical questions] from Hersh<ref name="pmid11971889"/><br/>46 total responses||align="center"|39%||align="center"|42%||align="center"|11%
|-
| OVID<ref name="pmid11971889"/>|| 45 senior medical students (data available for nursing students)||5 questions from a pool of [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=344588&rendertype=table&id=t2 23 clinical questions] from Hersh<ref name="pmid11971889"/><br/>324 total responses||align="center"|32%||align="center"|52%||align="center"|13%
|}
 
{| class="wikitable"
|+ Frequency that searching changed medical care.<ref name="pmid21949275">{{cite journal| author=Izcovich A, Malla CG, Diaz MM, Manzotti M, Catalano HN| title=Impact of facilitating physician access to relevant medical literature on outcomes of hospitalised internal medicine patients: a randomised controlled trial. | journal=Evid Based Med | year= 2011 | volume= 16 | issue= 5 | pages= 131-5 | pmid=21949275 | doi=10.1136/ebmed-2011-100117 | pmc= | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=21949275  }}  [http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=21949274 Review in: Evid Based Med. 2011 Oct;16(5):129-30] </ref><ref name="pmid15109337">{{cite journal| author=Lucas BP, Evans AT, Reilly BM, Khodakov YV, Perumal K, Rohr LG et al.| title=The impact of evidence on physicians' inpatient treatment decisions. | journal=J Gen Intern Med | year= 2004 | volume= 19 | issue= 5 Pt 1 | pages= 402-9 | pmid=15109337 | doi=10.1111/j.1525-1497.2004.30306.x | pmc=PMC1492243 | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=15109337  }} </ref><ref name="pmid12634206">{{cite journal |author=Crowley SD, Owens TA, Schardt CM, ''et al'' |title=A Web-based compendium of clinical questions and medical evidence to educate internal medicine residents |journal=Acad Med |volume=78 |issue=3 |pages=270–4 |year=2003 |month=March |pmid=12634206 |doi= |url=http://meta.wkhealth.com/pt/pt-core/template-journal/lwwgateway/media/landingpage.htm?issn=1040-2446&volume=78&issue=3&spage=270 |issn=}}</ref><ref name="pmid1600426">{{cite journal |author=Marshall JG |title=The impact of the hospital library on clinical decision making: the Rochester study |journal=Bull Med Libr Assoc |volume=80 |issue=2 |pages=169–78 |year=1992 |month=April |pmid=1600426 |pmc=PMC225641 |doi= 
|url=http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=citizendium&pubmedid=1600426 |issn=}}</ref><ref name="pmid3450340">{{cite journal |author=King DN |title=The contribution of hospital library information services to clinical care: a study in eight hospitals |journal=Bull Med Libr Assoc |volume=75 |issue=4 |pages=291–301 |year=1987 |month=October |pmid=3450340 |pmc=PMC227744 |doi= |url=http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=citizendium&pubmedid=3450340 |issn=}}</ref>
! &nbsp;!! Searches!!Frequency useful information found||Frequency changed care
|-
| Izcovich<ref name="pmid21949275"/><br/>2011||[[Randomized controlled trial|RCT]] of 407 inpatients compared to 402 control inpatients<br/>Searcher sought answers to questions that arose during "morning report".<br/>Search resources ''did not'' include UpToDate. Results emailed to teams.||&nbsp;||No difference between study groups
|-
| Lucas<ref name="pmid15109337"/><br/>2004||Before-and-after study of 146 inpatients<br/>Searcher sought answers to corroborate principal treatment decisions for ''all'' patients.<br/>Search resources included UpToDate. Search results given to attendings.<br/>Blinded outcome assessment||&nbsp;||&bull;&nbsp;Treatments changed in 18%<br/>&bull;&nbsp;Treatments improved in 14%
|-
| Crowley<ref name="pmid12634206"/><br/>2003|| 625 self-initiated searches, uncontrolled study||align="center"|83%||39%
|-
| Rochester study<ref name="pmid1600426"/><br/>1992||uncontrolled study||&nbsp;||80%
|-
| Chicago study<ref name="pmid3450340"/><br/>1987||questions searched by librarians in response to physician queries; uncontrolled study||&nbsp;||74%
|}
 
[[Critical incident]] studies can also document impact of information retrieval.<ref name="pmid16798071">{{cite journal| author=Westbrook JI, Coiera EW, Sophie Gosling A, Braithwaite J| title=Critical incidents and journey mapping as techniques to evaluate the impact of online evidence retrieval systems on health care delivery and patient outcomes. | journal=Int J Med Inform | year= 2007 Feb-Mar | volume= 76 | issue= 2-3 | pages= 234-45 | pmid=16798071
| url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=clinical.uthscsa.edu/cite&email=badgett@uthscdsa.edu&retmode=ref&cmd=prlinks&id=16798071 | doi=10.1016/j.ijmedinf.2006.03.006 }}</ref><ref name="pmid8505815">{{cite journal| author=Lindberg DA, Siegel ER, Rapp BA, Wallingford KT, Wilson SR| title=Use of MEDLINE by physicians for clinical problem solving. | journal=JAMA | year= 1993 Jun 23-30 | volume= 269 | issue= 24 | pages= 3124-9 | pmid=8505815
| url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=clinical.uthscsa.edu/cite&email=badgett@uthscdsa.edu&retmode=ref&cmd=prlinks&id=8505815 }}</ref>
 
==Evaluation of the quality of information retrieval==
{{Image|600px-Survival-curve.png|right|350px|Survival curve modeling amount of time taken to answer questions. The units for time are arbitrary and meaningless in this example.}}
{{Image|600px-Logistic-curve.svg.png|right|350px|Logistic curve modeling rate of correct answers over time. The units for time are arbitrary and meaningless in this example.}}
Various methods exist to evaluate the quality of information retrieval.<ref name="isbn0-87815-064-1">{{cite book |author=Lancaster, Frederick Wilfrid; Warner, Amy J. |authorlink= |editor= |others= |title=Information retrieval today |edition= |language= |publisher=Information Resources Press |location=Arlington, Va |year=1993 |origyear= |pages= |quote= |isbn=0-87815-064-1 |oclc= |doi= |url= |accessdate=}}</ref><ref name="isbn0-387-78702-X">{{cite book |author=Hersh, William R. |authorlink= |editor= |others= |title=Information Retrieval: A Health and Biomedical Perspective (Health Informatics) |edition= |language= |publisher=Springer |location=Berlin |year=2008 |origyear= |pages= |quote= |isbn=0-387-78702-X |oclc= |doi= |url= |accessdate=}} [http://books.google.com/books?id=H3f9xsW0a_8C&printsec=toc Google books]</ref><ref name="isbn0-13-607224-0">{{cite book |author=Trevor Strohman; Croft, Bruce; Donald Metzler |authorlink= |editor= |others= |title=Search Engines: Information Retrieval in Practice |edition= |language= |publisher=Addison Wesley |location=Harlow |year=2009 |origyear= |pages= |quote= |isbn=0-13-607224-0 |oclc= |doi= |url=http://www.pearsonhighered.com/croft1epreview/ |accessdate=}}</ref> Hersh<ref name="isbn0-387-78702-X"/> noted the classification of evaluation developed by Lancaster and Warner<ref name="isbn0-87815-064-1"/> in which the first level of evaluation is:
* Costs/resources consumed in learning and using a system
* Time needed to use the system<ref name="pmid11903763">{{cite journal |author=Cabell CH, Schardt C, Sanders L, Corey GR, Keitz SA |title=Resident utilization of information technology |journal=J Gen Intern Med |volume=16 |issue=12 |pages=838–44 |year=2001 |month=December |pmid=11903763 |pmc=1495306 |doi= |url= |issn=}}</ref>
* Quality of the results.
** Coverage. An estimate of coverage can be crudely automated.<ref name="pmid17641755">{{cite journal |author=Fenton SH, Badgett RG |title=A comparison of primary care information content in UpToDate and the National Guideline Clearinghouse |journal=J Med Libr Assoc |volume=95 |issue=3 |pages=255–9 |year=2007 |month=July |pmid=17641755 |pmc=1924927 |doi=10.3163/1536-5050.95.3.255 |url= |issn=}}</ref> However, more accurate judgment of relevance requires a human judge, which introduces subjectivity.<ref>Hersh WR, Buckley C, Leone TJ, Hickam DH, [http://medir.ohsu.edu/~hersh/sigir-94-ohsumed.pdf OHSUMED:  An interactive retrieval evaluation and new large test collection for research], Proceedings of the 17th Annual ACM SIGIR Conference, 1994, 192-201. </ref>
** Precision and recall
** Novelty. This has been judged by independent reviewers.<ref name="pmid15109337">{{cite journal |author=Lucas BP, Evans AT, Reilly BM, ''et al'' |title=The impact of evidence on physicians' inpatient treatment decisions |journal=J Gen Intern Med |volume=19 |issue=5 Pt 1 |pages=402–9 |year=2004 |month=May |pmid=15109337 |pmc=1492243 |doi=10.1111/j.1525-1497.2004.30306.x |url= |issn=}}</ref>
** Completeness and accuracy of results. An easy method of assessing this is to let the searcher make a subjective assessment.<ref name="pmid12634206">{{cite journal |author=Crowley SD, Owens TA, Schardt CM, ''et al'' |title=A Web-based compendium of clinical questions and medical evidence to educate internal medicine residents |journal=Acad Med |volume=78 |issue=3 |pages=270–4 |year=2003 |month=March |pmid=12634206 |doi= |url=http://meta.wkhealth.com/pt/pt-core/template-journal/lwwgateway/media/landingpage.htm?issn=1040-2446&volume=78&issue=3&spage=270 |issn=}}</ref><ref name="pmid15561792">{{cite journal |author=Ely JW, Osheroff JA, Chambliss ML, Ebell MH, Rosenbaum ME |title=Answering physicians' clinical questions: obstacles and potential solutions |journal=J Am Med Inform Assoc |volume=12 |issue=2 |pages=217–24 |year=2005 |pmid=15561792 |pmc=551553 |doi=10.1197/jamia.M1608 |url=http://www.jamia.org/cgi/pmidlookup?view=long&pmid=15561792 |issn=}}</ref><ref name="pmid11604759">{{cite journal |author=Gorman P |title=Information needs in primary care: a survey of rural and nonrural primary care physicians |journal=Stud Health Technol Inform |volume=84 |issue=Pt 1 |pages=338–42 |year=2001 |pmid=11604759 |doi= |url= |issn=}}</ref><ref name="pmid11711012">{{cite journal |author=Alper BS, Stevermer JJ, White DS, Ewigman BG |title=Answering family physicians' clinical questions using electronic medical databases |journal=J Fam Pract |volume=50 |issue=11 |pages=960–5 |year=2001 |month=November |pmid=11711012 |doi= |url=http://www.jfponline.com/Pages.asp?AID=2383 |issn=}}</ref> Other methods may be to use a bank of questions with known target documents<ref name="pmid2403476">{{cite journal |author=Haynes RB, McKibbon KA, Walker CJ, Ryan N, Fitzgerald D, Ramsden MF |title=Online access to MEDLINE in clinical settings. A study of use and usefulness |journal=Ann. Intern. Med. 
|volume=112 |issue=1 |pages=78–84 |year=1990 |month=January |pmid=2403476 |doi= |url= |issn=}}</ref> or known answers<ref name="pmid11971889"/>.
 
*Usage
** Self-reported
** Measured<ref name="pmid11903763">{{cite journal |author=Cabell CH, Schardt C, Sanders L, Corey GR, Keitz SA |title=Resident utilization of information technology |journal=J Gen Intern Med |volume=16 |issue=12 |pages=838–44 |year=2001 |month=December |pmid=11903763 |pmc=1495306 |doi= |url= |issn=}}</ref>
 
===Precision and recall===
Recall is the fraction of relevant documents that are successfully retrieved. This is the same as [[sensitivity and specificity|sensitivity]]. The recall has also been called the "yield"<ref name="pmid20102628">{{cite journal| author=Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH| title=Semi-automated screening of biomedical citations for systematic reviews. | journal=BMC Bioinformatics | year= 2010 | volume= 11 | issue=  | pages= 55 | pmid=20102628 | doi=10.1186/1471-2105-11-55 | pmc=PMC2824679 | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=20102628  }} </ref> and comprehensiveness<ref name="pmid22249990">{{cite journal| author=Shariff SZ, Sontrop JM, Haynes RB, Iansavichus AV, McKibbon KA, Wilczynski NL et al.| title=Impact of PubMed search filters on the retrieval of evidence by physicians. | journal=CMAJ | year= 2012 | volume= 184 | issue= 3 | pages= E184-90 | pmid=22249990 | doi=10.1503/cmaj.101661 | pmc=PMC3281182 | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=22249990  }} </ref>.
 
:<math> \mbox{recall}=\frac{|\{\mbox{relevant documents}\}\cap\{\mbox{retrieved documents}\}|}{|\{\mbox{relevant documents}\}|} </math>
 
Precision is the fraction of retrieved documents that are relevant to the search.  Precision has also been called efficiency.<ref name="pmid22249990">{{cite journal| author=Shariff SZ, Sontrop JM, Haynes RB, Iansavichus AV, McKibbon KA, Wilczynski NL et al.| title=Impact of PubMed search filters on the retrieval of evidence by physicians. | journal=CMAJ | year= 2012 | volume= 184 | issue= 3 | pages= E184-90 | pmid=22249990 | doi=10.1503/cmaj.101661 | pmc=PMC3281182 | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=22249990  }} </ref> This is the same as [[sensitivity and specificity|positive predictive value]].
 
:<math> \mbox{precision}=\frac{|\{\mbox{relevant documents}\}\cap\{\mbox{retrieved documents}\}|}{|\{\mbox{retrieved documents}\}|} </math>
 
F<sub>1</sub> is the unweighted [[harmonic mean]] of the recall and precision.<ref name="isbn0-13-607224-0">{{cite book |author=Trevor Strohman; Croft, Bruce; Donald Metzler |authorlink= |editor= |others= |title=Search Engines: Information Retrieval in Practice |edition= |language= |publisher=Addison Wesley |location=Harlow |year=2009 |origyear= |pages= |quote= |isbn=0-13-607224-0 |oclc= |doi= |url=http://www.pearsonhighered.com/croft1epreview/ |accessdate=}}</ref><ref name="pmid20819854">{{cite journal| author=Uzuner O, Solti I, Cadag E| title=Extracting medication information from clinical text. | journal=J Am Med Inform Assoc | year= 2010 | volume= 17 | issue= 5 | pages= 514-8 | pmid=20819854 | doi=10.1136/jamia.2010.003947 | pmc=PMC2995677 | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=20819854  }} </ref>
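The three measures above can be illustrated with a short calculation. Written out, the unweighted harmonic mean is:

:<math> F_1 = \frac{2 \cdot \mbox{precision} \cdot \mbox{recall}}{\mbox{precision}+\mbox{recall}} </math>

The following is a minimal sketch in Python, not taken from the cited sources. The function name <code>precision_recall_f1</code> and the document identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(relevant, retrieved):
    """Return (precision, recall, F1) for two collections of document IDs."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical example: 4 relevant documents exist and the search returns 5,
# of which 2 are relevant.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d5", "d6", "d7"}
precision, recall, f1 = precision_recall_f1(relevant, retrieved)
print(precision, recall, round(f1, 3))
```

In this example precision is 2/5 and recall is 2/4. Because the harmonic mean is pulled toward the smaller of the two values, a search cannot score a high F<sub>1</sub> by maximizing only one of them.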
 
===Number needed to read===
The number needed to read (NNR) is "how many papers in a journal have to be read to find one of adequate clinical quality and relevance."<ref name="pmid15910578">{{cite journal |author=Toth B, Gray JA, Brice A |title=The number needed to read-a new measure of journal value |journal=Health Info Libr J |volume=22 |issue=2 |pages=81–2 |year=2005 |pmid=15910578 |doi=10.1111/j.1471-1842.2005.00568.x}}</ref><ref name="pmid15350200">{{cite journal |author=McKibbon KA, Wilczynski NL, Haynes RB |title=What do evidence-based secondary journals tell us about the publication of clinically important articles in primary healthcare journals? |journal=BMC Med |volume=2 |issue= |pages=33 |year=2004 |pmid=15350200 |doi=10.1186/1741-7015-2-33}}</ref><ref name="pmid12386115">{{cite journal |author=Bachmann LM, Coray R, Estermann P, Ter Riet G |title=Identifying diagnostic studies in MEDLINE: reducing the number needed to read |journal=J Am Med Inform Assoc |volume=9 |issue=6 |pages=653–8 |year=2002 |pmid=12386115 |doi=}}</ref><ref name="pmid17603909">{{cite journal |author=Haase A, Follmann M, Skipka G, Kirchner H |title=Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance |journal=BMC Med Res Methodol |volume=7 |issue= |pages=28 |year=2007 |pmid=17603909 |doi=10.1186/1471-2288-7-28}}</ref> The NNR has been proposed as a metric to help libraries decide which journals to subscribe to.<ref name="pmid15910578"/> The NNR has also been called the "burden."<ref name="pmid20102628">{{cite journal| author=Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH| title=Semi-automated screening of biomedical citations for systematic reviews. 
| journal=BMC Bioinformatics | year= 2010 | volume= 11 | issue=  | pages= 55 | pmid=20102628 | doi=10.1186/1471-2105-11-55 | pmc=PMC2824679 | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=20102628  }} </ref>
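When the NNR is applied to a set of search results rather than to a journal, it is commonly computed as the reciprocal of precision: the number of items read divided by the number of those items that were relevant. The sketch below assumes that definition. The function name and the counts are hypothetical.

```python
def number_needed_to_read(relevant_read, total_read):
    """NNR as total items read per relevant item found (1 / precision)."""
    if relevant_read <= 0:
        return float("inf")  # nothing relevant was found, however much was read
    return total_read / relevant_read


# Hypothetical search: 200 citations screened, 8 of them relevant.
nnr = number_needed_to_read(relevant_read=8, total_read=200)
print(nnr)
```

Here a reader would have to screen 25 citations on average to find one relevant article, which corresponds to a precision of 4%.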
 
===Number needed to search===
The number needed to search (NNS) is the number of questions that would have to be searched for one question to be well answered.<ref name="pmid15109337">{{cite journal| author=Lucas BP, Evans AT, Reilly BM, Khodakov YV, Perumal K, Rohr LG et al.| title=The impact of evidence on physicians' inpatient treatment decisions. | journal=J Gen Intern Med | year= 2004 | volume= 19 | issue= 5 Pt 1 | pages= 402-9 | pmid=15109337
| url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=clinical.uthscsa.edu/cite&email=badgett@uthscdsa.edu&retmode=ref&cmd=prlinks&id=15109337 | doi=10.1111/j.1525-1497.2004.30306.x | pmc=PMC1492243 }}  [http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=clinical.uthscsa.edu/cite&email=badgett@uthscdsa.edu&retmode=ref&cmd=prlinks&id=11280129 Review in: ACP J Club. 2001 Mar-Apr;134(2):A11-3] <!--Formatted by http://sumsearch.uthscsa.edu/cite/--></ref>
 
===Hit curve===
A hit curve is the number of relevant documents retrieved among the first n results.<ref name="pmid16469545">{{cite journal |author=Herskovic JR, Iyengar MS, Bernstam EV |title=Using hit curves to compare search algorithm performance |journal=J Biomed Inform |volume=40 |issue=2 |pages=93–9 |year=2007 |pmid=16469545 |doi=10.1016/j.jbi.2005.12.007}}</ref><ref name="pmid16221938">{{cite journal |author=Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR |title=Using citation data to improve retrieval from MEDLINE |journal=J Am Med Inform Assoc |volume=13 |issue=1 |pages=96–105 |year=2006 |pmid=16221938 |doi=10.1197/jamia.M1909}}</ref>
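A hit curve can be computed directly from a ranked result list and a set of relevance judgments. The following is a minimal sketch with hypothetical document identifiers. The function name <code>hit_curve</code> is an assumption made for illustration.

```python
def hit_curve(ranked_results, relevant):
    """Return a list whose n-th entry is the number of relevant documents
    among the first n results of the ranked list."""
    relevant = set(relevant)
    curve = []
    hits = 0
    for doc in ranked_results:
        if doc in relevant:
            hits += 1
        curve.append(hits)
    return curve


# Hypothetical ranking of five results, three of which are relevant.
ranked = ["d3", "d9", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d3"}
curve = hit_curve(ranked, relevant)
print(curve)
```

Plotting the curves of two search algorithms on the same axes shows which one places relevant documents earlier in the ranking: the curve that rises faster is preferable.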
 
===Decision velocity===
The time needed to answer a question can be compared between two systems with Kaplan-Meier survival analysis.<ref name="pmid18579828">{{cite journal |author=Coiera E, Westbrook JI, Rogers K |title=Clinical Decision Velocity is Increased when Meta-search Filters Enhance an Evidence Retrieval System |journal=J Am Med Inform Assoc |volume=15 |issue=5 |pages=638–46 |year=2008 |pmid=18579828 |pmc=2528038 |doi=10.1197/jamia.M2765 |url=http://www.jamia.org/cgi/pmidlookup?view=long&pmid=18579828 |issn=}}</ref>
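As a rough illustration of the survival analysis approach, the sketch below computes a Kaplan-Meier curve in which the "event" is a question being answered and questions abandoned without an answer are treated as censored. It is a plain-Python sketch under those assumptions, not the analysis used in the cited study. The function name and the timing data are hypothetical.

```python
def kaplan_meier(durations, answered):
    """Return [(time, S(time))], where S is the estimated probability that a
    question is still unanswered after `time`.

    durations: time spent on each question
    answered:  True if the question was answered at that time,
               False if the searcher stopped without an answer (censored)
    """
    event_times = sorted({t for t, a in zip(durations, answered) if a})
    survival = 1.0
    curve = []
    for t in event_times:
        # Questions still being worked on just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Questions answered exactly at time t.
        events = sum(1 for d, a in zip(durations, answered) if a and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical timings (arbitrary units) for two search systems.
curve_a = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
curve_b = kaplan_meier([4, 6, 6, 9, 10], [True, True, True, False, True])
for t, s in curve_a:
    print("A", t, round(s, 2))
for t, s in curve_b:
    print("B", t, round(s, 2))
```

A curve that falls faster indicates that questions are answered sooner. A formal comparison of two curves would normally add a significance test such as the log-rank test, which is not shown here.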
 
In [[health care]], difficult questions may take hours to answer.<ref name="pmid11548078">{{cite journal| author=Del Mar CB, Silagy CA, Glasziou PP, Weller D, Spinks AB, Bernath V et al.| title=Feasibility of an evidence-based literature search service for general practitioners. | journal=Med J Aust | year= 2001 | volume= 175 | issue= 3 | pages= 134-7 | pmid=11548078
| url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=clinical.uthscsa.edu/cite&email=badgett@uthscdsa.edu&retmode=ref&cmd=prlinks&id=11548078 }}</ref>
 
If the correct answer to the search question is known, a logistic function can model the rate of ''correct'' answers over time. The result is an [[S-curve]] (also called a sigmoid curve or logistic growth curve) in which most questions are answered after an initial delay, while a minority of questions take much longer.
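The shape of such a curve can be sketched with the standard three-parameter logistic function. The parameter names and values below (<code>ceiling</code>, <code>rate</code>, <code>midpoint</code>) are assumptions chosen for illustration and are not fitted to any data from the cited studies.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """Proportion of questions correctly answered by time t.

    ceiling:  proportion eventually answered correctly (upper asymptote)
    rate:     steepness of the curve
    midpoint: time at which half of `ceiling` has been reached
    """
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))


# Hypothetical parameters: 90% of questions are eventually answered
# correctly, and half of those are answered by time 5 (arbitrary units).
times = list(range(0, 11))
proportions = [logistic(t, ceiling=0.9, rate=1.0, midpoint=5) for t in times]
for t, p in zip(times, proportions):
    print(t, round(p, 3))
```

The printed values rise slowly at first, steeply around the midpoint, and then flatten as they approach the ceiling, which is the S-shape described above.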
 
===Critical incidents===
Analysis of critical incidents can also be used to document the impact of an information retrieval system.<ref name="pmid16798071"/>


==References==
==Footnotes==
<references/>
<small>
<references>


[[Category:CZ Live]] [[Category:Health Sciences Workgroup]][[Category:Library and Information Science Workgroup]]
</references>
In addition:
* {{cite book |author=Berthier Ribeiro-Neto; Ricardo Baeza-Yates; Ribeiro, Berthier de Araújo Neto |authorlink= |editor= |others= |title=Modern information retrieval |edition= |language= |publisher=Addison-Wesley |location=Boston |year=2009 |origyear= |pages= |quote= |isbn=0-321-41691-0 |oclc= |doi= |url= |accessdate=}}
* {{cite book |author=Shortliffe, Edward Hance; Cimino, James D. |authorlink= |editor= |others= |title=Biomedical informatics: computer applications in health care and biomedicine |edition= |language= |publisher=Springer |location=Berlin |year=2006 |origyear= |pages= |quote= |isbn=0-387-28986-0 |oclc= |doi= |url= |accessdate=}}[[Category:Suggestion Bot Tag]]
</small>

Latest revision as of 10:12, 13 September 2024


Information retrieval is defined as "a branch of computer or library science relating to the storage, locating, searching, and selecting, upon demand, relevant data on a given subject."[1] As noted by Carl Sagan, "human beings have, in the most recent few tenths of a percent of our existence, invented not only extra-genetic but also extrasomatic knowledge: information stored outside our bodies, of which writing is the most notable example."[2] The benefits of enhancing personal knowledge with retrieval of extrasomatic knowledge or transactive memory have been shown in comparisons with rote memory.[3][4]

Although information retrieval is usually thought of as being done by computer, retrieval can also be done by humans for other humans.[5] In addition, some Internet search engines such as mahalo.com and http://www.chacha.com/ may have human supervision or editors.

Some Internet search engines, such as http://www.deeppeep.org and http://www.deepdyve.com/, attempt to index the Deep Web, which consists of web pages that are not normally public.[6]

The usefulness of a search engine has been proposed to be:[7]

:<math> \mbox{usefulness}=\frac{\mbox{relevance}\times\mbox{validity}}{\mbox{work}} </math>

Classification by user purpose

Information retrieval can be divided into information discovery, information recovery, and information awareness.[8]

Information discovery

Information discovery is searching for information that the searcher has not seen before and the searcher does not know for sure that the information exists. Information discovery includes searching in order to answer a question at hand, or searching for a topic without a specific question in order to improve knowledge of a topic.

Information recovery

Information recovery is searching for information that the searcher has seen before and knows to exist.

Information awareness

Information awareness has also been described as "'systematic serendipity' - an organized process of information discovery of that which he [the searcher] did not know existed".[8] Information awareness can be further divided into:[9]

  • Information familiarity
  • Knowledge acquisition (also called recollection), which is the ability to apply the new knowledge.

Examples of information awareness prior to the Internet include reading print and online periodicals. With the Internet, new methods include email newsletters[10], email alerts, and RSS feeds.[11]

These methods may increase information familiarity.[9]

Classification by indexing methods used

Document retrieval

Models for information retrieval of documents are based on either the text of the document or links to and from the document and other documents.[12]

Models based on analysis of the text are the Boolean, vector space, and probabilistic models.[12]

Models based on analysis of the links include PageRank, HITS, and impact factor.[12]

Boolean (set theoretic, exact matching)

Variants of the boolean model include:[12]

  • Fuzzy logic (used with thesauri)
  • Extended boolean
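The set-theoretic nature of the basic Boolean model can be illustrated with an inverted index, in which each term maps to the set of documents that contain it. Boolean operators then become set operations. The following is a minimal sketch. The three documents and the function name <code>build_index</code> are hypothetical.

```python
def build_index(documents):
    """Map each term to the set of IDs of the documents that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


# Hypothetical three-document collection.
docs = {
    1: "asthma treatment in children",
    2: "asthma diagnosis in adults",
    3: "diabetes treatment in adults",
}
index = build_index(docs)

asthma_and_treatment = index["asthma"] & index["treatment"]  # AND: intersection
asthma_or_diabetes = index["asthma"] | index["diabetes"]     # OR: union
treatment_not_adults = index["treatment"] - index["adults"]  # NOT: difference
print(asthma_and_treatment, asthma_or_diabetes, treatment_not_adults)
```

Every document either matches the query exactly or does not match at all, so the basic model returns an unranked set. This is the limitation that the variants listed above try to relax.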

Vector space model (relevancy, algebraic, partial match, ranking)

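Relevancy is determined by weighting concept i in a document j with tf-idf weighting, which in its standard form is:[13]

:<math> w_{ij} = \mbox{tf}_{ij} \times \mbox{idf}_i </math>

where <math>\mbox{tf}_{ij}</math> is the frequency of concept i in document j

and

:<math> \mbox{idf}_i = \log\frac{N}{n_i} </math>

in which N is the number of documents in the collection and n<sub>i</sub> is the number of documents that contain concept i.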
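As a rough illustration of tf-idf weighting, the sketch below multiplies the raw count of a term in a document by log(N / n), where N is the number of documents in the collection and n is the number of documents that contain the term. This is one common textbook form and is an assumption here, since published variants differ in how they scale each factor. The function name <code>tf_idf</code> and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists.
    Returns one {term: weight} dictionary per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        term_freq = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Hypothetical three-document collection.
docs = [
    "aspirin reduces fever".split(),
    "aspirin reduces pain".split(),
    "fever and pain".split(),
]
weights = tf_idf(docs)
for w in weights:
    print({term: round(value, 3) for term, value in w.items()})
```

Terms that occur in few documents, such as "and" in this toy collection, receive a higher weight than terms that occur in most documents, and a term that occurs in every document receives a weight of zero.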

Variants of the vector space model include:[12]

  • Generalized vector space model (allows for correlated search terms)
  • Latent semantic indexing model (allows for search for synonymous concepts rather than literal search terms)
  • Neural network model

Probabilistic (Bayes)

Variants of the probabilistic model include:[12]

  • Inference network
  • Belief network

Analysis of links

For more information, see: PageRank and Impact factor.


Factors associated with unsuccessful retrieval

The field of medicine provides much research on the difficulties of information retrieval. Barriers to successful retrieval include:

  • Lack of prior experience with the information retrieval system being used[14][3]
  • Low visual spatial ability[14]
  • Poor formulation of the question to be searched[15]
  • Difficulty designing a search strategy when multiple resources are available[15]
  • "Uncertainty about how to know when all the relevant evidence has been found so that the search can stop"[15]
  • Difficulty synthesizing an answer across multiple documents[15]

Factors associated with successful retrieval

Characteristics of how the information is stored

For storage of text content, the quality of the index to the content is important. For example, the use of stemming, or truncating, words by removing suffixes may help.[16]
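The effect of stemming on an index can be shown with a deliberately naive suffix stripper. The sketch below is not the Porter algorithm cited above, which applies a much longer sequence of rules. It only removes a few common English suffixes so that related word forms share one index entry. All names and documents are hypothetical.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def naive_stem(word):
    """Strip one common suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stemmed_index(documents):
    """Map each stem to the set of IDs of the documents that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for token in text.split():
            index.setdefault(naive_stem(token), set()).add(doc_id)
    return index


# Hypothetical documents that use different forms of the same word.
docs = {1: "screening tests", 2: "patients screened", 3: "a screen test"}
index = build_stemmed_index(docs)
hits = index[naive_stem("screens")]
print(sorted(hits))
```

Because the query term is stemmed in the same way as the indexed text, a search for "screens" also finds documents containing "screening", "screened", and "screen".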

Display of information

Information that is structured may be more effective according to controlled studies.[17][18] In addition, the structure should be layered, with a summary of the content being the first layer that the reader sees.[19] This allows the reader to take only an overview or to choose more detail. Some Internet search engines such as http://www.kosmix.com/ try to organize search results beyond a one-dimensional list of results.

Regarding display of results from search engines, an interface designed to reduce anchoring and order bias may improve decision making.[20]

Characteristics of the search engine

John Battelle has described features of the perfect search engine of the future.[21] For example, Boolean searching may be inefficient.[22] Meta-searching and task-based searching may improve decision velocity.[23]

Meta-search

Meta-search engines search multiple resources and integrate the results for the user. Examples in health care include Trip Database, MacPLUS, and QuickClinical.

Characteristics of the searcher

In healthcare, searchers are more likely to be successful if their answer is correct before searching, they have experience with the system they are searching, and they have a high spatial visualization score.[14] Also in healthcare, physicians with less experience are more likely to want more information.[24] Physicians who report stress when uncertain are more likely to search textbooks than source evidence.[25]

In healthcare, using expert searchers on behalf of physicians led to increased satisfaction by the physicians with the search results.[26]

Use of term overlap is associated with success.[27]

Impact of information retrieval

The benefits of enhancing personal knowledge with retrieval of extrasomatic knowledge have been shown in a controlled comparison with rote memory.[3] Various before and after comparisons are summarized in the tables.

{| class="wikitable"
|+ Impact of medical searching by physicians and medical students[28][29][23][14]
! rowspan="2" | Search engine
! rowspan="2" | Users
! rowspan="2" | Questions
! colspan="2" | Portion of answers correct
! rowspan="2" | Portion of answers that moved from correct to incorrect
|-
! Before searching
! After searching
|-
| Quick Clinical[29][23]<br/>(federated search)
| 73 practicing doctors and clinical nurse consultants
| Eight clinical questions<br/>600 total responses
| 37% || 50% || 7%
|-
| User's own choice[28]
| 23 primary care physicians
| 2 questions from a pool of 23 clinical questions from Hersh[14]<br/>46 total responses
| 39% || 42% || 11%
|-
| OVID[14]
| 45 senior medical students (data available for nursing students)
| 5 questions from a pool of 23 clinical questions from Hersh[14]<br/>324 total responses
| 32% || 52% || 13%
|}
{| class="wikitable"
|+ Frequency that searching changed medical care.[30][31][32][33][34]
! Study
! Searches
! Frequency useful information found
! Frequency changed care
|-
| Izcovich[30]<br/>2011
| RCT of 407 inpatients compared to 402 control inpatients.<br/>Searcher sought answers to questions that arose during "morning report".<br/>Search resources did not include UpToDate. Results emailed to teams.
|
| No difference between study groups
|-
| Lucas[31]<br/>2004
| Before-after study of 146 inpatients.<br/>Searcher sought evidence to corroborate the principal treatment decisions for all patients.<br/>Search resources included UpToDate. Search results given to attendings.<br/>Blinded outcome assessment.
|
|
* Treatments changed in 18%
* Treatments improved in 14%
|-
| Crowley[32]<br/>2003
| 625 self-initiated searches, uncontrolled study
| 83%
| 39%
|-
| Rochester study[33]<br/>1992
| Uncontrolled study
|
| 80%
|-
| Chicago study[34]<br/>1987
| Questions searched by librarians in response to physician queries; uncontrolled study
|
| 74%
|}

Critical incident studies can also document impact of information retrieval.[35][36]

Evaluation of the quality of information retrieval

Survival curve modeling amount of time taken to answer questions. The units for time are arbitrary and meaningless in this example.
Logistic curve modeling rate of correct answers over time. The units for time are arbitrary and meaningless in this example.

Various methods exist to evaluate the quality of information retrieval.[37][38][39] Hersh[38] noted the classification of evaluation developed by Lancaster and Warner[37] in which the first level of evaluation is:

  • Costs/resources consumed in learning and using a system
  • Time needed to use the system[40]
  • Quality of the results.
    • Coverage. An estimate of coverage can be crudely automated.[41] However, a more accurate judgment of relevance requires a human judge, which introduces subjectivity.[42]
    • Precision and recall
    • Novelty. This has been judged by independent reviewers.[31]
    • Completeness and accuracy of results. An easy method of assessing this is to let the searcher make a subjective assessment.[32][43][44][45] Other methods may be to use a bank of questions with known target documents[46] or known answers[14].
  • Usage
    • Self-reported
    • Measured[40]

Precision and recall

Recall is the fraction of relevant documents that are successfully retrieved. This is the same as sensitivity. The recall has also been called the "yield"[47] and comprehensiveness[48].

Precision is the fraction of retrieved documents that are relevant to the search. Precision has also been called efficiency.[48] This is the same as positive predictive value.

F1 is the unweighted harmonic mean of the recall and precision.[39][49]

Number needed to read

The number needed to read (NNR) is "how many papers in a journal have to be read to find one of adequate clinical quality and relevance."[50][51][52][53] The NNR has been proposed as a metric to help libraries decide which journals to subscribe to.[50] The NNR has also been called the "burden."[47]

Number needed to search

The number needed to search (NNS) is the number of questions that would have to be searched for one question to be well answered.[31]

Hit curve

A hit curve is the number of relevant documents retrieved among the first n results.[54][55]

Decision velocity

The time needed to answer a question can be compared between two systems with Kaplan-Meier survival analysis.[23]

In health care, difficult questions may take hours to answer.[56]

If the correct answer to the search question is known, a logistic function can model the rate of correct answers over time. The result is an S-curve (also called a sigmoid curve or logistic growth curve) in which most questions are answered after an initial delay, while a minority of questions take much longer.

Critical incidents

Analysis of critical incidents can also be used to document the impact of an information retrieval system.[35]

Footnotes

  1. National Library of Medicine. Information Storage and Retrieval. Retrieved on 2007-12-12.
  2. Sagan, Carl (1993). The Dragons of Eden: Speculations on the Evolution of Human Intelligence. New York: Ballantine Books. ISBN 0-345-34629-7. 
  3. de Bliek R, Friedman CP, Wildemuth BM, Martz JM, Twarog RG, File D (1994). "Information retrieved from a database and the augmentation of personal knowledge". J Am Med Inform Assoc 1 (4): 328–38. PMID 7719819.
  4. Sparrow, Betsy; Jenny Liu, Daniel M. Wegner (2011-07-14). "Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips". Science. DOI:10.1126/science.1207745. Retrieved on 2011-07-16.
  5. Mulvaney, S. A., Bickman, L., Giuse, N. B., Lambert, E. W., Sathe, N. A., & Jerome, R. N. (2008). A randomized effectiveness trial of a clinical informatics consult service: impact on evidence-based decision-making and knowledge implementation, J Am Med Inform Assoc, 15(2), 203-211. doi: 10.1197/jamia.M2461.
  6. Wright A. (2009) Exploring a ‘Deep Web’ That Google Can’t Grasp. New York Times.
  7. Shaughnessy AF, Slawson DC, Bennett JH (November 1994). "Becoming an information master: a guidebook to the medical information jungle". J Fam Pract 39 (5): 489–99. PMID 7964548.
  8. Garfield, E. “ISI Eases Scientists’ Information Problems: Provides Convenient Orderly Access to Literature,” Karger Gazette No. 13, pg. 2 (March 1966). Reprinted as “The Who and Why of ISI,” Current Contents No. 13, pages 5-6 (March 5, 1969), which was reprinted in Essays of an Information Scientist, Volume 1: ISI Press, pages 33-37 (1977). http://www.garfield.library.upenn.edu/essays/V1p033y1962-73.pdf
  9. Tanna GV, Sood MM, Schiff J, Schwartz D, Naimark DM (2011). "Do E-mail Alerts of New Research Increase Knowledge Translation? A "Nephrology Now" Randomized Control Trial". Acad Med 86 (1): 132-138. DOI:10.1097/ACM.0b013e3181ffe89e. PMID 21099399.
  10. Roland M. Grad et al., “Impact of Research-based Synopses Delivered as Daily email: A Prospective Observational Study,” J Am Med Inform Assoc (December 20, 2007), http://www.jamia.org/cgi/content/abstract/M2563v1 (accessed December 21, 2007).
  11. Koerner B (2008). Algorithms Are Terrific. But to Search Smarter, Find a Person. Wired Magazine. Retrieved on 2008-04-04.
  12. Berthier Ribeiro-Neto; Ricardo Baeza-Yates; Ribeiro, Berthier de Araújo Neto (2009). Modern information retrieval. Boston: Addison-Wesley. ISBN 0-321-41691-0.
  13. Hersh, William R (2001). “Information Retrieval Systems”, Fagan, Lawrence Marvin; Shortliffe, Edward Hance; Perreault, Leslie E.; Wiederhold, Gio: Medical Informatics: Computer Applications in Health Care and Biomedicine. Berlin: Springer, 549. ISBN 0-387-98472-0.
  14. Hersh WR, Crabtree MK, Hickam DH, et al (2002). "Factors associated with success in searching MEDLINE and applying evidence to answer clinical questions". J Am Med Inform Assoc 9 (3): 283–93. PMID 11971889. PMC 344588.
  15. Ely JW, Osheroff JA, Ebell MH, et al (March 2002). "Obstacles to answering doctors' questions about patient care with evidence: qualitative study". BMJ 324 (7339): 710. PMID 11909789. PMC 99056.
  16. Porter MF. An algorithm for suffix stripping. Program. 1980;14:130–7.
  17. Schwartz LM, Woloshin S, Welch HG (2007). "The drug facts box: providing consumers with simple tabular data on drug benefit and harm". Med Decis Making 27 (5): 655-62. DOI:10.1177/0272989X07306786. PMID 17873258.
  18. Beck AL, Bergman DA (September 1986). "Using structured medical information to improve students' problem-solving performance". J Med Educ 61 (9 Pt 1): 749–56. PMID 3528494.
  19. Nielsen J (1996). Writing Inverted Pyramids in Cyberspace (Alertbox). Retrieved on 2007-12-12.
  20. Lau AY, Coiera EW (October 2008). "Can cognitive biases during consumer health information searches be reduced to improve decision making?". J Am Med Inform Assoc. DOI:10.1197/jamia.M2557. PMID 18952948.
  21. John Battelle. The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. Portfolio Trade. ISBN 1-59184-141-0.
  22. Verhoeff, J (1961). Inefficiency of the use of Boolean functions for information retrieval system. Communications of the ACM 4: 557. DOI:10.1145/366853.366861
  23. 23.0 23.1 23.2 23.3 Coiera E, Westbrook JI, Rogers K (2008 Sep-Oct). "Clinical decision velocity is increased when meta-search filters enhance an evidence retrieval system.". J Am Med Inform Assoc 15 (5): 638-46. DOI:10.1197/jamia.M2765. PMID 18579828. PMC PMC2528038. Research Blogging. Cite error: Invalid <ref> tag; name "pmid18579828" defined multiple times with different content Cite error: Invalid <ref> tag; name "pmid18579828" defined multiple times with different content
  24. Gruppen LD, Wolf FM, Van Voorhees C, Stross JK (1988). "The influence of general and case-related experience on primary care treatment decision making". Arch. Intern. Med. 148 (12): 2657–63. PMID 3196128[e]
  25. McKibbon KA, Fridsma DB, Crowley RS (2007). "How primary care physicians' attitudes toward risk and uncertainty affect their use of electronic information resources". J Med Libr Assoc 95 (2): 138–46, e49–50. DOI:10.3163/1536-5050.95.2.138. PMID 17443246. Research Blogging.
  26. Shelagh A. Mulvaney et al., “A Randomized Effectiveness Trial of a Clinical Informatics Consult Service: Impact on Evidence Based Decision-Making and Knowledge Implementation,” J Am Med Inform Assoc (December 20, 2007), http://www.jamia.org/cgi/content/abstract/M2461v1 (accessed December 21, 2007).
  27. Hersh WR, Hickam DH (1998). "How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review.". JAMA 280 (15): 1347-52. PMID 9794316[e]
  28. 28.0 28.1 McKibbon KA, Fridsma DB (2006). "Effectiveness of clinician-selected electronic information resources for answering primary care physicians' information needs". J Am Med Inform Assoc 13 (6): 653–9. DOI:10.1197/jamia.M2087. PMID 16929042. PMC 1656967. Research Blogging.
  29. 29.0 29.1 Westbrook JI, Gosling AS, Coiera EW (2005). "The impact of an online evidence system on confidence in decision making in a controlled setting". Med Decis Making 25 (2): 178–85. DOI:10.1177/0272989X05275155. PMID 15800302. Research Blogging.
  30. 30.0 30.1 Izcovich A, Malla CG, Diaz MM, Manzotti M, Catalano HN (2011). "Impact of facilitating physician access to relevant medical literature on outcomes of hospitalised internal medicine patients: a randomised controlled trial.". Evid Based Med 16 (5): 131-5. DOI:10.1136/ebmed-2011-100117. PMID 21949275. Research Blogging. Review in: Evid Based Med. 2011 Oct;16(5):129-30
  31. 31.0 31.1 31.2 31.3 Lucas BP, Evans AT, Reilly BM, Khodakov YV, Perumal K, Rohr LG et al. (2004). "The impact of evidence on physicians' inpatient treatment decisions.". J Gen Intern Med 19 (5 Pt 1): 402-9. DOI:10.1111/j.1525-1497.2004.30306.x. PMID 15109337. PMC PMC1492243. Research Blogging. Cite error: Invalid <ref> tag; name "pmid15109337" defined multiple times with different content Cite error: Invalid <ref> tag; name "pmid15109337" defined multiple times with different content
  32. 32.0 32.1 32.2 Crowley SD, Owens TA, Schardt CM, et al (March 2003). "A Web-based compendium of clinical questions and medical evidence to educate internal medicine residents". Acad Med 78 (3): 270–4. PMID 12634206[e]
  33. 33.0 33.1 Marshall JG (April 1992). "The impact of the hospital library on clinical decision making: the Rochester study". Bull Med Libr Assoc 80 (2): 169–78. PMID 1600426. PMC PMC225641[e]
  34. 34.0 34.1 King DN (October 1987). "The contribution of hospital library information services to clinical care: a study in eight hospitals". Bull Med Libr Assoc 75 (4): 291–301. PMID 3450340. PMC PMC227744[e]
  35. 35.0 35.1 Westbrook JI, Coiera EW, Sophie Gosling A, Braithwaite J (2007 Feb-Mar). "Critical incidents and journey mapping as techniques to evaluate the impact of online evidence retrieval systems on health care delivery and patient outcomes.". Int J Med Inform 76 (2-3): 234-45. DOI:10.1016/j.ijmedinf.2006.03.006. PMID 16798071. Research Blogging.
  36. Lindberg DA, Siegel ER, Rapp BA, Wallingford KT, Wilson SR (1993 Jun 23-30). "Use of MEDLINE by physicians for clinical problem solving.". JAMA 269 (24): 3124-9. PMID 8505815.
  37. 37.0 37.1 Lancaster, Frederick Wilfrid; Warner, Amy J. (1993). Information retrieval today. Arlington, Va: Information Resources Press. ISBN 0-87815-064-1. 
  38. 38.0 38.1 Hersh, William R. (2008). Information Retrieval: A Health and Biomedical Perspective (Health Informatics). Berlin: Springer. ISBN 0-387-78702-X.  Google books
  39. Croft, Bruce; Metzler, Donald; Strohman, Trevor (2009). Search Engines: Information Retrieval in Practice. Harlow: Addison Wesley. ISBN 0-13-607224-0.
  40. Cabell CH, Schardt C, Sanders L, Corey GR, Keitz SA (December 2001). "Resident utilization of information technology". J Gen Intern Med 16 (12): 838–44. PMID 11903763. PMC 1495306.
  41. Fenton SH, Badgett RG (July 2007). "A comparison of primary care information content in UpToDate and the National Guideline Clearinghouse". J Med Libr Assoc 95 (3): 255–9. DOI:10.3163/1536-5050.95.3.255. PMID 17641755. PMC 1924927.
  42. Hersh WR, Buckley C, Leone TJ, Hickam DH (1994). "OHSUMED: An interactive retrieval evaluation and new large test collection for research". Proceedings of the 17th Annual ACM SIGIR Conference: 192–201.
  43. Ely JW, Osheroff JA, Chambliss ML, Ebell MH, Rosenbaum ME (2005). "Answering physicians' clinical questions: obstacles and potential solutions". J Am Med Inform Assoc 12 (2): 217–24. DOI:10.1197/jamia.M1608. PMID 15561792. PMC 551553.
  44. Gorman P (2001). "Information needs in primary care: a survey of rural and nonrural primary care physicians". Stud Health Technol Inform 84 (Pt 1): 338–42. PMID 11604759.
  45. Alper BS, Stevermer JJ, White DS, Ewigman BG (November 2001). "Answering family physicians' clinical questions using electronic medical databases". J Fam Pract 50 (11): 960–5. PMID 11711012.
  46. Haynes RB, McKibbon KA, Walker CJ, Ryan N, Fitzgerald D, Ramsden MF (January 1990). "Online access to MEDLINE in clinical settings. A study of use and usefulness". Ann. Intern. Med. 112 (1): 78–84. PMID 2403476.
  47. Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH (2010). "Semi-automated screening of biomedical citations for systematic reviews". BMC Bioinformatics 11: 55. DOI:10.1186/1471-2105-11-55. PMID 20102628. PMC 2824679.
  48. Shariff SZ, Sontrop JM, Haynes RB, Iansavichus AV, McKibbon KA, Wilczynski NL et al. (2012). "Impact of PubMed search filters on the retrieval of evidence by physicians". CMAJ 184 (3): E184–90. DOI:10.1503/cmaj.101661. PMID 22249990. PMC 3281182.
  49. Uzuner O, Solti I, Cadag E (2010). "Extracting medication information from clinical text". J Am Med Inform Assoc 17 (5): 514–8. DOI:10.1136/jamia.2010.003947. PMID 20819854. PMC 2995677.
  50. Toth B, Gray JA, Brice A (2005). "The number needed to read-a new measure of journal value". Health Info Libr J 22 (2): 81–2. DOI:10.1111/j.1471-1842.2005.00568.x. PMID 15910578.
  51. McKibbon KA, Wilczynski NL, Haynes RB (2004). "What do evidence-based secondary journals tell us about the publication of clinically important articles in primary healthcare journals?". BMC Med 2: 33. DOI:10.1186/1741-7015-2-33. PMID 15350200.
  52. Bachmann LM, Coray R, Estermann P, Ter Riet G (2002). "Identifying diagnostic studies in MEDLINE: reducing the number needed to read". J Am Med Inform Assoc 9 (6): 653–8. PMID 12386115.
  53. Haase A, Follmann M, Skipka G, Kirchner H (2007). "Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance". BMC Med Res Methodol 7: 28. DOI:10.1186/1471-2288-7-28. PMID 17603909.
  54. Herskovic JR, Iyengar MS, Bernstam EV (2007). "Using hit curves to compare search algorithm performance". J Biomed Inform 40 (2): 93–9. DOI:10.1016/j.jbi.2005.12.007. PMID 16469545.
  55. Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR (2006). "Using citation data to improve retrieval from MEDLINE". J Am Med Inform Assoc 13 (1): 96–105. DOI:10.1197/jamia.M1909. PMID 16221938.
  56. Del Mar CB, Silagy CA, Glasziou PP, Weller D, Spinks AB, Bernath V et al. (2001). "Feasibility of an evidence-based literature search service for general practitioners". Med J Aust 175 (3): 134–7. PMID 11548078.

In addition:

  • Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier (2009). Modern information retrieval. Boston: Addison-Wesley. ISBN 0-321-41691-0.
  • Shortliffe, Edward Hance; Cimino, James J. (2006). Biomedical informatics: computer applications in health care and biomedicine. Berlin: Springer. ISBN 0-387-28986-0.