Information In Practice

Information retrieval for patient care

BMJ 1997;314:950 (Published 29 March 1997)
  1. Martin Gardner, clinical research fellow (martin{at}
     Information Retrieval Research Group, Department of Computing Science, University of Glasgow, Glasgow G12 8QQ
  • Accepted 5 March 1997


Doctors need clinical information during most consultations with patients, and much of this need could be satisfied by material from online sources. Advances in data communication technologies mean that multimedia information can be transported rapidly to various clinical care locations. However, selecting the few items of information likely to be useful in a particular clinical situation from the mass of information available is a major problem. Current information retrieval systems are designed primarily for use in research rather than clinical care. The design, implementation, and critical evaluation of new information retrieval systems for clinical care should be guided by knowledgeable clinical users.


During a consultation with a patient, a doctor must consider two types of information–the patient's medical record and medical knowledge relevant to the present problem. Nowadays, a doctor who chose, as routine practice, to rely on his or her memory of a patient's medical record rather than actually examining the record as part of the consultation might be considered eccentric, complacent, and possibly negligent. There is a huge and rapidly expanding volume of information in journals, textbooks, and other data sources constituting the body of knowledge on which modern medical practice is or should be founded; yet how often is any of this information examined during clinical consultations?

Increasing numbers of doctors recognise the need for such clinical information, and in the near future computer devices and communications networks will be capable of supplying information to the point of care (not only to consulting rooms but to hospital bedsides, operating theatres, patients' homes, road accidents, etc). Unfortunately, a major barrier to progress is that there is as yet no satisfactory solution to the problem of finding those few items of information most likely to be useful in any given situation among the mass of data available.

This general problem of information retrieval is not new or unique to medicine. Over the past 40 years, research into information retrieval has become a large and active discipline with applications in subjects such as law, finance, defence, publishing, research, and entertainment. A consequence of the recent explosive growth of the world wide web is that millions of people now use information retrieval systems on a daily basis.

There are three reasons why doctors should be aware of the principles of information retrieval technology. Firstly, one criterion by which you may judge the maturity of a technology is whether good performance requires technical, in addition to procedural, knowledge. By this criterion, word processing technology, for example, is mature since you can become a highly proficient user without any technical knowledge about how commands are translated into results. Unfortunately, information retrieval technology is not mature in this respect, and doctors who have some understanding of how information retrieval systems work will get better results than those who do not. Secondly, as a setting for information retrieval, the point of care is very different from other contexts in which information retrieval systems are used. Technology for supplying clinical information to the point of care cannot become mature without the insight, guidance, and commitment of knowledgeable users. Thirdly, information retrieval systems are likely to become important items in healthcare budgets. Purchasing decisions should be influenced by informed clinicians at a local level.

Information needs at the point of care

Many doctors now recognise the need for reference information at the point of care. Indeed, the inaugural article of this section of the BMJ addressed just this topic.1 Richard Smith presented evidence that information needs arise in nearly every consultation between a doctor and a patient, that many of these needs could be satisfied by material in reference sources, and that improved outcomes might accrue. There are problems of both memory registration and memory recall: no doctor can have read all the information relevant to any particular clinical decision, nor can a doctor expect to recall perfectly everything he or she has read. In future, healthcare purchasers may expect doctors to justify individual clinical decisions with explicit reference to evidence. More importantly, timely provision of information to the point of care could promote patients' ability both to participate in clinical decision making and to accept responsibility for the outcome.

The information needs of clinicians at the point of care are very different from those of academic researchers in a library or laboratory. Clinicians require access to a much wider range of material (not only journal articles but also passages from textbooks, drug monographs, protocols for patient care, medicolegal information, reference images (perhaps of dermatopathology), etc). They practise in a wide variety of environments (patients' homes and workplaces, wards, clinics, treatment rooms, etc), where standard desktop computers may not be available but information is still required. Whereas researchers require the maximum number of information items relevant to the topic but do not need interfaces for rapid browsing, clinicians require a small representative sample of items useful for decision making, presented so that they can be browsed rapidly. The problem of unperceived information needs (needs of which the user is unaware) is much greater for clinicians.

Supplying information to the point of care

The supply of information to the point of care relies on four technologies: information sources in digital form, data communication networks, computer devices at the point of care, and information retrieval systems.

There is already a large volume of medically related information available in digital format. This includes abstracts of journal articles (and the full contents of some), full contents of textbooks, clinical trial repositories, care protocols, critical incident reports, libraries of medical images, medical audio libraries (such as characteristic heart sound recordings of hundreds of cardiac disorders), and video clips of medical procedures (such as endoscopy and fibreoptic intubation). In the near future there will be an explosive increase in the volume of information available in digital format.

Data communication technology is advancing rapidly. Some hospitals already have ATM (asynchronous transfer mode) networks capable of transporting a full size, high resolution chest radiograph in less than a second. A number of general practice surgeries have local area network or ISDN (integrated services digital network) connections. It is now also possible to transmit data at acceptable rates over mobile phone links or, over short distances, by infrared.

Battery powered laptop computers small enough to carry in a briefcase to a patient's home now have storage devices that can hold the entire contents of many thousands of journal articles or several large textbooks. You can buy palmtop computers that can store several megabytes of information, display text and pictures, recognise handwriting, be activated by speech, and be connected to the Internet with a mobile phone.

Advances in these three technologies highlight the need for progress with the fourth–information retrieval.

What is information retrieval?

Most information retrieval researchers would agree that there is no simple definition of information retrieval. I will attempt an implicit definition by citing examples. Systems which involve information retrieval include Index Medicus, commercial interfaces to Medline (such as Ovid and SilverPlatter), and search engines for the world wide web such as AltaVista or Lycos. Systems that do not involve information retrieval include age and sex registers and most electronic medical records–these might rather be termed database systems.

Fundamentally, the differences between information retrieval systems and database systems stem from the fact that information objects in the former tend to be large, complex, heterogeneous, and loosely structured (such as journal articles, book sections, images, audio or video clips, executable programs), whereas in the latter they tend to be small and simple with known value ranges (birth dates, diagnostic codes, lab results, drug prescriptions). As a consequence, information retrieval systems must address the problem that the relevance of any particular information object to a user's need for information is generally both partial and uncertain. Not only is it difficult for users to create queries that accurately reflect their needs, but their very conception of those needs is often initially vague and can be clarified only by a “dialogue” with the information retrieval system. Thus information retrieval is inherently an interactive process.

Boolean system of information retrieval

Information need

A patient with epilepsy wishes to know whether it is safe to accept a job which involves prolonged use of a computer screen. What is the evidence?

User input

The doctor constructs a query expression such as “(video or monitor or screen*) and epilepsy”

Note that “screen*” will match any word beginning with “screen”–such as “screens” and, less appropriately, “screened.” It is also usually possible to express metaconstraints–for example, that a term must appear in the document title or that the document is written in a particular language or published in a particular range of years

System response

The system finds all documents that match the query expression and the metaconstraints (any document which mentions epilepsy and at least one of the other three terms) and presents these as a list

Examples of systems based on boolean model

BIDS Embase; most commercial Medline search systems
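To make the matching in the box concrete, here is a minimal Python sketch that hard-codes the example query and evaluates it against four invented titles. The tokenisation (lower-cased alphabetic words) and the treatment of a trailing `*` as "begins with" are simplifying assumptions; each commercial system has its own syntax and rules.

```python
import re

# Invented document titles, for illustration only.
DOCS = {
    1: "Photosensitive epilepsy and computer screens",
    2: "Video games as a trigger of seizures in epilepsy",
    3: "Patients screened for epilepsy in general practice",
    4: "Monitor placement and neck pain in office workers",
}

def tokens(text):
    """Lower-case the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))

def has_term(words, term):
    """True if some word equals the term; a trailing '*' means 'begins with'."""
    if term.endswith("*"):
        stem = term[:-1]
        return any(w.startswith(stem) for w in words)
    return term in words

def matches(text):
    """Evaluate (video OR monitor OR screen*) AND epilepsy against one document."""
    words = tokens(text)
    either = any(has_term(words, t) for t in ("video", "monitor", "screen*"))
    return either and has_term(words, "epilepsy")

result = [doc_id for doc_id, text in DOCS.items() if matches(text)]
print(result)  # [1, 2, 3]
```

Document 3 is retrieved only because "screened" matches `screen*`, and document 4 is excluded despite mentioning a monitor because it lacks "epilepsy": the match is all or none.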

Current medical information retrieval systems

There are several theoretical models of information retrieval. Most doctors will have used information retrieval systems based on the boolean model (after George Boole, a 19th century English logician), and some will have used information retrieval systems based on what I shall call, for the purposes of this paper, the ranking model.

Information retrieval based on the boolean model

Within the boolean model, documents (or their surrogates–that is, titles, abstracts, or lists of key words) are considered to be mathematical expressions. In order to find documents of interest, the user in effect creates another expression consisting of terms such as “myasthenia” and “prognosis” and operators with the meaning of and, or, not. (In most systems each term may be specified as a subexpression using “wild card” characters). For each document in the collection, the information retrieval system attempts a process of unification–that is, it attempts to find substitutions that make both expressions the same.

The boolean model has some attractive features. With appropriate indexing structures, boolean systems can run fast on relatively cheap computers. Also, although creating effective boolean expressions can be difficult, the principle is conceptually simple, and it is easy for users to see why documents do or do not match the query.
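The "appropriate indexing structures" mentioned above are typically inverted indexes: for each word, a sorted list (a postings list) of the documents that contain it. The sketch below assumes that structure and uses invented titles; AND becomes a merge-style walk through two sorted lists, which costs time proportional to the list lengths rather than to the size of the whole collection.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each word to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def intersect(p1, p2):
    """AND: step through two sorted postings lists together, keeping shared ids."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def union(p1, p2):
    """OR: every id that appears in either postings list."""
    return sorted(set(p1) | set(p2))

def and_not(p1, p2):
    """AND NOT: ids in p1 that do not appear in p2."""
    excluded = set(p2)
    return [d for d in p1 if d not in excluded]

# Invented titles, for illustration only.
docs = {
    1: "Prognosis of ocular myasthenia",
    2: "Myasthenia gravis and thymectomy",
    3: "Prognosis after stroke",
    4: "Long term prognosis in myasthenia gravis",
}
index = build_index(docs)
hits = intersect(index.get("myasthenia", []), index.get("prognosis", []))
print(hits)                                    # [1, 4]
print(and_not(hits, index.get("ocular", [])))  # [4]
```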

A minor disadvantage of this model is that the user must learn the syntax for expressing queries (generally different for each system). Two much more serious problems result from the all-or-none nature of the matching. Firstly, these systems tend to exhibit “brittle” responses to modification of a query: a common experience is that a query with three terms returns no matching documents, but removal of any one of the terms produces thousands of matches. Secondly, the matching documents cannot be ordered in any useful manner–the most relevant document might be presented as 39th in a list of 122. Though no more than a nuisance for a research user, this is unacceptable for a clinical user at the point of care.

Information retrieval based on the ranking model

Within the ranking model, documents are considered to be objects described by the values of properties related to the words they contain. You can then devise summative measures to assess the similarity between documents. Crudely, a document mentioning “hypertension” three times is considered more similar to one that mentions it four times than to one which does not mention it at all. In order to pose a query, the user constructs a mini-document (essentially just a list of terms without the need for operators or any system specific syntax), and the documents in the collection can be ranked in order of their similarity to it.
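One common way to turn this idea into numbers, assumed here because the article does not name a particular metric, is to describe each text by a vector of word counts and take the cosine of the angle between vectors. The sketch reproduces the crude "hypertension" example: texts mentioning the term three and four times point in the same direction, while a text that never mentions it shares nothing with them.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Describe a document by how often each word occurs in it."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two term-count vectors (0 = nothing shared)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Treat the query as a mini-document and sort documents by similarity to it."""
    q = term_counts(query)
    scored = [(cosine(q, term_counts(text)), doc_id) for doc_id, text in docs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

three = term_counts("hypertension " * 3)
four = term_counts("hypertension " * 4)
absent = term_counts("asthma asthma")
print(cosine(three, four), cosine(three, absent))  # 1.0 0.0
```

Note that the query needs no operators: `rank("hypertension treatment", docs)` simply orders every document, so there is never an empty result or an unordered list of matches.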

Ranking system of information retrieval

Information need

An old man who has been taking low dose digoxin for many years hears that a friend in similar circumstances has been advised to discontinue his treatment. Should he do likewise?

User input

The doctor types a list of words such as “usefulness of low dose digoxin in old person”

System response

Ranking systems typically ignore words such as “of” and “in,” augment the query with synonyms from a thesaurus (in this case adding words such as “utility” and “elderly”), and then derive a quantitative measure of similarity between this augmented query and all the documents in the collection

Examples of systems based on ranking model

Knowledge Finder; most general purpose search engines for the world wide web
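The query preprocessing described in the box can be sketched as follows. The stop list and thesaurus are tiny hypothetical stand-ins for the much larger resources a real system would use; only the synonyms "utility" and "elderly" come from the example above.

```python
import re

# Hypothetical stop list and thesaurus, far smaller than any real system's.
STOPWORDS = {"of", "in", "the", "a", "an", "and", "for", "to"}
THESAURUS = {
    "usefulness": ["utility"],
    "old": ["elderly"],
}

def expand_query(text):
    """Drop stop words, then follow each remaining word with its synonyms."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    expanded = []
    for w in words:
        expanded.append(w)
        expanded.extend(THESAURUS.get(w, []))
    return expanded

print(expand_query("usefulness of low dose digoxin in old person"))
# ['usefulness', 'utility', 'low', 'dose', 'digoxin', 'old', 'elderly', 'person']
```

The expanded list is then scored against each document exactly as an ordinary query would be.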

Some vendors describe ranking systems as natural language systems. This they most certainly are not. For example, in response to the query “Minoxidil and hypertension but not hair follicle stimulation,” all the top 20 documents returned by one such system were about hair growth.

Many similarity metrics have been used, most of which involve weighting terms according to their distribution in the document collection and making corrections for document length. Although these are computationally demanding, advances in computer power mean that systems based on this model are now commercially available, including search engines for the world wide web that analyse millions of pages.
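As one concrete instance of such a metric (not necessarily the one used by any system named here), the sketch below implements Okapi BM25, a widely used ranking function that weights each term by its rarity across the collection (inverse document frequency) and dampens the contribution of terms in long documents. The parameter values `k1=1.2` and `b=0.75` are conventional defaults, and the idf variant with "+ 1" inside the logarithm, which keeps weights positive, is an assumption; the documents are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-cased alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        # Length correction: documents longer than average are penalised.
        length_norm = k1 * (1 - b + b * len(toks) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms (low df) receive a high weight, common terms a low one.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores

docs = {
    "short": "Digoxin toxicity in the elderly",
    "long": ("Digoxin dosing in heart failure, with notes on renal function, "
             "potassium monitoring and drug interactions in the elderly"),
    "other": "Asthma in children",
}
scores = bm25_scores("digoxin elderly", docs)
print(sorted(scores, key=scores.get, reverse=True))  # ['short', 'long', 'other']
```

Both digoxin documents mention each query term once; the shorter one ranks higher purely because of the length correction.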

This approach still suffers from the problem of ambiguous terms, particularly when queries are short: for example, “arms” could refer to anatomy or weaponry, “tears” could refer to crying or ripping, and “blind loop syndrome” seldom affects vision. Longer queries (5-10 terms or more) give better results, but the vast majority of users construct queries of only one or two terms, perhaps wrongly applying their experience of boolean systems.

Progress in information retrieval

Currently, nearly all fielded medical information retrieval systems suffer from four further limitations.

  • The only medium that can be handled is text

  • Each information retrieval system can search only one information collection, and each collection can be searched by only one information retrieval system

  • Systems cannot adapt their responses to different user circumstances or behaviour

  • There is no linkage between information retrieval systems and patient record systems.

However, research in information retrieval is addressing these limitations, and I give a brief overview of progress.

Multimedia information retrieval

Although some workers have addressed retrieval of audio and video information, most research in multimedia information retrieval is currently focused on static images. There are basically two approaches, one based on tagging and the other on image content.

Image tagging–This approach requires that for each image there is an associated piece of text, so that an image can be retrieved by applying existing information retrieval techniques to its textual partner. The associated text might be created by manual annotation, or inferred automatically from the image context (for example, by assuming that words that appear on the same page as a given image are likely to be related to the image's content). This approach is likely to have wide application in medicine because of the ubiquity of pairing image and text information in clinical medical specialties such as radiology, pathology, and microbiology.
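A minimal sketch of retrieval by tagging, assuming each image has already been paired with manually written text: images are ranked by how many query words their text shares. The file names and annotations are hypothetical, and word overlap is the crudest possible score; any of the text ranking methods described earlier could be substituted.

```python
import re

# Hypothetical image collection: each file is paired with manually written text.
IMAGES = [
    ("cxr_0412.png", "Chest radiograph showing right lower lobe pneumonia"),
    ("path_0078.png", "Histology of nodular basal cell carcinoma"),
    ("cxr_0533.png", "Chest radiograph with left pneumothorax"),
]

def words(text):
    """Lower-cased alphabetic words, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search_images(query):
    """Rank images by how many query words appear in their associated text."""
    q = words(query)
    scored = [(len(q & words(text)), name) for name, text in IMAGES]
    # Drop images sharing no words; most shared words first, then by file name.
    return [name for n, name in sorted(scored, key=lambda s: (-s[0], s[1])) if n > 0]

print(search_images("chest radiograph pneumothorax"))
# ['cxr_0533.png', 'cxr_0412.png']
```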

Image content–The second approach involves direct matching of image content in terms of relations between the shapes, volumes, colours, and textures that constitute the image. Work on texture matching seems particularly fertile. I can foresee a general practitioner, during a consultation, submitting a digitised photograph of an unusual rash as a query to an information retrieval system, which then returns the best matching images from an annotated reference collection, together with text describing the characteristic features and diagnoses appropriate to each image.
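Real texture matching is beyond a short example, but the general shape of content-based retrieval can be sketched with a simpler feature swapped in: a grey-level histogram compared by histogram intersection. This is an intensity feature, not a texture measure. Images are reduced to flat lists of 0-255 pixel values, and the reference collection and its labels are invented; the point is only that the query is itself an image rather than a list of words.

```python
def grey_histogram(pixels, bins=8):
    """Fraction of pixels falling in each of `bins` equal slices of the 0-255 range."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1
    return [c / len(pixels) for c in counts]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_matches(query_pixels, reference, k=2):
    """Return the labels of the k reference images most similar to the query."""
    q = grey_histogram(query_pixels)
    scored = sorted(
        ((intersection(q, grey_histogram(px)), label) for label, px in reference.items()),
        reverse=True,
    )
    return [label for _, label in scored[:k]]

# Invented 100-pixel "images", flattened to lists of grey levels.
reference = {
    "reference_a": [200] * 80 + [60] * 20,
    "reference_b": [250] * 90 + [120] * 10,
    "reference_c": [30] * 70 + [90] * 30,
}
query = [205] * 75 + [55] * 25
print(best_matches(query, reference, k=1))  # ['reference_a']
```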

Distributed information retrieval

Clinicians at the point of care require access to a wide variety of information sources, and the world wide web provides a physical infrastructure capable of supporting this. Because of severe time constraints, clinicians should be able to search all appropriate sources with a single interface and a single query.

Prototype systems are now able to translate users' queries into formats acceptable to several different resources and automatically forward these translations in parallel. Further research is needed on methods of predicting which sources are worth searching for any given information need (since the costs of searching all sources would be too high) and on merging the results for presentation to the user (since ranking metrics in different systems are not comparable).
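Because raw scores from different systems are not comparable, one simple heuristic (assumed here, and certainly not the only approach) is to rescale each source's scores to the range 0 to 1 before merging, letting a document returned by several sources keep its best rescaled score. The sources, document ids, and scores are invented for illustration.

```python
def normalise(results):
    """Min-max rescale one source's (doc_id, score) pairs to the range 0..1."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(doc, (score - lo) / span if span else 1.0) for doc, score in results]

def merge(result_lists):
    """Combine ranked lists from several sources into one list of doc ids."""
    best = {}
    for results in result_lists:
        for doc, score in normalise(results):
            # A document returned by several sources keeps its highest rescaled score.
            best[doc] = max(score, best.get(doc, 0.0))
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(best, key=lambda doc: (-best[doc], doc))

source_1 = [("a", 12.0), ("b", 8.0), ("c", 2.0)]    # e.g. unbounded scores
source_2 = [("d", 0.91), ("e", 0.55), ("b", 0.40)]  # e.g. cosine scores in 0..1
print(merge([source_1, source_2]))  # ['a', 'd', 'b', 'e', 'c']
```

Min-max rescaling discards information about how confident each source was overall, which is one reason merging remains a research problem.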

Cognitive dimensions

Ten years ago a canny clinician who needed information would resort not to an inanimate information retrieval system but to a friendly medical librarian, who would clarify the requirements, undertake a library search when there was time, and assess the results in terms of some measure of relevance. In this “mediated mode” of use, the information retrieval system stands in a relation of cognitive isolation to the clinician, to the clinical context in which the need for information arose, to the clinical task addressed, and to the outcome for the patient. In the future, information retrieval will increasingly be undertaken in “immediate mode”: that is, by a clinician personally during a consultation in pursuit of answers to a particular clinical problem.

Accordingly, cognitive issues are now a major focus of research in information retrieval. Four important themes in this work are modelling the user, context, and task (so the information retrieval system can adapt its responses to particular circumstances); interactivity (since clarification of the information need is now the responsibility of the clinician); presentation (the amount of time required to assimilate results is crucial to an immediate user); and evaluation (the information retrieval system should be assessed in terms of its impact on the process of consultation as a whole).

Information retrieval systems in the future

Information need

A middle aged lady with rheumatoid arthritis who is taking indomethacin has made an appointment to see her general practitioner because of shoulder pain

System response

Before the patient arrives, demographic details, coded problem and symptom lists, clinical notes, prescriptions, and consultation histories are automatically transferred from the computerised medical record to the information retrieval system. This searches multiple sources for recent, context specific, information (such as related to differential diagnosis of shoulder pain in an adult with rheumatoid arthritis, clinical protocols for management of rheumatoid arthritis, and complications associated with indomethacin) and arranges these in a simple menu for perusal by the doctor. In addition, the patient may wish to view the information before or after the consultation

Integration with medical records

In principle the integration of information retrieval systems with electronic medical record systems could have three benefits: firstly, saving time, since users would not have to switch between applications and terms could be manually copied (by “cut and paste”) or automatically transferred between systems; secondly, improved retrieval effectiveness, since terms from the medical record could be used to enhance the context specificity of a user's query; and, thirdly, support for system initiative–that is, a system might conceivably raise alerts in situations of unperceived need (in which a user was not aware of important new documents and did not initiate a search).

Several research systems have been constructed. Most depend on automated methods for matching terms in free text or problem coded medical records to UMLS (unified medical language system) Metathesaurus concepts, from which queries based on MeSH (medical subject headings) terms can be constructed. Another approach is manually to compile generic queries, each reflecting one type of commonly occurring question, and then allow the user to choose terms from the medical record to be incorporated into the generic query. Integration with medical record systems is likely to be an essential feature of future information retrieval systems working at the point of care, but considerable further development is needed before convincing clinical benefit can be demonstrated.
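The generic query approach can be sketched in a few lines. The templates, placeholder names, and record fields are hypothetical; the record values are taken from the rheumatoid arthritis example in the box on future systems, and a template is skipped when the chosen record terms cannot fill it.

```python
# Hypothetical generic queries, one per common type of clinical question.
TEMPLATES = {
    "differential": "differential diagnosis of {symptom} in {problem}",
    "adverse": "adverse effects of {drug}",
    "management": "management protocol for {problem}",
    "dosing": "dosing of {drug} in {renal_function}",
}

def build_queries(record_terms, chosen):
    """Fill each chosen template with terms the user picked from the medical record."""
    queries = []
    for name in chosen:
        try:
            queries.append(TEMPLATES[name].format(**record_terms))
        except KeyError:
            # The record lacks a term this template needs, so skip the template.
            continue
    return queries

record_terms = {
    "problem": "rheumatoid arthritis",
    "symptom": "shoulder pain",
    "drug": "indomethacin",
}
for q in build_queries(record_terms, ["differential", "adverse", "dosing"]):
    print(q)
```

The "dosing" template is silently skipped here because no renal function term was chosen; the resulting queries would then be passed to whichever retrieval system is in use.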


Many doctors may have concerns about patients' attitudes to information searching as part of a clinical consultation, or may themselves feel uncomfortable with it. If a doctor were to consult a financial adviser, however, would the doctor not have more confidence in one who could present, explain, and evaluate relevant information so as to reach decisions on a basis of shared responsibility, rather than one who simply told the doctor what was best?

Recommended reading

Information Retrieval: A Health Care Perspective by William R Hersh (Springer-Verlag, 1996. ISBN 0 387 94454 0). Written by a practising doctor, the book has a strong clinical orientation and is detailed, accessible, and up to date (except for the section on the world wide web, which, having been composed more than six months ago, is inevitably obsolescent)

Another concern is the practicality of information searching. Even if they are convinced of its benefit, doctors may nevertheless feel that frequent searches are not compatible with an average consultation time of seven minutes. This concern has considerable force with regard to currently available information retrieval technology, and most searches must be undertaken outside clinical hours–that is, it is necessary to adapt the task to suit the technology. A better solution is to develop new information systems specifically for clinical use–that is, to adapt the technology to suit the task.

I believe that the use of information retrieval as a tool in clinical consultations will become as commonplace as the use of a stethoscope is today. New approaches to information retrieval will be required if the potential benefit is to be maximised. There is also a pressing need for new methods of evaluating information retrieval systems for use at the point of care, not simply in terms of whether users consider the information retrieved to be relevant but taking into account the effect of information retrieval on the clinical process as a whole. Such developments will depend on the existence of a large community of users of clinical information with the knowledge and ability to participate in the critical evaluation of new information retrieval systems.


I thank Professor Keith van Rijsbergen, Department of Computing Science, University of Glasgow, for his help in preparing this article.

Funding: My work investigating new architectures for medical information retrieval systems is funded by the BJA: International Journal of Anaesthesia.

Conflict of interest: None.


  1. 1.