Filtering Medline for a clinical discipline: diagnostic test assessment framework

BMJ 2009; 339 doi: (Published 18 September 2009) Cite this as: BMJ 2009;339:b3435
  1. Amit X Garg, associate professor 1 2 3,
  2. Arthur V Iansavichus, information specialist 1,
  3. Nancy L Wilczynski, assistant professor 3,
  4. Monika Kastner, PhD student 4,
  5. Leslie A Baier, research assistant 3,
  6. Salimah Z Shariff, PhD student 1,
  7. Faisal Rehman, assistant professor 1,
  8. Matthew Weir, research fellow 1,
  9. K Ann McKibbon, associate professor 3,
  10. R Brian Haynes, professor 3
  1. Division of Nephrology, University of Western Ontario, London, ON, Canada N6A 5C1
  2. Department of Epidemiology and Biostatistics, University of Western Ontario
  3. Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada L8N 3Z5
  4. Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada M5T 3M6
  • Correspondence to: A Garg, London Kidney Clinical Research Unit, Room ELL-101, Westminster, London Health Sciences Centre, 800 Commissioners Road East, London, ON, Canada N6A 4G5 amit.garg{at}
  • Accepted 30 March 2009

Abstract

Objective To develop and test a Medline filter that allows clinicians to search for articles within a clinical discipline, rather than searching the entire Medline database.

Design Diagnostic test assessment framework with development and validation phases.

Setting Sample of 4657 articles published in 2006 from 40 journals.

Reviews Each article was manually reviewed, and 19.8% contained information relevant to the discipline of nephrology. The performance of 1 155 087 unique renal filters was compared with the manual review.

Main outcome measures Sensitivity, specificity, precision, and accuracy of each filter.

Results The best renal filters combined two to 14 terms or phrases and included the terms “kidney” with multiple endings (that is, truncation), “renal replacement therapy”, “renal dialysis”, “kidney function tests”, “renal”, “nephr” truncated, “glomerul” truncated, and “proteinuria”. These filters achieved peak sensitivities of 97.8% and specificities of 98.5%. Performance of filters remained excellent in the validation phase.

Conclusions Medline can be filtered for the discipline of nephrology in a reliable manner. Storing these high performance renal filters in PubMed could help clinicians with their everyday searching. Filters can also be developed for other clinical disciplines by using similar methods.

Introduction

Clinicians search bibliographic databases for information to guide the care of their patients.1 2 Medline is the most popular of the databases. About 800 million PubMed searches are now done each year; in a survey in 2002, 15% of all searches were done by clinicians (personal communication, National Library of Medicine staff).3 As of February 2009, this multipurpose electronic database contained information on 18 million articles from 5363 different journals; 12 500 new articles are added each week.4 5

However, when clinicians type searches into PubMed, they often do not retrieve all the key articles relevant to the questions they are trying to answer. One way to improve this would be to filter Medline to a discipline of interest when searching. The use of filters is akin to screening for disease in high risk populations. Restricting the search to a discipline specific set of articles increases the likelihood that the remaining search terms will retrieve relevant information.

To search for information on the effectiveness of hepatitis B vaccination in chronic kidney disease, for example, one could type a phrase as shown in figure 1. Alternatively, one could choose to use a renal filter and simply type in the phrase “hepatitis B vaccination” (fig 2). One would then no longer be searching the entire Medline database but, rather, searching within a set of articles relevant to a discipline. Selecting a discipline filter removes the need to type in terms for that discipline. The filter would use a pre-programmed combination of medical subject headings (MeSH), explosions, subheadings, and text words of key concepts, words, and phrases to embody a discipline of interest, in this case nephrology.6 7


Fig 1 Searching without using filter


Fig 2 Searching with use of filter
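The two approaches in figures 1 and 2 can be sketched in code. The example below is only an illustration: the filter string is a hypothetical, much simplified stand-in written in PubMed-style syntax (`[tw]` for text words, `[mh]` for MeSH headings, `*` for truncation), not one of the filters developed in this study, and `apply_filter` is a name invented for the sketch.

```python
def apply_filter(user_query: str, discipline_filter: str) -> str:
    """Restrict a user's query to the articles matched by a discipline filter."""
    return f"({discipline_filter}) AND ({user_query})"


# Hypothetical, simplified renal filter; the study's filters combine
# up to 14 terms and are not reproduced here.
RENAL_FILTER = 'kidney*[tw] OR renal[tw] OR nephr*[tw] OR "renal dialysis"[mh]'

# With a stored filter, the clinician types only the non-renal part.
query = apply_filter("hepatitis B vaccination", RENAL_FILTER)
print(query)
```

The design point is that the renal vocabulary lives in one stored string, so the clinician does not have to retype or remember it for every search.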

Members of our team previously developed and tested Medline filters to optimise the retrieval of studies and systematic reviews of treatment, diagnosis, prognosis, aetiology, and clinical prediction guides.8 9 The filters to retrieve primary studies are part of the PubMed interface in the clinical queries section, where users search a Medline database filtered for articles of high methodological merit.10 The clinical queries filters are independent of any particular clinical discipline, such as cardiology or nephrology.

In this study, we aimed to develop new high performance filters for a clinical discipline in medicine. We chose the area of renal medicine, as clinical information in this field is published across hundreds of multidisciplinary journals and is difficult to track down.11

Methods

Study overview

We used a diagnostic test assessment framework with development and validation phases (fig 3, table 1). We divided a sample of articles from all available articles in Medline into two sets: a development dataset and a validation dataset. We produced a “reference standard” by manually reviewing a sample of articles to determine whether they contained any type of renal information. We then compared the retrieval performance of various filters made up of individual search terms and combinations of terms with the reference standard of manual review. We treated each filter as a “diagnostic test” for the identification (retrieval) of renal articles. For each filter, we constructed a two by two contingency table and quantified agreement (measures outlined in table 1). We then examined in the validation set of articles those filters that performed well in the development set of articles.


Fig 3 Data collection and filter development

Table 1

 Formulas for calculating sensitivity, specificity, precision, and accuracy of each filter to identify articles with renal information
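Table 1 is not reproduced in this copy. As a minimal sketch, the four measures can be computed from a two by two table using their standard definitions; we assume these match the formulas in table 1. The class name and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # renal articles retrieved by the filter
    fp: int  # non-renal articles retrieved by the filter
    fn: int  # renal articles missed by the filter
    tn: int  # non-renal articles correctly not retrieved

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=90, fp=20, fn=10, tn=380)
print(table.sensitivity, table.specificity, table.precision, table.accuracy)
```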

Sample of articles

For efficient manual review of full text articles for relevance, we first sampled a set of journals and then sampled a set of articles within those journals. We had previously compiled a list of journals that published at least one article relevant to the care of renal patients in the period from 1961 to 2005. We ranked these 466 journals by the number of articles with renal information.11 We selected the top 20 ranked journals, divided the remaining 446 journals into five equal groups, and randomly selected four journals from each group. We ordered these 40 journals by rank and randomly divided the list into either the development set or the validation set by using a block size of five journals and a ratio of three to two (table 2). We then manually reviewed all articles published in the first three months of 2006 for each journal and restricted our searches to these articles (fig 3). We reviewed all types of articles indexed in Medline, including original investigations, reviews, letters, and editorials. We initially selected two additional journals through our sampling process,12 13 but we did not consider them further because they were not available to us in electronic format.
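The journal sampling and allocation described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: journal names are placeholders, the seed is arbitrary, and because 446 does not divide evenly by five the groups here differ in size by at most one journal.

```python
import random


def sample_journals(ranked, rng, n_top=20, n_groups=5, per_group=4):
    """Keep the top ranked journals, then draw a few from each rank group."""
    rest = ranked[n_top:]
    size, extra = divmod(len(rest), n_groups)
    sampled = list(ranked[:n_top])
    start = 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)  # near equal groups
        sampled.extend(rng.sample(rest[start:end], per_group))
        start = end
    rank_of = {journal: i for i, journal in enumerate(ranked)}
    return sorted(sampled, key=rank_of.__getitem__)


def block_allocate(journals, rng, block_size=5, n_dev=3):
    """Within each block of five, send three to development, two to validation."""
    development, validation = [], []
    for i in range(0, len(journals), block_size):
        block = journals[i:i + block_size]
        rng.shuffle(block)
        development.extend(block[:n_dev])
        validation.extend(block[n_dev:])
    return development, validation


rng = random.Random(2006)  # arbitrary seed so the sketch is repeatable
ranked = [f"journal_{rank:03d}" for rank in range(1, 467)]  # placeholder names
selected = sample_journals(ranked, rng)
development, validation = block_allocate(selected, rng)
print(len(selected), len(development), len(validation))  # 40 24 16
```

Allocating within blocks of rank ordered journals keeps high and low ranked journals balanced across the two sets, which a single random split of 40 journals would not guarantee.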

Table 2

 Division of 40 journals into development and validation sets

Review of each article

We previously developed a standardised checklist to determine whether an article contained renal information (developed by a team of nephrologists, see web appendix). We derived this checklist by reviewing nephrology textbooks and the MeSH thesaurus. We used this checklist to determine whether the full text of each article was relevant to nephrology (four reviewers: AVI, LAB, MK, and AXG). Using five test sets of 298 articles, all reviewers were calibrated against a nephrologist (AXG) in their application of checklist criteria (agreement beyond chance, κ=0.98).14
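Agreement beyond chance between a reviewer and the nephrologist can be quantified with a κ statistic. The sketch below assumes Cohen's κ for two raters, which the text does not specify; the ratings are hypothetical and far fewer than the 298 articles in each test set.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who label the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: 1 = contains renal information, 0 = does not.
reviewer = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
nephrologist = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
kappa = cohen_kappa(reviewer, nephrologist)
print(round(kappa, 3))
```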


We compiled renal terms used in the filters from the following sources: US National Library of Medicine (NLM) medical subject heading (MeSH) thesaurus using Medline MeSH browser,15 Medline permuted index,16 Emtree thesaurus,17 SNOMED clinical terms, nephrology textbooks,18 19 20 clinical practice guidelines,21 22 website glossaries,23 24 25 26 27 28 29 30 31 a set of 195 renal systematic reviews,11 21 clinicians from eight different countries, and seven librarians from three different countries. Any term considered potentially useful by anyone involved in this process was added to the list. Examples of terms used in the filters included kidney, renal, creatinine, nephropathy, uremia, and dialysis.

We considered the terms both as MeSH terms and as text words. We considered MeSH terms with and without major focus (major focus refers to records in which an index term has been tagged as the major topic of the article) and as 42 possible subheadings, and with and without explosion capability (for example, exploding the MeSH “renal replacement therapy” means the following MeSH terms are included in the search: renal dialysis, hemodialysis, peritoneal dialysis, hemofiltration, hemodiafiltration, and kidney transplantation). We considered free text words as full and truncated terms (inclusion of multiple endings achieved through use of the $ symbol, for example nephro$), using both American and British English spelling. Terms could appear anywhere in a citation (title, abstract, subject headings, and so on) but not in the journal name only.

We automated the process of combining and testing the filters by using a computer implemented algorithm. We combined single term filters with a sensitivity greater than 10% and a specificity greater than 10% into multiple term filters, as well as two term filters with a sensitivity above 75% and a specificity above 50%. We used Boolean operators “OR,” “AND,” and “NOT” to combine terms.
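The sketch below illustrates truncation and Boolean combination on a toy corpus. It is not the algorithm used in the study: the six titles, the relevance labels, and the function names are hypothetical, and matching is done with regular expressions on titles only rather than on full Ovid Medline records.

```python
import re


def matches(term: str, text: str) -> bool:
    """True if the term occurs as a word; a trailing '$' allows any ending."""
    if term.endswith("$"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def retrieve(term: str, corpus: dict) -> set:
    """Identifiers of the articles that a single term filter retrieves."""
    return {aid for aid, text in corpus.items() if matches(term, text)}


def sens_spec(hits: set, relevant: set, all_ids: set) -> tuple:
    """Sensitivity and specificity of a filter against manual review."""
    true_pos = len(hits & relevant)
    true_neg = len(all_ids - hits - relevant)
    return true_pos / len(relevant), true_neg / len(all_ids - relevant)


# Toy corpus and reference standard (hypothetical).
corpus = {
    1: "Kidney transplantation and renal function",
    2: "Nephrotic syndrome in children",
    3: "Renal dialysis adequacy",
    4: "Adrenal insufficiency in sepsis",
    5: "Statins after myocardial infarction",
    6: "Proteinuria and cardiovascular risk",
}
relevant = {1, 2, 3, 6}
all_ids = set(corpus)

kidney = retrieve("kidney$", corpus)
renal = retrieve("renal", corpus)  # word boundaries exclude "Adrenal"
nephr = retrieve("nephr$", corpus)
dialysis = retrieve("dialysis", corpus)

three_terms = kidney | renal | nephr                          # OR
four_terms = three_terms | retrieve("proteinuria", corpus)    # OR
renal_and_dialysis = renal & dialysis                         # AND
renal_not_dialysis = renal - dialysis                         # NOT

print(sens_spec(three_terms, relevant, all_ids))
print(sens_spec(four_terms, relevant, all_ids))
```

Representing each filter as a set of retrieved article identifiers makes OR, AND, and NOT map directly onto set union, intersection, and difference, which is one reason large numbers of combinations can be tested cheaply.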

Statistical analysis

We calculated the sensitivity, specificity, precision, and accuracy of each filter as described in table 1. We developed and tested filters by using Ovid Medline syntax. Compared with Ovid syntax, translations provided for the PubMed interface had an accuracy of more than 99.5%.

Proof of concept searches

To examine the utility of filters, we asked five clinicians independent of the research team to each type in a PubMed search for a focused clinical question. We selected these focused clinical questions because in each case a recent systematic review had used a comprehensive method to compile relevant primary studies.32 33 34 35 36 We randomly selected the clinicians from a list of nephrologists practising in Canada and asked them to complete an online survey on their medical information gathering practices. The sample included four men and one woman, the average age was 45 (range 37-52) years, the average length of practice was 11 (5-20) years, and the average number of Medline searches done was 5 (1-15) a month. Two clinicians were practising in a centre with a nephrology training programme.

We provided the clinicians with as much time as they needed to complete the survey. We asked each clinician to search for articles on one of the following: the renal effects of statins, the benefits of fenoldopam in acute kidney injury, the benefits of tacrolimus compared with ciclosporin in kidney transplantation, the efficacy of low dose dopamine in acute kidney injury, and the benefits of intradermal compared with intramuscular hepatitis B vaccination in chronic kidney disease. We restricted each search to the search dates provided in the methods of each of the identified systematic reviews and the records indexed in Medline. In each case, we determined how many relevant articles were identified by the clinician’s search and how many relevant articles were identified when the clinician’s search was combined with the best performing filters developed as part of this study.

Results

Sample of articles—We used 4657 articles: 2649 articles from 24 journals in the development set and 2008 articles from another 16 journals in the validation set (fig 3, table 2). We manually reviewed each article, and 19.8% contained renal information (table 2). We compiled a total of 24 027 unique terms, which formed 1 155 087 unique filters (fig 3).

Single term filters—We tested the filters in the development set of articles. The best single term filters were text word “kidney” and exploded major MeSH “kidney diseases”, which achieved sensitivities of 78.7% and 57.5% and specificities of 97.2% and 98.6%, respectively (table 3). Table 3 also shows the performance of other terms such as “renal” and the exploded MeSH “renal replacement therapy”. The retrieval performance of these filters was similar in the validation set of articles (table 3).

Table 3

 Best single term filters for high sensitivity (keeping specificity ≥50%), high specificity (keeping sensitivity ≥50%), and optimal balance of sensitivity and specificity, and performance of some other single term filters from 24 027 considered. Values are percentages (95% confidence intervals)
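The selection rules in this caption can be sketched as constrained searches over the filter results. The first two entries below use the figures quoted in the text; `term_x` is hypothetical, added only to show the constraint at work. Defining “optimal balance” as the largest sum of sensitivity and specificity is an assumption of this sketch, not a rule stated here.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    sensitivity: float
    specificity: float


def best_sensitivity(results, min_specificity=0.5):
    """Most sensitive filter among those meeting the specificity floor."""
    eligible = [r for r in results if r.specificity >= min_specificity]
    return max(eligible, key=lambda r: r.sensitivity)


def best_specificity(results, min_sensitivity=0.5):
    """Most specific filter among those meeting the sensitivity floor."""
    eligible = [r for r in results if r.sensitivity >= min_sensitivity]
    return max(eligible, key=lambda r: r.specificity)


def best_balance(results):
    """Assumed balance criterion: largest sensitivity plus specificity."""
    return max(results, key=lambda r: r.sensitivity + r.specificity)


results = [
    Result("kidney (text word)", 0.787, 0.972),
    Result("kidney diseases (exploded major MeSH)", 0.575, 0.986),
    Result("term_x (hypothetical)", 0.950, 0.300),  # fails specificity floor
]

print(best_sensitivity(results).name)
print(best_specificity(results).name)
print(best_balance(results).name)
```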

Multiple term filters—We tested 1 131 060 filters using a combination of two to 14 terms in the development set of articles. Top filters achieved peak sensitivities of 97.8% and specificities of 98.5% (table 4). The best filters included the terms “renal replacement therapy”, “renal dialysis”, “kidney function tests”, “renal”, “nephr” truncated, “glomerul” truncated, and “proteinuria”. The performance of the best filters remained excellent in the validation set of articles (table 4).

Table 4

 Top filters yielding highest sensitivity (keeping specificity >90%) and highest specificity (keeping sensitivity >90%) based on combination of up to 14 terms. Values are percentages (95% confidence intervals)

Proof of concept searches—The retrieval of relevant studies increased when we combined the best filters with a search by a clinician (table 5). For example, in the case of searching for the renal effects of statins, the clinician’s search on its own retrieved six of the 24 relevant articles. This increased to 20/24 when we combined this search with the most sensitive filter and to 16/24 when we combined the search with the most specific filter.

Table 5

 Number of relevant articles retrieved with and without search filters


Discussion

Previous attempts to develop Medline filters for a clinical discipline have met with limited success, and many have never been validated.7 37 38 39 We succeeded in proving that Medline can be filtered for a clinical discipline in a reliable manner. Our best renal filters had a sensitivity and specificity in excess of 97%. Clinicians retrieved more clinically relevant articles when they used these filters.

Strengths and limitations

We tested more than one million renal filters, using an empirical approach to discover those with the highest performance. However, these filters help only with the renal components of any search. Limitations of the accompanying terms, such as the description of a certain treatment or diagnostic test, will continue to contribute to poor performance of searches. To develop these high performance renal filters, we sampled clinical rather than basic science journals. We also deliberately enriched the sample with primary renal journals. Although the sensitivity and specificity will not change when these filters are applied to all Medline journals, the precision will be reduced from the values shown in table 4. However, this level of precision uses a very strict definition of relevance (referenced in a systematic review), and we expect that other types of articles such as review articles and clinical practice guidelines will also be relevant to the searcher. Finally, although these filters should improve the retrieval of relevant articles compared with unaided searches, they may return a greater number of non-relevant articles (table 5).
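The dependence of precision on the proportion of renal articles follows from Bayes' theorem and can be sketched numerically. The sensitivity and specificity below are the peak values reported in the results, used together purely for illustration; they may not belong to a single filter. The 4% prevalence is a hypothetical figure for a less enriched database, not an estimate from this study.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Precision implied by fixed sensitivity and specificity at a prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.978  # peak value from the results, illustrative use
SPECIFICITY = 0.985  # peak value from the results, illustrative use

# 19.8% of the sampled articles contained renal information.
in_sample = precision_at_prevalence(SENSITIVITY, SPECIFICITY, 0.198)
# Hypothetical lower prevalence, as in a database not enriched for renal journals.
in_database = precision_at_prevalence(SENSITIVITY, SPECIFICITY, 0.04)

print(round(in_sample, 3), round(in_database, 3))
```

Sensitivity and specificity are conditioned on the true status of each article, so they do not depend on how common renal articles are; precision is conditioned on retrieval, so it falls as non-renal articles come to dominate the database.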

Of course, some articles are never indexed in Medline and can only be found through other bibliographic databases such as Embase. However, even when present in Medline, some articles may never be retrieved with the filters or otherwise because of poor indexing.40 41 42 For example, the subject heading for a recent citation on diabetic nephropathy was entered as diabetic neuropathy.43 Other articles lack accurate subject headings, key words, or a proper descriptive abstract,44 45 46 47 and some medical concepts lack existing MeSH terms.40 These filters may also need future updates if important changes in vocabulary occur, as happened when the concept of “chronic renal insufficiency” began to be referred to as “chronic kidney disease.”21 48

Using these renal search filters

These best performing filters are complex, with multiple terms. Coding these renal filters into the PubMed and Ovid search engine interfaces will permit their easy use by anyone doing a search (as done with our “clinical queries,” which as of March 2009 were located on the left hand menu of the PubMed screen). In the meantime, we provide these filters at By selecting a simple filter option, one can query only those articles filtered for renal information. As of March 2009, our most sensitive filter reduced the Medline database from 18 million citations to about 780 000 citations, and the most specific filter reduced it to about 435 000 citations.

Future research

Ongoing development of filters will help to prevent relevant articles from being missed. The best filters should also minimise the number of non-relevant articles retrieved. Future research should quantify the impact of filters on real searches by clinicians, clinicians’ knowledge, medical decision making, and even patients’ outcomes.49 Such research can also consider whether searchers’ characteristics, such as expertise in searching, influence filters’ utility. The impact of different types of filters in combination should be considered, including filters made for clinical disciplines, methodological characteristics, and subsets of journals. Developing filters for specific areas within a discipline may also have additional benefits, such as filters for transplantation or acute kidney injury within the discipline of nephrology. Finally, the methods described in this study can be used to develop filters for other disciplines. Whether high performance filters can be developed for other clinical disciplines, as we have done for the renal vocabulary, remains to be seen.

What is already known on this topic

  • Previous attempts to filter Medline for a clinical discipline have met with limited success

What this study adds

  • Medline can be filtered for a clinical discipline in a reliable manner

  • The best renal filters had sensitivity and specificity in excess of 97%

  • These filters can be programmed into the PubMed interface, so they are available for everyone to use



  • We thank other members of our research team: Nicholas Hobson and Chris Cotoi, who did the computer programming, and Robert Yang who helped to develop the criteria used to assess renal information.

  • Contributors: AXG, AVI, NLW, KAM, and RBH conceived the study. AVI compiled articles and managed data. AXG, AVI, MK, and LAB rated the articles for renal relevance. NLW and RBH supervised the computer programming. All authors had full access to data and aided the interpretation. SZS organised the clinicians’ searches. AXG drafted the manuscript, and all authors revised it. AXG is the guarantor.

  • Funding: This study was funded by the Kidney Foundation of Canada. AXG was supported by a clinician scientist award from the Canadian Institutes of Health Research. The researchers were independent of the funders. The funders had no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.

  • Competing interests: None declared.

  • Ethics approval: The study was approved by the regional ethics board of the University of Western Ontario. The five clinician searchers provided informed consent for study participation.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: and

