Value of symptoms and additional diagnostic tests for colorectal cancer in primary care: systematic review and meta-analysis
BMJ 2010; 340 doi: http://dx.doi.org/10.1136/bmj.c1269 (Published 01 April 2010) Cite this as: BMJ 2010;340:c1269
- Petra Jellema, research fellow1,
- Daniëlle A W M van der Windt, professor in primary care epidemiology1 2,
- David J Bruinvels, senior researcher3,
- Christian D Mallen, senior lecturer in general practice2,
- Stijn J B van Weyenberg, affiliated professor of gastroenterology4,
- Chris J Mulder, professor of gastroenterology4,
- Henrica C W de Vet, professor of clinimetrics5
- 1Department of General Practice, EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, Netherlands
- 2Arthritis Research UK National Primary Care Centre, Keele University, Keele, Staffordshire ST5 5BG
- 3Department of Public and Occupational Health, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam
- 4Department of Gastroenterology and Hepatology, VU University Medical Center, Amsterdam
- 5Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam
- Correspondence to: H C W de Vet
- Accepted 1 February 2010
Objective To summarise available evidence on diagnostic tests that might help primary care physicians to identify patients with an increased risk for colorectal cancer among those consulting for non-acute lower abdominal symptoms.
Data sources PubMed, Embase, and reference screening.
Study eligibility criteria Studies were selected if the design was a diagnostic study; the patients were adults consulting because of non-acute lower abdominal symptoms; tests included signs, symptoms, blood tests, or faecal tests.
Study appraisal and synthesis methods Two reviewers independently assessed quality with a modified version of the QUADAS tool and extracted data. We present diagnostic two by two tables and pooled estimates of sensitivity and specificity. We refrained from pooling when there was considerable clinical or statistical heterogeneity.
Results 47 primary diagnostic studies were included. Sensitivity was consistently high for age ≥50 (range 0.81-0.96, median 0.91), a referral guideline (0.80-0.94, 0.92), and immunochemical faeces tests (0.70-1.00, 0.95). Of these, only specificity of the faeces tests was good. Specificity was consistently high for family history (0.75-0.98, 0.91), weight loss (0.72-0.96, 0.89), and iron deficiency anaemia (0.83-0.95, 0.92), but all tests lacked sensitivity. None of these six tests was (sufficiently) studied in primary care.
Conclusions Although combinations of symptoms and results of immunochemical faeces tests showed good diagnostic performance for colorectal cancer, evidence from primary care is lacking. High quality studies on their role in the diagnostic investigation of colorectal cancer in primary care are urgently needed.
Colorectal cancer is the second most common cancer in Europe.1 2 The five year survival rate for early stage colorectal cancer is greater than 90%, whereas the five year survival rate for those diagnosed with widespread cancer is less than 10%.2 3 Early diagnosis is therefore of utmost importance. As patients with abdominal symptoms usually present to primary care,4 it is important that general practitioners can identify those at increased risk. This is not straightforward as abdominal symptoms are common in general practice,5 but each year a general practitioner would probably encounter no more than one new patient with colorectal cancer.6
Diagnostic tests could help general practitioners in the diagnostic process. To be of value in primary care, diagnostic tests should be directly accessible to general practitioners and their diagnostic accuracy should have been demonstrated in this setting. Directly accessible tests include the signs and symptoms found with medical history and physical examination, blood tests, and faecal occult blood tests. Several guidelines have been developed to assist general practitioners in the diagnostic process. For example, in 2000 the Department of Health of England and Wales introduced guidelines so that all patients with suspected colorectal cancer could be seen by a specialist within two weeks of referral (TWR guideline, see appendix A on bmj.com).7 This referral guideline, however, has been criticised for using symptoms that are so common among the general practice population (such as change in bowel habits) that many referrals can falsely be classified as high risk.8 Although the evidence for6 and compliance with9 this guideline have already been reviewed, as has its effect on colorectal services,10 a meta-analysis of the diagnostic performance of the guideline itself is lacking. Other researchers advocate faecal blood testing in patients with symptoms as a guide to the urgency of investigation.11 12 Guaiac based tests are inexpensive but sensitive to diet and medication, whereas immunochemical based tests react only to human haemoglobin13 but are more expensive (about $15 (€11) for guaiac based tests v $22 (€16) for immunochemical based tests14). In our hospital the costs are around €11.80 (£10.60) and €18.00 (£16.20), respectively.
The challenge in primary care is to find a sensitive test that does not result in too many false positives.15 We summarised all the available evidence on the diagnostic performance of age, family history, weight loss, individual signs and symptoms; combinations of symptoms, referral guidelines; blood tests (such as for anaemia); and faecal occult blood tests in diagnosing colorectal cancer in adult patients with symptoms.
Data sources and searches
We searched PubMed and Embase for eligible diagnostic studies (all publications to September 2008). The search strategy used MeSH/EMTREE terms and free text words, and included subsearches related to the study population, index test, target condition, and publication type. We added a methodological filter to increase the specificity of the search. This sensitive filter was created by combining three filters for the identification of diagnostic studies via the Boolean operator “OR”.16 17 18
Reference lists of all retrieved primary diagnostic studies were checked for additional relevant diagnostic studies. Additionally, we checked references of relevant reviews, meta-analyses, guidelines, and commentaries identified in PubMed and Embase.
Two authors (PJ, DvdW) independently applied the predefined selection criteria. PJ checked all citations (titles and abstracts) identified by the search strategy, while DvdW checked eligibility of all citations assessed by PJ as (possibly) relevant. Consensus meetings were organised to discuss any disagreement regarding selection. Full publications were retrieved for studies that seemed relevant, and for those for which relevance was still unclear. A third review author (DB) was consulted in cases of persisting disagreement.
Participants, setting, and study design
We considered studies eligible if the study population consisted of adult patients consulting a physician with non-acute lower abdominal symptoms. Therefore, population based or screening studies—that is, studies that include people without abdominal symptoms—were excluded. We defined “non-acute” as being present for at least two weeks.19 Although primary care is the setting of interest, in some countries primary care is not well defined. Therefore, we decided to additionally include studies performed at the interface between primary and secondary care, such as two week referral clinics and open access outpatient clinics. In open access clinics, patients’ characteristics and the spectrum of disease might resemble those found in primary care populations. As not all publications clearly reported whether or not an outpatient clinic was directly accessible to patients, however, we decided to select only those secondary care studies with a prevalence of colorectal cancer of less than 15%. By using this criterion, which was the highest prevalence reported in the primary care studies, we tried to minimise the risk of bias from diagnostic pre-selection. Studies with hospital inpatients were also excluded.
We included primary diagnostic studies with a cohort design and case-control designs in which controls formed a representative sample of all patients with abdominal symptoms. We excluded studies for which we could not extract or reconstruct two by two tables, studies written in a language other than English, Dutch, German, or French, and reviews, editorials, and case reports.
We included studies that used colonoscopy, barium enema, or clinical follow-up as reference standards to diagnose or exclude colorectal cancer. Studies that used sigmoidoscopy as the single reference test were excluded.
We included studies on tests that can be carried out or are usually accessible in primary care, specifically age, family history, weight loss, individual signs and symptoms; combinations of symptoms, including referral guidelines; blood tests; and faecal occult blood tests. Studies reporting data only on main indications for colonoscopy were excluded as they ignore the presence of additional symptoms. As ultrasonography is not commonly used in primary care we excluded this test.
Data collection and quality assessment
The reviewers extracted data on setting and design, study population, test characteristics, and test results. Methodological quality was assessed with a modified version of the quality assessment of diagnostic accuracy studies (QUADAS) tool,20 which is recommended by the Cochrane Diagnostic Reviewers’ Handbook.21 This modified version consists of 11 items on methodological characteristics that have the potential to introduce bias (see appendix B on bmj.com). Items were scored as positive (no bias), negative (potential bias), or unclear.
Two reviewers assessed each paper: PJ extracted data from all studies while HdV, DvdW, and DB each extracted data from a third of the studies, independently from each other and using a standardised form. Agreement between observers was quantified and disagreements were resolved by consensus meetings.
As recommended by the designers of the QUADAS tool we did not apply weights to the QUADAS items or use a summary score in the analysis. Instead, we used subgroup analyses to explore whether scores on the following quality items explained variation in diagnostic performance: item 1 (validity of study sample), item 2 (test review bias), item 5 (validity of reference standard), and item 7 (differential verification bias). These items have been shown to result in biased estimates of diagnostic performance in empirical studies.22 23
Data synthesis and statistical analysis
We examined diagnostic two by two tables and diagnostic performance measures per study (sensitivity, specificity, predictive values). We also looked at study results by setting.
Positive predictive values (PPV) and the complement of negative predictive values (1−NPV) represent the probability of colorectal cancer in patients with a positive or negative test result, respectively. These measures provide a clear indication of the diagnostic value of a test, that is, the extent to which the prior probability of colorectal cancer is modified by either a positive or a negative test result. To illustrate results of relevant diagnostic tests we present forest plots of PPV and 1−NPV.
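As a minimal sketch of how these measures follow from a diagnostic two by two table: the function name and the counts below are hypothetical illustrations, not data from any included study, and the sketch assumes that no row or column total is zero.

```python
def two_by_two_measures(tp, fp, fn, tn):
    """Diagnostic measures from a two by two table of counts.

    tp/fn: patients with colorectal cancer and a positive/negative test.
    fp/tn: patients without colorectal cancer and a positive/negative test.
    Assumes every row and column total is greater than zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),            # risk of cancer after a positive test
        "one_minus_npv": fn / (fn + tn),  # risk of cancer after a negative test
        "prevalence": (tp + fn) / (tp + fp + fn + tn),  # prior probability
    }


# Hypothetical cohort: 1000 patients with symptoms, 50 with colorectal cancer.
measures = two_by_two_measures(tp=45, fp=400, fn=5, tn=550)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical table the PPV lies above the prevalence and 1−NPV lies below it, which is the shift in prior probability that the forest plots are meant to display.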
We used MetaDiSc statistical software to calculate diagnostic performance measures and corresponding 95% confidence intervals.13 24 When four or more studies on a specific index test showed sufficient clinical and statistical homogeneity, we used bivariate analyses25 to calculate pooled estimates and 95% confidence intervals for the summary estimates of sensitivity and specificity, and of positive and negative predictive values. The bivariate analyses take into account variability within and between studies and the dependency between either sensitivity and specificity or positive and negative predictive values. Bivariate analyses based on a random effects model perform better than summary receiver operating characteristic (SROC) regression models derived with the Moses and Littenberg method, which is based on a fixed effects model.26 We defined statistical heterogeneity as non-overlapping confidence intervals for estimates of diagnostic parameters together with a difference in these estimates among the studies of more than 20%. When assessing heterogeneity we always simultaneously considered sensitivity and specificity (or PPV and 1−NPV). In case of statistical or considerable clinical heterogeneity (in terms of characteristics of populations or tests) we refrained from pooling and presented median values and ranges instead.
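The heterogeneity rule and the resulting choice between pooling and reporting a median with range can be sketched as follows. This is one possible reading, not the review's actual procedure: it assumes that "non-overlapping" means at least one pair of studies has disjoint confidence intervals, and that "more than 20%" means an absolute spread of more than 0.20 between the highest and lowest estimate. It handles a single parameter, whereas the review considered sensitivity and specificity together and also judged clinical heterogeneity. All names and numbers are hypothetical.

```python
from itertools import combinations
from statistics import median


def intervals_overlap(a, b):
    """True if confidence intervals a and b, given as (low, high), overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_heterogeneous(estimates, intervals, max_difference=0.20):
    """Assumed reading: some pair of intervals is disjoint AND the spread
    between the highest and lowest estimate exceeds max_difference."""
    any_disjoint = any(
        not intervals_overlap(a, b) for a, b in combinations(intervals, 2)
    )
    spread = max(estimates) - min(estimates)
    return any_disjoint and spread > max_difference


def summary_strategy(estimates, intervals, min_studies=4):
    """Return 'pool' when pooling looks justified, else the median and range."""
    if len(estimates) >= min_studies and not is_heterogeneous(estimates, intervals):
        return ("pool", None)
    return ("median_range", (median(estimates), min(estimates), max(estimates)))


# Hypothetical sensitivities with 95% confidence intervals from four studies.
homogeneous = summary_strategy(
    [0.81, 0.91, 0.96, 0.90],
    [(0.70, 0.90), (0.85, 0.95), (0.90, 0.99), (0.80, 0.97)],
)
heterogeneous = summary_strategy(
    [0.06, 0.50, 1.00, 0.40],
    [(0.01, 0.15), (0.40, 0.60), (0.85, 1.00), (0.30, 0.50)],
)
print(homogeneous)
print(heterogeneous)
```

The sketch only decides whether to pool; the pooled estimates themselves would come from the bivariate model, which is not reproduced here.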
Investigations of heterogeneity
Factors that can contribute to variation in diagnostic performance across studies (heterogeneity) include differences in (a) setting (primary care v primary-secondary care interface v secondary care); (b) prevalence of colorectal cancer (<5% v ≥5%); (c) tumour location (rectum v other left sided (sigmoid, colon descendens, flexura lienalis) v right sided (rest)); (d) cancer stage (Dukes’s A and B v Dukes’s C and D); (e1) type of faecal occult blood test (guaiac v immunochemical); (e2) guaiac based faecal occult blood tests (dietary restrictions v no restrictions); (e3) guaiac based faecal occult blood tests (self test v regular test); and (f) QUADAS items 1, 2, 5, or 7 (as described above). Subgroup analyses (a), (b), (e1), (e2), and (f) concern analyses between study subgroups, while (c), (d), (e1), and (e3) concern analyses within studies.
Subgroup analyses were performed only when each subgroup included data from at least four diagnostic studies. When results for both sensitivity and specificity were statistically homogeneous within a subgroup, we calculated pooled estimates using bivariate analyses. When results were statistically heterogeneous, we presented the range of sensitivity and specificity per subgroup. Studies that provided insufficient information on a factor could not be included in that specific subgroup analysis.
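The eligibility rule for a between study subgroup analysis can be sketched as below. The data structure, field names, and study identifiers are hypothetical; the only elements taken from the text are the minimum of four studies per subgroup and the exclusion of studies with insufficient information on the factor.

```python
def eligible_subgroups(studies, factor, min_studies=4):
    """Group study ids by the level of `factor`.

    Studies with no information on the factor (value None or missing) are
    left out. Returns the grouping only if there are at least two subgroups
    and each holds at least `min_studies` studies; otherwise returns None.
    """
    groups = {}
    for study in studies:
        level = study.get(factor)
        if level is None:  # insufficient information on this factor
            continue
        groups.setdefault(level, []).append(study["id"])
    enough_levels = len(groups) >= 2
    enough_studies = all(len(ids) >= min_studies for ids in groups.values())
    return groups if enough_levels and enough_studies else None


# Hypothetical studies: four in primary care, five in secondary care,
# and one that does not report its setting.
studies = (
    [{"id": f"P{i}", "setting": "primary care"} for i in range(1, 5)]
    + [{"id": f"S{i}", "setting": "secondary care"} for i in range(1, 6)]
    + [{"id": "U1", "setting": None}]
)
by_setting = eligible_subgroups(studies, "setting")
print(by_setting)
```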
Literature search and study selection
The literature search yielded 2859 references. A total of 421 full papers were retrieved, of which 38 were finally considered relevant for the review.11 12 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 Reference checking yielded 11 additional relevant papers.8 63 64 65 66 67 68 69 70 71 72 As four papers8 29 30 72 presented information on two studies, our total number of primary diagnostic studies for inclusion was 47. Figure 1 summarises the search results.
Full details of the 47 included studies are in appendix C on bmj.com. All studies were cohort studies on patients with abdominal symptoms. Nine studies took place in primary care, with the prevalence of colorectal cancer ranging from 3% to 15%.31 33 34 40 44 49 61 63 70 Signs and symptoms were the main index tests in these studies. Seven studies used rectal bleeding as the inclusion criterion.33 34 40 44 49 61 70 Five studies were performed at the interface between primary and secondary care, with prevalence of colorectal cancer ranging from 9% to 14%.28 65 66 67 68 Three studies included individual referral criteria as the index test28 65 68; four studies used the referral guideline itself (that is, combination of criteria).65 66 67 68 Of the 33 studies in secondary care, 20 were performed in diagnostic clinics (colonoscopy,8 27 32 36 37 39 41 43 45 46 50 51 52 53 55 71 double contrast barium enema54 57 62 64) and 13 in outpatient clinics.11 12 29 35 38 42 47 48 56 58 59 60 69 Prevalence of colorectal cancer ranged from 0.4% to 15%.
On average, the reviewers disagreed on three out of 11 items (range 1-6 across studies). Table 1 presents the results of the quality assessment after consensus. Potential sources of bias most frequently identified concerned an invalid reference standard (item 5) and differential verification bias (item 7). Valid selection and representativeness of study populations (item 1), blind interpretation of results of the reference standard (item 8), and length of the period between index test and reference standard (item 9) were poorly described (that is, score unclear). Overall, 12 studies performed well, receiving a positive assessment on at least eight out of 11 QUADAS items.27 41 42 43 50 51 52 54 55 59 60 70
Diagnostic performance of individual characteristics
Table 2 summarises the findings, including the results of tests that have been evaluated in at least four primary diagnostic studies.
Age, sex, family history, and weight loss
Results for age and sex are summarised in table 3 and for family history and weight loss in table 4. For age, sensitivity and specificity were strongly dependent on the cut-off value; the lower the cut-off score (such as age ≥40), the higher the sensitivity and the lower the specificity.8 27 29 33 34 44 45 49 55 59 61 62 Figure 2 shows the PPV and 1−NPV using a cut-off of ≥50 for age. Pooled estimates (six studies) showed that patients aged ≥50 had a 10% risk of colorectal cancer (95% confidence interval 7% to 13%), while patients aged <50 had a risk of 2% (1% to 3%). Sensitivity decreased sharply with a cut-off for age of ≥70 compared with a cut-off of ≥60 (median 0.50 and 0.83, respectively) (table 2). For sex (male) sensitivity ranged from 0.37 to 0.78, while specificity ranged from 0.46 to 0.57 (table 3).8 28 34 40 44 46 49 55 62 The risk for colorectal cancer in men was somewhat higher than in women (0.07 v 0.04), but confidence intervals overlapped (table 2). For family history (present)27 29 32 40 46 62 70 and weight loss (present)8 28 29 34 40 44 45 46 49 54 61 62 70 specificity seemed to be rather consistent and high (medians 0.91 and 0.89, respectively) (tables 2 and 4). Sensitivity, however, ranged from 0.00 to 1.00 for family history and from 0.13 to 0.44 for weight loss. For all four factors visual inspection showed no differences between the different settings of care.
Five studies reported on the diagnostic performance of a palpable mass (table 5).29 34 61 65 68 Sensitivity ranged from 0.04 (abdominal tumour) to 0.25 (rectal mass), while specificity ranged from 0.89 to 0.99 (rectal mass). In the study of Flashman et al, general practitioners identified many more palpable abdominal or rectal masses than clinicians in the clinic did in the same cohort of patients (43 v 22 and 53 v 28, respectively).68 Of the 43 patients identified by the general practitioner as having an abdominal mass, seven (16%) were diagnosed with colorectal cancer compared with four of the 22 (18%) identified in the clinic. Of the 53 patients identified by the general practitioner as having a rectal mass, 12 (23%) were diagnosed with colorectal cancer compared with 13 of 28 (46%) identified in the clinic.
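The percentages quoted for the study of Flashman et al are positive predictive values and can be reproduced from the reported counts. The sketch below uses only the counts given in the paragraph above; the variable and function names are illustrative.

```python
# (finding, who identified it): (patients with the finding, of whom with cancer)
reported_counts = {
    ("abdominal mass", "general practitioner"): (43, 7),
    ("abdominal mass", "clinic"): (22, 4),
    ("rectal mass", "general practitioner"): (53, 12),
    ("rectal mass", "clinic"): (28, 13),
}


def percentage(cases, total):
    """Proportion of cases among total, as a whole-number percentage."""
    return round(100 * cases / total)


ppv_percent = {
    key: percentage(cancers, identified)
    for key, (identified, cancers) in reported_counts.items()
}
for (finding, observer), value in ppv_percent.items():
    print(f"{finding}, {observer}: {value}%")
```

For rectal masses the predictive value in the clinic is about twice that in general practice, whereas for abdominal masses the two are similar.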
Individual symptoms most commonly investigated included abdominal pain, rectal bleeding, (change in) bowel habit, and peri-anal symptoms. For abdominal pain (20 studies)8 12 27 28 29 34 40 44 45 46 49 54 55 59 60 61 62 64 70 71 test results were heterogeneous with sensitivity ranging from 0.00 to 0.73 and specificity from 0.19 to 0.91 (table 2). In four of the 13 secondary care studies (table 5) the risk for colorectal cancer was significantly lower among those with abdominal pain than among those without.8 12 54 60
Table 6 shows data on rectal bleeding (13 studies8 12 27 29 45 46 54 55 59 60 62 64 71). Sensitivity ranged from 0.25 to 0.86, while specificity ranged from 0.31 to 0.88 (table 2). Comparing the risk for colorectal cancer in those with a positive test result with that in those with a negative test result shows that patients with rectal bleeding, and also patients with blood mixed with stool, have a somewhat higher risk (pooled estimates 0.07 and 0.06, respectively) than those without (pooled estimates 0.04 and 0.03, respectively) (table 2, fig 3). Confidence intervals, however, overlap. Patients with dark blood have a significantly higher risk than those without dark blood (pooled estimates 0.14 (95% confidence interval 0.09 to 0.21) and 0.05 (0.03 to 0.07), respectively) (table 2, fig 4).
Table 7 shows data on change in bowel habits (18 studies8 12 27 29 33 34 40 44 45 49 54 55 59 60 62 64 70 71). Results were heterogeneous with sensitivity ranging from 0.06 to 1.00 and specificity from 0.28 to 0.94 (table 2). For eight studies confidence intervals for positive and negative test results did not overlap (table 2, fig 5), indicating that the risk for colorectal cancer was significantly higher among those with change in bowel habit than among those without.8 29 33 34 44 59 60 71 For diarrhoea (six studies) sensitivity ranged from 0.06 to 0.25 and specificity from 0.65 to 0.79,27 45 55 70 71 with the exception of the study of Pepin et al,46 with a specificity of 0.96 (table 7). That study, however, used constipation as inclusion criterion. For constipation (four studies)27 45 70 71 sensitivity ranged from 0.00 to 0.51 and specificity from 0.53 to 0.90.
For peri-anal symptoms (five studies) the diagnostic performance depended on the definition used (table 7). When anal itch or anal protrusion was studied,40 sensitivity was significantly lower (0.06) than when a more general definition such as peri-anal symptoms was used (0.36 to 0.56).8 33 59 Patients with peri-anal symptoms might have a lower risk of colorectal cancer than patients without such symptoms, although the opposite might be true for the presence of peri-anal eczema.34
Of the remaining symptoms (table 8) the presence of “mucus mixed with blood” might be informative as the risk of colorectal cancer was 14% for those reporting this symptom compared with 3% for those without, but only one study investigated it.8
Diagnostic performance of symptom combinations
Five primary care,33 34 44 49 63 three primary-secondary interface,28 65 68 and four secondary care studies8 41 59 60 presented diagnostic data on a whole range of symptom combinations, including two classification systems that were originally developed to differentiate organic from non-organic disease (Bellentani and Kruis criteria63), a self developed prediction rule by Fijten et al,34 and an experience based scoring method to predict colorectal cancer (Selva score8) (table 9). The three primary-secondary interface studies presented diagnostic data on individual referral criteria of the two week referral guideline.
Sensitivity ranged from 0.03 for a combination of abdominal pain without rectal bleeding or change in bowel habit,59 to 1.00 for a prediction rule including age, change in bowel habit, and blood mixed with or on stool.34 Specificity ranged from 0.50 for a combination of change in bowel habit and age ≥45 (reference 28) to 0.96 for a combination of rectal bleeding, (absence of) peri-anal symptoms, and age ≥60 (reference 68). A prediction rule showed favourable results for both sensitivity (1.00) and specificity (0.90).34
Thompson et al found that the risk of colorectal cancer increased from 6% to 12% when rectal bleeding was accompanied by a change in bowel habit.59 When additional information was gathered on peri-anal symptoms and these were absent, the risk increased further to 20%. When rectal bleeding was accompanied by peri-anal symptoms but not by a change in bowel habit, the risk of colorectal cancer decreased from 6% to 1%.59
Four studies evaluated the two week referral guideline in a two week referral clinic and two studies in secondary care (see appendix A for a description of the guideline). The formulation of the two week referral criteria differed (slightly) across studies. Selvachandran et al included only three of the six criteria.8 Sensitivity ranged from 0.80 for the abridged version8 to 0.94 (reference 69); specificity ranged from 0.30 (reference 65) to 0.56 (reference 69). For those meeting the guideline (that is, positive score on at least one of the six criteria) the risk varied from 8% (reference 8) to 25% (reference 67) with a median of 14%, while for patients who did not meet the guideline the risk varied from 1% (reference 69) to 4% (reference 65) with a median of 3% (table 2, fig 6).
Diagnostic performance of blood tests
Eight studies reported on the diagnostic value of iron deficiency anaemia,27 29 45 46 55 65 68 71 and one primary care study34 on the diagnostic value of haemoglobin, erythrocyte sedimentation rate, and white cell count (table 10). For (iron deficiency) anaemia sensitivity varied widely from 0.07 to 0.68, while specificity ranged from 0.83 to 0.95. In three of the eight studies the risk for colorectal cancer was significantly higher among those with a positive test result than among those with a negative test result (table 2, fig 7).27 45 65
Diagnostic performance of faecal occult blood test
Table 11 gives details of the 15 studies that reported on the diagnostic performance of guaiac based faecal occult blood tests,11 12 31 34 35 36 37 38 43 47 48 53 56 57 58 three studies on do-it-yourself tests,48 56 57 eight studies on immunochemical based faecal occult blood tests,36 39 42 47 50 51 52 53 58 and one study on a combination of the occult blood tests.47 Few studies reported detailed information on diet restrictions before the test.
For guaiac based tests sensitivity ranged from 0.33 for the Coloscreen self test48 to 1.00 for a Haemoccult test,12 while specificity ranged from 0.72 for a Fecatwin test57 to 0.94 for the Coloscreen self test.48 Sensitivity of the self tests was low (range 0.33-0.57). For immunochemical based faecal occult blood tests sensitivity ranged from 0.70 for an iFOBT strip device50 to 1.00 for HemeSelect, Hemoblot, Insure, and faecal haemoglobin.36 39 53 Specificity ranged from 0.71 for a haemoglobin-albumin complex51 to 0.93 for an iFOBT strip device.50
The risk for colorectal cancer was significantly higher among patients with a positive test result than among those with a negative test result, with the exception of the study of Fijten et al34 (table 11) that included solely patients with rectal bleeding. For guaiac based tests the median risk was 0.28 among those with a positive test result and 0.01 for those with a negative test result, while these numbers were 0.21 and 0.00, respectively, for immunochemical based tests (table 2, figs 8 and 9).
Preplanned subgroup analyses
Because of lack of data in one or both subgroups several preplanned subgroup analyses could not be carried out. Table 12 presents the results of the subgroup analyses between studies for which sufficient data were available. Comparing the subgroups’ (ranges of) values of sensitivity and specificity shows that none of the factors was clearly able to explain the tests’ heterogeneous results.
We looked at several subgroup analyses within studies. Sensitivity of the immunochemical based faecal occult blood tests was better than that of the guaiac based tests,36 53 58 and sensitivity was better for the regular guaiac based tests than for the self tests (table 13).48 56 57 These findings are confirmed by the between study findings (table 2). Subgroup analyses within studies on Dukes’s stages showed that immunochemical based tests were better than guaiac based tests in detecting Dukes’s A and B,53 58 and sensitivity seemed to be higher at all locations.58 This, however, was based on one or two studies with a small number of cases (table 14).
The performance of tests in diagnosing colorectal cancer in adult patients with symptoms varied widely. Sensitivity was consistently high for age ≥50 (range 0.81-0.96, median 0.91) and for the two week referral guideline (range 0.80-0.94, median 0.92), but these lacked specificity (medians 0.36 and 0.42, respectively). These tests are suitable to rule out colorectal cancer at the cost of a high number of patients needing further diagnostic testing. Specificity was consistently high for family history (range 0.75-0.98, median 0.91), weight loss (range 0.72-0.96, median 0.89), and iron deficiency anaemia (0.83-0.95, median 0.92), but all tests lacked sensitivity (medians 0.16, 0.20 and 0.13, respectively). These tests are suitable to rule in colorectal cancer but at the cost of missing a considerable proportion of cases. Only the immunochemical based faecal occult blood tests had both a reasonable sensitivity (range 0.70-1.00, median 0.95) and specificity (range 0.71-0.93, median 0.84).
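The rule out and rule in reasoning above can be illustrated with Bayes' theorem, which converts sensitivity, specificity, and a prior probability into PPV and 1−NPV. The sketch below is illustrative only: the prior probability of 5% is an assumption chosen for the example, and the median sensitivity and median specificity quoted above need not come from the same study, so the outputs are not estimates from the review.

```python
def post_test_probabilities(sensitivity, specificity, prevalence):
    """Return (PPV, 1 - NPV) from Bayes' theorem.

    PPV     = P(cancer | positive test)
    1 - NPV = P(cancer | negative test)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    one_minus_npv = false_neg / (false_neg + true_neg)
    return ppv, one_minus_npv


ASSUMED_PRIOR = 0.05  # hypothetical prior probability, not taken from the review

# Median sensitivity and specificity as quoted in the paragraph above.
tests = {
    "age >=50": (0.91, 0.36),
    "iron deficiency anaemia": (0.13, 0.92),
    "immunochemical faecal occult blood test": (0.95, 0.84),
}
for name, (sens, spec) in tests.items():
    ppv, one_minus_npv = post_test_probabilities(sens, spec, ASSUMED_PRIOR)
    print(f"{name}: PPV {ppv:.3f}, 1-NPV {one_minus_npv:.3f}")
```

Under this assumed prior, a sensitive but non-specific test mainly lowers the probability after a negative result, while only a test with both properties shifts the probability substantially in both directions.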
Diagnostic tests for colorectal cancer in primary care
This review focuses on the diagnostic performance of tests for patients who present with non-acute lower abdominal symptoms in primary care. We found that only a few studies were clearly carried out in primary care populations. We excluded screening studies, which would also include a large proportion of people without symptoms. Screening is useful if early stages of colorectal cancer can be detected, which have a favourable prognosis. In primary care, all colorectal cancer should be diagnosed, and preferably at an early stage. Therefore, it is useful to make a distinction between early stages (Dukes’s A/B)—that is, resectable colorectal cancer—and later stages (Dukes’s C/D). Some of the tests reflect symptoms of later stages, such as weight loss and iron deficiency anaemia, and will therefore not help to identify early stages of colorectal cancer.
When a patient presents to primary care with abdominal symptoms several differential diagnoses can be considered (such as colorectal cancer, irritable bowel syndrome, coeliac disease) and general practitioners should identify patients who should be referred for further diagnosis. Our review focused on colorectal cancer, yet to the clinician a positive test result (such as diarrhoea) leading to a diagnosis of inflammatory disease might be considered a true positive result.
Primary care settings differ between countries, and in only a few countries do general practitioners act as a gatekeeper to specialist clinical care. In other countries specialist care may be directly accessible. Therefore we also included two week referral clinics and secondary care populations with a low prevalence of colorectal cancer, which might reflect populations with a similar spectrum of disease as in primary care and a limited risk of investigation bias. Many studies, both in primary and secondary care settings, however, enrolled a selective population of patients by using the presence of a specific complaint as an inclusion criterion. For example, seven primary care studies investigating the diagnostic performance of signs and symptoms used rectal bleeding as an inclusion criterion. We presented the findings in such a way that differences between settings and populations can be easily identified.
Symptoms and signs
Of the typical symptoms of colorectal cancer, only weight loss had some diagnostic value with a fairly high specificity. This seemed to translate into clear differences between the probability of colorectal cancer among patients with or without apparent weight loss (positive predictive value v 1−negative predictive value). Other symptoms, including presence of diarrhoea, constipation, change in bowel habit, or abdominal pain, showed poor diagnostic performance.
Studies showed a high degree of heterogeneity. This might be because studies used different definitions to classify self reported symptoms such as change in bowel habit. Furthermore, studies used different inclusion criteria, leading to an increased risk of selection bias in several studies. For example, in seven of the nine primary care studies, which reported on the diagnostic performance of signs and symptoms in symptomatic patients, rectal bleeding was used as inclusion criterion, thereby selecting a higher risk group. It is unlikely that the results of these studies are directly applicable to all primary care patients consulting their general practitioner with lower abdominal signs and symptoms.
Family history showed a high specificity combined with a low sensitivity. Its diagnostic value in primary care is limited, however, because only a small percentage of all cases have a family history. In the UK and other countries patients with a familial link are often referred for genetic assessment instead of immediate investigation with colonoscopy, which often results in a screening advice. The NICE guidelines for colorectal cancer state that there is insufficient evidence for the value of family history in symptomatic patients.73 The few studies in our review that presented information on family history showed heterogeneous results for diagnostic performance. To firmly establish the diagnostic performance of family history in symptomatic patients we need a clear definition for a “positive family history,” which describes the number, age, and degree of affected family members.
Combinations of symptoms and two week referral guidelines
Our results indicate that while the diagnostic performance of individual signs and symptoms is limited, combinations of symptoms improve the sensitivity at the cost of specificity, as these symptoms are common in primary care. The two week referral guideline combines symptoms, resulting in a high sensitivity (range 0.80-0.94, median 0.92) and low specificity (range 0.30-0.56, median 0.42).
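The trade-off between sensitivity and specificity can be made concrete by converting the two into post-test probabilities. The sketch below uses the median sensitivity (0.92) and median specificity (0.42) quoted above, although these medians do not necessarily come from the same study. The pre-test probability of 5% is a hypothetical assumption chosen only for illustration, as is the function name.

```python
def post_test_probabilities(sensitivity, specificity, prevalence):
    """Return the probability of disease after a positive and a negative result."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    pre_test_odds = prevalence / (1 - prevalence)

    def odds_to_probability(odds):
        return odds / (1 + odds)

    return (
        odds_to_probability(pre_test_odds * lr_positive),
        odds_to_probability(pre_test_odds * lr_negative),
    )


# Medians from the text; the 5% pre-test probability is an assumed value
after_positive, after_negative = post_test_probabilities(
    sensitivity=0.92, specificity=0.42, prevalence=0.05
)
print(f"Probability of cancer if guideline criteria met:     {after_positive:.3f}")
print(f"Probability of cancer if guideline criteria not met: {after_negative:.3f}")
```

With a high sensitivity and low specificity, a negative result lowers the probability of cancer considerably, whereas a positive result raises it only modestly.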
In their review of the two week referral guideline Hamilton and Sharp6 conclude that rectal bleeding and change in bowel habit have a high predictive value for colorectal cancer, which is in contrast with the conclusion of the review of Ford et al.74 In our review only a few studies reported a significantly higher risk for colorectal cancer among patients reporting one of these symptoms compared with those without the symptom, indicating that the two week referral guideline might provide only limited diagnostic information. Heterogeneity in the diagnostic value of a referral guideline could be due to the inclusion of different “tests.” The most favourable combination of sensitivity and specificity was found for a prediction rule consisting of age, change in bowel habit, and blood in stools (sensitivity 1.0, specificity 0.9), but a study on the external validity of this prediction rule could not confirm these favourable results.72 Our review shows that 12% to 25% (median 14%) of the patients referred by the two week referral guideline were eventually diagnosed with colorectal cancer. Refining the current referral system could help to improve specificity.
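The proportion of referred patients who were diagnosed with colorectal cancer can also be expressed as the number of referrals needed to find one case, which is simply the reciprocal of that proportion. The sketch below applies this arithmetic to the range and median reported above; the function name is hypothetical.

```python
def referrals_per_cancer(proportion_with_cancer):
    """Number of referred patients per colorectal cancer diagnosed (1 / PPV)."""
    if not 0 < proportion_with_cancer <= 1:
        raise ValueError("proportion must be greater than 0 and at most 1")
    return 1 / proportion_with_cancer


# Proportions reported in the text: 12% to 25%, median 14%
for label, proportion in [("lowest", 0.12), ("median", 0.14), ("highest", 0.25)]:
    print(f"{label}: about {referrals_per_cancer(proportion):.1f} referrals per cancer")
```

On these figures, roughly four to eight patients were referred under the guideline for each cancer diagnosed.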
Blood tests
We found a low sensitivity for blood tests (haemoglobin, erythrocyte sedimentation rate, white cell count) in detecting colorectal cancer. The median probability of cancer in patients with anaemia (positive predictive value) was only slightly higher than in patients with negative test results, indicating limited diagnostic performance of this test in clinical practice when used as a single test. This is in accordance with the NICE guidelines.73 Despite this, blood tests might provide a useful adjunct to the general medical investigation, with conditions such as iron deficiency anaemia warranting further investigation.6
Faecal occult blood tests
We found relatively good results for diagnostic performance of the faecal occult blood tests, especially for the immunochemical based test, which showed high sensitivity and reasonable specificity in most studies. The probability of colorectal cancer is clearly higher in patients with positive rather than negative findings on the test. These favourable findings for the immunochemical based test contrast with the NICE guideline,73 which states that in patients with abdominal symptoms, the sensitivity, specificity, and positive predictive values of faecal occult blood tests are too low to make these tests helpful.
We did, however, find large heterogeneity in the results of studies on both guaiac based and immunochemical based tests. This might be because different types of faecal occult blood tests were used in the primary studies. Furthermore, publications often lacked information on dietary restrictions, the definition of a positive test result (cut-off value, number of positive samples), and number of test failures. Not providing dietary advice has been reported to affect the specificity of guaiac based tests,13 but our review could not confirm this. Overall, analyses both between and within studies showed better diagnostic performance of immunochemical based than guaiac based tests, and guaiac based self tests seemed to perform less well than the regular guaiac based tests.
Subgroup analyses within studies based on small numbers seemed to indicate that immunochemical based tests were more sensitive in detecting early stages of cancer than guaiac based tests.53 58 As early stages have far better prognoses this is an important finding. One of these studies also showed that immunochemical based tests were better than guaiac based tests at detecting colorectal cancer at all sites.58 These results need confirmation in future, larger studies.
Strengths and weaknesses of our review
We extracted or reconstructed diagnostic data collected from symptomatic patients in primary and interface settings and excluded information from healthy controls (screening studies) or highly selected diseased controls, thereby preventing limited challenge bias.75
Furthermore, we studied a whole range of diagnostic tools that are available to general practitioners instead of focusing on only one or two tests. We adhered to the most recent guidelines for conducting a diagnostic review as described in the Cochrane Diagnostic Reviewers’ Handbook.21 We used an extensive search strategy, but by using a methodological filter we might have missed several relevant publications. By reference checking we tried to track down those publications that our search strategy might have failed to identify. Use of a language restriction during the selection phase led to the exclusion of only 0.7% of all citations.
There were quite a few discrepancies in the phase of abstract selection. The first reviewer used a highly sensitive approach and selected all abstracts that could in any way be relevant to the review, with the aim of not missing any relevant papers. The second reviewer subsequently considered all these pre-selected abstracts and excluded those that clearly did not meet the eligibility criteria. Anticipating poor agreement on some items of the QUADAS list,76 77 two reviewers independently assessed all papers for methodological quality and reached consensus by discussing disagreements on individual scores.
The studies of the various tests showed a high degree of clinical heterogeneity, which limited the possibilities for statistical pooling and for strong conclusions on diagnostic performance. Reasons for heterogeneity include different definitions of signs and symptoms, variation in the execution of tests (such as faecal occult blood tests), and selection of populations based on particular symptoms or complaints.
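One common way to quantify statistical heterogeneity before deciding whether to pool is Cochran's Q with the derived I² statistic. The sketch below is a generic illustration on logit transformed sensitivities and is not necessarily the method used in this review. The function names and the study counts are hypothetical.

```python
import math


def logit_sensitivity(tp, fn):
    """Logit of sensitivity and its approximate variance (0.5 continuity correction)."""
    tp, fn = tp + 0.5, fn + 0.5
    return math.log(tp / fn), 1 / tp + 1 / fn


def i_squared(estimates, variances):
    """I² from Cochran's Q, using inverse variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    degrees_of_freedom = len(estimates) - 1
    if q == 0:
        return 0.0
    return max(0.0, (q - degrees_of_freedom) / q)


# Hypothetical (true positives, false negatives) per study, for illustration only
studies = [(18, 2), (30, 20), (9, 1), (25, 25)]
estimates, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in studies))
print(f"I² = {100 * i_squared(estimates, variances):.0f}%")
```

A high I² indicates that much of the variation between study estimates reflects real differences rather than chance, which argues against presenting a single pooled estimate.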
In subgroup analyses we took into account the generally poor reporting of diagnostic accuracy78 by excluding studies providing insufficient information on the characteristic under study. Finally, we extensively explored many potential sources of heterogeneity, including the adequacy of the reference standard. Because of the small number of studies in the subgroups, we could not use multivariable meta-regression analysis, making it difficult to disentangle the contribution of each source of heterogeneity.
Diagnostic tests as first line investigation in primary care need to be valid, easy to perform, well tolerated by patients, and sensitive, especially in the case of serious disease. Our systematic review shows that immunochemical based faecal occult blood tests might prove to be such tests. Evidence is lacking, however, for the diagnostic performance of these tests in primary care populations. We therefore urgently need high quality diagnostic cohort studies enrolling consecutive patients presenting with non-acute abdominal symptoms in primary care. Symptom combinations or two week referral guidelines potentially have diagnostic value, but the performance of the guideline could be improved by standardisation, clear definitions, and the addition of important characteristics of those diagnosed with colorectal cancer but not fulfilling the current guideline.
In future research, cancer location and stage of disease should be an important factor in the analysis, especially as tests that are able to diagnose early stages of colorectal cancer are important tools to reduce the burden of cancer.
What is already known on this topic
To improve the prognosis of colorectal cancer the diagnosis should be made at an early stage
An important task for the primary care physician is to identify the patients with an increased risk for colorectal cancer among all those consulting for abdominal symptoms
What this study adds
The most promising primary care tests in terms of diagnostic performance are combinations of symptoms and faecal occult blood tests, especially immunochemical based tests
Contributors: DvdW secured funding. PJ and DvdW selected studies for inclusion. PJ, HdV, DvdW, and DB extracted data and assessed quality. PJ carried out the statistical analyses and wrote the original draft. The other authors (DvdW, DB, CDM, SvW, CJM, HdV) revised the draft critically for important intellectual content and approved the final version of the paper. PJ and DvdW are guarantors.
Funding: The study was supported by a grant from the Netherlands Organisation for Health Research and Development (ZonMw), The Hague, Netherlands (No 945-06-001). ZonMw had no involvement in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.
Competing interests: None declared.
Ethical approval: Not required.
Data sharing: The full search strategies can be obtained from the corresponding author on request.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.