- Jean V Craig, research associate (a),
- Gillian A Lancaster, lecturer in medical statistics (b),
- Paula R Williamson, lecturer in medical statistics (b),
- Rosalind L Smyth, professor of paediatric medicine (a)
- a Institute of Child Health, Alder Hey Children's Hospital, Liverpool L12 2AP
- b Department of Mathematical Sciences, University of Liverpool, Liverpool L69 3BX
- Correspondence to: R L Smyth
- Accepted 4 April 2000
Objective: To evaluate the agreement between temperature measured at the axilla and rectum in children and young people
Design: A systematic review of studies comparing temperature measured at the axilla (test site) with temperature measured at the rectum (reference site) using the same type of measuring device at both sites in each patient. Devices were mercury or electronic thermometers or indwelling thermocouple probes.
Studies reviewed: 40 studies including 5528 children and young people from birth to 18 years.
Data extraction: Difference in temperature readings at the axilla and rectum.
Results: 20 studies (3201 participants, 58%) had sufficient data to be included in a meta-analysis. There was significant residual heterogeneity in both mean differences and sample standard deviations within device groups and within age groups. The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for mercury thermometers was 0.25°C (95% limits of agreement −0.15°C to 0.65°C) and for electronic thermometers was 0.85°C (−0.19°C to 1.90°C). The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for neonates was 0.17°C (−0.15°C to 0.50°C) and for older children and young people was 0.92°C (−0.15°C to 1.98°C).
Conclusions: The difference between temperature readings at the axilla and rectum using either mercury or electronic thermometers showed wide variation across studies. This has implications for clinical situations where temperature needs to be measured with precision.
The presence of fever in children and young people affects the decisions of parents and clinicians. Parents may take vigorous steps to lower their child's temperature and will commonly seek medical advice,1 and clinicians may carry out investigations and interventions, including antipyretics, physical cooling measures, antibiotics, and admission to hospital.2 Measuring temperature in children can be difficult, especially when they are uncooperative or restless. Measurement of rectal temperature is frequently preferred over other ways of taking temperature but may not be acceptable to children and parents.2 The axilla is a safe and accessible site but concerns have been raised about its accuracy.3 4 We therefore systematically reviewed the agreement between temperature measured at the axilla and temperature measured at the rectum.
Studies were identified by a single reviewer (JVC) through electronic searches (see website) of Medline 1966 to October 1999, CINAHL 1982 to August 1999, the British Nursing Index June 1999, the Cochrane Library (issue 3, 1999), and the journals database of the Royal College of Nursing 1985-99. The National Research Register (issue 2, 1999) was searched for any unpublished studies, and conference abstracts were accessed through the BIDS index to Scientific and Technological Proceedings (1982-99). Authors of studies and suppliers of clinical thermometers were asked to provide details of other studies.
Two reviewers (JVC and Catherine Lees) independently judged the studies for eligibility according to predetermined criteria. We included: method comparison studies where temperature measured at the axilla (test site) was compared with temperature measured at the rectum (reference site) in the same individual; studies of children and adolescents from birth to 18 years; and studies using mercury or electronic thermometers or thermocouple probes.
We excluded children with hypothermia (rectal temperature less than 35.0°C), preterm infants (less than 37 weeks' gestational age), studies using different types of devices at the two sites, and studies where the rectal mercury thermometer was read before three minutes had elapsed (some authors were contacted to clarify placement times).5 6
Data extraction and quality assessment
Two reviewers (JVC and Catherine Lees) independently assessed studies for methodological quality. As there is no validated scoring system for assessing the methodological quality of method comparison studies, we modified a previously published checklist that had been developed for evaluating studies of diagnostic tests (see box).7 Occasional initial disagreements were resolved by discussion. Two reviewers (JVC and GAL) independently extracted data. When the outcome data were not provided, we asked the authors for the mean difference and standard deviation of the difference between the temperature measured at the axilla and rectum or, where these could not be provided, for the anonymised raw data. Where outcome data were missing but the mean and standard deviation of the measurements were reported separately for the two sites together with a correlation coefficient, we calculated the mean and standard deviation of the differences from these data. Several studies did not report correlation coefficients, so we estimated these from similar studies.
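Where only site-specific means, standard deviations, and a correlation coefficient are available, the summary of the differences follows from the variance of a difference, var(R − A) = s_R² + s_A² − 2r·s_R·s_A. The minimal Python sketch below illustrates that calculation; the function names and the example values are hypothetical, and in this review the correlation was sometimes borrowed from a similar study rather than reported.

```python
import math


def mean_of_differences(mean_rectal: float, mean_axillary: float) -> float:
    """Mean of paired differences (rectal minus axillary), in °C."""
    return mean_rectal - mean_axillary


def sd_of_differences(sd_rectal: float, sd_axillary: float, r: float) -> float:
    """SD of paired differences from the two site SDs and their correlation.

    var(R - A) = sd_R^2 + sd_A^2 - 2 * r * sd_R * sd_A
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    var_d = sd_rectal ** 2 + sd_axillary ** 2 - 2.0 * r * sd_rectal * sd_axillary
    # Guard against tiny negative values from floating point rounding
    return math.sqrt(max(var_d, 0.0))


# Hypothetical study summary: both sites SD 0.5°C, correlation 0.9
print(round(sd_of_differences(0.5, 0.5, 0.9), 4))
```

Because the correlation enters with a negative sign, an optimistic borrowed correlation narrows the estimated SD, which is why such studies warrant a sensitivity analysis.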
Criteria and rationale for assessing methodological quality of method comparison studies7 *
Were thermometers calibrated?†
Was the placement time of the thermometer given?†
Mercury thermometers read before stabilisation underestimate body temperature
Were all tests carried out concurrently or immediately sequentially?†
Where there is a delay between the two readings, any difference in the results could potentially be attributed to a change in actual body temperature
Were the test and reference standard measured independently (blind) of each other?
Was the second reading taken before any interventions were given?
Avoids treatment paradox
Were both tests carried out in all children regardless of the first reading?
Avoids verification bias
*Criteria were graded as yes, no, or not stated.
† Additional criteria specific to temperature measurement.
We calculated the upper and lower 95% limits of agreement for each study.8 Where the standard deviation of the differences was estimated with a correlation coefficient from a similar study, we performed a sensitivity analysis including and excluding these studies. In a meta-analysis of randomised controlled trials, a pooled estimate of the relative treatment effect is of interest. For method comparison studies, systematic error (bias) and random error (limits of agreement) are of interest. To obtain a pooled estimate of bias, we used the usual inverse variance weighted approach to combine individual study estimates of the mean difference. To obtain pooled estimates of the limits of agreement, we first obtained a pooled estimate of the standard deviation of individual differences and then combined this with the pooled estimate of the mean difference. We hypothesised a priori that type of thermometer, duration of placement time at the axilla for mercury thermometers, and age might be sources of heterogeneity, and we performed subgroup analyses based on these characteristics. Homogeneity of mean differences and of standard deviations of differences across studies was evaluated with the standard large sample test.9 In the presence of significant residual heterogeneity, we calculated pooled estimates of the mean difference and the standard deviation of the individual differences using a random effects approach.9 Combining these two estimates gave pooled estimates of the limits of agreement under a random effects approach. The techniques are described elsewhere (P R Williamson, personal communication).
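The pooling steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' own technique (which is cited only as a personal communication): it uses the DerSimonian-Laird estimator as one common random effects approach, weights each study's mean difference by the inverse of its variance (SD²/n), and, as an assumption, pools standard deviations on the log scale with approximate variance 1/(2(n − 1)). The `Study` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    n: int            # participants with paired readings
    mean_diff: float  # mean of (rectal - axillary), °C
    sd_diff: float    # SD of individual differences, °C


def dersimonian_laird(estimates, variances):
    """Random effects pooling. Returns (pooled, Q, df, tau2).

    Q is Cochran's large sample homogeneity statistic, referred to a
    chi-squared distribution on df = k - 1 degrees of freedom.
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, q, df, tau2


def pooled_limits_of_agreement(studies, z=1.96):
    """Pooled bias and 95% limits of agreement across studies."""
    # Bias: the variance of a study's mean difference is sd^2 / n
    bias, q_mean, df, _ = dersimonian_laird(
        [s.mean_diff for s in studies],
        [s.sd_diff ** 2 / s.n for s in studies],
    )
    # Assumption: pool SDs as log(sd), with var(log sd) ~ 1 / (2(n - 1))
    log_sd, q_sd, _, _ = dersimonian_laird(
        [math.log(s.sd_diff) for s in studies],
        [1.0 / (2 * (s.n - 1)) for s in studies],
    )
    sd = math.exp(log_sd)
    return {
        "bias": bias,
        "sd": sd,
        "lower": bias - z * sd,
        "upper": bias + z * sd,
        "q_mean": q_mean,
        "q_sd": q_sd,
        "df": df,
    }
```

When there is no excess heterogeneity (Q ≤ df), tau² is zero and the random effects estimate reduces to the fixed (inverse variance) estimate. Comparing `q_mean` and `q_sd` with a χ² distribution on `df` degrees of freedom (for example with `scipy.stats.chi2.sf`) gives the heterogeneity P values.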
Description of studies and methodological quality
Overall, 37 papers (34 in English) containing 40 method comparison studies including 5528 children and young people were suitable for inclusion. Disagreement about study inclusion on six occasions was resolved through discussion. Three studies were reported in two publications.10–15 Three publications were each considered to contain two studies because either two different target populations were included and the results for each reported separately 16 17 or two different measuring devices were studied in the same children.18 The table gives a description of the studies and dimensions of methodological quality. Disagreement between reviewers on the details of seven studies was resolved by discussion.
Outcome data were available from the article or author or were calculated for 16 studies (2870 (52%) participants). We estimated the standard deviation of the differences in temperature measurements for four studies (331 (6%) participants) (table). Analyses with and without the data from these four studies gave similar conclusions, so these studies are included in the results.
Mean axillary temperature was always lower than mean rectal temperature. Significant heterogeneity was found between mean differences within device groups (mercury thermometer: χ²=1305, df=9, P<0.0001; electronic thermometer: χ²=959, df=9, P<0.0001). Significant heterogeneity was found between standard deviations within device groups (mercury: χ²=943, df=9, P<0.0001; electronic: χ²=519, df=9, P<0.0001). The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for mercury thermometers was 0.25°C (95% limits of agreement −0.15°C to 0.65°C) and for electronic thermometers was 0.85°C (−0.19°C to 1.90°C) (fig 1). Studies with mercury thermometers were ordered according to placement time at the axilla (longest to shortest time), and there was a tendency towards improved accuracy as placement time increased.
We grouped neonates separately from other children (fig 2). Significant heterogeneity was found between mean differences within the age groups (neonates: χ²=269, df=9, P<0.0001; older children and young people: χ²=548, df=9, P<0.0001). Significant heterogeneity was found between standard deviations within age groups (neonates: χ²=111, df=9, P<0.0001; older children and young people: χ²=169, df=9, P<0.0001). The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for neonates was 0.17°C (−0.15°C to 0.50°C) and for older children and young people was 0.92°C (−0.15°C to 1.98°C).
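Because each 95% limit of agreement has the form mean difference ± 1.96 SD, the reported limits imply the pooled SD of individual differences in each group. A short Python check using only the pooled figures reported in these results (the dictionary and function names are illustrative):

```python
# Pooled (random effects) estimates reported in the results, in °C:
# (mean difference, lower 95% limit, upper 95% limit), rectal minus axillary
REPORTED = {
    "mercury": (0.25, -0.15, 0.65),
    "electronic": (0.85, -0.19, 1.90),
    "neonates": (0.17, -0.15, 0.50),
    "older children and young people": (0.92, -0.15, 1.98),
}


def implied_sd(lower: float, upper: float, z: float = 1.96) -> float:
    """SD of individual differences implied by limits of the form mean ± z*SD."""
    return (upper - lower) / (2 * z)


for group, (mean_d, lower, upper) in REPORTED.items():
    # The midpoint of the limits should match the mean difference up to rounding
    midpoint = (lower + upper) / 2
    print(f"{group}: implied SD {implied_sd(lower, upper):.2f}°C, "
          f"midpoint {midpoint:.2f}°C vs reported {mean_d:.2f}°C")
```

The implied SDs (about 0.20°C for mercury versus 0.53°C for electronic thermometers, and 0.17°C for neonates versus 0.54°C for older children and young people) make the difference in precision between groups explicit.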
Of the 20 eligible studies with insufficient data (see table A on website), nine studied neonates (mercury thermometer (four studies), electronic thermometer (four), indwelling thermocouple probe (one)), and 11 studied older children and young people (mercury thermometer (three), electronic thermometer (five), and indwelling thermocouple probes (three)).
We found large mean differences and wide limits of agreement between temperatures measured at the axilla and those measured at the rectum. Determining febrile status is an important part of the assessment of children and young people who are unwell. Accurate measurement of temperature is required in certain clinical situations or patient groups. In neutropenic patients the decision to commence antibiotics may be made on the basis of an accurate measurement of temperature.19 In neonates accurate measurement of temperature is important for ensuring a thermoneutral state.20 It is commonly believed that rectal temperature can be estimated by adding 1°C to the temperature measured at the axilla. The wide range of mean differences we found suggests that this is not the case.
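To see what the pooled limits of agreement mean for a single reading, the sketch below takes a hypothetical axillary reading of 37.0°C, adds the pooled limits reported in the results, and checks whether the "add 1°C" rule falls inside that range. It assumes that the differences do not vary with the level of temperature (the review found no evidence that they did); it is an illustration only, not a conversion rule, and the function names are hypothetical.

```python
# Pooled 95% limits of agreement (rectal minus axillary, °C) from the results
LIMITS = {
    "mercury": (-0.15, 0.65),
    "electronic": (-0.19, 1.90),
    "neonates": (-0.15, 0.50),
    "older children and young people": (-0.15, 1.98),
}


def plausible_rectal_range(axillary: float, lower: float, upper: float):
    """Approximate 95% range of rectal readings paired with one axillary reading."""
    return axillary + lower, axillary + upper


def one_degree_rule(axillary: float) -> float:
    """The commonly assumed conversion: rectal = axillary + 1°C."""
    return axillary + 1.0


axillary = 37.0  # hypothetical reading
for group, (lower, upper) in LIMITS.items():
    lo, hi = plausible_rectal_range(axillary, lower, upper)
    rule = one_degree_rule(axillary)
    verdict = "inside" if lo <= rule <= hi else "outside"
    print(f"{group}: rectal {lo:.2f} to {hi:.2f}°C; "
          f"+1°C rule gives {rule:.2f}°C ({verdict})")
```

For mercury thermometers and neonates the rule's 38.00°C lies above the whole range, while for electronic thermometers and older children the range is so wide (about 2°C) that no single offset gives a precise estimate.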
In general, limits of agreement were narrower when mercury thermometers were used, placement time of mercury thermometers was longer, and measurements were made in neonates. Further investigation by age was not possible because many studies reported only the age range. Electronic thermometers were used in only two studies of neonates. One showed narrow limits of agreement.21 The other, with wide limits of agreement, was the only study published before the 1980s, and a different device, the telethermometer, was used.18 Electronic thermometers were used in eight of the 10 studies of older children and young people. This may have confounded the comparison of mercury with electronic thermometers. In neonates agreement is better with longer placement times, but these may be difficult to achieve. Young children may be less compliant when placement time is prolonged, which may affect accuracy.
Although we used a sensitive search strategy to identify studies, we may not have identified relevant unpublished evidence. We cannot comment on the impact this may have had on our results because of lack of empirical evidence on publication bias for method comparison studies (P R Williamson, personal communication).
The design of most studies was limited to one measurement per site per participant. Lack of agreement may be caused by poor repeatability at either site. We could not compare within site variability with between site variability because results of repeated measurements were not reported and no individual patient data were available. Six of the 20 studies gave the number of febrile children by their own definition (table), but no studies presented data separately to enable analysis of febrile children only. We did not find any evidence that systematic and random error varied by level of temperature.
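With repeated readings at each site, the contribution of poor repeatability could be separated out. Under the usual assumptions of independent measurement errors with constant variance, the variance of single rectal minus axillary differences equals the variance of the true between-site difference plus the two within-site variances, so poor repeatability alone widens the limits of agreement. A minimal sketch, assuming duplicate readings at one site per child; all function names and readings are hypothetical.

```python
import math


def within_subject_sd(first, second):
    """Within-subject SD at one site from duplicate readings per child.

    Uses s_w^2 = sum(d_i^2) / (2n), where d_i is the difference between the
    two readings on child i.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need equal-length, non-empty lists of duplicates")
    n = len(first)
    ss = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(ss / (2 * n))


def repeatability_coefficient(s_w: float) -> float:
    """Difference not exceeded by about 95% of pairs of repeat readings."""
    return 1.96 * math.sqrt(2.0) * s_w


def error_only_sd_of_differences(s_w_rectal: float, s_w_axillary: float) -> float:
    """SD of rectal minus axillary differences due to measurement error alone.

    Assuming independent errors, the observed SD of differences cannot be
    smaller than this value.
    """
    return math.sqrt(s_w_rectal ** 2 + s_w_axillary ** 2)


# Hypothetical duplicate axillary readings (°C) for three children
s_ax = within_subject_sd([36.6, 37.1, 36.9], [36.8, 36.9, 37.0])
print(f"axillary within-subject SD {s_ax:.3f}°C, "
      f"repeatability coefficient {repeatability_coefficient(s_ax):.2f}°C")
```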
Methods used in primary studies
Our results may have been influenced by methodological shortfalls in the primary studies. Verification bias was difficult to assess as selection of participants was not always clearly described. All studies seemed to take either convenience or random samples of children from a variety of settings. Seven studies gave specific exclusion criteria, based on clinical conditions. The rest gave no exclusion criteria. We defined verification bias to be the selecting out of participants on the basis of a temperature measurement. This was not evident in any study. There was no evidence of any effect of the quality criteria (see box) when results were subgrouped and factors examined univariately, but the number of studies in each subgroup was small.
Independent measurement of the reference standard and test was not attempted in any study.22 Blinding is likely to be an important methodological issue, especially when placement time is determined by the operator. This may occur when mercury thermometers are used or when electronic thermometers are used in monitor mode rather than predictive mode. In some sequential studies and in those where concurrent measurements were carried out, a different device (of the same type) was used at each site. Calibration is therefore important, even when new thermometers are used.23 Ten studies did not provide details of thermometer calibration before data collection.
When a thermometer is read before stabilisation, temperature is underestimated,24 which may be another problem where placement time is at the discretion of the operator. Six out of 10 studies with mercury thermometers gave details about stabilisation. Mode or placement time was reported in two out of 10 studies with electronic thermometers. In a further two studies the thermometer was read when it beeped, and it is likely that predictive mode was used. Seven studies did not report the depth of placement of the rectal thermometer. In sequential studies the time lapse between the two readings was not always reported. The longer the delay between readings, the more likely there is a change in body temperature, which will affect the second reading.
We recommend that in future studies temperatures should be measured independently at each site in a consecutive series of eligible individuals. All thermometers should be calibrated. Details should be provided about placement time and depth (if appropriate), steps should be taken to ensure stabilisation, and the mode used in electronic thermometers should be stated. Temperature readings should be carried out concurrently or immediately sequentially and the time between measurements clearly documented. The minimum analysis that should be carried out is the Bland and Altman method8 giving plots and 95% limits of agreement. Studies involving replicated or repeated measurements should take this into account in the analysis.
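The minimum analysis recommended above can be sketched in Python for a single study with one reading per site per child. The function name and the readings are hypothetical, and plotting is left to any charting library by returning the (mean, difference) points that make up a Bland and Altman plot; studies with replicated readings need an extended analysis instead.

```python
import statistics


def bland_altman(rectal, axillary, z=1.96):
    """Bland and Altman summary for one reading per site per child.

    Returns the bias (mean of rectal minus axillary), the SD of the
    differences, the 95% limits of agreement, and the (mean, difference)
    points for a Bland and Altman plot.
    """
    if len(rectal) != len(axillary) or len(rectal) < 2:
        raise ValueError("need at least two paired readings")
    diffs = [r - a for r, a in zip(rectal, axillary)]
    means = [(r + a) / 2 for r, a in zip(rectal, axillary)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return {
        "bias": bias,
        "sd": sd,
        "lower": bias - z * sd,
        "upper": bias + z * sd,
        "points": list(zip(means, diffs)),
    }


# Hypothetical paired readings (°C) for four children
result = bland_altman([37.5, 38.0, 37.2, 38.6], [37.0, 37.6, 36.9, 37.9])
print(f"bias {result['bias']:.3f}°C, "
      f"limits {result['lower']:.2f} to {result['upper']:.2f}°C")
```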
We have shown that in children and young people the agreement between temperature measured at the axilla and temperature measured at the rectum is relatively low. This may prevent low grade fever from being detected and has important implications when body temperature needs to be measured with precision. Further research is needed to establish whether sufficient accuracy can be achieved by measuring temperature at the axilla in neonates. We identified several methodological weaknesses in the included studies, which may have affected the results.
What is already known on this topic
Numerous studies of methods for measuring temperature in children and young people have been carried out
Although the methods and results of the studies vary, there are concerns about the agreement between temperature measured at the axilla and temperature measured at the rectum
What this study adds
In children and young people temperature measured at the axilla does not agree sufficiently with temperature measured at the rectum to be relied on in clinical situations where accurate measurement is important
Variability in results was related to the age of the child and duration of placement time of the measuring device
Research is needed to identify whether sufficient accuracy can be achieved for measurement of temperature at the axilla in neonates
Future studies of temperature measurement in children should be more methodologically rigorous
We thank the authors who provided us with data from their studies and the reviewers for their helpful comments.
Contributors: JVC wrote the protocol, participated in the review process, and drafted and revised the paper; she will act as guarantor for the paper. GAL and PRW assisted in the design of the study, the meta-analysis, and revising the final paper. GAL participated in the data extraction and data checking. Catherine Lees assisted in assessment of study inclusion and study quality. RLS conceived the idea, helped design the study, and assisted in drafting and revising the final paper. All authors commented on drafts of the paper.
Funding: JVC is supported by a grant from the Royal Liverpool Children's NHS Trust Endowment Funds.
Competing interests: None declared.
Search terms, references, and eligible studies with missing or inappropriate data appear on the BMJ's website