Reliability of ultrasonography in identification of reflux nephropathy in children
BMJ 1994; 309 doi: http://dx.doi.org/10.1136/bmj.309.6949.235 (Published 23 July 1994) Cite this as: BMJ 1994;309:235
- E Stokland,
- M Hellstrom,
- S Hansson,
- O Jodal,
- A Oden,
- B Jacobsson
- Department of Paediatric Radiology, East Hospital, S-416 85 Gothenburg, Sweden
- Department of Radiology, Sahlgrenska Hospital, Gothenburg, Sweden
- Department of Paediatrics, East Hospital, Gothenburg, Sweden
- Correspondence to: Dr Stokland.
- Accepted 27 April 1994
Objective: To assess the ability of ultrasonography to identify reflux nephropathy in children after urinary tract infection.
Design: Ten experienced radiologists performed a total of 240 ultrasonographic examinations of kidneys in a one day study. The examiners were unaware of the results of previous radiological and clinical examinations and of the proportions of normal and abnormal kidneys. Urography was used as the method of reference, supported by static renal scintigraphy (dimercaptosuccinic acid labelled with technetium-99m) in half of the cases.
Setting: Outpatient radiology department.
Subjects: 25 children aged 2-16 years (20 kidneys with and 30 kidneys without renal scarring).
Main outcome measures: Renal scarring. Overall size and length of kidneys. Sensitivity and specificity including receiver operator characteristics and variation between observers.
Results: With renal scarring as the diagnostic criterion and including cases classified as abnormal, probably abnormal, and uncertain, the sensitivity of ultrasonography was 54% (specificity 80%). Addition of reduced renal size as a diagnostic criterion increased the sensitivity to 64% (specificity 79%). There were, however, wide variations between observers, with sensitivity ranging between 40% and 90% (specificity 94% to 65%).
Conclusions: Because of its low sensitivity and specificity and poor agreement between observers, ultrasonography cannot be generally recommended for the detection of reflux nephropathy after urinary tract infection in children.
A tenth to a fifth of children with febrile urinary tract infection develop reflux nephropathy
As reflux nephropathy may lead to increased blood pressure, complications during pregnancy, and sometimes chronic renal failure the patients should have follow up for decades
Ultrasonography is not reliable in detecting reflux nephropathy and should not be used as the only imaging technique in children with urinary tract infection
Urinary tract infection is one of the most common bacterial infections in children. At 7 years of age 135 (7.9%) of 1719 girls and 31 (1.7%) of 1834 boys had had symptomatic urinary tract infection, verified by bacterial culture.1 As a consequence of renal infection in childhood 10-20% of children develop scarring or reflux nephropathy,2 which is the term often used for the permanent renal damage associated with infection.
Once renal scarring has developed and is recognised several diagnostic and preventive measures need to be instituted. As scarring is commonly associated with reflux additional radiological studies are indicated to detect reflux, which may require an operation or treatment with long term prophylactic antibiotics. Scarring also indicates follow up renal imaging studies to detect progression. Recurrent attacks of pyelonephritis require early treatment to avoid progressive or new renal scarring. Furthermore, patients with scarring will need long term follow up of blood pressure and renal function3 as well as increased attention during pregnancy to detect toxaemia.
There are various patterns of reflux nephropathy, including classical focal scars and generalised decrease of renal size or growth retardation.4 For the detection of reflux nephropathy urography has traditionally been used. Recently static renal scintigraphy (dimercaptosuccinic acid labelled with technetium-99m) has also been used to identify changes in acute pyelonephritis5,6 and permanent renal scarring.7-9
Ultrasonography is commonly used in the primary investigation of children with urinary tract infection because of its ability to detect major malformations and dilatation of the urinary tract10-12 and because of its widespread availability, relatively low cost, and absence of side effects. There is, however, disagreement about its usefulness in detecting reflux nephropathy. Some authors consider ultrasonography sufficient,13-15 and a recently published textbook states that ultrasonography can be used to recognise easily the patterns of reflux nephropathy.16 Other authors find it necessary to add urography or renal scintigraphy.9,17,18
Ultrasonography differs from other radiological techniques in that interpretation is done “live” - that is, the diagnosis is based on the examiner's impressions on the monitor while the patient is examined. Although film documentation is usually done by the examiner, this is of limited or no diagnostic value to others, and second opinions on ultrasonography films are of little help in most cases. The outcome is thus strongly related to the skill of the examiner, which must be considered in the evaluation of the efficacy of ultrasonography in a clinical test.
Although the shortcomings of ultrasonography in the detection of reflux nephropathy may be known to paediatric radiologists, it is still commonly used for this purpose in the follow up of children with urinary tract infection. A contributing factor to this may be the absence of ionising radiation and invasiveness, making ultrasonography attractive to children and their parents as well as the doctors.
Our aim was to assess the ability of ultrasonography to detect and exclude reflux nephropathy after urinary tract infection in children.
Subjects and methods
Patients
Twenty five children (median (range) age 6.5 (2-16) years) who had recently been examined with urography because of urinary tract infection at the department of paediatric radiology were selected. Fifteen of the 25 children had renal scarring, which was unilateral in 10 and bilateral in five. Twenty kidneys with scarring and 30 without scarring were therefore studied.
The interval between urography or scintigraphy and the day of the ultrasonography study was less than one year in 20 of the 25 children (median 0.5 years, range 1 day-2.2 years).
Urography was considered the method of reference and was performed in all cases. Three radiologists well experienced in paediatric uroradiology jointly selected the urograms and classified the kidneys as scarred or normal. To avoid cases with equivocal findings in the reference method (urography), only cases with clearly normal or clearly abnormal findings were included. Thus, all cases classified as abnormal had caliceal deformation and parenchymal reduction.19 Of the 20 scarred kidneys, 16 had severe parenchymal reduction (more than 4 SD below the mean of normal), two moderate reduction (3-4 SD below the mean of normal), and two mild reduction (less than 3 SD below the mean of normal).
Renal scintigraphy (dimercaptosuccinic acid labelled with 99mTc) was performed in 13 of the 25 patients (seven with normal kidneys, four with unilateral scarring, and two with bilateral scarring). The findings on scintigraphy and urography agreed in all cases.
Ultrasonography was done by 10 radiologists from five university and five county hospitals. They represented a selection of the most experienced examiners in the country and all had a special interest in paediatric renal ultrasonography. To achieve similar and optimal conditions for each examiner and all examinations the study was performed on one day. Ten separate ultrasonography rooms were set up, equipped in accordance with the preference of each examiner. Six examiners used Acuson (128 XP) and four used Toshiba (SSA 270 A) ultrasonography machines with 3-7.5 MHz sector, vector, linear, or convex transducers. The examiners were unaware of the results of previous radiological and clinical examinations and of the proportions of scarred and normal kidneys. The 25 children were each scheduled for five examinations, and 20 minutes were allowed for each examination. The 10 examiners investigated between 10 and 15 children each (mean 12). Twenty three children were examined five times, one child three times, and one child twice. Thus, a total of 240 kidney examinations were performed. Patient cooperation was classified as satisfactory in 218 kidney examinations and acceptable in the remaining 22. None of the patients was unacceptably uncooperative. Each kidney was categorised as scarred or unscarred based on a graded scale: normal, probably normal, uncertain, probably abnormal, abnormal. To facilitate the ultrasonographic detection of reflux nephropathy overall renal size including renal length was used as an additional diagnostic criterion. The overall size of the kidney was estimated according to the standards of each individual examiner. The examiners were also asked to measure the length of each kidney.
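The total of 240 kidney examinations follows from the schedule described above. A minimal Python sketch of the arithmetic, assuming that both kidneys were assessed at every examination of a child (the variable names are illustrative and not taken from the study):

```python
# Number of children by how many times each was examined, as reported in the text.
examinations_per_child = {5: 23, 3: 1, 2: 1}  # times examined -> number of children

KIDNEYS_PER_CHILD = 2  # assumption: both kidneys assessed at each examination
NUMBER_OF_EXAMINERS = 10

# Total examinations of children, then of kidneys.
child_examinations = sum(times * n for times, n in examinations_per_child.items())
kidney_examinations = child_examinations * KIDNEYS_PER_CHILD

# Average workload per examiner, to compare with the reported mean of 12.
mean_children_per_examiner = child_examinations / NUMBER_OF_EXAMINERS

print(child_examinations, kidney_examinations, mean_children_per_examiner)
```

Under that assumption the 120 examinations of children correspond to the 240 kidney examinations and to the reported mean of 12 children per examiner.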
The study was approved by the ethics committee of Gothenburg University.
Wilcoxon's test for two samples was used for comparisons with respect to age distribution. Detection rates were compared by use of Fisher's exact test. To assess whether the variation between observers of the determinations of kidney length depended on the actual length, Pitman's test20 was applied on the mean length (x) and the standard deviation (y) of each kidney.
Receiver operating curves21 were used to elucidate the relation between sensitivity and specificity. Two sided tests were used and P values less than 0.05 were considered significant.
Results
The 240 kidney examinations comprised 97 scarred and 143 normal kidneys according to urography. When data from all examiners were pooled and urography was the method of reference for scarring only 25 of 97 kidneys were correctly classified as abnormal at ultrasonography (sensitivity 26%) (table I). When cases classified as probably abnormal and uncertain were also included 52 of 97 examinations were correctly classified (sensitivity 54%). The corresponding specificity was 80% when normal and probably normal were used to define normality (table I). When scarring or reduced renal size, or both, were used as diagnostic criteria, 32 of the 97 examinations were correctly classified as abnormal (sensitivity 33%) (table II). When cases classified as probably abnormal and uncertain were included 62 of the 97 examinations were correctly classified (sensitivity 64%). The corresponding specificity was 79% when normal and probably normal were used to define normality (table II).
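The pooled sensitivities quoted above can be reproduced from the reported counts. The following Python sketch is only an arithmetic check; the function and variable names are illustrative. The specificities are not recomputed because the text gives them as percentages only, without the underlying counts.

```python
def sensitivity_percent(correctly_abnormal: int, scarred_total: int) -> float:
    """Percentage of scarred kidneys (by urography) called abnormal at ultrasonography."""
    return 100 * correctly_abnormal / scarred_total


SCARRED_EXAMINATIONS = 97  # scarred kidney examinations according to urography

# Correctly classified examinations quoted in the text, by criterion and cut-off.
reported_counts = {
    "scarring; abnormal only": 25,
    "scarring; abnormal, probably abnormal, or uncertain": 52,
    "scarring or reduced size; abnormal only": 32,
    "scarring or reduced size; abnormal, probably abnormal, or uncertain": 62,
}

for label, count in reported_counts.items():
    print(f"{label}: {sensitivity_percent(count, SCARRED_EXAMINATIONS):.0f}%")
```

Rounded to whole percentages these give 26%, 54%, 33%, and 64%, matching the values in the text.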
The receiver operating curves in figure 1 illustrate the average sensitivity and specificity in detecting reflux nephropathy. The curves represent focal scarring alone or focal scarring or reduced renal size, or both, as diagnostic criteria, respectively. When the combined criteria are used sensitivity improves slightly with some decrease in the corresponding specificity.
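To illustrate how a five grade scale yields the points of a receiver operating curve, the sketch below moves the cut-off for "abnormal" one grade at a time and computes sensitivity and false positive rate at each step. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the function name is an assumption.

```python
from itertools import accumulate

# Grades ordered from most to least abnormal.
GRADES = ["abnormal", "probably abnormal", "uncertain", "probably normal", "normal"]


def roc_points(scarred_counts, normal_counts):
    """Return (false positive rate, sensitivity) for each cumulative cut-off.

    Both lists hold counts per grade, ordered as in GRADES. Cut-off i treats
    every grade up to and including GRADES[i] as a positive finding.
    """
    scarred_total = sum(scarred_counts)
    normal_total = sum(normal_counts)
    true_positives = accumulate(scarred_counts)
    false_positives = accumulate(normal_counts)
    return [
        (fp / normal_total, tp / scarred_total)
        for tp, fp in zip(true_positives, false_positives)
    ]


# Hypothetical counts for illustration only; NOT taken from the study.
scarred = [10, 5, 5, 3, 2]
normal = [1, 2, 2, 5, 15]

for grade, (fpr, sens) in zip(GRADES, roc_points(scarred, normal)):
    print(f"cut-off at '{grade}': sensitivity {sens:.2f}, false positive rate {fpr:.2f}")
```

Each more inclusive cut-off raises sensitivity at the cost of a higher false positive rate, which is the trade-off the curves display.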
The sensitivity and false positive rate were also calculated for each individual examiner (fig 2) by using abnormal, probably abnormal, and uncertain to define abnormality. There were considerable differences between the examiners, sensitivities ranging between 30% and 80% and false positive rates between 6% and 33% for renal scarring. By using scarring or reduced renal size, or both, as diagnostic criteria sensitivities improved for five of the 10 examiners (total range of sensitivities 40-90%) whereas the false positive rates remained virtually unchanged (6-35%). Half of the examiners reached a sensitivity of 78-90%, and the remaining examiners had a sensitivity of 40-60% when the combined criteria of scarring or reduced renal size, or both, were used (fig 2). Tables III and IV show the distribution of agreement between the observers related to the reference method (urography).
Figure 3 shows the interobserver variation of the measurements of renal length at ultrasonography according to the method recommended by Bland and Altman.22 The mean of the observers' measurements was considered to represent the actual length of each kidney. Ninety two per cent of the measurements were within 5 mm of the mean. The interobserver variation of the measurements of renal length at ultrasonography was not related to the length of the kidney (Pitman's test).
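A simplified sketch of this agreement summary is given below: for each kidney the mean of the observers' measurements is taken as the reference, and the share of measurements lying within a tolerance of that mean is reported. The measurements are hypothetical, the function name is illustrative, and treating "within 5 mm" as inclusive of exactly 5 mm is an assumption.

```python
from statistics import mean


def fraction_within_tolerance(lengths_by_kidney, tolerance_mm=5.0):
    """Share of measurements within tolerance_mm of their own kidney's mean length."""
    within = 0
    total = 0
    for lengths in lengths_by_kidney.values():
        reference = mean(lengths)  # mean of observers taken as the actual length
        for length in lengths:
            total += 1
            if abs(length - reference) <= tolerance_mm:
                within += 1
    return within / total


# Hypothetical renal lengths in mm from five observers; NOT the study's data.
example = {
    "kidney A": [80, 82, 79, 81, 83],
    "kidney B": [95, 97, 104, 94, 95],
}

print(f"{100 * fraction_within_tolerance(example):.0f}% within 5 mm of the mean")
```

In this made-up example nine of the ten measurements lie within 5 mm of their kidney's mean.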
Discussion
Ultrasonography is undisputed as the initial imaging method for screening of children at the time of their first urinary tract infection because of its ability to detect major anomalies and dilatation of the urinary tract. Opinions differ, however, about the usefulness of this technique in detecting reflux nephropathy.9,13,15,16,18
When evaluating the efficacy of an imaging method several methodological prerequisites should ideally be fulfilled - for example, definition of a gold standard (reference method), blinding of examiners, and use of optimal equipment. The frequent lack of fulfilment of such requirements has been emphasised.23-25
Our study was designed to try to obtain an unbiased evaluation of ultrasonography compared with urography, which was used as the reference. The ultrasonographers were blinded regarding patient history, previous imaging results, and proportions of scarred and normal kidneys. Care was taken to optimise the study conditions by selecting experienced examiners, by having the examiners choose their own equipment, and by allowing ample time for each investigation. Children under the age of 2, in whom identification of reflux nephropathy may be very difficult, were not included in the study. In no case was patient cooperation considered unsatisfactory.
Despite these idealised conditions the sensitivity and specificity of ultrasonography in detecting reflux nephropathy were unsatisfactory. It could be argued that the high false positive rate (low specificity) was due to the better ability of ultrasonography to detect true reflux nephropathy in kidneys classified as normal at urography. This seems unlikely for several reasons. Firstly, the findings were not consistent among the examiners. Secondly, reflux nephropathy could not be confirmed by static renal scintigraphy in any of the cases in which this investigation was performed. Theoretically, new scarring in previously unscarred kidneys or progression of existing scarring could have occurred in the interval between urography and ultrasonography, as scarring may take some time to develop or become maximally evident. As we demanded a long time interval between the last pyelonephritis and the radiological studies and as the patients were free of infection in the interval between urography and ultrasonography this possibility seems remote. Also, the kidneys classified as abnormal at urography could not have become normal as the changes represent scarring, which by definition is irreversible.
Potential sources of bias
How good then was urography as a gold standard? Originally, the relevance of urography for detection of scarring was shown by Hodson and colleagues, who showed the correlation between scarring at urography and histology in animal studies.26 We used the classic, subjective, but well established criteria for renal scarring, caliceal deformity and parenchymal reduction.4 The additional use of detailed parenchymal measurements, related to a reference material,19 improves objectivity and adds to the relevance of urography as a reference method. Finally, the urographic findings were supported by renal scintigraphy in all cases in which this study was performed. To avoid cases with equivocal findings using the reference standard only cases with clearly normal or clearly abnormal findings were included. Thus, all cases classified as abnormal had caliceal changes together with parenchymal reduction, which in nearly all cases was severe (more than 4 SD below the mean of normal). Considering these factors the poor ability of ultrasonography to detect reflux nephropathy is discouraging.
The need to assess variation between observers in the evaluation of diagnostic procedures has been emphasised already by Garland.27 This is of special importance in the evaluation of ultrasonography, when the outcome is solely in the hands of the examiner and second opinions based on filmed examinations are of little or no value. The range of variation between examiners in the present study was wide. This may indicate a potential for improvement. It also reflects the limitations in clinical practice, however, as most patients are probably investigated by examiners less skilled than those in our study.
In contrast with the large variation among the examiners in detecting renal scarring, variation between observers for measurements of renal length at ultrasonography was fairly small. Measuring renal length, however, contributed only marginally to the identification of patients with reflux nephropathy. This is in agreement with previous findings, indicating that renal length is a poor indicator of reflux nephropathy as it often remains within normal limits despite considerable renal scarring.28
The low sensitivity and specificity and poor agreement between observers mean that ultrasonography is not accurate enough to identify kidneys with reflux nephropathy - that is, the children who are at risk of future complications after urinary tract infection.
This investigation was supported by grants from the Gothenburg Medical Society; the Swedish Medical Research Council; IngaBritt and Arne Lundberg Research Foundation; First of May Flower Annual Campaign; the Medical Faculty at Gothenburg University; Nycomed Company, Oslo; and the National Kidney Patient Association.
We thank the participating radiologists for their willingness to take part in the study; the Acuson and Toshiba companies for supporting us with equipment and technical assistance; and the departments of clinical physiology, paediatrics, and radiology at the East Hospital, Gothenburg Medical Centre, and the Wallenberg Laboratory at Sahlgrenska Hospital, Gothenburg for their support of this study.