CCBYNC Open access

Derivation and validation of a clinical prediction rule for uncomplicated ureteral stone—the STONE score: retrospective and prospective observational cohort studies

BMJ 2014; 348 doi: (Published 26 March 2014) Cite this as: BMJ 2014;348:g2191
  1. Christopher L Moore, associate professor1,
  2. Scott Bomann, emergency physician1,
  3. Brock Daniels, emergency physician2,
  4. Seth Luty, research assistant3,
  5. Annette Molinaro, associate professor1,
  6. Dinesh Singh, assistant professor4,
  7. Cary P Gross, professor56
  1. 1Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, USA
  2. 2Wellington Hospital, Wellington, New Zealand
  3. 3Yale New Haven Hospital, New Haven, CT, USA
  4. 4Departments of Neurosurgery and Epidemiology and Biostatistics, University of California San Francisco Medical School, San Francisco, CA, USA
  5. 5Department of Urology, Yale University School of Medicine, New Haven, CT, USA
  6. 6Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
  1. Correspondence to: C L Moore chris.moore{at}
  • Accepted 7 March 2014


Objective To derive and validate an objective clinical prediction rule for the presence of uncomplicated ureteral stones in patients eligible for computed tomography (CT). We hypothesized that patients with a high probability of ureteral stones would have a low probability of acutely important alternative findings.

Design Retrospective observational derivation cohort; prospective observational validation cohort.

Setting Urban tertiary care emergency department and suburban freestanding community emergency department.

Participants Adults undergoing non-contrast CT for suspected uncomplicated kidney stone. The derivation cohort comprised a random selection of patients undergoing CT between April 2005 and November 2010 (1040 patients); the validation cohort included consecutive prospectively enrolled patients from May 2011 to January 2013 (491 patients).

Main outcome measures In the derivation phase a priori factors potentially related to symptomatic ureteral stone were derived from the medical record blinded to the dictated CT report, which was separately categorized by diagnosis. Multivariate logistic regression was used to determine the top five factors associated with ureteral stone and these were assigned integer points to create a scoring system that was stratified into low, moderate, and high probability of ureteral stone. In the prospective phase this score was observationally derived blinded to CT results and compared with the prevalence of ureteral stone and important alternative causes of symptoms.

Results The derivation sample included 1040 records, with five factors found to be most predictive of ureteral stone: male sex, short duration of pain, non-black race, presence of nausea or vomiting, and microscopic hematuria, yielding a score of 0-13 (the STONE score). Prospective validation was performed on 491 participants. In the derivation and validation cohorts ureteral stone was present in, respectively, 8.3% and 9.2% of the low probability (score 0-5) group, 51.6% and 51.3% of the moderate probability (score 6-9) group, and 89.6% and 88.6% of the high probability (score 10-13) group. In the high score group, acutely important alternative findings were present in 0.3% of the derivation cohort and 1.6% of the validation cohort.

Conclusions The STONE score reliably predicts the presence of uncomplicated ureteral stone and lower likelihood of acutely important alternative findings. Incorporation in future investigations may help to limit exposure to radiation and over-utilization of imaging.

Trial registration NCT01352676.


Kidney stones are estimated to occur at some point in nearly 1 in 11 people in the United States, with flank or kidney pain resulting in over two million annual visits to the emergency department.1 2 Computed tomography (CT) has been described as the “best imaging study to confirm the diagnosis of a urinary stone” and is now the first line imaging study for suspected kidney stone in the United States.3 4 5 Though accurate, CT is costly, involves the use of ionizing radiation, and does not seem to have impacted patient centered outcomes, such as rates of diagnosis or hospital admission, in those with suspected kidney stones.3 6 7

Many patients with flank pain will not benefit from a CT scan, as most kidney stones will pass spontaneously. Moreover, it is unlikely that a CT scan in the setting of flank pain will detect acutely important alternative findings in patients without signs of infection.8 Hence an objective clinical prediction rule for renal colic that could reliably identify patients highly likely to have a ureteral stone (and thus unlikely to have an important alternative diagnosis) may allow patients to be safely managed without imaging, or imaged with other approaches such as ultrasonography or reduced dose CT.

We derived and validated a clinical prediction score for ureteral stones that cause symptoms, identifying patients with either a very high or a very low probability of having an uncomplicated ureteral stone. We hypothesized that patients who are highly likely to have a kidney stone are unlikely to harbor an important alternative diagnosis, and may be appropriate for imaging choices other than standard dose CT.


Study design and setting

We performed a retrospective derivation and prospective validation of a clinical scoring system for ureteral stones that cause symptoms in two separate emergency departments with the same medical record systems.9 10 11 The Yale New Haven Hospital emergency department is an urban, tertiary care teaching hospital and trauma center that sees over 80 000 adults annually. The Shoreline Medical Center emergency department is a freestanding eight bed suburban facility without residents, which sees approximately 20 000 adults and children annually. At the time of this study both sites utilized a templated, handwritten, scanned emergency department patient care record (Lynx Medical Systems, Bellevue, WA), with laboratory and dictated radiology reports on Sunrise Clinical Manager (Eclipsys, Atlanta, GA). The human investigation committee of the Yale institutional review board approved the derivation (retrospective) phase with a waiver of informed consent, and the validation (prospective) phase involved written informed consent from all patients.

Derivation phase

We electronically retrieved the dictated reports of all patients receiving a CT “flank pain protocol” (the name given at both sites to a non-contrast enhanced CT protocol for suspected kidney stone) at either of the two emergency department sites between April 2005 and November 2010. Patients were eligible if the CT was performed in the emergency department and they were 18 years of age or older at the time of imaging. From an original set of over 5000 computed tomograms, we selected approximately one third of records (estimated to yield about 1000 records that met the inclusion criteria) for full record review using a random number spreadsheet function (Microsoft Excel, Redmond, WA). Exclusion criteria were lack of any flank or back pain, history of trauma, evidence of infection (subjective or objective fever or presence of leukocytes on urine dipstick analysis), known active malignancy, known renal disease (including creatinine >1.5 mg/dL or 133 μmol/L), or previous urologic procedure (including lithotripsy or ureteral stent).8

Power calculation—derivation and validation sets

Our selection of about 1000 records was based on pilot data and earlier studies indicating that about 50% of patients undergoing CT would have a ureteral stone, and about 20% of these would undergo intervention for ureteral stone (or 10% of overall population, about 100 patients). As a general rule, when using logistic regression, each independent element of a clinical prediction requires approximately 10 events.12 This would have allowed us to incorporate a maximum of 10 elements in a rule to predict the need for intervention as well as being sufficiently powered to derive a rule for the more common outcome (any ureteral stone).
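The sample size reasoning above can be checked with a short calculation. The following is a minimal sketch of the events-per-variable rule of thumb using only the planning figures quoted in the text; the function name and its parameters are illustrative assumptions, not part of the study's analysis code.

```python
# Events-per-variable (EPV) rule of thumb, using the planning figures from the text.
# The function name and signature are hypothetical, chosen for illustration only.

def max_predictors(n_records, outcome_rate, events_per_variable=10):
    """Largest number of candidate predictors supported by the expected event count."""
    expected_events = round(n_records * outcome_rate)
    return expected_events // events_per_variable


N_RECORDS = 1000            # planned number of eligible records
STONE_RATE = 0.50           # expected share of patients with a ureteral stone
INTERVENTION_RATE = 0.10    # 20% of stone patients, or 10% of the overall population

print(max_predictors(N_RECORDS, INTERVENTION_RATE))  # elements for the intervention outcome
print(max_predictors(N_RECORDS, STONE_RATE))         # elements for the any-stone outcome
```

Under these assumptions the rarer outcome (intervention) is the binding constraint, which is why the text quotes a maximum of 10 elements.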

For the validation set we set minimally acceptable values for the classification probabilities of false and true positive fractions, of 0.05 and 0.95, respectively. All conclusions were to be based on a 90% (α=0.1) rectangular confidence region, using one sided exact confidence limits. As such we would attain 85% power with a minimum of 80 ureteral stones and a minimum of 256 non-stones.

Data abstraction

Based on clinical experience and review of the literature, five physician co-investigators from three specialties (emergency medicine, internal medicine, and urology) identified an a priori list of factors thought to potentially be predictive of ureteral stone (see supplementary appendix 1). We conducted a literature review using key word searches in PubMed and relevant citations through Web of Science (Thomson Reuters). These factors were then abstracted from medical records blinded to CT reports.10 The Lynx medical record used by emergency clinicians during the study period is a templated, handwritten chart that specifically prompts clinicians for the presence or absence of factors related to the chief complaint selected (typically flank or back pain), and was well suited to determining the presence or absence of factors. We abstracted the presence or absence of factors into a standardized form on an electronic database (Filemaker Pro 12, FileMaker, Santa Clara, CA).

We blindly abstracted and categorized the results of the dictated CT reports as previously described.8 The reports were reviewed primarily to determine whether a kidney stone was causing symptoms or whether the computed tomogram showed another cause of symptoms. We considered a kidney stone to be the cause of symptoms if it was located from the renal pelvis to the ureterovesical junction (parenchymal stones were not considered to cause symptoms) or if signs of passed ureteral stone were specifically mentioned in the CT report. We also documented acutely important alternative causes of symptoms (such as appendicitis, diverticulitis, and others).8 Other factors associated with kidney stone were also noted, including stone size, location, presence and degree of hydronephrosis or hydroureter, presence of perinephric or ureteral stranding, and asymptomatic stones as well as incidental findings (defined as unrelated to patient symptoms). We abstracted the CT results into a standard form on a separate FileMaker database.

Inter-rater reliability

To determine inter-rater reliability of elements abstracted from the medical record, we blindly re-reviewed a subset of 50 randomly selected records. A priori, any element with a κ of below 0.6 was not eligible for inclusion in the prediction rule. We also assessed inter-rater reliability for the categorization of CT scan results in a random selection of records.8
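As an illustration of the agreement statistic, the sketch below computes an unweighted Cohen's κ for two reviewers coding one element as present (1) or absent (0). The text does not state which κ variant was applied to abstracted elements, so the unweighted form is an assumption, and the example ratings are invented for demonstration only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of records")
    n = len(rater_a)
    # Observed agreement: share of records on which the raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented ratings for ten records (not study data).
first_review = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
second_review = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(first_review, second_review), 2))
```

An element whose κ fell below the prespecified 0.6 cutoff would be dropped from consideration for the rule.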

Constructing the scoring system

All variables included were considered through univariate logistic regression analysis, with estimation of prevalence and odds ratios with corresponding 95% confidence intervals. We performed multivariate logistic regression, employing forward selection and 10-fold cross validation for model selection including estimation of two measures of prediction accuracy: the misclassification rate and the area under the receiver operating characteristic curve (AUC). Misclassification is a measure of prediction error, and ranges from 0 to 1, with lower scores indicating fewer errors in prediction. AUC ranges from 0.5 to 1, with higher scores indicating better prediction. The best model was the one that had a low cross validated misclassification rate and a high AUC. Subsequently, we included all observations to provide the most accurate estimates of the coefficients for the selected model and to derive a corresponding integer scoring system following the methods used in the Framingham study.11 The simplicity of this scoring system allows a patient’s risk to be calculated without the need for a calculator. Initially, we organized variables in the final multivariate model into meaningful categories, each with a specific reference value. We then assigned a referent risk for each factor with the base risk assigned 0 points in the scoring system, such that a higher point total conveys more risk. Next, we calculated the difference in terms of regression units between each category and the corresponding base category. We set the constant, B, as the number of regression units that corresponds to 1 point. We then computed the points for each risk factor’s risk categories as the difference in regression units between each category and its base category divided by B. Subsequently we calculated the risk associated with each point total through the multiple logistic regression equation. 
We used a weighted κ test to verify the agreement between risk estimates based on the point system and those based on the multivariate logistic regression model. In addition to estimating AUC for summarizing the model’s discrimination, we used the Hosmer and Lemeshow test to test for goodness of fit and calibration.
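The point assignment described above can be sketched as follows. This is a minimal illustration of dividing regression units by the constant B and rounding to integers; the coefficient values, intercept, factor names, and B are all hypothetical placeholders and are not the fitted values from this study.

```python
import math


def assign_points(coefficients, units_per_point):
    """Convert log-odds differences from each base category into integer points.

    coefficients: {factor: {category: regression units relative to the base category}}
    units_per_point: the constant B, the number of regression units worth 1 point
    """
    if units_per_point <= 0:
        raise ValueError("units_per_point (B) must be positive")
    return {
        factor: {category: round(beta / units_per_point)
                 for category, beta in categories.items()}
        for factor, categories in coefficients.items()
    }


def risk_from_points(total_points, intercept, units_per_point):
    """Approximate risk for a point total via the logistic equation."""
    log_odds = intercept + units_per_point * total_points
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values for illustration only; base categories carry 0 regression units.
HYPOTHETICAL_COEFFICIENTS = {
    "factor_a": {"absent": 0.0, "present": 0.9},
    "factor_b": {"base": 0.0, "mid": 0.45, "high": 1.3},
}
B = 0.45
HYPOTHETICAL_INTERCEPT = -1.0

points = assign_points(HYPOTHETICAL_COEFFICIENTS, B)
print(points)
print(risk_from_points(3, HYPOTHETICAL_INTERCEPT, B))
```

Because each base category scores 0, a higher point total always corresponds to a higher estimated risk, which matches the intent stated in the text.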

While the odds ratios (coefficients) from the multivariate regression analysis can be used to estimate the probability of an event (in this case ureteral stone), we sought to construct a more straightforward scoring system for clinical use without the use of complicated calculations. We assigned integer points to the presence of risk factors for ureteral stone using the coefficients from a multivariate analysis based on all observations, as described in the methods used to estimate the risk of cardiovascular disease in the Framingham study.11 We computed points for each factor as the difference in regression units between each category and its base category, which was given a value of zero.

To assess the difference in accuracy between the integer point system and the logistic regression model we calculated the misclassification rate, AUC, and weighted κ based on differences in classification for each model.
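For readers unfamiliar with the two accuracy measures, the sketch below shows one common way to compute them. The 0.5 classification threshold and the pairwise (rank based) AUC estimate are assumptions made for illustration, as the text does not specify these details, and the example data are invented.

```python
def misclassification_rate(y_true, y_prob, threshold=0.5):
    """Share of records whose thresholded prediction disagrees with the outcome."""
    wrong = sum((p >= threshold) != bool(y) for y, p in zip(y_true, y_prob))
    return wrong / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random case outranks a random non-case."""
    cases = [s for y, s in zip(y_true, scores) if y]
    non_cases = [s for y, s in zip(y_true, scores) if not y]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    # Ties between a case and a non-case count as half a win.
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))


# Invented outcomes and predicted probabilities (not study data).
outcomes = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(misclassification_rate(outcomes, predicted))
print(auc(outcomes, predicted))
```

The same two functions can be applied to integer point totals (rescaled to risks) and to the regression model's probabilities, allowing the two to be compared on equal terms.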

After the point system was constructed from the derivation phase but before analysis of prospective data, the research team selected three categories for risk (low, moderate, and high) based on estimated clinical utility for the probability of ureteral stone by point total in each category.

Prospective validation

From May of 2011 to February of 2013, consecutive patients presenting during defined periods to the emergency department sites in whom the clinician intended to obtain a CT scan for kidney stone were approached for enrollment. Neither clinicians nor enrolling staff were aware of the specific elements of the rule derived in the retrospective phase. Defined enrollment shifts included overnights, weekends, and holidays, and an automatic paging system was set up to notify the research associate of all CTs ordered for renal colic. Review of the hospital imaging system was conducted daily to monitor any patients missed during enrollment or when enrollment was not taking place.

Before analysis of the validation data, the scoring system was developed from the derivation set as described previously, yielding a 0-13 point scale. Also before analysis of the prospective data we stratified this scale based on estimated clinical utility into low (about 10%), moderate (about 50%), and high (about 90%) probability of ureteral stone. Cut points on the scale were chosen for estimated clinical utility through consensus of all investigators, including physicians from emergency medicine, internal medicine, and urology. Stratification into three groups enabled the derivation and validation sets to be compared for clinical utility for discrimination of risk and allowed estimates of the prevalence of rarer important alternative findings in each group.

The research associates recorded all relevant factors (listed in supplementary appendix 1) from the derivation phase for the enrolled patients before the results of the CT were known. Research associates were not aware of the elements of the STONE score when prospective data were collected. They assigned point values of 0-13 and category of risk to each patient in the validation cohort blinded to the CT result, and the CT result was categorized blinded to the clinical factors (except laterality of pain) and point total. We used bootstrapping to estimate the Hosmer-Lemeshow test and discrimination (AUC), with AUC point estimates and 95% confidence intervals.


Derivation sample

Of 5383 “flank pain protocol” CT scans (that is, the name for a non-enhanced CT scan using a renal colic protocol at our institution) performed in the emergency departments on patients 18 years of age or older during the retrospective period, 1853 (34.4%) were randomly selected for full record review. Of these, 1040 were complete records with no exclusion criteria (figure, also see supplementary figure 1). Table 1 lists the characteristics of the derivation and validation cohorts. Approximately half (49.5%; 515 of 1040) of the patients had a ureteral stone that was causing symptoms on their computed tomogram, whereas 2.9% (30 of 1040) had acutely important alternative causes of symptoms. Inter-rater reliability for categorization of the CT result yielded a κ of 0.75-0.80, indicating excellent agreement. Table 2 shows the factors that were significant for the presence or absence of ureteral stone on univariate analysis.


Prevalence of ureteral stone by STONE score category in derivation and validation cohorts. Percentages at top of bars indicate prevalence of ureteral stone in group. Values under bars indicate number within derivation and validation sets that fell within each risk stratum

Table 1

Demographics of derivation and validation cohorts. Values are numbers (percentages) unless stated otherwise

Table 2

Significant predictors for presence or absence of ureteral stone (univariate analysis of derivation set), with odds ratios and 95% confidence intervals


STONE score

Multivariate analysis yielded five factors that were most significantly associated with the presence of a ureteral stone: male sex, short duration of pain, non-black race, presence of nausea or vomiting, and microscopic hematuria (table 3). Previous visits to an emergency department were also significantly associated with a lower probability of ureteral stone but, to maximize generalizability between centers, were not included in the model. These five factors were incorporated into the STONE score with associated integer point values (table 3), yielding a total score ranging from 0-13.11 The multivariate logistic regression model had a misclassification rate of 0.23 (95% confidence interval 0.22 to 0.23) and an AUC of 0.86 (95% confidence interval 0.79 to 0.93), whereas the STONE score had a misclassification rate of 0.23 (0.22 to 0.23) and an AUC of 0.82 (0.74 to 0.90). Agreement between the risk estimates based on the STONE score and those based on the multivariate logistic regression model demonstrated a weighted κ of 0.87 (95% confidence interval 0.86 to 0.87), indicating minimal loss of accuracy by assigning integer points to the factors.

Table 3

STONE score, factors, and categories


Prospective validation

From 25 May 2011 to 24 January 2013, 491 patients without exclusion criteria were enrolled (see supplementary figure 2). The characteristics of patients approached did not differ significantly from those that were not approached (table 1). For the validation cohort, the STONE score grouped into three levels of risk had an AUC of 0.792 (95% confidence interval 0.756 to 0.828) and the Hosmer-Lemeshow χ2=1.95 was not significant (P=0.38), indicating good discrimination and calibration.

Comparison of derivation and validation sets

In the derivation and validation sets, respectively, 19.8% and 15.5% of patients were classified as having a low probability of kidney stone, 49.6% and 46.8% as moderate, and 30.6% and 37.7% as high. The prevalence of ureteral stone by group in the derivation and validation sets was, respectively, 8.3% and 9.2% in the low probability group, 51.6% and 51.3% in the moderate group, and 89.6% and 88.6% in the high group (figure). Overall, acutely important alternative causes of symptoms were found on CT scan in 2.9% and 3.7% of the derivation and validation cohorts, with acutely important alternative causes in 0.3% and 1.6% of the high probability group, respectively. Table 4 shows the causes and frequency of acutely important alternative findings in the overall derivation and validation sets.
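The three way stratification can be expressed compactly. The sketch below maps a STONE score total to its risk category using the cut points reported in the abstract (0-5 low, 6-9 moderate, 10-13 high) and pairs each category with the prevalences quoted above; the function and variable names are illustrative assumptions, and the per factor point values from table 3 are deliberately not reproduced here.

```python
def stone_risk_category(score):
    """Map a STONE score total (0-13) to the risk category used in the text."""
    if not isinstance(score, int) or not 0 <= score <= 13:
        raise ValueError("STONE score must be an integer from 0 to 13")
    if score <= 5:
        return "low"
    if score <= 9:
        return "moderate"
    return "high"


# Prevalence of ureteral stone (%) as reported in the text: (derivation, validation).
REPORTED_STONE_PREVALENCE = {
    "low": (8.3, 9.2),
    "moderate": (51.6, 51.3),
    "high": (89.6, 88.6),
}

for total in (3, 7, 12):
    category = stone_risk_category(total)
    derivation, validation = REPORTED_STONE_PREVALENCE[category]
    print(total, category, derivation, validation)
```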

Table 4

Types and frequency of acutely important alternative causes of symptoms in derivation and validation sets, listed by decreasing frequency in derivation set



This study showed that a clinical scoring system accurately predicts the likelihood of ureteral stone, which is inversely associated with likelihood of an acutely important alternate cause of symptoms. To our knowledge this is the first clinical scoring system to be derived and validated for prediction of uncomplicated ureteral stone in patients attending emergency departments in whom CT imaging is deemed indicated. A previous study from the intravenous pyelography era derived factors from 203 patients and validated the findings in 73 patients, finding four elements to be predictive of ureteral stone: flank pain, hematuria, acute onset of pain, and positive findings on a plain radiograph.13 Our data show that the quantitative effects of the five factors incorporated into the STONE score can accurately predict ureteral stone and allow stratification of patients in the emergency department with suspected kidney stone into one of three groups: low probability (≤10% chance of stone), moderate probability (about 50% chance of stone), and high probability (about 90% chance of stone).

Additionally, we found that the likelihood of an acutely important alternative finding is inversely proportional to the probability of a ureteral stone being present, as predicted by the STONE score. While the overall presence of acutely important alternative findings was 2.9% in the derivation set and 3.8% in the validation set, the prevalence of clinically important alternative diagnoses in the high probability group was less than half of this: 0.3% and 1.6% in the derivation and validation cohorts, respectively.

Clinical and policy implications

In deriving and validating this clinical prediction rule (rather than a decision rule), we are not necessarily stating that patients with a high stone score should not undergo CT imaging—though this may not be an unreasonable approach in certain situations. In any clinical situation the risk of a test (in this case from exposure to radiation) and the resources required to do the test will need to be balanced against the tolerance for uncertainty and risk of misdiagnosis on the part of both the clinician and the patient. In some patients—perhaps particularly younger ones who are more susceptible to radiation and less likely to have certain diagnoses such as diverticulitis, aortic disease, or malignancy—this score may be used to provide objective data to help balance the cost and risk of performing a CT. The other possibility is that this clinical prediction rule could be used to determine which patients may be most appropriate for substantially reduced dose CT, which has been shown to reliably identify ureteral stones, particularly large ones that may require intervention.14

CT use in the United States, and public health implications

Since the landmark paper by Smith and colleagues in 1996, CT has become the first line test for kidney stone in the United States.3 4 15 However, despite a 10-fold increase in the utilization of CT scanning for diagnosis of kidney stone from 1996-2007, the proportion of patients with a diagnosis of kidney stone, findings of significant alternative diagnoses, or hospital admission has not changed.3 7 This suggests that the increase in CT use for diagnosis of this condition may not be substantially improving patient centered outcomes.16 Outside of the United States, CT is not necessarily the first line test for suspected kidney stone.17 18 19 20 In 2011 the European Association of Urology released comprehensive guidelines on urolithiasis in which it stated that “ultrasonography should be used as the primary procedure.”21 In 2007, the yearly rate of CT scanning in the United States was nearly 228 per 1000 population, more than double the rate in Canada and nearly four times the rate in the United Kingdom.22 23 These data are not specific to imaging in kidney stones and do not include patient outcomes, but the presence of wide regional variation (particularly in a condition that is not life threatening) suggests an opportunity for more appropriate utilization.24 25

While the health risk attributable to a single CT scan is small, in a country of 310 million people (approximate US population) it is important to note a lifetime incidence of nephrolithiasis of approximately 10%.1 If half of these people undergo a CT scan to detect nephrolithiasis (likely a conservative estimate as kidney stones are often recurrent and many patients undergo multiple CT scans26), we could expect 15 million CT scans to be performed on current US residents. In addition to the cost of this imaging, it could be estimated that exposure to ionizing radiation from CT would cause between 10 000 and 30 000 additional malignancies (using risk estimates of between 1 in 500 and 1 in 1500 for renal colic CT scans).27
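The population estimate above follows from simple arithmetic, sketched below with the figures quoted in the text. Integer arithmetic is used to avoid rounding noise; the variable and function names are illustrative assumptions. The unrounded results (15.5 million scans, and roughly 10 300 to 31 000 malignancies) agree with the rounded figures in the text.

```python
US_POPULATION = 310_000_000   # approximate US population quoted in the text
INCIDENCE_ONE_IN = 10         # lifetime incidence of nephrolithiasis of about 10%
SCANNED_ONE_IN = 2            # assumption from the text: half undergo a CT scan


def attributable_malignancies(ct_scans, risk_one_in):
    """Expected malignancies if each scan carries a 1-in-`risk_one_in` risk."""
    return ct_scans // risk_one_in


people_with_stones = US_POPULATION // INCIDENCE_ONE_IN   # 31 million people
ct_scans = people_with_stones // SCANNED_ONE_IN          # 15.5 million scans

lower = attributable_malignancies(ct_scans, 1500)  # risk of 1 in 1500 per scan
upper = attributable_malignancies(ct_scans, 500)   # risk of 1 in 500 per scan
print(ct_scans, lower, upper)
```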

In this setting CT was performed nearly as often in women as in men in both phases of the study (48.1% of CT scans in women in the derivation phase; 44.4% in the validation phase). However, the diagnostic yield (percentage of patients with ureteral stones on CT) for men was much higher: 68.8% in the derivation phase and 66.7% in the validation phase compared with women (28.7% and 41.7%, respectively). The lower diagnostic yield in women coupled with a higher risk from radiation of the pelvis with CT suggests that women (especially younger women) may be a group that could benefit from more judicious use of CT radiation.

Use of the score to select appropriate patients for reduced dose CT or ultrasonography

In terms of potential clinical utility, if a CT scan is being considered for suspected kidney stone and a patient has a high STONE score (which occurred in about a third of patients: 30.6% in the derivation cohort and 37.7% in the validation cohort), then the patient is very likely to have a kidney stone and very unlikely to have an important non-kidney stone cause of symptoms. Thus, if the STONE score is high a CT might be avoided entirely or a reduced dose CT could be performed (to ensure that there is not a large stone that may require intervention). It is important to note that it is still possible to miss an important alternative diagnosis in the high probability group if CT is not performed (of the roughly 10% of patients in the high group who did not have a ureteral stone, about 10%, or 1-2% of the high group overall, had an important alternative finding). However, the STONE score offers objective data to both the clinician and the patient that could help guide shared decision making about CT scanning, which is not without risk in terms of radiation and incidental findings that may lead to further testing or intervention. Our hope is that this score can be incorporated into imaging decisions for suspected renal colic to decrease exposure to radiation and over-utilization of imaging (that is, imaging without improvement in patient care).28 Further investigation, potentially including a randomized trial, may help to elucidate this.

Most kidney stones (smaller stones, about 80% in this study as is generally the case) will pass spontaneously with treatment of the symptoms. Patients with a very high probability of ureteral stone thus may not require any imaging and could be managed with pain control and drugs to enhance stone expulsion, with definitive diagnosis using a urine strainer. Clinicians may, however, still want to perform a CT to exclude potentially serious alternative causes of symptoms29 and to determine the size and location of any stone (with implications for prognosis and intervention).30 In this case, patients with a high STONE score may be ideally suited for substantially reduced dose CT scanning. Though data on low dose protocols have been published outside of the United States31 32 33 and the American College of Radiology states reduced dose techniques are “preferred,”4 data from the Dose Imaging Registry (part of the American College of Radiology National Radiology Data Registry) indicate that the mean institutional dose for CT for renal colic is still greater than 10 mSv, and reduced dose techniques are rarely used in US hospitals (in press).34

Reduced dose CT has been shown to be accurate for kidney stones, particularly larger ones that may require intervention, but has not been widely used in the United States, likely because of concerns about accuracy in an unselected population.14 34 Reluctance to implement reduced dose CT protocols for renal colic may result from fear of missing other disease. An investigator looking at reduced dose CT for renal colic noted that to put these reduced dose protocols into practice they “would want to target it at patients who have a high pretest probability of calculi and obstructive uropathy, since the ability to detect other pathology is hindered.”35 In addition to predicting kidney stone, our data show that the group that is most likely to have kidney stones is also unlikely (<2%) to have an important alternative cause of symptoms. A probability of disease under 2% has been identified as a testing threshold (point at which the negatives of a test outweigh the positives) for CT use in detecting other important diseases, such as pulmonary embolism.36 Identifying patients in this group could safely direct some patients with suspected kidney stone to low dose or ultra low dose CT.

Ultrasound is another option that may be used for imaging in suspected renal colic, and ultrasonography is often a first line test outside of the United States.19 21 It has the advantage of avoiding radiation entirely and is sometimes definitively diagnostic: identifying the presence, size, and location of a kidney stone that is causing symptoms. Often, however, ultrasonography may show indirect evidence of obstruction (hydronephrosis) without visualizing the actual ureteral stone, which may be obscured by bowel. We did find the presence of hydronephrosis on CT to be highly predictive of ureteral stone, and future work will incorporate the presence of hydronephrosis on ultrasonography into the STONE score.

At our institution, the STONE score has been incorporated into the computerized physician order entry system (Epic, Verona, WI). When a clinician orders a CT for kidney stone, the questions asked and a STONE score with risk category accompany the radiology order. This has been welcomed by the radiologists, who were often unsure of the perceived likelihood of kidney stone on the part of the ordering physician. We have found that the STONE score is easily entered and calculated using our electronic health record. We are also currently using the STONE score in a prospective study to select patients who are appropriate for either expectant management (no CT) or an ultra low dose CT, with a radiation dose that is about 90% lower than conventional CT (effective dose of around 1 mSv, about that of a plain abdominal radiograph). On a population basis, assuming the linear no-threshold model suggested by the Biologic Effects of Ionizing Radiation report (currently BEIR VII), an equivalent reduction in cancer risk could be expected.37 The current average effective dose of CT in the United States is 11.2 mSv, with only 2% of CT scans done using low doses.38

Strengths and limitations of this study

An important limitation of this study is that gestalt clinician pretest probability for kidney stone (that is, the clinician's overall estimate of the likelihood of kidney stone after initial evaluation) has not been thoroughly investigated, and it is possible that it would perform similarly to an objective clinical prediction rule. A study by Abramson and colleagues showed that the pretest probability estimated by emergency department physicians obtaining CT for suspected kidney stone clustered in the 41-60% and 71-90% ranges.39 However, the use of a relatively objective scoring system has the advantage that it is not dependent on clinician experience. In pulmonary embolism, for example, while gestalt pretest probability has been shown to be reasonably accurate, authors comparing gestalt pretest probability to objective scoring systems conclude that they “advocate the use of a clinical prediction rule because it has been shown to be accurate and can be used by less-experienced clinicians.”40 This study is also limited by being derived and validated in the same clinical setting; it is not known how well the score would perform in other settings.


We have derived and validated a clinical prediction score for the presence of ureteral stones that cause symptoms. Multicenter validation and evaluation of incorporating the STONE score into imaging algorithms are warranted.

What is already known on this topic

  • Kidney stones are common, and imaging with computed tomography (CT) is now the first line diagnostic test

  • However, CT has not been shown to improve patient centered outcomes

  • An objective, validated clinical prediction rule for uncomplicated ureteral stone has not been demonstrated and could help decrease exposure to radiation or over-utilization of imaging

What this study adds

  • A clinical prediction rule was derived and validated that can identify patients with a high probability of uncomplicated ureteral stone and absence of other important cause of symptoms

  • Results from this study may be used to select patients who could benefit from management without CT, or from reduced dose CT




  • We thank Christal Esposito, project manager, and research associates Pasquale Cicarella and Richelle Jessey and other research associates for their help in collecting and organizing data. The authors have followed the STROBE checklist in collecting and reporting their data. Elements of the checklist are incorporated into the manuscript. Further details of any particular checklist item are available on request from the corresponding author (chris.moore{at}

  • Contributors: CLM conceived the work, monitored data collection for the whole trial, oversaw data cleaning and statistical analysis, and drafted and revised the manuscript. He is guarantor. SB helped with project conception and participated in data analysis and manuscript revision. BD assisted with data analysis and participated in manuscript revisions. SL was lead research assistant and data manager; he participated in data collection, designed data collection tools, oversaw cleaning and management of data, and assisted with presentation of results in the manuscript. AM provided input into methodology, performed statistical analyses, and participated in manuscript revisions. DS participated in study design, analysis of data, and manuscript revisions. CPG participated in study design, analysis of data, and manuscript revisions. The authors confirm that data were collected, results analyzed, and the manuscript prepared without influence from funding agencies.

  • Funding: This study was funded through a grant from the Agency for Healthcare Research and Quality (5R01HS018322-03): Identifying unnecessary irradiation of patients with suspected renal colic. This funding provided resources to assist with the collection, management, analysis and interpretation of the data, and preparation and review of the manuscript. The AHRQ provided funding and oversight of funding, but was not directly involved in collection or cleaning of data, analysis of results, or drafting of the manuscript.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at and declare: All authors with the exception of BD and SB were funded in part by a research grant from the Agency for Healthcare Research and Quality (5R01HS018322-03); CPG received support from the Robert Wood Johnson clinical scholars program; CLM has received compensation as a consultant from Philips Healthcare and Sonosite (a subsidiary of FujiFilm); CPG is a scientific advisory board member for Fair Health; CPG receives funding from Medtronic as a collaborator on the Yale University Open Data Access project; there are no other relationships or activities that could appear to have influenced the submitted work.

  • Ethical approval: This study was approved by the Yale Human Investigation Committee (institutional review board).

  • Data sharing: The raw data from this study are not currently available online; however, the authors are willing to share a deidentified dataset of primary data with interested parties. Available on request from the corresponding author (chris.moore{at}

  • Transparency: CLM, the lead author and guarantor, affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

