Predicting risk of osteoporotic and hip fracture in the United Kingdom: prospective independent and external validation of QFractureScoresBMJ 2011; 342 doi: http://dx.doi.org/10.1136/bmj.d3651 (Published 22 June 2011) Cite this as: BMJ 2011;342:d3651
- Gary S Collins, senior medical statistician,
- Susan Mallett, senior medical statistician,
- Douglas G Altman, director, professor of statistics in medicine
- 1Centre for Statistics in Medicine, Wolfson College Annexe, University of Oxford, Oxford OX2 6UD, UK
- Correspondence to: G S Collins
- Accepted 4 April 2011
Objective To evaluate the performance of the QFractureScores for predicting the 10 year risk of osteoporotic and hip fractures in an independent UK cohort of patients from general practice records.
Design Prospective cohort study.
Setting 364 UK general practices contributing to The Health Improvement Network (THIN) database.
Participants 2.2 million adults registered with a general practice between 27 June 1994 and 30 June 2008, aged 30-85 (13 million person years), with 25 208 osteoporotic fractures and 12 188 hip fractures.
Main outcome measures First (incident) diagnosis of osteoporotic fracture (vertebra, distal radius, or hip) and incident hip fracture recorded in general practice records.
Results Results from this independent and external validation of QFractureScores indicated good performance data for both osteoporotic and hip fracture end points. Discrimination and calibration statistics were comparable to those reported in the internal validation of QFractureScores. The hip fracture score had better performance data for both women and men. It explained 63% of the variation in women and 60% of the variation in men, with areas under the receiver operating characteristic curve of 0.89 and 0.86, respectively. The risk score for osteoporotic fracture explained 49% of the variation in women and 38% of the variation in men, with corresponding areas under the receiver operating characteristic curve of 0.82 and 0.74. QFractureScores were well calibrated, with predicted risks closely matching those across all 10ths of risk and for all age groups.
Conclusion QFractureScores are useful tools for predicting the 10 year risk of osteoporotic and hip fractures in patients in the United Kingdom.
The World Health Organization defined osteoporosis as a progressive systemic, skeletal disease characterised by low bone mass and micro-architectural deterioration of bone tissue with a consequent increase in bone fragility and susceptibility to fracture.1 In 2000 an estimated 9 million incident cases of osteoporotic fractures occurred worldwide, comprising 1.6 million hip fractures, 1.7 million fractures of the forearm, and 1.4 million vertebral fractures. The annual costs of health and social care for fractures in the United Kingdom are estimated to be £1.8bn (€2bn; $3bn).2
Identifying those at an increased risk of fracture and who might benefit from a therapeutic or preventive intervention is important. QFractureScores are sex specific multivariable risk scores that have recently been developed to predict the 10 year risk of both osteoporotic fracture (vertebra, distal radius, or hip) and hip fracture.3
The risk scores were developed and validated on a large cohort of patients (3.6 million) from the QResearch (www.qresearch.org) database; two thirds of the cohort were randomly allocated to model development and one third to model validation. QResearch is a large database comprising over 12 million anonymised health records from 602 practices throughout the United Kingdom that use the Egton Medical Information Systems (EMIS) computer system (used in 59% of general practices in England). QFractureScores were developed on 2.4 million patients aged between 30 and 85 years contributing 16 million person years of observation, with 32 284 incident diagnoses of osteoporotic fracture, and 14 726 hip fractures, recorded between 1 January 1993 and 30 June 2008. QFractureScores were derived using a Cox proportional hazards model. Fractional polynomials were used to model non-linear risk relations with continuous risk factors, and the presence of interactions between risk factors was tested.4 Multiple imputation was used to replace missing values for key risk factors (smoking status, alcohol consumption, and body mass index) to reduce the biases that can occur in a complete case analysis.5 6 The final prediction model included 17 risk factor terms for women and 12 for men (box). Open source codes to calculate the QFractureScores are available from www.qfracture.org released under the GNU lesser general public licence, version 3. The performance of the QFractureScores was assessed on an additional 1.3 million patients from the QResearch database with 18 471 incident osteoporotic fractures and 7162 incident hip fractures and showed favourable performance when compared with the FRAX (fracture risk assessment) score.7
Risk factors in QFractureScores
Women and men
Body mass index (continuous)
Smoking status (non-smoker, former smoker, light smoker (<10 cigarettes/day), moderate smoker (10-19 cigarettes/day), heavy smoker (≥20 cigarettes/day)
Recorded alcohol use (none, trivial <1 unit/day, light 1 or 2 units/day, medium 3-6 units/day, heavy 7-9 units/day, very heavy >9 units/day)
Diagnosis of rheumatoid arthritis (yes/no)
Diagnosis of cardiovascular disease (yes/no)
Diagnosis of type 2 diabetes (yes/no)
Diagnosis of asthma (yes/no)
At least two prescriptions for tricyclic antidepressants in the six months before baseline (yes/no)
At least two prescriptions for corticosteroids in the six months before baseline (yes/no)
History of falls (yes/no)
Diagnosis of chronic liver disease (yes/no)
Parental history of osteoporosis (yes/no)
Diagnosis of gastrointestinal malabsorption (yes/no)
Diagnosis of other endocrine symptoms (yes/no)
At least two prescriptions of hormone replacement therapy (yes/no)
Menopausal symptoms (yes/no)
The development of the model and initial internal validation study were both carried out using random samples of data from the same population. External validation is also important to evaluate the model’s performance more widely.8 9 We independently evaluated the performance of QFractureScores on a separate large dataset of general practice records in the United Kingdom.
Study participants were patients registered between 27 June 1994 and 30 June 2008 with records on the THIN database (www.thin-uk.com). This database comprises the records of general practices that use the INPS Vision system (In Practice Systems, London); currently 20% of UK general practices. Patients were eligible if they were aged between 30 and 85 years, had no previously recorded fracture (hip, distal radius, or vertebra), were permanent residents in the United Kingdom, and had no interrupted periods of registration with a practice.
The two primary outcomes were first (incident) diagnosis of an osteoporotic fracture (hip, distal radius, or vertebra) as recorded on the general practice computer records (THIN), and incident diagnosis of hip fracture.
We determined an entry date for each patient, which was the latest of the date of their 30th birthday, date of registration with the practice, date on which the practice computer system was installed plus one year, or the beginning of the study period (27 June 1994). Patients were included in the analysis only once they had a minimum of one year’s complete data in their medical record. Observation time was calculated from the entry date to an exit date, which was defined as the earliest date of recorded fracture, date of death, date of deregistration with the practice, date of last upload of computerised data, or the study end date (30 June 2008).
Smoking status (when recorded) was derived from two risk factors: whether the patient was a non-smoker, former smoker, or current smoker, and the number of cigarettes smoked—categorised as light (<10 cigarettes/day), moderate (10-19 cigarettes/day), or heavy (≥20 cigarettes/day). These two risk factors were then combined into five categories: non-smoker, former smoker, light smoker, moderate smoker, and heavy smoker.
Using QFractureScores we calculated the 10 year estimated risk of fracture (osteoporotic and hip) for every patient in the THIN cohort. We used the Kaplan-Meier method to obtain observed 10 year fracture risks. To replace missing values for smoking status, number of cigarettes smoked, alcohol consumption, and body mass index we used multiple imputation using all predictors plus the outcome variable. This involves creating multiple copies of the data and imputing the missing values with sensible values randomly selected from their predicted distribution. We generated five imputed datasets and combined the results from analyses on each of the imputed datasets using Rubin’s rules to produce estimates and confidence intervals that incorporate the uncertainty of imputed values.10
Predictive performance of QFractureScores for the THIN cohort was assessed by examining measures of calibration and discrimination. Calibration refers to how closely the predicted 10 year risk of fracture agrees with the observed 10 year risk of fracture. We assessed this for each 10th of predicted risk, ensuring 10 equally sized groups, and for each five year age band by calculating the ratio of predicted to observed fracture risk separately for men and for women. We assessed the calibration of the risk score predictions by plotting observed proportions versus predicted probabilities. The Brier score for censored survival data was also calculated, which is a measure of accuracy and is the average squared deviation between predicted and observed risk; a lower score represents greater accuracy.11
Discrimination is the ability of the risk score to differentiate between those patients who do and those who do not experience a fracture event during the study. This measure is quantified by calculating the area under the receiver operating characteristic curve statistic; a value of 0.5 represents chance and 1 represents perfect discrimination. We also calculated the D statistic12 and R2 statistic,13 which are measures of discrimination and explained variation, respectively, and are tailored towards censored survival data.
Between 27 June 1994 and 30 June 2008, 2 244 636 eligible patients from 364 general practices in the United Kingdom were registered in the THIN database. For the osteoporotic fracture end point, 2 209 451 patients (50.6% women) were eligible for analysis (27 551 had a recorded fracture before 27 June 1994 and were excluded) contributing 12 784 326 person years of observation, of which there were 25 208 incident cases of osteoporotic fracture. For the hip fracture end point, 2 244 636 patients were eligible for analysis (5202 had a recorded hip fracture before 27 June 1994 and were excluded) contributing 13 053 891 person years of observation, of which there were 12 188 incident cases of hip fracture. The median follow-up for the osteoporotic fracture end point was 5.98 (interquartile range 2.61-8.50) years and for the hip fracture end point was 6.03 (2.62-8.50) years. In total, 319 991 patients (14.5%) were followed up for 10 years or more for the osteoporotic end point and 329 571 patients (14.7%) for the hip fracture end point.
The 10 year observed risk of osteoporotic fracture in women (19 055 incident osteoporotic fractures) aged between 30 and 85 was 3.04% (95% confidence interval 2.99% to 3.09%) and in men (6153 incident osteoporotic fractures) was 1.04% (1.01% to 1.07%). The 10 year observed risk of hip fracture in women (9165 incident hip fractures) aged between 30 and 85 was 1.48% (1.44% to 1.51%) and in men (3023 incident hip fractures) was 0.52% (0.50% to 0.54%). Table 1⇓ details the characteristics of the patients in the THIN cohort.
Complete data on smoking status, smoking category, alcohol consumption, and body mass index were available for 44.5% of women (n=506 081) and 31.8% of men (n=352 380). Most patients (n=1 693 039; 75.4%) had no or only one missing risk factor (table 2⇓). Missing data were noticeably high for alcohol consumption (45.0% for women and 60.7% for men). For other risk factors 17.6% of women and 24.4% of men had missing data on body mass index, 13.5% and 12.4% on smoking status, and 6.1% and 10.8% for number of cigarettes smoked.
Table 3⇓ shows the incidence rates (per 1000 person years) for fracture by age and sex in the THIN cohort. Women experienced 19 055 fractures during 6 493 740 person years of follow-up: incidence rate 2.93 (95% confidence interval 2.89 to 2.98). Men experienced 6153 osteoporotic fractures during 6 290 586 person years of follow-up: incidence rate 0.98 (0.95 to 1.00). The rates of osteoporotic fracture were highest in those aged 80 to 85 years at baseline: incidence rate in women 17.7 (17.14 to 18.28) and in men 7.01 (6.54 to 7.51). Hip fractures accounted for 48.1% of fractures in women and 49.1% in men. Overall, 9165 hip fractures were recorded for women during the 6 673 972 person years of follow-up, giving an overall incidence rate of 1.37 (1.35 to 1.40), whereas for men, 3023 incident hip fractures occurred during 6 379 919 person years of follow-up, giving an incident rate of 0.47 (0.46 to 0.49). Among those aged 80 to 85 years the incidence of hip fracture in women was 12.47 (12.02 to 12.94) and in men was 5.42 (5.01 to 5.86).
Table 4⇓ presents the performance data on discrimination and calibration for the QFractureScores in the THIN cohort alongside those previously reported in the internal validation using the QResearch cohort.3 The R2 statistic (percentage of explained variation) was about 4% higher for the osteoporotic fracture score in women and 8% higher in men compared with the internal validation performance data. For the hip fracture score, the R2 statistic was marginally lower for both women (1%) and men (3%) in the THIN cohort compared with the internal validation. The D statistic was noticeably higher for hip fracture in both women and men (2.66 and 2.53) than for osteoporotic fracture (2.02 and 1.60). The D statistics of QFractureScores for osteoporotic fracture were higher in the external validation than in the internal validation. Values for the area under the receiver operating characteristic curve were higher for osteoporotic fracture in the THIN cohort in both women and men (0.816 and 0.739) than in the internal validation (0.788 and 0.688). Values for the area under the receiver operating characteristic curve for hip fracture alone were higher than for osteoporotic fracture, whereas values were comparable to those obtained in the internal validation. The Brier score (adjusted for censoring), a measure of prediction accuracy, was better for hip fracture than for osteoporotic fracture.
Figures 1⇓ and 2⇓ present the calibrations of QFractureScores for men and women by 10th of risk for both the THIN cohort (external validation) and the QResearch cohort (internal validation). Model calibration was good, with close agreement between predicted and observed risk of osteoporotic and hip fractures across all 10ths of risk, with no discernible over-prediction or under-prediction. Similarly, figure 3⇓ displays the calibration of QFractureScores for men and women by age group. Agreement between predicted and observed fracture risks was close across all age groups.
QFractureScores are four new risk scores for predicting the 10 year risk of osteoporotic and hip fracture separately in men and women, developed and internally validated on the large UK primary care electronic QResearch database.3 The database comprised 3.7 million patients registered between 1 January 1993 and 30 June 2008, contributing 50 755 osteoporotic fractures, 19 531 hip fractures, and 25 million person years of observation using the EMIS computer system.
QFractureScores were designed to be based on risk factors that are readily available and recorded in patients’ health records or which patients themselves are likely to know. Other risk scores for predicting osteoporotic fracture have included clinical ones such as loss of bone mineral density, as measured by dual energy x ray absorptiometry.7 16 The rationale for excluding laboratory tests or clinical measurements in QFractureScores was to make the score readily available and cost effective so that it could be easily implemented in routine clinical practice and national screening initiatives. All the variables in the QFractureScores will be known to the patient or collected as part of routine clinical practice and recorded within the patients’ healthcare record. Thus QFractureScores can be applied in primary care to identify patients who are at high risk and would benefit from further investigation without the need for expensive tests to identify those at an increased risk.
WHO has recently developed a risk prediction model, the FRAX (fracture risk assessment) instrument, to predict individual risks of fracture.7 FRAX has been incorporated into clinical guidelines in the United Kingdom and United States17 18 19 and uses information on age, sex, height, body mass index, previous fracture, family history of fracture, glucocorticoid use, current smoking status, alcohol consumption, rheumatoid arthritis, other secondary causes of osteoporosis, and, if available, bone mineral density of the femoral neck. The details and source code for the calculation of an individual’s risk using FRAX has to date not been published or released for independent evaluation. We sought to compare QFractureScores against FRAX for the hip fracture end point but our requests for an independent head to head comparison were not taken up by the developers of FRAX.
Although the development and intended use of QFractureScores were for the United Kingdom, the risk factors in both the osteoporotic and the hip fracture models include no UK specific terms (such as the Townsend deprivation score as used in other QResearch risk scores or ethnicity). There is thus no reason why the QFractureScores could not be used outside the United Kingdom; however, the performance of the models would need to be established before use in a different population.
Our independent evaluation of QFractureScores was carried out on the large separate database (THIN) based on general practices recording clinical data using the INPS Vision system, which is used in 20% of UK general practices. The dataset comprised 2.2 million patients registered between 27 June 1994 and 30 June 2008, contributing 13 million person years of observation. The performance data presented in this article on the THIN cohort provide strong evidence to support the external validity of QFractureScores in predicting 10 year risk of osteoporotic or hip fracture, with comparable performance to that observed in the internal validation data. QFractureScores for osteoporotic and hip fracture end points in both women and men were well calibrated in this large cohort, with good agreement between predicted and observed risks. Discriminant ability in both the internal validation and the external validation are impressive, with values for the area under the receiver operating characteristic curve exceeding 0.8 for women and 0.73 for osteoporotic fracture in men and 0.86 for hip fracture in men.
To date, the development, internal validation, and our external validation of QFractureScores have used 5.9 million patients. These patients have contributed 38 million person years of observation and 75 963 new cases of osteoporotic fractures and 31 719 of hip fractures during the observation periods. Such data have been used to develop and evaluate four risk scores to predict the 10 year risk of osteoporotic and hip fracture in adults aged 30 to 85.
In this study, we have provided an independent and external validation of the QFractureScores risk score on a large cohort of patients in the United Kingdom. We have assessed the performance of QFractureScores against the performance data from the internal validation of QFractureScores with comparable results and shown good evidence to confirm predictive power of the risk scores without the need for recalibration.
What is already known on this topic
International guidelines propose a targeted approach based on a 10 year absolute fracture risk for identifying high risk patients who may benefit from interventions
QFractureScores were developed and internally validated using a large cohort of UK patients and published in 2009
Risk prediction models need to be independently and externally validated to objectively evaluate performance
What this study adds
Results from this independent and external validation of QFractureScores indicated good performance data for both osteoporotic and hip fracture end points
The performance data for both risk prediction models, especially for hip fracture score, suggest that the QFractureScores could be useful in identifying patients in the UK who are at increased risk of osteoporotic fracture who could benefit from interventions
Cite this as: BMJ 2011;342:d3651
Contributors: GSC carried out the analysis and prepared the first draft, which was revised according to comments and suggestions from SM and DGA. GSC is guarantor.
Funding: This research received no specific grant from any funding agency in the public, commercial, or not for profit sectors.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: This study was approved by the Trent multicentre research ethics committee.
Data sharing: No additional data available.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.