- Julia Hippisley-Cox, professor of clinical epidemiology and general practice,
- Carol Coupland, associate professor in medical statistics
- Correspondence to: J Hippisley-Cox
- Accepted 8 July 2009
Objective To develop and validate two new fracture risk algorithms (QFractureScores) for estimating the individual risk of osteoporotic fracture or hip fracture over 10 years.
Design Prospective open cohort study with routinely collected data from 357 general practices to develop the scores and from 178 practices to validate the scores.
Setting General practices in England and Wales.
Participants 1 183 663 women and 1 174 232 men aged 30-85 in the derivation cohort, who contributed 7 898 208 and 8 049 306 person years of observation, respectively. There were 24 350 incident diagnoses of osteoporotic fracture in women and 7934 in men, and 9302 incident diagnoses of hip fracture in women and 5424 in men.
Main outcome measures First (incident) diagnosis of osteoporotic fracture (vertebral, distal radius, or hip) and incident hip fracture recorded in general practice records.
Results Use of hormone replacement therapy (HRT), age, body mass index (BMI), smoking status, recorded alcohol use, parental history of osteoporosis, rheumatoid arthritis, cardiovascular disease, type 2 diabetes, asthma, tricyclic antidepressants, corticosteroids, history of falls, menopausal symptoms, chronic liver disease, gastrointestinal malabsorption, and other endocrine disorders were significantly and independently associated with risk of osteoporotic fracture in women. Some variables were significantly associated with risk of osteoporotic fracture but not with risk of hip fracture. The predictors for men for osteoporotic and hip fracture were age, BMI, smoking status, recorded alcohol use, rheumatoid arthritis, cardiovascular disease, type 2 diabetes, asthma, tricyclic antidepressants, corticosteroids, history of falls, and liver disease. The hip fracture algorithm had the best performance among men and women. It explained 63.94% of the variation in women and 63.19% of the variation in men. The D statistic values for discrimination were highest for hip fracture in women (2.73) and men (2.68) and were over twice the magnitude of the corresponding values for osteoporotic fracture. The ROC statistics for hip fracture were also high: 0.89 in women and 0.86 in men versus 0.79 and 0.69, respectively, for the osteoporotic fracture outcome. The algorithms were well calibrated, with predicted risks closely matching observed risks. The QFractureScore for hip fracture also had good performance for discrimination and calibration compared with the FRAX (fracture risk assessment) algorithm.
Conclusions These new algorithms can predict risk of fracture in primary care populations in the UK without laboratory measurements and are therefore suitable for use in both clinical settings and for self assessment (www.qfracture.org). QFractureScores could be used to identify patients at high risk of fracture who might benefit from interventions to reduce their risk.
Osteoporotic fractures are a major and increasing cause of morbidity in the population and a considerable burden to health services. Hip fractures, in particular, result in considerable pain, loss of function, and admission to hospital, making prevention a high priority for patients and physicians and for public health. Various therapeutic and lifestyle interventions might reduce the risk of osteoporosis and hence an individual’s risk of fracture.1 The challenge now is to improve methods for accurate identification of individuals at high risk who might benefit from a therapeutic or preventive intervention. Although there is no universally accepted policy for screening patients at risk of osteoporotic fracture, some guidelines,2 3 4 5 but not all,6 recommend a targeted approach to the prevention of osteoporosis based on the 10 year absolute risk of major osteoporotic fracture. Risk prediction utilities are therefore needed both to estimate individual risk accurately and to enable a systematic, targeted, population based screening approach.
Traditional approaches based on measurement of bone mineral density alone are unsuitable for population screening because of cost and low sensitivity.7 Most fractures occur in women with normal bone mineral density,8 and the evidence suggests that risk prediction algorithms that do not include bone mineral density are almost as good as those that do.9 We need less expensive and more practical methods of identifying those at high risk; these should ideally be based on models developed from contemporaneous data in diverse populations representative of the clinical setting in which these models will subsequently be applied. Risk prediction utilities tend to perform best in the settings in which they have been derived both in terms of discrimination (that is, ability to separate out those who will and will not develop a fracture) and also calibration (how closely the overall predicted risk matches the observed risk).10
There are several established risk factors for osteoporotic fracture that could be used to derive a risk prediction algorithm for use within primary care. Many of these risk factors are reliably recorded within primary care clinical computer systems and hence such data can be used to derive robust utilities that can then be applied in primary care. The incidence of fracture and the prevalence of associated risk factors will change over time, and the methods to derive the risk prediction algorithms need to be dynamic so that they can be remodelled over time. UK datasets derived from family practices have the advantage of having large and broadly representative populations with historical data tracking back well over a decade in most practices, and they are continually updated.
We developed and validated two new fracture clinical risk scores (QFractureScores) derived from a large and representative primary care population from a validated clinical research database (www.qresearch.org). We analysed more than two million patients to address some of the research questions regarding risk factors for osteoporotic fracture in men and women highlighted within the recent NICE guidance,6 National Osteoporosis Guideline,5 and the World Health Organization.11 We incorporated traditional variables already included in the FRAX (fracture risk assessment) algorithm7 and added further variables that affect risk of fracture, such as history of falls, type 2 diabetes, cardiovascular disease, asthma, use of hormone replacement therapy (HRT), and use of tricyclic antidepressants.5 12 13 14 We also extended the age range to include younger patients. Lastly, we incorporated a more detailed categorisation of alcohol and smoking status. Our new algorithm is based on variables that are readily available in patients’ electronic primary healthcare records15 or that the patients themselves would probably know, without the need for laboratory tests or clinical measurements. This approach is designed to enable the algorithms to be readily and cost effectively implemented in routine clinical practice or used by individual patients.
Study design and data source
We conducted a prospective cohort study in a large primary care population of patients from version 20 of the QResearch database. This is a large validated primary care electronic database containing the health records of over 11 million patients registered from 574 general practices that use the Egton Medical Information System (EMIS) computer system. Practices and patients contained on the database are nationally representative for England and Wales and similar to those on other large national primary care databases that use other clinical software systems.16
We included all QResearch practices in England and Wales once they had been using their current EMIS system for at least a year to ensure completeness of recording of morbidity and prescribing data. We randomly allocated two thirds of practices to the derivation dataset and the remaining third to the validation dataset; the simple random sampling utility in Stata was used to assign practices to the derivation or validation cohort.
We identified an open cohort of patients aged 30-85 at the study entry date, drawn from patients registered with eligible practices during the 15 years between 1 January 1993 and 30 June 2008. We used an open cohort design, rather than a closed cohort design, as this allows patients to enter the population throughout the whole study period rather than require registration on a fixed date, reflecting the realities of routine clinical practice. We excluded patients with a previous recorded fracture (hip, distal radius, or vertebral), temporary residents, patients with interrupted periods of registration with the practice, and those who did not have a valid Townsend deprivation score related to the postcode (about 4% of the population).
For each patient, we determined an entry date to the cohort, which was the latest of the date of their 30th birthday, date of registration with the practice, date on which the practice computer system was installed plus one year, and the beginning of the study period (1 January 1993). We included patients in the analysis only once they had a minimum of one year’s complete data in their medical record.17 For each patient we also determined an exit date, which was the earliest of date of recorded fracture, date of death, date of deregistration with the practice, date of last upload of computerised data, or the study end date (30 June 2008).
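The entry and exit rules above reduce to taking a maximum and a minimum over candidate dates. The following is a minimal Python sketch, assuming each date is already available as a `datetime.date`; the function names are hypothetical and this is not the authors' Stata code. It does not model the one year complete-data requirement.

```python
from datetime import date

# Study period stated in the text
STUDY_START = date(1993, 1, 1)
STUDY_END = date(2008, 6, 30)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def entry_date(birth: date, registration: date, system_installed: date) -> date:
    """Latest of: 30th birthday, registration, system installation + 1 year, study start."""
    return max(add_years(birth, 30), registration,
               add_years(system_installed, 1), STUDY_START)


def exit_date(fracture=None, death=None, deregistration=None, last_upload=None) -> date:
    """Earliest of the recorded fracture, death, deregistration and last upload dates,
    and the study end date; missing dates (None) are ignored."""
    recorded = [d for d in (fracture, death, deregistration, last_upload) if d is not None]
    return min(recorded + [STUDY_END])


# Hypothetical patient, for illustration only
start = entry_date(date(1950, 5, 10), date(1995, 3, 1), date(1994, 6, 1))
end = exit_date(death=date(2005, 2, 2), last_upload=date(2008, 3, 31))
```

For this hypothetical patient, entry is 1 June 1995 (system installation plus one year, the latest candidate) and exit is the date of death, 2 February 2005.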
Our two primary outcomes were the first (incident) diagnosis of an osteoporotic fracture (hip, vertebral, or distal radius) as recorded on the general practice computer records and incident diagnosis of hip fracture.
Fracture risk factors
We examined the following explanatory variables in our analysis, all of which are known or thought to affect fracture risk and are also likely to be recorded within the patients’ electronic records as part of routine clinical practice:
Age at study entry (in single years)
Body mass index (BMI) (continuous)18
Townsend deprivation score (with 2001 census data, evaluated at output area, as a continuous variable)
Recorded parental history of osteoporosis or hip fracture in a first degree relative (binary variable yes/no)22
Diagnosis of cardiovascular disease at baseline (binary variable yes/no)12
Recorded use of alcohol (none, trivial <1 unit/day, light 1-2 units/day, medium 3-6 units/day, heavy 7-9 units/day, very heavy >9 units/day)23
Diagnosis of rheumatoid arthritis at baseline (binary variable yes/no)24
Diagnosis of type 2 diabetes at baseline (binary variable yes/no)25
Diagnosis of asthma at baseline (binary variable yes/no)
History of falls before baseline (binary variable yes/no)
Diagnosis of chronic liver disease at baseline (binary variable yes/no)
Diagnosis of gastrointestinal conditions likely to result in malabsorption (such as Crohn’s disease, ulcerative colitis, coeliac disease, steatorrhoea, blind loop syndrome) at baseline (binary variable yes/no)7
Diagnosis of other endocrine conditions (thyrotoxicosis, primary or secondary hyperparathyroidism, Cushing’s syndrome) at baseline (binary variable yes/no)
At least two prescriptions for systemic corticosteroids in the six months before baseline (binary variable yes/no)26
At least two prescriptions for tricyclic antidepressants in the six months before baseline (binary variable yes/no)14
At least two prescriptions for HRT (in women) in the six months before baseline13
Menopausal symptoms (in women), including vaginal dryness or hot flushes (binary variable yes/no), recorded at baseline.
We restricted all values of these variables to those that had been recorded in the person’s electronic healthcare record before baseline, except for BMI and alcohol and smoking status for which we used the values recorded closest to study entry date and recorded before the diagnosis of osteoporotic fracture (or before censoring for those who did not develop a fracture). We assumed that if there was no recorded value of a diagnosis, prescription, or family history then the patient did not have that exposure.
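The two rules in this paragraph can be sketched as follows: choose the BMI, smoking, or alcohol value recorded closest to entry but before fracture or censoring, and treat an absent diagnosis, prescription, or family history as "not exposed". This is an illustrative Python sketch under assumed data layouts (a list of dated values and a dictionary of recorded flags); the names are hypothetical, and ties are resolved to the earlier record as an assumption.

```python
from datetime import date
from typing import Dict, Optional, Sequence, Tuple

Record = Tuple[date, float]  # (date recorded, value), e.g. one BMI measurement


def closest_to_entry(records: Sequence[Record], entry: date, exit_: date) -> Optional[float]:
    """Return the value recorded closest to the entry date among records made before
    exit (fracture or censoring); ties go to the earlier record. None means missing,
    which the main analysis then fills by multiple imputation."""
    eligible = [(abs((d - entry).days), d, v) for d, v in records if d < exit_]
    if not eligible:
        return None
    return min(eligible)[2]


def binary_exposure(recorded: Dict[str, bool], name: str) -> bool:
    """No record of a diagnosis, prescription, or family history counts as 'not exposed'."""
    return recorded.get(name, False)


# Hypothetical BMI history for one patient
bmi_records = [(date(1990, 1, 1), 24.0), (date(1996, 1, 1), 26.5), (date(2001, 1, 1), 28.0)]
bmi = closest_to_entry(bmi_records, entry=date(1995, 6, 1), exit_=date(2000, 1, 1))
```

Here the 2001 reading is ignored because it falls after exit, and the 1996 value (26.5) is selected because it is nearer to entry than the 1990 value.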
Model derivation and development
We calculated crude incidence rates of osteoporotic fracture (hip, vertebral, or distal radius fracture) and hip fracture by age and sex in the derivation and validation cohorts. We used Cox’s proportional hazards models in the derivation dataset to estimate the coefficients and hazard ratios associated with each potential risk factor for the first ever recorded diagnosis of overall fracture and hip fracture for men and women separately. We compared models using the Akaike information criterion (AIC) and the Bayes information criterion (BIC),27 which are likelihood measures in which lower values indicate better fit and in which a penalty is paid for increasing the number of variables in the model. We used fractional polynomials to model non-linear risk relations with continuous variables where appropriate.28 We tested for interactions between age and smoking; age and parental history of osteoporosis; age and BMI; age and falls; age and use of HRT; use of HRT and smoking; and use of HRT and deprivation. We included significant interactions in the final model when they improved the model fit based on the AIC. Continuous variables were centred for analysis. We checked the assumptions of the proportional hazards model for each variable graphically using log−log survival plots.
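The AIC and BIC comparison described above can be illustrated with the standard formulas, AIC = −2 log L + 2k and BIC = −2 log L + k log n, where L is the (partial) likelihood of the Cox model and k the number of parameters. This is a sketch with hypothetical log likelihoods, not study results; taking n as the number of events is an assumption (some software uses the number of subjects instead).

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood: float, n_params: int, n: int) -> float:
    """Bayes information criterion: the penalty per parameter grows with log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n)


# Hypothetical partial log likelihoods for two nested Cox models (illustration only)
simpler = {"ll": -1000.0, "k": 10}
richer = {"ll": -995.0, "k": 12}
n_events = 24350  # assumption: number of events used as n for BIC

for name, m in (("simpler", simpler), ("richer", richer)):
    print(name, aic(m["ll"], m["k"]), round(bic(m["ll"], m["k"], n_events), 1))
```

With these made-up numbers, AIC favours the richer model (2014 v 2020), whereas BIC, with its heavier penalty, favours the simpler one. This illustrates why the two criteria can disagree about whether extra terms such as interactions are worth keeping.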
After conducting a complete case analysis, we used multiple imputation to replace missing values for alcohol, smoking status, and BMI, and used these values in our main analyses.29 30 31 32 We used the ICE procedure in Stata33 to obtain five imputed datasets. Our final model was fitted to the multiply imputed datasets, using Rubin’s rules to combine effect estimates and to estimate standard errors that allow for the uncertainty due to missing data. Multiple imputation is a statistical technique designed to reduce the bias, and the substantial loss of power and precision, that can occur with a “complete case” analysis.29 32 34 It involves creating multiple copies of the data in which missing values are replaced by imputed values drawn as a suitable random sample from their predicted distribution. Multiple imputation therefore allows patients with incomplete data to be included in analyses, making full use of all the available data and thus increasing power and precision without compromising validity.35
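Rubin’s rules, mentioned above, pool one coefficient across the m imputed datasets. The pooled estimate is the mean of the m estimates. The total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The Python sketch below uses five hypothetical log hazard ratios. The 95% interval uses a normal 1.96 multiplier for brevity; full Rubin’s rules use a t reference distribution with adjusted degrees of freedom.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Combine one coefficient estimated in each of m imputed datasets (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors from >= 2 imputations")
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean(se ** 2 for se in std_errors)    # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between         # total variance
    return q_bar, math.sqrt(total)


# Hypothetical log hazard ratios from five imputed datasets (illustration only)
log_hr, se = pool_rubin([0.50, 0.52, 0.48, 0.51, 0.49], [0.05] * 5)
# Normal approximation for the interval (simplification, see lead-in)
print(f"HR {math.exp(log_hr):.2f}, 95% CI "
      f"{math.exp(log_hr - 1.96 * se):.2f} to {math.exp(log_hr + 1.96 * se):.2f}")
```

The between-imputation term is what widens the standard error relative to a single imputed dataset, reflecting uncertainty about the missing values.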
We took the regression coefficient (that is, the log of the hazard ratio) for each variable from the final model using multiply imputed data and used these as weights for the QFractureScores. As in previous studies,10 36 we combined these weights with the baseline survivor function for diagnosis of fracture or hip fracture obtained from the Cox model evaluated at 10 years and centred on the means of continuous risk factors to derive a risk equation for 10 years’ follow-up.
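The risk equation described above has the usual Cox form: 10 year risk = 1 − S0(10)^exp(Σ βi(xi − x̄i)), where S0(10) is the baseline survivor function at 10 years, βi are the log hazard ratios, and only continuous terms are centred. The following is a minimal Python sketch. The baseline survival of 0.99 and the single binary term with hazard ratio 2.66 are hypothetical illustrations and are not the published QFractureScore values.

```python
import math
from typing import Dict


def ten_year_risk(baseline_survival_10y: float,
                  coefficients: Dict[str, float],
                  values: Dict[str, float],
                  means: Dict[str, float]) -> float:
    """Risk = 1 - S0(10) ** exp(sum_i beta_i * (x_i - mean_i)).

    `means` holds the centring values for continuous terms; variables absent from it
    (for example binary indicators) are left uncentred.
    """
    linear_predictor = sum(beta * (values[name] - means.get(name, 0.0))
                           for name, beta in coefficients.items())
    return 1.0 - baseline_survival_10y ** math.exp(linear_predictor)


# Hypothetical model: one binary term with hazard ratio 2.66
coefs = {"falls": math.log(2.66)}
risk_without = ten_year_risk(0.99, coefs, {"falls": 0.0}, {})
risk_with = ten_year_risk(0.99, coefs, {"falls": 1.0}, {})
```

For someone at the centring values with no binary risk factors, the linear predictor is 0 and the risk is simply 1 − S0(10), here 1%. Adding the hypothetical risk factor raises it to about 2.6%.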
In women we determined the hazard ratios for osteoporotic fracture overall and for hip fracture by type of HRT used at baseline, categorised by regimen (unopposed, cyclical, or continuous), oestrogen dose (high v low), and type of oestrogen (equine v non-equine). These results were incorporated in the QFractureScores for women.
In a separate analysis, we used a time varying Cox regression analysis to examine the effects of duration of use of HRT and time since stopping HRT on risk of fracture in women, treating these terms as time varying covariates. For duration of use of HRT we analysed non-users and new users to determine the hazard ratio of each fracture outcome within one year, two to four years, five to nine years, and 10 years or more of taking HRT compared with no HRT use. We also determined change in risks after stopping HRT categorised as within a year of stopping, one to two years, two to five years, and five or more years. The date of stopping HRT was taken to be 270 days after the date of the last recorded prescription.
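One common way to fit time since stopping HRT as a time varying covariate is to split each woman’s follow-up into episodes at the category boundaries and then fit a Cox model on the resulting start-stop records (similar in spirit to Stata’s `stsplit`). The Python sketch below is illustrative only and is not the authors’ code. The stopping date (last prescription plus 270 days) and the categories come from the text. Treating a year as 365.25 days, labelling all time before the stop date as a single episode, and the exact boundary handling are assumptions.

```python
from datetime import date, timedelta
from typing import List, Tuple

Episode = Tuple[date, date, str]  # (start, end, exposure label); end is exclusive


def hrt_stop_date(last_prescription: date) -> date:
    """Stopping date: 270 days after the last recorded HRT prescription."""
    return last_prescription + timedelta(days=270)


def split_follow_up(entry: date, exit_: date, stop: date) -> List[Episode]:
    """Split follow-up into episodes by time since stopping HRT.

    Cut points fall at 0, 1, 2 and 5 years after the stop date (a year taken as
    365.25 days, an assumption). Each episode carries the category in force during it.
    """
    cut_points = [stop + timedelta(days=round(365.25 * y)) for y in (0, 1, 2, 5)]
    labels = ["before stop date", "<1 year", "1-2 years", "2-5 years", "5+ years"]
    edges = [date.min] + cut_points + [date.max]
    episodes = []
    for lo, hi, label in zip(edges[:-1], edges[1:], labels):
        start, end = max(lo, entry), min(hi, exit_)
        if start < end:  # keep only intervals that overlap follow-up
            episodes.append((start, end, label))
    return episodes


# Hypothetical woman: last HRT prescription 6 April 2001, followed up 2000 to mid 2004
stop = hrt_stop_date(date(2001, 4, 6))
episodes = split_follow_up(date(2000, 1, 1), date(2004, 6, 30), stop)
```

In a start-stop data layout, any fracture would be attached to the final episode, and each episode contributes time at risk under its own category.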
Validation of the QFractureScores
We tested the performances of the final models (QFractureScores) in the validation dataset. We calculated the 10 year estimated risk of sustaining a fracture or hip fracture for each patient in the validation dataset using multiple imputation to replace missing values for alcohol, smoking status, and BMI, as in the derivation dataset.
We calculated the mean predicted fracture risk and the observed fracture risk at 10 years36 and compared these by 10th of predicted risk. The observed risk at 10 years was obtained by using the 10 year Kaplan-Meier estimate. We calculated the D statistic (a measure of discrimination where higher values indicate better discrimination)37 and an R2 statistic (which is a measure of explained variation for survival data, where higher values indicate more variation is explained).38 We also calculated the area under the receiver operating characteristics (ROC) curve at 10 years, where higher values indicate better discrimination.
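The calibration check described above compares, within each tenth of predicted risk, the mean predicted 10 year risk with the observed risk from a Kaplan-Meier estimate. The following is a self-contained Python sketch of that comparison. Times are in years, events are coded 1 for fracture and 0 for censored, and ties in predicted risk are split by sort order. These are all assumptions; the authors used Stata.

```python
import math
from typing import List, Sequence, Tuple


def km_risk(times: Sequence[float], events: Sequence[int], horizon: float = 10.0) -> float:
    """Observed risk at `horizon` as 1 minus the Kaplan-Meier survival estimate."""
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data) and data[i][0] <= horizon:
        t, n_events, n_leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # handle tied times together
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
        at_risk -= n_leaving
    return 1.0 - surv


def calibration_by_tenth(pred: Sequence[float], times: Sequence[float],
                         events: Sequence[int], horizon: float = 10.0
                         ) -> List[Tuple[int, float, float, float]]:
    """(tenth, mean predicted risk, observed KM risk, predicted/observed ratio) per tenth."""
    n = len(pred)
    if n < 10:
        raise ValueError("need at least 10 patients")
    order = sorted(range(n), key=lambda k: pred[k])
    rows = []
    for g in range(10):
        idx = order[g * n // 10:(g + 1) * n // 10]
        mean_pred = sum(pred[k] for k in idx) / len(idx)
        observed = km_risk([times[k] for k in idx], [events[k] for k in idx], horizon)
        ratio = mean_pred / observed if observed > 0 else math.nan
        rows.append((g + 1, mean_pred, observed, ratio))
    return rows
```

A ratio close to 1 in every tenth indicates good calibration. Using Kaplan-Meier rather than the crude proportion with a fracture allows patients censored before 10 years to contribute follow-up.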
Validation against FRAX (fracture risk assessment)
We compared the performance of the QFractureScore in predicting risk of hip fracture with the performance of the FRAX algorithm using the above validation statistics. FRAX is a relatively new algorithm that predicts 10 year absolute risk of hip fracture and osteoporotic fracture.7 It is not currently in widespread use in primary care in the United Kingdom. We used the version that does not incorporate bone mineral density. This version of FRAX is based on the following variables:
Age
Sex
Height and weight
Previous fracture
Parental history of hip fracture
Current smoking
Use of glucocorticoids
Rheumatoid arthritis
Secondary osteoporosis
Use of alcohol (>3 units/day).
We restricted our comparative analysis to the hip fracture outcome as this is directly comparable between both scores, whereas the FRAX fracture outcome also includes humerus fractures. We used the UK version of the score from the FRAX website (www.shef.ac.uk/FRAX/index.htm) to calculate the 10 year predicted risk of hip fracture for all patients aged 40-85 in the validation dataset, based on relevant input variables of age, sex, height, weight, parent had fractured hip (yes/no), current smoking (yes/no), glucocorticoids (yes/no), rheumatoid arthritis (yes/no), secondary osteoporosis (yes/no), and alcohol >3 units/day (yes/no). Secondary osteoporosis was defined as having liver disease, malabsorption, or endocrine disorders. In all cases previous fracture was counted as negative as we restricted our cohort to patients without a previous fracture. We used the same multiply imputed data that replaced missing values for alcohol use, smoking status, and BMI to calculate the FRAX scores as we used in the validation for the QFractureScores. As with the QFractureScores we assumed that if there was no recorded value of a diagnosis, prescription, or family history then the patient did not have that exposure. We entered the variables for each patient twice, in random order, using automated software to test the reproducibility of the scores generated by the FRAX website.
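The mapping from study variables to FRAX inputs described above can be written down explicitly. The Python sketch below only assembles the inputs and does not calculate a FRAX score; the scores came from the FRAX website. The record layout and field names are hypothetical. Treating the study’s parental history variable as “parent fractured hip” is an assumption, and how the recorded alcohol categories were converted to the >3 units/day flag is not stated, so that flag is taken as already derived.

```python
from dataclasses import dataclass


@dataclass
class ValidationPatient:
    """Hypothetical record layout; field names are illustrative only."""
    age: int
    sex: str
    height_cm: float
    weight_kg: float
    parental_hip_fracture: bool
    current_smoker: bool
    corticosteroids: bool
    rheumatoid_arthritis: bool
    liver_disease: bool
    malabsorption: bool
    endocrine_disorder: bool
    alcohol_over_3_units_per_day: bool


def frax_inputs(p: ValidationPatient) -> dict:
    """Map one patient to the FRAX (no BMD) input fields described in the text."""
    if not 40 <= p.age <= 85:
        raise ValueError("FRAX scores were calculated only for patients aged 40-85")
    return {
        "age": p.age,
        "sex": p.sex,
        "height_cm": p.height_cm,
        "weight_kg": p.weight_kg,
        "previous_fracture": False,  # cohort excluded anyone with a previous fracture
        "parent_fractured_hip": p.parental_hip_fracture,
        "current_smoking": p.current_smoker,
        "glucocorticoids": p.corticosteroids,
        "rheumatoid_arthritis": p.rheumatoid_arthritis,
        # secondary osteoporosis = liver disease, malabsorption, or endocrine disorder
        "secondary_osteoporosis": p.liver_disease or p.malabsorption or p.endocrine_disorder,
        "alcohol_over_3_units": p.alcohol_over_3_units_per_day,
    }
```

Keeping the mapping in one function makes the definitional choices, such as the composite secondary osteoporosis flag and the forced negative for previous fracture, easy to audit.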
As we used all the available data on the QResearch database, we did not calculate a sample size before the study. All analyses were conducted with Stata (version 10). We chose a significance level of 0.01 (two tailed) as we were considering several variables as potential risk factors in a large dataset and wanted to reduce the risk of having an overly complex model including variables with limited prognostic value.
Description of the derivation and validation dataset
Overall, 535 practices in England and Wales met our inclusion criteria, of which 357 were randomly assigned to the derivation dataset and 178 to the validation dataset.
In the derivation cohort there were 1 204 222 women (1 187 354 men) aged 30-85 at baseline, of whom 20 559 (13 122) had a recorded fracture before the start of the study and were therefore excluded, leaving 1 183 663 (1 174 232) free of fracture at baseline for analysis.
In the validation cohort there were 653 789 women (640 943 men) aged 30-85 at baseline, of whom 11 636 (7179) had a fracture before the start of the study and were therefore excluded, leaving 642 153 (633 764) free of fracture at baseline for analysis.
Table 1 compares the key characteristics of eligible patients in each cohort. Although the validation cohort was drawn from an independent group of practices, its baseline characteristics were similar to those of the derivation cohort across all measures in both men and women.
Table 2 shows the incidence rates in each cohort. During the 7 898 208 person years of follow-up for women in the derivation cohort, 24 350 fractures were recorded (hip, vertebral, or distal radius), giving an overall incidence rate of 3.08 per 1000 person years (95% confidence interval 3.04 to 3.12). For men, there were 7934 incident fractures arising from 8 049 306 person years, giving an incidence rate of 0.99 per 1000 person years (0.96 to 1.01). In women, 38.2% of the fractures were hip fractures (9302 of 24 350); in men the corresponding figure was 68.4% (5424 of 7934). Similar incidence rates were found in the validation cohort (table 2). Incidence rates were higher in women than in men and rose steeply with age. In the derivation cohort, the highest fracture rates were observed among those aged 75 and over at baseline: the incidence rate was 12.11 per 1000 person years (11.84 to 12.38) in women and 4.35 per 1000 person years (4.15 to 4.57) in men. The corresponding figures for incidence of hip fracture among those aged 75 and over were 7.19 per 1000 person years (6.99 to 7.40) in women and 3.13 per 1000 person years (2.95 to 3.31) in men.
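The text does not say how the confidence intervals for these rates were calculated. Assuming the number of fractures follows a Poisson distribution and using a normal approximation on the log scale (an assumption), the following short Python sketch reproduces the reported derivation cohort figures to two decimal places.

```python
import math


def incidence_rate(events: int, person_years: float, per: float = 1000.0, z: float = 1.96):
    """Rate per `per` person years with a log-scale Poisson-approximation CI."""
    rate = events / person_years * per
    half_width = z / math.sqrt(events)  # approximate SE of log(rate) is 1/sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


women = incidence_rate(24350, 7898208)  # all osteoporotic fractures, derivation cohort
men = incidence_rate(7934, 8049306)
```

Because the standard error depends only on the number of events, the interval for men (7934 fractures) is proportionately wider than that for women, even though person years are similar.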
Tables 3 and 4 show the characteristics of patients with and without BMI, smoking status, and alcohol status recorded. Table 4 also shows the characteristics of patients with complete data for all three variables. There were differences in observed characteristics between those with and without missing data, indicating that the data were not missing completely at random. This is consistent with the missing at random assumption that underpins multiple imputation (although observed data alone cannot confirm that assumption), and supports the use of multiple imputation rather than a complete case analysis.
Table 5 shows the results of the final multivariate Cox regression analysis for fracture and hip fracture in men, based on both a complete case analysis and multiply imputed data. Table 6 shows the results for women. There was no evidence that the proportional hazards assumption was violated in any of the models presented.
Risk factors for fracture in men
After adjustment for all other variables in the model, we found significant associations with overall risk of fracture and risk of hip fracture in men for the following variables, which were therefore included in both final algorithms for men: age, BMI, smoking status, alcohol use, rheumatoid arthritis, cardiovascular disease, type 2 diabetes, asthma, use of tricyclic antidepressants, history of falls, and liver disease. For consistency, we also included current use of corticosteroids in the final models for osteoporotic fracture and hip fracture, although it tended towards significance only for hip fracture (P=0.067), which might reflect the lower numbers of patients with a hip fracture. Variables that were not significant on multivariate analyses in men (and that were not therefore included in the final algorithms) included deprivation, gastrointestinal malabsorption, other endocrine conditions, and parental history of osteoporosis. There were also no significant interactions.
Table 5 shows the adjusted hazard ratios for the variables included in both final algorithms for men based on the multiply imputed data; fractional polynomial terms for age and BMI were also included in the algorithms. Patients with liver disease had a 196% increased risk of hip fracture after adjustment for all other variables. Similarly, patients with a history of falls had a 166% increased risk of hip fracture. Heavy smokers had a 70% increased risk of hip fracture compared with non-smokers; patients with very heavy alcohol intake had a 70% increased risk compared with non-drinkers. Compared with patients without each disease, patients with rheumatoid arthritis had an 81% increased risk of hip fracture; those with cardiovascular disease had a 24% increased risk; those with type 2 diabetes had a 38% increased risk; and those with asthma had a 31% increased risk. Patients taking tricyclic antidepressants had a 67% increased risk of hip fracture, and those prescribed steroids had a 22% increased risk.
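The percentages above are hazard ratios restated as percentage changes: a 196% increase corresponds to a hazard ratio of 2.96, and a 25% decrease to 0.75. Their logarithms are the weights used in the risk equation. The small Python helper below makes the conversion explicit. Because the percentages are rounded, the derived weight is only approximate.

```python
import math


def percent_change(hazard_ratio: float) -> float:
    """Hazard ratio -> percentage change in risk (positive = increase)."""
    return (hazard_ratio - 1.0) * 100.0


def hazard_ratio(percent: float) -> float:
    """Percentage change in risk -> hazard ratio."""
    return 1.0 + percent / 100.0


# Liver disease and hip fracture in men: a 196% increase is HR 2.96,
# whose log (about 1.085) is the corresponding model coefficient.
hr_liver = hazard_ratio(196)
weight_liver = math.log(hr_liver)
```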
Figure 1 shows the estimated adjusted hazard ratios with the fractional polynomial terms for age and BMI in men. There were two age terms for the osteoporotic fracture outcome, $(\text{age}/10)$ and $(\text{age}/10)^{2}$, and one term for BMI, $(\text{BMI}/10)^{-2}$. There was a single age term for hip fracture, $(\text{age}/10)^{2}$, with two terms for BMI: $\log(\text{BMI}/10)$ and $[\log(\text{BMI}/10)]^{2}$.
Risk factors for fracture in women
After adjustment for all other variables in the model, we found significant associations with overall fracture risk in women for the following variables: use of HRT, smoking status, use of alcohol, parental history of osteoporosis, rheumatoid arthritis, cardiovascular disease, type 2 diabetes, asthma, tricyclic antidepressants, use of corticosteroids, history of falls, menopausal symptoms, chronic liver disease, gastrointestinal malabsorption, and other endocrine disorders (table 6). Age and BMI, modelled with fractional polynomial terms, were also significantly associated with risk. The final algorithm for osteoporotic fracture in women included all of these variables. Table 6 shows the adjusted hazard ratios.
Some variables were significantly associated with overall risk of fracture but not with risk of hip fracture at the 0.01 level. These were use of HRT, menopausal symptoms, parental history of osteoporosis, malabsorption, and other endocrine disorders. The magnitude and direction of the coefficients were similar to those for overall risk of fracture so they were included in the final hip fracture model for consistency. The final algorithm for risk of hip fracture in women included age, BMI, smoking status, alcohol use, use of HRT, parental history of osteoporosis, rheumatoid arthritis, cardiovascular disease, type 2 diabetes, asthma, current use of tricyclic antidepressants, current use of corticosteroids, history of falls, menopausal symptoms, liver disease, gastrointestinal malabsorption, and other endocrine disorders. Table 6 shows the adjusted hazard ratios for variables included in both final algorithms for women based on the multiply imputed data.
Figure 1 shows the estimated adjusted hazard ratios with the fractional polynomial terms for age and BMI in women. There were two terms for age for the fracture outcome: $(\text{age}/10)^{-1}$ and $(\text{age}/10)^{-1}\log(\text{age}/10)$. There was one fractional polynomial term for BMI, $(\text{BMI}/10)^{-1}$. The terms selected for inclusion in the hip fracture model in women were $(\text{age}/10)^{3}$ and $(\text{age}/10)^{3}\log(\text{age}/10)$, and one term for BMI, $(\text{BMI}/10)^{-2}$.
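The age and BMI terms listed above follow the usual fractional polynomial convention. The variable is scaled (here divided by 10), a power of 0 denotes log, and a repeated power p adds a second term z^p log z. For example, powers (−1, −1) give z^−1 and z^−1 log z, and (0, 0) gives log z and (log z)². The Python sketch below generates such terms. The function name and interface are hypothetical, and centring of the terms, which the text says was applied to continuous variables, is left out.

```python
import math
from typing import List, Sequence


def fp_terms(x: float, powers: Sequence[float], scale: float = 10.0) -> List[float]:
    """Fractional polynomial terms for z = x/scale.

    Power 0 means log(z); when a power repeats, the new term is the previous
    term multiplied by log(z), giving z**p * log(z) (or (log z)**2 for p = 0).
    """
    z = x / scale
    terms: List[float] = []
    prev = None
    for p in powers:
        if p == prev:
            term = terms[-1] * math.log(z)  # repeated power
        else:
            term = math.log(z) if p == 0 else z ** p
        terms.append(term)
        prev = p
    return terms


women_fracture_age = fp_terms(70, (-1, -1))  # (age/10)^-1, (age/10)^-1 log(age/10)
men_hip_bmi = fp_terms(25, (0, 0))           # log(BMI/10), [log(BMI/10)]^2
men_fracture_age = fp_terms(65, (1, 2))      # (age/10), (age/10)^2
```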
There were significant interactions between age and BMI and between age and parental history of osteoporosis for overall fracture in women. The more complex model including the age interaction terms, however, did not improve the model fit statistics and resulted in similar predicted scores compared with the simpler model so we selected the more parsimonious version as our final model.
Effect of hormone replacement therapy on fracture risk
Overall, 168 536 women (14.24% of 1 183 663) were prescribed HRT at baseline. Of these, 16 425 (9.75%) were prescribed low dose unopposed equine oestrogen; 30 598 (18.16%) low dose unopposed non-equine oestrogen; 7474 (4.43%) high dose unopposed equine oestrogen; 7205 (4.28%) high dose unopposed non-equine oestrogen; 23 430 (13.90%) cyclical low dose equine; 28 427 (16.87%) cyclical low dose non-equine; 2342 (1.39%) cyclical high dose equine; 16 753 (9.94%) cyclical high dose non-equine; 6765 (4.01%) continuous low dose equine; 11 629 (6.90%) continuous low dose non-equine; 14 186 (8.42%) continuous high dose non-equine; and 3302 (1.96%) tibolone.
We found significant associations between risk of fracture and some types of HRT (table 6). Women prescribed unopposed oestrogen HRT had a reduced risk of osteoporotic fracture. There was a 25% decrease (8% to 38%) in risk of fracture with high dose unopposed equine oestrogen, and a 19% decreased risk (5% to 30%) for cyclical high dose non-equine, both of which were significant at the 0.01 level. There was a borderline significant decrease in risk (0.01<P<0.05) with low dose unopposed equine oestrogen, low dose unopposed non-equine oestrogen, high dose unopposed non-equine oestrogen, and continuous high dose non-equine. There was a 24% increase in risk with continuous low dose equine HRT (6% to 45%). The direction and the magnitude of the risks associated with HRT use for the hip fracture outcome were similar, although they were not significant at the 0.01 level. Low dose unopposed equine oestrogen, however, had a borderline association with a 20% decrease in risk (2% to 34%, P=0.026).
Table 7 shows the results of the time varying analyses for duration of use of HRT and time since stopping HRT. There was a trend with increasing duration of use of HRT: no significant effect of HRT on overall risk of fracture in the first year of use, a significant 36% decrease in risk (8% to 40%) for one to two years of use, and a 45% decrease in risk (16% to 64%) among women with 10 or more years of use. There was no significant increase or decrease in risk within the first year after stopping HRT, although there was a 23% increase in risk (8% to 42%) one to two years after stopping and a 23% increase (12% to 34%) two to five years after stopping, compared with women who had not been prescribed HRT. The pattern in the time varying analysis for the hip fracture outcome with duration of use of HRT was similar: the decrease ranged from a 58% reduction in risk after one to two years of use to a 31% reduction after two to five years. For hip fracture the confidence intervals were wider because of the smaller number of incident cases, and there was no association with time since stopping HRT.
Validation of the QFractureScores
Table 8 shows the discrimination statistics for the QFractureScores for men and women for both fracture outcomes. The hip fracture algorithm had the best performance among both men and women. It explained 63.94% (62.12% to 65.76%) of the variation in women and 63.19% (60.81% to 65.57%) of the variation in men. The D statistic values were high for women (2.73, 2.62 to 2.83) and men (2.68, 2.55 to 2.82) and were over twice the magnitude of the corresponding D statistic results for osteoporotic fracture in men and women. The ROC values for hip fracture were high with values of 0.89 for women and 0.86 for men compared with 0.79 and 0.69, respectively, for the overall fracture outcome.
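The reported D and R² values are linked. Assuming the cited statistics are Royston and Sauerbrei’s D and Royston’s $R^2_D$ for proportional hazards models, $R^2_D = \frac{D^2/\kappa^2}{\pi^2/6 + D^2/\kappa^2}$ with $\kappa = \sqrt{8/\pi}$. Converting the reported D values in this way reproduces the reported percentages to within about 0.1 percentage point; for example, D = 2.73 gives about 64.0%, compared with the reported 63.94%. The Python sketch below shows the conversion.

```python
import math

KAPPA_SQ = 8.0 / math.pi          # kappa = sqrt(8/pi), scaling used in the D statistic
SIGMA_SQ_PH = math.pi ** 2 / 6.0  # error variance assumed for proportional hazards models


def r2_from_d(d: float) -> float:
    """Explained variation R2_D implied by a D statistic."""
    signal = d ** 2 / KAPPA_SQ
    return signal / (SIGMA_SQ_PH + signal)


for label, d in (("QFracture hip, women", 2.73), ("QFracture hip, men", 2.68),
                 ("FRAX hip, women", 2.26)):
    print(f"{label}: D = {d:.2f} -> R2_D = {100 * r2_from_d(d):.1f}%")
```

Because R² is a monotonic function of D, the two statistics always rank models in the same order. They differ only in scale.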
Table 9 compares the mean predicted scores applying the QFractureScores with the observed risks at 10 years within each 10th of predicted risk to assess the calibration of the model in the validation sample. There was close correspondence between predicted and observed 10 year risks within each model 10th for overall fracture. For example, in the top 10th of risk, the mean predicted 10 year risk of fracture was 12.9% and the observed risk was 13.0%. The ratio of predicted to observed risk in this 10th was 0.99, indicating almost perfect calibration (a ratio of 1 indicates perfect calibration, that is, no underprediction or overprediction). Similar results were obtained for men with a ratio of 1.0 in the top 10th of predicted risk. For hip fracture there was also close correspondence, except for overprediction in the lowest 10th in both sexes and the third lowest in women. There was also 19% overprediction of risk among the top 10th of predicted risk for hip fracture in men, with 4% overprediction for women.
Table 9 also shows the decile cut-offs for men and women for each of the QFractureScores and the number and proportion of incident cases in each 10th. For the osteoporotic fracture outcome, 33.72% of incident cases in women and 31.89% in men fell in the top 10th of predicted risk. For the hip fracture outcome, the corresponding figures were 52.69% in women and 54.20% in men.
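The decile cut-offs and the share of incident cases in the top 10th can be computed as sketched below. The function names and the toy data are hypothetical, chosen only to show the mechanics; boundary handling (a risk equal to a cut-off goes into the higher 10th) is also an assumption.

```python
from bisect import bisect_right
from statistics import quantiles

def tenth_of_risk(risks):
    """Nine cut-offs for tenths of predicted risk, and the tenth (0-9) for each person."""
    cuts = quantiles(risks, n=10)
    return cuts, [bisect_right(cuts, r) for r in risks]

def share_of_cases_in_top_tenth(risks, events):
    """Proportion of all incident cases whose predicted risk falls in the top 10th."""
    _, groups = tenth_of_risk(risks)
    total = sum(events)
    top = sum(e for e, g in zip(events, groups) if g == 9)
    return top / total

# Hypothetical toy data: 100 people with predicted risks 0.01 to 1.00 and 9 cases
risks = [i / 100 for i in range(1, 101)]
events = [1 if i > 95 or i % 20 == 0 else 0 for i in range(1, 101)]
print(f"{100 * share_of_cases_in_top_tenth(risks, events):.1f}% of cases in top 10th")
```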
Validation of the fracture clinical risk score against FRAX
We calculated a hip fracture score using the FRAX algorithm for 454 499 women and 424 336 men aged 40-85 in the validation cohort. The D statistic for hip fracture for the FRAX algorithm was 2.26 (2.21 to 2.30) for women and 2.22 (2.14 to 2.30) for men. The FRAX algorithm explained 54.83% (54.43% to 55.12%) of the variation in women and 54.07% (52.10% to 53.65%) in men. The ROC values for the FRAX algorithm were 0.845 for women and 0.817 for men.
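For a binary outcome, the ROC area equals the probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison, assuming fracture within 10 years is treated as a simple yes/no outcome and censoring is ignored; both are simplifying assumptions of this illustration, and the function name is hypothetical.

```python
def roc_auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count one half).
    O(n_cases * n_noncases), which is fine for a small illustration."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    noncases = [s for s, y in zip(scores, outcomes) if not y]
    wins = 0.0
    for c in cases:
        for n in noncases:
            wins += 1.0 if c > n else 0.5 if c == n else 0.0
    return wins / (len(cases) * len(noncases))

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```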
We recalculated the validation statistics for the QFractureScores restricting the population to patients aged 40-85. The D statistic for hip fracture was 2.37 (2.32 to 2.42) for women and 2.39 (2.30 to 2.48) for men. The QFractureScores algorithm explained 57.29% (57.18% to 58.09%) of the variation in women and 57.67% (56.78% to 58.57%) in men.
Table 10 and figure 2 show the calibration statistics for patients aged 40-85. FRAX tended to overpredict the risk of hip fracture within each 10th of risk, as shown by the ratio of predicted to observed risk.
Summary of main findings
A new risk prediction algorithm (the QFractureScore) for estimating the 10 year absolute risk of osteoporotic fracture and hip fracture in men and women shows some evidence of improved discrimination and calibration compared with the FRAX algorithm. We addressed some of the research questions highlighted by NICE guidance,6 the National Osteoporosis Guidelines,5 and the WHO.11 Our algorithm extends the age range and quantifies additional risk factors not fully taken into account in FRAX, such as falls, type 2 diabetes, cardiovascular disease, use of HRT, menopausal symptoms, and use of tricyclic antidepressants. We also validated the QFractureScores algorithms alongside the FRAX algorithm in a UK primary care population. Given that FRAX was developed in multiple selected cohorts from across the world, its marginally poorer performance in this population is not unexpected.
Our new algorithm does not require any laboratory testing or clinical measurements. All the variables used within our algorithm will either be known to an individual patient or are collected as part of routine clinical practice and recorded within an individual patient’s primary healthcare record. It can be implemented within clinical computer systems used in primary care and used to stratify the practice population by risk on a continuing basis without the need for manual data entry. The QFractureScores could therefore act as a basis for a systematic population based programme to identify high risk patients for further assessment and support the implementation of evolving clinical guidelines in the UK.
These algorithms, like those that predict cardiovascular disease,10 36 rely on routinely collected data and have the advantage that they are well calibrated to the setting in which they can be used and have good levels of discrimination. Assuming the effectiveness and cost effectiveness of suitable interventions found in randomised controlled trials39 40 extend to unselected high risk patients from primary care, the QFractureScores could be used at a population level to identify patients at high risk of fracture who might benefit from more detailed assessment regarding potential interventions to reduce their risk. At the level of the individual patient, the algorithm can be used for self assessment in a web based calculator (www.qfracture.org), which is similar to the website for self assessment of cardiovascular disease derived from the same database (www.qrisk.org). It can help inform patients regarding their absolute individual risk so that they can have better information on which to base treatment decisions. Some interventions can prevent osteoporotic fracture in high risk patients. Daily supplementation with vitamin D3 and calcium reduces rates of hip fracture among high risk older patients in institutional care.41 Bisphosphonates reduce hip and other fracture rates in community dwelling older women aged under 80.42 Hip protectors seem to reduce the incidence of hip fractures in institutional care, provided that compliance and adherence are achieved.43
Hormone replacement therapy and fracture risk
Our study also provides some information on risks associated with different types and doses of HRT. We have shown an overall protective effect of HRT, with a decreased risk with unopposed oestrogen. The effect is more marked for vertebral, distal radial, and hip fractures combined than for hip fracture alone, which is probably because of the lower numbers of hip fractures for each individual type of HRT. Our findings are consistent with those from the Women’s Health Initiative study and other studies.44 45 46 The loss of the protective effect of HRT on risk of fracture after stopping treatment is consistent with some46 47 but not other studies.48 Our study is larger than previous studies44 49 and has a longer follow-up.44 It also includes a wider age range than some other studies, in which the analysis was restricted to women 10 years beyond the usual age for the menopause and those who have more risk factors and fewer menopausal symptoms.44 Because our study population is less likely to be biased than that of a clinical trial, the results should generalise well to the general population of women in the UK deciding whether to start or continue HRT.
We validated the QFractureScore in an independent sample of general practices whose data had not been used to develop the algorithm. The QFractureScore has good discrimination (that is, ability to separate those who did and did not subsequently develop a fracture) and explains over 60% of the variation for hip fracture. The D statistic, which is a measure of discrimination appropriate for survival type data, was substantially higher than in our cardiovascular disease algorithm50 and than that reported in some other studies using the D statistic.37 This increases the likelihood that the algorithm will accurately predict risk for an individual patient. The better performance of the hip fracture algorithm compared with the overall fracture algorithm and algorithms for other outcomes probably reflects the strong association between risk of hip fracture and age. It might also reflect stronger associations with other risk factors and more accurate diagnosis of hip fracture, which would reduce misclassification and potential underestimation of associations.
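As a rough illustration of how a D statistic can be computed, the sketch below follows the approach described by Royston and Sauerbrei: each patient's prognostic index is replaced by Blom's approximation to the expected normal order statistics, and D is kappa = sqrt(8/pi) times the Cox regression coefficient on those scores. It is a minimal, dependency-free sketch on hypothetical toy data (a one-covariate Cox model fitted by Newton-Raphson with Breslow handling of ties), not the code used for the QResearch analyses; a real analysis would use an established survival package.

```python
import math
from statistics import NormalDist

KAPPA = math.sqrt(8 / math.pi)

def blom_scores(values):
    """Blom's approximation to expected normal order statistics: Phi^-1((rank - 3/8) / (n + 1/4))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    inv = NormalDist().inv_cdf
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):  # ties broken by input order
        z[i] = inv((rank - 0.375) / (n + 0.25))
    return z

def cox_beta(times, events, x, max_iter=50, tol=1e-10):
    """One-covariate Cox model fitted by Newton-Raphson (Breslow handling of ties)."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue
            # Risk set: everyone still under observation at this event time
            risk_set = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk_set]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk_set))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk_set))
            mean = s1 / s0
            score += x_i - mean               # first derivative of log partial likelihood
            info += s2 / s0 - mean * mean     # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def d_statistic(risk_index, times, events):
    """D = kappa times the Cox coefficient on the normal scores of the risk index."""
    return KAPPA * cox_beta(times, events, blom_scores(risk_index))

# Hypothetical toy data: follow-up time, fracture indicator, predicted risk
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1, 0]
risk = [0.9, 0.7, 0.8, 0.3, 0.6, 0.2, 0.4, 0.1]
print(f"D = {d_statistic(risk, times, events):.2f}")
```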
A potential limitation of our validation might be a degree of overoptimism because, although we used a completely separate set of general practices for the validation, these practices use the same clinical computer system (EMIS) as those used to derive the algorithm. The EMIS system, however, is currently in use in 60% of UK general practices, so the QFractureScore is likely to perform well for at least half of the UK’s population. A more stringent test of performance would involve practices using a different clinical computer system, and such a study using the THIN database is currently under way. Validation of other disease algorithms in the THIN database has shown levels of performance similar to those of the validation undertaken in a one third sample of the QResearch database.16 36
Comparison with other risk prediction algorithms
While there is consensus on the need to develop more accurate estimates based on absolute as well as relative risk, there is no widely used standard method for assessing risk of hip fracture among primary care populations in the UK.51 Unlike FRAX, the QFractureScore can also be used in younger patients (including those aged 30-39) and can be used to estimate risk at one, two, five, and 10 years rather than just 10 years as with FRAX. Our new algorithm includes the traditional variables used in the FRAX algorithm, such as age, smoking, alcohol, rheumatoid arthritis, parental history of hip fracture, and some secondary causes of osteoporosis. In addition, our algorithms include other risk factors as separate variables, including type 2 diabetes, recorded history of falls, cardiovascular disease, liver disease, malabsorption, other endocrine disorders, use of tricyclic antidepressants, and type of HRT and menopausal symptoms in women. By including more detailed variables, we hypothesise that the QFractureScores will be better at estimating risk for the individual patient because they take account of more information about the patient’s history. Our algorithm also differs from FRAX in that the QFractureScores predict risk among patients without a recorded history of previous fracture, whereas FRAX can be used to predict future risk of fracture in those already known to have had a previous fracture.
Our analysis shows some improved discrimination of the QFractureScore compared with the FRAX algorithm for hip fracture, based on the D statistic, which had values that were 0.11 higher in women and 0.17 higher in men. A difference in D statistic of 0.1 or more can indicate an important difference in prognostic separation of survival curves between two risk algorithms.37 It is important to note, however, that the QFractureScore was developed in a primary care population and so would be expected to perform better in this setting.
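The differences quoted above can be checked directly from the D statistics reported for patients aged 40-85. Under Royston and Sauerbrei's interpretation, exp(D) approximates the hazard ratio between the upper and lower halves of the prognostic index, which gives a more intuitive scale for comparing the two algorithms; the 0.1 threshold is the one cited in the text.

```python
import math

# D statistics for hip fracture in patients aged 40-85, as reported in the results
reported_d = {
    "women": {"QFracture": 2.37, "FRAX": 2.26},
    "men": {"QFracture": 2.39, "FRAX": 2.22},
}
THRESHOLD = 0.1  # difference described in the text as potentially important

for sex, vals in reported_d.items():
    diff = vals["QFracture"] - vals["FRAX"]
    # exp(D): approximate hazard ratio between upper and lower halves of the risk index
    hr_q, hr_f = math.exp(vals["QFracture"]), math.exp(vals["FRAX"])
    print(f"{sex}: difference {diff:.2f} (>= {THRESHOLD}: {round(diff, 2) >= THRESHOLD}); "
          f"HR between halves {hr_q:.1f} vs {hr_f:.1f}")
```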
One potential limitation of the QFractureScores compared with FRAX is that they do not include measurement of bone mineral density, whereas FRAX has two versions, one with and one without this measurement. This potentially limits the value of the QFractureScores when bone mineral density is known, but it does mean that the score can be applied without the need for expensive and inconvenient tests to identify high risk patients who might then benefit from further investigation. Another potential limitation of the QFractureScores is that they are more complex than FRAX and might be more difficult to implement. The main use of the QFractureScores, however, is likely to be through integration into general practice clinical computer systems, as well as through a web based calculator, where software can automatically extract the necessary variables, perform the calculations, and present the results to clinicians and individuals as appropriate. Also, a score that includes more variables is likely to better predict risk for an individual, especially if that person has complex comorbidities. Future research should address whether a targeted approach to case finding based on the QFractureScores results in measurable clinical benefit.
Our algorithm also improves on the recent algorithm based on the Women’s Health Initiative cohort25 as it estimates risk over a period longer than five years and includes additional variables, such as HRT, that are known to affect risk of fracture. It also has better validation statistics; our ROC value for hip fracture in women was 0.890 (0.889 to 0.892), which is substantially higher than the value of 0.80 (0.77 to 0.82) reported in the Women’s Health Initiative.25
Generalisability and measurement of outcomes
A particular strength of our study is its prospective cohort design based on the analysis of a large representative population from a validated database. Our main outcome was hip, vertebral, or distal radial fracture recorded by a clinician on the clinical computer system. Studies using similar databases have confirmed the diagnosis of hip fracture recorded on computerised general practice records in over 90% of cases.26 Our rates of hip fracture were similar to those obtained in other general practice databases such as GPRD.52 53 54 Our rates tend to be higher than those in some population based cohorts,25 46 which might reflect under-ascertainment in those cohorts, where events were self reported by questionnaire rather than recorded prospectively in the patients’ medical records, as in our study.
In our study, 38% of osteoporotic fractures were hip fractures, which is similar to the figure reported elsewhere.54 There might be under-ascertainment or under-recording of vertebral fracture because it is not always associated with pain and loss of function, so the patient might not present to the general practitioner. Even when a patient does present, the diagnosis might not be identified and recorded on the computer, which is part of the justification for a targeted approach.55
Our study has good validity as our hazard ratios for risk of hip fracture with use of corticosteroids, alcohol, smoking, diabetes, and presence of a parental history of osteoporosis were similar to those found in other studies.20 22 23 25 45 56 57 In particular, our analysis supports a dose-response relation for current smokers with lower risks among former smokers.19 20 21 We also found no association between hip fracture and deprivation, which confirmed findings reported elsewhere.58
We have not included bone mineral density because it is seldom recorded in general practice records, it is likely to be measured only in a selected high risk population, and it is costly to measure. The QFractureScore could, however, be used to select high risk patients for measurement of bone mineral density as part of their assessment after identification of their high risk status for fracture. Previous fracture was not included because these people have already experienced the outcome of interest, and it could be argued that they are automatically at high risk and that all such patients should be managed as high risk patients in a secondary prevention context.
Sources of bias and unmeasured confounding
As with all epidemiological studies, we need to consider potential sources of bias and confounding. Our predictor variables were recorded by clinicians on the clinical computer system before the diagnosis of fracture and so will not have been subject to recall bias. Some of our predictor variables are objective clinical diagnoses (such as type 2 diabetes, cardiovascular disease, or rheumatoid arthritis); others are directly measured values (such as BMI). Several variables, however, are reported by patients, such as alcohol, smoking, and parental history of hip fracture. As such, these variables might be subject to information or reporting bias (patients, for example, might not accurately report their alcohol intake or use of cigarettes, or might not be asked about or be aware of a relevant family history). As the QFractureScores are intended for use within general practice clinical computer systems, however, similar conditions will apply, and so the variables incorporated in the algorithm have intrinsic face validity. We used the entire eligible population registered with a random two thirds sample of practices contributing to the QResearch database from England and Wales. Consequently, the population is unlikely to be affected by selection bias, in contrast with purpose designed clinical cohorts or clinical trials.46 59
We did not have objective measurements of some other factors that might affect fracture risk, such as physical activity and ethnicity. The former is not reliably recorded on clinical computer systems, and analysis of the latter was limited in this study because of low numbers of elderly patients with fractures from different ethnic groups.
Another potential limitation of our study is that some patients had missing values for alcohol use, BMI, or smoking status. We therefore used the technique of multiple imputation to substitute missing values rather than exclude these patients as this is a less biased approach that makes the most efficient use of available data.32 34 For other variables in the algorithm we assumed that if there was no recorded value of a diagnosis, prescription, or family history then the patient did not have that exposure, which might have led to some misclassification.
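This section does not describe how estimates were combined across the imputed datasets. The standard approach is Rubin's rules, sketched below as an assumption rather than a description of the authors' implementation: the pooled estimate is the mean across imputations, and its variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The input numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine one estimate (e.g. a log hazard ratio) across m imputed datasets."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical log hazard ratios and squared standard errors from 3 imputations
est, var = pool_rubin([1.0, 1.2, 1.4], [0.04, 0.04, 0.04])
print(f"pooled estimate {est:.2f}, standard error {var ** 0.5:.3f}")
```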
Our new risk prediction algorithms for osteoporotic fracture and hip fracture do not require laboratory measurement and can be readily used in primary care or for individual self assessment (www.qfracture.org). The algorithms potentially improve on other algorithms by including additional variables not included in traditional scores. The validation statistics for the hip fracture algorithm suggest that the models are likely to be at least as effective at identifying patients at high risk of hip fracture within primary care as the FRAX algorithm. Further validation studies are needed to test the performance of these algorithms in independent populations that are representative of the setting where the algorithms are likely to be used.
What is already known on this topic
Osteoporotic fracture is a major cause of morbidity, and interventions exist that can help reduce risk of fracture
Several international guidelines suggest a targeted approach for identifying high risk patients likely to benefit from interventions based on a 10 year absolute fracture risk
Risk prediction algorithms tend to perform best when they are developed in the clinical setting in which they will be applied
What this study adds
These new risk prediction algorithms (QFractureScores) for osteoporotic fracture and hip fracture do not require laboratory measurement and so can be used in primary care or for individual self assessment (www.qfracture.org)
The new algorithms include additional variables and were developed in and could be used in large representative primary care populations
The validation statistics, especially for the hip fracture algorithm, suggest that the QFractureScores are likely to be effective at identifying patients at high risk of fracture within primary care in the UK and showed improved performance compared with FRAX
Cite this as: BMJ 2009;339:b4229
We acknowledge the contribution of EMIS and EMIS practices contributing to the QResearch database.
Contributors: JHC designed the study, obtained approvals, reviewed the literature, prepared the data, developed and tested the algorithms, undertook the primary analysis and interpretation, and wrote the first and subsequent draft manuscript. CC contributed to the analysis, development and testing of the algorithms, and interpretation and drafting of the manuscript. JHC is the guarantor.
Funding: This study was funded by David Stables (medical director of EMIS) as part of a larger study examining risks and benefits of HRT.
Competing interests: JHC is codirector of QResearch, a not-for-profit organisation that is a joint partnership between the University of Nottingham and EMIS (leading supplier of IT for 60% of general practices in the UK). EMIS may implement the QFractureScore within its clinical system. JHC is also director of ClinRisk and CC is a consultant statistician for ClinRisk. ClinRisk produces software to ensure the reliable and updatable implementation of clinical risk algorithms within clinical computer systems to help improve patient care. This work and any views expressed within it are solely those of the co-authors and not of any affiliated bodies or organisations.
Ethical approval: The proposal was approved by the QResearch Scientific Board and is therefore approved by the Trent multicentre research ethics committee.
Data sharing: The Read codes used to define the outcomes and further information on multiple imputation are available from the authors. The algorithms have been published as open source software, which is available from www.qfracture.org.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.