Fallacies in Predicting CVD events.
Congratulations to Professor Hippisley-Cox and her team for breaking the shackles of Framingham. My own calculations suggest, however, that there is still more than fine-tuning to be done, and this in turn raises some more basic issues.
Discriminant analysis is an efficient way to predict a dichotomous criterion without requiring transformations. I applied the Nottingham team's predictor variables to a sample from New Zealand and obtained the results below:
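To make the method concrete, here is a minimal sketch of a two-group (Fisher) linear discriminant in Python with NumPy. It is not the SPSS procedure used for the analysis, and the data are synthetic (three hypothetical standard-normal predictors with a small positive group), not the Predict sample. The midpoint cut-off assumes equal prior probabilities for the two groups, which is an assumption of this sketch.

```python
import numpy as np

def fit_fisher_lda(X, y):
    """Two-group Fisher linear discriminant.

    X: (n, k) array of predictors; y: (n,) array of 0/1 labels (1 = CVD event).
    Returns weights w and a cut-off c midway between the projected group means,
    which is the decision rule for equal prior probabilities.
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    S_w = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S_w, m1 - m0)
    c = w @ (m0 + m1) / 2
    return w, c

def sens_spec(X, y, w, c):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    pred = (X @ w > c).astype(int)
    sensitivity = np.mean(pred[y == 1] == 1)
    specificity = np.mean(pred[y == 0] == 0)
    return sensitivity, specificity

# Synthetic illustration only: 3 hypothetical predictors, rare positive group.
rng = np.random.default_rng(0)
X_neg = rng.normal(0.0, 1.0, size=(4000, 3))
X_pos = rng.normal(0.4, 1.0, size=(300, 3))
X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(len(X_neg), dtype=int), np.ones(len(X_pos), dtype=int)])

w, c = fit_fisher_lda(X, y)
sens, spec = sens_spec(X, y, w, c)
print(f"sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Sensitivity and specificity here correspond to the "true positives" and "true negatives" percentages reported in the letter.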
True negatives 65.7%, true positives 66% (a noticeable improvement on Framingham's true positive rate of 54.5%, although Framingham's true negative rate was 75.9%). However, the classification/misclassification ratios yielded true to false negative predictions of 73.6 to 1, but true to false positive predictions of only 1 to 19.6. Note the change of direction: true negatives far outnumber false negatives, whereas false positives far outnumber true positives. Both Framingham and Nottingham predict who will not have a CVD incident with only a small error, but predict who will have a CVD incident with a great deal of error.
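The change of direction follows from how rare CVD events are in the sample. As a rough check (a Python sketch; the function names are hypothetical), the reported true negative rate, true positive rate and 73.6 to 1 ratio together imply a prevalence, which can then be used to predict the positive ratio. The prevalence is back-calculated from the rounded figures above; it is not a reported value.

```python
def implied_prevalence(sensitivity, specificity, tn_per_fn):
    """Back-calculate the case prevalence p implied by a TN:FN ratio.

    TN:FN = specificity * (1 - p) / ((1 - sensitivity) * p), so
    (1 - p) / p = TN:FN * (1 - sensitivity) / specificity.
    """
    odds_negative = tn_per_fn * (1 - sensitivity) / specificity
    return 1 / (1 + odds_negative)

def fp_per_tp(sensitivity, specificity, prevalence):
    """False positives per true positive at a given prevalence."""
    return (1 - specificity) * (1 - prevalence) / (sensitivity * prevalence)

# True positive rate 66%, true negative rate 65.7%, TN:FN = 73.6 to 1.
p = implied_prevalence(0.66, 0.657, 73.6)
print(f"implied prevalence: {p:.4f}")                  # implied prevalence: 0.0256
print(f"FP per TP: {fp_per_tp(0.66, 0.657, p):.1f}")   # FP per TP: 19.8
```

An implied prevalence of about 2.6% reproduces a false to true positive ratio of about 19.8 to 1, close to the 19.6 reported; the small gap is consistent with rounding of the input percentages.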
Other variables can produce similar results. I added diabetes and Maori ethnicity (because of a known ethnic factor here in NZ) as variables and then ran a stepwise discriminant analysis. In the presence of these new variables, both cholesterol measures were eliminated as predictors, as were blood pressure and smoking, yet the model yielded a true positive prediction of 65.4% with a misclassification ratio of 1 to 19.2. This is no real difference from Nottingham. One has to recognise that the canonical correlation here is only 0.141; because the sample is so large, almost any difference from zero is significant, but only 2% of the total variance is accounted for (R squared, or 1 - Wilks' lambda, = 0.02). Putting it another way, as there were only two categories to be predicted (has or has not had a CVD incident), the baseline probability is not zero but 0.5, and the model still has a true positive misclassification ratio of 1 to 19.2. Thus the advantage of using Framingham, Nottingham or a discriminant function is only around 15 percentage points above chance, with a misclassification rate similar to chance.
Furthermore, when investigators use ROC curves to confirm their findings, they seem to be unaware that the predictive figure given is a weighted average of the success with both true negatives and true positives. This is acceptable if the numbers or percentages are the same for both groups, but when the incidence of CVD events in the population is so small, the ROC results often do not represent the results for true positives at all.
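The figures above can be checked with a few lines of arithmetic. The Python sketch below (function names are hypothetical) uses the fact that a two-group analysis has a single discriminant function, so the squared canonical correlation equals R squared and Wilks' lambda is 1 minus it. The second function illustrates the weighting described above as it applies to overall classification accuracy, a prevalence-weighted average of specificity and sensitivity; the 2.5% prevalence is an illustrative assumption, and the ROC area itself is calculated differently.

```python
def canonical_summary(canonical_r):
    """Two groups give one discriminant function: R^2 = r_c^2, Wilks' lambda = 1 - R^2."""
    r_squared = canonical_r ** 2
    wilks_lambda = 1 - r_squared
    return r_squared, wilks_lambda

def overall_accuracy(sensitivity, specificity, prevalence):
    """Share of all cases classified correctly: a prevalence-weighted average."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

r2, lam = canonical_summary(0.141)
print(f"R squared = {r2:.4f}, Wilks' lambda = {lam:.4f}")  # R squared = 0.0199, Wilks' lambda = 0.9801

# Nottingham-variable rates from the letter; the 2.5% prevalence is an assumption.
acc = overall_accuracy(0.66, 0.657, 0.025)
print(f"overall accuracy = {acc:.3f}")  # overall accuracy = 0.657
```

With so few positives, the combined accuracy figure is almost exactly the true negative rate and says little about how well true positives are identified.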
Now, the elimination of cholesterol, blood pressure and smoking as predictors once diabetes and an ethnic factor are introduced suggests that what small correlation there is arises from the processes of an illness (diabetes) as a specific factor, and from something more holistic associated with ethnic differences, rather than being specifically tied to cholesterol and the like. And this is only suggestive. What all three approaches show is that we really don't have an explanation here for the cause of cardiovascular events. At best, what we use are some preconditions, and fairly loose ones at that.
Given the weakness of the evidence, basing a national prevention scheme on such a flimsy structure seems an unnecessary waste, a source of discomfort for some of our patients and perhaps even a danger for others.
I am most grateful to Professor Rod Jackson and his team for giving me access to their Predict database. The analysis package was SPSS. Author's email: email@example.com.
Competing interests: None declared