# Cox proportional hazards regression

*BMJ* 2013;347:f4919 (Published 09 August 2013), doi: http://dx.doi.org/10.1136/bmj.f4919

Philip Sedgwick, reader in medical statistics and medical education (p.sedgwick{at}sgul.ac.uk)

Researchers measured the effect on one year mortality of secondary drug prevention for patients with stroke in routine primary care. They used a cohort study design with patient data from The Health Improvement Network (THIN) primary care database. Participants were 12 830 patients aged 50 years or more from 113 general practices. They had all had a stroke between 1995 and 2005 and survived the first 30 days after the stroke. Secondary drug prevention was defined as being prescribed either antihypertensives plus lipid lowering drugs plus antithrombotics, or antihypertensives plus lipid lowering drugs.1

Cox proportional hazards regression was used to investigate one year mortality, defined as death from any cause from 31 days after the stroke and within the first year. Univariable and multivariable analyses were performed to examine the association of one year mortality with secondary drug prevention, sex, socioeconomic deprivation, and age group (table). Socioeconomic deprivation was measured by the Townsend score, which assesses socioeconomic deprivation in families and includes measurement of employment status, overcrowding, car ownership, and owner occupation status.

On average, mortality within the first year was 5.7% for patients receiving secondary drug prevention compared with 11.1% for patients not receiving secondary drug prevention. Secondary drug prevention was associated with a 50% reduction in the hazard of death (adjusted hazard ratio 0.50, 95% confidence interval 0.42 to 0.59).

Which of the following statements, if any, are true?

a) The outcome variable for the Cox proportional hazards regression was continuous

b) The hazard ratio predicts the relative proportions of patients who will have died in the categories of each variable at the end of follow-up

c) It was assumed that for each category of the explanatory variables the hazard of death was constant during follow-up

d) It can be concluded that secondary drug prevention was independently associated with one year mortality

## Answers

Statements *a* and *d* are true, whereas *b* and *c* are false.

The aim of the study was to investigate whether secondary drug prevention for patients with stroke in routine primary care affected one year mortality, defined as all cause mortality from 31 days after the stroke and within the first year. A Cox proportional hazards regression model was used.

Cox proportional hazards regression is similar to other regression methods described in previous questions.2 3 4 The method investigates the association between a dependent variable and one or more predictor variables simultaneously. The outcome variable is “time to event data” or “survival data.” Survival data have been described in a previous question and comprise the time it takes each patient to reach an endpoint. In the example above, the outcome was the length of time from 31 days after the stroke until death from any cause, and the length of follow-up was one year. The outcome is continuous (*a* is true), and in that respect Cox proportional hazards regression is similar to simple linear regression and multiple regression. However, the distinguishing feature of survival data is that typically some participants do not experience the endpoint before the end of follow-up. In the example above, not all participants would have died within one year of their stroke. The survival times of these patients would be right censored, as described in a previous question.5 If a patient died within a year of the stroke, his or her survival time would be described as exact.
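As a minimal sketch of how such survival data might be recorded, each patient can be stored as a pair: the follow-up time, and an indicator of whether death was observed (an exact time) or the time was right censored. The record type, the function name `to_survival_record`, and the 365 day follow-up window (time counted in days from the start of follow-up) are illustrative assumptions, not the study's data layout.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: time is counted in days from the start of follow-up,
# and follow-up ends after 365 days (the one year follow-up in the text).
FOLLOW_UP_DAYS = 365.0


@dataclass(frozen=True)
class SurvivalRecord:
    time: float   # days to death (exact) or to censoring
    event: bool   # True = death observed; False = right censored


def to_survival_record(days_to_death: Optional[float]) -> SurvivalRecord:
    """Convert a hypothetical days-to-death value into a survival record.

    `days_to_death` is None when no death was recorded.
    """
    if days_to_death is not None and days_to_death <= FOLLOW_UP_DAYS:
        # Death during follow-up: the survival time is exact
        return SurvivalRecord(time=float(days_to_death), event=True)
    # Alive at the end of follow-up: we only know the survival time
    # exceeds 365 days, so it is right censored at 365
    return SurvivalRecord(time=FOLLOW_UP_DAYS, event=False)


# Hypothetical patients: died on day 120, no death recorded, died on day 400
records = [to_survival_record(d) for d in (120, None, 400)]
```

Censored records contribute to the analysis only for as long as the patient is known to be alive, which is what allows Cox regression to use every participant rather than only those who died.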

The predictor variables in a Cox proportional hazards regression model, sometimes referred to as explanatory variables, can be any mixture of continuous, binary, or categorical variables. In the example above, the explanatory variables were all categorical or binary and included secondary drug prevention, sex, socioeconomic deprivation index, and categorised age. It was assumed that the observations were independent of each other—that is, each participant had only one observation of the dependent and explanatory variables.
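Categorical explanatory variables typically enter a Cox model as a set of 0/1 indicator variables, one for each category other than the reference category. The sketch below assumes Townsend categories coded 1 to 5 with category 1 as the reference; this coding and the helper name `dummy_code` are hypothetical.

```python
def dummy_code(values, levels, reference):
    """Return one 0/1 indicator column per non-reference level.

    A patient in the reference category scores 0 on every indicator, so each
    indicator's coefficient compares its category with the reference category.
    """
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical Townsend categories for four patients, category 1 as reference
columns = dummy_code([1, 3, 5, 1], levels=[1, 2, 3, 4, 5], reference=1)
```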

The results of a Cox proportional hazards regression are presented as hazard ratios. Hazard ratios, sometimes called relative hazards, have been described in a previous question.6 When Cox proportional hazards regression investigates the association between a dependent variable and one predictor variable, it is referred to as univariable; when there are two or more predictor variables, it is referred to as multivariable. The hazard ratios shown in the “univariable models” column are unadjusted: they have not been adjusted for the other explanatory variables. They were obtained from a series of separate regression models, each relating one year mortality to a single risk factor. The hazard ratios shown in the “multivariable model” column resulted from a single Cox proportional hazards regression model, in which each risk factor was adjusted for confounding by the other factors; in effect, each hazard ratio describes the association when all other explanatory variables are held constant.
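To make the univariable case concrete, the sketch below fits a Cox model with a single binary explanatory variable by maximising the Cox partial likelihood with Newton-Raphson. It uses simulated data with a true hazard ratio of 0.5, not the study data; the function names, the exponential survival times, and the assumption of no ties involving death times are all illustrative choices. A multivariable model replaces the single coefficient with a vector of coefficients, one per explanatory variable; in practice established software, such as `coxph` in R's survival package, would be used.

```python
import math
import random


def simulate(n, hazard_control, hazard_ratio, follow_up=1.0, seed=1):
    """Simulate (time, event, x) triples with exponential survival times.

    x = 1 marks a hypothetical treated patient, whose hazard is
    hazard_control * hazard_ratio; everyone is censored at `follow_up`.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)
        rate = hazard_control * (hazard_ratio if x == 1 else 1.0)
        t = rng.expovariate(rate)
        if t < follow_up:
            data.append((t, 1, x))          # death observed: exact time
        else:
            data.append((follow_up, 0, x))  # right censored at end of follow-up
    return data


def fit_cox_one_covariate(data, max_iter=50, tol=1e-10):
    """Maximise the Cox partial log-likelihood for one covariate.

    Assumes no ties involving death times. Returns (beta, standard error
    of beta), where exp(beta) is the estimated hazard ratio.
    """
    # Descending time order: after adding record i, the running sums cover
    # its risk set, i.e. every patient still under follow-up at time t_i
    ordered = sorted(data, key=lambda r: r[0], reverse=True)
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        s0 = s1 = s2 = 0.0
        score = info = 0.0
        for _t, event, x in ordered:
            w = math.exp(beta * x)
            s0 += w
            s1 += x * w
            s2 += x * x * w
            if event:
                mean_x = s1 / s0                # weighted mean of x in the risk set
                score += x - mean_x             # first derivative of log-likelihood
                info += s2 / s0 - mean_x ** 2   # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


data = simulate(4000, hazard_control=0.5, hazard_ratio=0.5)
beta, se = fit_cox_one_covariate(data)
hazard_ratio = math.exp(beta)
```

With this seed the estimated hazard ratio should lie close to the simulated value of 0.5; the standard error of the log hazard ratio is what a 95% confidence interval would be built from.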

For each explanatory variable, whether in the univariable or multivariable analysis, the hazard ratio has a reference category, indicated by the number 1 in the hazard ratio columns (sometimes the reference category is indicated by (1) instead). The other categories of the variable are compared against the reference category to derive the hazard ratio(s). The hazard ratio for a particular category is the hazard of death from any cause during follow-up for that category divided by the hazard of death from any cause for the reference category. The hazard of death is the probability of death in a time interval, given survival to the start of that interval, divided by the length of the interval; it therefore represents the rate of death. As the interval is made very short, the hazard becomes the instantaneous rate of death at a given time during follow-up. Hence, the hazard ratio represents the relative instantaneous risk of death during follow-up. For example, the adjusted hazard ratio for secondary drug prevention relative to no prevention was 0.50. Therefore, at any time during follow-up, patients receiving secondary drug prevention had half the instantaneous risk of death of patients not receiving prevention. The hazard ratio of 0.50 does not predict the relative proportions of patients in the two secondary drug prevention categories who will have died by the end of follow-up (*b* is false); it describes the relative instantaneous risk of death during follow-up.
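The definition of the hazard, and the reason a hazard ratio is not a ratio of proportions dead, can be checked numerically. This sketch assumes a constant (exponential) hazard in each group, with hypothetical hazards of 1.2 and 0.6 deaths per person-year, so the true hazard ratio is 0.5; these numbers are not from the study.

```python
import math


def survival(t, rate):
    """Probability of surviving beyond time t when the hazard is constant."""
    return math.exp(-rate * t)


def interval_hazard(t, dt, rate):
    """P(death in [t, t + dt) | alive at t) divided by dt.

    As dt shrinks this approaches the instantaneous hazard at time t.
    """
    p_death = (survival(t, rate) - survival(t + dt, rate)) / survival(t, rate)
    return p_death / dt


RATE_CONTROL = 1.2   # hypothetical hazard without prevention (per year)
RATE_TREATED = 0.6   # hypothetical hazard with prevention: hazard ratio 0.5

# Ratio of (near-)instantaneous hazards at several times during follow-up
hazard_ratios = [
    interval_hazard(t, 1e-6, RATE_TREATED) / interval_hazard(t, 1e-6, RATE_CONTROL)
    for t in (0.1, 0.5, 0.9)
]

# Ratio of the proportions dead by the end of one year of follow-up
dead_control = 1 - survival(1.0, RATE_CONTROL)
dead_treated = 1 - survival(1.0, RATE_TREATED)
proportion_ratio = dead_treated / dead_control
```

The hazards stay in the ratio 0.5 throughout, yet about 45% versus 70% of patients have died by one year, a ratio of about 0.65. The two summaries agree closely only when deaths are rare over the follow-up period.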

The hazard, or rate, of death for any category need not be constant during the study period (*c* is false). However, when deriving a hazard ratio, it is assumed that the ratio of the hazards for the two categories is constant during follow-up: that is, the hazards are proportional. This proportional hazards assumption must hold for every variable included in a Cox proportional hazards regression model.
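The distinction between a hazard that varies over time and a hazard ratio that does not can be sketched with a Weibull baseline hazard h0(t), an assumption chosen here only because its hazard changes with time (a shape of 0.5 gives a hazard that falls over follow-up). Under proportional hazards each group's hazard is h0(t) × exp(βx). The second, non-proportional example gives the hypothetical treated group a different Weibull shape, which a Cox model with a single hazard ratio would not describe correctly. All parameter values are illustrative.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def ph_hazard(t, x, beta, shape=0.5, scale=2.0):
    """Proportional hazards: baseline hazard multiplied by exp(beta * x)."""
    return weibull_hazard(t, shape, scale) * math.exp(beta * x)


BETA = math.log(0.5)   # hypothetical log hazard ratio for prevention (x = 1)
times = (0.1, 0.5, 1.0)

# Each group's hazard changes over follow-up...
control = [ph_hazard(t, 0, BETA) for t in times]
treated = [ph_hazard(t, 1, BETA) for t in times]
# ...but their ratio is exp(BETA) = 0.5 at every time
ratios = [a / b for a, b in zip(treated, control)]

# Counter-example: treated hazard with a different Weibull shape,
# so the ratio drifts over time and the hazards are NOT proportional
non_ph_ratios = [
    weibull_hazard(t, 1.0, 4.0) / weibull_hazard(t, 0.5, 2.0) for t in times
]
```

In real analyses the assumption is usually checked, for example with log(-log) survival plots or a test based on Schoenfeld residuals such as `cox.zph` in R's survival package.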

For each hazard ratio, the 95% confidence interval for the population hazard ratio is presented, providing an interval estimate for the population parameter. A previous question explained that, if a 95% confidence interval for a population hazard ratio excludes unity, the null hypothesis of no difference in hazard between the categories of the explanatory variable is rejected in favour of the alternative at the 5% level.7 If the association between an explanatory variable and the outcome is significant after adjusting for confounding, the explanatory variable is said to be independently associated with the outcome. The association of secondary drug prevention with one year mortality was statistically significant after adjusting for the other explanatory variables, because the 95% confidence interval for the population hazard ratio did not include unity (hazard ratio 0.50, 0.42 to 0.59). Therefore, secondary drug prevention was considered to be independently associated with one year mortality (*d* is true).
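The link between the confidence interval and the hypothesis test can be sketched by back-calculating an approximate standard error from the published interval. This assumes, as is usual for Cox models, that the interval was computed on the log scale as log(HR) ± 1.96 × SE; because the published figures are rounded, the recovered values are approximate. The function name `wald_from_ci` is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 5% critical value of the standard normal


def wald_from_ci(hr, lower, upper, z_crit=Z_95):
    """Approximate Wald test recovered from a hazard ratio and its 95% CI.

    Assumes the CI is log(hr) +/- z_crit * se on the log scale.
    Returns (se of log hazard ratio, z statistic, two-sided p value).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal
    return se, z, p


se, z, p = wald_from_ci(0.50, 0.42, 0.59)
ci_excludes_unity = not (0.42 <= 1.0 <= 0.59)
```

A z statistic of about -8 lies far beyond ±1.96, which is the same conclusion as noting that the interval 0.42 to 0.59 excludes 1.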

Presenting the unadjusted and adjusted hazard ratios side by side is good practice because it allows the reader to judge the effect of confounding. In the example above, the greatest effect of confounding was seen for sex, where adjustment reversed the direction of the apparent association with one year mortality. Before adjustment, women were at greater risk of death from any cause during follow-up than men (hazard ratio 1.22), whereas after adjustment their risk was lower than that of men (hazard ratio 0.86). Both hazard ratios were statistically significant at the 5% level. Furthermore, after adjustment for confounding, a significant hazard ratio became non-significant for socioeconomic deprivation index 2, whereas a non-significant hazard ratio became significant for socioeconomic deprivation index 5. This suggests confounding in the association between these categories of the explanatory variable and the outcome of one year mortality. However, the difference in size between the unadjusted and adjusted hazard ratios for these categories was small, which suggests that the extent of confounding was minimal.
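One simple way to summarise the effect of adjustment is to compare each unadjusted hazard ratio with its adjusted counterpart. The sketch below applies this to the hazard ratios for women quoted above (1.22 unadjusted, 0.86 adjusted). The function name is hypothetical, and the relative change in the hazard ratio is a common "change in estimate" heuristic, not a method stated by the authors.

```python
import math


def compare_adjustment(unadjusted_hr, adjusted_hr):
    """Summarise how adjustment changed a hazard ratio.

    Returns (reversed_direction, relative_change): whether the log hazard
    ratio changed sign, and (adjusted - unadjusted) / unadjusted.
    """
    reversed_direction = math.log(unadjusted_hr) * math.log(adjusted_hr) < 0
    relative_change = (adjusted_hr - unadjusted_hr) / unadjusted_hr
    return reversed_direction, relative_change


# Hazard ratios for women versus men, as quoted in the text
reversed_for_sex, change_for_sex = compare_adjustment(1.22, 0.86)
```

For sex, the association reverses direction and the hazard ratio falls by about 30% on adjustment.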

The authors also tested for interactions between secondary drug prevention and the other explanatory variables, to investigate whether the association between secondary drug prevention and one year mortality differed between the categories of a risk factor. They found no evidence that the association differed between the sexes or age groups. There was some evidence of effect modification by the fifth index of the Townsend score; however, there was no trend across the socioeconomic deprivation indices.
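An interaction of this kind is usually assessed by adding a product term to the model. As a sketch, with D = 1 for secondary drug prevention, F = 1 for women, and **z** standing for the remaining explanatory variables (this notation is an assumption; the authors' exact model specification is not given here):

```latex
\begin{aligned}
h(t \mid D, F, \mathbf{z}) &= h_0(t)\exp\left(\beta_1 D + \beta_2 F + \beta_3 D F + \boldsymbol{\gamma}^{\top}\mathbf{z}\right)\\
\mathrm{HR}_{\text{prevention}}(\text{men}) &= e^{\beta_1}, \qquad
\mathrm{HR}_{\text{prevention}}(\text{women}) = e^{\beta_1 + \beta_3}
\end{aligned}
```

A test of the null hypothesis β3 = 0 asks whether the hazard ratio for secondary drug prevention differs between the sexes; the same construction, with one product term per non-reference category, applies to age group and the Townsend indices.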

## Footnotes

Competing interests: None declared.