CC BY-NC open access

Rapid response to:


Treatment for acute anterior cruciate ligament tear: five year outcome of randomised trial

BMJ 2013; 346 doi: (Published 24 January 2013) Cite this as: BMJ 2013;346:f232

Rapid Response:

Re: Treatment for acute anterior cruciate ligament tear: five year outcome of randomised trial

I am grateful, of course, that Wagner and Ranstam quote my paper (1) on Rasch models, but I am less than grateful that they imply that the paper supports their view that the Rasch model “is not a tool for assessing validity”. Perhaps I have not made myself clear enough, so, because it is always useful to “be on the same page” in a discussion, I would like to clarify my views.

The Rasch model is an item response theory (IRT) model that makes five assumptions about responses to the items of a summated scale. First, the items are assumed to be unidimensional in the sense that responses to the items depend systematically on a single latent variable and on no other manifest or latent variables. The realizations of the latent variable appear as so-called person parameters in the conditional distribution of the item responses given the latent variable. The second assumption is that item responses are monotonic in the sense that the expected item scores are monotonically non-decreasing functions of the person parameter. The third is that the items are conditionally independent given the latent variable; psychometricians refer to this as an assumption of local independence. The fourth assumption is that the items are conditionally independent of other variables given the latent variable. If this assumption holds, psychometricians say that there is no differential item functioning (DIF). The final assumption, and the one that sets Rasch models apart from other IRT models, is that the total score summarizing the item responses is statistically sufficient for the person parameter.
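
The sufficiency assumption can be made concrete with a small numerical sketch. It assumes dichotomous items under the standard logistic form P(X_i = 1 | theta) = 1/(1 + exp(-a_i(theta - b_i))), where a_i = 1 for every item gives the Rasch model and unequal a_i give a two-parameter logistic (2PL) model. The item difficulties b_i, the 2PL discriminations a_i and the function names are hypothetical choices for illustration only. The sketch enumerates every response pattern, computes the probability of each pattern given its total score at several values of theta, and reports how much those conditional probabilities change as theta varies.

```python
import itertools
import math


def item_prob(theta, b, a=1.0):
    """P(X = 1 | theta) for a logistic item; a = 1 gives the Rasch model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_prob(x, theta, difficulties, discriminations):
    """Joint probability of response pattern x, assuming local independence."""
    p = 1.0
    for xi, b, a in zip(x, difficulties, discriminations):
        pi = item_prob(theta, b, a)
        p *= pi if xi == 1 else 1.0 - pi
    return p


def conditional_given_score(theta, difficulties, discriminations):
    """P(X = x | total score = sum(x), theta) for every response pattern x."""
    patterns = itertools.product((0, 1), repeat=len(difficulties))
    joint = {x: pattern_prob(x, theta, difficulties, discriminations) for x in patterns}
    score_totals = {}
    for x, p in joint.items():
        score_totals[sum(x)] = score_totals.get(sum(x), 0.0) + p
    return {x: p / score_totals[sum(x)] for x, p in joint.items()}


def max_change_across_theta(difficulties, discriminations, thetas=(-2.0, 0.0, 2.0)):
    """Largest change in any conditional pattern probability as theta varies."""
    tables = [conditional_given_score(t, difficulties, discriminations) for t in thetas]
    return max(abs(tab[x] - tables[0][x]) for tab in tables[1:] for x in tables[0])


difficulties = [-1.0, 0.0, 0.5, 1.5]  # hypothetical item difficulties
rasch = [1.0, 1.0, 1.0, 1.0]          # equal discriminations: Rasch model
two_pl = [0.5, 1.0, 1.5, 2.0]         # hypothetical unequal discriminations: 2PL

rasch_change = max_change_across_theta(difficulties, rasch)
two_pl_change = max_change_across_theta(difficulties, two_pl)
print(f"Rasch: {rasch_change:.1e}  2PL: {two_pl_change:.3f}")
```

Under the Rasch model the reported change differs from zero only by floating-point rounding: once the total score is known, the response pattern carries no further information about theta. Under the 2PL model the change is substantial. Enumerating all 2^k patterns is only practical for a handful of items; it is used here because it displays the conditional distribution directly rather than relying on the closed-form result.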

The first four assumptions are often referred to as assumptions of internal construct validity, to set them apart from assumptions of external construct validity and from the many different types of criterion validity (including known-groups validity and clinical validity). These assumptions imply that measurement depends on nothing but the latent variable and is therefore unconfounded. They also imply that applications of the measurements, for instance statistical analyses of the association between the latent variable and other variables, are free of confounding. Paul Rosenbaum (2) refers to the first four assumptions as requirements of criterion-related construct validity because it follows from these assumptions (and in particular from the assumption of monotonicity) that all requirements of criterion validity are satisfied, so that evidence against criterion validity is also evidence against criterion-related construct validity.

The Danish statistician Erling B. Andersen (3) has shown that the Rasch model is the only IRT model that not only satisfies the requirements of criterion-related construct validity but also yields a sufficient score. Because sufficiency is an extremely strong property, providing both superior measurement and better statistical procedures for parameter estimation and tests of fit than other IRT models offer, one is well advised to test whether the items of an existing scale meet the requirements of the Rasch model. One should also attempt to write items for Rasch models when developing a new scale. However, if one is concerned only with validity and not with the other aspects addressed by Rasch models, one should instead attempt to write items for other types of IRT models, or for confirmatory factor analysis models. It is important to stress that claims of validity require solid evidence supporting all the assumptions of criterion-related construct validity, whether one uses Rasch models or other types of models.

Assessments of criterion validity (including known-groups validity and clinical validity) are useful but nowhere near enough to support claims of validity. They are useful because it follows from the assumption of monotonicity that evidence against criterion validity is also evidence against criterion-related construct validity. However, evidence supporting criterion validity is not evidence that the measurements, and therefore also applications of the measurements, are unconfounded. Evidence supporting criterion validity is therefore no excuse for not addressing the issue of construct validity.
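
A minimal Python sketch of why supportive criterion-type evidence does not rule out confounding. It assumes dichotomous Rasch items with hypothetical difficulties, and two hypothetical groups, A and B, in which the last item shows differential item functioning (DIF): it is easier in group B. Within each group the expected total score increases with the latent variable, so the score still tracks the latent variable and criterion-type correlations could look reassuring. Yet persons with the same value of the latent variable have different expected scores depending on group, so a comparison of total scores between the groups is confounded by group membership.

```python
import math


def rasch_prob(theta, b):
    """P(X = 1 | theta) for a dichotomous Rasch item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_total(theta, difficulties):
    """Expected total score at a given value of the latent variable."""
    return sum(rasch_prob(theta, b) for b in difficulties)


# Hypothetical difficulties: the last item is easier in group B than in
# group A, i.e. the item functions differently in the two groups (DIF).
group_a = [-1.0, 0.0, 0.5, 1.5]
group_b = [-1.0, 0.0, 0.5, 0.0]

thetas = [t / 2 for t in range(-8, 9)]  # grid from -4.0 to 4.0
curve_a = [expected_total(t, group_a) for t in thetas]
curve_b = [expected_total(t, group_b) for t in thetas]

# Within each group the expected score rises with theta, so the score is
# still monotonically related to the latent variable.
increasing_a = all(lo < hi for lo, hi in zip(curve_a, curve_a[1:]))
increasing_b = all(lo < hi for lo, hi in zip(curve_b, curve_b[1:]))

# At the same theta, however, the groups have different expected scores,
# so group comparisons of the total score are confounded by the DIF item.
gap_at_zero = expected_total(0.0, group_b) - expected_total(0.0, group_a)
print(increasing_a, increasing_b, round(gap_at_zero, 3))
```

The example deliberately uses expected scores rather than simulated data, so the contrast between "monotone within each group" and "different at the same theta" is exact rather than subject to sampling noise.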

Returning to the Rasch model and the comments by Wagner and Ranstam, it should be clear that items fitting a Rasch model satisfy the requirements of criterion-related construct validity. It is therefore wrong to say that the Rasch model is not a tool for assessing validity. Whether the results of a specific Rasch analysis can be taken as evidence of validity depends on which analyses were actually performed. I am well aware that many Rasch analyses merely consider so-called item-fit statistics, and I agree that such analyses cannot provide evidence of validity. A careful Rasch analysis, on the other hand (see for instance the recent book “Rasch Models in Health” (4)), addresses all the assumptions on which the model is founded and supports claims of validity if the analyses reveal no evidence against unidimensionality, monotonicity, local independence, and no DIF. Finally, should evidence against the Rasch model emerge, a careful item analysis addressing all the assumptions of the Rasch model will make it possible to distinguish between validity problems and problems caused by inept item writing. As for the Rasch analysis described in the paper by Comins et al. (5), I can vouch that I assisted in performing that analysis, that the analysis addressed issues relating both to validity and to the fit of the probability functions of the Rasch model, and that the interpretation of the results of the psychometric assessment is accurate.

The remark in my own paper (1), where I state that it is a common misunderstanding that the main purpose of a Rasch analysis is to check that the total score is a valid measure of the latent trait, was meant for users of Rasch models; it was definitely not meant to suggest that Rasch models cannot be used to assess validity. It was motivated by the fact that I have seen too many papers whose authors, although users of Rasch models, do not appear to understand the notion of sufficiency and therefore do not appreciate that evidence supporting a Rasch model is evidence that goes beyond the common definitions of validity. It is therefore not surprising that people who have never used Rasch models also miss this point.

(1) Kreiner, S. (2007). Validity and objectivity: Reflections on the role and nature of Rasch models. Nordic Psychology, 59, 268-298.

(2) Rosenbaum, P. (1989). Criterion-related construct validity. Psychometrika, 54, 625-633.

(3) Andersen, E.B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, 69-81.

(4) Christensen, K.B., Kreiner, S. & Mesbah, M. (Eds.) (2013). Rasch Models in Health. London: ISTE and John Wiley & Sons, Inc.

(5) Comins, J., Brodersen, J., Krogsgaard, M. & Beyer, N. (2008). Rasch analysis of the Knee injury and Osteoarthritis Outcome Score (KOOS): a statistical re-evaluation. Scand J Med Sci Sports, 18, 336-345.

Competing interests: No competing interests

21 March 2013
Svend Kreiner
Dept. of Biostatistics, Univ. of Copenhagen
Oster Farimagsgade 5, B, DK-1014, Copenhagen K, Denmark