
Rapid response to:

Paper

Childhood predictors of self reported chronic fatigue syndrome/myalgic encephalomyelitis in adults: national birth cohort study

BMJ 2004; 329 doi: https://doi.org/10.1136/bmj.38258.507928.55 (Published 21 October 2004) Cite this as: BMJ 2004;329:941

Rapid Response:

Statistical and theoretical independence, effect modification, father’s health etc.

In the article, the authors report results from a cohort study
exploring risk factors that may play a role in the development of CFS/ME.
The study appears to have problems with the unit of analysis. The authors
initially state that family related factors, operating through genetics or
the environment, have been suggested to influence disease development. In
the analysis, however, they mainly test the association between maternal
health and the risk of self-reported disease. The results of an interview
with both parents at age 10 years are reported almost parenthetically in
the discussion section. The maternal risks are used to test risks reported
earlier in other studies, but this does not fully explain why the reports
from both parents, interviewed at age 10 years, are omitted from the
results section. The weakness of the study as regards its theoretical
assumptions, in particular about behavioural factors, is also revealed in
the way the authors use p-values and significance testing as a selection
procedure for including variables in the multiple logistic regression.
There is no explanation of how maternal health is linked to the
development of disease in children, or of why the father’s health is not
associated with later onset of the disease in adolescent or adult
offspring. The initial statement that the family is an appropriate unit of
analysis is not carried through in the statistical analysis.

The use of p-values means that the authors move from the initially
stated, theory-based hypothesis testing to numerical methods as the basis
for including or excluding risk factors in their regression models.
Bivariate associations are not of interest when performing a multivariable
regression. Methods such as forward or backward selection have been widely
criticised by statisticians, not only on theoretical grounds but also
because of computational problems, in particular flawed standard errors,
which affect confidence intervals. There is also a problem with external
validity when results from selected models are applied to other
populations of relevance for further hypothesis testing (Mantel, 1970;
Derksen and Keselman, 1992; Sun et al, 1996; Sribney).
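
The concern about p-value screening can be illustrated with a small
simulation. This is only a sketch under assumptions of my own, not a
re-analysis of the cohort data: the outcome is continuous rather than
binary, every candidate predictor is pure noise, and the function names
and default values are hypothetical. It counts how many noise predictors
pass a bivariate screen at a chosen significance level.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_sided_p(r, n):
    """Approximate two-sided p-value for r, using Fisher's z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_noise_passing_screen(n_subjects=200, n_predictors=50,
                               alpha=0.05, seed=0):
    """Count pure-noise predictors that pass a bivariate p-value screen.

    All values here are simulated; nothing is taken from the cohort study.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    passed = 0
    for _ in range(n_predictors):
        # Each predictor is generated independently of the outcome.
        predictor = [rng.gauss(0, 1) for _ in range(n_subjects)]
        if two_sided_p(pearson_r(predictor, outcome), n_subjects) < alpha:
            passed += 1
    return passed


if __name__ == "__main__":
    print(count_noise_passing_screen())
```

Under these assumptions roughly a fraction alpha of the unrelated
predictors would be expected to pass the screen by chance alone, and a
model built on the survivors inherits that selection.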

The ground for my critique here is mainly the use of various variables
in isolation in the analysis, solely on the ground that they were
suggested as explanatory in earlier studies, without any real behavioural
theory as a basis for inclusion or exclusion. In addition, the authors
include in the regression models measurements made at various time points
during the subjects’ lives. This poses problems with statistical
independence: the measurements are probably mutually associated, since
behavioural factors often affect the life of an individual over long
periods of time rather than occasionally (the problem of repeated
measurements). The authors do not report testing for interaction or effect
modification, which means that the lack of statistical independence may
bias their final results.

Confidence intervals in diagnostics and performance assessment

Another reflection is also in order here. In the behavioural scales
the authors use 1 SD above or below the sample mean as indicative of
illness, disorders or problems. This means that in many cases values above
or below this cut-off will not differ significantly from the mean at the
95% level. In an earlier issue of the BMJ, Keogh et al discuss the
measurement of outcomes (cardiothoracic surgery) from a professional
perspective. In order to ensure that no “innocent” individual surgeon will
be accused of poor performance, the profession agreed to use 99.99%CI,
that is roughly 4 SD, for analysis of three year aggregated values per
surgeon. This threshold means that “an outlier is likely to be real – there
is less than 1 in 10 000 chance that the society would assert that any
particular surgeon with average case mix did not meet its standard”. The
reason for using 99.99%CI for safety purposes in cardiac surgery is to
take into account the variation due to heterogeneity of individual
patients.
Obviously, another view of safety is applied by paediatricians and
psychiatrists relying on self-reports from patients and relatives, which
also constitute an important source of information in operational
diagnostics. There is an obvious discrepancy between the thresholds used
on scales for diagnostic purposes and those used in the context of
performance assessment. What does this say about how the medical
profession views the safety of patients? A diagnosis of psychiatric or
behavioural disorder may “kill” a person, but not as quickly as surgery.
In contrast to surgery, mortality due to false diagnosis occurs slowly,
over long periods of time, and patients probably show more heterogeneity
in behaviour than in biological variation. The limits of 99.99%CI “become
very wide at lower volumes during shorter period of time than three years,
opening the way to accusations of professional protectionism for surgeons
with lower volumes.” Who protects patients from diagnostic labelling and
unnecessary interventions?
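
The thresholds discussed above can be compared numerically. The sketch
below assumes, purely for illustration, that scores follow a normal
distribution, which neither paper is claimed to state; the function names
are my own. It gives the two-sided multiplier of the SD for a given
confidence level and the share of a normal population lying beyond a given
cut-off.

```python
from statistics import NormalDist

# Standard normal distribution; normality of scores is an assumption here.
std_normal = NormalDist()


def two_sided_z(confidence):
    """SD multiplier for a two-sided interval at the given confidence level."""
    return std_normal.inv_cdf(1 - (1 - confidence) / 2)


def share_beyond(z):
    """Share of a normal population more than z SD from the mean, either side."""
    return 2 * (1 - std_normal.cdf(z))


thresholds = [
    ("1 SD cut-off", 1.0),
    ("95% limits", two_sided_z(0.95)),
    ("99.99% limits", two_sided_z(0.9999)),
]

for label, z in thresholds:
    print(f"{label}: z = {z:.2f}, share outside = {share_beyond(z):.4%}")
```

On these assumptions a 1 SD cut-off places close to a third of a normal
population outside the limits (about one sixth on each side), whereas
99.99% limits lie a little under 4 SD from the mean.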

References
Keogh B, Spiegelhalter D, Bailey A, Roxburgh J, Magee P, Hilton C. The
legacy of Bristol: public disclosure of individual surgeons’ results. BMJ
2004;329:450-4.

Sun GW, Shook TL, Kay GL. Inappropriate use of bivariable analysis to
screen risk factors for use in multivariable analysis. J Clin Epidemiol
1996;49(8):907-16.

Sribney B. What are some of the problems with stepwise regression?
http://www.stata.com/support/faqs/stat/stepwise.html, accessed 7/24/98.

Derksen S, Keselman HJ. Backward, forward and stepwise automated
subset selection algorithms: frequency of obtaining authentic and noise
variables. Br J Math Stat Psychol 1992;45:265-82.

Mantel N. Why stepdown procedures in variable selection? Technometrics
1970;623-5.

Competing interests: None declared

28 October 2004
Grazyna Teresa Adamiak
PhD, MA, MH&W
117 50 STOCKHOLM