It is my experience that Benford’s law is of very limited
use in the detection of fabrication or falsification in medical research data.
It is of unquestioned value in detecting financial fraud, including in health
claims data, but research data are of a different nature. For example, systolic
blood pressures in a very large number of patients in a typical randomised trial
may all have 1 as the first digit. Their cholesterol values may have no first
digits of 1 and almost certainly none of 2. These are clear departures from
Benford’s law but are definitely not examples of fabrication or falsification.
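The blood pressure point can be illustrated with a short simulation. This is only a sketch: the mean of 130 mmHg, the standard deviation of 15 mmHg, the normal shape and the sample size are assumptions chosen for illustration, not figures from either trial.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical trial population: these values are illustrative assumptions.
MEAN_SBP, SD_SBP, N_PATIENTS = 130.0, 15.0, 10_000

def first_digit(value):
    """Leading digit of a positive integer."""
    return int(str(value)[0])

# Simulated systolic readings, rounded to whole mmHg (floored at 1 as a guard).
readings = [max(1, round(random.gauss(MEAN_SBP, SD_SBP))) for _ in range(N_PATIENTS)]

digit_counts = Counter(first_digit(r) for r in readings)
share_of_ones = digit_counts[1] / N_PATIENTS

print(f"share of readings with first digit 1: {share_of_ones:.1%}")
```

Under these assumptions nearly every genuine reading lies between 100 and 199, so the first digit is almost always 1, against the 30.1% that Benford’s law would predict.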
Even if all the variables are taken together, the pattern still does not
conform to Benford’s law. In neither of the trials studied in our paper does
any variable taken singly show even a remote fit to the distribution suggested
by Benford. Taken together, the five variables common to both trials do not
show such a fit either, so lack of fit to Benford’s law is no evidence of
fabrication or falsification in this situation.
For example, the pattern for the five variables considered in the MRC
trial is shown in the table:
First digit   Frequency       %   Benford’s %
1                 1,774   42.34          30.1
2                   131    3.13          17.6
3                     4    0.1           12.5
4                    87    2.08           9.7
5                   333    7.95           7.9
6                   537   12.82           6.7
7                   547   13.05           5.8
8                   431   10.29           5.1
9                   346    8.26           4.6
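The comparison in the table can be reproduced in a few lines. The sketch below takes the observed frequencies from the table, computes the Benford percentages from the standard formula log10(1 + 1/d), and adds a Pearson chi-square statistic as one possible summary of the lack of fit. The chi-square test is an illustrative choice here, not something the analysis in the paper is stated to have used.

```python
import math

# Observed first-digit frequencies, copied from the table (MRC trial, five variables).
observed = {1: 1774, 2: 131, 3: 4, 4: 87, 5: 333, 6: 537, 7: 547, 8: 431, 9: 346}

def benford_probability(digit):
    """Benford's expected share of leading digit `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)

total = sum(observed.values())

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = 0.0
for digit, count in observed.items():
    expected = total * benford_probability(digit)
    chi_square += (count - expected) ** 2 / expected
    print(f"{digit}: observed {100 * count / total:5.2f}%, "
          f"Benford {100 * benford_probability(digit):4.1f}%")

# Nine digit classes give 8 degrees of freedom; the 5% critical value is about 15.51.
print(f"chi-square = {chi_square:.1f} on 8 degrees of freedom")
```

The statistic is far beyond any conventional critical value, which confirms the lack of fit numerically. As argued above, that lack of fit reflects the nature of the variables, not fabrication.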
For Benford’s law to apply, the data must have a range that covers at
least two orders of magnitude. This often holds for financial data but only
rarely, if ever, for medical research data. If a very large number of
variables were taken together then the fit would be rather better, but the
problem is that the selection of variables can affect the distribution of first
digits even when many are chosen. The argument can always be that not enough
variables have been considered for Benford’s law to be applicable.
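The effect of range can be sketched by comparing two simulated data sets. Both are hypothetical: a log-uniform sample spanning three orders of magnitude stands in, in idealised form, for wide-ranging data such as financial amounts, and a uniform sample between 100 and 200 stands in for a narrowly ranged clinical measurement. Neither is drawn from real data.

```python
import math
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable

def first_digit(x):
    """Leading digit of a positive number, read from its scientific notation."""
    return int(f"{x:e}"[0])

def first_digit_shares(values):
    """Share of values starting with each digit 1 to 9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

n = 20_000
wide = [10 ** random.uniform(0, 3) for _ in range(n)]  # 1 to 1000, log-uniform
narrow = [random.uniform(100, 200) for _ in range(n)]  # under one order of magnitude

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
wide_shares = first_digit_shares(wide)
narrow_shares = first_digit_shares(narrow)

# Largest absolute gap between each sample's digit shares and Benford's.
max_gap_wide = max(abs(wide_shares[d] - benford[d]) for d in range(1, 10))
max_gap_narrow = max(abs(narrow_shares[d] - benford[d]) for d in range(1, 10))

print(f"largest gap, wide range:   {max_gap_wide:.3f}")
print(f"largest gap, narrow range: {max_gap_narrow:.3f}")
```

The wide-ranging sample tracks Benford’s proportions closely, while the narrow one cannot, whatever its size.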
Incidentally, as with many such things, Newcomb, not Benford,
was the original discoverer of the law.
Rapid Response:
Response on Benford's Law
Competing interests:
None declared