Confidentiality of personal health information used for research
BMJ 2006;333:196 (Published 20 July 2006) doi: https://doi.org/10.1136/bmj.333.7560.196
Rapid response
It is good to see interest growing in the privacy provided by anonymisation. However, this problem - known to computer scientists as "inference security" - is very much harder than it looks. It has been the subject of study for almost thirty years, initially by people working with census data; most recently it has been in the news after AOL published search data for 658,000 users. Although their names had been removed, many of them could still be identified from their search histories.
Most supposedly "anonymous" medical records in the UK are not anonymous at all, as they contain postcode and date of birth. That combination identifies over 97% of patients - those who escape are, broadly speaking, young identical twins plus some soldiers, students and prisoners. Even postcode plus year of birth will identify the great majority of people. Many records also contain an NHS number; in this context it makes little difference whether the NHS number is encrypted or not. In other words, the difference between "anonymous" and "pseudonymous" data is often nugatory.
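To make the arithmetic concrete, here is a minimal sketch (in Python, using pandas) of the uniqueness check behind that claim: group a record set by its quasi-identifiers and count how many people fall into a group of size one. The column names and the toy data are illustrative assumptions, not any real NHS extract.

    import pandas as pd

    # Toy records; "postcode", "dob" and "nhs_no" are assumed column names.
    records = pd.DataFrame({
        "postcode": ["CB3 0FD", "CB3 0FD", "CB3 0FD", "SW1A 1AA"],
        "dob":      ["1961-02-14", "1990-07-01", "1990-07-01", "1975-11-30"],
        "nhs_no":   ["111", "222", "333", "444"],
    })

    # Size of each (postcode, dob) equivalence class - the "k" of k-anonymity.
    k = records.groupby(["postcode", "dob"])["nhs_no"].transform("size")

    unique_fraction = (k == 1).mean()
    print(f"{unique_fraction:.0%} of records are unique on postcode + dob")

On real UK data that fraction exceeds 97%; the two toy records above that share a postcode and date of birth (the twins, in effect) are the rare exception.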
There are applications where anonymisation can be made to work; a good example was the IMS Health system that became the subject of the Source Informatics case. That system was tightly designed for a single purpose, and its security design was carefully reviewed. In general applications, however, it is very much harder. The anonymity mechanism proposed for the Icelandic health database (which consisted essentially of replacing names with an encrypted social security number) was nowhere near sufficient (declaration of interest - I reviewed the Source Informatics system for the BMA and the Icelandic system for the Icelandic Medical Association).
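To see why replacing a name with an encrypted identifier achieves so little, consider a sketch of the linkage problem. The keyed hash below (HMAC, standing in here for whatever cipher a given scheme actually uses; the key and sample number are invented for illustration) is deterministic, so the same social security or NHS number always yields the same token. Every record of a patient therefore stays linked, and a single record re-identified from postcode and date of birth unmasks the patient's entire linked history.

    import hashlib
    import hmac

    SECRET_KEY = b"held-by-the-database-operator"  # illustrative key only

    def pseudonym(identifier: str) -> str:
        """Deterministic pseudonym: the same input always yields the same token."""
        return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:12]

    print(pseudonym("943 476 5919"))  # a patient's token...
    print(pseudonym("943 476 5919"))  # ...is identical on every one of their records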
The Source Informatics judgment, and data protection law, allow researchers to dispense with consent if records are properly anonymised. However, for most practical research purposes they cannot be. Researchers should either learn enough about inference security to protect privacy adequately in their application or, where that is impossible, obtain consent or go through the Patient Information Advisory Group (PIAG) to invoke section 60 of the Health and Social Care Act 2001. Simply removing names and addresses is almost never going to be enough.
Callous attitudes towards laboratory animals in the 1970s spawned the animal liberation movement, and the cavalier treatment of pathology specimens led to the Alder Hey scandal. British medical researchers should be wary of impatient attitudes towards data protection, which could deal a further blow to the standing and the effectiveness of research.
Links: see here for a description of the Source Informatics system, and our security group blog for recent comments on inference security and health together with links to resources on inference security.
The author advised the BMA on safety and privacy of clinical information systems from 1995-6 and the Icelandic Medical Association in 1998.
Competing interests: No competing interests