Confidentiality of personal health information used for research
BMJ 2006; 333 doi: https://doi.org/10.1136/bmj.333.7560.196 (Published 20 July 2006) Cite this as: BMJ 2006;333:196
Data supplement
Posted as supplied by the authors
Key tips for managing the confidentiality of personal health data within a clinical research project
TIP 1
A research project should clarify the basis on which health record information is being used:
1. as part of clinical audit or health service management;
2. through explicit consent, e.g. as part of a clinical trial;
3. unconsented (e.g. secondary use);
4. to be held as a repository for other researchers to use.
In case 1, identified data may be used within the healthcare team but not disclosed to any external party.
In case 2, identified data may be used, but only disclosed within the terms of the consent that has been granted.
In case 3, the data should ideally be anonymised or pseudonymised as far as is practicable while preserving the integrity and usefulness of the data.
In case 4, there needs to be clear definition of what type of research might or might not be supported, and how access will be controlled, regulated, and reviewed.
There needs to be public information available on the project, its use of data, and how it will be protected, in order to meet the ‘fair processing’ principles of the 1998 Data Protection Act. Depending on the consent process, this needs to be given or made available to all from whom consent is sought, or to those who might be or become data subjects if consent is not specifically sought (so they can at least object to their data being used).
The records of deceased patients are not covered by Data Protection legislation, but good practice would suggest that confidentiality should still be respected for these records. Data should still be held securely, de-identified
TIP 2
All research groups utilising personal or de-identified data ought to establish a formal policy for how such data will be handled during and after the study period. This should in summary specify:
- WHY the data may be accessed/used;
- WHO is permitted to see the information;
- WHAT classes of data may be accessed;
- HOW the data is protected and accessed;
- until WHEN the data may be accessed, and what should happen to the data afterwards.
The policy must clarify how these answers relate to any consents that have or are being obtained, and which aspects should be included in any participant information leaflets etc.
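As an illustration only, the five questions above could be recorded in a machine-readable form so that access requests can be checked against the policy. The sketch below is in Python; the class name `DataHandlingPolicy`, its fields and all the example values are hypothetical and are not part of the original guidance.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataHandlingPolicy:
    """Hypothetical record of a project's WHY / WHO / WHAT / HOW / WHEN answers."""
    why: str          # purpose for which the data may be accessed or used
    who: frozenset    # roles permitted to see the information
    what: frozenset   # classes of data that may be accessed
    how: str          # how the data is protected and accessed
    until: date       # last day on which access is allowed
    afterwards: str   # what should happen to the data after that date

    def permits(self, role, data_class, on_date):
        """Return True only if role, data class and date all fall within the policy."""
        return (
            role in self.who
            and data_class in self.what
            and on_date <= self.until
        )


# Invented example values, for illustration only.
policy = DataHandlingPolicy(
    why="study of asthma outcomes (hypothetical)",
    who=frozenset({"study statistician", "principal investigator"}),
    what=frozenset({"coded diagnoses", "age band"}),
    how="key-coded extract on an access-controlled server",
    until=date(2008, 12, 31),
    afterwards="destroy all copies and the key table",
)
```

A request is then checked with, for example, `policy.permits("study statistician", "age band", date(2007, 1, 1))`. Writing the answers down in one place also makes it easier to compare them with the wording of consent forms and participant information leaflets.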
TIP 3
Before designing the research database and ways in which data might be pseudonymised, research groups need to decide if there will be a requirement for multiple datasets to be linkable:
- from one data source, longitudinally (e.g. to study disease progress or outcomes);
- from multiple sources (e.g. to combine hospital and GP records on each subject);
- within families (e.g. to study inherited conditions).
The research group will need to determine if anonymisation of the clinical data will be satisfactory, and in particular if masking or removal of potentially disclosing data items will reduce the utility of the data too much. If anonymisation is not suitable, or if linkage or re-identification might be required, consider pseudonymising (key-coding) the database instead.
TIP 4
Consider if the data can be divided into personal and contact information and clinical and scientific information. If so, try to partition the data to separate these, with different access privileges, and if possible stored separately.
Similarly, consider if the database can be partitioned to separate more sensitive from less sensitive clinical data.
Consider if key-coding is a useful option for your research.
If so, you will need to decide who should be the key holder (an internal or external party).
Define the situations in which key reversal (re-identification) should take place, and who might govern and audit such decisions, including what happens to the keys and the data after the project has ended.
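A minimal sketch of partitioning and key-coding, assuming a simple in-memory dataset, is given below in Python. The function names `key_code` and `re_identify`, the field names and the approval check are illustrative assumptions; a real project would hold the key table with the designated key holder and apply its own governance and audit procedures.

```python
import secrets


def key_code(records):
    """Split identified records into a key table and a key-coded research table.

    The key table (study code -> personal and contact details) is intended for
    the key holder only; the research rows carry the study code and clinical data.
    """
    key_table = {}
    research_rows = []
    for rec in records:
        # Random code, not derived from any identifier, so it cannot be
        # reversed without the key table.
        code = secrets.token_hex(8)
        key_table[code] = {"name": rec["name"], "address": rec["address"]}
        research_rows.append({"study_code": code, "diagnosis": rec["diagnosis"]})
    return key_table, research_rows


def re_identify(key_table, study_code, approved_by):
    """Key reversal: refuse unless a named approver is supplied (hypothetical rule)."""
    if not approved_by:
        raise PermissionError("key reversal requires a named approver")
    return key_table[study_code]
```

Researchers would work only with `research_rows`; the key table can be stored separately, with different access privileges, and destroyed or archived according to the project's policy once the project has ended.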
TIP 5
An example sensitivity classification might categorise data items on their ability to uniquely identify an individual (i.e. how many persons will share the same value in the database):
5. (most sensitive) subject unique identifiers, core demographic descriptors and identifiers of healthcare professionals, lab numbers etc.
4. descriptions potentially rich in social context, including narratives such as clinic letters and discharge summaries
3. identifying photographs, video, voice recordings, genomic and genetic sequences, and quite disclosing information such as dates (e.g. date of birth)
2. coded clinical data (e.g. diagnoses, observations, symptoms), measurements and test results
1. (least sensitive) masked or aggregated data (e.g. age-band, average body weight during 2005)
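One way such a classification might be applied, sketched here in Python, is to tag each data item with its level and release only items at or below a user's permitted level. The field names, the mapping `SENSITIVITY` and the function `visible_fields` are hypothetical; items that have not been classified are treated as most sensitive, which is an assumption of this sketch rather than part of the classification above.

```python
# Hypothetical mapping of data items to the five sensitivity levels.
SENSITIVITY = {
    "patient_identifier": 5,
    "name": 5,
    "clinic_letter": 4,
    "date_of_birth": 3,
    "diagnosis_code": 2,
    "age_band": 1,
}


def visible_fields(record, max_level):
    """Return only the items whose sensitivity level is at or below max_level.

    Unclassified items default to level 5 so that they are hidden unless
    someone has explicitly classified them.
    """
    return {
        field: value
        for field, value in record.items()
        if SENSITIVITY.get(field, 5) <= max_level
    }
```

For example, a user cleared to level 2 would see coded diagnoses and age bands, but not dates of birth, clinic letters or identifiers.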
TIP 6
If the number of people needing access to the data is large, and some of the data items being stored are quite disclosing, consider if protection can be provided by limiting the access of some users to aggregate result sets instead of raw data, and/or if some data items can be masked on extraction.
Where possible, and provided that costs are not prohibitive, use privacy-enhancing techniques:
- Show only the data that is needed by the user (especially don’t show social identifiers just because they are there)
- Use ‘data masking’ (e.g. show age band, rather than the full date of birth)
- Show aggregate data rather than individual data values
- Data perturbation, where representative data-values are used, maintaining the statistical profile of the data-set, but not the individual record values – needs to be used with care
- Use/avoidance of proxy data – using ‘social deprivation’ indicators rather than providing postcode
- Use of system identifiers (unique values) rather than social identifiers (e.g. name & address) wherever possible
- Encryption of identifiers to maintain ‘uniqueness’ but prevent cross-linking
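Two of these techniques, data masking and the protection of identifiers, can be sketched in Python as follows. The function names are illustrative. Note that the second function uses a keyed hash (HMAC-SHA256) as a stand-in for the 'encryption of identifiers' mentioned above: it keeps values unique within a project, and using a different secret key for each project prevents cross-linking between projects, but it is an assumption of this sketch and not a method prescribed by the guidance.

```python
import hashlib
import hmac
from datetime import date


def age_band(date_of_birth, on_date, width=10):
    """Mask a date of birth by returning an age band such as '40-49'."""
    had_birthday = (on_date.month, on_date.day) >= (date_of_birth.month, date_of_birth.day)
    years = on_date.year - date_of_birth.year - (0 if had_birthday else 1)
    lower = (years // width) * width
    return f"{lower}-{lower + width - 1}"


def protect_identifier(identifier, project_key):
    """Replace an identifier with a keyed hash.

    The same identifier and key always give the same value (uniqueness is
    preserved); a different project key gives an unrelated value.
    """
    return hmac.new(project_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```

The project key must itself be held securely, since anyone who holds it can recompute the protected value for a known identifier.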
TIP 7
Check the employment contract of your research staff. If appropriate clauses are already in place regarding the handling of confidential data, and there are sanctions that can be applied in case of a breach, then this is probably enough.
If not, or if multi-site data sharing is to take place, consider drafting a project letter along the lines of those in Web Extra 5. For multi-site data sharing, the PI at each site should also sign a letter confirming that all relevant staff have signed the appropriate letter.
TIP 8
Log all back-ups and copies taken so that they are properly controlled and protected, as well as being destroyed at the end of the project.
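A register of copies need not be elaborate. The Python sketch below, in which the class `CopyRecord`, its fields and the function `outstanding_copies` are all hypothetical, records each back-up or copy and lists those that have not yet been destroyed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CopyRecord:
    """One entry in a hypothetical register of back-ups and copies."""
    copy_id: str
    location: str                     # where the copy is held
    created: date
    destroyed: Optional[date] = None  # set when the copy is destroyed


def outstanding_copies(register):
    """Return the copies that have not yet been recorded as destroyed."""
    return [entry for entry in register if entry.destroyed is None]
```

At the end of the project, `outstanding_copies` should return an empty list before the data are considered to have been fully disposed of.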
Example of the Icelandic Database
An example of the problems that linking databases can cause is the genetic research databank in Iceland, which was established through the Health Sector Database Act of December 1998 (Gertz 2004). This legislation enabled the establishment of a database linking three databases: the existing medical records and genealogical records of the entire Icelandic population, living and dead, and a tissue bank containing samples from every living Icelander. Informed consent was required for taking the blood samples, but not for the inclusion of the medical and genealogical records. Instead, an opt-out system was established, whereby citizens had to apply and sign a form to remove their medical and family records from the database. All participants in the databases were to remain anonymous, which was to be achieved through one-way encryption. Names and addresses would be omitted from the database entirely and each patient’s identity number would be encrypted. However, marital status, education, profession, municipality of residence and age, as well as specific diseases, would be transferred to the database. Because the Icelandic population is small, with a very limited number of births each year, and because the creation of new jobs for Icelanders was one of the incentives of the Health Sector Database project, the employees of the Health Sector Database would mostly be Icelanders. This considerably increases the probability of an employee recognising individuals from the richness of the data entered into the database. The data in the database were therefore personally identifiable, and accordingly both the terms of the Icelandic Constitution and international treaties regarding the handling of personally identifiable data applied.
Accordingly, on 27 November, 2003, the Icelandic Supreme Court declared the Health Sector Database Act to be unconstitutional due to a breach of the privacy clause in the Icelandic constitution.
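The re-identification risk described above can be made concrete by counting how many records share each combination of the retained attributes; a combination held by only one or two people may identify them even when names and identity numbers are removed. The Python sketch below is illustrative only: the function name `small_groups`, the threshold `k` and the rows are invented and do not come from the Icelandic database.

```python
from collections import Counter


def small_groups(rows, attributes, k):
    """Return each combination of attribute values shared by fewer than k records."""
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Invented rows, for illustration only.
rows = [
    {"municipality": "A", "profession": "teacher", "age": 40},
    {"municipality": "A", "profession": "teacher", "age": 40},
    {"municipality": "B", "profession": "fisher", "age": 63},
]
at_risk = small_groups(rows, ["municipality", "profession", "age"], k=2)
```

Here `at_risk` contains the single combination that only one record holds. In a small population many combinations of this kind can be expected, which is why omitting names and encrypting identity numbers may not be enough on their own.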
Providing access to the courts
A researcher acting as guardian of a dataset was asked to disclose raw data to the court (Inskip 1996). She explained that she was bound by the confidentiality agreements made with the study participants and could not do this. A court order was then issued. The court order was challenged, and it was modified to include various conditions that restricted the disclosure and thus limited the breaches of confidentiality. This, however, increased the workload for the researcher, as every piece of paper had to be anonymised so that the court received only ID numbers. A restriction was placed in the court order such that no names of individuals, organisations or locations identified from the documents should be revealed in court. However, in court, the place of diagnosis of one of the cases was given, with the result that the name of the individual would have been obvious to many. While it appeared that no deliberate attempt at identification had been made, this was in fact a breach of the terms of the court order. In this particular instance it appeared that the researchers took the confidentiality issues more seriously than those in the legal profession. Work with the legal profession and others is necessary to ensure that confidentiality is taken seriously at all levels and that confidentiality agreements with study participants are honoured as far as is reasonably possible within the constraints of the requirements of the court.
Accreditation
While informed consent is arguably best practice, there are scenarios in which it is not possible, such as the secondary use of data where individuals can no longer be contacted to give fresh consent. Another safeguard could be mandatory researcher accreditation to legitimise access to health records and related personal data for an ethically approved research project. For this, national minimum standards would need to be defined and agreed upon; subsequently an accreditation programme would need to be developed.
The methodology could consist of web-based training on confidentiality, which would not be overly time-consuming for researchers, but would ensure a solid basis of knowledge in patient confidentiality issues. For example, the NIH requires those applying for grants to complete an annual web-based training for which a certificate is obtained. This can be found on:
http://137.187.172.160/CBTs/EthicsModules/login.asp
The training covers a wide range of ethics issues, rather than just confidentiality, and it would need adapting to make it appropriate for researchers working in Britain. It does, however, provide a starting point for the idea of web-based training and certification in such issues. Overall responsibility would have to rest with a single body. Ideally, the information about the researcher having achieved accreditation should be stored in a central register, to be administered by the body that issues the accreditation. In the case of serious breach of professional performance this body needs to have the power to penalise researchers. Regarding penalties, a serious breach of confidentiality would result in the researcher losing his/her accreditation and being struck off the register.
Example of staff confidentiality letter
[Headed note paper]
[Date]
Dear xxxxx,
I am writing to inform you of your obligations with respect to the confidential handling of clinical and administrative information originating from any of the clinical sites participating in the ZZZZ project.
I acknowledge that when conducting research, developing software and analysing data you might have to access such confidential data in relation to patients and staff. You must be aware of the importance of observing and protecting patient and staff confidentiality when you are visiting a site or accessing their data by other means, directly or indirectly, even if it appears to be pseudonymous or anonymous.
You must limit access to such information to that strictly necessary to carry out tasks appropriate to the ZZZZ project and keep any such information confidential. When you obtain copies of data for ZZZZ purposes you must only do so within the scope of the project, keeping such data secure and returning or destroying it as soon as possible. You must destructively erase any data held on hard disks as soon as practicable. Paper copies of reports and test printouts must be destroyed as soon as possible, preferably by use of a shredder.
You must be aware of the importance of respecting the confidentiality of personal health data and aware that summary dismissal is the likely consequence of failing to do so.
(Signed by head of department)
-------------------------------------------------------------------------------------------------------------
I confirm that I have read this letter, of which I have retained a copy, that I understand my obligations, that I agree to meet them and understand the consequences of not doing so.
(Signed by employee)
Date