BMJ 2004;328:1478 (19 June), doi:10.1136/bmj.328.7454.1478
Nick Black, professor of health services research1, Marian Barker, research assistant1, Mary Payne, research fellow1
1 Health Services Research Unit, Department of Public Health and Policy, London School of Hygiene and Tropical Medicine, London WC1E 7HT
Correspondence to: N Black nick.black{at}lshtm.ac.uk
Design Cross sectional survey, with interviews with database custodians and search of electronic bibliographic database (PubMed).
Studies reviewed 105 clinical databases across the United Kingdom.
Results Clinical databases existed in all areas of health care, but their distribution was uneven: cancer and surgery were better covered than mental health and obstetrics. They varied greatly in age, size, growth rate, and geographical areas covered. Their scope (and thus their potential uses) and the quality of the data collected also varied. The latter was not associated with any organisational characteristics. Despite impressive achievements, many faced substantial financial uncertainty. Considerable scope existed for improvements: greater use of nationally approved codes; more support from relevant professional organisations; greater involvement by nurses, allied health professionals, managers, and laypeople in database management teams; and more attention to data security and ensuring patient confidentiality. With some notable exceptions, the audit and research potential of most databases had not been realised: half the databases had each produced only four or fewer peer reviewed research articles.
Conclusions At least one clinical database support unit is needed in the United Kingdom to provide assistance in organisation and management, information technology, epidemiology, and statistics. Without such an initiative, the variable picture of databases reported here is likely to persist and their potential not be realised.
In 2001 we created the Directory of Clinical Databases (www.docdat.org), which allows, for the first time, access to descriptions of the clinical databases that exist in the United Kingdom, including independent reports of their quality, and allows us to explore which organisational and managerial features of databases are associated with high quality and to make recommendations for improvements.10 11
We identified databases through inquiries to government health departments, royal colleges and specialist associations, and pharmaceutical companies; searches of previous reviews, research publications, and the internet; and word of mouth. Each entry was based on an in depth interview (usually by telephone) with the custodian of the database. Interviews lasted 30-60 minutes and followed a structured format. The principal aspects included were the geographical area covered, the length and periodicity of data collection, the number of patients or episodes collected to date, the use of nationally approved NHS codes, linkage to other databases, security and confidentiality of the data, the feasibility of ad hoc analyses, use of the data for audit and research, approval from professional bodies, the composition of any management team, and sources of funding.
In addition, the interviewer independently assessed 10 aspects of database quality using a tested instrument.9 Five of these relate to data quality (completeness of recruitment, completeness of data, use of explicit definitions for variables, independence of observations of outcomes, and extent to which data are validated). The interviews were supplemented by an analysis of a bibliographic database (PubMed) to ascertain the number of research papers in peer reviewed journals that had made use of the databases.
Analysis of databases
We included all 105 databases for which a full entry existed in DoCDat in August 2003. We analysed the databases to describe the clinical areas they covered, their organisation and management, how data security and confidentiality were managed, what the databases were used for, and the quality of the data, and to explore any associations between characteristics of databases and the five dimensions of data quality. We tested the statistical significance of any associations with the χ² test.
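As an illustration of the kind of test described above, the following is a minimal sketch of a Pearson χ² test of association for a 2×2 table (for example, an organisational characteristic present or absent against a data quality criterion met or not met). The function name and the counts are hypothetical and are not taken from the study; the sketch assumes no continuity correction, and the source does not state whether one was applied.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    `table` is [[a, b], [c, d]] of observed counts.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts, for illustration only (not data from the paper)
example_table = [[30, 10], [20, 20]]
statistic, p_value = chi_squared_2x2(example_table)
print(f"chi-squared = {statistic:.2f}, P = {p_value:.3f}")
```

With small expected counts an exact test would usually be preferred; this sketch does not check for that.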
Organisation and management of databases
Most of the databases (66%) covered one or more UK nations (England, Scotland, Wales, Northern Ireland) (table 1). The rest were restricted to a region of one of these countries. Most (81%) had collected data continuously since being established, while most of the rest had been set up for a one off period.
Although half the databases used nationally approved codes to identify patients (that is, the NHS number), less than a third used approved institutional codes, and only 16% used clinician codes. A third had obtained explicit approval from the relevant clinical or professional body, such as a royal college.
The group or team managing a database varied in size and in composition. While almost all included doctors (95%), only 42% included nurses, and 26% included allied health professionals (such as physiotherapists). Most recognised the need for technical and methodological input from epidemiologists (69%), statisticians (79%), and information technology specialists (63%). In contrast, only a minority saw a need for representation from managers (32%) and laypeople (24%).
Funding came from a variety of sources. Most received some funds from the public sector (mainly the Department of Health or the NHS), though sometimes this was only pump priming to get the database established. Three other sources (private sector, subscriptions from participating healthcare providers, and charities) each provided finance for 10-20% of databases. A few databases (5%) reported no funding. The distribution of funding partly reflects the absence from DoCDat of databases owned by private companies, despite our attempts to include them.
Five of the databases were established over 50 years ago (table 2), four being cancer registries and one a longitudinal birth cohort. However, most databases were much more recent, with over half starting since 1990. While there is evidence that the establishment of new databases has accelerated in recent years, our figures refer only to those still functioning in 2003. It is likely that some databases established in earlier decades have since stopped, giving an underestimate of the incidence of new ones in that period. Information for the most recent years is also likely to be an underestimate because of delays in identifying new databases.
The databases varied considerably in size, from data on a few hundred patients or episodes of care to over 130 million (table 2). Not surprisingly, smaller databases tended to have been established more recently: the five smallest had all started in the previous five years. The three largest databases recorded hospital admissions for an entire country (Hospital Episode Statistics, Scottish Morbidity Record, and Patient Episode Database for Wales). The growth rate of databases with continuous recruitment also varied considerably. Five databases acquired fewer than 100 patients or episodes of care a year. These tended to cover rare conditions (such as acromegaly or motor neurone disease). In contrast, the national hospital inpatient databases accumulated more than half a million new episodes a year.
Security and confidentiality of data
Most databases (71%) stored their data on a computer connected to an external network, albeit with robust firewalls to prevent intruders (table 3). Backup versions of the database were usually on disks or CD-ROMs (65%), although 30% of databases backed up to another computer with external connections. Almost 60% also retained data on paper forms.
Ideally, data should be reversibly anonymised (by using key codes), so as to minimise the risk of disclosing individual identities12 but to maximise possible use of the database.13 This was true for only 33%. Of the remainder, 12% were irreversibly anonymised and 55% contained patient identifiers. For most databases (72%), the patients had not been informed that personal data were being collected, and for 88%, signed consent was not obtained.
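Reversible anonymisation with key codes can be sketched as follows. This is an illustrative sketch only, not the method used by any database in the survey: the function name, the field names (`nhs_number`, `key_code`), and the choice of random hexadecimal codes are all assumptions. The idea is that the analysis file carries only a key code, while the table linking key codes to identifiers is held separately under stricter access control.

```python
import secrets

def pseudonymise(records, id_field="nhs_number"):
    """Replace a patient identifier with a random key code.

    Returns (anonymised_records, key_table). The key table maps each key
    code back to the original identifier and is meant to be stored apart
    from the anonymised records. Field names here are hypothetical.
    """
    key_table = {}      # key code -> original identifier
    code_for_id = {}    # original identifier -> key code
    anonymised = []

    for record in records:
        identifier = record[id_field]
        if identifier not in code_for_id:
            code = secrets.token_hex(8)
            while code in key_table:  # regenerate in the unlikely event of a clash
                code = secrets.token_hex(8)
            code_for_id[identifier] = code
            key_table[code] = identifier

        # Copy every field except the identifier, then attach the key code
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["key_code"] = code_for_id[identifier]
        anonymised.append(cleaned)

    return anonymised, key_table

# Made-up identifiers, for illustration only
rows = [
    {"nhs_number": "ID-001", "procedure": "hip replacement"},
    {"nhs_number": "ID-002", "procedure": "cataract surgery"},
    {"nhs_number": "ID-001", "procedure": "hip revision"},
]
anonymised_rows, key_table = pseudonymise(rows)
```

Because the same patient always receives the same key code, episodes can still be linked within the database, and a holder of the key table can re-identify a record when that is justified. Destroying the key table would make the anonymisation irreversible.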
Uses of databases
Irrespective of the purposes for which the databases were established, most could be used for all four of the principal applications cited in the introduction. Most (75%) allowed ad hoc analyses to be conducted centrally, where the data were aggregated and stored, but only 58% allowed such analyses to be conducted locally by the providers of the data (table 4). Of the 95 databases that identified the healthcare providers, 40% produced audit reports on individual providers, but only 29% provided multicentre comparative reports.
The use of databases for research was also patchy. About a third were unable to provide a bibliography of peer reviewed journal articles based on use of their database. Our search of PubMed revealed that about a quarter of the databases had not been used for any articles and a further quarter had been used in fewer than five articles each. In contrast, eight databases had each been the basis of over 100 articles. These data probably underestimate the research output as PubMed does not facilitate searches by database name and some authors may fail to mention the database they used. In addition, some articles may have appeared in journals not indexed by PubMed, and some recently established databases would not be expected to have generated any research output yet. However, we may also have overestimated the output as some articles, while authored by database custodians and associates, may not have made use of the database in question.
Quality of the data
We measured the quality of the data in the databases against five criteria (table 5). Over half (57%) of the databases recruited at least 90% of eligible people or episodes. This underestimates the true prevalence of good databases, because up to a third of database custodians did not know their recruitment proportion. Similarly, over 40% did not know the level of completeness of their data. Of those that did, about 70% reported high levels of completeness.
About half the databases did not use explicit definitions for most of the variables collected. Of the 94 databases that included an outcome variable, 64% either used an independent, blinded observer (that is, one who was unaware of the intervention undergone) or this was unnecessary as the outcome was objective (usually survival). The data in almost all databases (92%) were subjected to range and consistency checks. In 27% some form of external validation (such as comparison with medical records) was also conducted.
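To make the terms concrete, a range check tests whether a single value is plausible, and a consistency check tests whether two or more values agree with each other. The following minimal sketch shows one of each; the field names (`age`, `admitted`, `discharged`) and the age limits are hypothetical and do not describe the checks used by any particular database.

```python
from datetime import date

def check_record(record):
    """Return a list of problems found in one record (empty if none).

    Field names and limits are illustrative assumptions.
    """
    problems = []

    # Range check: a single value must fall within plausible limits
    age = record.get("age")
    if age is None:
        problems.append("age missing")
    elif not 0 <= age <= 120:
        problems.append("age out of range")

    # Consistency check: two related values must not contradict each other
    admitted = record.get("admitted")
    discharged = record.get("discharged")
    if admitted is not None and discharged is not None and discharged < admitted:
        problems.append("discharged before admitted")

    return problems

# Made-up record, for illustration only
suspect = {"age": 140, "admitted": date(2003, 1, 5), "discharged": date(2003, 1, 2)}
print(check_record(suspect))
```

Checks of this kind catch implausible entries but cannot confirm that a plausible value is correct, which is why external validation against a source such as medical records is a separate criterion.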
Associations between characteristics of databases and data quality
Only one of the eight organisational characteristics we examined was significantly associated with database quality (table 6): databases that informed patients they were being included were more likely to have complete data. Given that 40 associations were tested, a single statistically significant result (P = 0.016) may well have occurred by chance. Seven other associations were significant at the 10% level (P < 0.1): national databases were associated with better validation but poorer assessment of outcome than regional databases; approval from a professional body was associated with better validation; routine linkage was associated with better recruitment and validation; and having an epidemiologist or statistician in the management team was associated with better recruitment and validation.
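The caution about chance findings can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that the 40 tests are independent and that no true associations exist; neither assumption comes from the paper, and the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests significant by chance alone."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha):
    """Probability of at least one chance finding, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_tests = 40           # associations tested, as reported in the text
alpha = 0.05           # conventional 5% significance level
observed_p = 0.016     # the one significant association reported

print(expected_false_positives(n_tests, alpha))         # about 2 expected by chance
print(round(prob_at_least_one(n_tests, alpha), 3))      # chance of at least one
print(observed_p < alpha / n_tests)                     # Bonferroni-adjusted comparison
```

Under these assumptions about two chance findings would be expected at the 5% level, and the reported P value of 0.016 is larger than the Bonferroni-adjusted threshold of 0.05/40 = 0.00125, which is consistent with the authors' caution.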
Considerable scope exists for improvements: greater use of nationally approved codes; more support from relevant professional organisations; greater involvement by nurses, allied health professionals, managers, and laypeople in database management teams; and more attention to data security and ensuring patient confidentiality (something that most database custodians are currently addressing in meeting the requirements of the Patient Information Advisory Group14). With some notable exceptions, the audit and research potential of most of the databases was not being realised.
Study limitations
Although this review provides critical information on UK clinical databases for the first time, it is limited in three ways. Firstly, we may have missed some key databases. In particular, those included were restricted to the public sector, despite our attempts to gain access to information about databases held by pharmaceutical companies. Secondly, our information was largely self reported by database custodians. Despite careful interrogation, some accounts may be unjustifiably favourable. Thirdly, the lack of statistically significant associations between organisational characteristics and data quality (with one exception) may reflect the small sample size.
Implications of results
Considerable effort and resources are expended by clinicians, methodologists, computing staff, and others to create and maintain clinical databases in Britain. Many database custodians work in isolation with little contact or support from others engaged in similar activities. The need for improvements in database organisation, data quality, and data use is widely recognised among these highly committed individuals and groups. The inception, in March 2003, of a forum for database custodians to exchange experiences and help one another has both highlighted these needs and started to meet them. However, without resources to promote and support the development of databases, such measures can have only limited impact.
Just as clinicians are not expected to be able to organise and carry out randomised trials without the support of clinical trials units, so we should not expect high quality databases to be developed without some dedicated specialised support. This could be met by the establishment of at least one clinical database support unit. This could provide advice and assistance in organisation and management, information technology, epidemiology, and statistics. Without such an initiative, the variable picture of databases reported here is likely to persist, and their potential will not be realised.
Contributors: NB conceived and designed the study, wrote the paper, and acts as guarantor. MB and MP carried out the analyses and commented on the paper.
Funding: DoCDat was funded by the National Centre for Health Outcomes Development.
Competing interests: The authors created and maintain DoCDat.