The market in healthcare data

BMJ 2015; 351 doi: (Published 04 November 2015) Cite this as: BMJ 2015;351:h5897
Ruth Gilbert, professor of clinical epidemiology (1, 2)
Harvey Goldstein, professor of social statistics (2, 3)
Harry Hemingway, professor of clinical epidemiology (1)

(1) Farr Institute of Health Informatics Research, University College London, UK
(2) UCL Institute of Child Health, London, UK
(3) Centre for Multilevel Modelling, University of Bristol, UK

Correspondence to: R Gilbert r.gilbert{at}

Seriously inhibiting research

Healthcare generates a vast rainforest of data that could be used more widely for research if barriers to access could be overcome. One barrier is cost. Government and commercial traders in the precious hardwoods of healthcare data charge five or six figure sums per year for data access.1 2 Other barriers include the slow, tortuous application processes that are driven by rigid rules, often arbitrarily applied. These barriers obstruct access to more affordable data from the Health and Social Care Information Centre (HSCIC), the Office for National Statistics, the Healthcare Quality Improvement Partnership, and Public Health England.

A fundamental problem with all these data suppliers is the lack of incentives to provide data for research. Although the research might save lives through improving services or developing new tests or treatments, denying access carries no penalties. HSCIC has a growing backlog of more than 200 applications,3 but queues for data from the other suppliers are rarely recorded. We see little attempt to quantify the research not done, the evidence forgone, and the benefits not realised because of these barriers.

New markets for data are emerging. Recent pressures to join up local services are creating opportunities for researchers to access some data more easily and cheaply. Local linkages of individual patient data from primary care and hospital are being developed to support integrated services.4 5 Some initiatives also link to social care and community healthcare data, some tap into rich detailed hospital data that are unlikely ever to be nationally concentrated at HSCIC, and several make the data available for research or service evaluation.6 7 Disadvantages include variation in data quality and processing, which may limit comparability across initiatives. Also, standardising data governance across so many suppliers is challenging.

Is this mixed economy of market and central control likely to support sustainable use of healthcare data for research and for more efficient services? As the debacle demonstrated, the ecosystem of healthcare data is fragile and can be damaged for all users of data if public support falters. Carter et al argue that public support for using healthcare data requires a social licence that is founded on the potential risks being outweighed by the benefits to society.8 Yet, if research cannot be done because of prohibitive costs or data access times, this benefit cannot be realised.

What must change to realise the benefits of research from healthcare data? Firstly, research based on healthcare data has to drive a learning health system. Linked general practice and hospital data, or genomics data, are not just commodities to be sold. Priorities for public benefit include research behind data suppliers’ firewalls on how to improve data quality and link up data more accurately and efficiently so that they are more usable for running services and research.9 There is also an urgent need for research to guide the increasing use of healthcare data for direct patient care. Service innovations such as online access by patients to their general practice records,10 use of linked general practice and hospital records by commissioners to stratify risk and target certain patient groups, identification of vulnerable children through the child protection information sharing project,11 and notifications of female genital mutilation12 all have media and policy appeal, but rigorous evidence is needed to determine whether benefits outweigh harms for the whole population.

Secondly, we need greater transparency about which data are provided for research use, what they are used for, by whom, and how results might benefit health, healthcare, and society. HSCIC outshines other data suppliers by publishing a list of data recipients.13 Some primary care data suppliers publish bibliographies of studies using their data.1 2 It would be more transparent if all data suppliers did both. More openness is needed about privacy breaches, which are more common in health services than in non-health areas14 but very rare in health research. Transparent monitoring of privacy breaches is vital to reassure the public about innovations, such as the new legal duty to share data between health and adult social care, including the voluntary sector.15

Thirdly, financial rewards, and possibly penalties, are needed to drive wider, faster data access for research. Infrastructure funding could help to build capacity, encourage innovation, reduce rigid rulings, and help to make healthcare data affordable for all public sector researchers. Infrastructure funding from the National Institute for Health Research has made NHS services ready for trials and other studies.16 Could the same be done for the healthcare data market? Minor changes in NIHR funding rules, such as extending activity based funding to include studies using routine healthcare data, would help researchers to access data from hospitals. Why not also reward data suppliers for every timely, high quality data extract provided for research? Data suppliers need incentives to more effectively steward the data rainforest so that society can reap the full benefits.




  • Competing interests: We have read and understood BMJ policy on declaration of interests and declare that HH has received funding from AstraZeneca for a database study in electronic health records.

  • Provenance and peer review: Commissioned; not externally peer reviewed.