- Ben Y Reis, assistant professor1 2,
- Isaac S Kohane, professor1 2,
- Kenneth D Mandl, associate professor1 2
- 1Children’s Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Children’s Hospital Boston, Boston, MA, USA
- 2Harvard Medical School, Boston, MA
- Correspondence to: B Y Reis, 1 Autumn St, Room 540.1, Boston, MA 02115
- Accepted 26 May 2009
Objective To determine whether longitudinal data in patients’ historical records, commonly available in electronic health record systems, can be used to predict a patient’s future risk of receiving a diagnosis of domestic abuse.
Design Bayesian models, known as intelligent histories, used to predict a patient’s risk of receiving a future diagnosis of abuse, based on the patient’s diagnostic history. Retrospective evaluation of the model’s predictions using an independent testing set.
Setting A state-wide claims database covering six years of inpatient admissions to hospital, admissions for observation, and encounters in emergency departments.
Population All patients aged over 18 who had at least four years between their earliest and latest visits recorded in the database (561 216 patients).
Main outcome measures Timeliness of detection, sensitivity, specificity, positive predictive values, and area under the ROC curve.
Results 1.04% (5829) of the patients met the narrow case definition for abuse, while 3.44% (19 303) met the broader case definition for abuse. The model achieved sensitive, specific (area under the ROC curve of 0.88), and early (10-30 months in advance, on average) prediction of patients’ future risk of receiving a diagnosis of abuse. Analysis of model parameters showed important differences between sexes in the risks associated with certain diagnoses.
Conclusions Commonly available longitudinal diagnostic data can be useful for predicting a patient’s future risk of receiving a diagnosis of abuse. This modelling approach could serve as the basis for an early warning system to help doctors identify high risk patients for further screening.
Despite the critical importance of historical data in medical decision making1 2 3 and the growing amount of longitudinal data available in electronic health record systems, clinicians often do not have the time or the resources to reliably access, absorb, and review all the information available to them during brief consultations.4 5 6 7 Even with unlimited time and resources, assimilating all available information is a difficult task. Furthermore, Bodenheimer et al describe the “tyranny of the urgent”—where the brief patient-doctor visit allows time to deal with only acute situations, rather than optimise long term care.8 As a result, much of the electronic health information might not be properly interpreted, used, or even accessed, leading to potential missed diagnoses of certain clinical conditions.
One such condition is domestic abuse,9 10 11 which is often difficult to diagnose from a single encounter and might go unrecognised for long periods of time as it is masked by acute conditions that form the basis of clinical visits.11 12 13 14 Typically, after a diagnosis of abuse is made, a retrospective review of the longitudinal record reveals a discernable pattern of diagnoses suggestive of abuse. Domestic abuse is the most common cause of non-fatal injury to women in the United States9 and accounts for more than half the murders of women every year.15 It affects women and men and involves up to 16% of US couples a year,16 with estimates of lifetime prevalence as high as 54%11 and lifetime risk of injury as high as 22%.9 As undetected abuse can result in serious injury and fatality, it is critical that those at risk should be identified as early as possible.12 17 18
Studies have shown that screening for domestic abuse, along with appropriate follow-up,14 19 20 can be beneficial for early detection, treatment, and prevention of future violence, and carries few if any adverse effects.12 14 17 21 22 For example, one study used screening to identify 528 women as victims of intimate partner violence, of whom 443 (84%) agreed to speak to an advocate, 234 (54%) accepted case management follow-up, and 115 (49%) reported that they no longer believed they were at risk of violence from their abuser three to six weeks later.22 Studies have also shown that both abused and non-abused patients favour routine screening.12 23 24 As a result, the American Medical Association and the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) have recommended routine screening for domestic abuse in the healthcare setting.11 14 25 A recent report from the BMA (British Medical Association) urged doctors and healthcare professionals to be more vigilant for signs of domestic abuse.26 Even though some do not call for universal screening,27 many still emphasise the importance of identifying and screening high risk patients.28
Screening for domestic abuse is particularly important in the emergency department, where victims are most often encountered.29 A three year study found that over 80% of those who experienced domestic abuse reported to the emergency department, with visits tending to peak in the month of the incident.30 The overall prevalence of domestic abuse in patients presenting to the emergency department is about 2-7.2%.21 They often present there because of limited access to traditional healthcare services, unwillingness or inability to discuss the subject with their own physician, or embarrassment or inability to present to social services outside the emergency department.11 The critical role played by emergency department clinicians in detecting domestic abuse has led to specific calls for heightened awareness for domestic abuse in presenting patients.11 31
Despite the growing evidence and official recommendations, actual screening rates remain low in practice,16 23 25 30 32 33 resulting in many missed cases of abuse, with only 5-30% of domestic abuse cases being successfully identified in the emergency department.16 30 34 McLeer et al describe a “systems failure” in the protection of abused patients that leaves many of those passing through emergency departments unidentified and untreated.35 In addition to low screening rates, barriers to detection include clinicians’ limited encounters with the abused patients, a clinical focus on acute conditions rather than on long term issues, a lack of special training in recognising abuse, a fear of offending the patient, and a lack of resources, staff, and procedures necessary for handling abuse cases.11 14 33 36 Barriers related to the patient include their reluctance to talk, lack of awareness of provider’s role, confidentiality concerns, and the attempts of patients and others to conceal abuse by offering deceptive oral histories at the time of the encounter.11 14 37
Screening tools and scoring systems developed to assist doctors in detecting domestic abuse,11 18 21 38 whether in paper form21 or through computerised screening,38 are becoming more common. Some of these tools use clinical indicators such as the nature and anatomical site of injury, but these have limited predictive value.39 The greatest limitation of current screening approaches is that they rely on information collected from the patient during the current clinical encounter and do not take advantage of the growing amounts of longitudinal data stored in electronic health information systems.
We evaluated the usefulness of commonly available longitudinal medical information for predicting a patient’s risk of receiving a future diagnosis of abuse. We developed intelligent histories—Bayesian models aimed at predicting the risk of an individual receiving a future diagnosis based on that individual’s diagnostic history.
Our modelling approach could form the basis for an early warning system that monitors longitudinal health data for long term indicators of abuse risk and alerts clinicians when high risk patients are identified. As a first step towards this goal, we describe a prototype risk visualisation we are developing to provide clinicians with instant overviews of longitudinal medical histories and related risk profiles at the point of care. In conjunction with alerts for high risk patients, this could enable clinicians to rapidly review and act on all available historical information by identifying important risk factors and long term trends.
We analysed longitudinal diagnostic histories of patients aged over 18 who had at least four years between their earliest and latest diagnoses recorded in an anonymised state-wide claims database covering six years of admissions to hospital, stays at hospitals for observation, and emergency department encounters. Some 561 216 patients met the inclusion criteria, having a total of 16 785 977 diagnoses among them.
Cases of abuse were identified according to ICD-9 (international classification of diseases, ninth revision) diagnostic codes, by using two different case definitions. The first, narrow case definition included all codes that explicitly refer to abuse (table 1⇓). The second, broader case definition included the above codes, plus codes associated with intentional assault and injury (table 2⇓). Similar case definitions based on ICD-9 codes have been previously validated as capturing over 95% of intentional injury cases.40
In total, 5829 patients (1.04%) met the narrower case definition, with 511 659 diagnoses among them (average of 87.8 diagnoses per patient), and 555 387 patients did not meet the narrower case definition, with 16 774 318 diagnoses among them (average of 30.2 diagnoses per patient). Some 19 303 patients (3.44%) met the broader case definition, with 1 156 325 diagnoses among them (average of 59.9 diagnoses per patient), and 541 913 patients did not meet the broader case definition, with 15 629 652 diagnoses among them (average of 28.8 diagnoses per patient).
We developed Bayesian models to estimate a patient’s risk of receiving a future diagnosis of abuse based on the diagnostic history. We used naive Bayesian classifiers,41 an established modelling approach that assumes independence between the various features (diagnoses and other variables) used to classify the cases (patients) into different classes (low versus high risk of receiving a future diagnosis of abuse). Complete details of the model can be found in the technical appendix on bmj.com.
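As a reading aid, the standard naive Bayes decomposition can be written as below. This is the textbook form only; the exact formulation used in the study is given in the technical appendix and is not reproduced here. The symbols are assumptions introduced for illustration: $A$ denotes the class "receives a future diagnosis of abuse", $\bar{A}$ its complement, and $d_1,\dots,d_n$ the features (diagnoses and other variables) observed for a patient.

```latex
\log\frac{P(A \mid d_1,\dots,d_n)}{P(\bar{A} \mid d_1,\dots,d_n)}
  = \log\frac{P(A)}{P(\bar{A})}
  + \sum_{i=1}^{n} \log\frac{P(d_i \mid A)}{P(d_i \mid \bar{A})}
```

Under the independence assumption, each feature contributes one additive term to the log odds, which is why a per-feature score can be computed once during training and reused for every patient.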
In summary, patients meeting the inclusion criteria were randomly assigned to a training set used to train the model (two thirds) or to a testing set used to validate it (one third). To account for sex specific differences in risk, we trained separate models for men and women. After training, we calculated a “partial risk score” for each diagnosis—the higher the partial risk score, the more predictive the diagnosis was of abuse. In addition to diagnoses, the model also incorporated the average number of visits a year recorded for the patient over the study period. This average number of visits, v, was categorised into one of six groups: v≤1, 1<v≤2, 2<v≤4, 4<v≤6, 6<v≤10, or v>10, and a partial risk score was calculated for each group.
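The training steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Laplace smoothing constant `alpha`, and the choice of a smoothed log likelihood ratio as the "partial risk score" are all hypothetical (the actual definition is in the technical appendix). Only the two thirds to one third split and the six visit frequency groups come from the text. In the study, separate models were trained for men and women, which here would mean calling `partial_risk_scores` once per sex.

```python
import math
import random
from collections import Counter


def split_train_test(patient_ids, train_fraction=2 / 3, seed=0):
    """Randomly assign patients to a training set and a testing set."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def visit_group(v):
    """Map the average number of visits a year to one of the six groups."""
    bounds = [1, 2, 4, 6, 10]
    labels = ["v<=1", "1<v<=2", "2<v<=4", "4<v<=6", "6<v<=10"]
    for bound, label in zip(bounds, labels):
        if v <= bound:
            return label
    return "v>10"


def partial_risk_scores(case_histories, control_histories, alpha=1.0):
    """Assumed score: smoothed log likelihood ratio for each feature.

    Each history is an iterable of features (diagnosis codes plus one
    visit group label). A feature counts once per patient.
    """
    case_counts = Counter(f for h in case_histories for f in set(h))
    control_counts = Counter(f for h in control_histories for f in set(h))
    n_case, n_control = len(case_histories), len(control_histories)
    scores = {}
    for feature in set(case_counts) | set(control_counts):
        p_case = (case_counts[feature] + alpha) / (n_case + 2 * alpha)
        p_control = (control_counts[feature] + alpha) / (n_control + 2 * alpha)
        scores[feature] = math.log(p_case / p_control)
    return scores
```

With this form, a positive score means the feature is more common among patients who later meet the case definition, and a negative score means it is more common among those who do not.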
We used the testing set, containing the remaining third of the patients, to validate the model. The model was applied retrospectively to the diagnostic histories of each patient in the testing set, analysing the data for each patient one visit at a time in chronological order and generating an “overall risk score” for the patient at the time of each new visit based on the sum of all the partial risk scores for that patient. These overall risk scores were interpreted with empirical thresholds determined according to desired specificity levels, and the corresponding sensitivity and timeliness levels were measured. To systematically gauge the actual trade-off between different levels of sensitivity and specificity in the testing set, the thresholds were set with the testing set. In an operational setting, users can set thresholds in advance based on the training set. In such a case, differences between the testing and training data might lead to a difference between desired specificity levels and actual specificity levels achieved.
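The validation procedure can be illustrated with a short Python sketch. It rests on several assumptions that the text does not settle: each distinct feature is counted once per patient, an alert fires when the score is strictly above the threshold, and the threshold is taken as an empirical quantile of the scores of patients who do not meet the case definition. All function names are hypothetical.

```python
import math


def running_risk_scores(visits, partial_scores):
    """Overall risk score after each visit, in chronological order.

    visits: list of visits, each a list of feature codes.
    Assumption: each distinct code contributes its partial score once.
    """
    seen = set()
    total = 0.0
    history = []
    for codes in visits:
        for code in codes:
            if code not in seen:
                seen.add(code)
                total += partial_scores.get(code, 0.0)
        history.append(total)
    return history


def threshold_for_specificity(non_case_scores, specificity):
    """Threshold t such that at least `specificity` of non-cases score <= t."""
    ordered = sorted(non_case_scores)
    k = math.ceil(round(specificity * len(ordered), 9))
    k = min(max(k, 1), len(ordered))
    return ordered[k - 1]


def sensitivity(case_scores, threshold):
    """Fraction of cases whose score rises above the threshold."""
    flagged = sum(1 for s in case_scores if s > threshold)
    return flagged / len(case_scores)
```

Setting the threshold from one set of patients and applying it to another is what can make the achieved specificity differ from the desired one, as noted above.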
In predicting the risk of patients receiving future abuse diagnoses, the intelligent history models achieved an area under the ROC curve of 0.88 for the narrower case definition and 0.82 for the broader case definition. Figure 1⇓ shows the sensitivity versus the false alarm rate. Table 3⇓ shows the performance achieved by the models at different benchmark specificities with the narrower and broader case definitions. As expected, the relatively low prevalence of the abuse diagnosis as a percentage of all patients in the dataset resulted in a low positive predictive value, depending on the chosen level of specificity. The positive predictive value was higher for the broader case definition, where cases were relatively more common.
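For readers less familiar with the measure, the area under the ROC curve equals the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition. It is an illustration only; the function name is hypothetical, and which per-patient score the study used for this calculation is not specified in the text.

```python
def area_under_roc(case_scores, non_case_scores):
    """Probability that a random case outscores a random non-case.

    Ties count as half a win. Quadratic in the number of patients,
    which is fine for illustration but slow for large datasets.
    """
    wins = 0.0
    for c in case_scores:
        for n in non_case_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(non_case_scores))
```

A value of 0.5 corresponds to scores that carry no information about case status, and 1.0 to perfect separation.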
The model could detect high levels of risk of abuse far in advance of the first diagnosis of abuse recorded in the system (fig 2⇓). The model detected risk of abuse an average of 10-30 months in advance, depending on the chosen level of specificity.
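One way to measure this kind of lead time is sketched below: find the first visit, before the first recorded abuse diagnosis, at which the overall risk score exceeds the threshold, and count the calendar months between the two dates. The function name and the calendar month convention are assumptions made for illustration; the text does not say how months were counted.

```python
from datetime import date


def lead_time_months(visit_dates, scores, threshold, diagnosis_date):
    """Months between the first alert and the first abuse diagnosis.

    visit_dates and scores are parallel, chronologically ordered lists.
    Returns None if no visit before the diagnosis exceeds the threshold.
    """
    for visit_date, score in zip(visit_dates, scores):
        if visit_date >= diagnosis_date:
            break
        if score > threshold:
            return ((diagnosis_date.year - visit_date.year) * 12
                    + (diagnosis_date.month - visit_date.month))
    return None
```

Averaging this quantity over all detected cases at a given threshold would give a timeliness figure of the kind reported above.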
Examination of the internal parameters of the model showed interesting findings. Firstly, we examined the effects of frequency of visits. As described above, each range of average number of visits a year was assigned a partial risk score. Figure 3⇓ shows that partial risk score rises with the average number of visits a year. An increase in the number of visits would therefore increase a patient’s overall abuse score. The effect seems slightly stronger (steeper slope) among women than among men.
Next, we examined the risks associated with different categories of illness. Figure 4⇓ shows the distribution of partial risk scores in each of 12 general clinical categories. (For visualisation purposes only, the diagnoses were grouped into 12 general clinical categories, based on the clinical classification software (CCS)42 published by the Agency for Healthcare Research and Quality (see table A on bmj.com). For modelling, each ICD-9 code was treated individually.) The category related to psychological and mental health had the highest average risk score distribution overall, followed by the injury category.
We also examined sex based differences in risk profiles. Figure 5⇓ shows a “treemap”43 visualisation of the model for women and men. (Again, for purposes of visualisation, individual ICD-9 codes were grouped into CCS-level 2 diagnostic categories.42) The size of the rectangle for each diagnostic category indicates the prevalence in the abused population. The colour of each region indicates a continuous range of associated partial risk scores (from white = lowest to dark red = highest) for the category as a whole. Several interesting trends became evident when we compared the risks for certain diagnostic categories between the two sexes (table 4⇓). While more abused men have alcohol related disorders, alcohol related disorders are more predictive of abuse in women than they are in men. Similarly, poisoning and injuries due to external causes are more predictive of abuse in women than they are in men. On the other hand, affective disorders, psychoses, and other mental conditions are more predictive of abuse in men than they are in women.
We also took the first steps towards describing how these models might form the basis of an early warning system to help doctors identify high risk patients for further screening. Figure 6⇓ shows two sample visualisations of individual patients’ histories designed to allow rapid interpretation by a clinician. Each bar represents a diagnosis, with time proceeding chronologically from the top to the bottom along the y axis. Each graph begins with the first encounter recorded for the patient (top) and ends with the first recorded diagnosis of abuse (bottom). The diagnoses are grouped into the 12 general clinical categories described above.42 The bars also represent the partial risk score assigned by the model to the particular diagnosis. For the patient in the top panel, a high risk of abuse would have been detected 27 months before the first diagnosis of abuse was recorded, given a target specificity of 95%. For the patient in the lower panel, this lead time would have been 34 months.
Principal findings and interpretation
Longitudinal diagnostic data commonly available in electronic health information systems can be valuable for predicting a patient’s risk of receiving a future diagnosis of abuse. Unlike previous approaches to estimating risk,11 18 21 38 our approach examines longitudinal information rather than focusing exclusively on information collected during the present visit.
We found significant differences in longitudinal patterns of diagnoses between abused and non-abused individuals, and these differences can be used for early identification—up to years in advance—of individuals at high risk for receiving a future diagnosis of abuse. Certain broad categories of diagnoses, like psychological related conditions, were highly associated with risk of abuse. This is noteworthy as screening rates in practice have actually been found to be lower among patients presenting with psychological conditions compared with other conditions.32
Risk characteristics of specific diagnoses varied across sexes, and it is therefore useful to construct separate sex specific models of abuse risk. We also found that abused patients had a higher average number of visits a year,15 and that this metric can be useful for differentiating between high and low risk patients.
Strengths and limitations of the study
We used a state-wide dataset covering six years of admissions to hospital, observation stays in hospital, and encounters in emergency departments. Any visits taking place outside this state, beyond this time period, or in a different care setting were not included. As a result, certain diagnoses that would have helped or hindered in identifying high risk patients might not be recorded in the dataset, thus affecting the results for that patient. Furthermore, certain people might have received a diagnosis of abuse that was not recorded in the dataset, and these people might have been misclassified as not meeting the case definition or as meeting the case definition at a different time than they actually did. Our dataset did include comprehensive coverage of all encounters in emergency departments in the state. As described above, the emergency department is where abused patients are most often encountered,29 30 and such encounters are considered most critical for detecting abuse.11 31 Thus we consider there is sufficient coverage for a reasonable analysis to take place.
Our case definition includes codes highly specific for abuse, assault, and intentional injury. As with all real world data, however, some visits might have been miscoded. Such omissions and inaccuracies in the data might reduce the performance of the model, but the demonstration of the utility of this approach using real world data has the potential to catalyse additional efforts in generating accurate diagnostic coding for each care episode.
Depending on the case definition used and the desired levels of specificity, the model can yield low to moderate positive predictive values (up to 14.4% for the narrow case definition and 18.9% for the broader case definition, see table 3). This is to be expected with conditions having a low prevalence (in the present case, 1.04% with the narrow definition and 3.44% with the broad definition), as the positive predictive value at a given sensitivity and specificity falls as the prevalence of the condition being detected falls. These levels could be clinically useful in settings where the model is being used to identify patients for whom standard screening should be performed, especially when screening rates in practice remain below desired levels.16 23 25 30 32 33
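The dependence of positive predictive value on prevalence follows from Bayes' theorem and can be checked with a short calculation. In the sketch below, the two prevalence values are those reported in this study, but the sensitivity and specificity are hypothetical inputs chosen only for illustration; they are not taken from table 3.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a flagged patient is a true case (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical operating point, not a value from table 3.
SENSITIVITY, SPECIFICITY = 0.5, 0.95

for label, prevalence in [("narrow", 0.0104), ("broad", 0.0344)]:
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label} definition: prevalence {prevalence:.2%}, PPV {ppv:.1%}")
```

At the same operating point, the broader definition yields the higher positive predictive value simply because cases are more common.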
We focused on predicting the risk of future diagnoses of abuse, and the model is trained on patients who have been diagnosed in a clinical setting. Potential differences between cases of abuse that typically get diagnosed versus those cases that typically do not get diagnosed might serve as an important bias and might hinder the model’s ability to detect the latter. As mentioned above, however, domestic abuse often goes undiagnosed or is diagnosed only after considerable delay. Given the current high levels of underdiagnosis, it is likely that use of the model in a clinical setting would lead to the detection of some of the cases that are currently not typically diagnosed. The effect of implementing such a model in clinical practice is an important empirical question for future research.
Differences in care and coding practices might affect the generalisability of models from one health environment to another. We therefore recommend the training of a specific model for each healthcare environment. We expect the modelling approach to be generalisable to other settings inside and outside the US, as the minimal set of data elements (ICD-9 codes, dates of visits) used by the model are commonly stored throughout many countries with electronic medical record systems or claims systems. In countries that do not yet have electronic medical record systems, these models would be difficult to implement, though with time, electronic medical record systems are being deployed more widely throughout the world.
Our goal was to predict a patient’s risk of receiving a future diagnosis of abuse, based on the patient’s longitudinal diagnostic record to date. This prediction can help care givers to identify individuals who fall into either of two categories: those who may be currently experiencing abuse but have yet to be diagnosed and those who are not yet experiencing abuse but are at a high risk of being abused in the future. Currently, the model does not differentiate between these two types, though this is an important area for future research, as such a differentiation might enable explicit attempts to estimate time to event.
Further aspects are worthy of future study. Currently, the risk associated with each diagnosis is modelled separately. More complex models can be developed to explicitly incorporate the relations between multiple diagnostic codes—for example, the presence of diagnosis A together with diagnosis B might be more or less predictive of abuse risk than the combination of the individual risks of A or B alone.
While the present analysis relied on claims data, the structured information and text available in more comprehensive electronic health information systems can provide a richer substrate for future intelligent history models. Explicitly modelling temporality, such as the order in which visits occurred and the intervals of time between certain diagnoses, might further improve performance.
With proper integration into the clinical workflow, the intelligent history could aid the already overloaded clinician in identifying high risk patients who warrant further in-depth screening by the clinician. Such screening must always take place in the context of proper training for physicians in handling abuse and an environment that offers appropriate resources and referrals for abused patients.14 19 20 It is important to emphasise that an early warning system based on intelligent history models would not be intended for making the diagnosis of abuse but rather for identifying patients who are at high risk of receiving a future abuse diagnosis and therefore warrant screening. This is especially important in settings where screening rates in practice remain below desired levels.16 23 25 30 32 33
Potential next steps towards the development of an early warning system for clinicians would include automation of the intelligent history as a service-oriented tool, and rigorous design work on the human interface to refine and test the numerical and visual presentation in creating an early warning system for clinicians. The approach would work as follows. A patient’s longitudinal medical history accumulates over time inside an electronic health record system. Whenever new information is recorded for the patient, the intelligent histories model re-analyses the information accumulated to date to estimate the patient’s risk of receiving a future diagnosis of abuse. The patient’s physician is notified if the patient is at high risk of abuse. The physician uses the visualisation to quickly review the patient’s past diagnoses and identify important long term trends in the patient’s history. The risk estimate, together with the high level view of the patient’s diagnostic history, enables the physician to make a better informed decision about whether to proceed with further screening of the patient. In this way, the intelligent histories model could improve screening by helping physicians to identify high risk patients who might otherwise be missed.
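The workflow described above can be sketched as a small Python class. This is a hypothetical illustration of the proposed design, not an existing system: the class name, the `notify` callback, the rule of alerting once per patient, and the assumption that each distinct diagnosis code contributes its partial risk score once are all introduced here for the example.

```python
class IntelligentHistoryMonitor:
    """Re-scores a patient whenever new information is recorded."""

    def __init__(self, partial_scores, threshold, notify):
        self.partial_scores = partial_scores  # feature -> partial risk score
        self.threshold = threshold            # set in advance, e.g. from training data
        self.notify = notify                  # callback(patient_id, score)
        self._seen = {}                       # patient_id -> set of codes
        self._score = {}                      # patient_id -> overall risk score
        self._alerted = set()                 # patients already flagged

    def record_visit(self, patient_id, codes):
        """Add a visit's codes, update the score, and alert if needed."""
        seen = self._seen.setdefault(patient_id, set())
        score = self._score.get(patient_id, 0.0)
        for code in codes:
            if code not in seen:
                seen.add(code)
                score += self.partial_scores.get(code, 0.0)
        self._score[patient_id] = score
        if score > self.threshold and patient_id not in self._alerted:
            self._alerted.add(patient_id)
            self.notify(patient_id, score)
        return score
```

In use, `notify` would hand the alert to whatever mechanism brings the patient's risk visualisation to the physician's attention; the decision about further screening remains with the physician.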
In conclusion, our findings suggest that the vast quantities of longitudinal data accumulating in electronic health information systems present an untapped opportunity for improving medical screening and diagnosis. In addition to the direct implications for prediction of risk of abuse, the general modelling framework presented here has far reaching potential implications for automated screening of other clinical conditions where longitudinal historical information can be useful for estimating clinical risk.
What is already known on this topic
Domestic violence is a dangerous condition that is difficult to detect, and screening rates are low
Diagnostic histories might be useful in identifying patients who are at high risk of abuse, but physicians typically do not have time to thoroughly review this information during the course of a clinical visit
What this study adds
Longitudinal medical information commonly available in electronic health systems can be useful for predicting the risk of a patient receiving a future diagnosis of abuse
The Bayesian models used can serve as the basis for a future early warning system that could help doctors to identify high risk patients for further screening
Cite this as: BMJ 2009;339:b3677
We thank Karen Olson for preparing the dataset for analysis.
Contributors: BYR designed the study, developed the models, analysed the results, wrote the manuscript, and is guarantor. ISK contributed to study design and writing the manuscript and advised on clinical issues. KDM contributed to study design and writing the manuscript and advised on clinical issues.
Funding: This work was supported by the US Centers for Disease Control and Prevention (grant R01 PH000040) and the National Library of Medicine (grants R01 LM009879, R01 LM007677, and G08LM009778). The funders had no involvement with the research.
Statement of independence of researchers from funders: The authors and the research are completely independent of the funders.
Competing interests: None declared.
Ethical approval: This study was approved by the institutional review board.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.