Jeff Luck
a Veterans Administration, Greater Los Angeles Healthcare System, 11 301 Wilshire Blvd, Los Angeles, CA 90073, USA
b Institute for Global Health, 74 New Montgomery St, San Francisco, CA 94105, USA
Correspondence to: John W Peabody peabody{at}psg.ucsf.edu
Abstract
Objective:
To assess the validity of standardised
patients to measure the quality of physicians' practice.
Design:
Validation study of standardised patients' assessments. Physicians saw unannounced standardised patients presenting with common outpatient conditions. The standardised patients
covertly tape recorded their visit and completed a checklist of quality
criteria immediately afterwards. Their assessments were compared
against independent assessments of the recordings by a trained medical
records abstractor.
Setting:
Four general internal medicine primary care clinics in California.
Participants:
144 randomly selected consenting physicians.
Main outcome measures:
Rates of agreement between the
patients' assessments and independent assessment.
Results:
40 visits, one per standardised patient,
were recorded. The overall rate of agreement between the standardised patients' checklists and the independent assessment of the audio transcripts was 91% (κ=0.81). Disaggregating the data by medical condition, site, level of physicians' training, and domain (stage of
the consultation) gave similar rates of agreement. Sensitivity of the
standardised patients' assessments was 95%, and specificity was 85%.
The area under the receiver operator characteristic curve was 90%.
Conclusions:
Standardised patients' assessments seem
to be a valid measure of the quality of physicians' care for a variety of common medical conditions in actual outpatient settings. Properly trained standardised patients compare well with independent assessment of recordings of the consultations, which may justify their use as a
"gold standard" in comparing the quality of care across sites or
evaluating data obtained from other sources, such as medical records
and clinical vignettes.
What is already known on this topic
However, validating standardised patients' measurements of quality of care in actual primary practice is more difficult and has not been done in a prospective study
What this study adds
Introduction
Standardised patients are increasingly used to assess the
quality of medical practice.1-4 They offer the advantage
of measuring quality while completely controlling for variation in case
mix.5 6
Although standardised patients have long been
used to evaluate medical students and residents, their use in actual
clinical settings is relatively new.7 Validating the use
of standardised patients to measure quality in the actual practice
setting is, however, challenging and to our knowledge has not been
done. Direct observation in the clinic is difficult for a variety of
reasons, including cost, a potential Hawthorne effect (physicians
performing better under observation), and ethical problems linked to
informed consent. We did a validation study to determine whether
standardised patients perform as well in the clinic as they do in
medical education settings.3
Methods
Setting
The study sites were four general internal medicine primary
care clinics in California. All staff physicians, teaching physicians,
and second or third year residents were eligible. Of the 163 eligible
physicians, 144 consented to see standardised patients at some time
during the 1999-2000 and 2000-1 academic years. We used the sampling
function of Stata to randomly select consenting physicians to whom
standardised patients would present with one of eight different
clinical cases, two cases each for four common outpatient conditions
(box).
Clinical scenarios portrayed by standardised patients
Training of standardised patients
We trained 45 professional actors, approximately six per case
scenario, as standardised patients. The training protocol involved
several steps and is described in detail elsewhere.5 The
actors were trained to complete a checklist of 35-45 items that might
be performed or discussed by the physician (see bmj.com). The actors
completed the checklist immediately after the visit by marking each
item as done or not done. Checklist items were based on quality
measurement criteria derived from national guidelines on specific
conditions and were arrived at by expert panel review and a modified
Delphi technique.
Audio recording of visits
Of the 45 trained actors we successfully recorded 40, using a
digital "pen" recorder concealed on the patient. Each actor was
recorded once. In 27 of the 40 successfully recorded visits the
physicians reported that they had detected the standardised patients.
The number of visits was similar across study sites, conditions, and
physicians' level of training. To minimise potential variation in
performance, we asked the actors to wear the recorder for visits that
were not recorded. A trained medical records abstractor scored each
transcribed recording using the same quality criteria as in the
standardised patients' checklist.
Analysis
A total of 1258 quality measurement items were compared. The
items were aggregated into four domains corresponding to stages of a
visit: history taking, physical examination, diagnosis, and treatment
and management. We calculated the percentage of items in agreement
between the standardised patients' checklists and the recording
transcripts. We calculated κ values to further quantify the degree of
agreement. Percentage agreement and κ values were disaggregated by
condition, site, physicians' level of training, and domain. A
calibration curve was constructed to assess variation across actors.
Sensitivity and specificity were calculated for each visit and for all
visits combined, taking the audio recording as "truth" in the calculation.
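As a rough illustration of the agreement statistics described above, the following sketch computes percentage agreement and Cohen's κ for paired done/not done ratings. The function name, the coding of items as 1 (done) and 0 (not done), and the toy ratings are assumptions made for this example; they are not the study's data or analysis code.

```python
def agreement_and_kappa(sp_items, transcript_items):
    """Return (proportion in agreement, Cohen's kappa) for paired 0/1 ratings.

    sp_items: standardised patient checklist, 1 = done, 0 = not done (assumed coding)
    transcript_items: abstractor's ratings of the same items from the recording
    """
    if len(sp_items) != len(transcript_items) or not sp_items:
        raise ValueError("ratings must be paired and non-empty")
    n = len(sp_items)

    # Observed agreement: share of items where both sources give the same rating.
    observed = sum(a == b for a, b in zip(sp_items, transcript_items)) / n

    # Agreement expected by chance, from each source's marginal rate of "done".
    p_sp = sum(sp_items) / n
    p_tr = sum(transcript_items) / n
    expected = p_sp * p_tr + (1 - p_sp) * (1 - p_tr)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical ratings for ten checklist items (illustration only).
toy_sp = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
toy_transcript = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
toy_agreement, toy_kappa = agreement_and_kappa(toy_sp, toy_transcript)
```

Because κ subtracts the agreement expected by chance, it is lower than raw percentage agreement whenever most items are rated the same way by both sources.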
Results
The overall rate of agreement between corresponding items on the
standardised patients' checklists and the recording transcripts was
91% (κ=0.81) (table 1). Rates of agreement varied little by
medical condition, site, level of physicians' training, or domain. The
figure shows the variation among standardised patients. This
calibration curve plots the percentage of checklist items done by the
physician as noted by the standardised patient against the
corresponding percentage indicated by the audio recording of that
visit. Points cluster closely along the plotted regression line, which
has an intercept of 0.4% and a slope of 1.03. (Perfect calibration
would yield a line with intercept of 0% and a slope of
1.00.)
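A calibration line of this kind can be sketched as follows, assuming an ordinary least squares fit with one point per visit (the text does not state which regression method was used). The function name and the percentages below are hypothetical and serve only to show how the intercept and slope are obtained.

```python
def fit_calibration_line(recording_pct, sp_pct):
    """Least squares fit of sp_pct = intercept + slope * recording_pct.

    recording_pct: per-visit percentage of items done according to the recording (x axis)
    sp_pct: per-visit percentage of items done according to the standardised patient (y axis)
    """
    n = len(recording_pct)
    if n != len(sp_pct) or n < 2:
        raise ValueError("need at least two paired visits")
    mean_x = sum(recording_pct) / n
    mean_y = sum(sp_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in recording_pct)
    if sxx == 0:
        raise ValueError("recording percentages must vary across visits")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(recording_pct, sp_pct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical visits in which the actor reports 5 percentage points more than the recording.
toy_recording = [40.0, 55.0, 70.0, 85.0]
toy_sp = [45.0, 60.0, 75.0, 90.0]
toy_intercept, toy_slope = fit_calibration_line(toy_recording, toy_sp)
```

In this toy case the slope is 1 and the intercept is 5, which would indicate a constant over-report by the actors rather than an error that grows with the amount of care delivered.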
Sensitivity of standardised patients' assessments, compared against the audio recording transcripts, was 95%, and specificity was 85% (table 2). About two thirds of the items where the two methods disagreed were reported as done by the standardised patient but determined to be not done according to the transcript. The area under a receiver operator characteristic curve, constructed by plotting the sensitivity and specificity values for each recorded visit, was 90% (see bmj.com).
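The calculation can be illustrated with a minimal sketch that treats the transcript as "truth", as in the analysis. The function names, the 0/1 coding, and the toy ratings are assumptions for this example, and the trapezoidal rule is one common way to approximate the area under a curve drawn through per-visit points; the text does not state how the area was computed.

```python
def sensitivity_specificity(sp_items, transcript_items):
    """Sensitivity and specificity of checklist ratings, taking the transcript as truth.

    Ratings are coded 1 = done, 0 = not done (assumed coding).
    """
    pairs = list(zip(sp_items, transcript_items))
    tp = sum(1 for s, t in pairs if t == 1 and s == 1)  # done, and reported done
    fn = sum(1 for s, t in pairs if t == 1 and s == 0)  # done, but reported not done
    tn = sum(1 for s, t in pairs if t == 0 and s == 0)  # not done, and reported not done
    fp = sum(1 for s, t in pairs if t == 0 and s == 1)  # not done, but reported done
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one done and one not done item in the transcript")
    return tp / (tp + fn), tn / (tn + fp)


def trapezoid_auc(visit_points):
    """Approximate area under an ROC curve through (sensitivity, specificity) points.

    Each point is converted to (1 - specificity, sensitivity); the corners (0, 0)
    and (1, 1) are added and the area is summed with the trapezoidal rule.
    """
    roc = {(0.0, 0.0), (1.0, 1.0)}
    roc.update((1.0 - spec, sens) for sens, spec in visit_points)
    ordered = sorted(roc)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ratings for ten checklist items from one visit (illustration only).
toy_sp = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
toy_transcript = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_sp, toy_transcript)
```

In these terms, an item reported as done by the standardised patient but not done according to the transcript is a false positive, which lowers specificity.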
Discussion
Although patients and physicians alike desire improved quality, accurate measurement of quality remains problematic. Comparisons of quality across physicians and sites are hampered by imperfect adjustments for variation in case mix. Also, the underlying data on quality are of uncertain validity, because of logistical and ethical difficulties in directly observing physicians while they care for patients. Measurement of quality has therefore relied largely on medical records, which at best are incomplete and at worst falsified.8 9 Standardised patients, despite being costly to train and implement, overcome the first problem by providing presentations that are perfectly adjusted for case mix. They may also be able to overcome the second problem, if their validity in the outpatient setting can be shown.
Many studies have turned to standardised patients when highly accurate measures of quality are needed.10 Standardised patients are particularly well suited to cross system comparisons, such as comparing general practice with walk-in care, and to assessing quality for potentially sensitive conditions such as sexually transmitted infections and HIV.11-13
Standardised patients are already considered the criterion standard for evaluating competence in specialties and have become part of national certification examinations in the United States. Although the accuracy of standardised patients is assumed to be high, it has not been prospectively evaluated.14 15
We found that standardised patients were well calibrated to actual recordings of clinical encounters. No apparent systematic bias was seen by medical condition, site, level of physicians' training, or domain of the encounter. Intermethod reliability was uniformly high. Standardised patients showed excellent sensitivity, specificity, and operating characteristics.
Limitations of the study
We assessed only verbal communication. In future studies doctors
may consent to unannounced visits that are video recorded. We did not
measure within-actor variation. In the medical education setting such
variation is managed by using standardised physicians to calibrate the
standardised patients.16 Such results show that
performances by a standardised patient are consistent from visit to
visit. We believe from anecdotal evidence that this was the case in our
study as well but have not measured it objectively.
Another issue that merits further study is how accurately standardised
patients can measure quality through a single encounter
or even a
short series of visits. Some studies suggest that a "first visit
bias" may skew assessment of quality, since chronic diseases typically necessitate several visits and ongoing follow
up.1 We deliberately used clinical scenarios that required
immediate interventions, and we are separately analysing those items
(particularly preventive care) that could be postponed to a future
visit. Future research might assess how well standardised patients'
measurements of quality for a few selected cases can comprehensively
assess an individual physician's overall
competence.5 17 18
Setting standards
Using standardised patients to measure quality raises the question
of how to set standards for what is considered adequate clinical
competence. Panels of expert judges have been shown to be reliable for
setting standards.19 The expert judges seem to use a
compensatory model, where very good performance on some cases
compensates for performing poorly on other cases.20 Analysis of the receiver operator characteristics of standardised patients has also been used to set standards in performance assessments of students at examination level. Receiver operator characteristic analysis shows that standardised patients can differentiate between disparate levels of competence, for example by accurately discriminating between second and fourth year medical students.21 22
Conclusions
We believe standardised patients are particularly useful to
validate innovative methods of quality measurement, such as
computerised clinical vignettes. Vignettes, like standardised patients,
inherently control for case mix variation; and, once validated against
actual clinical practice, vignettes can be more widely used because
they are cheaper and do not require subterfuge.23 Ultimately, accurate and affordable measurements of clinical practice underlie any effort to provide better quality for
patients.24
Acknowledgments
JL is assistant professor at the UCLA School of Public Health. JWP holds positions with the Veterans Affairs San Francisco Medical Center (staff physician), UCSF Department of Epidemiology and Biostatistics (associate professor), UCLA School of Public Health (associate professor), and RAND (senior social scientist). We thank the actors and the nurses, physicians, and staff at the study sites for their participation and Greer Rothman for preparation of the manuscript.
Contributors: See bmj.com
Footnotes
Funding: This research was funded by Grant IIR 98118-1 from the Veterans Affairs Health Services Research and Development Service. From July 1998 to June 2001 JWP was the recipient of a senior research associate career development award from the Department of Veterans Affairs.
Competing interests: None declared.
The full version of this article
appears on bmj.com
References
| 1. | Beullens J, Rethans JJ, Goedhuys J, Buntinx F. The use of standardized patients in research in general practice. Fam Pract 1997; 14: 58-62. |
| 2. | Rethans JJ, Martin E, Metsemakers J. To what extent do clinical notes by general practitioners reflect actual medical practice? A study using simulated patients. Br J Gen Pract 1994; 44: 153-156. |
| 3. | Kopelow ML, Schnabl GK, Hassard TH, Tamblyn RM, Klass DJ, Beazley G, et al. Assessing practicing physicians in two settings using standardized patients. Acad Med 1992; 67(10 suppl): S19-S21. |
| 4. | Woodward CA, McConvey GA, Neufeld V, Norman GR, Walsh A. Measurement of physician performance by standardized patients. Med Care 1985; 23: 1019-1027. |
| 5. | Glassman PA, Luck J, O'Gara EM, Peabody JW. Using standardized patients to measure quality: evidence from the literature and a prospective study. Jt Comm J Qual Improv 2000; 26: 644-653. |
| 6. | Carney PA, Dietrich AJ, Freeman Jr DH, Mott LA. The periodic health examination provided to asymptomatic older women: an assessment using standardized patients. Ann Intern Med 1993; 119: 129-135. |
| 7. | Badger LW, deGruy F, Hartman J, Plant MA, Leeper J, Ficken R, Templeton B, et al. Stability of standardized patients' performance in a study of clinical decision making. Fam Med 1995; 27: 126-131. |
| 8. | Luck J, Peabody JW, Dresselhaus TR, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med 2000; 108: 642-649. |
| 9. | Dresselhaus T, Luck J, Peabody JW. The ethical problem of false positives: a prospective evaluation of physician reporting in the medical record. J Med Ethics 2002; 5: 291-294. |
| 10. | Grant C, Nicholas R, Moore L, Salisbury C. An observational study comparing quality of care in walk-in centres with general practice and NHS Direct using standardised patients. BMJ 2002; 324: 1556. |
| 11. | Russell NK, Boekeloo BO, Rafi IZ, Rabin DL. Unannounced simulated patients' observations of physician STD/HIV prevention practices. Am J Prev Med 1992; 8: 235-240. |
| 12. | Russell NK, Boekeloo BO, Rafi IZ, Rabin DL. Using unannounced simulated patients to evaluate sexual risk assessment and risk reduction skills of practicing physicians. Acad Med 1991; 66: 87-95. |
| 13. | O'Hagan JJ, Botting CH, Davies LJ. The use of a simulated patient to assess clinical practice in the management of a high risk asthmatic. N Z Med J 1989; 102: 252-254. |
| 14. | Norman GR, Davis DA, Lamb S, Hanna E, Caulford P, Kaigas T. Competency assessment of primary care physicians as part of a peer review program. JAMA 1993; 270: 1046-1051. |
| 15. | Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical examination exercise): a preliminary investigation. Ann Intern Med 1995; 123: 795-799. |
| 16. | Finlay IG, Scott NC, Kinnersley P. The assessment of communication skills in palliative medicine: a comparison of the scores of examiners and simulated patients. Med Educ 1995; 29: 424-429. |
| 17. | Dauphinee WD. Assessing clinical performance: where do we stand and what might we expect. JAMA 1995; 274: 741-743. |
| 18. | Gordon JJ, Saunders NA, Hennrikus D, Sanson-Fisher RW. Interns' performances with simulated patients at the beginning and the end of the intern year. J Gen Intern Med 1992; 7: 57-62. |
| 19. | Ross L, Clauser B, Margolis MJ, Orr NA, Klass D. An expert judgment approach to setting standards for a standardized-patient examination. Acad Med 1996; 71(10 suppl): S4-S6. |
| 20. | Margolis MJ, De Champlain AF, Klass DJ. Setting examination-level standards for a performance-based assessment of physicians' clinical skills. Acad Med 1998; 73(10 suppl): S114-S116. |
| 21. | Colliver JA, Barnhart AJ, Marcy ML, Verhulst SJ. Using a receiver operating characteristic (ROC) analysis to set passing standards for a standardized-patient examination of clinical competence. Acad Med 1994; 69(10 suppl): S37-S39. |
| 22. | Van der Vleuten CP, Norman GR, De Graff E. Pitfalls in the pursuit of objectivity: issues of reliability. Med Educ 1991; 25: 110-118. |
| 23. | Peabody JW, Luck J, Glassman P, Dresselhaus TR, Lee M. Comparison of vignettes, standardized patients, and chart abstraction: a prospective validation study of 3 methods for measuring quality. JAMA 2000; 283: 1715-1722. |
| 24. | Fihn SD. The quest to quantify quality. JAMA 2000; 283: 1740-1742. |
(Accepted 1 August 2002)