BMJ 2002;325:679-682 ( 28 September )

Papers

Using standardised patients to measure physicians' practice: validation study using audio recordings

Jeff Luck, assistant professor a, John W Peabody, deputy director b

a Veterans Administration, Greater Los Angeles Healthcare System, 11301 Wilshire Blvd, Los Angeles, CA 90073, USA; b Institute for Global Health, 74 New Montgomery St, San Francisco, CA 94105, USA

Correspondence to: John W Peabody
peabody{at}psg.ucsf.edu


    Abstract

Objective: To assess the validity of standardised patients to measure the quality of physicians' practice.
Design: Validation study of standardised patients' assessments. Physicians saw unannounced standardised patients presenting with common outpatient conditions. The standardised patients covertly tape recorded their visit and completed a checklist of quality criteria immediately afterwards. Their assessments were compared against independent assessments of the recordings by a trained medical records abstractor.
Setting: Four general internal medicine primary care clinics in California.
Participants: 144 randomly selected consenting physicians.
Main outcome measures: Rates of agreement between the patients' assessments and independent assessment.
Results: 40 visits, one per standardised patient, were recorded. The overall rate of agreement between the standardised patients' checklists and the independent assessment of the audio transcripts was 91% (kappa = 0.81). Disaggregating the data by medical condition, site, level of physicians' training, and domain (stage of the consultation) gave similar rates of agreement. Sensitivity of the standardised patients' assessments was 95%, and specificity was 85%. The area under the receiver operator characteristic curve was 90%.
Conclusions: Standardised patients' assessments seem to be a valid measure of the quality of physicians' care for a variety of common medical conditions in actual outpatient settings. Assessments by properly trained standardised patients compare well with independent assessment of recordings of the consultations, which may justify their use as a "gold standard" in comparing the quality of care across sites or in evaluating data obtained from other sources, such as medical records and clinical vignettes.

What is already known on this topic
Standardised patients are valid and reliable reporters of physicians' practice in the medical education setting

However, validating standardised patients' measurements of quality of care in actual primary practice is more difficult and has not been done in a prospective study

What this study adds
Reports of physicians' quality of care by unannounced standardised patients compare well with independent assessment of the consultations




    Introduction

Standardised patients are increasingly used to assess the quality of medical practice.1-4 They offer the advantage of measuring quality while completely controlling for variation in case mix.5 6 Although standardised patients have long been used to evaluate medical students and residents, their use in actual clinical settings is relatively new.7 Validating the use of standardised patients to measure quality in the actual practice setting is, however, challenging and to our knowledge has not been done. Direct observation in the clinic is difficult for a variety of reasons, including cost, a potential Hawthorne effect (physicians performing better under observation), and ethical problems linked to informed consent. We did a validation study to determine whether standardised patients perform as well in the clinic as they do in medical education settings.3


    Methods

Setting
The study sites were four general internal medicine primary care clinics in California. All staff physicians, teaching physicians, and second or third year residents were eligible. Of the 163 eligible physicians, 144 consented to see standardised patients at some time during the 1999-2000 and 2000-1 academic years. We used the sampling function of Stata to randomly select consenting physicians to whom standardised patients would present with one of eight different clinical cases, two cases each for four common outpatient conditions (box).
Clinical scenarios portrayed by standardised patients

  • Chronic obstructive pulmonary disease with a mild exacerbation and history of hypertension
  • Chronic obstructive pulmonary disease with an exacerbation associated with productive sputum, slight fever, and past history of hypertension
  • Type 2 diabetes with limited preventive care in the past and untreated hypercholesterolaemia
  • Poorly controlled type 2 diabetes and early renal damage
  • Congestive heart failure secondary to long standing hypertension and non-compliance with treatment
  • New onset amaurosis fugax in patient with multiple risk factors
  • Depression in an older patient with no other major clinical illness
  • Depression complicated by substance abuse

Training of standardised patients
We trained 45 professional actors, approximately six per case scenario, as standardised patients. The training protocol involved several steps and is described in detail elsewhere.5 The actors were trained to complete a checklist of 35-45 items that might be performed or discussed by the physician (see bmj.com). The actors completed the checklist immediately after the visit by marking each item as done or not done. Checklist items were based on quality measurement criteria derived from national guidelines on specific conditions and were arrived at by expert panel review and a modified Delphi technique.

Audio recording of visits
Of the 45 trained actors we successfully recorded 40, using a digital "pen" recorder concealed on the patient. Each actor was recorded once. In 27 of the 40 successfully recorded visits the physicians reported that they had detected the standardised patients. The number of visits was similar across study sites, conditions, and physicians' level of training. To minimise potential variation in performance, we asked the actors to wear the recorder for visits that were not recorded. A trained medical records abstractor scored each transcribed recording using the same quality criteria as in the standardised patients' checklist.

Analysis
A total of 1258 quality measurement items were compared. The items were aggregated into four domains corresponding to stages of a visit: history taking, physical examination, diagnosis, and treatment and management. We calculated the percentage of items in agreement between the standardised patients' checklists and the recording transcripts. We calculated kappa values to further quantify the degree of agreement. Percentage agreement and kappa values were disaggregated by condition, site, physicians' level of training, and domain. A calibration curve was constructed to assess variation across actors. Sensitivity and specificity were calculated for each visit and for all visits combined, taking the audio recording as "truth" in the calculation.
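The agreement measures described above can be illustrated with a minimal sketch. It assumes each checklist item is coded 1 (done) or 0 (not done) by both the standardised patient and the abstractor, uses Cohen's kappa for two raters, and treats the audio based rating as the reference. The function name `agreement_stats` and the example ratings are hypothetical and are not the study's code or data.

```python
import math
from typing import Dict, Sequence


def agreement_stats(sp: Sequence[int], audio: Sequence[int]) -> Dict[str, float]:
    """Compare standardised patient ratings with audio based ratings.

    Both inputs are item level ratings coded 1 (done) or 0 (not done).
    The audio rating is treated as the reference ("truth").
    """
    if len(sp) != len(audio) or len(sp) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(sp, audio))
    tp = sum(1 for s, a in pairs if s == 1 and a == 1)  # both say done
    tn = sum(1 for s, a in pairs if s == 0 and a == 0)  # both say not done
    fp = sum(1 for s, a in pairs if s == 1 and a == 0)  # patient says done, audio does not
    fn = sum(1 for s, a in pairs if s == 0 and a == 1)  # audio says done, patient does not
    n = len(pairs)

    observed = (tp + tn) / n
    # Agreement expected by chance, from each rater's marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else math.nan

    return {
        "percent_agreement": 100 * observed,
        "kappa": kappa,
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
    }


# Invented ratings for ten checklist items from one visit.
sp_items = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
audio_items = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(agreement_stats(sp_items, audio_items))
```

Pooling the item level ratings from all visits before calling the function would give combined figures, whereas calling it once per visit would give per visit sensitivity and specificity.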




    Results

The overall rate of agreement between corresponding items on the standardised patients' checklists and the recording transcripts was 91% (kappa = 0.81) (table 1). Rates of agreement varied little by medical condition, site, level of physicians' training, or domain. The figure shows the variation among standardised patients. This calibration curve plots the percentage of checklist items done by the physician as noted by the standardised patient against the corresponding percentage indicated by the audio recording of that visit. Points cluster closely along the plotted regression line, which has an intercept of 0.4% and a slope of 1.03. (Perfect calibration would yield a line with intercept of 0% and a slope of 1.00.)
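How such a calibration line might be fitted can be sketched as follows, assuming ordinary least squares with the audio based percentage on the horizontal axis and the standardised patient's percentage on the vertical axis. The fitting method is an assumption, as the text does not name it. The function name `fit_calibration_line` is hypothetical, and the four data points are invented so that they lie exactly on a line with the reported intercept and slope.

```python
from typing import Sequence, Tuple


def fit_calibration_line(
    audio_pct: Sequence[float], sp_pct: Sequence[float]
) -> Tuple[float, float]:
    """Least squares fit of sp_pct = intercept + slope * audio_pct."""
    if len(audio_pct) != len(sp_pct) or len(audio_pct) < 2:
        raise ValueError("need at least two paired observations")

    n = len(audio_pct)
    mean_x = sum(audio_pct) / n
    mean_y = sum(sp_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in audio_pct)
    if sxx == 0:
        raise ValueError("audio percentages must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(audio_pct, sp_pct))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented per visit percentages (not study data), one pair per visit.
audio = [40.0, 50.0, 60.0, 70.0]
patient = [41.6, 51.9, 62.2, 72.5]
intercept, slope = fit_calibration_line(audio, patient)
print(f"intercept = {intercept:.1f}%, slope = {slope:.2f}")
```

An intercept near 0 and a slope near 1 would indicate that the standardised patients neither systematically over-report nor under-report items relative to the recordings.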


                              
 

Table 1. Agreement (%) between standardised patients' assessments and audio recordings of consultations



 
Percentage of items on checklist done by physician, as rated by standardised patients and as indicated by audio recordings of visits

Sensitivity of standardised patients' assessments, compared against the audio recording transcripts, was 95%, and specificity was 85% (table 2). About two thirds of the items where the two methods disagreed were reported as done by the standardised patient but determined to be not done according to the transcript. The area under a receiver operator characteristic curve, constructed by plotting the sensitivity and specificity values for each recorded visit, was 90% (see bmj.com).
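One way to obtain an area from per visit sensitivity and specificity values is sketched below. It assumes each visit contributes a single point (1 - specificity, sensitivity), that the points are joined in order of false positive rate together with the corners (0, 0) and (1, 1), and that the area is taken by the trapezoidal rule. This construction is an assumption, since the full method is described only on bmj.com. The function name `roc_area` and the example values are hypothetical.

```python
from typing import Iterable, Tuple


def roc_area(visits: Iterable[Tuple[float, float]]) -> float:
    """Trapezoidal area under points built from (sensitivity, specificity) pairs.

    Values are proportions between 0 and 1. Each pair becomes the point
    (false positive rate, true positive rate) = (1 - specificity, sensitivity).
    """
    points = sorted((1.0 - spec, sens) for sens, spec in visits)
    # Anchor the curve at the two corners of the unit square.
    points = [(0.0, 0.0)] + points + [(1.0, 1.0)]

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented values for three visits (not study data).
example_visits = [(0.95, 0.85), (0.90, 0.80), (1.00, 0.75)]
print(f"area = {roc_area(example_visits):.3f}")
```

Because points from different visits need not form a monotone curve, this is only a rough summary. With a single point the area reduces to the mean of sensitivity and specificity.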


                              
 

Table 2. Sensitivity and specificity of standardised patients' assessments, with respect to audio recordings of consultations




    Discussion

Although patients and physicians alike desire improved quality, accurate measurement of quality remains problematic. Comparisons of quality across physicians and sites are hampered by imperfect adjustments for variation in case mix. Also, the underlying data on quality are of uncertain validity, because of logistical and ethical difficulties in directly observing physicians while they care for patients. Measurement of quality has therefore relied largely on medical records, which at best are incomplete and at worst falsified.8 9 Standardised patients, despite being costly to train and implement, overcome the first problem by providing presentations that are perfectly adjusted for case mix. They may also be able to overcome the second problem, if their validity in the outpatient setting can be shown.

Many studies have turned to standardised patients when highly accurate measures of quality are needed.10 Standardised patients are particularly well suited to comparisons across systems, such as comparing general practice with walk-in care, and to assessing quality of care for potentially sensitive conditions such as sexually transmitted infections and HIV.11-13

Standardised patients are already considered the criterion standard for evaluating competence in specialties and have become part of national certification examinations in the United States. Although the accuracy of standardised patients is assumed to be high, it has not been prospectively evaluated.14 15

We found that standardised patients were well calibrated to actual recordings of clinical encounters. No apparent systematic bias was seen by medical condition, site, level of physicians' training, or domain of the encounter. Intermethod reliability was uniformly high. Standardised patients showed excellent sensitivity, specificity, and operating characteristics.

Limitations of the study
We assessed only verbal communication. In future studies doctors may consent to unannounced visits that are video recorded. We did not measure within-actor variation. In the medical education setting such variation is managed by using standardised physicians to calibrate the standardised patients.16 Results from that setting show that performances by a standardised patient are consistent from visit to visit. We believe from anecdotal evidence that this was the case in our study as well but have not measured it objectively.

Another issue that merits further study is how accurately standardised patients can measure quality through a single encounter, or even a short series of visits. Some studies suggest that a "first visit bias" may skew assessment of quality, since chronic diseases typically necessitate several visits and ongoing follow up.1 We deliberately used clinical scenarios that required immediate interventions, and we are separately analysing those items (particularly preventive care) that could be postponed to a future visit. Future research might assess how well standardised patients' measurements of quality for a few selected cases can comprehensively assess an individual physician's overall competence.5 17 18

Setting standards
Using standardised patients to measure quality raises the question of how to set standards for what is considered adequate clinical competence. Panels of expert judges have been shown to be reliable for setting standards.19 The expert judges seem to use a compensatory model, where very good performance on some cases compensates for performing poorly on other cases.20 Analysis of the receiver operator characteristics of standardised patients has also been used to set standards in performance assessments of students at examination level. Receiver operator characteristic analysis shows that standardised patients can differentiate between disparate levels of competence, for example by accurately discriminating between second and fourth year medical students.21 22

Conclusions
We believe standardised patients are particularly useful to validate innovative methods of quality measurement, such as computerised clinical vignettes. Vignettes, like standardised patients, inherently control for case mix variation; and, once validated against actual clinical practice, vignettes can be more widely used because they are cheaper and do not require subterfuge.23 Ultimately, accurate and affordable measurements of clinical practice underlie any effort to provide better quality for patients.24



    Acknowledgments

JL is assistant professor at the UCLA School of Public Health. JWP holds positions with the Veterans Affairs San Francisco Medical Center (staff physician), UCSF Department of Epidemiology and Biostatistics (associate professor), UCLA School of Public Health (associate professor), and RAND (senior social scientist). We thank the actors and the nurses, physicians, and staff at the study sites for their participation and Greer Rothman for preparation of the manuscript.

Contributors: See bmj.com

    Footnotes

Funding: This research was funded by Grant IIR 98118-1 from the Veterans Affairs Health Services Research and Development Service. From July 1998 to June 2001 JWP was the recipient of a senior research associate career development award from the Department of Veterans Affairs.

Competing interests: None declared.

The full version of this article appears on bmj.com


    References

1. Beullens J, Rethans JJ, Goedhuys J, Buntinx F. The use of standardized patients in research in general practice. Fam Pract 1997; 14: 58-62.
2. Rethans JJ, Martin E, Metsemakers J. To what extent do clinical notes by general practitioners reflect actual medical practice? A study using simulated patients. Br J Gen Pract 1994; 44: 153-156.
3. Kopelow ML, Schnabl GK, Hassard TH, Tamblyn RM, Klass DJ, Beazley G, et al. Assessing practicing physicians in two settings using standardized patients. Acad Med 1992; 67(10 suppl): S19-S21.
4. Woodward CA, McConvey GA, Neufeld V, Norman GR, Walsh A. Measurement of physician performance by standardized patients. Med Care 1985; 23: 1019-1027.
5. Glassman PA, Luck J, O'Gara EM, Peabody JW. Using standardized patients to measure quality: evidence from the literature and a prospective study. Jt Comm J Qual Improv 2000; 26: 644-653.
6. Carney PA, Dietrich AJ, Freeman Jr DH, Mott LA. The periodic health examination provided to asymptomatic older women: an assessment using standardized patients. Ann Intern Med 1993; 119: 129-135.
7. Badger LW, deGruy F, Hartman J, Plant MA, Leeper J, Ficken R, Templeton B, et al. Stability of standardized patients' performance in a study of clinical decision making. Fam Med 1995; 27: 126-131.
8. Luck J, Peabody JW, Dresselhaus TR, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med 2000; 108: 642-649.
9. Dresselhaus T, Luck J, Peabody JW. The ethical problem of false positives: a prospective evaluation of physician reporting in the medical record. J Med Ethics 2002; 5: 291-294.
10. Grant C, Nicholas R, Moore L, Salisbury C. An observational study comparing quality of care in walk-in centres with general practice and NHS Direct using standardised patients. BMJ 2002; 324: 1556.
11. Russell NK, Boekeloo BO, Rafi IZ, Rabin DL. Unannounced simulated patients' observations of physician STD/HIV prevention practices. Am J Prev Med 1992; 8: 235-240.
12. Russell NK, Boekeloo BO, Rafi IZ, Rabin DL. Using unannounced simulated patients to evaluate sexual risk assessment and risk reduction skills of practicing physicians. Acad Med 1991; 66: 87-95.
13. O'Hagan JJ, Botting CH, Davies LJ. The use of a simulated patient to assess clinical practice in the management of a high risk asthmatic. N Z Med J 1989; 102: 252-254.
14. Norman GR, Davis DA, Lamb S, Hanna E, Caulford P, Kaigas T. Competency assessment of primary care physicians as part of a peer review program. JAMA 1993; 270: 1046-1051.
15. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical examination exercise): a preliminary investigation. Ann Intern Med 1995; 123: 795-799.
16. Finlay IG, Scott NC, Kinnersley P. The assessment of communication skills in palliative medicine: a comparison of the scores of examiners and simulated patients. Med Educ 1995; 29: 424-429.
17. Dauphinee WD. Assessing clinical performance: where do we stand and what might we expect. JAMA 1995; 274: 741-743.
18. Gordon JJ, Saunders NA, Hennrikus D, Sanson-Fisher RW. Interns' performances with simulated patients at the beginning and the end of the intern year. J Gen Intern Med 1992; 7: 57-62.
19. Ross L, Clauser B, Margolis MJ, Orr NA, Klass D. An expert judgment approach to setting standards for a standardized-patient examination. Acad Med 1996; 71(10 suppl): S4-S6.
20. Margolis MJ, De Champlain AF, Klass DJ. Setting examination-level standards for a performance-based assessment of physicians' clinical skills. Acad Med 1998; 73(10 suppl): S114-S116.
21. Colliver JA, Barnhart AJ, Marcy ML, Verhulst SJ. Using a receiver operating characteristic (ROC) analysis to set passing standards for a standardized-patient examination of clinical competence. Acad Med 1994; 69(10 suppl): S37-S39.
22. Van der Vleuten CP, Norman GR, De Graff E. Pitfalls in the pursuit of objectivity: issues of reliability. Med Educ 1991; 25: 110-118.
23. Peabody JW, Luck J, Glassman P, Dresselhaus TR, Lee M. Comparison of vignettes, standardized patients, and chart abstraction: a prospective validation study of 3 methods for measuring quality. JAMA 2000; 283: 1715-1722.
24. Fihn SD. The quest to quantify quality. JAMA 2000; 283: 1740-1742.

(Accepted 1 August 2002)


© BMJ 2002





