Do general practitioners act consistently in real practice when they meet the same patient twice? examination of intradoctor variation using standardised (simulated) patients

Jan-Joost Rethans; Lars Saebu

doi:10.1136/bmj.314.7088.1170

General Practice


BMJ 1997; 314 doi: https://doi.org/10.1136/bmj.314.7088.1170 (Published 19 April 1997) Cite this as: BMJ 1997;314:1170

Jan-Joost Rethans, assistant professor^a,
Lars Saebu, general practitioner^b

^a Centre for Research on Quality Assurance in General Practice, Department of General Practice, University of Limburg, PO Box 616, 6200 MD Maastricht, Netherlands
^b Department of General Practice, University of Trondheim, Norway

Correspondence to: Dr Rethans

Accepted 17 February 1997

Abstract

Objective: To assess the variation within individual general practitioners facing the same problem twice in actual practice under unbiased conditions.

Design: General practitioners were consulted during normal surgery hours by a standardised patient portraying a patient with angina pectoris. Six weeks later the same general practitioners were consulted again by a similar standardised patient portraying a similar case. The standardised patients reported on the consultations.

Setting: Trondheim, Norway.

Subjects: Of 87 general practitioners invited by letter, 28 (32%) agreed to participate without hesitation; nine others (10%) wanted more information before consenting. From these 24 were selected and visited.

Main outcome measures: Number of actions undertaken from a guideline in both rounds of consultations. Duration of consultations.

Results: The mean (range, interquartile range) guideline score, total score, and duration of consultation were not significantly different between the first and second patient encounters for the group as a whole. For individual doctors the mean (SD) difference was −0.09 (3.36) for the guideline score, 0.30 (8.1) for the total score, and −0.87 (9.01) for consultation time.

Conclusions: The study shows that assessment of performance in real practice for a group of general practitioners is consistent from the first round of consultations to the second round. However, appreciable variation occurs in the performance of individual physicians.

Key messages

Variation in the performance of doctors is a potential problem in ensuring patients receive agreed best standards of care
This study assesses the intradoctor variation in treating two standardised patients presenting with similar conditions in real practice
For a group of general practitioners performance in the two consultations was consistent
The performance of individual doctors differed when facing the same problem twice

Introduction

Variation between doctors is a reflection of the individual's art of medicine but may also be a threat to the scientific basis of practice.1 Variation in performance may be studied between countries,2 regions,3 hospitals,4 practices, and doctors.5 6 To try to minimise the variation between doctors national bodies have produced guidelines for good medical practice, both for medical specialties and general practice.7

Variation of performance is an important consideration in assessment of competence of general practitioners. The performance of doctors varies across different medical problems.8 For example, a doctor's performance in dealing with a patient with a urinary tract infection does not predict his or her performance with a patient with diarrhoea. This phenomenon has been labelled content specificity9 and is one of the main reasons why doctors are examined on different areas of medicine and with different problems.10

When assessing doctors' management of a single problem we need to know whether the doctor consistently performs to the assessed standard. Intradoctor (or intraobserver) variation may lead to different results when a doctor is faced with an identical problem twice. Few studies have addressed this problem, and their results are ambiguous. When medical students and specialists were presented with a clinical problem twice by standardised patients the correlation between the two presentations was only 0.60.11 With medical students, test-retest reliability on the same station of an objective structured clinical examination was 0.66-0.88.8 In a study in which a single clinician independently assessed the same set of 100 fundus photographs twice, three months apart, 88 of 100 patients received identical assessments.12 Repetition of identical tasks by medical students within the same examination did not improve their scores.13 However, these studies were run in examination laboratory settings and may be biased, since the subjects knew they were being tested and were likely to recognise the second presentation. In addition, performance under examination circumstances may differ from performance in practice.14 To overcome these problems we did a study to find out whether, and to what extent, intradoctor variation (that is, the variation within a doctor facing a similar problem twice) exists in real life general practice under unbiased conditions.

Subjects and methods

We used standardised patients for this study because this method has proved to be reliable, valid, feasible, and acceptable in general practice.15 16 A standardised role of an elderly patient with angina pectoris was constructed. The role focused on the medical history, with no abnormal physical signs and normal laboratory and electrocardiographic findings. Two healthy women, aged 69 and 70, were selected as standardised patients and paid to participate. They signed a written agreement to use all medical and personal information about the general practitioners in the project strictly for research purposes.

The patients were trained to present a standardised complaint and to score history taking, physical and laboratory examination, instructions given to the patient, treatment, and follow up against a guideline on managing angina pectoris. This guideline was based on relevant general practice literature (such as the guidelines of the Dutch College of General Practitioners) and discussed with two experienced general practitioners and an experienced cardiologist.17 The guideline contained only items considered necessary to manage angina pectoris as presented by the standardised patients.

To ensure the reliability and consistency of scoring by the standardised patients we used standard procedures.18 19 In brief, reports by the standardised patients during training (before the first round and between the first and second rounds) were compared with reports by a panel of doctors on the same consultations. Both the reliability and the consistency κ scores were 0.85 (maximum κ=1.0). Several measures were used to assess the performance of the general practitioners. The first was a guideline score, the number of guideline items performed by the general practitioner in a consultation. The second was a total score, the number of all items (guideline plus non-guideline items) performed by the general practitioner in a consultation. Patients also recorded the duration of visits in minutes using a wristwatch with stopwatch facilities.
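The paper reports κ=0.85 but does not say which kappa statistic or unit of analysis was used. As a minimal sketch, assuming Cohen's kappa computed over checklist items rated performed (1) or not performed (0) by a standardised patient and by the doctor panel for the same consultation, agreement could be computed as below. The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same checklist items.

    Each argument is a sequence of category labels (for example 1 = item
    performed, 0 = not performed), one per item, in the same order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings of ten checklist items for one consultation.
patient_report = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
panel_report = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
kappa = cohens_kappa(patient_report, panel_report)  # about 0.74
```

Here the raters agree on 9 of 10 items (observed agreement 0.90), but because most items are marked "performed" by both, chance agreement is already 0.62, so kappa is noticeably lower than raw agreement.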

One year before the actual visits all 87 general practitioners in Trondheim, Norway, were informed by letter about the objectives of the study and invited to give written acceptance of standardised patients into their practices. The dates, number, and content of the visits were not mentioned. For budgetary reasons it was decided beforehand that 24 general practitioners would participate.

The standardised patients used their own health insurance identification papers and enlisted in the practices of the selected general practitioners using techniques reported earlier.16 20 The general practitioners were visited by the standardised patients in two rounds, in March and May 1994. Patient A visited 12 of them in the first round and the other 12 in the second round, while patient B visited the doctors in the reverse order. All participating general practitioners were thus presented with similar standardised presentations twice.
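The counterbalanced order of visits can be sketched as a schedule in which every doctor sees both patients, half in the order A then B and half in the order B then A. The random split and the name `allocate_visits` are illustrative assumptions; the paper does not say how the two groups of 12 doctors were formed.

```python
import random


def allocate_visits(doctor_ids, seed=None):
    """Return {doctor: (round 1 patient, round 2 patient)}.

    Half of the doctors (chosen at random here, an assumption) see
    patient A first and patient B second; the other half the reverse.
    """
    ids = list(doctor_ids)
    if len(ids) % 2:
        raise ValueError("an even number of doctors is needed for a balanced split")
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return {doc: ("A", "B") if i < half else ("B", "A") for i, doc in enumerate(ids)}


schedule = allocate_visits(range(1, 25), seed=1994)
```

Balancing the order in this way spreads any systematic difference between the two patients equally over the two rounds, so it cannot show up as a difference between rounds.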

The Wilcoxon signed rank test (paired design) was used to look for differences in the doctors' performances between the first and second rounds. To assess intradoctor variation, the scores of individual doctors on the two rounds were analysed by the Bland and Altman method.21 The Wilcoxon signed rank test (paired design) was also used to assess whether the two standardised patients showed any consistent difference in the way they scored consultations for the guideline score (the most important score).
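A minimal standard-library sketch of the paired Wilcoxon signed rank test follows, for readers who want to reproduce this kind of comparison. It drops zero differences, gives tied absolute differences their average rank, and uses the normal approximation with a tie correction; statistical packages may use exact p values for small samples, so results can differ slightly. The function name and the data are hypothetical.

```python
import math


def wilcoxon_signed_rank(first, second):
    """Paired Wilcoxon signed rank test using the normal approximation.

    Returns (W_plus, z, two_sided_p), where W_plus is the sum of ranks of
    positive differences (second minus first).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank absolute differences (1-based), averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return w_plus, z, p


# Hypothetical guideline scores for five doctors in rounds 1 and 2.
w, z, p = wilcoxon_signed_rank([10, 12, 15, 9, 14], [11, 10, 15, 12, 18])
```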

Results

Of the 87 doctors asked to participate, 53 (61%) replied. Twenty-eight (32%) answered yes without any further information; nine others asked for more information before agreeing. We selected 24 doctors from those who agreed. After a visit in the second round one general practitioner reported having identified the standardised patient. This left 23 general practitioners and 46 visits for analysis.

Table 1 shows the performance of general practitioners for each item of the guideline in each consultation. Table 2 gives the guideline and total scores and consultation times in the two rounds. We found no significant difference between the first and second round for any of the items or scores assessed. However, to assess intradoctor variation each physician's scores in the first round have to be compared with his or her own scores in the second round. The spread of these individual differences is indicated by the standard deviations in table 2. For example, the standard deviation of the difference in guideline score is 3.36, suggesting that a typical within doctor difference in the number of guideline items performed is around 3; the typical inconsistency in total score is around 8 and the typical difference in length of consultation around 9 minutes. These data indicate substantial intradoctor variation between the two rounds. Means (interquartile range) of the guideline scores for the two standardised patients were 16.22 (14 to 19) and 16.04 (14 to 18). These were not significantly different by Wilcoxon signed rank test (paired design), suggesting that the two patients did not differ consistently in their scoring.
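In the Bland and Altman approach, paired differences are summarised by their mean and the 95% limits of agreement, mean difference ± 1.96 SD of the differences, which assumes the differences are roughly normally distributed. A minimal sketch applying this to the mean (SD) differences reported in the abstract follows; the function names are hypothetical, and the sign convention (second round minus first) is an assumption, since the paper does not state it.

```python
import statistics


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """95% limits of agreement: mean difference +/- 1.96 SD of differences."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def bland_altman(first, second):
    """Mean and SD of paired differences (second minus first) and their limits."""
    diffs = [b - a for a, b in zip(first, second)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD; needs at least two pairs
    return mean_d, sd_d, limits_of_agreement(mean_d, sd_d)


# Mean (SD) differences as reported in the abstract for the 23 doctors.
reported = {
    "guideline score": (-0.09, 3.36),
    "total score": (0.30, 8.1),
    "consultation time (min)": (-0.87, 9.01),
}
for name, (mean_d, sd_d) in reported.items():
    low, high = limits_of_agreement(mean_d, sd_d)
    print(f"{name}: {low:.2f} to {high:.2f}")
# guideline score: -6.68 to 6.50
# total score: -15.58 to 16.18
# consultation time (min): -18.53 to 16.79
```

On this reading, roughly 95% of doctors would be expected to have a second guideline score between about 6.7 items lower and 6.5 items higher than their first, which makes the size of the intradoctor variation concrete.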

Table 1

Number of general practitioners performing items listed in the guideline for angina pectoris during consultations with two standardised patients


Table 2

Mean number of actions scored by standardised patients for consultations with 23 general practitioners and mean differences between two consultations


Discussion

We believe that this is the first study of intradoctor variation in real practice using standardised patients presenting similar problems. This design is the only way to ensure subjects do not know they are being observed, thus removing an important source of bias. In examination or test settings subjects would easily spot the second presentation.

Clearly, this study has some limitations. There were only 23 general practitioners and only one standardised problem was presented twice, resulting in 46 consultations. However, the few studies set in examination conditions that have used more comparisons have produced ambiguous results. Getting funding for a larger study incorporating more patients and comparisons would be difficult until a pilot study such as this one has been done. Only 32% of the doctors approached agreed to participate without further hesitation, which may mean that the participants reflect a more competent sample of general practitioners.

We believe, however, that our results are valid as the doctors were unaware that they were being assessed. The results show that the assessment of performance was consistent from the first round of consultations to the second round. This means that anyone wanting to give feedback to a group of practitioners on their management of a particular problem would probably need to do only one assessment. However, for assessment of the performance of a single physician the results are quite different. We found appreciable intradoctor variation in the management of the two patients. Analysis showed no consistent difference in scoring between the two standardised patients, so differences between the patients are unlikely to explain the results. A further study using more problems and more presentations of the same problem would give a better indication of whether intradoctor variation is a problem. This may in turn lead to reassessment of the way cases are sampled for the examination and licensing of doctors and for quality assessment.

Does the variation matter

A further question is to what extent the intradoctor variation found in this study is a problem. Different scoring methods (for example, a weighted score) might have produced different results. The panel that constructed the guideline thought all guideline items were essential and therefore distinguished only between these and non-guideline items. Earlier studies with standardised patients that used more differentiated scores (obligatory, intermediate, and superfluous items) found no differences between these scores.14 Our data should stimulate careful thinking about differentiated scoring of guidelines. Some may argue that only evidence based items are important to record in this type of study, but in general practice this might result in only one or two items per case. All other items then reflect the individual performance of a doctor.
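To illustrate why the scoring rule can matter, here is a sketch comparing an unweighted item count (the form used for the guideline score) with a weighted score. The item names, classes, and weights (obligatory 2, intermediate 1, superfluous 0) are hypothetical and are not taken from this study or from reference 14.

```python
# Hypothetical weights per item class; an unweighted count gives every class 1.
WEIGHTS = {"obligatory": 2, "intermediate": 1, "superfluous": 0}
UNWEIGHTED = {"obligatory": 1, "intermediate": 1, "superfluous": 1}


def checklist_score(items_done, item_class, weights):
    """Sum the weight of the class of every item performed in a consultation."""
    return sum(weights[item_class[item]] for item in items_done)


# Hypothetical checklist: item name -> class.
item_class = {"q1": "obligatory", "q2": "obligatory", "q3": "intermediate", "q4": "superfluous"}
visit_x = ["q1", "q2", "q4"]
visit_y = ["q1", "q3", "q4"]

unweighted = (checklist_score(visit_x, item_class, UNWEIGHTED),
              checklist_score(visit_y, item_class, UNWEIGHTED))  # (3, 3)
weighted = (checklist_score(visit_x, item_class, WEIGHTS),
            checklist_score(visit_y, item_class, WEIGHTS))  # (4, 3)
```

Two visits that look identical on an item count can differ once items are weighted (and vice versa), so the amount of intradoctor variation observed depends partly on the scoring rule chosen.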

To try to explain the differences in the results of individual general practitioners between the two consultations we carried out some secondary analyses, for example to determine whether outcomes differed for visits before and after lunch. These analyses all gave negative results. Two consultations in our data lasted 40 minutes, which is unusually long. Although we do not know exactly what happened in these consultations, it seems likely that the doctor received a telephone call during these visits. Since these 40 minute consultations could have a relatively large effect on the inconsistency in consultation duration, we repeated the calculations for duration of visits both without these consultations and with the 40 minutes replaced by 30 minutes (30 minutes being the second longest consultation). Although the standard deviations were reduced to 5.78 (without these visits) and to 6.99 (with 30 minutes substituted), the conclusions remained the same. We discussed our results with several groups of general practitioners and received reactions such as “this is just real practice and so it should be” or “on Monday after a sleepless night doctors perform differently from Tuesdays after a good rest.”
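The sensitivity analysis described above can be sketched as follows, using made-up paired durations because the individual data are not reported. Dropping a 40 minute visit is implemented here as dropping that doctor's pair, since a within-doctor difference needs both visits; this is an assumed reading of "without these consultations". The function names and data are hypothetical.

```python
import statistics


def sd_of_differences(pairs):
    """Sample SD of within-doctor differences (second minus first)."""
    return statistics.stdev([second - first for first, second in pairs])


def duration_sensitivity(pairs, threshold=40, cap=30):
    """Recompute the SD of differences three ways.

    all:      every doctor's pair of durations as recorded
    excluded: doctors with any visit at or above `threshold` minutes dropped
    capped:   durations above `cap` minutes replaced by `cap`
    """
    kept = [p for p in pairs if max(p) < threshold]
    capped = [tuple(min(x, cap) for x in p) for p in pairs]
    return {
        "all": sd_of_differences(pairs),
        "excluded": sd_of_differences(kept),
        "capped": sd_of_differences(capped),
    }


# Hypothetical (first, second) durations in minutes for six doctors.
durations = [(10, 12), (15, 11), (8, 40), (12, 14), (40, 9), (11, 10)]
result = duration_sensitivity(durations)
```

As in the study, both variants shrink the standard deviation, because a single very long visit produces a large within-doctor difference that dominates the spread.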

In conclusion this study shows that intradoctor variation occurs in day to day practice. The implications of this variation remain undetermined, and documentation of what is really going on in doctors' surgeries remains a great challenge.

Acknowledgements

We thank Arnold Kester (department of biostatistics, University of Limburg) and the General Practitioners Writers Association (in particular Professor Robin Hull) for their help with this paper.

Funding: Norwegian Fund for Quality Assurance (Kvalitetssikringsfondet), grant number 93007.

Conflict of interest: None.

References

1. Anderson TF, Mooney G, eds. The challenge of medical practice variation. London: Macmillan, 1990.
2. McPherson K, Strong PM, Epstein A, Jones L. Regional variations in the use of common surgical procedures: within and between England and Wales, Canada and the United States of America. Soc Sci Med 1981;18:273–88.
3. Wennberg J, McPherson K, Caper P. Will payment based on diagnostic-related groups control hospital costs? N Engl J Med 1984;311:295–300.
4. McPherson K. Variation in hospital rates: why and how to study them. In: Ham C, ed. Health care variations: assessing the evidence. London: King's Fund Institute, 1988:120–34.
5. Marinus A. Inter-doktervariatie in de huisartspraktijk [Inter-doctor variation in general practice]. Meppel: Krips Repro, 1993. (Dissertation with summary in English.)
6. Rethans JJ, Sturmans F, Drop R, Vleuten vd C. Assessment of the performance of general practitioners by the use of standardised (simulated) patients. Br J Gen Pract 1991;41:97–9.
7. Grimshaw J, Russell I. Achieving health gain through clinical guidelines. I. Developing scientifically valid guidelines. Quality in Health Care 1993;2:243–8.
8. Roberts J, Norman G. Reliability and learning from the objective structured clinical examination. Med Educ 1990;24:219–23.
9. Elstein A, Shulman L, Sprafka S. Medical problem solving. Cambridge, MA: Harvard University Press, 1978.
10. Newble D, Jolly B, Wakeford R, eds. The certification and recertification of doctors. Issues in the assessment of clinical competence. Cambridge: Cambridge University Press, 1994.
11. Norman GR, Tugwell P, Feightner JW, Muzzin LJ, Jacoby LL. Knowledge and clinical problem-solving. Med Educ 1985;19:344–56.
12. Aoki N, Horibe H, Ohno Y, Hayakawa N, Kondo R. Epidemiological evaluation of funduscopic findings in cerebral diseases. III. Observer variability and reproducibility for funduscopic findings. Jpn Circ J 1977;41:11.
13. Hodder RV, Rivington RN, Calcutt LE, Hart IR. The effectiveness of immediate feedback during the objective structured clinical examination. Med Educ 1989;23:184–8.
14. Rethans JJ, Sturmans F, Drop R, Vleuten vd C, Hobus P. Does competence of general practitioners predict their performance? Comparison between examination setting and actual practice. BMJ 1991;303:1377–80.
15. Vleuten vd CPM, Swanson DB. Assessment of clinical skills with standardised patients: state of the art. Teaching and Learning in Medicine 1990;2:58–76.
16. Rethans JJ, Drop R, Sturmans F, Vleuten vd C. A method for introducing standardised (simulated) patients into general practice consultations. Br J Gen Pract 1991;41:94–6.
17. Rutten FM, Bohnen AM, Hufman P, Bruinsma M, Leerink HJG, Strootman FA, et al. NHG standard Angina Pectoris. Huisarts Wet 1994;37:398–406.
18. McClure CI, Gall EP, Meredith KE, Gooden MA, Boyer JT. Assessing clinical judgement with standardised patient. J Fam Pract 1985;20:457–64.
19. Rethans JJ, Boven van CPA. Simulated patients in general practice: a different look at the consultation. BMJ 1987;294:809–12.
20. Saebu L, Rethans JJ, Johannessen T, Westin S. Standardiserte pasienter i allmennpraksis. En ny metode for kvalitetssikring i Norge [Standardised patients in general practice. A new method for quality assurance in Norway]. Tidsskr Nor Laegeforen 1995;115:3117–9.
21. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;i:307–10.
