This text is posted as supplied by the author
Table A Inter-rater Reliability
Question | Inter-rater Correlations*, Development Data Set | Inter-rater Correlations*, Validation Data Set |
Q1 Formulate Question | 0.98 | 0.89 |
Q2 Sources | 0.95 | 0.94 |
Q3 Searching | 0.89 | 0.92 |
Q4 Study Design | 0.90 | 0.96 |
Q5 Relevance | 0.76 | 0.84 |
Q6 Internal Validity | 0.85 | 0.72 |
Q7 Effect Size | 0.91 | 0.84 |
Q1 thru Q7 | 0.98 | 0.97 |
* All correlations are significant at the p<0.001 level.
Table B Item Analyses
Question | Item-Total Correlation* | Item Difficulty† | Item Discrimination‡ |
Q1 Question | 0.67 | 0.57 | 0.73 |
Q2 Sources | 0.47 | 0.68 | 0.41 |
Q3 Study Design | 0.58 | 0.41 | 0.59 |
Q4 Search | 0.71 | 0.63 | 0.68 |
Q5 Relevance | 0.50 | 0.34 | 0.45 |
Q6 Internal Validity | 0.61 | 0.73 | 0.68 |
Q7 Effect | 0.75 | 0.35 | 0.86 |
Q8 Sensitivity | 0.51 | 0.72 | 0.64 |
Specificity | 0.65 | 0.54 | 0.86 |
Positive Predictive Value | 0.56 | 0.55 | 0.73 |
Negative Predictive Value | 0.58 | 0.50 | 0.77 |
Likelihood Ratio Positive | 0.62 | 0.36 | 0.73 |
Q9 Absolute Risk Reduction | 0.63 | 0.59 | 0.77 |
Relative Risk Reduction | 0.60 | 0.42 | 0.68 |
Number Needed to Treat | 0.66 | 0.58 | 0.77 |
Q10 Confidence Interval | 0.75 | 0.55 | 0.82 |
Q11 Best Study Design, Diagnosis | 0.56 | 0.24 | 0.55 |
Q12 Best Study Design, Prognosis | 0.53 | 0.61 | 0.64 |
* r-values all significant at p<0.001.
†Values represent proportion of scores that exceeded "passing" for each item.
‡Values represent the difference in proportions of test takers answering correctly between those scoring in the upper 27% on total score and those scoring in the lower 27%. See text for detail.
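The two item statistics defined in the footnotes above can be sketched in a few lines of Python. This is only an illustration of the definitions as stated: the scores are hypothetical, the function names are invented here, and the way the 27% cut is rounded and ties are broken is an assumption, since the footnotes do not specify it.

```python
def item_difficulty(item_passed):
    """Proportion of test takers whose score on the item exceeded 'passing'."""
    return sum(item_passed) / len(item_passed)


def item_discrimination(total_scores, item_passed, fraction=0.27):
    """Pass proportion in the top group minus pass proportion in the bottom group.

    Groups are the upper and lower `fraction` of test takers ranked by total
    score. Rounding the group size and breaking ties by input order are
    assumptions of this sketch.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_passed[i] for i in upper) / k
    p_lower = sum(item_passed[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: 12 test takers, total scores and pass (1) / fail (0) on one item.
total_scores = [50, 62, 75, 88, 93, 101, 110, 124, 137, 150, 162, 175]
item_passed = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]

difficulty = item_difficulty(item_passed)
discrimination = item_discrimination(total_scores, item_passed)
print(f"difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
```

With 12 test takers the 27% groups contain 3 people each, so the discrimination value here compares the 3 highest and 3 lowest total scorers.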
Table C Proportions of those Passing by Group
Question | Novice % Pass | Expert % Pass | Chi Square p-Value |
Q1 Question | 27% | 80% | <0.001 |
Q2 Sources | 61% | 75% | 0.129 |
Q3 Study Design | 17% | 64% | <0.001 |
Q4 Search | 44% | 80% | 0.001 |
Q5 Relevance | 30% | 37% | 0.494 |
Q6 Internal Validity | 56% | 91% | <0.001 |
Q7 Effect | 12% | 58% | <0.001 |
Q8 Sensitivity | 60% | 84% | 0.018 |
Specificity | 33% | 76% | <0.001 |
Positive Predictive Value | 40% | 71% | 0.006 |
Negative Predictive Value | 35% | 66% | 0.007 |
Likelihood Ratio Positive | 15% | 58% | <0.001 |
Q9 Absolute Risk Reduction | 33% | 87% | <0.001 |
Relative Risk Reduction | 10% | 76% | <0.001 |
Number Needed to Treat | 30% | 87% | <0.001 |
Q10 Confidence Intervals | 26% | 86% | <0.001 |
Q11 Best Study Design, Diagnosis | 10% | 39% | 0.004 |
Q12 Best Study Design, Prognosis | 41% | 83% | <0.001 |
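For readers who want to see how a chi-square p-value for a novice versus expert pass-rate comparison is obtained, the following is a minimal sketch. The group sizes are not given in Table C, so the counts below are hypothetical, and the use of an uncorrected Pearson chi-square (no continuity correction) is an assumption rather than a statement of the authors' method.

```python
import math


def chi_square_2x2(pass_a, fail_a, pass_b, fail_b):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    Returns (chi2, p_value). Assumes every row and column total is nonzero.
    """
    n = pass_a + fail_a + pass_b + fail_b
    row_a, row_b = pass_a + fail_a, pass_b + fail_b
    col_pass, col_fail = pass_a + pass_b, fail_a + fail_b
    cells = [
        (pass_a, row_a, col_pass),
        (fail_a, row_a, col_fail),
        (pass_b, row_b, col_pass),
        (fail_b, row_b, col_fail),
    ]
    chi2 = 0.0
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / n
        chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 12 of 40 novices pass (30%), 32 of 40 experts pass (80%).
chi2, p_value = chi_square_2x2(12, 28, 32, 8)
print(f"chi2={chi2:.2f}, p<0.001: {p_value < 0.001}")
```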
Fresno Test of Evidence Based Medicine
Grading Rubrics (Form A)
The practice of Evidence-Based Medicine (EBM) involves some basic knowledge and skills related to searching and evaluating medical literature. This UCSF-Fresno Medical Education tool is designed to assess the level at which you are already utilizing EBM skills. Please complete the entire test in one sitting. There are 7 short answer questions, 2 questions that require a series of mathematical calculations, and three fill-in-the-blank questions. Allow yourself at least 30 minutes to complete the test.
Answer questions 1-4 based on the following clinical scenarios:
- You have just seen Lydia who recently delivered a healthy baby. She plans to breastfeed, but also wants to start oral contraception. You generally prefer to prescribe combination oral contraceptives (estrogen + progesterone) but you have been told that these might more negatively affect her breastmilk production than progesterone only pills.
- John is an 11-year-old boy who presents with primary enuresis. He has grown frustrated with the inconvenience and embarrassment of his problem. You have excluded the possibility of urinary tract anomalies and infection as possible causes. You consider recommending a bedwetting alarm, but a colleague tells you he thinks they’re "worthless" and suggests that you treat with imipramine or desmopressin.
1. Write a focused clinical question for each of these patient encounters that will help you organize a search of the clinical literature for an answer and choose the best article from among those you find.
Scoring Rubric for breast-feeding/contraception question. (When in doubt, consider whether what is written will contribute to an optimally specific search of the clinical literature.)
|
Population |
Intervention |
Comparison |
Outcome |
Excellent (3 pts) |
Multiple relevant descriptors e.g., "post partum woman," "breast feeding/lactating mother" or "breastfeeding mom desiring contraception," or "breast fed newborn" Note: "breastfeeding woman" is considered two descriptors. |
Includes specific intervention of interest; e.g. combined contraceptives (estrogen and progesterone), or specific individual components of contraception such as "estrogen" |
Identifies specific alternative of interest since pt. wants to use oral contraception e.g. progesterone only contraception |
Outcome that is objective and meaningful to patient e.g. infant growth rate, number of lactation "drop outs," or maternal satisfaction with infant satiety or milk flow |
Strong (2 pts) |
One appropriate descriptor as above examples e.g. "woman," or "infant" or "breastfeeding" |
Mentions contraception or type of intervention, e.g. oral contraceptives, or hormones |
Mentions a specific comparison group e.g. placebo, or a specific form of contraception, or mothers not taking OCPs |
Non-specific outcome e.g. "milk" or "breast feeding" OR Disease oriented outcome such as milk volume without accompanying measure of clinical relevance e.g. "milk volume" or "chemical composition of milk" or "breastmilk production" |
Limited (1 point) |
A single general descriptor unlikely to contribute to search e.g. "patient" |
Mentions intervention but unlikely to contribute to search e.g. "methods" "options" "treatments" |
Mentions comparison but unlikely to contribute to search e.g. "compared to other methods" (Note: Using a plural non-specific term, e.g. "various treatment options," should only be counted once, in the Intervention column) |
Reference to outcome, but so general as to be unlikely to contribute to search e.g. "effects" "change the outcome" |
Not Evident (0 pts) |
None of the above present |
None of the above present |
None of the above present |
None of the above present |
Scoring Rubric for bedwetting question. (When in doubt, consider whether what is written will contribute to an optimally specific search of the clinical literature.)
|
Population |
Intervention |
Comparison |
Outcome |
Excellent (3 pts) |
Multiple relevant descriptors e.g., "boy with primary enuresis," specific age group, gender, exclusion of infection or anatomic anomalies. Note: "primary enuresis" is considered two descriptors. |
Specific intervention of interest; e.g. bedwetting alarm |
Identifies specific alternative treatment of interest e.g. "desmopressin acetate" or "imipramine" or "anti-depressants" |
Outcome that is objective and meaningful to patient e.g. dry nights |
Strong (2 pts) |
One appropriate descriptor as above examples e.g. "enuresis" or "child" |
Mentions type of intervention without specifics e.g. "Behavior modification" |
Mentions a specific comparison group e.g. "placebo" or "medical treatment" or "no treatment" |
Disease oriented outcome without accompanying measure of clinical relevance e.g. "urine output" |
Limited (1 pt) |
A single general descriptor unlikely to contribute to search e.g. "patient" |
Mentions intervention but unlikely to contribute to search e.g. "methods" "options" "treatment" |
Mentions comparison but unlikely to contribute to search e.g. "compared to other methods" Note: Using a plural non-specific term, e.g. "various treatment options" should only be counted once, in the Intervention column. |
Reference to outcome, but so general as to be unlikely to contribute to search e.g., "effective," "improvement," "success" "change the outcome" |
Not Evident (0 pts) |
None of the above present |
None of the above present |
None of the above present |
None of the above present |
2. Where might clinicians go to find an answer to questions like these? Name as many possible types or categories of information sources as you can. You may feel that some are better than others, but discuss as many as you can to demonstrate your awareness of the strengths and weaknesses of common information sources in clinical practice. Describe the most important advantages and disadvantages for each type of information source you list.
|
Variety of Sources |
Convenience |
Clinical Relevance |
Validity |
Excellent (6 points) |
At least four types of sources listed. Types include:
|
Discussion includes at least 2 specific issues related to convenience, or mentions the same issue while discussing two different sources. Issues may include:
|
Discussion includes at least 2 specific issues related to relevance, or mentions the same issue while discussing two different sources. Issues may include:
|
Discussion includes at least 2 specific issues related to validity, or mentions the same issue while discussing two different sources. Issues may include:
|
Strong (4 points) |
Three types of sources listed. |
Includes 1 specific issue/explanation related to convenience |
Includes 1 specific issue/explanation related to relevance |
Includes 1 specific issue/explanation related to validity |
Limited (2 points) |
Two types of sources listed. |
Mentions convenience involved in using one or more source, but without explanation e.g. "convenient" or "easy" or "difficult" |
Mentions relevance of using one or more source, but without explanation e.g. "relevant" |
Mentions validity of using one or more source, but without explanation e.g. "good" "junk" |
Not Evident |
No variety. Only one source listed, or all sources of same type. |
No mention of convenience |
No mention of relevance |
No mention of validity |
3. Choose to focus on one of the clinical scenarios (breastfeeding and oral contraceptives, or bedwetting alarm). What type of study (study design) would best be able to address this question? Why?
|
Study Design |
Justification |
Excellent (12 pts) |
Names one of the best sources: Randomized Controlled Trial or Randomized Trial, Systematic Review or Meta-Analysis of RCTs, Randomized, Double Blinded Clinical Trial |
Includes well-reasoned justification that reflects understanding of the importance of randomization and/or blinding. Explicitly connects randomization to reduction of confounding and/or blinding to observer or measurement bias. e.g. "An RCT will attempt to avoid any bias which would influence the outcome of the study through randomization" OR "best suited for therapy questions because it reduces bias and controls for confounding factors." |
Strong (9 pts) |
Describes but does not call by name one of the best sources as above e.g. "comparing two groups, one gets treatment, other gets placebo…" |
Justification is present, and touches on issues related to randomization and/or blinding, but less clearly articulated e.g. "groups should be similar" or "try to eliminate confounding factors" or "avoid selection bias" or "to be objective" or "to eliminate bias" |
Limited (6 pts) |
Describes or names a less desirable study design: e.g. "Cohort study" or "Prospective clinical trial" or meta-analysis of such studies, "longitudinal" or "prospective" |
Justification is present, and raises legitimate issues unrelated to randomization or blinding, such as cost effectiveness, ethical concerns, recall bias. May mention randomization or blinding but without explanation. (e.g. "best in a random and blind setting") e.g. "impossible to recruit women to get a placebo instead of birth control" or "chart reviews provide lots of data without much cost" |
Minimal (3 pts) |
Describes or names a poor study design to answer a treatment question: e.g. case control, cross sectional study, case report, "retrospective" Or describes a study with insufficient detail to identify a design: e.g. a comparative study |
Attempted justification, but arguments are non-specific and do not demonstrate understanding of the relationship between the design and various threats to validity. May mention randomization or blinding but without explanation. (e.g. "best in a random and blind setting") e.g. "to ensure quality" or "to reduce potential conflicts" or "to compare" |
Not Evident (0 pts) |
None of above present |
None of above present |
4. If you were to search Medline for original research on one of these questions, describe what your search strategy would be. Be as specific as you can about which topics and search categories (fields) you would search. Explain your rationale for taking this approach. Describe how you might limit your search if necessary and explain your reasoning.
|
Search Terms |
Tags |
Delimiters |
Excellent (8 pts) |
3 or more terms that reflect patient, intervention, comparison, and outcome (PICO) being considered |
Description of search strategy reflects understanding that articles in database are indexed by more than one field. Discusses one or more field/index/tag by name (MeSH, Title Word, Publication Title, language, Keyword, author, Journal title, etc.) and provides plausible rationale for search strategy using 1 or more of these indices e.g. "keyword is less specific than MESH" |
Describes more than one approach to limiting search (e.g., "limit to human" or "adult" or "English"), names a specific publication type, or describes use of Clinical Queries in PubMed, or the use of Boolean operators or search combinations, or includes terms related to an optimal study design (e.g. randomized), or suggests use of subheadings. * NOTE: If the subject includes the name of the index when describing a delimiter (e.g. "check language as English") then we give credit for a tag as well as a method of delimiting. |
Strong (6 pts) |
2 terms from PICO |
Names 1 or more field or index category but does not provide plausible defense of search strategy based on this knowledge e.g. "I would do a keyword search…" |
Describes only 1 common method of limiting search |
Limited (3 pts) |
1 term from PICO |
NA |
NA |
Not evident (0 pts) |
Not present |
No evident understanding that articles are "tagged" by different fields or indices |
No valid techniques for limiting a search listed |
5. When you find a report of original research on these questions, what characteristics of the study will you consider to determine if it is relevant? Include examples. (Questions 6 and 7 will ask how to determine if the study is valid, and how important the findings are...for this question, focus on how to determine if it is really relevant to your practice.) (Questions 5-7 address critical review of literature divided into relevance, validity, and magnitude of effect size. These may be arbitrary subdivisions of the process of critical review. Therefore respondents may describe issues of relevance in answers to any of these 3 questions. Consider the responses to all 3 questions as one response when applying the criteria in the following rubric.)
|
The Question |
Description of Subjects |
Excellent (12 points) |
Well-reasoned and thoughtful discussion of the relevance of the independent and dependent variables used in the study including examples/specific reasons. May refer to:
e.g. "the test might work but if my practice can’t afford to buy the machine it doesn’t matter"
e.g. "did they measure dry nights after a week or after several months?" or "should measure infant growth, not just amount of milk produced"
|
Includes both:
e.g. "were the patients similar to mine in terms of age and race?" or "was it a hospital or clinic sample like my patients?" or "did patients have same level of disease severity as my patient?" or "did selection or inappropriate inclusion criteria result in a study population that differs from mine by race, age, etc." |
Strong (9 points) |
Less thoughtful discussion of the relevance of the independent and dependent variables used in the study. May include specific concepts or examples without clear rationale. May refer to:
e.g. "is it feasible?" or "can I actually use it?"
e.g. "look for patient-oriented outcomes" or "does the outcome matter to my patient?"
|
Includes one but not both:
e.g. "is the patient like mine?" or "education level of population" |
Limited (5 points) |
Response implies consideration of how well the study addresses the question at hand, but offers little discussion about why this may be important e.g. "what are the variables?"; "does it answer my question?"; "the outcome measure"; "the purpose of the study"; "will it impact my practice?"; "what type of OCP was used in the study?"; "length of follow-up" |
Response implies consideration of the study subjects, but offers no discussion of the connection between study subjects and target population or specific characteristics of the sample e.g. "is it an appropriate sample?" or "what was the response or participation rate?" or "what were the exclusion criteria?" or "selection bias" or "setting" or "where study was conducted" |
Not Evident (0 pts) |
No discussion of the research question and variables used to answer it. |
No discussion of the characteristics of the research subjects. |
6. When you find a report of original research on these questions, what characteristics of the study will you consider to determine if its findings are valid? Include examples. (You've already addressed relevance, and question 7 will ask how to determine the importance of the findings...for this question, focus on the validity of the study.) (Questions 5-7 address critical review of literature divided into relevance, validity, and magnitude of effect size. These may be arbitrary subdivisions of the process of critical review. Therefore respondents may describe issues of validity in answers to any of these 3 questions. Consider the responses to all 3 questions as one response when applying the criteria in the following rubric.)
|
Internal Validity |
Excellent (24 pts) |
Lists or describes at least 5 issues important to internal validity, such as:
|
Strong (18 points) |
Identifies 3-4 specific issues as above. |
Limited (10 pts) |
Identifies 2 specific issues as above. |
Minimal (5 points) |
Mentions internal validity or lists one specific concept from examples above. |
Not Evident (0 pts) |
None of the above present |
7. When you find a report of original research on these questions, what characteristics of the findings will you consider to determine their magnitude and significance? Include examples. (You’ve already addressed relevance and validity…for this question, focus on how to determine the size and meaning of an effect reported in the study.) (Questions 5-7 address critical review of literature divided into relevance, validity, and magnitude of effect size. These may be arbitrary subdivisions of the process of critical review. Therefore respondents may describe issues of magnitude and significance in answers to any of these 3 questions. Consider the responses to all 3 questions as one response when applying the criteria in the following rubric.)
|
Magnitude |
Statistical Significance |
Excellent (12 pts) |
Response must clearly discuss both:
|
Well-reasoned and thoughtful discussion of the indices of statistical significance, including at least 2 specific examples of important related concepts such as:
|
Strong (9 pts) |
Response discusses one but not both:
|
Lists more than one concept (as above) with insufficient or absent discussion (e.g. "p-value and confidence intervals") OR Lists and discusses only one concept (e.g. "p-value less than <.05") |
Limited (5 pts) |
Response only suggests consideration of clinical significance or size of effect. (e.g. "does it matter?" "will it impact my practice") |
Mentions need to assess statistical significance or names only one concept from above without further discussion (e.g. "p-values") |
Not Evident (0 pts) |
None of the above present |
None of the above present |
8. A recent study of the diagnostic accuracy of arterial blood gas in diagnosis of pulmonary embolus included 212 patients with suspected pulmonary embolus, 49 of whom were subsequently determined to have pulmonary embolus. Of those with pulmonary embolus 41 had abnormal alveolar-arterial oxygen gradient ((A-a)DO2). Of the 163 patients determined not to have pulmonary embolus, 118 had abnormal (A-a)DO2.
(4 points each)
- Based on these results, the sensitivity of (A-a)DO2 for pulmonary embolus is .837 OR 41/49
- Based on these results, the specificity of (A-a)DO2 for pulmonary embolus is .276 OR 45/163
- Based on these results, the positive predictive value of (A-a)DO2 for pulmonary embolus is .258 OR 41/159 OR 41/(41+118)
- Based on these results, the negative predictive value of (A-a)DO2 for pulmonary embolus is .849 OR 45/53 OR 45/(8+45)
- Based on these results, the likelihood ratio positive for an abnormal (A-a)DO2 for pulmonary embolus is 1.156 OR .84/(1-.28)
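The answer key for question 8 can be checked by laying the counts out as a 2x2 table. The sketch below is illustrative only: the function and variable names are invented here, and the cell counts are derived from the figures in the question (49 with pulmonary embolus, 41 of them test-positive; 163 without, 118 of them test-positive).

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard accuracy measures from the four cells of a 2x2 diagnostic table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
    }


# Cells derived from question 8.
tp = 41            # pulmonary embolus, abnormal (A-a)DO2
fn = 49 - 41       # pulmonary embolus, normal (A-a)DO2
fp = 118           # no pulmonary embolus, abnormal (A-a)DO2
tn = 163 - 118     # no pulmonary embolus, normal (A-a)DO2

metrics = diagnostic_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, these reproduce the values listed in the answer key.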
9. A recent randomized trial found that 29% of diabetics with coronary heart disease (CHD) treated with pravastatin suffered a recurrent coronary event during 5 years of follow-up, while 37% of the placebo group suffered recurrent coronary events.
(4 points each)
- The absolute risk reduction for recurrent events is 8% OR .37-.29
- The relative risk reduction for recurrent events is 22% OR (.37-.29)/.37 OR .08/.37 OR 1-(.29/.37)
- The number needed to treat (NNT) to prevent one recurrent event is 12.5 OR 1/.08 OR 1/(.37-.29)
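The three quantities in question 9 follow from the two event rates alone. A minimal sketch, with an invented function name and the rates taken from the question:

```python
def risk_reduction(control_rate, treated_rate):
    """Return (absolute risk reduction, relative risk reduction, number needed to treat)."""
    arr = control_rate - treated_rate
    rrr = arr / control_rate
    nnt = 1 / arr
    return arr, rrr, nnt


# Event rates from question 9: 37% on placebo, 29% on pravastatin.
arr, rrr, nnt = risk_reduction(0.37, 0.29)
print(f"ARR={arr:.2f}, RRR={rrr:.0%}, NNT={nnt:.1f}")
```

The answer key accepts the unrounded value of 12.5 for the number needed to treat.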
10. The recent HERS study compared women on estrogen supplements to women on placebo. Results revealed that the relative risk of venous thromboembolic events was 2.89 for the women on estrogen. This suggests that estrogen treatment poses a coronary risk, but we wonder if this difference is statistically significant, so we look at the confidence interval. Give an example of a confidence interval that would support the conclusion that the rate of venous thromboembolic events was indeed (statistically) different for these two treatment groups. (anything that encompasses 2.89 and doesn’t include 1.0) (4 points)
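The scoring criterion for question 10 (an interval that contains the point estimate of 2.89 and excludes 1.0) can be expressed as a simple check. This is a sketch with an invented function name, and the example intervals are hypothetical responses rather than values from the HERS study.

```python
def ci_supports_difference(lower, upper, point_estimate=2.89, null_value=1.0):
    """True if the interval contains the point estimate and excludes the null value."""
    contains_estimate = lower <= point_estimate <= upper
    excludes_null = not (lower <= null_value <= upper)
    return contains_estimate and excludes_null


# Hypothetical responses a grader might see.
print(ci_supports_difference(1.5, 5.6))   # contains 2.89, excludes 1.0
print(ci_supports_difference(0.8, 4.0))   # includes 1.0, so no credit
print(ci_supports_difference(3.0, 6.0))   # does not contain 2.89
```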
11. Which study design is best for a study about diagnosis? cross-sectional study OR "comparison of test with gold standard"
(4 points)
12. Which study design is best for a study about prognosis? cohort studies OR "prospective" OR "longitudinal"
(4 points)