# Quantifying possible bias in clinical and epidemiological studies with quantitative bias analysis: common approaches and limitations

BMJ 2024; 385 doi: https://doi.org/10.1136/bmj-2023-076365 (Published 02 April 2024) Cite this as: BMJ 2024;385:e076365

- Jeremy P Brown, doctoral researcher^{1},
- Jacob N Hunnicutt, director^{2},
- M Sanni Ali, assistant professor^{1},
- Krishnan Bhaskaran, professor^{1},
- Ashley Cole, director^{3},
- Sinead M Langan, professor^{1},
- Dorothea Nitsch, professor^{1},
- Christopher T Rentsch, associate professor^{1},
- Nicholas W Galwey, statistics leader^{4},
- Kevin Wing, assistant professor^{1},
- Ian J Douglas, professor^{1}

^{1}Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK

^{2}Epidemiology, Value Evidence and Outcomes, R&D Global Medical, GSK, Collegeville, PA, USA

^{3}Real World Analytics, Value Evidence and Outcomes, R&D Global Medical, GSK, Collegeville, PA, USA

^{4}R&D, GSK Medicines Research Centre, GSK, Stevenage, UK

- Correspondence to: J P Brown jeremy.brown{at}lshtm.ac.uk (or @jeremy_pbrown on X)

- Accepted 12 February 2024

Bias in epidemiological studies can adversely affect the validity of study findings. Sensitivity analyses, known as quantitative bias analyses, are available to quantify potential residual bias arising from measurement error, confounding, and selection into the study. Effective application of these methods benefits from the input of multiple parties including clinicians, epidemiologists, and statisticians. This article provides an overview of a few common methods to facilitate both the use of these methods and critical interpretation of applications in the published literature. Examples are given to describe and illustrate methods of quantitative bias analysis. This article also outlines considerations to be made when choosing between methods and discusses the limitations of quantitative bias analysis.

Bias in epidemiological studies is a major concern. Biased studies have the potential to mislead, and as a result to negatively affect clinical practice and public health. The potential for residual systematic error due to measurement bias, confounding, or selection bias is often acknowledged in publications but is seldom quantified.1 Therefore, for many studies it is difficult to judge the extent to which residual bias could affect study findings, and how confident we should be about their conclusions. Increasingly large datasets with millions of patients are available for research, such as insurance claims data and electronic health records. With increasing dataset size, random error decreases but bias remains, potentially leading to incorrect conclusions.

Sensitivity analyses to quantify potential residual bias are available.234567 However, use of these methods is limited. Effective use typically requires input from multiple parties (including clinicians, epidemiologists, and statisticians) to bring together clinical and domain area knowledge, epidemiological expertise, and a statistical understanding of the methods. Improved awareness of these methods and their pitfalls will enable more frequent and effective implementation, as well as critical interpretation of their application in the medical literature.

In this article, we aim to provide an accessible introduction, description, and demonstration of three common approaches to quantitative bias analysis, and to describe their potential limitations. We briefly review bias in epidemiological studies due to measurement error, confounding, and selection. We then introduce quantitative bias analyses, methods to quantify the potential impact of residual bias (ie, bias that has not been accounted for through study design or statistical analysis). Finally, we discuss limitations and pitfalls in the application and interpretation of these methods.

### Summary points

Quantitative bias analysis methods allow investigators to quantify potential residual bias and to objectively assess the sensitivity of study findings to this potential bias

Bias formulas, bounding methods, and probabilistic bias analysis can be used to assess sensitivity of results to potential residual bias; each of these approaches has strengths and limitations

Quantitative bias analysis relies on assumptions about bias parameters (eg, the strength of association between unmeasured confounder and outcome), which can be informed by substudies, secondary studies, the literature, or expert opinion

When applying, interpreting, and reporting quantitative bias analysis, it is important to transparently report assumptions, to consider multiple biases if relevant, and to account for random error

## Types of bias

All clinical studies, both interventional and non-interventional, are potentially vulnerable to bias. Bias is ideally prevented or minimised through careful study design and the choice of appropriate statistical methods. In non-interventional studies, three major biases that can affect findings are measurement bias (also known as information bias) due to measurement error (referred to as misclassification for categorical variables), confounding, and selection bias.

Misclassification occurs when one or more categorical variables (such as the exposure, outcome, or covariates) are mismeasured or misreported.8 Continuous variables might also be mismeasured, leading to measurement error. As one example, misclassification occurs in some studies of alcohol consumption because participants misreport their alcohol intake.910 As another example, studies using electronic health records or insurance claims data could have outcome misclassification if the outcome is not always reported to, or recorded by, the individual’s healthcare professional.11 Measurement error is said to be differential when the probability of error depends on another variable (eg, differential participant recall of exposure status depending on the outcome). Errors in measurement of multiple variables could be dependent (ie, associated with each other), particularly when data are collected from one source (eg, electronic health records). Measurement error can lead to biased study findings in both descriptive and aetiological (ie, cause-effect) non-interventional studies.12

Confounding arises in aetiological studies when the association between exposure and outcome is not solely due to the causal effect of the exposure, but rather is partly or wholly due to one or more other causes of the outcome associated with the exposure. For example, researchers have found that greater adherence to statins is associated with a reduction in motor vehicle accidents and an increase in the use of screening services.13 However, this association is almost certainly not due to a causal effect of statins on these outcomes, but more probably because attitudes to precaution and risk that are associated with these outcomes are also associated with adherence to statins.

Selection bias occurs when non-random selection of people or person time into the study results in systematic differences between results obtained in the study population and results that would have been obtained in the population of interest.1415 This bias can be due to selection at study entry or due to differential loss to follow-up. For example, in a cohort study where the patients selected are those admitted to hospital in respiratory distress, covid-19 and chronic obstructive pulmonary disease might be negatively associated, even if there was no association in the overall population, because a patient admitted without one of these conditions is more likely to have the other as the cause of the respiratory distress that led to admission.16 Selection bias can affect both descriptive and aetiological non-interventional studies.

## Handling bias in practice

All three biases should ideally be minimised through study design and analysis. For example, misclassification can be reduced by the use of a more accurate measure, confounding through measurement of all relevant potential confounders and their subsequent adjustment, and selection bias through appropriate sampling from the population of interest and accounting for loss to follow-up. Other biases should also be considered, for example, immortal time bias through the appropriate choice of time zero, and sparse data bias through collection of a sample of sufficient size or by the use of penalised estimation.1718

Even with the best available study design and most appropriate statistical analysis, we typically cannot guarantee that residual bias will be absent. For instance, it is often not possible to perfectly measure all required variables, or it might be either impossible or impractical to collect or obtain data on every possible potential confounder. In particular, studies conducted using data collected for non-research purposes, such as insurance claims and electronic health records, are often limited to the variables previously recorded. Randomly sampling from the population of interest might also not be practically feasible, especially if individuals are not willing to participate.

To ignore potential residual biases can lead to misleading results and erroneous conclusions. Often the potential for residual bias is acknowledged qualitatively in the discussion, but these qualitative arguments are typically subjective and often downplay the impact of any bias. Heuristics are frequently relied on, but these can lead to a misestimation of the potential for residual bias.19 Quantitative bias analysis allows both authors and readers to assess robustness of study findings to potential residual bias rigorously and quantitatively.

## Quantitative bias analysis

When designing or appraising a study, several key questions related to bias should be considered (box 1).20 If, on the basis of the answers to these questions, there is potential for residual bias(es), then quantitative bias analysis methods can be considered to assess the robustness of findings.

### Key questions related to bias when designing and appraising non-interventional studies

Misclassification and measurement error: Are the exposure, outcome, and covariates likely to be measured and recorded accurately?

Confounding: Are there potential causes of the outcome, or proxies for these causes, which might differ in prevalence between exposure groups? Are these potential confounders measured and controlled through study design or analysis?

Selection bias: What is the target population? Are individuals in the study representative of this target population?

Many methods for quantitative bias analysis exist, although only a few of these are regularly applied in practice. In this article, we will introduce three straightforward, commonly applied, and general approaches1: bias formulas, bounding methods, and probabilistic bias analysis. Alternative methods are also available, including methods for bias adjustment of linear regression with a continuous outcome.72122 Methods for dealing with misclassification of categorical variables are outlined in this article. Corresponding methods for sensitivity analysis to deal with mismeasurement of continuous variables are available and are described in depth in the literature.2324

## Bias formulas

We can use simple mathematical formulas to estimate the bias in a study and to estimate what the results would be in the absence of that bias.425262728 Commonly applied formulas, along with details of available software to implement the methods listed, are provided in the appendices. Some of these methods can be applied to the summary results (eg, risk ratio), whereas other methods require access to 2×2 tables or participant level data.

These formulas require us to specify additional information, typically not obtainable from the study data itself, in the form of bias parameters. Values for these parameters quantify the extent of bias present due to confounding, misclassification, or selection.

Bias formulas for unmeasured confounding generally require us to specify the following bias parameters: prevalence of the unmeasured confounder in the unexposed individuals, prevalence of the unmeasured confounder in the exposed individuals (or alternatively the association between exposure and unmeasured confounder), and the association between unmeasured confounder and outcome.42829

These bias formulas can be applied to the summary results (eg, risk ratios, odds ratios, risk differences, hazard ratios) and to 2×2 tables, and they produce corrected results assuming the specified bias parameters are correct. Generally, the exact bias parameters are unknown so a range of parameters can be entered into the formula, producing a range of possible bias adjusted results under more or less extreme confounding scenarios.
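As an illustration, the sketch below implements one widely used bias formula for a single binary unmeasured confounder on the risk ratio scale. It assumes that the confounder-outcome risk ratio is the same in exposed and unexposed individuals, and it divides the observed risk ratio by a bias factor built from the three bias parameters listed above. The function names, the observed risk ratio of 1.5, and the grid of parameter values are hypothetical and chosen only to show how a range of inputs yields a range of bias adjusted results.

```python
from itertools import product


def confounding_bias_factor(p_exposed: float, p_unexposed: float, rr_ud: float) -> float:
    """Bias factor for one binary unmeasured confounder U (risk ratio scale).

    p_exposed, p_unexposed: prevalence of U among exposed and unexposed people.
    rr_ud: U-outcome risk ratio, assumed equal in both exposure groups.
    """
    return (p_exposed * (rr_ud - 1) + 1) / (p_unexposed * (rr_ud - 1) + 1)


def adjust_rr_for_confounder(rr_obs, p_exposed, p_unexposed, rr_ud):
    """Bias adjusted risk ratio: the observed risk ratio divided by the bias factor."""
    return rr_obs / confounding_bias_factor(p_exposed, p_unexposed, rr_ud)


rr_obs = 1.5  # hypothetical observed risk ratio
# Hypothetical grid, from weaker to more extreme confounding scenarios
for p1, p0, rr_ud in product([0.3, 0.5], [0.2], [1.5, 2.0, 3.0]):
    adjusted = adjust_rr_for_confounder(rr_obs, p1, p0, rr_ud)
    print(f"p1={p1}, p0={p0}, RR_UD={rr_ud}: adjusted RR {adjusted:.2f}")
```

When the confounder is equally prevalent in both exposure groups the bias factor is 1 and the estimate is unchanged; the further apart the prevalences and the stronger the confounder-outcome association, the larger the correction.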

Bias formulas for misclassification work in a similar way, but typically require us to specify positive predictive value and negative predictive value (or sensitivity and specificity) of classification, stratified by exposure or outcome. These formulas typically require study data in the form of 2×2 tables.730
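A minimal sketch of such a formula for outcome misclassification is shown below, assuming that sensitivity and specificity (rather than predictive values) are specified within each exposure group. It back-calculates the expected number of true cases in each exposure group from the observed 2×2 table. The function names and the counts in the usage example are hypothetical.

```python
def correct_outcome_counts(observed_cases: float, group_total: float, se: float, sp: float) -> float:
    """Expected true number of cases in one exposure group, given the observed
    number classified as cases and the sensitivity (se) and specificity (sp)
    of outcome classification in that group."""
    if se + sp <= 1:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (observed_cases - (1 - sp) * group_total) / (se + sp - 1)
    if not 0 <= corrected <= group_total:
        # The assumed bias parameters are incompatible with the observed data
        raise ValueError("bias parameters imply impossible cell counts")
    return corrected


def corrected_risk_ratio(a, n1, b, n0, se1, sp1, se0, sp0):
    """Risk ratio after correcting outcome misclassification.

    a, n1: observed cases and total among the exposed
    b, n0: observed cases and total among the unexposed
    se1/sp1, se0/sp0: classification parameters in exposed/unexposed
    (set them equal for non-differential misclassification)
    """
    a_true = correct_outcome_counts(a, n1, se1, sp1)
    b_true = correct_outcome_counts(b, n0, se0, sp0)
    return (a_true / n1) / (b_true / n0)


# Hypothetical table: 100/1000 exposed and 80/1000 unexposed classified as cases
rr = corrected_risk_ratio(a=100, n1=1000, b=80, n0=1000, se1=0.9, sp1=0.99, se0=0.9, sp0=0.99)
print(f"observed RR {(100 / 1000) / (80 / 1000):.2f}, corrected RR {rr:.2f}")
```

Corrected counts that fall outside the possible range signal that the assumed sensitivity and specificity cannot have produced the observed data, which is itself informative when choosing plausible values.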

Bias formulas for selection bias are applicable to the summary results (eg, risk ratios, odds ratios) or to 2×2 tables, and normally require us to specify probabilities of selection into the study for different levels of exposure and outcome.25 When participant level data are available, a general method of bias analysis is to weight each individual by the inverse of their probability of selection.31 Box 2 describes an example of the application of bias formulas for selection bias.
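With participant level data, the inverse probability of selection weighting described above can be sketched as follows. Each selected participant is weighted by 1/S_{eo}, where e and o index exposure and outcome status in the same way as the selection probabilities in box 2. The records, selection probabilities, and function name are hypothetical. For a simple 2×2 analysis this weighting reproduces the odds ratio bias formula, which the last lines check.

```python
from collections import defaultdict


def ipw_odds_ratio(records, selection_prob):
    """Odds ratio after weighting each participant by 1 / P(selection).

    records: iterable of (exposed, outcome) pairs coded 0/1, one per selected participant
    selection_prob: dict mapping (exposed, outcome) -> assumed selection probability
    """
    weighted = defaultdict(float)
    for exposed, outcome in records:
        weighted[(exposed, outcome)] += 1.0 / selection_prob[(exposed, outcome)]
    return (weighted[(1, 1)] * weighted[(0, 0)]) / (weighted[(1, 0)] * weighted[(0, 1)])


# Hypothetical selected sample: 20 exposed cases, 80 exposed non-cases,
# 10 unexposed cases, 90 unexposed non-cases (observed OR = 2.25)
records = [(1, 1)] * 20 + [(1, 0)] * 80 + [(0, 1)] * 10 + [(0, 0)] * 90
s = {(1, 1): 0.65, (1, 0): 0.75, (0, 1): 0.70, (0, 0): 0.80}  # hypothetical

or_weighted = ipw_odds_ratio(records, s)
or_formula = 2.25 * (s[(0, 1)] * s[(1, 0)]) / (s[(0, 0)] * s[(1, 1)])
print(round(or_weighted, 4), round(or_formula, 4))
```

The advantage of weighting at the participant level is that the same weights can then be carried into a weighted regression that also adjusts for measured covariates.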

### Application of bias formulas for selection bias

In a cohort study of pregnant women investigating the association between lithium use (relative to non-use) and cardiac malformations in liveborn infants, the observed covariate adjusted risk ratio was 1.65 (95% confidence interval 1.02 to 2.68).32 Only liveborn infants were selected into the study; therefore, there was potential for selection bias if differences in the termination probabilities of fetuses with cardiac malformations existed between exposure groups.

Because the outcome is rare, the odds ratio approximates the risk ratio, and we can apply a bias formula for the odds ratio to the risk ratio. The bias parameters are selection probabilities for the unexposed group with outcome S_{01}, exposed group with outcome S_{11}, unexposed group without outcome S_{00}, and exposed group without outcome S_{10}:

OR_{BiasAdj} = OR_{Obs} × ((S_{01}×S_{10}) ÷ (S_{00}×S_{11}))

(Where OR_{BiasAdj} is the bias adjusted odds ratio and OR_{Obs} is the observed odds ratio.)

For example, if we assume that the probability of terminations is 30% among the unexposed group (ie, pregnancies with no lithium dispensation in first trimester or three months earlier) with malformations, 35% among the exposed group (ie, pregnancies with lithium dispensation in first trimester) with malformations, 20% among the unexposed group without malformations, and 25% among the exposed group without malformations, then the bias adjusted odds ratio is 1.67.

OR_{BiasAdj} = 1.65 × ((0.7×0.75) ÷ (0.65×0.8)) = 1.67

In the study, a range of selection probabilities (stratified by exposure and outcome status) were specified, informed by the literature. Depending on assumed selection probabilities, the bias adjusted estimates of the risk ratio ranged from 1.65 to 1.80 (fig 1), indicating that the estimate was robust to this selection bias under given assumptions.
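The calculation in this box can be reproduced in a few lines of code. The sketch below uses a hypothetical helper, not software from the original study: it converts the assumed termination probabilities into selection probabilities (selection = 1 − termination) and applies the odds ratio formula given above. Varying the inputs in the same way gives a range of bias adjusted estimates like the one summarised in fig 1.

```python
def selection_bias_adjusted_or(or_obs, s11, s10, s01, s00):
    """Bias adjusted odds ratio given selection probabilities S_{eo}
    (e = exposure, o = outcome): OR_obs x (S01 x S10) / (S00 x S11)."""
    return or_obs * (s01 * s10) / (s00 * s11)


# Box 2 scenario: selection probability = 1 - termination probability
adjusted = selection_bias_adjusted_or(
    or_obs=1.65,
    s11=1 - 0.35,  # exposed, with malformation
    s10=1 - 0.25,  # exposed, without malformation
    s01=1 - 0.30,  # unexposed, with malformation
    s00=1 - 0.20,  # unexposed, without malformation
)
print(round(adjusted, 2))  # 1.67
```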

It is possible to incorporate measured covariates in these formulas, but specification then generally becomes more difficult because we typically have to specify bias parameters (such as the prevalence of the unmeasured confounder) within strata of measured covariates.

Although we might not be able to estimate these unknowns from the main study itself, we can specify plausible ranges based on the published literature, clinical knowledge, or a secondary study or substudy. Secondary studies or substudies, in which additional information is collected from a subset of study participants or from a representative external group, are particularly valuable because they are more likely to accurately capture unknown values.33 However, depending on the particular situation, they could be infeasible for a given study owing to data access limitations and resource constraints.

The published literature can be informative if there are relevant published studies and the study populations in the published studies are sufficiently similar to the population under investigation. Subjective judgments of plausible values for unknowns are vulnerable to the viewpoint of the investigator, and as a result might not accurately reflect the true unknown values. The validity of quantitative bias analysis depends critically on the validity of the assumed values. When implementing quantitative bias analysis, or appraising quantitative bias analysis in a published study, investigators and readers should question the choices made for these unknowns, and investigators should report these choices transparently.

## Bounding methods

Bounding methods are mathematical formulas, similar to bias formulas, that we can apply to study results to quantify sensitivity to bias due to confounding, selection, and misclassification.5343536 However, unlike bias formulas, they require only a subset of the unknown values to be specified. While this requirement seems advantageous, one important disadvantage is that bounding methods generate a bound on the maximum possible bias, rather than an estimate of the association adjusted for bias. When values for all unknown parameters (eg, prevalence of an unmeasured confounder) can be specified and there is reasonable confidence in their validity, bias formulas or probabilistic bias analysis can generally be applied and can provide more information than bounding methods.37

One commonly used bounding method for unmeasured confounding is the E-value.535 By using E-value formulas, study investigators can calculate a bound on the bias adjusted estimate by specifying the association (eg, risk ratio) between exposure and unmeasured confounder and between unmeasured confounder and outcome, while leaving the prevalence of the unmeasured confounder unspecified. The E-value itself is the minimum value on the risk ratio scale that the association between exposure and unmeasured confounder or the association between unmeasured confounder and outcome must exceed to potentially reduce the bias adjusted findings to the null (or alternatively to some specified value, such as a protective risk ratio of 0.8). If the plausible strength of association between the unmeasured confounder and both exposure and outcome is smaller than the E-value, then that one confounder could not fully explain the observed association, providing support to the study findings. If the strength of association between the unmeasured confounder and either exposure or outcome is plausibly larger than the E-value, then we can only conclude that residual confounding might explain the observed association, but it is not possible to say whether such confounding is in truth sufficient, because we have not specified the prevalence of the unmeasured confounder. Box 3 illustrates the use of bounding methods for unmeasured confounding. Although popular, the application of E-values has been criticised, because these values have been commonly misinterpreted and have been used frequently without careful consideration of a specific unmeasured confounder or the possibility of multiple unmeasured confounders or other biases.38
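A minimal sketch of the E-value calculation for a risk ratio is given below, using the formula E-value = RR + √(RR×(RR−1)) shown in box 3. Three conventions are assumptions drawn from the wider E-value literature rather than from this article: protective estimates are inverted before applying the formula; the E-value for a non-null target (such as a risk ratio of 0.8) applies the formula to the ratio of the estimate to that target; and for a confidence interval the limit closest to the target is used, giving an E-value of 1 if the interval contains the target. The function names are hypothetical.

```python
import math


def e_value(rr: float, target: float = 1.0) -> float:
    """E-value for an observed risk ratio, relative to a target value (default: the null)."""
    if rr <= 0 or target <= 0:
        raise ValueError("risk ratios must be positive")
    ratio = rr / target
    if ratio < 1:  # estimate below the target: invert before applying the formula
        ratio = 1 / ratio
    return ratio + math.sqrt(ratio * (ratio - 1))


def e_value_ci(lower: float, upper: float, target: float = 1.0) -> float:
    """E-value for the confidence limit closest to the target; 1 if the CI contains it."""
    if lower <= target <= upper:
        return 1.0
    limit = lower if lower > target else upper
    return e_value(limit, target)


# Box 3 values (hazard ratio treated as a risk ratio because the outcome was rare)
print(f"{e_value(1.38):.2f}")           # 2.10
print(f"{e_value_ci(1.33, 1.44):.2f}")  # 1.99
```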

### Application of bounding methods

In a cohort study investigating the association between use of proton pump inhibitors (relative to H2 receptor antagonists) and all cause mortality, investigators found evidence that individuals prescribed proton pump inhibitors were at higher risk of death after adjusting for several measured covariates including age, sex, and comorbidities (covariate adjusted hazard ratio 1.38, 95% confidence interval (CI) 1.33 to 1.44).39 However, unmeasured differences in frailty between users of H2 receptor antagonists and users of proton pump inhibitors could bias findings. Because the prevalence of the unmeasured confounder in the different exposure groups was unclear, the E-value was calculated. Because the outcome was rare at the end of follow-up, and therefore the risk ratio approximates the hazard ratio given proportional hazards,40 the E-value formula, which applies to the risk ratio, was applied to the hazard ratio.

E-value = RR_{Obs} + √(RR_{Obs}×(RR_{Obs}−1))

= 1.38 + √(1.38×(1.38−1))

= 2.10

(Where RR_{Obs} is the observed risk ratio.)

The E-value for the point estimate of the adjusted hazard ratio (1.38) was 2.10. Hence either the adjusted risk ratio between exposure and unmeasured confounder, or the adjusted risk ratio between unmeasured confounder and outcome, must be greater than 2.10 to potentially explain the observed association of 1.38. The E-value can be applied to the bounds of the CI to account for random error. The calculated E-value for the lower bound of the 95% CI (ie, covariate adjusted hazard ratio=1.33) was 1.99. We can plot a curve to show the values of risk ratios necessary to potentially reduce the observed association, as estimated by the point estimate and the lower bound of the CI, to the null (fig 2). An unmeasured confounder with strengths of association below the blue line could not fully explain the point estimate, and below the yellow line could not fully explain the lower bound of the confidence interval.
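Curves like those in fig 2 can be traced from the bounding factor on which the E-value is based, RR_{EU} × RR_{UD} ÷ (RR_{EU} + RR_{UD} − 1), where RR_{EU} is the exposure-confounder risk ratio and RR_{UD} is the confounder-outcome risk ratio. This bounding factor is taken here as an assumption from the E-value literature rather than from this article, and the function names are hypothetical. The sketch solves for the smallest RR_{UD} that, together with a given RR_{EU}, could fully explain an observed risk ratio above 1.

```python
import math


def bounding_factor(rr_eu: float, rr_ud: float) -> float:
    """Maximum bias from a confounder with the given associations (risk ratio scale)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def required_rr_ud(rr_obs: float, rr_eu: float) -> float:
    """Smallest confounder-outcome risk ratio that, paired with rr_eu,
    could reduce an observed risk ratio rr_obs (> 1) to the null."""
    if rr_eu <= rr_obs:
        return math.inf  # the bounding factor can never reach rr_obs
    return rr_obs * (rr_eu - 1) / (rr_eu - rr_obs)


# Points on the curves for the point estimate (1.38) and lower CI limit (1.33)
for rr_eu in (1.5, 2.0, 2.5, 3.0, 4.0):
    print(rr_eu, round(required_rr_ud(1.38, rr_eu), 2), round(required_rr_ud(1.33, rr_eu), 2))
```

Where RR_{EU} equals the E-value, the required RR_{UD} is the E-value too, which is why the E-value is the point on the curve where both associations are equal.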

Given risk ratios of >2 observed in the literature between frailty and mortality, unmeasured confounding could not be ruled out as a possible explanation for observed findings. However, given that we used a bounding method, and did not specify unmeasured confounder prevalence, we could not say with certainty whether such confounding was likely to explain the observed result. Additional unmeasured or partially measured confounders might have also contributed to the observed association.

## Probabilistic bias analysis

Probabilistic bias analysis takes a different approach to handling uncertainty around the unknown values. Rather than specifying one value or a range of values for an unknown, a probability distribution (eg, a normal distribution) is specified for each of the unknown quantities. This distribution represents the uncertainty about the unknown values, and values are sampled repeatedly from this distribution before applying bias formulas using the sampled values. This approach can be applied to either summary or participant level data. The result is a distribution of bias adjusted estimates. Resampling should be performed a sufficient number of times (eg, 10 000 times), although this requirement can become computationally burdensome when performing corrections at the patient record level.41

Probabilistic bias analysis can readily handle many unknowns, which makes it particularly useful for handling multiple biases simultaneously.42 However, it can be difficult to specify a realistic distribution if little information on the unknowns is available from published studies or from additional data collection. Commonly chosen distributions include uniform, trapezoidal, triangular, beta, normal, and log-normal distributions.7 Sensitivity analyses can be conducted by varying the distribution and assessing the sensitivity of findings to distribution chosen. When performing corrections at the patient record level, analytical methods such as regression can be applied after correction to adjust associations for measured covariates.43 Box 4 gives an example of probabilistic bias analysis for misclassification.
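A compact sketch of probabilistic bias analysis for non-differential outcome misclassification at the summary (2×2 table) level is given below. Sensitivity and specificity are each drawn from a triangular distribution, the observed table is corrected with the simple bias formula, and random error is then added by drawing from a normal distribution on the log risk ratio scale with the standard error of the observed estimate, which is one common simplifying choice. The table, distribution limits, seed, and function names are hypothetical; iterations whose sampled values imply impossible counts are discarded.

```python
import math
import random
import statistics


def corrected_cases(observed_cases, total, se, sp):
    """Expected true cases in one exposure group given sensitivity and specificity."""
    return (observed_cases - (1 - sp) * total) / (se + sp - 1)


def probabilistic_bias_analysis(a, n1, b, n0, se_dist, sp_dist, n_iter=10_000, seed=2024):
    """Median and 95% simulation interval of bias adjusted risk ratios.

    a, n1 / b, n0: observed cases and totals among exposed / unexposed
    se_dist, sp_dist: (low, mode, high) of triangular distributions
    """
    rng = random.Random(seed)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)  # SE of the observed log RR
    draws = []
    for _ in range(n_iter):
        se = rng.triangular(se_dist[0], se_dist[2], se_dist[1])  # argument order: low, high, mode
        sp = rng.triangular(sp_dist[0], sp_dist[2], sp_dist[1])
        a_true = corrected_cases(a, n1, se, sp)
        b_true = corrected_cases(b, n0, se, sp)
        if not (0 < a_true < n1 and 0 < b_true < n0):
            continue  # sampled values incompatible with the observed data
        log_rr = math.log((a_true / n1) / (b_true / n0))
        draws.append(math.exp(log_rr + rng.gauss(0, se_log_rr)))  # add random error
    if not draws:
        raise ValueError("no valid iterations; revisit the distributions")
    draws.sort()
    k = len(draws)
    return statistics.median(draws), draws[int(0.025 * (k - 1))], draws[int(0.975 * (k - 1))]


median, lower, upper = probabilistic_bias_analysis(
    a=100, n1=1000, b=80, n0=1000,
    se_dist=(0.75, 0.85, 0.95), sp_dist=(0.985, 0.99, 0.995),
)
print(f"median {median:.2f}, 95% simulation interval {lower:.2f} to {upper:.2f}")
```

Dropping the `rng.gauss` term gives a distribution that reflects uncertainty in the bias parameters alone, which can be useful for seeing how much of the simulation interval comes from each source.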

### Application of probabilistic bias analysis

In a cohort study of pregnant women conducted in insurance claims data, the observed covariate adjusted risk ratio for the association between antidepressant use and congenital cardiac defects among women with depression was 1.02 (95% confidence interval 0.90 to 1.15).44

Some misclassification of the outcome, congenital cardiac defects, was expected, and therefore probabilistic bias analysis was conducted. A validation study was conducted to assess the accuracy of classification. In this validation study, full medical records were obtained and used to verify diagnoses for a subset of pregnancies with congenital cardiac defects recorded in the insurance claims data. Based on positive predictive values estimated in this validation study, triangular distributions of plausible values for the sensitivity (fig 3) and specificity of outcome classification were specified and used for probabilistic bias analysis.

Values were sampled at random 1000 times from these distributions and were used to calculate a distribution of bias adjusted estimates incorporating random error. The median bias adjusted estimate was 1.06, and the 95% simulation interval was 0.92 to 1.22.44 This finding indicates that under the given assumptions, the results were robust to outcome misclassification, because the bias adjusted results were similar to the initial estimates. Both sets of estimates suggested no evidence of association between antidepressant use and congenital cardiac defects.

## Pitfalls of methods

### Incorrect assumptions

Study investigators and readers of published research should be aware that the outputs of quantitative bias analyses are only as good as the assumptions made. These assumptions include both assumptions about the values chosen for the bias parameters (table 1), and assumptions inherent to the methods. For example, applying the E-value formula directly to a hazard ratio rather than a risk ratio is an approximation, and only a good approximation when the outcome is rare.45
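When the outcome is not rare, one published approximation for common outcomes converts the hazard ratio to an approximate risk ratio before the E-value formula is applied: RR ≈ (1 − 0.5^{√HR}) ÷ (1 − 0.5^{√(1/HR)}). This conversion is an assumption brought in from the E-value literature, not a method used in the studies discussed here, and the function names are hypothetical. In box 3 the outcome was rare, so the unconverted calculation is the relevant one there; the comparison below only shows how much the choice can matter.

```python
import math


def hr_to_rr_common_outcome(hr: float) -> float:
    """Approximate risk ratio from a hazard ratio when the outcome is common."""
    return (1 - 0.5 ** math.sqrt(hr)) / (1 - 0.5 ** math.sqrt(1 / hr))


def e_value(rr: float) -> float:
    """E-value for a risk ratio; protective estimates are inverted first."""
    rr = 1 / rr if rr < 1 else rr
    return rr + math.sqrt(rr * (rr - 1))


hr = 1.38
print(f"rare outcome assumed:   E-value {e_value(hr):.2f}")
print(f"common outcome assumed: E-value {e_value(hr_to_rr_common_outcome(hr)):.2f}")
```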

Simplifying assumptions are required by many methods of quantitative bias analysis. For example, it is often assumed that the exposure does not modify the unmeasured confounder-outcome association.4 If these assumptions are not met then the findings of quantitative bias analysis might be inaccurate.

Ideally, assumptions would be based on supplemental data collected in a subset of the study population (eg, internal validation studies to estimate predictive values of misclassification) or, in the case of selection bias, in the source population from which the sample was selected, but additional data collection is not always feasible.7 Validation studies can be an important source of evidence on misclassification, although proper design is important to obtain valid estimates.33
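As a sketch of how bias parameters can be derived from a validation study, the function below computes sensitivity, specificity, and predictive values from a hypothetical cross-classification of recorded status against a reference standard (for example, full medical record review). If only recorded positives are verified, as in the validation study in box 4, only the positive predictive value can be estimated directly.

```python
def classification_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy of a recorded classification against a reference standard.

    tp: recorded positive, truly positive    fp: recorded positive, truly negative
    fn: recorded negative, truly positive    tn: recorded negative, truly negative
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical validation counts
acc = classification_accuracy(tp=90, fp=10, fn=5, tn=895)
for name, value in acc.items():
    print(f"{name}: {value:.3f}")
```

Predictive values depend on the prevalence of the condition, so values estimated in one population transfer less readily to another than sensitivity and specificity do.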

### Multiple biases

If the results are robust to one source of bias, it is a mistake to assume that they must necessarily reflect the causal effect. Depending on the particular study, multiple residual biases could exist, and jointly quantifying the impact of all of these biases is necessary to properly assess robustness of results.34 Bias formulas and probabilistic bias analyses can be applied for multiple biases, but specification is more complicated, and the biases should typically be accounted for in the reverse order from which they arise (appendices 2 and 3 show an applied example).74647 Bounding methods are available for multiple biases.34
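The sketch below chains simple corrections for three biases in a single 2×2 analysis. It assumes, for illustration only, that confounding arose first, then selection into the study, then outcome misclassification at measurement, so the corrections run in the reverse order: misclassification, then selection, then confounding. All function names, inputs, and this ordering of biases are hypothetical; in a real study the order should follow how the data were actually generated.

```python
def correct_misclassification(cases, total, se, sp):
    """Expected true cases and non-cases in one exposure group."""
    true_cases = (cases - (1 - sp) * total) / (se + sp - 1)
    return true_cases, total - true_cases


def multiple_bias_rr(table, se, sp, selection, p1, p0, rr_ud):
    """Bias adjusted risk ratio after three sequential corrections.

    table: (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
    se, sp: non-differential outcome sensitivity and specificity
    selection: dict (exposed, outcome) -> selection probability
    p1, p0, rr_ud: confounder prevalence in exposed / unexposed, confounder-outcome RR
    """
    a, b, c, d = table
    # 1. Outcome misclassification (assumed to arise last, so corrected first)
    a, b = correct_misclassification(a, a + b, se, sp)
    c, d = correct_misclassification(c, c + d, se, sp)
    # 2. Selection: weight each cell by the inverse of its selection probability
    a /= selection[(1, 1)]
    b /= selection[(1, 0)]
    c /= selection[(0, 1)]
    d /= selection[(0, 0)]
    rr = (a / (a + b)) / (c / (c + d))
    # 3. Unmeasured binary confounder (assumed to arise first, so corrected last)
    bias_factor = (p1 * (rr_ud - 1) + 1) / (p0 * (rr_ud - 1) + 1)
    return rr / bias_factor


print(round(multiple_bias_rr(
    table=(100, 900, 80, 920), se=0.9, sp=0.99,
    selection={(1, 1): 0.65, (1, 0): 0.75, (0, 1): 0.70, (0, 0): 0.80},
    p1=0.4, p0=0.3, rr_ud=2.0,
), 2))
```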

### Prespecification

Prespecification of quantitative bias analysis in the study protocol is valuable so that the values chosen for the unknowns, and the decision to report the bias analysis, are not influenced by whether the results of the bias analysis are in line with the investigators’ expectations. Clearly a large range of analyses is possible, although we would encourage judicious application of these methods to deal with biases judged to be of specific importance given the limitations of the specific study being conducted.

### Accounting for random and systematic error

Both systematic error, such as bias due to misclassification, and random error due to sampling affect study results. To accurately reflect both sources of error, quantitative bias analysis should jointly account for random error as well as systematic bias.48 Bias formulas, bounding methods, and probabilistic bias analysis approaches can be adapted to account for random error (appendix 1).
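One simple way to carry random error through a bias formula, assuming the bias parameters are known exactly, is to apply the same multiplicative correction to the confidence limits as to the point estimate. The sketch below does this for the selection bias scenario in box 2 using a hypothetical helper. It is an approximation that conveys random error only; uncertainty about the bias parameters themselves is better handled with probabilistic bias analysis.

```python
def adjust_with_ci(estimate, lower, upper, correction):
    """Apply a fixed multiplicative bias correction to an estimate and its CI limits."""
    return tuple(round(x * correction, 2) for x in (estimate, lower, upper))


# Box 2 correction: (S01 x S10) / (S00 x S11), with selection = 1 - termination probability
correction = (0.70 * 0.75) / (0.80 * 0.65)
print(adjust_with_ci(1.65, 1.02, 2.68, correction))  # (1.67, 1.03, 2.71)
```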

## Reporting

Deficiencies in the reporting of quantitative bias analysis have been previously noted.1484950 When reporting quantitative bias analysis, study investigators should state:

The method used and how it has been implemented

Details of the residual bias anticipated (eg, which specific potential confounder was unmeasured)

Any estimates for unknown values that have been used, with justification for the chosen values or distribution for these unknowns

Which simplifying assumptions (if any) have been made

Quantitative bias analysis is a valuable addition to a study, but as with any aspect of a study, should be interpreted critically and reported in sufficient detail to allow for critical interpretation.

## Alternative methods

Commonly applied and broadly applicable methods have been described in this article. Other methods are available and include modified likelihood and predictive value weighting with regression analyses,515253 propensity score calibration using validation data,5455 multiple imputation using validation data,56 methods for matched studies,3 and bayesian bias analysis if a fully bayesian approach is desired.5758

## Conclusions

Quantitative bias analysis methods provide a means to quantitatively and rigorously assess the potential for residual bias in non-interventional studies. Increasing the appropriate use, understanding, and reporting of these methods has the potential to improve the robustness of clinical epidemiological research and reduce the likelihood of erroneous conclusions.

## Footnotes

Contributors: This article is the product of a working group on quantitative bias analysis between the London School of Hygiene and Tropical Medicine and GSK. An iterative process of online workshops and email correspondence was used to decide by consensus the content of the manuscript. Based on these decisions, the manuscript was drafted by JPB and then reviewed and commented on by all group members. JPB and IJD are the guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: No specific funding was given for this work. JPB was supported by a GSK PhD studentship.

Competing interests: All authors have completed the ICMJE uniform disclosure form at https://www.icmje.org/disclosure-of-interest/ and declare: AC, NWG, and JNH were paid employees of GSK at the time of the submitted work; AC, IJD, NWG, and JNH own shares in GSK; AC is currently a paid employee of McKesson Corporation in a role unrelated to the submitted work; JNH is currently a paid employee of Boehringer Ingelheim in a role unrelated to this work; DN is UK Kidney Association Director of Informatics Research; JPB was funded by a GSK studentship received by IJD and reports unrelated consultancy work for WHO Europe and CorEvitas; SML has received unrelated grants with industry collaborators from IMI Horizon, but no direct industry funding; all authors report no other relationships or activities that could appear to have influenced the submitted work.

Provenance and peer review: Not commissioned; externally peer reviewed.