# Understanding P values

BMJ 2014; 349 doi: https://doi.org/10.1136/bmj.g4550 (Published 11 July 2014). Cite this as: BMJ 2014;349:g4550

## All rapid responses


My understanding of the significance level (denoted alpha or α) is that it is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% probability of concluding that a difference exists when there is no actual difference.

At one point Dr. Sedgwick writes: "Small samples are more likely to result in a type I or II error when hypothesis testing."

A type I error is the alpha (α) error (which is not described in the endgame as such). Could Dr. Sedgwick clarify the difference, or the relationship, between the type I error and the critical level of significance?

**Competing interests:**
No competing interests

**09 November 2015**

We thank Dr. Sedgwick for the endgame on the p-value. He has made the concept of the p-value clear to us, both thoroughly and concisely. What I would like to add here is that colloquially we also say that if the p-value is less than 0.05, the difference observed between the two treatments is real and not due to chance alone, i.e. the study finds a statistically significant difference between the two treatments. Does Dr. Sedgwick endorse this expression?

**Competing interests:**
No competing interests

**08 December 2014**

Sedgwick notes that the P-value does not seem to be well understood (1). However, I think that some of the most important points about the P-value have not been sufficiently emphasised; emphasising them would facilitate a better understanding of its interpretation.

The first is that the P-value is a conditional probability - that is, it is the probability of getting the data observed, or more extreme data, if the null hypothesis is true. Another way of stating this is that the P-value is the probability of the data given that the null is true. The conditional clause is very important in the definition, but, because we do not know whether the condition is true or false when we do a statistical test, our P-value cannot be easily interpreted in terms of the level of support it provides for the null hypothesis.

The second important point is that what the researcher would actually like to know after collecting their data is the probability that the null hypothesis is true (or conversely the probability that the alternative hypothesis is true). Another way of stating this is that the probability that the null is true given the data is the probability of primary interest. This probability is not the P-value and it cannot be directly estimated. The probability that the null is true given the data depends on the P-value, the power of the study to detect an effect and the prior probability that the null hypothesis is true. Thus not all P-values are equal, and equal P-values may be interpreted very differently. For example, a "significant" P-value of 0.01 from a small underpowered study is more likely to be a false positive than the same P-value from a large, well-powered study. Similarly, if the null hypothesis is almost certainly true, a P-value of 0.01 is more likely to be a false positive than the same P-value if the null hypothesis is almost certainly false.

A simple illustration of the latter is to compare two equally powered studies, one of which is testing the hypothesis that star sign is associated with lung cancer risk (null hypothesis almost certainly true), and the other of which is testing the hypothesis that smoking is associated with lung cancer risk (null hypothesis almost certainly false). If both studies generate a P-value of 0.01, the first is almost certainly a false positive and the second is almost certainly a true positive, yet both P-values are the same.
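The dependence described above can be made concrete with a short calculation. This is only a sketch with assumed numbers (a power of 0.8, and prior probabilities chosen to caricature the star-sign and smoking examples); it applies Bayes' theorem to the event "the result is significant at level α", rather than conditioning on the exact P-value:

```python
def false_positive_risk(alpha, power, prior_null):
    """Probability that the null is true given a 'significant' result,
    i.e. P(H0 | result significant at level alpha), by Bayes' theorem."""
    sig_given_null = alpha * prior_null          # significant AND null true
    sig_given_alt = power * (1.0 - prior_null)   # significant AND null false
    return sig_given_null / (sig_given_null + sig_given_alt)

# Star sign and lung cancer: null almost certainly true (assumed prior 0.999).
fpr_star = false_positive_risk(alpha=0.01, power=0.8, prior_null=0.999)

# Smoking and lung cancer: null almost certainly false (assumed prior 0.01).
fpr_smoking = false_positive_risk(alpha=0.01, power=0.8, prior_null=0.01)

print(round(fpr_star, 2), round(fpr_smoking, 5))
```

With these assumed priors, the star-sign "discovery" at P = 0.01 is overwhelmingly likely to be a false positive, while the smoking result is almost certainly genuine, even though the significance levels are identical.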

These probabilities are all analogous to probabilities relating to the performance of a diagnostic test - sensitivity, specificity, disease prevalence and positive predictive value. Test sensitivity is equivalent to statistical power. Test specificity is equivalent to the P-value. Disease prevalence in the population being tested is equivalent to the prior that the null hypothesis is false. What the clinician wants to know is the probability that the subject has the disease given that the test is positive. This is the positive predictive value and, in hypothesis testing, is equivalent to the probability that the alternative hypothesis is true given the data. We are all comfortable with the fact that the positive predictive value depends on sensitivity, specificity and disease prevalence. For example, if the prevalence of disease is very low, a positive test is likely to be a false positive, even if the test has excellent sensitivity (good power) and specificity (small P-value). P-values can be interpreted in the same way.
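The diagnostic-test analogy can be written down the same way: the positive predictive value follows from sensitivity, specificity and prevalence. A minimal sketch with assumed numbers shows that for a rare condition, a positive result from a good test is still probably a false positive:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                   # diseased AND positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # healthy AND positive
    return true_pos / (true_pos + false_pos)

# A good test (80% sensitive, 99% specific) for a rare disease
# (assumed prevalence of 0.1%):
ppv = positive_predictive_value(sensitivity=0.80, specificity=0.99,
                                prevalence=0.001)
print(round(ppv, 3))
```

Despite the test's good sensitivity (the analogue of power) and specificity (the analogue of a small P-value), most positives here are false positives because the prevalence (the analogue of the prior that the null is false) is so low.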

1. Sedgwick P. Understanding P values. BMJ 2014;349:g4550.

**Competing interests:**
No competing interests

**16 July 2014**

Thank you for this article.

I'm sure that many people don't really understand P values. What I find more concerning, however, is that while quite a lot of people do understand - or think they understand - P values, many of them fail to recognise that the P value which is considered the threshold for "significance" should be adjusted (e.g. using the Bonferroni correction) when multiple hypotheses are tested (as they so often are).
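As a concrete sketch of the Bonferroni correction (with hypothetical p-values): testing m hypotheses at a family-wise level α means comparing each P value against α/m rather than α.

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance threshold for m tests at family-wise level alpha."""
    return alpha / m

p_values = [0.003, 0.012, 0.048, 0.20]   # hypothetical p-values from 4 tests
threshold = bonferroni_threshold(alpha=0.05, m=len(p_values))
significant = [p for p in p_values if p < threshold]
print(threshold, significant)
```

Note that the test with p = 0.048 would pass a naive 0.05 threshold but, after correcting for the four comparisons, it no longer counts as significant.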

**Competing interests:**
No competing interests

## Why give a value to P when it is more than the critical level of significance?

In the example given in the endgame the P value is 0.449, which is more than the critical level of significance (0.05), and hence it is concluded that the difference was not statistically significant and the null hypothesis is not rejected. Another statement that Dr. Sedgwick has made is that the p-value does not tell us about the direction and size of the difference. Here I am a bit skeptical. Since the P value has some value, viz. 0.449, there should be some interpretation: if some other sample has a p value above 0.449, say P = 0.8, then we may infer that its difference is less remarkable than that of the sample with P = 0.449. If this is not the case, then why give the value of P? Why not just say that it is > 0.05 and hence not significant?
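One way to see why a larger P value cannot simply be read off as a "less remarkable" difference is that the P value conflates effect size with sample size. A minimal sketch (hypothetical numbers; a two-sample z-test with known common standard deviation, using the normal approximation):

```python
import math

def two_sided_p(diff, sigma, n):
    """Two-sided p-value for an observed mean difference `diff` between two
    groups of size `n` with common SD `sigma` (two-sample z-test)."""
    se = sigma * math.sqrt(2.0 / n)        # standard error of the difference
    z = abs(diff) / se
    return math.erfc(z / math.sqrt(2.0))   # two-sided normal tail probability

# The SAME observed difference of 0.2 SD yields very different P values
# depending only on the sample size per group:
p_small = two_sided_p(diff=0.2, sigma=1.0, n=20)   # small study
p_large = two_sided_p(diff=0.2, sigma=1.0, n=500)  # large study
print(round(p_small, 3), round(p_large, 5))
```

The small study gives a P value well above 0.05 and the large study gives one well below it, for an identical difference, so the magnitude of P cannot by itself rank how "remarkable" the underlying difference is.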

**Competing interests:**
No competing interests

**27 February 2016**