Power analysis through simulation in R
Niklas Johannes
Sir Ronald Fisher (1890-1962)
Informally: What’s the chance of observing something like this if there were nothing going on?
\[\begin{gather*} Chance = P(Finding \ something \ like \ this \ | \ Nothing \ going \ on) \end{gather*}\]

Formally: The probability of observing data this extreme or more extreme under the null hypothesis:

\[\begin{gather*} p = P(Data \ | \ H_0) \end{gather*}\]
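To make that definition concrete, here is a minimal simulation sketch. All numbers (a null mean of 50, sd of 10, n = 100, and an observed sample mean of 52) are illustrative assumptions, not taken from the slides:

```r
# Minimal sketch: approximate P(Data | H0) by simulation.
# Illustrative assumptions: null mean 50, sd 10, n = 100,
# and an observed sample mean of 52.
set.seed(42)

null_means <- replicate(1e5, mean(rnorm(100, mean = 50, sd = 10)))

# One-sided p-value: how often do we see a sample mean this
# extreme (or more extreme) when nothing is going on?
mean(null_means >= 52)
#> roughly 0.02
```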
Jerzy Neyman (1894-1981)
If we’re forced to make a decision, then error rates are what we deem acceptable levels of being right/wrong in the long run:
When there truly is no effect, two things can happen: We find a significant effect (error) or we don’t (no error).
When there truly is an effect, two things can happen: We find no significant effect (error) or we find one (correct).
“When there isn’t something?” Why not just say there’s nothing?
\[\begin{gather*} P(Data \ | \ H_0) \neq P(H_0 \ | \ Data) \end{gather*}\]

We can’t find evidence for H0 with “classical” NHST (unless we use equivalence tests). A nonsignificant p-value only means we can’t reject H0 (and therefore can’t accept H1); it does not mean we can accept H0.
|  | H0 true | H1 true |
|---|---|---|
| Significant | False positive (\(\alpha\)) | True positive (1-\(\beta\)) |
| Nonsignificant | True negative (1-\(\alpha\)) | False negative (\(\beta\)) |
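These error rates are long-run frequencies, and we can watch them emerge by simulation. A minimal sketch with illustrative numbers (true mean 50, sd 10, n = 100, none of them from the slides):

```r
# Sketch: the long-run false positive rate when H0 is true.
# Illustrative assumptions: true mean 50, sd 10, n = 100,
# testing H0: mu = 50 at alpha = .05.
set.seed(123)

p_values <- replicate(
  1e4,
  t.test(rnorm(100, mean = 50, sd = 10), mu = 50)$p.value
)

# In the long run, significant results occur at the alpha rate
mean(p_values < .05)
#> close to 0.05
```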
Power is the probability of finding a significant result when there is an effect. It’s determined (simplified) by the alpha level, the size of the true effect, and the sample size.
Let’s assume we want to know whether the population mean is larger than 50. We sample n = 100.
This is the sampling distribution if the null were true: the true mean is 50.
Where does a sample need to fall for us to wrongly conclude there’s a difference?
That’s our \(\alpha\): our false positives. Left of it: our true negatives (1-\(\alpha\)).
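We can locate that critical value by simulating the null distribution. A minimal sketch, assuming (illustratively) a population sd of 50 and a one-sided test:

```r
# Sketch: find the critical value from a simulated null distribution.
# Illustrative assumptions: null mean 50, sd 50, n = 100,
# one-sided alpha = .05.
set.seed(1)

null_means <- replicate(1e5, mean(rnorm(100, mean = 50, sd = 50)))

# Sample means beyond this cutoff get called significant,
# which is wrong whenever the null is actually true:
quantile(null_means, probs = .95)
#> around 58.2, matching 50 + 1.645 * 50 / sqrt(100)
```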
Our sampling distribution if the population value is 60. We commit a false positive if we assume a sample comes from the right distribution when in fact it comes from the left.
Our \(\beta\): our false negatives. We commit a false negative if we assume a sample comes from the left distribution when in fact it comes from the right.
Everything right of the critical value: If a sample comes from the right distribution, this is how often we’ll correctly identify it.
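That frequency is exactly what a power simulation counts. A sketch under the same illustrative assumptions (sd = 50, n = 100, one-sided alpha = .05):

```r
# Sketch: power by simulation. If the true mean is 60 (not 50),
# how often does a sample cross the critical value?
set.seed(2)

p_values <- replicate(
  1e4,
  t.test(rnorm(100, mean = 60, sd = 50),
         mu = 50, alternative = "greater")$p.value
)

# Power: the proportion of significant results when H1 is true
mean(p_values < .05)
#> roughly 0.64
```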
Power is the probability of finding a significant result when there is an effect. It’s determined (simplified) by the alpha level, the size of the true effect, and the sample size.
Let’s have a look at how:
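A sketch of what such a demo might show: wrap the simulation in a function and vary the sample size, holding everything else constant (the true mean of 60 and sd of 50 remain illustrative assumptions):

```r
# Sketch: how sample size drives power, all else equal.
power_sim <- function(n, n_sims = 5000) {
  p <- replicate(
    n_sims,
    t.test(rnorm(n, mean = 60, sd = 50),
           mu = 50, alternative = "greater")$p.value
  )
  mean(p < .05)
}

set.seed(3)
sapply(c(25, 100, 400), power_sim)
#> power climbs with n: roughly 0.26, 0.64, 0.99
```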
Running studies with low power (aka underpowered studies) risks missing true effects, overestimating the effects we do detect, and producing significant results that don’t represent the truth.
Society has commissioned us to find out something. Why would we start by setting ourselves up so that we’re barely able to do that?
Let’s go back to our example: we want to know whether the population mean is larger than 50, but this time we take a much smaller sample.
The sampling distribution gets wider: Now a sample mean needs to be really large to be significant. The smaller our sample (aka the lower our power), the more extreme a sample has to be to “make it” across the critical value.
If our study is small (has low power), only an overestimate will pass our threshold for significance: with underpowered studies, significant results will always be overestimates. Even some effects that are larger than the average true effect won’t be found.
To put it differently: Small studies are only sensitive to large effects. But if the effect is truly small, we’ll only get a significant result for the rare massive overestimate.
Let’s have a look again:
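A sketch of that selection effect, assuming (illustratively) a small true effect: a true mean of 55 against a null of 50, sd 50, and a small sample of n = 25:

```r
# Sketch: in underpowered studies, only overestimates come out significant.
set.seed(4)

results <- replicate(1e4, {
  x <- rnorm(25, mean = 55, sd = 50)
  c(estimate = mean(x),
    p = t.test(x, mu = 50, alternative = "greater")$p.value)
})

significant <- results["estimate", results["p", ] < .05]

length(significant) / 1e4  # low power: only a small fraction is significant
mean(significant)          # and those estimates overshoot the true 55 badly
#> the average significant estimate lands around 70, not 55
```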
How many true effects do we expect to detect? Call R the pre-study odds that our hypothesis is true, \(\frac{P(effect)}{P(No \ effect)}\). So:

\[\begin{gather*} power \times R \end{gather*}\]

How many significant results do we expect in total? The true positives plus the false positives:

\[\begin{gather*} power \times R + \alpha \end{gather*}\]
What is the probability that a significant effect is indeed true? The rate of significant results that represent true effects divided by all significant results: the positive predictive value (PPV).
\[\begin{gather*} PPV = \frac{power \times R}{power \times R + \alpha} \end{gather*}\]

Let’s assume our hypothesis has a 25% chance of being true and we go for the “conventional” alpha level (5%).
\[\begin{gather*} PPV = \frac{power \times R}{power \times R + \alpha} = \frac{power \times \frac{P(effect)}{P(No \ effect)}}{power \times \frac{P(effect)}{P(No \ effect)} + \alpha} \end{gather*}\]

Bottom line: The lower our power, the lower the probability that our significant effects represent the truth. Aka: low power produces false findings.
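As a worked check of those numbers: with a 25% prior, the pre-study odds are \(R = .25/.75 \approx 1/3\). A small sketch that plugs different power levels into the formula above:

```r
# Sketch: PPV as a function of power, for a hypothesis with a
# 25% chance of being true and alpha = .05.
ppv <- function(power, prior = .25, alpha = .05) {
  R <- prior / (1 - prior)  # pre-study odds: P(effect) / P(no effect)
  (power * R) / (power * R + alpha)
}

sapply(c(.2, .5, .8), ppv)
#> roughly 0.57, 0.77, 0.84: the lower the power, the lower the PPV
```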
Heard of the replication crisis?
“[The] lack of transparency in science has led to quality uncertainty, and . . . this threatens to erode trust in science” (Vazire 2017)