Exercise IV
Overview
Once more, this set of exercises won’t introduce new technical skills: you already have everything you need for the most common designs. Here, you’ll dive deeper into simulating multiple groups and get a better feel for how these comparisons work.
0.1 Exercise
ANOVAs are just linear models. There’s nothing wrong with using R’s aov command. In fact, it’s just a wrapper for lm. Type ?aov and convince yourself by reading the (short) description. There’s another reason why directly calling lm can be an advantage: efficiency. So far, we haven’t really cared much about the time our simulations take, but with more complicated designs (or more runs) they can easily take several hours, days, or even weeks.
Let’s see whether lm gives us an advantage. For that, we’ll need the Sys.time() command. It does exactly what it says: it tells you your computer’s system time. When we store the time before and after a simulation, we can check how long it took and compare different functions.
Run the following to get familiar with how this works:
t1 <- Sys.time()
# wait a couple of seconds
t2 <- Sys.time()
t2 - t1
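As an aside, base R also has system.time(), which wraps this store-and-subtract pattern for you. A minimal illustration (timing a single throwaway model fit, just to show the idea):
system.time({
  d <- data.frame(
    scores = rnorm(240),
    condition = rep(letters[1:4], 60)
  )
  aov(scores ~ condition, d)
})
# returns user, system, and elapsed time in seconds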
Now create a data frame with 4 (independent) groups whose means are c(4.2, 4.5, 4.6, 4.6) and their SDs are c(0.7, 0.8, 0.6, 0.5). The sample size is 240 (equal group sizes). Check for an effect of condition on the outcome in 10,000 simulations. Do that once with aov and once with lm. For each type of model (lm or aov), clock the system time before and after the simulation with Sys.time. Are there differences? (Tip: Remember vectorization for rnorm to create the groups.)
set.seed(42)

means <- c(4.2, 4.5, 4.6, 4.6)
sd <- c(0.7, 0.8, 0.6, 0.5)
n <- 240
runs <- 1e4

# aov run
t1_aov <- Sys.time()

for (i in 1:runs) {
  d <-
    data.frame(
      scores = rnorm(n, means, sd),
      condition = rep(letters[1:4], n/4)
    )

  aov(scores ~ condition, d)
}

t2_aov <- Sys.time()

# lm run
t1_lm <- Sys.time()

for (i in 1:runs) {
  d <-
    data.frame(
      scores = rnorm(n, means, sd),
      condition = rep(letters[1:4], n/4)
    )

  lm(scores ~ condition, d)
}

t2_lm <- Sys.time()

aov_time <- t2_aov - t1_aov
lm_time <- t2_lm - t1_lm

aov_time - lm_time
Time difference of 1.797944 secs
0.2 Exercise
Time to run a power analysis. You have 3 independent groups. Their means are c(5, 5.2, 5.6); the SD is constant: 1. Your minimum sample size is 120; your maximum sample size is 300. How large does your sample have to be for 95% power? Do 1,000 runs per sample size. Go in steps of 6 for the total sample size (so an increase of 2 per group). (Tip: You can use seq and remember this link.) This simulation can take a minute (well, several actually).
means <- c(5, 5.2, 5.6)
sd <- 1
sizes <- seq(120, 300, 6)
runs <- 1e3

outcomes <-
  data.frame(
    sample_size = NULL,
    power = NULL
  )

for (n in sizes) {
  pvalues <- NULL

  for (i in 1:runs) {
    d <-
      data.frame(
        scores = rnorm(n, means, sd),
        condition = rep(letters[1:3], n/3)
      )

    m <- summary(lm(scores ~ condition, d))
    pvalues[i] <- broom::glance(m)$p.value
  }

  outcomes <-
    rbind(
      outcomes,
      data.frame(
        sample_size = n,
        power = sum(pvalues < 0.05) / length(pvalues)
      )
    )
}
plot(outcomes$sample_size, outcomes$power)
0.3 Exercise
Do the above again, but this time there’s no difference between the first two groups: c(5, 5, 5.6). What’s the effect on power if there’s a null effect for one contrast? Go in steps of 12 to speed it up.
means <- c(5, 5, 5.6)
sd <- 1
sizes <- seq(120, 300, 12)
runs <- 1e3

outcomes <-
  data.frame(
    sample_size = NULL,
    power = NULL
  )

for (n in sizes) {
  pvalues <- NULL

  for (i in 1:runs) {
    d <-
      data.frame(
        scores = rnorm(n, means, sd),
        condition = rep(letters[1:3], n/3)
      )

    m <- summary(lm(scores ~ condition, d))
    pvalues[i] <- broom::glance(m)$p.value
  }

  outcomes <-
    rbind(
      outcomes,
      data.frame(
        sample_size = n,
        power = sum(pvalues < 0.05) / length(pvalues)
      )
    )
}
plot(outcomes$sample_size, outcomes$power)
0.4 Exercise
Let’s try to work on the standardized scale. You again have 3 groups. Their SDs are c(2, 2.2, 1.8). Choose means such that the Cohen’s \(d\)s are as follows:
- Group 1 vs. Group 2: \(d\) = 0.2
- Group 1 vs. Group 3: \(d\) = 0.6
- Group 2 vs. Group 3: ?
Remember the formula for Cohen’s \(d\): \(d = \frac{M_1-M_2}{\sqrt{\frac{sd_1^2 + sd_2^2}{2}}}\)
It might (wink, wink) help to get the pooled SD for each group comparison; have a look at the earlier exercises where you dealt with Cohen’s \(d\) and calculated the pooled SD. Once you have the pooled SD, the absolute means don’t matter as much: you can start with an arbitrary mean for the control group, then set the mean for group 2 to the control group mean plus 0.2 pooled SDs (so that we have our Cohen’s \(d\)). Create a data frame and calculate power for samples ranging from 20 per group to 60 per group. Go in steps of 3 per group. Do 500 runs for each combination. Run effectsize::cohens_d() for each run (and contrast). Store not only the p-values to calculate power, but also the Cohen’s \(d\) for each contrast. Plot the power first. Then take the mean of Cohen’s \(d\) for each contrast to check whether you actually got the correct numbers.
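To make the arithmetic concrete (using the SDs given above): for groups 1 and 2 the pooled SD is \(\sqrt{\frac{2^2 + 2.2^2}{2}} = \sqrt{4.42} \approx 2.10\), so a \(d\) of 0.2 corresponds to a raw mean difference of roughly \(0.2 \times 2.10 \approx 0.42\).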
pooled_sd <-
  function(sd1, sd2){
    pooled_sd <- sqrt((sd1**2 + sd2**2) / 2)
    return(pooled_sd)
  }

sds <- c(2, 2.2, 1.8)

# pooled sd for the first two combos (the third is implied once we determine the means)
g1_g2_sd <- pooled_sd(sds[1], sds[2])
g1_g3_sd <- pooled_sd(sds[1], sds[3])

# choose an arbitrary mean for the first group, then set the other means as a proportion of the pooled SD.
# the last contrast then ends up at roughly d = 0.4 (not exactly, because the pooled SDs differ)
m1 <- 10
m2 <- m1 + 0.2*g1_g2_sd
m3 <- m1 + 0.6*g1_g3_sd

means <- c(m1, m2, m3)
sizes <- seq(20*3, 60*3, 3*3)
runs <- 500

outcomes <-
  data.frame(
    sample_size = NULL,
    power = NULL,
    d12 = NULL,
    d13 = NULL,
    d23 = NULL
  )

for (n in sizes) {
  pvalues <- NULL
  d12 <- NULL
  d13 <- NULL
  d23 <- NULL

  for (i in 1:runs) {
    d <-
      data.frame(
        scores = rnorm(n, means, sds),
        condition = rep(letters[1:3], n/3)
      )

    m <- summary(lm(scores ~ condition, d))
    pvalues[i] <- broom::glance(m)$p.value

    # get cohen's d for each contrast
    d12[i] <- effectsize::cohens_d(d$scores[d$condition==letters[1]], d$scores[d$condition==letters[2]])$Cohens_d
    d13[i] <- effectsize::cohens_d(d$scores[d$condition==letters[1]], d$scores[d$condition==letters[3]])$Cohens_d
    d23[i] <- effectsize::cohens_d(d$scores[d$condition==letters[2]], d$scores[d$condition==letters[3]])$Cohens_d
  }

  outcomes <-
    rbind(
      outcomes,
      data.frame(
        sample_size = n,
        power = sum(pvalues < 0.05) / length(pvalues),
        d12 = mean(d12),
        d13 = mean(d13),
        d23 = mean(d23)
      )
    )
}
plot(outcomes$sample_size, outcomes$power)
mean(outcomes$d12);mean(outcomes$d13);mean(outcomes$d23)
[1] -0.2022801
[1] -0.6103061
[1] -0.3666429
0.5 Exercise
So far, we’ve worked with dummy coding for our explanatory factor. That presumes we’re interested in the contrasts that compare the second and third conditions to the first. Sometimes, however, we’re not interested in those comparisons and would rather know the difference between each condition and the grand mean. In that case, we should use sum-to-zero coding. Take the data set below:
set.seed(42)

n <- 100

d <-
  data.frame(
    condition = as.factor(rep(letters[1:3], times = n)),
    scores = rnorm(n*3, c(100, 150, 200), c(7, 10, 12))
  )
Let’s have a look at the contrasts:
contrasts(d$condition)
b c
a 0 0
b 1 0
c 0 1
It’s the familiar dummy coding. Going row by row, this means that to get the mean for a, both the b and the c estimate are zero. When everything is zero, we’re left with the intercept; in other words, the intercept is the mean of condition a. To get b, we take the intercept plus 1 times the b estimate (the difference between condition b and a), and to get c, we take the intercept plus 1 times the estimate for c (the difference between condition c and a).
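To make that concrete, here’s a quick check you could run at this point (a minimal sketch using the d created above; coefs_dummy is just an illustrative name):
# with dummy coding, the coefficients map onto the group means like this
coefs_dummy <- coef(lm(scores ~ condition, d))
coefs_dummy[1]                      # mean of a (the intercept)
coefs_dummy[1] + coefs_dummy[2]     # mean of b: intercept + b estimate
coefs_dummy[1] + coefs_dummy[3]     # mean of c: intercept + c estimate
mean(d$scores[d$condition == "a"])  # matches the intercept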
Let’s have a look at sum-to-zero coding instead:
contrasts(d$condition) <- contr.sum
contrasts(d$condition)
[,1] [,2]
a 1 0
b 0 1
c -1 -1
Now the intercept is our grand mean (the overall mean of the outcome), the first contrast compares the mean of condition a against the grand mean, and the second contrast compares the mean of condition b against the grand mean. To get c, we take the intercept plus the first estimate times -1 plus the second estimate times -1 (so the grand mean minus the a estimate minus the b estimate).
summary(lm(scores~condition, d))
Call:
lm(formula = scores ~ condition, data = d)
Residuals:
Min 1Q Median 3Q Max
-32.364 -6.417 -0.009 6.081 26.953
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 149.8348 0.5700 262.872 <2e-16 ***
condition1 -50.0140 0.8061 -62.045 <2e-16 ***
condition2 -0.6376 0.8061 -0.791 0.43
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.873 on 297 degrees of freedom
Multiple R-squared: 0.946, Adjusted R-squared: 0.9456
F-statistic: 2600 on 2 and 297 DF, p-value: < 2.2e-16
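As a sanity check, you could recover the three condition means from the sum-coded model above (a minimal sketch; coefs_sum is just an illustrative name):
coefs_sum <- coef(lm(scores ~ condition, d))
coefs_sum[1] + coefs_sum[2]                 # mean of a: grand mean + first estimate
coefs_sum[1] + coefs_sum[3]                 # mean of b: grand mean + second estimate
coefs_sum[1] - coefs_sum[2] - coefs_sum[3]  # mean of c: grand mean minus both estimates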
If we want a different comparison, we can reorder the factor levels:
d$condition <- factor(d$condition, levels = c("b", "c", "a"))
contrasts(d$condition) <- contr.sum
contrasts(d$condition)
[,1] [,2]
b 1 0
c 0 1
a -1 -1
Now the first contrast compares condition b against the grand mean, and the second contrast compares condition c against the grand mean.
For later sessions on interactions, we’ll rely on this kind of sum-to-zero coding (also called effect coding), so let’s get familiar with it here. Create four groups: a placebo group, a low dose group, a medium dose group, and a high dose group.
You want to know whether the low dose, the medium dose, and the high dose are significantly different from the grand mean. Simulate a data set, set the contrasts, and run a linear model. Choose mean values as you like, but make sure you can find them again in the model output. Speaking of the model output: How would you get the mean of the placebo group now? How would you get the mean of any group? (Tip: Look at the contrasts (aka the estimates from the model), take the grand mean, and apply the contrasts to each estimate.)
n <- 50

d <-
  data.frame(
    condition = factor(rep(c("placebo", "low dose", "medium dose", "high dose"), times = n)),
    scores = rnorm(n*4, c(10, 20, 30, 40), 2)
  )

d$condition <- factor(d$condition, levels = c("low dose", "medium dose", "high dose", "placebo"))
contrasts(d$condition) <- contr.sum
contrasts(d$condition)
[,1] [,2] [,3]
low dose 1 0 0
medium dose 0 1 0
high dose 0 0 1
placebo -1 -1 -1
summary(lm(scores ~ condition, d))
Call:
lm(formula = scores ~ condition, data = d)
Residuals:
Min 1Q Median 3Q Max
-4.3991 -1.2198 -0.1447 1.3646 5.5612
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.9151 0.1334 186.74 <2e-16 ***
condition1 -5.4787 0.2311 -23.71 <2e-16 ***
condition2 5.4555 0.2311 23.61 <2e-16 ***
condition3 14.9170 0.2311 64.55 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.887 on 196 degrees of freedom
Multiple R-squared: 0.9731, Adjusted R-squared: 0.9726
F-statistic: 2360 on 3 and 196 DF, p-value: < 2.2e-16
# get placebo mean
coefs <- coef(lm(scores ~ condition, d))
coefs[1] + sum(coefs[-1] * contrasts(d$condition)["placebo", ])
(Intercept)
10.02139
0.6 Exercise
You plan an experiment where you want to test the effect of framing on agreement with a message. In your experiment, people read a text with either neutral, positive, or negative framing. Each person reads each text (so we have a within-subjects design) and rates their agreement on a 7-point Likert scale.
You determine that the smallest contrast you care about is 0.25 points between the neutral and the negative condition and between the neutral and the positive condition. You’ve found in previous testing that only a difference of that size in agreement actually has an effect on behavior. That’s your SESOI.
You decide that the midpoint of the scale is a decent starting point for the neutral condition. The negative condition should be 0.25 points lower and the positive condition 0.25 points higher. You generally expect that the variation should be somewhere around 1 Likert-point; this way, most scores are within -2 and +2 from the mean. However, you know that negative framing usually increases variation, so you’ll set the SD for that score 30% higher than that of the other two conditions.
The experiment takes place within 20 minutes, so you expect a decent amount of consistency in answers per participant: a person who generally agrees more should also agree more to all conditions. In other words, you set the correlation between scores to 0.6.
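(For the variance-covariance matrix you’ll build below, recall that a covariance is simply the correlation times the two SDs; for example, the covariance between the neutral and the negative scores would be \(0.6 \times 1 \times 1.3 = 0.78\).)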
Last, you determined that you’re early in the research process, so you definitely don’t want to miss any effects (aka commit a Type II error). You also consider that a false positive won’t have any negative consequences. Therefore, you set your \(\alpha\) to 0.10.
You can afford to collect 100 participants in total (that’s where you’ll stop with the simulation). Simulate power (500 runs each); go in steps of 5 (start with 20). How many participants do you need for 95% power? Will 100 even get you there? Use contrast coding.
Several tips:
- For the variance-covariance matrix, you’ll need different covariances depending on which groups are involved
- Remember the data format the data have to be in: you’ll need to transform them from wide to long format. Have a look at the slides or good old Google
- You might have to Google how to get the p-value out of aov (one method that worked for me was unlist(model)["Error: Within.Pr(>F)1"])
library(MASS)
library(tidyr)

means <- c(neutral = 4, negative = 3.75, positive = 4.25)
sd_neutral_po <- 1
sd_negative <- 1.3
correlation <- 0.6
sizes <- seq(10, 100, 5)
runs <- 500
alpha <- 0.10

# covariance of negative with the two others is identical because the sd of the other two is identical
cov_negative <- correlation * sd_neutral_po * sd_negative
cov_other <- correlation * sd_neutral_po * sd_neutral_po

our_matrix <- matrix(
  c(
    sd_neutral_po**2, cov_negative, cov_other,
    cov_negative, sd_negative**2, cov_negative,
    cov_other, cov_negative, sd_neutral_po**2
  ),
  ncol = 3
)

outcomes <-
  data.frame(
    sample_size = NULL,
    power = NULL
  )

for (n in sizes) {
  pvalues <- NULL

  for (i in 1:runs) {
    d <- mvrnorm(
      n,
      means,
      our_matrix
    )

    d <- as.data.frame(d)
    d$id <- factor(1:n)

    d <- pivot_longer(
      d,
      cols = -id,
      values_to = "scores",
      names_to = "condition"
    )

    d$condition <- as.factor(d$condition)
    contrasts(d$condition) <- contr.sum

    m <- summary(aov(scores ~ condition + Error(id), d))
    pvalues[i] <- unlist(m)["Error: Within.Pr(>F)1"]
  }

  outcomes <-
    rbind(
      outcomes,
      data.frame(
        sample_size = n,
        power = sum(pvalues < alpha) / length(pvalues)
      )
    )
}
plot(outcomes$sample_size, outcomes$power)
You can plot the results with plot(outcomes$sample_size, outcomes$power).
0.7 Exercise
The previous study worked. Nice. Now you want to know whether positive framing scales: if you reduce the positivity, does agreement decrease proportionally? You run a follow-up experiment where you now have a neutral condition, a low positivity condition (half the positivity of your previous positive condition), and a high positivity condition (the positivity of your original positive condition). Your effect size remains 0.25 for the comparison between neutral and high positivity (the original positive condition). You expect, therefore, that half the positivity should lead to half the effect size: 0.125. That’s your new SESOI.
The previous study also showed that you weren’t too far off with an SD of 1, but this time you’ll use 0.8 for all scores.
However, you expect that, because the two positivity conditions are so similar, they’ll correlate more strongly with each other than with the neutral condition. So you expect a correlation of 0.3 between neutral and either positivity condition, but 0.6 between the two positivity conditions.
Because your department was so happy with how well you designed the previous study, they gave you more money. Now you can collect a maximum of 500 participants. Last time, the power simulation took quite some time, so this time you want to stop as soon as you reach 95% power (use a while loop). Also, because this time you want to be more stringent, you set \(\alpha\) to 0.01.
Do 500 runs, start with 60 people, and go in steps of 10. Use treatment contrasts. This can take a while.
library(MASS)
library(tidyr)

set.seed(42)

sesoi <- 0.125
means <- c(neutral = 4, low = 4+sesoi, high = 4+2*sesoi)
sd <- 0.8
cor_positive <- 0.6
cor_neutral <- 0.3
runs <- 500
alpha <- 0.01
max_n <- 500
steps <- 10

# covariance of neutral with the two others is identical because the SD and correlation for those two are identical
cov_positive <- cor_positive * sd * sd
cov_neutral <- cor_neutral * sd * sd

our_matrix <- matrix(
  c(
    sd**2, cov_neutral, cov_neutral,
    cov_neutral, sd**2, cov_positive,
    cov_neutral, cov_positive, sd**2
  ),
  ncol = 3
)

outcomes <-
  data.frame(
    sample_size = NULL,
    power = NULL
  )

power <- 0
n <- 60

while (power < 0.95 & n <= max_n) {
  pvalues <- NULL

  for (i in 1:runs) {
    d <- mvrnorm(
      n,
      means,
      our_matrix
    )

    d <- as.data.frame(d)
    d$id <- factor(1:n)

    d <- pivot_longer(
      d,
      cols = -id,
      values_to = "scores",
      names_to = "condition"
    )

    d$condition <- as.factor(d$condition)

    m <- summary(aov(scores ~ condition + Error(id), d))
    pvalues[i] <- unlist(m)["Error: Within.Pr(>F)1"]
  }

  power <- sum(pvalues < alpha) / length(pvalues)

  outcomes <-
    rbind(
      outcomes,
      data.frame(
        sample_size = n,
        power = power
      )
    )

  n <- n + steps
}
plot(outcomes$sample_size, outcomes$power)
0.8 Exercise
Usually when you run an ANOVA, you specify your contrasts beforehand, so you know which groups you want to compare beyond the overall effect of your condition. Let’s take the data set below. The overall effect of condition is significant:
library(MASS)
library(tidyr)

set.seed(42)

means <- c(neutral = 10, low = 12, high = 14)
sd <- 8
correlation <- 0.6
n <- 80

covariance <- correlation * sd * sd

sigma <- matrix(
  c(
    sd**2, covariance, covariance,
    covariance, sd**2, covariance,
    covariance, covariance, sd**2
  ),
  ncol = 3
)

d <- data.frame(
  mvrnorm(
    n,
    means,
    sigma
  )
)

d$id <- factor(1:n)

d <- pivot_longer(
  d,
  cols = -id,
  values_to = "scores",
  names_to = "condition"
)

d$condition <- as.factor(d$condition)

m <- aov(scores ~ condition + Error(id), d)
summary(m)
Error: id
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 79 12853 162.7
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
condition 2 606 303.22 14.23 2.07e-06 ***
Residuals 158 3366 21.31
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s inspect the post-hoc comparisons between the groups with the emmeans package. Here, not every comparison is significant.
posthocs <- emmeans::emmeans(m, ~ condition)
posthocs
condition emmean SE df lower.CL upper.CL
high 13.55 0.925 122 11.7 15.4
low 12.30 0.925 122 10.5 14.1
neutral 9.73 0.925 122 7.9 11.6
Warning: EMMs are biased unless design is perfectly balanced
Confidence level used: 0.95
pairs(posthocs)
contrast estimate SE df t.ratio p.value
high - low 1.25 0.73 158 1.713 0.2035
high - neutral 3.82 0.73 158 5.232 <.0001
low - neutral 2.57 0.73 158 3.519 0.0016
P value adjustment: tukey method for comparing a family of 3 estimates
Run a power simulation that finds the sample size for which we have 90% power for the smallest post-hoc contrast (aka the largest of the three p-values). Go about it as you think is best, but save both the main effect p-value and the largest p-value of the post-hoc contrasts so you can compare power. In effect, you’re powering for a t-test, which only emphasizes that you should work on the raw scale and specify each comparison. (Tip: summary turns pairs(posthocs) into a data frame.)
means <- c(neutral = 10, low = 12, high = 14)
sd <- 8
correlation <- 0.6
n <- 20
runs <- 1e3

covariance <- correlation * sd * sd

sigma <- matrix(
  c(
    sd**2, covariance, covariance,
    covariance, sd**2, covariance,
    covariance, covariance, sd**2
  ),
  ncol = 3
)

outcomes <- data.frame(
  sample_size = NULL,
  power = NULL,
  power_posthoc = NULL
)

power_posthoc <- 0

while (power_posthoc < .90) {
  pvalues_posthoc <- NULL
  pvalues <- NULL

  for (i in 1:runs) {
    d <- data.frame(
      mvrnorm(
        n,
        means,
        sigma
      )
    )

    d$id <- factor(1:n)

    d <- pivot_longer(
      d,
      cols = -id,
      values_to = "scores",
      names_to = "condition"
    )

    d$condition <- as.factor(d$condition)

    m <- aov(scores ~ condition + Error(id), d)

    pvalues[i] <- unlist(summary(m))["Error: Within.Pr(>F)1"]

    posthocs <- suppressMessages(emmeans::emmeans(m, ~ condition))
    outputs <- summary(pairs(posthocs))

    pvalues_posthoc[i] <- max(outputs$p.value)
  }

  power <- sum(pvalues < 0.05) / length(pvalues)
  power_posthoc <- sum(pvalues_posthoc < 0.05) / length(pvalues_posthoc)

  outcomes <-
    rbind(
      outcomes,
      data.frame(
        sample_size = n,
        power = power,
        power_posthoc = power_posthoc
      )
    )

  n <- n + 10
}
library(ggplot2)

outcomes %>%
  pivot_longer(
    cols = starts_with("power"),
    names_to = "Power Type"
  ) %>%
  ggplot(
    aes(x = sample_size, y = value, color = `Power Type`)
  ) +
  geom_line() + theme_bw()
0.9 Exercise
Rather than doing post-hoc comparisons between each group, we can rely on planned contrasts. If that’s completely new to you, have a read here. The blog post does a really nice job explaining planned contrasts.
In effect, rather than relying on sum-to-zero or treatment coding, we create our own comparisons. Let’s say we want to compare neutral to both low positivity and high positivity (as a general test that framing works). Then, our second contrast compares the two framing conditions. For simplicity, let’s go with three independent groups with means of c(4, 4.3, 4.4) and a common SD of 1.1. Custom contrasts allow us to directly make the two comparisons we’re interested in, like so:
d <- data.frame(
  scores = rnorm(150, c(4, 4.3, 4.4), 1.1),
  condition = factor(rep(c("neutral", "low", "high"), 50))
)

contrasts(d$condition)
low neutral
high 0 0
low 1 0
neutral 0 1
highlow_vs_neutral <- c(1, 1, -2)
high_vs_low <- c(1, -1, 0)

contrasts(d$condition) <- cbind(highlow_vs_neutral, high_vs_low)
contrasts(d$condition)
highlow_vs_neutral high_vs_low
high 1 1
low 1 -1
neutral -2 0
summary.lm(aov(scores ~ condition, d))
Call:
aov(formula = scores ~ condition, data = d)
Residuals:
Min 1Q Median 3Q Max
-2.97137 -0.80870 -0.03995 0.77749 2.70410
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.26182 0.08776 48.562 <2e-16 ***
conditionhighlow_vs_neutral 0.13517 0.06206 2.178 0.031 *
conditionhigh_vs_low 0.09555 0.10748 0.889 0.375
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.075 on 147 degrees of freedom
Multiple R-squared: 0.03629, Adjusted R-squared: 0.02318
F-statistic: 2.768 on 2 and 147 DF, p-value: 0.06609
Run a power simulation where you first use standard treatment contrasts. Then run post-hoc tests (emmeans::emmeans) on that model. For simplicity, focus on the comparison of high vs. low and select the p-value for that post-hoc contrast. In the same run, change the contrasts to the custom ones from above, run the model again, and extract the p-value for the high vs. low comparison. Do that for samples ranging from 50 to 200 per group in steps of 10. Do 200 runs to save a bit of time, then plot the two types of power against each other. Which approach is more “powerful”? No tips here; see this as a little challenge.
set.seed(42)

means <- c(4, 4.3, 4.6)
sd <- 1.1
runs <- 200
sizes <- seq(50, 200, 10)

outcomes <-
  data.frame(
    sample_size = NULL,
    type = NULL,
    power = NULL
  )

for (n in sizes) {
  pvalues_treatment <- NULL
  pvalues_contrasts <- NULL

  for (i in 1:runs) {
    d <- data.frame(
      scores = rnorm(n*3, means, sd),
      condition = factor(rep(c("neutral", "low", "high"), n))
    )

    # post-hoc comparison after a model with (default) treatment contrasts
    posthocs <- suppressMessages(emmeans::emmeans(lm(scores ~ condition, d), ~ condition))
    outputs <- summary(pairs(posthocs))

    pvalues_treatment[i] <- outputs[1, 6]

    # now the custom planned contrasts
    highlow_vs_neutral <- c(1, 1, -2)
    high_vs_low <- c(1, -1, 0)
    contrasts(d$condition) <- cbind(highlow_vs_neutral, high_vs_low)

    pvalues_contrasts[i] <- coefficients(summary(lm(scores ~ condition, d)))[3, 4]
  }

  outcomes <-
    rbind(
      outcomes,
      data.frame(
        sample_size = c(n, n),
        type = c("treatment", "contrasts"),
        power = c(
          sum(pvalues_treatment < 0.05) / length(pvalues_treatment),
          sum(pvalues_contrasts < 0.05) / length(pvalues_contrasts)
        )
      )
    )
}
ggplot(outcomes, aes(x = sample_size, y = power, color = type)) + geom_line() + theme_bw()