Exercise VI
Overview
You made it: last set of exercises. Nothing new here; just an extension of the skills you already have at this point. You’ll simulate power for continuous predictors and interactions.
0.1 Exercise
You’re interested in the effects of three predictors on an outcome. When all predictors are 0, the outcome (y) should be around 16. The first predictor, x1, causes a 1-point increase in y; the second predictor, x2, causes a 0.3-point increase in y; the third predictor, x3, causes a 1.4-point increase in y. All predictors range from 0 to 7; for our purposes, we can assume that they’re uniformly distributed. The error term has a mean of 0 and an SD of 10.
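Written out as a formula, that data-generating model is:
\[\begin{align} y = 16 + 1 \times x_1 + 0.3 \times x_2 + 1.4 \times x_3 + \epsilon \end{align}\]
with \(\epsilon\) normally distributed with a mean of 0 and an SD of 10.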
Simulate power with 500 runs. You could either estimate power for the entire model (so for the p-value of the F-test for the full lm model) or for each individual predictor. Find out which has the most power: Store power for the full model and for each individual predictor from the model. Start with 50 participants and go up in steps of 10 until you reach 200. Plot the power curves for the different power types. You can use the code below:
set.seed(42)
b0 <- 16
b1 <- 1
b2 <- 0.3
b3 <- 1.4
sizes <- seq(50, 200, 10)
runs <- 500

outcomes <- data.frame(
  sample_size = NULL,
  type = NULL,
  power = NULL
)
for (n in sizes) {
  pvalues_all <- NULL
  pvalues_x1 <- NULL
  pvalues_x2 <- NULL
  pvalues_x3 <- NULL

  for (i in 1:runs) {
    error <- rnorm(n, 0, 10)

    x1 <- runif(n, 0, 7)
    x2 <- runif(n, 0, 7)
    x3 <- runif(n, 0, 7)

    d <- data.frame(
      x1 = x1,
      x2 = x2,
      x3 = x3,
      y = b0 + b1*x1 + b2*x2 + b3*x3 + error
    )

    m <- summary(lm(y ~ x1 + x2 + x3, d))

    # p-value of the overall F-test, then the p-values of the three predictors
    pvalues_all[i] <- broom::glance(m)$p.value
    pvalues_x1[i] <- broom::tidy(m)$p.value[2]
    pvalues_x2[i] <- broom::tidy(m)$p.value[3]
    pvalues_x3[i] <- broom::tidy(m)$p.value[4]
  }

  # turn the p-values into power estimates for this sample size
  outcomes <- rbind(
    outcomes,
    data.frame(
      sample_size = rep(n, 4),
      type = factor(c("all", "x1", "x2", "x3")),
      power = c(
        sum(pvalues_all < 0.05) / length(pvalues_all),
        sum(pvalues_x1 < 0.05) / length(pvalues_x1),
        sum(pvalues_x2 < 0.05) / length(pvalues_x2),
        sum(pvalues_x3 < 0.05) / length(pvalues_x3)
      )
    )
  )
}
library(ggplot2)
ggplot(outcomes, aes(x = sample_size, y = power, color = type)) + geom_line() + theme_bw()
0.2 Exercise
Create two variables with a correlation of 0.2. Use a correlation matrix. How many participants do you need for 95% power with an alpha of 0.01? Go about this as you think is best. Verify your estimate with GPower.
library(MASS)
sigma <- matrix(
  c(1, 0.2, 0.2, 1),
  ncol = 2
)

alpha <- 0.01
power <- 0
n <- 100
runs <- 1e3

outcomes <- data.frame(
  sample_size = NULL,
  power = NULL
)
while (power < 0.95) {
  pvalues <- NULL

  for (i in 1:runs) {
    d <- mvrnorm(
      n,
      c(0, 0),
      sigma
    )
    d <- as.data.frame(d)
    colnames(d) <- c("x1", "x2")

    pvalues[i] <- cor.test(d$x1, d$x2)$p.value
  }

  power <- sum(pvalues < alpha) / length(pvalues)

  outcomes <- rbind(
    outcomes,
    data.frame(
      sample_size = n,
      power = power
    )
  )

  n <- n + 10
}
plot(outcomes$sample_size, outcomes$power)
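By the way, besides GPower, you can also cross-check the simulation analytically without leaving R. The snippet below is just an optional sketch that assumes you have the pwr package installed; pwr.r.test solves for whichever argument you leave out:
# optional analytic cross-check (assumes the pwr package is installed)
library(pwr)
pwr.r.test(r = 0.2, sig.level = 0.01, power = 0.95)
The sample size it reports should be in the same ballpark as where your simulated power curve crosses 0.95.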
0.3 Exercise
In the correlation matrix, you specifically say how variables are related, but also when they aren’t related (i.e., when you assign zero for a correlation). For the data generating process, this is important, because it specifies the causal structure. Going simply by \(R^2\) or significance will often be misleading. For example, what if you have a third variable that creates a spurious effect? You can simulate that as well. Say you measure how much ice cream a person consumes in a year and expect that eating a lot of sweet, sweet ice cream makes people crave savory foods, which you measure as the portions of fries consumed in a beer garden. However, there truly is no effect of ice cream on eating fries; it’s just that both are influenced by the number of sun hours: More sun hours will lead people to eat more ice cream but also to spend more time in beer gardens and, inevitably, eat fries there.
Simulate that: Use a sample size of 10,000. First, create a sun hours variable (rnorm will be enough). Then, have sun hours cause ice cream eating with an effect size of your choosing (don’t forget some error). Next, also have sun hours cause fries eating. Now run two regression models:
- one where you predict fries with ice cream alone
- one where you predict fries with ice cream and sun hours
Which model gives you the correct causal effect of ice cream eating? But which one has the lower p-value for the ice cream variable? Do you see how it makes little sense to just go for significance and how you must consider the causal process when simulating data? If this tickled your interest, have a look here and here.
n <- 1e4

# sun hours cause both ice cream and fries; ice cream has no effect on fries
sun <- rnorm(n)
ice_cream <- 0.4*sun + rnorm(n)
fries <- 0.5*sun + rnorm(n)
summary(lm(fries ~ ice_cream))
Call:
lm(formula = fries ~ ice_cream)
Residuals:
Min 1Q Median 3Q Max
-3.8340 -0.7412 -0.0039 0.7345 4.3659
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.009393 0.011036 0.851 0.395
ice_cream 0.175206 0.010235 17.118 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.104 on 9998 degrees of freedom
Multiple R-squared: 0.02847, Adjusted R-squared: 0.02838
F-statistic: 293 on 1 and 9998 DF, p-value: < 2.2e-16
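A quick side note on why the first model lands where it does: since ice_cream and fries are only related through sun, the estimated slope is roughly the confounded covariance divided by the variance of ice_cream,
\[\begin{align} \hat{\beta}_{\text{ice cream}} \approx \frac{0.4 \times 0.5}{0.4^2 + 1} \approx 0.17 \end{align}\]
which matches the 0.175 estimate above even though the true causal effect of ice cream on fries is zero.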
summary(lm(fries ~ ice_cream + sun))
Call:
lm(formula = fries ~ ice_cream + sun)
Residuals:
Min 1Q Median 3Q Max
-3.6994 -0.6739 -0.0083 0.6723 3.6741
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.004700 0.010029 0.469 0.639
ice_cream -0.003296 0.010080 -0.327 0.744
sun 0.497472 0.010829 45.937 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.003 on 9997 degrees of freedom
Multiple R-squared: 0.1978, Adjusted R-squared: 0.1976
F-statistic: 1233 on 2 and 9997 DF, p-value: < 2.2e-16
0.4 Exercise
Your colleague comes to you and tells you that they predicted satisfaction with a film from people’s enjoyment: The higher the enjoyment, the more satisfied people reported being with the film. However, they only had 20 people in their sample. Run a sensitivity analysis and check which \(r\) gives you 90% power with 20 people, even if you allow a more liberal alpha of 0.10. Increase \(r\) in steps of 0.01 and go the full distance, checking all effect sizes from 0 to 1. Do 500 runs per step. Verify with GPower.
n <- 20
alpha <- 0.10
effects <- seq(0, 1, 0.01)
runs <- 1e3

outcomes <- data.frame(
  effect_size = NULL,
  power = NULL
)

pvalues <- NULL
for (anr in effects) {
for (i in 1:runs) {
    sigma <- matrix(
      c(1, anr, anr, 1),
      ncol = 2
    )
    d <- mvrnorm(
      n,
      c(0, 0),
      sigma
    )
    d <- as.data.frame(d)
    colnames(d) <- c("enjoyment", "satisfaction")

    pvalues[i] <- cor.test(d$enjoyment, d$satisfaction)$p.value
  }

  outcomes <- rbind(
    outcomes,
    data.frame(
      effect_size = anr,
      power = sum(pvalues < alpha) / length(pvalues)
    )
  )
}
plot(outcomes$effect_size, outcomes$power)
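As with the correlation exercise above, you can cross-check this sensitivity analysis analytically if you have the pwr package installed; leaving r empty makes pwr.r.test solve for the detectable effect size:
# optional sensitivity cross-check (assumes the pwr package is installed)
library(pwr)
pwr.r.test(n = 20, sig.level = 0.10, power = 0.90)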
0.5 Exercise
You design a follow-up study where you measure enjoyment and satisfaction with films, but this time you want to determine a SESOI and do a power analysis beforehand. You measure both variables on a 5-point Likert-type index scale (by index I mean it’s the average of several items). You know enjoyment usually scores above the midpoint; say a mean of 3.9 sounds realistic. The SD will be narrow: 0.5. For satisfaction, you expect a score below the midpoint of the scale, at 2.1, but with a larger SD of 1. Your smallest effect size of interest is 0.7 points because a previous analysis shows that this is the point where satisfaction translates to higher well-being for the day.
You want to be strict, so you set your alpha to 0.005, but at the same time, you don’t mind missing a true effect quite as much, which is why you set your power goal to 85%. Run the power analysis (1,000 runs per combo). Start at 50 people and go up in steps of 1 until you reach your desired power level (translation: use a while statement). Tip: Use the raw effect and SDs to get \(r\), which you then use in a variance-covariance matrix to simulate a data set for a given sample size. Verify with GPower.
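To make the tip concrete, one way to turn the raw SESOI into a correlation (and this is also what the solution below does) is to rescale the raw effect by the two SDs:
\[\begin{align} r = SESOI \times \frac{SD_{enjoyment}}{SD_{satisfaction}} = 0.7 \times \frac{0.5}{1} = 0.35, \qquad cov = r \times SD_{enjoyment} \times SD_{satisfaction} = 0.175 \end{align}\]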
goal <- 0.85
alpha <- 0.005
sesoi <- 0.7
n <- 50
means <- c(enjoyment = 3.9, satisfaction = 2.1)
sd_enjoy <- 0.5
sd_sat <- 1
r <- sesoi * sd_enjoy / sd_sat
covariance <- r * sd_enjoy * sd_sat
runs <- 1e3

sigma <- matrix(
  c(
    sd_enjoy^2, covariance,
    covariance, sd_sat^2
  ),
  ncol = 2
)

outcomes <- data.frame(
  sample_size = NULL,
  power = NULL
)

power <- 0
while (power < goal) {
  pvalues <- NULL

  for (i in 1:runs) {
    d <- as.data.frame(
      mvrnorm(
        n,
        means,
        sigma
      )
    )

    pvalues[i] <- cor.test(d$enjoyment, d$satisfaction)$p.value
  }

  power <- sum(pvalues < alpha) / length(pvalues)

  outcomes <- rbind(
    outcomes,
    data.frame(
      sample_size = n,
      power = power
    )
  )

  n <- n + 1
}
plot(outcomes$sample_size, outcomes$power)
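If you’d like a second opinion on the stopping point without switching to GPower (again assuming the pwr package is available), you can plug the converted r of 0.35 straight into pwr.r.test:
# optional cross-check of the while-loop result (assumes the pwr package is installed)
library(pwr)
pwr.r.test(r = 0.35, sig.level = 0.005, power = 0.85)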
0.6 Exercise
A colleague has found that films with higher ratings on IMDB bring in more money at the box office. However, you think this effect is mostly due to genre: for comedies, there is an effect of quality on success, but for action films this doesn’t matter. In other words, you predict an interaction, such that the positive effect of quality (i.e., IMDB rating) is only present in one condition (genre: comedies), but not in the other condition.
You want to test that hypothesis and start your power simulation. Specifically, you want to power for the interaction effect. IMDB ratings range from 0-10. Success is measured in million dollar steps. For genre, action films will be your baseline. You’ll also center the rating so that 0 represents an average rating. Overall, when a film is an action flick and has an average rating, you expect it to bring in 20 million. You don’t expect a main effect of quality because the quality effect will depend on genre. You do expect, however, a main effect of genre, such that, at average quality, comedies bring in 5 million more than action flicks. Crucially, you expect an interaction effect: For comedies, each 1-point increase in quality will generate 2 million extra at the box office.
As for error: You expect a normally distributed error with a mean of 0 and an SD of 15. Simulate the data and see how many films you need to detect the interaction effect with 95% power at an alpha of 0.05. Start at 10 and go up in steps of 10 to a maximum of 250. Do 1,000 runs per combo.
Like before, when you work with the linear model, it will help to write out the model. So maybe start with the formula below:
\[\begin{align} millions = \beta_0 + \beta_1 Quality + \beta_2 Genre + \beta_3 Quality \times Genre + \epsilon \end{align}\]
First, fill in the betas and use them to simulate a massive data set (say 10,000 cases) before you start the power simulation. Plot the data set with:
ggplot(d, aes(x = quality, y = success, color = genre)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm") +
theme_bw()
Does the plot look like what we specified? Make sure you code the factor correctly: genre should have action films as the baseline; otherwise, the meaning of \(\beta_2\) and \(\beta_3\) changes.
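For example, a sketch of that warm-up data set could look like the following; this is just one way to do it, filling in the betas from the description above, and the resulting d is what the ggplot call above expects:
# hedged sketch of the sanity-check data set, not the power simulation itself
n_check <- 1e4
genre <- factor(
  rep(c("action", "comedy"), each = n_check / 2),
  levels = c("action", "comedy")  # action films as the baseline
)
quality <- as.numeric(scale(runif(n_check, 0, 10), scale = FALSE))  # centered rating
error <- rnorm(n_check, 0, 15)

d <- data.frame(
  genre = genre,
  quality = quality,
  # betas from the description: 20 million at baseline, no quality main effect,
  # +5 million for comedies, +2 million per rating point for comedies only
  success = 20 + 0*quality + 5*(genre == "comedy") + 2*quality*(genre == "comedy") + error
)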
b0 <- 50
b1 <- 0
b2 <- 5
b3 <- 2
sizes <- seq(10, 300, 10)
runs <- 1e3

outcomes <- data.frame(
  sample_size = NULL,
  power = NULL
)
for (n in sizes) {
  pvalues <- NULL

  for (i in 1:runs) {
    error <- rnorm(n, 0, 15)
    genre <- rep(0:1, n/2)
    quality <- scale(runif(n, 0, 10), center = TRUE, scale = FALSE)

    d <- data.frame(
      genre = genre,
      quality = quality,
      success = b0 + b1*quality + b2*genre + b3*quality*genre + error
    )

    # success can't be less than 0
    d$success <- ifelse(d$success < 0, 0, d$success)

    m <- summary(lm(success ~ quality*genre, d))

    pvalues[i] <- broom::tidy(m)$p.value[4]
  }

  outcomes <- rbind(
    outcomes,
    data.frame(
      sample_size = n,
      power = sum(pvalues < 0.05) / length(pvalues)
    )
  )
}
with(outcomes, plot(sample_size, power))
0.7 Exercise
Simulate the above again, but this time increase the main effects of both quality and genre to 10 million. Leave the interaction effect at 2. What do you think – will this have an effect on power? If so, how and why?
b0 <- 50
b1 <- 10
b2 <- 10
b3 <- 2
sizes <- seq(10, 250, 10)
runs <- 1e3

outcomes <- data.frame(
  sample_size = NULL,
  power = NULL
)
for (n in sizes) {
  pvalues <- NULL

  for (i in 1:runs) {
    error <- rnorm(n, 0, 15)
    genre <- rep(0:1, n/2)
    quality <- scale(runif(n, 0, 10), center = TRUE, scale = FALSE)

    d <- data.frame(
      genre = genre,
      quality = quality,
      success = b0 + b1*quality + b2*genre + b3*quality*genre + error
    )

    # success can't be less than 0
    d$success <- ifelse(d$success < 0, 0, d$success)

    m <- summary(lm(success ~ quality*genre, d))

    pvalues[i] <- broom::tidy(m)$p.value[4]
  }

  outcomes <- rbind(
    outcomes,
    data.frame(
      sample_size = n,
      power = sum(pvalues < 0.05) / length(pvalues)
    )
  )
}
with(outcomes, plot(sample_size, power))
0.8 Exercise
You want to know how much listening to music leads to feelings of being relaxed. You expect a linear effect of the number of songs someone listens to on them feeling relaxed. After all, if listening to music relaxes you, then the more music, the more relaxed you get.
However, you suspect that whether this effect occurs will strongly depend on how much people like these songs. Can’t relax if you listen to lots of crap (notice how I didn’t mention an artist here to be nice).
You measure feelings of relaxation on a Likert-type index variable with a range of 1-7. The number of songs is measured as a count with a maximum of 30 songs (you counted songs over a 1-hour period). After that period, you also asked how much people enjoyed the songs they were listening to on a scale from 0-100.
Simulate the number of songs (use sample) and liking (uniform). Transform the number of songs so that going up 1 on the variable means listening to 5 more songs. As for liking, you want to go in steps of 10. Center both. These transformations make it easier to get an intuition about the effect sizes.
At average listening and liking, you think relaxation should be somewhat below the midpoint of the scale: 3.1. You don’t expect main effects of either predictor: Liking music shouldn’t relax you unless you listen to some of it, and listening to music alone shouldn’t do much unless you like the music.
However, you expect a fairly sizable interaction effect, such that the combination of liking and listening will increase relaxation by 0.2 points. Remember the transformations: We say that listening to 5 more songs (which is going up 1 on the listening variable) will increase relaxation by 0.2 points when liking also goes up by 10 points (which is going up 1 on the liking variable).
For error, you expect a normally distributed error with mean 0 and an SD of 2. Simulate power for the interaction effect, starting at 10 and ending at a sample size of 100. Go in steps of 5. Do 1,000 runs per combo. Also use interactions::interact_plot to plot a random model to check what the data look like.
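Filling the betas into the same kind of formula as before, the data-generating model here is:
\[\begin{align} relaxation = 3.1 + 0 \times listening + 0 \times liking + 0.2 \times (listening \times liking) + \epsilon \end{align}\]
with listening as the (centered) number of songs divided by 5, liking as the (centered) rating divided by 10, and \(\epsilon\) normally distributed with a mean of 0 and an SD of 2.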
b0 <- 3.1
b1 <- 0
b2 <- 0
b3 <- 0.2
sizes <- seq(10, 100, 5)
runs <- 1e3

outcomes <- data.frame(
  sample_size = NULL,
  power = NULL
)
for (n in sizes) {
  pvalues <- NULL

  for (i in 1:runs) {
    error <- rnorm(n, 0, 2)
    listening <- scale(sample(1:30, n, replace = TRUE) / 5, scale = FALSE)
    liking <- scale(runif(n, 0, 100) / 10, scale = FALSE)

    d <- data.frame(
      listening = listening,
      liking = liking,
      relaxation = b0 + b1*listening + b2*liking + b3*listening*liking + error
    )

    # trim the outcome
    d$relaxation <- ifelse(d$relaxation < 0, 0, d$relaxation)
    d$relaxation <- ifelse(d$relaxation > 7, 7, d$relaxation)

    m <- summary(lm(relaxation ~ listening*liking, d))
    pvalues[i] <- broom::tidy(m)$p.value[4]
  }

  outcomes <- rbind(
    outcomes,
    data.frame(
      sample_size = n,
      power = sum(pvalues < 0.05) / length(pvalues)
    )
  )
}
plot(outcomes$sample_size, outcomes$power)
interactions::interact_plot(lm(relaxation ~ listening*liking, d), pred = "listening", modx = "liking")