Understand that continuous predictors are just another case of the linear model
Extend this understanding to continuous (by categorical) interactions
Be able to translate that extension to generating data
Back to the linear model
So far, our predictors have been categorical. If we have two groups predicting an outcome and our dummy-coded predictor is 0:
\[\begin{align}
& y = \beta_0 + \beta_1x\\
& y = \beta_0 + \beta_1 \times 0\\
& y = \beta_0
\end{align}\]
If it’s 1:
\[\begin{align}
& y = \beta_0 + \beta_1x\\
& y = \beta_0 + \beta_1 \times 1\\
& y = \beta_0 + \beta_1
\end{align}\]
Changing \(x\)
A continuous predictor is nothing new: We’re still looking at what happens to \(y\) when \(x\) goes up by 1. So if our measure isn’t categorical (aka dummy coded) but continuous, we’re asking the same question. Only now, \(x\) can take values beyond 0 and 1.
\[\begin{align}
y &= \beta_0 + \beta_1x \\
y &= \beta_0 + \beta_1 \times 1 \\
y &= \beta_0 + \beta_1 \times 2 \\
y &= \beta_0 + \beta_1 \times 3 \\
y &= \dots
\end{align}\]
Full range of \(x\)
If we assume linearity, then getting \(y\) is easy. Assume we predict disagreeableness from age. With each year people grow older, they become 1 point more disagreeable on a 100-point scale. (And when they’re 0 years old, they’re also 0 disagreeable.)
Now getting the score for someone aged 50 is trivial:
\(disagreeableness = 0 + 1 \times 50\)
Simulating this process is easy
We get our age, put it into the formula, and we have our outcome.
age <- rnorm(100, 50, 20)
disagreeableness <- 0 + 1 * age
summary(lm(disagreeableness ~ age))
Call:
lm(formula = disagreeableness ~ age)
Residuals:
Min 1Q Median 3Q Max
-1.287e-13 3.740e-16 9.290e-16 1.765e-15 1.018e-14
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.274e-14 3.426e-15 -6.637e+00 1.78e-09 ***
age 1.000e+00 6.370e-17 1.570e+16 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.323e-14 on 98 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.465e+32 on 1 and 98 DF, p-value: < 2.2e-16
Perfect fit
plot(age, disagreeableness)
Adding error
If there’s lots of error, it’ll be harder to separate signal (aka our true 1-point effect) from noise (the additional variation in our outcome).
error <- rnorm(100, 0, 100)
disagreeableness <- 0 + age + error
summary(lm(disagreeableness ~ age))
Call:
lm(formula = disagreeableness ~ age)
Residuals:
Min 1Q Median 3Q Max
-182.39 -65.60 -12.45 61.45 300.97
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.5999 25.4154 -0.024 0.9812
age 1.1264 0.4725 2.384 0.0191 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 98.13 on 98 degrees of freedom
Multiple R-squared: 0.05481, Adjusted R-squared: 0.04517
F-statistic: 5.683 on 1 and 98 DF, p-value: 0.01906
Psst, scale
Adding this much error will push our outcome measure out of bounds: a 100-point scale can’t go below 0 or above 100. For a proper simulation, where we’re interested in the data generating process, we need to deal with this, for example by truncating.
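One minimal way to do that, sticking with the variables above, is to clamp the simulated scores to the 0–100 scale (a sketch):
disagreeableness <- pmin(pmax(disagreeableness, 0), 100)  # clamp to the 0-100 scale
The power simulation further below does the same thing with two ifelse() calls.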
That’s it, really
For the linear model, it doesn’t matter what scale our predictor is on. Categorical or continuous, we can simulate any outcome with this formula, including multiple predictors and interactions between predictors of different types.
Let’s assume number of relatives living close-by is also causing disagreeableness:
age <- rnorm(100, 50, 20)
relatives <- rpois(100, 5)
error <- rnorm(100, 0, 10)
disagreeableness <- 0 + age + relatives + error
Two independent effects
summary(lm(disagreeableness ~ age + relatives))
Call:
lm(formula = disagreeableness ~ age + relatives)
Residuals:
Min 1Q Median 3Q Max
-27.018 -6.560 0.326 6.535 23.319
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.4204 3.2527 -0.744 0.4586
age 1.0096 0.0470 21.480 <2e-16 ***
relatives 1.1264 0.4591 2.453 0.0159 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.882 on 97 degrees of freedom
Multiple R-squared: 0.8301, Adjusted R-squared: 0.8266
F-statistic: 237 on 2 and 97 DF, p-value: < 2.2e-16
A tangent
Causality is hard. Best to stay away from multiple predictors unless we’re prepared to defend our causal model.
When both the predictor and control variable are unreliable measures of the same construct, the true predictive effect of the construct gets partitioned into two coefficients, neither of which capture the full causal effect. (Wysocki, Lawson, and Rhemtulla 2022)
Varying effect sizes
What if we believe the effect of age is smaller, but that of the number of relatives much larger? No problem, we just adjust our betas. Say each extra year only adds 0.25 points of disagreeableness, but each extra relative adds 5 points.
disagreeableness <- 0 + 0.25*age + 5*relatives + error
summary(lm(disagreeableness ~ age + relatives))
Call:
lm(formula = disagreeableness ~ age + relatives)
Residuals:
Min 1Q Median 3Q Max
-27.018 -6.560 0.326 6.535 23.319
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.4204 3.2527 -0.744 0.459
age 0.2596 0.0470 5.523 2.79e-07 ***
relatives 5.1264 0.4591 11.166 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.882 on 97 degrees of freedom
Multiple R-squared: 0.6252, Adjusted R-squared: 0.6175
F-statistic: 80.91 on 2 and 97 DF, p-value: < 2.2e-16
That’s the data generating process
In our simulation, we once again make our assumptions about how the data are generated explicit: according to this linear model and the numbers we plug in.
Error adds uncertainty to our data generating process. It specifies that our linear model doesn’t 100% explain the causal structures and that there are influences on our outcome that we haven’t accounted for (or that an effect is heterogeneous).
About that intercept
\(disagreeableness = \beta_0 + \beta_1 \times age + \beta_2 \times relatives\)
Now the intercept is disagreeableness when both age and number of relatives are 0. Maybe 0 age doesn’t make a lot of sense, so let’s center that variable.
Now the meaning of the intercept changes: Disagreeableness at average age with 0 relatives living close-by. Easier to build an intuition for.
In R
The effect for age doesn’t change, but the interpretation of the intercept does: Now it’s the disagreeableness when there are no relatives and age is at its average.
centered_age <- scale(age, center = TRUE, scale = FALSE)
summary(lm(disagreeableness ~ centered_age + relatives))
Call:
lm(formula = disagreeableness ~ centered_age + relatives)
Residuals:
Min 1Q Median 3Q Max
-27.018 -6.560 0.326 6.535 23.319
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.7744 2.3192 4.646 1.07e-05 ***
centered_age 0.2596 0.0470 5.523 2.79e-07 ***
relatives 5.1264 0.4591 11.166 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.882 on 97 degrees of freedom
Multiple R-squared: 0.6252, Adjusted R-squared: 0.6175
F-statistic: 80.91 on 2 and 97 DF, p-value: < 2.2e-16
Simulate power
n <- 100
effect <- 0.10
runs <- 1000
pvalues <- NULL

for (i in 1:runs) {
  age <- rnorm(n, 50, 20)
  centered_age <- scale(age, center = TRUE, scale = FALSE)
  error <- rnorm(n, 0, 10)
  disagreeableness <- 50 + effect*centered_age + error
  disagreeableness <- ifelse(disagreeableness > 100, 100, disagreeableness)
  disagreeableness <- ifelse(disagreeableness < 0, 0, disagreeableness)
  m <- summary(lm(disagreeableness ~ centered_age))
  pvalues[i] <- broom::glance(m)$p.value
}

sum(pvalues < 0.05) / length(pvalues)
[1] 0.485
Standardized effects?
What if we want to work with standardized effects? Remember that a standardized effect is just expressed in standard deviation units. So we can standardize our variables and voilà: \(r\).
age <- rnorm(100, 50, 20)
disagreeableness <- 0 + 0.25*age + rnorm(100, 0, 10)
stan_age <- scale(age)
stan_dis <- scale(disagreeableness)
Let’s compare
cor(age, disagreeableness)
[1] 0.4281227
summary(lm(stan_dis ~ stan_age))
Call:
lm(formula = stan_dis ~ stan_age)
Residuals:
Min 1Q Median 3Q Max
-2.3554 -0.5474 -0.1218 0.4755 2.5096
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.064e-17 9.083e-02 0.00 1
stan_age 4.281e-01 9.129e-02 4.69 8.86e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9083 on 98 degrees of freedom
Multiple R-squared: 0.1833, Adjusted R-squared: 0.175
F-statistic: 21.99 on 1 and 98 DF, p-value: 8.86e-06
Not super clean
This doesn’t give us a lot of control over the standardized effect size (our raw effect of 0.25 just happened to come out at roughly 0.43 SDs). How about we just start off with standardized variables? If we want \(r\) = 0.20, we can do that as follows:
n <- 1e4
stan_age <- rnorm(n, 0, 1)
stan_dis <- 0 + 0.2*stan_age + rnorm(n, 0, 0.3)
\(r\) = 0.20?
summary(lm(stan_dis ~ stan_age))
Call:
lm(formula = stan_dis ~ stan_age)
Residuals:
Min 1Q Median 3Q Max
-1.09330 -0.20679 0.00066 0.20381 1.02869
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0005512 0.0030027 -0.184 0.854
stan_age 0.1957996 0.0029714 65.895 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3003 on 9998 degrees of freedom
Multiple R-squared: 0.3028, Adjusted R-squared: 0.3027
F-statistic: 4342 on 1 and 9998 DF, p-value: < 2.2e-16
Wait a second
Why isn’t the effect as we specified?
cor(stan_dis, stan_age)
[1] 0.5502689
Because we didn’t standardize
We created the standardized version of disagreeableness with
stan_dis <- 0 + 0.2*stan_age + rnorm(n, 0, 0.3)
That means the variable isn’t actually standardized:
mean(stan_dis); sd(stan_dis)
[1] 0.0005116522
[1] 0.3595904
How do we fix this?
Luckily, we encountered a way to simulate variables, including their means, standard deviations, and correlations: The variance-covariance matrix. Now, we just make sure the means are 0 and standard deviations are 1.
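A minimal sketch of that idea: with means of 0 and SDs of 1, the correlation matrix is the covariance matrix, so a target \(r\) of 0.2 can go straight into mvrnorm() (from MASS, as in the examples below).
library(MASS)

n <- 1e4
r <- 0.2
sigma <- matrix(c(1, r, r, 1), ncol = 2)   # SDs of 1, so correlations double as covariances
d <- as.data.frame(mvrnorm(n, mu = c(stan_age = 0, stan_dis = 0), Sigma = sigma))
cor(d$stan_age, d$stan_dis)                # close to 0.2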
Specify the outcome on the raw scale, but then we sort of eyeball the error
Specify both predictor and outcome on the standardized scale, which gives full control over means and SDs, but often we’d prefer unstandardized units
How do we specify an effect on the raw scale, but use the multivariate normal distribution? Remember the formula for \(r\)?
\(r = B_{xy} \frac{\sigma_x}{\sigma_y}\)
Just plug in the number then
Say we want a raw effect of age on disagreeableness of 0.5 points. We want age to have a mean of 50 and an SD of 20. We want disagreeableness to have a mean of 60 and an SD of 15. Let’s use the raw slope and the SDs first.
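The covariance matrix sigma that goes into mvrnorm() below has to be built from that slope and those SDs; a sketch, using the \(r\) formula from above (the object names are mine):
b <- 0.5                           # raw effect of age on disagreeableness
sd_age <- 20
sd_dis <- 15
r <- b * sd_age / sd_dis           # implied correlation, about 0.67
covariance <- r * sd_age * sd_dis  # equals b * sd_age^2 = 200

sigma <- matrix(
  c(sd_age^2, covariance,
    covariance, sd_dis^2),
  ncol = 2
)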
Now all we need are some means, and we’re good to go.
means <- c(age = 50, disagreeableness = 60)
d <- mvrnorm(n, means, sigma)
d <- as.data.frame(d)
Let’s check it all worked
Can we recover our raw effect of 0.5?
summary(lm(disagreeableness ~ age, d))
Call:
lm(formula = disagreeableness ~ age, data = d)
Residuals:
Min 1Q Median 3Q Max
-39.376 -7.649 -0.091 7.637 39.706
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.870513 0.299804 116.31 <2e-16 ***
age 0.502866 0.005571 90.27 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.2 on 9998 degrees of freedom
Multiple R-squared: 0.449, Adjusted R-squared: 0.449
F-statistic: 8148 on 1 and 9998 DF, p-value: < 2.2e-16
Standardized effect size
\(r = B_{xy} \frac{\sigma_x}{\sigma_y}\)
b <- coef(summary(lm(disagreeableness ~ age, d)))[2]
b * sd(d$age) / sd(d$disagreeableness)
[1] 0.6701006
cor(d$age, d$disagreeableness)
[1] 0.6701006
A word on getting that \(r\)
\(r = B_{xy} \frac{\sigma_x}{\sigma_y}\)
What if we had specified an SD of 20 for age (\(x\)) and an SD of 5 for disagreeableness (\(y\))?
0.5*20/5
[1] 2
But \(r\) can’t be larger than 1, so what’s going on?
Data generating process creates limits
When we determine the “raw” regression slope, we also determine the causal structure. In other words: If we say Y is caused by X, the SD of Y will depend on the SD of X and the effect size.
Perfect correlation: \(y\) is ten times the size of \(x\).
x <- rnorm(n, 50, 20)
y <- 10 * x
sd(x); sd(y)
[1] 19.56937
[1] 195.6937
Maximum correlation
summary(lm(y ~ x))
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-6.221e-11 -2.500e-14 -2.000e-15 2.000e-14 1.114e-10
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.093e-13 3.510e-14 -1.451e+01 <2e-16 ***
x 1.000e+01 6.523e-16 1.533e+16 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.276e-12 on 9998 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.35e+32 on 1 and 9998 DF, p-value: < 2.2e-16
Maximum correlation
Now if we use the formula, we can get the maximum correlation of 1 because the outcome SD is exactly 10 times the predictor SD, which is exactly our effect:
b <- coef(lm(y ~ x))[2]
b
x
10
b * sd(x) / sd(y)
x
1
cor(x,y)
[1] 1
Let’s add some error
x <- rnorm(n, 50, 20)
y <- 10*x + rnorm(n, 0, 50)
sd(x); sd(y)
[1] 19.97822
[1] 206.2865
Now our correlation is lower
b <- coef(lm(y ~ x))[2]
b
x
10.0238
b * sd(x) / sd(y)
x
0.9707744
cor(x,y)
[1] 0.9707744
Bottom line
If you’re saying that one variable is caused by another, you’re not free to choose the variation of the outcome variable. The variation is a result of the data generating process and you’re determining what the data generating process is with your linear model. For our example, with a raw effect of 10, the smallest SD we can choose for the outcome is 10 times that of the cause.
Back to our example: If age causes disagreeableness with an effect of 0.5, then the SD of disagreeableness can be no smaller than half the SD of age.
Bottom line
In other words, by determining the effect size and the SD of the cause, you’re setting bounds on the range and SD of the outcome. You need to take that into account when simulating data: What’s a sensible raw effect size in relation to the scales of both the cause and the outcome?
Likert setup
Let’s say both predictor and outcome are on a 7-point Likert scale. You think both should be roughly at the mid-point, with an SD of 1.2 points for the predictor and 0.9 for the outcome. As a raw effect, you assume 0.3 points as your SESOI. Do those SDs make sense? Would they produce a sensible \(r\)?
0.3*1.2/0.9
[1] 0.4
Power
means <- c(x = 4, y = 4)
sd_x <- 1.2
sd_y <- 0.9
sesoi <- 0.3
r <- sesoi * sd_x/sd_y
covariance <- r * sd_x * sd_y
n <- 50
runs <- 500

sigma <- matrix(
  c(sd_x**2, covariance,
    covariance, sd_y**2),   # diagonal holds variances, so the SDs are squared
  ncol = 2
)

pvalues <- NULL

for (i in 1:runs) {
  d <- mvrnorm(
    n,
    means,
    sigma
  )
  d <- as.data.frame(d)
  pvalues[i] <- broom::glance(summary(lm(y ~ x, d)))$p.value
}

sum(pvalues < 0.05) / length(pvalues)
[1] 0.778
More than one cause
If we have more than one predictor, we need to specify effect sizes for each. We also need to be clear about what our causal model is: We’re saying that both predictors independently influence our outcome. Otherwise we commit the Table 2 fallacy (Westreich and Greenland 2013). Simulating data also means simulating the causal structure.
What power?
If we want to simulate power for multiple predictors, powering for \(R^2\) is possible, but strange: It means powering for the total effect of all predictors. Just like with group differences, powering for a total effect can mean many underlying patterns:
\(x_1\) explaining everything, but \(x_2\) and \(x_3\) explaining nothing
\(x_1\) and \(x_2\) both explaining a moderate amount, but \(x_3\) explaining nothing
All 3 explaining a little
Etc.
What we need
Ideally, we power for every effect, which in practice means powering for the smallest independent effect. That requires specifying:
The slope for each predictor
The correlation between each pair of predictors
The correlation between each predictor and the outcome
All means and SDs
How to simulate power for multiple causes? Next exercise.
Let’s do interactions
Categorical by continuous interactions
An interaction between a categorical and a continuous variable states that the effect of one depends on the other. Back to the linear model:
\[
y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3 \times x_1 \times x_2
\]
In effect, we ask what information increasing both predictors together adds beyond increasing each of them individually.
Example
Say we want to know the effect of framing of a message (positive vs. neutral) on how much people agree with it, but we expect the effect to depend on how much people like positive framing.
Pictures, please
Therefore, we expect the effect to look something like this, if we were to plot means per group and liking:
\(\beta_0\): The outcome when condition is neutral and liking is 0
\(\beta_1\): Difference between condition neutral and liking 0 and condition positive and liking 0 (aka main effect of condition)
\(\beta_2\): Difference between condition neutral and liking 0 and condition neutral and liking 1 (aka main effect liking)
\(\beta_3\): How much the condition effect changes with each 1-point increase in liking, on top of the two main effects (aka the interaction)
Let’s create those scores
Let’s say we measure agreement on a 100-point scale. We make several assumptions:
There’s probably no main effect of liking: Why would liking positive framing influence agreement with a neutral message? It should only enhance the positive one.
We’ll center liking: It makes it easier to think about what our coefficients mean.
Let’s say at average liking (0 = centered), agreement is on the mid-point of the scale: 50
We expect a positive message to “work” at average liking, so we put down a main effect of 5 points as our SESOI
We expect that going up 1 point on liking will enhance our framing effect by 1 point
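Put into code, these assumptions could look roughly like the sketch below. The error SD of 20, the 10,000 people per condition, and the uniform 0–7 liking scale (before centering) aren’t stated above; they’re assumptions read off the output and borrowed from the code shown later, so a fresh run will land close to, but not exactly on, the numbers below.
n <- 1e4
b0 <- 50   # agreement at average liking in the neutral condition
b1 <- 5    # main effect of condition (our SESOI)
b2 <- 0    # no main effect of liking
b3 <- 1    # interaction: the framing effect grows 1 point per point of liking

condition <- rep(0:1, n)
liking <- runif(n*2, min = 0, max = 7)
liking <- scale(liking, center = TRUE, scale = FALSE)
error <- rnorm(n*2, 0, 20)   # assumed error SD, to roughly match the residual SE below

d <- data.frame(
  condition = condition,
  liking = liking,
  agree = b0 + b1 * condition + b2 * liking + b3 * condition * liking + error
)
d$condition <- as.factor(ifelse(d$condition == 0, "neutral", "positive"))
d$agree <- ifelse(d$agree > 100, 100, d$agree)
d$agree <- ifelse(d$agree < 0, 0, d$agree)

summary(lm(agree ~ condition * liking, data = d))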
Call:
lm(formula = agree ~ condition * liking, data = d)
Residuals:
Min 1Q Median 3Q Max
-57.891 -13.504 0.112 13.581 50.237
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.772322 0.198910 250.226 < 2e-16 ***
conditionpositive 5.177188 0.281301 18.404 < 2e-16 ***
liking 0.002632 0.097403 0.027 0.978
conditionpositive:liking 1.001518 0.138376 7.238 4.73e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19.89 on 19996 degrees of freedom
Multiple R-squared: 0.02175, Adjusted R-squared: 0.0216
F-statistic: 148.2 on 3 and 19996 DF, p-value: < 2.2e-16
Pictures, please
Two-way interaction
The interaction can also be read the other way around: the effect of A under B or the effect of B under A. In our case, we could also ask how the effect of liking a message is modified by framing. Same data, different plot:
Let’s redo the logic
set.seed(42)
b0 <- 50
b1 <- 0 # no main effect of condition
b2 <- 2 # main effect of liking
b3 <- 3 # interaction
n <- 1e4

error <- rnorm(n*2, 0, 5)
condition <- rep(0:1, n)
liking <- runif(n*2, min = 0, max = 7)
liking <- scale(liking, center = TRUE, scale = FALSE)

d <- data.frame(
  condition = condition,
  liking = liking,
  agree = b0 + b1 * condition + b2 * liking + b3 * condition * liking + error
)

d$condition <- as.factor(ifelse(d$condition == 0, "neutral", "positive"))
d$agree <- ifelse(d$agree > 100, 100, d$agree)
d$agree <- ifelse(d$agree < 0, 0, d$agree)
Now we’re looking at slopes
Continuous interactions
Same logic as before: What extra information do we gain if we go up on both variables compared to going up on them individually?
Say we want to know the effect of liking a message sender on agreeing with the sender’s message. However, we expect the effect to depend on how trustworthy the sender is.
Pictures, please
We expect the effect to look something like this, if we were to plot a line per trustworthiness rating:
How to translate
\[
y = \beta_0 + \beta_1 \times liking + \beta_2 \times trust + \beta_3 \times liking \times trust
\]
Both predictors are centered, so 0 is their mean:
\(\beta_0\): The outcome when both liking and trustworthiness are 0 (at their means)
\(\beta_1\): Outcome when liking goes 1 up, but trustworthiness remains at average (= 0) (aka main effect of liking)
\(\beta_2\): Outcome when trustworthiness goes 1 up, but liking remains at average (= 0) (aka the main effect of trustworthiness)
\(\beta_3\): How much the liking effect changes with each 1-point increase in trustworthiness, on top of the two main effects (aka the interaction)
Let’s create those scores
Once more, we measure agreement on a 100-point scale. We make several assumptions:
We’ll center both predictors: It makes it easier to think about the interaction.
At average liking and trustworthiness (0 = centered), agreement is on the mid-point of the scale: 50
There’s a main effect of liking: With each extra point (at average trustworthiness), agreement should go up by 3 points.
There’s a main effect of trustworthiness: With each extra point (at average liking), agreement should go up by 2 points.
We expect that each extra point of trustworthiness will enhance our liking effect by 2 points (see the sketch below)
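As a sketch, these assumptions could be turned into data along these lines; the uniform 0–7 predictor scales (before centering) and the error SD of 20 are assumptions chosen to roughly match the output below, so the exact numbers will differ run to run.
n <- 1e4
b0 <- 50   # agreement at average liking and average trustworthiness
b1 <- 3    # main effect of liking
b2 <- 2    # main effect of trustworthiness
b3 <- 2    # interaction

liking <- scale(runif(n, min = 0, max = 7), center = TRUE, scale = FALSE)
trustworthiness <- scale(runif(n, min = 0, max = 7), center = TRUE, scale = FALSE)
error <- rnorm(n, 0, 20)   # assumed error SD

d <- data.frame(
  liking = liking,
  trustworthiness = trustworthiness,
  agree = b0 + b1 * liking + b2 * trustworthiness + b3 * liking * trustworthiness + error
)
d$agree <- ifelse(d$agree > 100, 100, d$agree)
d$agree <- ifelse(d$agree < 0, 0, d$agree)

summary(lm(agree ~ liking * trustworthiness, data = d))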
Call:
lm(formula = agree ~ liking * trustworthiness, data = d)
Residuals:
Min 1Q Median 3Q Max
-59.451 -13.609 0.126 13.583 66.108
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.68043 0.19581 253.72 <2e-16 ***
liking 2.97717 0.09649 30.85 <2e-16 ***
trustworthiness 1.83232 0.09605 19.08 <2e-16 ***
liking:trustworthiness 1.92430 0.04730 40.69 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19.58 on 9996 degrees of freedom
Multiple R-squared: 0.2276, Adjusted R-squared: 0.2274
F-statistic: 982 on 3 and 9996 DF, p-value: < 2.2e-16
Pictures, please
Interaction plots
In the above plot, I cheated a bit by rounding trustworthiness so that we can get only 7 “levels”. With truly continuous variables, we usually operate with standard deviations: What’s the effect of liking on agreeing when trustworthiness is 1 SD above or below its mean? You can create those plots yourself or rely on other packages.
Prediction plots
m <- lm(agree ~ liking*trustworthiness, d)
interactions::interact_plot(m, pred = "liking", modx = "trustworthiness")
The interaction term is really just a product, so we can treat it as its own variable. We can simply create a correlation matrix where we specify how the product of two variables (aka \(x_1 \times x_2\)) should correlate with our outcome.
\[
\begin{bmatrix}
& x_1 & x_2 & x_1x_2 & y \\
x_1 & 1 & & & \\
x_2 & r & 1 & &\\
x_1x_2 & r & r & 1 &\\
y & r & r& r& 1
\end{bmatrix}
\]
Putting that into numbers
Let’s say liking correlates with agreeing at 0.2, trustworthiness at 0.25, and the interaction “adds” 0.1 standard deviations. The interaction is uncorrelated with both liking and trustworthiness, and liking and trustworthiness are correlated at 0.4.
Remember: Those are conditional effects, not just 1-to-1 correlations.
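One way to set this up (a sketch: means of 0 and SDs of 1 are assumed so the correlation matrix doubles as the covariance matrix, and the names x1, x2, and x1x2 stand for liking, trustworthiness, and their product, matching the output below):
library(MASS)

n <- 1e4
means <- c(x1 = 0, x2 = 0, x1x2 = 0, y = 0)

sigma <- matrix(
  c(1,    0.4,  0,    0.20,
    0.4,  1,    0,    0.25,
    0,    0,    1,    0.10,
    0.20, 0.25, 0.10, 1),
  ncol = 4, byrow = TRUE
)

d <- as.data.frame(mvrnorm(n, means, sigma))
Because x1 and x2 are correlated at 0.4, the conditional slopes in the output below come out smaller than the raw correlations of 0.2 and 0.25.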
summary(lm(y ~ x1 + x2 + x1x2, d))
Call:
lm(formula = y ~ x1 + x2 + x1x2, data = d)
Residuals:
Min 1Q Median 3Q Max
-3.3061 -0.6368 0.0043 0.6319 3.4221
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.006035 0.009414 0.641 0.521
x1 0.120853 0.010219 11.826 <2e-16 ***
x2 0.196707 0.010297 19.103 <2e-16 ***
x1x2 0.106056 0.009338 11.358 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9414 on 9996 degrees of freedom
Multiple R-squared: 0.08888, Adjusted R-squared: 0.0886
F-statistic: 325 on 3 and 9996 DF, p-value: < 2.2e-16
What about raw?
If we want to work on the raw scale, we once more need the standard deviations. We could do the transformation per variable pairing, but that gets very unwieldy. At this point, you’d need to do some matrix multiplication, see here for a starter.
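As a rough starting point for that matrix multiplication: a correlation matrix turns into a covariance matrix when you pre- and post-multiply it with a diagonal matrix of SDs. A sketch only; the SD values here are purely hypothetical, and the SD of a product term isn’t actually free to choose, which is exactly why this gets tricky.
R <- sigma                    # the 4 x 4 correlation matrix from above
sds <- c(1.5, 1.2, 1.8, 20)   # hypothetical SDs for x1, x2, x1x2, and y
sigma_raw <- diag(sds) %*% R %*% diag(sds)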
Takeaways
Understand that continuous predictors are just another case of the linear model
Extend this understanding to continuous (by categorical) interactions
Be able to translate that extension to generating data
References
Westreich, Daniel, and Sander Greenland. 2013. “The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients.” American Journal of Epidemiology 177 (4): 292–98. https://doi.org/10.1093/aje/kws412.
Wysocki, Anna C., Katherine M. Lawson, and Mijke Rhemtulla. 2022. “Statistical Control Requires Causal Justification.” Advances in Methods and Practices in Psychological Science 5 (2): 25152459221095823. https://doi.org/10.1177/25152459221095823.