Do you need 16x the sample size to estimate a half-sized interaction effect?

Today, another good number to keep in your back pocket. 16. That’s how much more data you need to estimate an interaction compared to an average treatment effect… assuming the interaction is half the size of the main-effect and you want the same power as before. And you are fitting a OLS regression model.

You need 4x the data to estimate an interaction that is the same magnitude as the main effect.

These are the wise words I read on Andrew Gelman’s blog. I’m a bit curious though, does the same rule of thumb apply to logistic regression models? Let’s find out with some simulations.

New experiments for Bernoulli data

First, we will establish a baseline. Assume we would like to fit a logistic regression model for a single, binary factor. Set the effect size (on the log-odds scale) to be one, and assume the baseline effect is 50%. How big of a sample do we need to estimate the treatment effect with 80% power at the 95% confidence level?

# handy defintions for my brain
logit = qlogis
inv_logit = plogis
nsim = 1000

Now the sim code. Some explanation of the code: I generate 1000 individual simulations from a logistic regression model with a false null. In each of the individual simulations, I then fit a model and count the number of times the main effect null hypothesis is false. The frequency of null hypothesis rejection is the power of the test.

p <- rep(NA, nsim)
samp <- 140
dat <- data.frame(x = rep(c(0, 1), each = samp / 2))
main_effect <- 1
prob <- inv_logit(0.0 + main_effect * dat$x)

for(i in 1:nsim){
  dat$y <- rbinom(samp, 1, prob)
  fit <- glm(y ~ ., data = dat, family = binomial)
  p[i] <- summary(fit)$coefficients[2, 4]
}

mean(p < 0.05)

## [1] 0.813

Ah, okay. You need about a sample of size 140 to estimate a main effect of size 1 at the 95% confidence level with 80% power. Now we add the half interaction, and measure the power of the interaction.

Now, I’m a bit curious about non-linearity of the logit function throwing things off, but just for confirmation, inv_logit(1) is about 0.73, and inv_logit(1/2) is about 0.62. So, on both the probability scale and the linear predictor scale, my interaction effect is about half-size.

Similar set-up, just a slightly more complicated linear predictor for the simulation:

p <- rep(NA, nsim)
samp <- 140 * 18
dat <- data.frame(x = rep(c(0, 1), each = samp / 2),
                  w = rep(c(0, 1), samp / 2))
main_effect <- 1 # log-odds
interaction_effect <- 0.5 # interaction
prob <- inv_logit(0.0 + main_effect * dat$x + interaction_effect * dat$x * dat$w)

for(i in 1:nsim){
  dat$y <- rbinom(samp, 1, prob)
  fit <- glm(y ~ . *. , data = dat, family = binomial)
  p[i] <- summary(fit)$coefficients[4, 4] # p-value of interaction
}

mean(p < 0.05)

## [1] 0.825

So, you actually need 18 times the sample to estimate the interaction effect with the same confidence level and power. This is greater than 16, but the effect is on the logit scale, and my baseline probability is near 1/2, as noisy as can be.

And, if the interaction is the same size as the main effect? Do we require more than 4x the sample size?

p <- rep(NA, nsim)
samp <- 140 * 5
dat <- data.frame(x = rep(c(0, 1), each = samp / 2),
                  w = rep(c(0, 1), samp / 2))
main_effect <- 1 # log-odds
interaction_effect <- 1 # log-odds
prob <- inv_logit(0.0 + main_effect * dat$x + interaction_effect * dat$x * dat$w)

for(i in 1:nsim){
  dat$y <- rbinom(samp, 1, prob)
  fit <- glm(y ~ . *. , data = dat, family = binomial)
  p[i] <- summary(fit)$coefficients[4, 4] # p-value of interaction
}

mean(p < 0.05)

## [1] 0.804

We need about 5 times the sample size! For comparison, let’s just verify that the 16x figure is correct for normally distributed residuals.

Old experiment with Normal data

p <- rep(NA, nsim)
samp <- 34
dat <- data.frame(x = rep(c(0, 1), each = samp / 2))
main_effect <- 1
linpred <- 0.0 + main_effect * dat$x

for(i in 1:nsim){
  dat$y <- rnorm(samp,  mean = linpred)
  fit <- lm(y ~ ., data = dat) # OLS this time
  p[i] <- summary(fit)$coefficients[2, 4]
}

mean(p < 0.05)

## [1] 0.795

So, 34 samples for the normal distribution will take us to 80% power on the main effect. Now let’s add the interaction:

p <- rep(NA, nsim)
samp <- 34 * 16
dat <- data.frame(x = rep(c(0, 1), each = samp / 2),
                  w = rep(c(0, 1), samp / 2))
main_effect <- 1
interaction_effect <- 1 / 2
linpred <- 0.0 + main_effect * dat$x + interaction_effect * dat$x * dat$w

for(i in 1:nsim){
  dat$y <- rnorm(samp,  mean = linpred)
  fit <- lm(y ~ . * ., data = dat)
  p[i] <- summary(fit)$coefficients[4, 4] # p-value of interaction
}

mean(p < 0.05)

## [1] 0.818

Yep, about the same power. So 16x for OLS, 18x for binomial GLM. BTW all of these results are sensitive to your preferred contrast coding scheme. Since I use R, the default coding scheme is treatment coding.

Another post about the effects on contrast coding is now warranted.

Do you need 16x the sample size to estimate a half-sized interaction effect?

John Haman

2020/07/15

New experiments for Bernoulli data

Old experiment with Normal data