Sample Size for interaction effects under different codings

John Haman

2020/10/17

Categories: Statistics, R. Tags: Statistics, R.

Following another discussion of the 16x sample size rule for half-sized interactions on Andrew Gelman’s blog, I commented that I do not think the rule holds up under different coding schemes, but that I’d not done a simulation. In particular, I was concerned about the rule under the DOE ‘-1, +1’ coding scheme compared to the usual treatment coding scheme.

It’s actually an important distinction to make, because it’s easy to design an experiment with one coding in mind, and then analyze it under a different one. For example, lots of DOE software will use the ‘-1, +1’ coding, but R defaults to treatment coding.
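To see the difference concretely, compare the model matrix for a single two-level factor under the two codings (this little example is just for illustration; it is not part of the power simulation). contr.treatment is R’s default 0/1 coding, and contr.sum gives the ±1 coding used in DOE, with the sign convention flipped relative to the usual low/high labels.

d <- data.frame(a = gl(2, 1, 4))
# Treatment (default) coding: column 'a2' is 0 for level 1, 1 for level 2
model.matrix(~ a, data = d, contrasts.arg = list(a = "contr.treatment"))
# Sum coding: column 'a1' is +1 for level 1, -1 for level 2
model.matrix(~ a, data = d, contrasts.arg = list(a = "contr.sum"))

One consequence: the same difference between the two levels corresponds to a coefficient that is twice as large under treatment coding as under sum coding, which is presumably why par is halved in the contr.sum runs further down.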

Another commenter asked that I conduct the simulation and some discussion ensued about the effects of coding schemes on interaction sizes, and more fundamentally, what an interaction actually means.

So here is that simulation.

Function to calculate sample sizes

The function is unfortunately long, but I’ll give you the gist of what it does:

There are four inputs: par, the effect size (the magnitude of the generating regression coefficients); code_gen, the coding scheme used to build the design matrix that generates the data; code, the coding scheme used in the analysis; and main, which determines whether the sample size is based on the \(p\)-value of a main effect or of the interaction.

Some common settings are made internal to the function: the confidence level is set at \(95\%\) (\(\alpha = 0.05\)) and the target power is \(90\%\). Once the function gets going, it runs a simulation that loops over increasingly large design matrices until it finds one that achieves power \(> 90\%\).

There are only two factors in the experiment, and as the sample size grows, I only increment the size of the matrix by \(4\) runs at a time to keep the design balanced. There is also some internal logic that sets up the right generating coefficients, model formula, and \(p\)-value depending on whether we are targeting main effect power or interaction power.
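For reference, the smallest design the loop considers (\(n = 8\)) is a balanced 2x2 factorial replicated twice; this is exactly the gl() construction inside the function.

dat <- data.frame(a = gl(2, 1, 8),
                  b = gl(2, 2, 8))
table(dat$a, dat$b)  # two runs in each of the four cells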

get_nsamp <- function(par = 1,
                      code = "contr.treatment",
                      code_gen = "contr.treatment",
                      main = TRUE) {
  #
  # par      : magnitude of the generating regression coefficients
  # code     : coding used to fit the model, 'contr.treatment' or 'contr.sum'
  # code_gen : coding used to generate the data, 'contr.treatment' or 'contr.sum'
  # main     : TRUE for main effect power, FALSE for interaction power
  #
  alpha <- 0.05
  tgt_power <- 0.9
  nsim <- 500
  pval <- rep(NA, nsim)
  pow <- 0
  n <- 8
  #
  while (pow < tgt_power) {
    #
    dat <- data.frame(a = gl(2, 1, n),
                      b = gl(2, 2, n))
    if (main) {
      f1 <- formula( ~ .)
      reg_coefs <- c(0, par, par)
    } else {
      f1 <- formula( ~ . * .)
      reg_coefs <- c(0, par, par, par / 2)
    }
    X <- model.matrix(f1, data = dat,
                      contrasts = list(a = code_gen,
                                       b = code_gen))
    for (i in 1 : nsim){
      #
      dat$y <- X %*% reg_coefs + rnorm(n)
      #
      if (main) {
        fit <- lm(y ~ ., data = dat,
                  contrasts = list(a = code,
                                   b = code))
        pval[i] <- summary(fit)$coefficients[3, 4]  # p-value of the b main effect
      } else {
        fit <- lm(y ~ . * ., data = dat,
                  contrasts = list(a = code,
                                   b = code))
        pval[i] <- summary(fit)$coefficients[4, 4]  # p-value of the a:b interaction
      }
    }
    pow <- mean(pval < alpha)
    if (pow < tgt_power) n <- n + 4
  }
  n
}

In the experiment, we vary how we generate the data and how we fit the model, so there are four scenarios, each assessed for both main effect power and interaction power.

Summary of experimental results:

It does not matter much how the data are analyzed; the required sample sizes depend mostly on how we generate the data. This is pretty comforting, but it means that we do need to be careful about how we generate the data when conducting power studies.

tm1 <- get_nsamp(main = TRUE)
ti1 <- get_nsamp(main = FALSE)
tm2 <- get_nsamp(par = 1 / 2, code_gen = "contr.sum", main = TRUE)
ti2 <- get_nsamp(par = 1 / 2, code_gen = "contr.sum", main = FALSE)
sm1 <- get_nsamp(code = "contr.sum", main = TRUE)
si1 <- get_nsamp(code = "contr.sum", main = FALSE)
sm2 <- get_nsamp(par = 1 / 2, code = "contr.sum",
                 code_gen = "contr.sum", main = TRUE)
si2 <- get_nsamp(par = 1 / 2, code = "contr.sum",
                 code_gen = "contr.sum", main = FALSE)

results <- data.frame(
  code_gen = gl(2, 2, 8, labels = c("treatment", "sum")),
  code_analysis = gl(2, 4, 8, labels = c("treatment", "sum")),
  power = gl(2, 1, 8, labels = c("main effect power", "interaction power")),
  nsamp = c(tm1, ti1, tm2, ti2, sm1, si1, sm2, si2))
knitr::kable(results)
code_gen    code_analysis   power               nsamp
treatment   treatment       main effect power      48
treatment   treatment       interaction power     632
sum         treatment       main effect power      48
sum         treatment       interaction power     160
treatment   sum             main effect power      44
treatment   sum             interaction power     636
sum         sum             main effect power      48
sum         sum             interaction power     156

So, why does the data generation affect the results so much? We’ll talk next time! But I find this to be a convincing argument in favor of calculating powers based on anticipated outcomes, rather than anticipated regression coefficients.
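As a quick sketch of what I mean by anticipated outcomes, here are the cell means implied by the two generating schemes in the interaction case, using the same coefficient vectors that are hard-coded in get_nsamp.

# Cell means implied by each generating scheme in the interaction case
cells <- expand.grid(a = gl(2, 1), b = gl(2, 1))

# Treatment-coded generation with par = 1: coefficients (0, 1, 1, 1/2)
Xt <- model.matrix(~ a * b, data = cells,
                   contrasts.arg = list(a = "contr.treatment",
                                        b = "contr.treatment"))
cbind(cells, mean = drop(Xt %*% c(0, 1, 1, 1 / 2)))

# Sum-coded generation with par = 1/2: coefficients (0, 1/2, 1/2, 1/4)
Xs <- model.matrix(~ a * b, data = cells,
                   contrasts.arg = list(a = "contr.sum",
                                        b = "contr.sum"))
cbind(cells, mean = drop(Xs %*% c(0, 1 / 2, 1 / 2, 1 / 4)))

Comparing the two sets of cell means is one way to see what each scheme actually assumes about the outcomes.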