Comparison: Roll-ups vs. Post-Hoc Linear Contrast

John Haman

2020/10/01

Categories: Statistics Tags: Statistics

The nice thing about a designed experiment is that the data are always conditioned on the factors. Variance resides within conditions and between conditions. In a typical ANOVA model, the residual variance is an estimate of the ‘within condition’ variance, and the fixed effects explain the between condition variability. Fancier models extend this basic idea to more complex situations.
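For example, in an ANOVA table the factor rows carry the between-condition variability and the Residuals row carries the within-condition variability. A minimal sketch, using the Oats data from the nlme package (the same data subsampled below):

anova(lm(yield ~ factor(nitro) + Variety, data = nlme::Oats))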

In this note I’ll show that accounting for the ‘within condition’ variance is beneficial, even if the goal is to do inference on the overall mean \(E(y)\).

Suppose we have an experiment with two factors; it can even be unbalanced. Let’s build one by subsampling the Oats data from nlme, which leaves two factors: Variety, with three levels, and nitro, with four.

library(nlme) # for the Oats data
set.seed(4850)

dat <- Oats[sample(nrow(Oats), 20), ]
dat$nitro <- as.factor(dat$nitro) # for ease of explanation
tab <- table(dat[, c("Variety", "nitro")]) # cell counts of the design
tab
##              nitro
## Variety       0 0.2 0.4 0.6
##   Golden Rain 0   1   1   3
##   Marvellous  4   3   0   3
##   Victory     2   3   0   0

So obviously the design is unbalanced.

The overall mean and standard error of the data are:

with(dat, mean(yield))
## [1] 95.25
with(dat, sd(yield) / sqrt(length(yield)))
## [1] 5.311792
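As a cross-check, these are exactly the numbers that an intercept-only linear model reports, because the roll-up ignores the factors entirely:

fit0 <- lm(yield ~ 1, data = dat)
summary(fit0)$coefficients # same estimate and standard error as above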

Can we do better? The surprising answer is yes! If we take the mean to be the linear combination of regression parameters (aka fixed effects) that happens to equal the overall average, we can achieve a smaller standard error.

For example:

fit <- lm(yield ~ nitro + Variety, data = dat)
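The weights below must line up with the model’s coefficient order, so it helps to peek at the names first (a quick check; the names come from R’s default treatment coding, which drops the first level of each factor):

names(coef(fit)) # intercept, then the nitro levels, then the Variety levels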

Now we can use the table of cell counts to build the weights: the marginal proportions of each factor level.

w <- tab / nrow(dat)     # cell proportions
wr <- rowSums(w)[2 : 3]  # marginal proportions of the non-reference Variety levels
wc <- colSums(w)[2 : 4]  # marginal proportions of the non-reference nitro levels

weights <- c(1, wc, wr)  # ordered to match coef(fit)

The weights give the appropriate linear combination of regression parameters. Of course, this choice is not unique; I chose it only to show that \(E(y)\) can be computed from the DOE model.

weights %*% coef(fit)
##       [,1]
## [1,] 95.25
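Why does this work? The weights are exactly the column means of the model matrix, so the linear combination is the average fitted value, and any least-squares fit with an intercept has average fitted value equal to the sample mean. A quick sketch to verify:

all.equal(unname(weights), unname(colMeans(model.matrix(fit))))
mean(fitted(fit)) # equal to mean(yield)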

Yep, it’s the same. What about the standard error?

sqrt(weights %*% vcov(fit) %*% weights)
##          [,1]
## [1,] 4.314097

Much smaller than before!
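Where does the gain come from? From the variance decomposition at the start of this note: the factors absorb the between-condition variability, so the model’s residual standard deviation (the within-condition spread) is smaller than the raw standard deviation. A quick sketch (sigma() extracts the residual SD of an lm fit):

c(raw = sd(dat$yield), residual = sigma(fit))
sigma(fit) / sqrt(nrow(dat)) # reproduces the model-based standard error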

So we can conclude that when the quantity of interest can be written as a linear combination of the fixed effects, as the overall mean can here, we get a precision boost by computing it from the linear model rather than rolling up the raw data.
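To check that the smaller standard error is honest, here is a small simulation sketch, treating the fitted additive model as the truth and holding the design fixed: redraw the noise many times and compare the empirical spread of the overall mean against the two standard errors.

mu <- fitted(fit) # cell means implied by the fit; design held fixed
n <- nrow(dat)

sims <- replicate(5000, {
  y <- mu + rnorm(n, sd = sigma(fit))
  c(est = mean(y), raw_se = sd(y) / sqrt(n))
})

sd(sims["est", ])      # empirical SE of the overall mean: near the model-based value
mean(sims["raw_se", ]) # the naive roll-up SE is noticeably larger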