Sometimes we use univariate data summaries to capture the information from a multi-factor experiment. For example, when we study the effect of correlated factors A and B on Y, we might summarize the effects separately. It is well known to statisticians that the marginal-effect interpretation of regression coefficients falls apart when factors are correlated. Nevertheless, researchers will probably always try to pinpoint marginal effects, even when their estimation is hazardous.

So what is lost when we ignore the correlations between our factors?

This is just an easy way to make a correlated design.

```
library(dplyr)

set.seed(4850)
design <- data.frame(
  A = rbinom(40, 1, 0.5),
  B = rbinom(40, 1, 0.5)
) %>% arrange(A, B)
cor(design)
```

```
##           A         B
## A 1.0000000 0.4505636
## B 0.4505636 1.0000000
```

## One summary at a time

```
## # A tibble: 2 x 2
##       A `mean(y)`
##   <int>     <dbl>
## 1     0      4.50
## 2     1      8.32
```

```
## # A tibble: 2 x 2
##       B `mean(y)`
##   <int>     <dbl>
## 1     0      4.99
## 2     1      8.02
```

`## [1] 3.820703`

`## [1] 3.02835`
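The summaries above look like one-factor-at-a-time grouped means, followed by the difference in means at each factor's two levels. A sketch of how they could be produced (the response column `y` in `design` is an assumption here; the post's actual data-generating code isn't shown):

```
library(dplyr)

# One-at-a-time summaries: the mean of y at each level of a single
# factor, ignoring the other factor entirely.
design %>% group_by(A) %>% summarise(mean(y))
design %>% group_by(B) %>% summarise(mean(y))

# The "marginal effect" read off each table is the difference in means.
with(design, mean(y[A == 1]) - mean(y[A == 0]))
with(design, mean(y[B == 1]) - mean(y[B == 0]))
```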

What is the problem? It's that both effects cannot be true *simultaneously*
when the design covariates are correlated! There is only so much variation
in the data.
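A quick simulation makes this concrete. With made-up coefficients (3 for A, 1.6 for B, unit noise, all assumptions for illustration), a positively correlated design lets the one-at-a-time difference in means for B absorb part of A's effect:

```
# When A and B are positively correlated, units with B = 1 also tend to
# have A = 1, so the naive B-margin picks up some of A's effect.
set.seed(1)
n <- 1e5
A <- rbinom(n, 1, 0.5)
B <- rbinom(n, 1, plogis(-1 + 2 * A))  # B depends on A -> correlated design
y <- 3 * A + 1.6 * B + rnorm(n)

mean(y[B == 1]) - mean(y[B == 0])  # naive marginal difference: well above 1.6
coef(lm(y ~ A + B))["B"]           # the regression recovers roughly 1.6
```

The naive difference here is approximately 1.6 + 3 × (E(A | B=1) − E(A | B=0)), which is exactly the "double counting" that makes the two marginal summaries incompatible.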

## Fixed effects

So the one-at-a-time effects are about 3.8 and 3.0, respectively. Let's pop open the old `lm` table to confirm that:

```
##
## Call:
## lm(formula = y ~ ., data = design)
##
## Coefficients:
## (Intercept) A B
## 4.068 3.080 1.642
```

Oh, this is quite a bit different. `lm` says the effect of A is about 3.1, and the effect of B is about 1.6, much smaller than what the simple summaries seem to imply.

What’s going on here? Is the effect of B 1.6 or 3? I don’t think this experiment is set up to measure the effect of B. If the effect of B is \(E(y \mid B=1) - E(y \mid B=0)\), the design is only set up to measure differences that are conditional on A. And A is not some random variable that can just be integrated out.
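A conditional comparison, holding A fixed, is what this design actually supports. A sketch (again assuming a response column `y` in `design`):

```
library(dplyr)

# Compare B = 1 vs B = 0 within each level of A, rather than marginally.
# The within-A contrasts are the estimable quantities in this design.
design %>%
  group_by(A, B) %>%
  summarise(mean_y = mean(y), n = n(), .groups = "drop")
```

The B contrast within each A stratum lines up with the `lm` coefficient, not with the marginal difference in means.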