The problem with correlated predictors is that there's no way to pinpoint the unique effect of the individual predictors. Well, that's a bit of a lie. We can determine partial effects if the model is correctly specified.
Here's the simulation: let's take two correlated predictors, $x$ and $z$, and say they have some joint effect on $y$.
This code block simulates the correlated variates. $x$ and $z$ are bivariate normal with means of 5, unit variances, and a Pearson's correlation of $\rho = 0.9$.
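library(MASS)         # provides mvrnorm()
rho <- 0.9            # correlation between x and z
beta <- c(3, -1, 1)   # true coefficients: intercept, x, z
# bivariate normal: means (5, 5), unit variances, correlation rho
dat <- data.frame(mvrnorm(100, c(5, 5), matrix(rho, 2, 2) + diag(1 - rho, 2)))
colnames(dat) <- c('x', 'z')
mat <- model.matrix( ~ x + z, dat)  # design matrix with an intercept column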
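The covariance matrix above is built with a small trick. `matrix(rho, 2, 2)` fills every cell with ρ, and adding `diag(1 - rho, 2)` raises the diagonal back to 1. The following sketch checks that on a large sample, where the seed and sample size are arbitrary choices made only for this illustration.

```r
library(MASS)  # provides mvrnorm()

rho <- 0.9
# every entry rho, then the diagonal lifted to 1: a 2x2 correlation matrix
Sigma <- matrix(rho, 2, 2) + diag(1 - rho, 2)
print(Sigma)

# a large draw should show a sample correlation very close to rho
set.seed(1)
big <- mvrnorm(1e5, mu = c(5, 5), Sigma = Sigma)
sample_cor <- cor(big[, 1], big[, 2])
print(sample_cor)
```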
We can create the response and tack on some random noise. This is a standard linear regression model, $y = 3 - x + z + \varepsilon$ with $\varepsilon \sim N(0, 1)$.
dat$y <- drop(mat %*% beta) + rnorm(100)  # linear predictor plus N(0, 1) noise
If we take a look at the model output, we see that the effects are recovered reasonably well. The signs are correct and the estimates sit in the neighbourhood of the true values (3, -1, 1):
lm(y ~ ., dat)
##
## Call:
## lm(formula = y ~ ., data = dat)
##
## Coefficients:
## (Intercept) x z
## 3.4054 -0.7082 0.6497
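With only 100 observations, the gaps between the estimates (-0.71 and 0.65) and the truth (-1 and 1) are sampling noise. That noise is inflated because x and z are highly correlated, since the variance of each slope scales with $1/(1 - \rho^2)$. The sketch below reruns the same data-generating process with a much larger sample. The sample size of 1e5 and the seed are arbitrary choices for illustration. The conditional estimates should then settle close to `beta`.

```r
library(MASS)  # provides mvrnorm()

# same data-generating process as above, much larger sample (illustrative choice)
set.seed(42)
n <- 1e5
rho <- 0.9
beta <- c(3, -1, 1)
big <- data.frame(mvrnorm(n, c(5, 5), matrix(rho, 2, 2) + diag(1 - rho, 2)))
colnames(big) <- c('x', 'z')
big$y <- drop(model.matrix(~ x + z, big) %*% beta) + rnorm(n)

fit_big <- lm(y ~ x + z, big)
print(coef(fit_big))  # expected to be close to c(3, -1, 1)
```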
But the "issue"[1] is that correct effect identification depends on correct model specification.
Here are the "raw" (marginal) effect estimates, regressing y on each predictor alone:
lm(y ~ x, dat)
##
## Call:
## lm(formula = y ~ x, data = dat)
##
## Coefficients:
## (Intercept) x
## 3.61735 -0.09406
lm(y ~ z, dat)
##
## Call:
## lm(formula = y ~ z, data = dat)
##
## Coefficients:
## (Intercept) z
## 2.82566 0.06294
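These near-zero slopes are what the simulation implies. x and z are bivariate normal with equal variances and correlation $\rho = 0.9$, so the conditional mean of one given the other is linear. Plugging in the true coefficients from `beta` (intercept 3, slopes -1 and 1) gives:

$$
\begin{aligned}
\mathbb{E}[z \mid x] &= 5 + \rho\,(x - 5) \\
\mathbb{E}[y \mid x] &= 3 - x + \mathbb{E}[z \mid x] = 3.5 - 0.1\,x \\
\mathbb{E}[y \mid z] &= 3 - \mathbb{E}[x \mid z] + z = 2.5 + 0.1\,z
\end{aligned}
$$

The fitted lines (3.62 - 0.094x and 2.83 + 0.063z) are close to these. More generally, the marginal slope of x is $\beta_x + \beta_z \,\mathrm{Cov}(x, z)/\mathrm{Var}(x)$. Here the direct effect of x and the part of z's effect that travels with x nearly cancel, so both marginal slopes sit near zero.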
What is going on here?
The key point is that marginal effects are not the same as conditional effects. The model y ~ x + z estimates the effect of x on y conditional on a value of z, and the effect of z on y conditional on a value of x. However, y ~ x estimates the effect of x on y unconditionally, so it also absorbs whatever part of z's effect moves together with x.
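For least-squares fits, this relationship holds exactly in-sample, not just on average. It is the textbook omitted-variable formula: the marginal slope of x equals its conditional slope plus (the conditional slope of z) times (the slope from regressing z on x). The sketch below checks this on a fresh simulation. The seed and the data frame name `sim` are arbitrary choices for the illustration.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)
rho <- 0.9
sim <- data.frame(mvrnorm(100, c(5, 5), matrix(rho, 2, 2) + diag(1 - rho, 2)))
colnames(sim) <- c('x', 'z')
sim$y <- 3 - sim$x + sim$z + rnorm(100)

full <- coef(lm(y ~ x + z, sim))  # conditional (partial) effects
aux  <- coef(lm(z ~ x, sim))      # how z moves with x
marg <- coef(lm(y ~ x, sim))      # marginal effect of x

# omitted-variable identity: exact for OLS, up to floating-point error
implied <- full[["x"]] + full[["z"]] * aux[["x"]]
print(c(marginal = marg[["x"]], implied = implied))
```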
When data are collected from an orthogonal designed experiment, the predictors are uncorrelated by construction, so these effects coincide and model mis-specification isn't such a big deal. In any other case, we need to have some good idea of what we wish to condition on. This is my "resolution" to Simpson's "paradox".
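The orthogonal case can be seen directly in a small sketch. A balanced 2 x 2 factorial with -1/+1 coding makes x and z exactly uncorrelated. In the omitted-variable identity, the slope of z on x is then zero, and the marginal and conditional estimates of x agree. The design, the 25 replicates, and the seed are illustrative choices and do not come from the original simulation.

```r
# balanced 2 x 2 factorial, each cell replicated 25 times (n = 100);
# x and z are coded -1/+1 and are uncorrelated by construction
design <- expand.grid(x = c(-1, 1), z = c(-1, 1))
dat_doe <- design[rep(seq_len(nrow(design)), times = 25), ]

set.seed(2)
dat_doe$y <- 3 - dat_doe$x + dat_doe$z + rnorm(nrow(dat_doe))

cond <- coef(lm(y ~ x + z, dat_doe))[["x"]]  # effect of x given z
marg <- coef(lm(y ~ x, dat_doe))[["x"]]      # effect of x ignoring z
print(c(conditional = cond, marginal = marg))
print(cor(dat_doe$x, dat_doe$z))  # uncorrelated by design
```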
Thomas Lumley discussed this issue very clearly on his blog.
[1] For practitioners looking for causal effects.