The principle of marginality (PM or MP) is a statistical principle (not a mathematical principle) that guides the way researchers should interpret linear models. (some PDF notes)
If a factor effect is marginal to another effect in the model, then we neither test nor interpret that factor effect. Further, if interactions are included in the regression model, then all lower-order main effects should be included in the model as well.
A main idea behind PM, in my opinion, is that one of the goals of statistics is to summarize without misleading. PM is one tool we can use to avoid misleading consumers of statistics. The price we pay for using the principle is a moderate reduction in the number of inferences that can be obtained from a complicated regression model fit. What is gained is a more coherent view of statistical modeling.
What is marginality in this context?
Marginality here refers to terms in a linear regression model that lie in a subspace of higher order terms included in the model. Some examples:
- The main effect of factor A is marginal to the interaction effect of A:B.
- The intercept is marginal to the main effect of A (and all other factor effects).
- The main effects of A, B, and C and the two way interactions A:B, A:C, and B:C are marginal to the three way interaction A:B:C.
- In the regression model y ~ 1 + A + B, the intercept is marginal to A and B, but neither A nor B is marginal to any term (there is no interaction term).
Further, it does not matter if the interactions are due to crossing or nesting.
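The examples above can be sketched as a small subset check. This is only an illustration, not how any statistics package implements it: the function names are hypothetical, and it assumes every term is already written as colon-separated factor names (so a nested term appears in its expanded form, such as A:B) with "1" standing for the intercept.

```python
def factors(term):
    """Return the set of factor names in a term such as 'A:B'.

    Assumption: '1' denotes the intercept, which involves no factors.
    """
    return frozenset() if term == "1" else frozenset(term.split(":"))


def is_marginal_to(lower, higher):
    """True if `lower` is marginal to `higher`, i.e. its factors
    form a proper subset of the factors in `higher`."""
    return factors(lower) < factors(higher)


def interpretable_terms(terms):
    """Terms that are not marginal to any other term in the model.

    Under PM these are the only terms we would test or interpret.
    """
    return [t for t in terms if not any(is_marginal_to(t, u) for u in terms)]


print(interpretable_terms(["1", "A", "B"]))
print(interpretable_terms(["1", "A", "B", "C", "A:B", "A:C", "B:C", "A:B:C"]))
```

For y ~ 1 + A + B the check leaves A and B, and for the full three-factor model it leaves only A:B:C, matching the list above.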
Suppose we fit the regression model
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 \times x_2 + \varepsilon. \]
The model contains an interaction between \(x_1\) and \(x_2\), so PM advises that we should not test or interpret the regression coefficients \(\beta_0, \beta_1, \beta_2\).
We ignore these factor effects because they do not honestly describe the factor effects in the presence of an interaction term (if we believe the regression model is even remotely true). The main effects are important to the extent that they help us correctly interpret the value of the interaction coefficient, \(\beta_3\), but their values can be extremely misleading when the interaction is truly non-zero.
This is logical: if the interaction effect is large, then the effect of \(x_2\) depends on the value of \(x_1\), and speaking of some average effect can be misleading. More precisely, under the \(0-1\) contrast coding scheme the effect of (say) \(x_2\) is \(\beta_2 + \beta_3\times x_1\), clearly a function of \(x_1\). What meaning can be ascribed to the average effect of \(x_2\) in this case? PM says none at all.
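A small numerical sketch may make this concrete. It is not taken from the notes cited above: the cell means are made-up numbers, the function names are hypothetical, and the \(\pm 1\) recoding is an assumption chosen only to contrast with the \(0-1\) coding. With two binary predictors the model is saturated, so the coefficients can be read directly off the four cell means.

```python
# Hypothetical cell means for a 2x2 design, keyed by (x1, x2).
mu = {(0, 0): 10.0, (0, 1): 14.0, (1, 0): 10.0, (1, 1): 2.0}


def treatment_coding(mu):
    """Coefficients of y = b0 + b1*x1 + b2*x2 + b3*x1*x2 with x in {0, 1}."""
    b0 = mu[(0, 0)]
    b1 = mu[(1, 0)] - mu[(0, 0)]  # effect of x1 when x2 = 0
    b2 = mu[(0, 1)] - mu[(0, 0)]  # effect of x2 when x1 = 0
    b3 = mu[(1, 1)] - mu[(1, 0)] - mu[(0, 1)] + mu[(0, 0)]
    return b0, b1, b2, b3


def sum_coding(mu):
    """Coefficients of the same model after recoding z = 2*x - 1 in {-1, +1}."""
    g0 = sum(mu.values()) / 4
    g1 = (mu[(1, 0)] + mu[(1, 1)] - mu[(0, 0)] - mu[(0, 1)]) / 4
    g2 = (mu[(0, 1)] + mu[(1, 1)] - mu[(0, 0)] - mu[(1, 0)]) / 4
    g3 = (mu[(1, 1)] - mu[(1, 0)] - mu[(0, 1)] + mu[(0, 0)]) / 4
    return g0, g1, g2, g3


print(treatment_coding(mu))  # (10.0, 0.0, 4.0, -12.0)
print(sum_coding(mu))        # (9.0, -3.0, -1.0, -3.0)
```

Both parameterizations reproduce the same four cell means, and the interaction agrees up to the scale of the coding (\(-12 = 4 \times -3\)). The coefficient on \(x_2\), however, is \(+4\) under one coding and \(-1\) under the other: the first is the effect of \(x_2\) at \(x_1 = 0\), while the second is half the effect averaged over both levels of \(x_1\). Neither number describes "the" effect of \(x_2\).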
History of PM
Marginality in regression models was described in Nelder’s A Reformulation of Linear Models (1977).
Marginality is not emphasized in statistical education these days. I hypothesize that this is because it makes regression models somewhat less useful and explainable, because PM places strict limits on what can be explained by the model.
PM is attractive to statisticians who take the regression model strictly as an inferential procedure that models the response conditional on all the covariates.
Unlike Type I and Type II SS, Type III SS does not adhere to the principle of marginality. The Type III SS for a variable is the SS that would be obtained if that variable were the last one entered into the model. Under Type III SS, we can therefore obtain the SS of a main effect after the interaction containing it has already been entered into the model.
While Type III SS does solve some real problems in analysis of variance, we shouldn’t use this to test main effects when interactions are present.
See Exegeses on Linear Models (Venables) for a discussion of marginality and Type III SS in R.
R adheres to PM inconsistently (my opinion). While it does not support Type III SS and estimated marginal means out-of-the-box, it does allow users to easily test and interpret regression model main effects using the \(p\)-values while interactions are in the model. PM is the core of why certain basic statistical procedures, like Type III SS and emmeans, are absent from base R, but available in SAS and Stata.
Argument against PM
While the principle is coherent, we can easily imagine situations where PM can lead to a lot of headaches. For example, consider a clinical trial where a drug is believed to have a heterogeneous effect: maybe the drug has different effects on men and women. If the heterogeneous effect is estimated by a regression model with a treatment-by-sex interaction, then PM says the average treatment effect (ATE) should not be tested or interpreted. This makes discussing the effect of the drug very difficult. It becomes more difficult still when the drug’s effect interacts with several covariates.
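To see what is at stake, here is a worked version of the clinical trial example. The notation is my own assumption, not part of any cited source: \(t\) is a \(0-1\) treatment indicator, \(s\) is a \(0-1\) indicator for sex, and \(\pi\) is the proportion of the target population with \(s = 1\). Suppose the model is
\[ y = \beta_0 + \beta_1 t + \beta_2 s + \beta_3 t \times s + \varepsilon. \]
The effect of the drug is \(\beta_1\) in the group with \(s = 0\) and \(\beta_1 + \beta_3\) in the group with \(s = 1\), so averaging over the population gives
\[ \text{ATE} = (1 - \pi)\,\beta_1 + \pi\,(\beta_1 + \beta_3) = \beta_1 + \pi\,\beta_3. \]
Under these assumptions the ATE is not any single coefficient in the model. It mixes the main effect with the interaction, and its value changes with the population mix \(\pi\), which is the quantity PM declines to interpret and the one a trial report usually wants.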