The likelihood principle (LP) is a statistical principle that can be stated, roughly, as follows:
All the information/evidence regarding a model parameter is contained in the
likelihood function
Or equivalently:
If two likelihoods are proportional, then the information/evidence concerning the
parameters should be the same.
These are not the most precise statements of the principle, but I’m trying to convey the spirit of LP rather than the most rigorous definition, which can suffer from interpretability issues of its own.
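To make the proportionality version concrete, here is the textbook binomial versus negative binomial comparison, sketched with made-up numbers (6 successes in 12 Bernoulli trials):

```python
import numpy as np
from scipy.stats import binom, nbinom

# Made-up numbers: 6 successes observed in 12 Bernoulli trials.
# Design A: fix n = 12 trials, count k = 6 successes        -> binomial likelihood.
# Design B: sample until r = 6 successes, needing 12 trials -> negative binomial likelihood.
k, n = 6, 12
p_grid = np.linspace(0.05, 0.95, 10)

lik_binomial = binom.pmf(k, n, p_grid)        # C(12, 6) * p^6 * (1 - p)^6
lik_negbinom = nbinom.pmf(n - k, k, p_grid)   # C(11, 5) * p^6 * (1 - p)^6

# The ratio is constant in p: the two likelihoods are proportional,
# so under LP both designs carry identical evidence about p.
print(np.round(lik_binomial / lik_negbinom, 6))
```

Both designs give the likelihood \(p^{6}(1-p)^{6}\) up to a constant, so an LP-follower must draw the same conclusions about \(p\) from either experiment, even though the two designs have different stopping rules (and can yield different \(p\)-values).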
LP is used by some philosophers of statistics to claim that frequentist methods are unacceptable. Methods like confidence intervals and \(p\)-values depend on “objects” (data) that were not observed and are not part of the likelihood function, and so they are easily dismissed by LP-followers.
A quick explanation: \(p\)-values violate LP because they depend on “data” that did not occur. The \(p\)-value is \(\Pr(t(X) > t(x) \mid H_{0})\) for a right-tail event, where \(x\) is the observed data, \(t\) is a statistic, and \(X\) is fictitious data drawn from the data-generating mechanism defined by \(H_{0}\). Values of the statistic greater than \(t(x)\) did not occur, and therefore, says the LP, should not count as evidence for or against any model parameter. This is hopefully clear from my calling \(X\) fictitious data.
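To make the role of the fictitious data explicit, here is a minimal Monte Carlo sketch of that right-tail \(p\)-value, using made-up numbers and the sample mean as \(t\):

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up observed data: 20 draws, tested against H0: mu = 0 (unit variance).
x = rng.normal(0.3, 1.0, size=20)
t_obs = x.mean()                        # t(x), the statistic actually observed

# Fictitious data: many datasets X simulated from the H0 generating mechanism.
X = rng.normal(0.0, 1.0, size=(100_000, 20))
t_fict = X.mean(axis=1)                 # t(X) for data that never occurred

# Right-tail p-value: Pr(t(X) > t(x) | H0), a statement about unobserved data.
p_value = (t_fict > t_obs).mean()
print(f"t(x) = {t_obs:.3f}, p-value = {p_value:.3f}")
```

Every term in the comparison except \(t(x)\) comes from data that were never collected, which is exactly what the LP objects to.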
The \(p\)-value is no issue from a frequentist perspective, because fictitious data \(X\) under \(H_{0}\) are the foundation of frequentist techniques. For the LP-follower, the \(p\)-value should be rejected on philosophical grounds: \(X\) did not occur, and had \(X\) occurred, one would have conditioned on it. We don’t just leave data on the table!
It is claimed that most Bayesian methods automatically adhere to LP because the data can only enter the posterior through the likelihood function. However, prior distributions are an important (necessary!) part of Bayesian analysis, and only a Bayesian analysis that uses truly subjective priors adheres to LP. This is because, in practice, the prior distribution can often only be understood in the context of the data or the likelihood function. For example, here are a number of Bayesian techniques that violate LP:
Most prior specification methods that depend on a data transformation. For example, if the data are centered and scaled, a weakly informative prior is then chosen for one or more parameters so that it is “scale invariant”. The standard deviation of \(x\) is data, and using it to adjust the prior clearly violates LP.
Any ‘empirical’ Bayes method. This class of methods is maximally data dependent, because the prior is chosen specifically to maximize the marginal likelihood of the observed data (see the sketch after this list).
Prior adjustments (a.k.a. prior calibration) made as the result of a posterior predictive check.
Essentially all ABC (approximate Bayesian computation) methods. (But I give these a pass, because there is no likelihood function to begin with.)
Jeffreys priors, maximum entropy priors (Jaynes), and reference priors.
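As an illustration of the empirical Bayes point, here is a minimal sketch of the normal-normal model with made-up data, using the closed-form marginal maximum likelihood estimate (one of several ways an empirical Bayes prior can be chosen):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up observations: y_i ~ Normal(theta_i, sigma^2), sigma assumed known.
sigma = 1.0
theta_true = rng.normal(0.0, 2.0, size=50)
y = rng.normal(theta_true, sigma)

# Empirical Bayes: posit the prior theta_i ~ Normal(0, tau^2) and pick tau^2
# by maximizing the marginal likelihood y_i ~ Normal(0, sigma^2 + tau^2).
# The closed-form MLE is tau^2 = max(0, mean(y^2) - sigma^2):
# the "prior" is a statistic of the observed data.
tau2_hat = max(0.0, np.mean(y**2) - sigma**2)

# Posterior means shrink y toward 0 by a factor set by the data-chosen prior.
shrinkage = tau2_hat / (tau2_hat + sigma**2)
posterior_means = shrinkage * y

print(f"data-chosen prior variance: {tau2_hat:.2f}")
print(f"shrinkage factor:           {shrinkage:.2f}")
print("first posterior means:", np.round(posterior_means[:3], 2))
```

The prior variance here is computed from \(y\) itself, so the prior cannot be written down before seeing the data, which is the LP violation.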
Virtually all modern Bayesian analysis takes advantage of weakly informative priors and/or posterior predictive checks, so nearly all of this research is in violation of LP. Again, the only Bayesian analysis that adheres to LP is the truly subjective one, which does not budge from its prior or even need to check a posterior predictive diagnostic.
The Bayesian violation of LP is more subtle than the frequentist violation, but it nevertheless exists. It may be the case that the only way to have your statistical research published is to violate LP!
However, I’d argue that it is exactly these features (along with the wonderful advancements of modern computation) that have brought Bayesian analysis to the masses.
So, while I believe an LP-kosher analysis is something we should strive to conduct (following Robinson), it is unfair to say that Bayesian methods follow LP, especially considering the way we conduct Bayesian analysis in 2020.
On this basis, we should not take LP too seriously. Still, I think LP is a paragon of good statistical practice, and researchers should strive to stick to it as closely as possible.