In this post, I present Birnbaum’s formulation of the likelihood principle and explain why frequentists reject and Bayesians accept this principle. Again, the big picture: frequentists typically accept conditionality and sufficiency principles while rejecting the likelihood principle. The likelihood principle is a central tenet of Bayesianism that follows directly from using Bayes’ theorem as an update rule. Many of the features of Bayesianism that frequentists find objectionable follow from the likelihood principle alone, so it is a short step from accepting the likelihood principle to becoming a Bayesian. Birnbaum argues that the conditionality and sufficiency principles are jointly equivalent to the likelihood principle, putting significant pressure on frequentists to justify their position.
You should not be surprised to learn that the likelihood principle appeals to the notion of a likelihood; or, more precisely, a likelihood function. Birnbaum models an experiment as having a well-defined joint probability density f(x, θ) for all x in its sample space and all θ in its parameter space. This joint density implies a conditional density f_X|Θ(x|θ) for each θ. The likelihood function is this conditional density considered as a function of θ rather than x, defined up to an arbitrary multiplicative constant c: L(θ|x) = c f_X|Θ(x|θ). Roughly speaking, the likelihood function tells you how probable the model makes the data, as a function of that model’s parameters.
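To make the definition concrete, here is a minimal sketch in Python of a likelihood function for independent Bernoulli trials, the setting used in the examples below. (The function name and signature are mine, for illustration only.)

```python
# A minimal sketch of a likelihood function for i.i.d. Bernoulli trials:
# k successes in n trials, viewed as a function of the parameter p.
def bernoulli_likelihood(p, k, n, c=1.0):
    """Return L(p | k, n) = c * p^k * (1 - p)^(n - k).

    c is the arbitrary multiplicative constant; its particular value
    carries no information, since only ratios of likelihood values matter.
    """
    return c * p**k * (1 - p) ** (n - k)
```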
Most frequentists are happy to use the likelihood function of an experiment in certain specific ways, such as maximum likelihood estimation and likelihood ratio testing. However, they do not accept the likelihood principle, which says, roughly, that all of the information about θ that an experiment provides is contained in the likelihood function of θ. As Birnbaum formulates it, the likelihood principle is (like the conditionality and sufficiency principles) a claim about evidential equivalence. Specifically, it asserts that if two experiments E and E’ with a common parameter space produce respective outcomes x and y that determine proportional likelihood functions, then Ev(E, x) = Ev(E’, y).
Consider two possible experiments. In the first experiment, you decide to spin a coin 12 times, and it comes up heads 3 times. Assuming that the spins are independent and identically distributed Bernoulli trials with the probability of heads on a given trial equal to p, the likelihood function of this outcome (up to an arbitrary multiplicative constant) is L(p|x=3) = (12 choose 3) p^3 (1 - p)^9. In the second experiment, you decide to spin the coin until you obtain 3 heads. As it turns out, heads comes up for the third time on the 12th spin. Assuming again that the spins are independent and identically distributed Bernoulli trials with the probability of heads on a given trial equal to p, the likelihood function of this outcome (up to an arbitrary multiplicative constant) is L(p|n=12) = (11 choose 2) p^3 (1 - p)^9. The two likelihood functions are proportional (both are constant multiples of p^3 (1 - p)^9), so the likelihood principle says that the two outcomes have the same evidential meaning.
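Here is a quick numerical check of that proportionality claim, sketched with scipy’s binomial and negative binomial distributions (the variable names are mine):

```python
import numpy as np
from scipy.stats import binom, nbinom

p = np.linspace(0.01, 0.99, 99)  # grid of candidate values of p

# Experiment 1: fixed n = 12 spins, observe k = 3 heads (binomial).
L1 = binom.pmf(3, 12, p)

# Experiment 2: spin until r = 3 heads, with the 3rd head on spin 12,
# i.e. 9 tails before the 3rd head (negative binomial).
L2 = nbinom.pmf(9, 3, p)

# The ratio is constant in p, so the likelihood functions are proportional.
ratio = L1 / L2
print(np.allclose(ratio, ratio[0]))  # True (the ratio is 220/55 = 4)
```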
Standard frequentist methods say, contrary to the likelihood principle, that these two experiments do not have the same evidential meaning. (In fact, the second experiment but not the first allows one to reject the null hypothesis p=.5 at the .05 level in a one-sided test.) Frequentist methods are based on P values, where a P value is, roughly, the probability under the null hypothesis of a result at least as extreme as the observed result. The likelihood principle implies that only the outcome actually obtained in an experiment is relevant to the evidential interpretation of that experiment. Because P values refer to results other than the result that actually occurred (namely, the unrealized results that are at least as extreme as the observed result), they are incompatible with the likelihood principle.
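For the curious, those two one-sided P values can be computed as follows (a sketch; scipy parameterizes the negative binomial by the number of tails before the third head):

```python
from scipy.stats import binom, nbinom

# Experiment 1 (fixed n = 12): P value = P(3 or fewer heads) under p = .5,
# testing against the one-sided alternative p < .5.
p1 = binom.cdf(3, 12, 0.5)
print(p1)  # 0.0730..., not significant at the .05 level

# Experiment 2 (spin until 3 heads): P value = P(12 or more spins needed)
# = P(9 or more tails before the 3rd head) under p = .5.
p2 = nbinom.sf(8, 3, 0.5)
print(p2)  # 0.0327..., significant at the .05 level
```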
This conflict between frequentist methods and the likelihood principle becomes particularly stark when one considers “try and try again” stopping rules, which direct one to continue sampling until one achieves a particular result, such as a specific P value or posterior probability. For instance, a possible stopping rule is to continue collecting data until one achieves a nominally .05 significant result; that is, a result that appears to be significant at the P=.05 level if one analyzes the data as if a fixed-sample-size stopping rule had been used. A frequentist would insist that data gathered according to this stopping rule do not have the same evidential meaning as the same data gathered according to a fixed-sample-size stopping rule. After all, the experiment with the “try and try again” stopping rule is guaranteed, sooner or later, to generate a nominally .05 significant result, so its real P value is not .05, but 1. The simulation sketched below illustrates the point.
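This sketch simulates the “try and try again” rule in truncated form; the two-sided z test with the normal approximation and the cap on the number of flips are my assumptions for illustration. Lifting the cap drives the per-run rejection rate toward 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def reaches_nominal_significance(max_n=10_000):
    """Flip a fair coin up to max_n times; report whether the running
    count of heads ever looks nominally significant at the .05 level
    in a two-sided z test (normal approximation)."""
    flips = rng.integers(0, 2, size=max_n)
    heads = np.cumsum(flips)
    n = np.arange(1, max_n + 1)
    z = (heads - n / 2) / np.sqrt(n / 4)
    return bool(np.any(np.abs(z) > 1.96))

# Under the null p = .5, the rejection rate is far above the nominal .05.
hits = sum(reaches_nominal_significance() for _ in range(200))
print(hits / 200)
```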
Bayesians argue, on the contrary, that it is absurd to make the evidential meaning of an experiment sensitive to the stopping rule used. Why should the evidential meaning of a result depend on an experimenter’s intentions, which are, after all, “inside his or her head”?
There is much more that could be said about frequentist-Bayesian disputes about the relevance of stopping rules to inference. For present purposes, it is enough to note that the relevance of stopping rules for frequentist tests violates the likelihood principle.
The likelihood principle, while unacceptable to frequentists, is a simple consequence of the use of Bayes’ theorem as an update rule. According to Bayes’ theorem, the posterior probability of a hypothesis is equal to its prior probability times its likelihood, divided by the average of the likelihoods of all hypotheses in the hypothesis space weighted by their prior probabilities. Thus, given a prior distribution over the hypothesis space, the posterior probability of a hypothesis depends on the data only through the likelihood function. (The arbitrary multiplicative constant included in the likelihood can be factored out of both the numerator and the denominator of Bayes’ theorem, so it cancels out.) Proportional likelihood functions therefore yield identical posteriors, which is exactly what the likelihood principle demands: Bayesianism implies the likelihood principle.
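A quick numerical illustration: with a common prior, the two coin-spinning experiments above yield exactly the same posterior. (The uniform grid prior here is my choice purely for illustration; any shared prior gives the same agreement.)

```python
import numpy as np
from scipy.stats import binom, nbinom

# Grid of hypotheses about p, with a uniform prior over the grid.
p = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(p) / p.size

def posterior(likelihood):
    unnorm = prior * likelihood      # prior times likelihood
    return unnorm / unnorm.sum()     # normalize over the hypothesis grid

post_fixed = posterior(binom.pmf(3, 12, p))   # 3 heads in a fixed 12 spins
post_until = posterior(nbinom.pmf(9, 3, p))   # 9 tails before the 3rd head

# Proportional likelihoods give identical posteriors: the constant cancels.
print(np.allclose(post_fixed, post_until))  # True
```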