I’m giving a “Works in Progress” talk on Friday to explain my current position on frequentist responses to Birnbaum’s proof. Here is my abstract:
Frequentists appear to be committed to the sufficiency principle (S) and the conditionality principle (C). However, Birnbaum (1962) proved that (S) and (C) entail the likelihood principle (L), which frequentist methods violate. To respond adequately to Birnbaum’s theorem, frequentists must place restrictions on (S) and/or (C) that block Birnbaum’s proof and argue that those restrictions are well motivated. Restricting (C) alone will not suffice, because (S) by itself implies too much of the content of (L) for frequentists to accept it. Specifically, frequentists need to restrict (S) so that it does not apply to mixture experiments some of whose components have respective outcomes with the same likelihood function. Berger and Wolpert (1988, p. 46) claim that such a restriction would be artificial, but in fact it has a strong frequentist motivation: reduction to the minimal sufficient statistic in such an experiment throws away information about what sampling distribution is appropriate for frequentist inference.
I’ll try to explain the basic argument here. Start with the claim that (S) by itself implies too much of the content of (L) for frequentists to accept it. Kalbfleisch makes this point in his (1975) as a criticism of Durbin’s proposal to restrict (C) rather than (S). Consider two experiments E1 and E2. E1 involves flipping a coin five times and reporting the number of heads. E2 involves flipping a coin until it comes up heads and reporting the number of flips required. Suppose that in both experiments the flips are i.i.d. Bernoulli with the same unknown probability of heads. Imagine instances of E1 and E2 in which E1 yields one head and E2 takes five flips, so that each consists of flipping a coin five times and getting heads once. The two outcomes then have proportional likelihood functions, so the likelihood principle says that they have the same evidential meaning.
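To make the role of the likelihood function concrete, here is a minimal sketch that evaluates the likelihood of each outcome at several values of the probability of heads p. The function names are illustrative only. It assumes the standard binomial model for E1 and the standard geometric model for E2.

```python
from math import comb

def binomial_likelihood(p, n=5, heads=1):
    """Probability of `heads` heads in n fixed flips (experiment E1)."""
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

def geometric_likelihood(p, flips=5):
    """Probability that the first head arrives on flip number `flips` (experiment E2)."""
    return (1 - p)**(flips - 1) * p

# The ratio of the two likelihoods should not depend on p.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    ratio = binomial_likelihood(p) / geometric_likelihood(p)
    print(f"p = {p}: ratio = {ratio:.6f}")
```

The ratio is the binomial coefficient 5 at every value of p (up to floating-point rounding), so the two likelihood functions differ only by a constant factor. That is the sense in which (L) treats the two outcomes as evidentially equivalent.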
The sufficiency principle does not imply that those outcomes of E1 and E2 have the same evidential meaning. However, it does say that they would have had the same evidential meaning if they had been two outcomes of one experiment rather than outcomes of two different experiments. So consider a mixture experiment E* that involves first flipping a coin to decide whether to perform E1 or E2 and then performing the selected experiment. (The coin used to decide which experiment to perform should be distinct from the coin used in E1 or E2, with a bias that is independent of that coin’s.) According to the sufficiency principle, the outcome that consists of performing experiment E1 and getting one head has the same evidential meaning as the outcome that consists of performing experiment E2 and needing five flips, when each occurs as part of the mixture experiment E*. To get the result that they also have the same evidential meaning when performed outside of E*, one needs to appeal to something like (C). (This is essentially how Birnbaum proves that (S) and (C) entail (L).) However, to go so far but no farther seems rather unreasonable. As Kalbfleisch puts it, “In order to reject (L) and accept (S) one must attach great importance to the possibility of choosing randomly between E1 and E2” (p. 252). To avoid adopting this strange position, someone who rejects (L) should reject (S) as well.
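The appeal to (S) inside E* can be checked numerically. The sketch below rests on one assumption not spelled out above: the selection coin picks E1 with a known probability q, set here to 1/2. It pools the two outcomes into a single value of a statistic and computes the conditional probability that the pooled outcome came from E1. If that probability does not depend on p, the pooling statistic is sufficient for p. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def mixture_outcome_probs(p, q=Fraction(1, 2)):
    """Probabilities, within E*, of the two outcomes discussed above.

    q is the (assumed known) chance that the selection coin picks E1.
    """
    e1_one_head = q * comb(5, 1) * p * (1 - p)**4   # E1 chosen, one head in five flips
    e2_five_flips = (1 - q) * (1 - p)**4 * p        # E2 chosen, first head on flip five
    return e1_one_head, e2_five_flips

def prob_e1_given_pooled(p, q=Fraction(1, 2)):
    """P(outcome came from E1 | outcome is one of the two pooled outcomes)."""
    a, b = mixture_outcome_probs(p, q)
    return a / (a + b)

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(f"p = {p}: P(E1 | pooled outcome) = {prob_e1_given_pooled(p)}")
```

With q = 1/2 the conditional probability is 5/6 whatever the value of p. Once one knows that one of the two pooled outcomes occurred, learning which of them it was carries no further information about p in the sense that (S) relies on.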
The minimal restriction on (S) that blocks Birnbaum’s proof is to modify (S) so that it does not apply to mixture experiments some of whose components have respective outcomes with the same likelihood function. Berger and Wolpert say that this restriction “seems artificial, there being no intuitive reason to restrict sufficiency to certain types of experiments” (1988, p. 46). One can flesh out an argument along these lines as follows. The following argument for (S) is compelling and completely general:
Conditional on the value of a sufficient statistic, which outcome occurs is independent of the parameters of the experimental model. Independent variables do not contain information about one another. Thus, conditional on a sufficient statistic, which outcome occurs does not contain any information about the parameters of the experimental model. Therefore, the evidential meaning of an experimental outcome is the same as the evidential meaning of an outcome corresponding to the same value of the sufficient statistic.
Because this argument makes no assumptions about whether the experiment in question is pure or mixed, it would be artificial to restrict (S) to non-mixture experiments.
The problem with this argument (from a frequentist perspective) is that it assumes that the experimental model appropriate for frequentist inference is fixed in advance, regardless of which outcome occurs. But in a mixture experiment, (C) implies that which experimental model is appropriate for frequentist inference depends on which component experiment is performed. When the components of the mixture experiment have respective outcomes with the same likelihood function, reduction to the minimal sufficient statistic throws away the information about which component experiment was performed. Thus, applying (S) to a mixture experiment some of whose components have respective outcomes with the same likelihood function is inappropriate from a frequentist perspective.
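A small calculation illustrates what is lost. The choice of test is an assumption made here for illustration, not something taken from the discussion above. Suppose a frequentist tests the hypothesis that the coin is fair against the alternative that heads is less likely than tails, and computes a one-sided p-value from the sampling distribution of whichever component experiment was performed.

```python
from fractions import Fraction
from math import comb

def p_value_e1(heads=1, n=5, p0=Fraction(1, 2)):
    """Lower-tail p-value under E1: P(at most `heads` heads in n flips | p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(heads + 1))

def p_value_e2(flips=5, p0=Fraction(1, 2)):
    """Upper-tail p-value under E2: P(first head needs at least `flips` flips | p0).

    This is the probability that the first `flips - 1` flips are all tails.
    """
    return (1 - p0)**(flips - 1)

print("E1 (fixed five flips, one head):", p_value_e1())
print("E2 (flip until first head, five flips):", p_value_e2())
```

Under these assumptions the p-value is 3/16 for the E1 outcome and 1/16 for the E2 outcome, even though the two outcomes have proportional likelihood functions. A statistic that pools them no longer records which of the two sampling distributions to use.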
I think this is quite a good frequentist response to Berger and Wolpert’s objection. The challenge for a frequentist is to make the needed restriction on (S) precise in a defensible way. Berger and Wolpert claim that the distinction between mixture and non-mixture experiments is difficult if not impossible to characterize clearly, suggesting that this challenge will not be easy to meet. As long as the distinction appears to be real, however, a frequentist need not be bothered too much by difficulties in formulating it precisely.
I have argued that frequentists need to restrict (S). Fortunately for the frequentist, the needed restriction is well motivated from a frequentist perspective. I should note that frequentists may also need to restrict (C). They certainly do need to do so if Evans, Fraser, and Monette (1986) are correct in their claim that (C) alone implies (L). I have just begun looking at their paper. They point out that the fact that the seemingly innocuous principles (S) and (C) imply the highly controversial principle (L) should be a clue that there is more to (S) and (C) than meets the eye. They seem to think that Birnbaum’s way of characterizing experimental models is too simple and that with a more adequate approach (L) would no longer follow from appropriately modified versions of (S) and (C). It looks like their paper will take some time to digest but will be well worth the effort.