Allan Birnbaum’s 1962 paper “On the Foundations of Statistical Inference” purports to prove that the conditionality and sufficiency principles (which frequentists typically accept) jointly entail the likelihood principle (which frequentists typically reject). The likelihood principle is an important consequence of Bayesianism. Moreover, many of the consequences of Bayesianism that frequentists typically find objectionable (e.g., the stopping rule principle) follow from the likelihood principle alone. Thus, once one accepts the likelihood principle and its consequences, there is little to stop one from becoming a Bayesian. The prominent Bayesian L. J. Savage said that he began to take Bayesian statistics seriously “only through the recognition of the likelihood principle.” As a result, he called the initial presentation of Birnbaum’s paper “really a historic occasion.”
In this post, I will discuss why both frequentists and Bayesians find the conditionality principle attractive, and I will provide Birnbaum’s formulation of that principle.
In rough intuitive terms, the conditionality principle says that only the experiment that was actually performed is relevant for interpreting that experiment’s results. Stated this way, the principle seems rather obvious. For instance, suppose your lab contains two thermometers, one of which is more precise than the other. You share your lab with another researcher, and you both want to use the more precise thermometer for today’s experiments. You decide to resolve your dispute by tossing a fair coin. Once you have received your thermometer and run your experiment, there are two kinds of methods you could use to analyze your results. One kind is an unconditional approach, which assigns margins of error in light of the fact that you had a 50/50 chance of using either thermometer, without taking into account which thermometer you actually used. The other kind is a conditional approach, which assigns margins of error to measurements in light of the thermometer you actually used, ignoring the fact that you might have used the other thermometer. Most statisticians find it highly counterintuitive that, in interpreting your data, you should take into account the fact that you might have used a thermometer other than the one you actually used. Thus, most statisticians favor the conditional approach in this case. The conditionality principle is designed to capture this intuition.
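To make the contrast concrete, here is a minimal numerical sketch in Python. Everything specific in it is my own assumption rather than part of the example above: the two thermometers are given normally distributed errors with hypothetical standard deviations of 1 and 10 units, and the "unconditional approach" is represented by one particular procedure, namely a single fixed margin of error chosen so that coverage is 95% when averaged over the coin toss. Other unconditional procedures are possible.

```python
import math

# Hypothetical error standard deviations for the two thermometers (assumptions).
SIGMA_PRECISE = 1.0
SIGMA_COARSE = 10.0
Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def coverage(half_width, sigma):
    """P(|error| <= half_width) for a zero-mean normal error with sd sigma."""
    return 2.0 * normal_cdf(half_width / sigma) - 1.0


def unconditional_half_width(target=0.95):
    """Single margin of error whose coverage, averaged over the fair coin
    toss that assigns the thermometer, equals the target (found by bisection)."""
    lo, hi = 0.0, 10.0 * SIGMA_COARSE
    for _ in range(100):
        mid = (lo + hi) / 2.0
        avg = 0.5 * coverage(mid, SIGMA_PRECISE) + 0.5 * coverage(mid, SIGMA_COARSE)
        if avg < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    c = unconditional_half_width()
    print("unconditional margin:", round(c, 2))
    print("  coverage given precise thermometer:", round(coverage(c, SIGMA_PRECISE), 4))
    print("  coverage given coarse thermometer: ", round(coverage(c, SIGMA_COARSE), 4))
    print("conditional margins:", Z95 * SIGMA_PRECISE, "and", Z95 * SIGMA_COARSE)
```

Under these assumptions the single unconditional margin comes out at about 16.4 units. It is far too wide if you drew the precise thermometer and covers only about 90% of the time if you drew the coarse one, even though it is a valid 95% margin on average. The conditional margins (about 1.96 and 19.6 units) give 95% coverage for whichever thermometer you actually used.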
To express the conditionality principle precisely, it will be necessary to introduce some notation. Birnbaum models an experiment as having a parameter space Ω of vectors θ, a sample space S of vectors x, and a probability function f(x, θ) that gives the distribution of x for each θ. He writes the outcome x of experiment E as (E, x), and the “evidential meaning” of that outcome as Ev(E, x). He does not attempt to characterize the notion of evidential meaning beyond the constraints given by the conditionality, sufficiency, and likelihood principles. Each of those principles states conditions in which two outcomes of experiments have the same evidential meaning.
Birnbaum expresses the conditionality principle in terms of the notion of a mixture experiment. A mixture experiment E involves first choosing which of a number of possible “component experiments” to perform by observing the value of a random variable h with a known distribution independent of θ, and then taking an observation x_h from the selected component experiment E_h. One can then represent the outcome of this experiment as either (h, x_h) or, equivalently, as (E_h, x_h). The conditionality principle says that Ev(E, (E_h, x_h)) = Ev(E_h, x_h). In words, the evidential meaning of the outcome of a mixture experiment is the same as the evidential meaning of the corresponding outcome of the component experiment that was actually performed.
The above discussion of the conditionality principle follows a pattern that is common in philosophy: start with an intuition-pumping example, then state a principle that seems to capture the source of the intuition at work in that example. It takes only a little experience with philosophical disputes to become suspicious of this pattern of reasoning. There are always many general principles that can be used to license judgments about particular cases, and there are typically counterexamples to whatever happens to be the most “obvious” or “natural” general principle. Take, for instance, theories of causation. It is easy to give examples to motivate, say, a David Lewis-style counterfactual analysis of causation. For instance, the Titanic sank because it struck an iceberg. Analysis: the Titanic struck an iceberg and sank, and if it hadn’t struck that iceberg then it wouldn’t have sunk. In general: c causes e if and only if c and e both occur, and if c hadn’t occurred then e wouldn’t have occurred. This analysis seems to capture what’s going on in the Titanic example, but counterexamples abound. For instance, suppose that (counterfactually, so far as I know) there had been a terrorist on board the Titanic who would have sabotaged it and caused it to sink the next day if it hadn’t struck the iceberg and sunk. Presumably, one still wants to say in this scenario that the iceberg caused the Titanic to sink. Nevertheless, the Titanic would have sunk even if it hadn’t struck the iceberg. Typically in philosophical debates, a counterexample like this one leads to a revision of the original analysis that blocks the counterexample; that revised analysis is then subjected to another counterexample, which leads to further revision; and this counterexample-revision-counterexample cycle iterates until the analysis becomes so complex that the core idea that motivated the original analysis starts to seem hopeless. That idea is abandoned, a new idea is proposed, and the process is repeated with that new idea.
In short, my training in philosophy inclines me to be suspicious of Birnbaum’s conditionality principle, even though it seems to capture what’s going on in the simple thermometer example. Because of the conditionality principle’s technical and specialized nature, however, it is not as easy to think of potential counterexamples. I will table this concern for now; in future posts, I will discuss counterexamples to the conditionality principle that statisticians have proposed, and revisions to that principle they have suggested.
Great post, Greg! I'm looking forward to seeing the counter-examples.
On another note, I would be interested to get your take on stopping rules and their relation to Bayesian statistics and to the likelihood principle. I had thought that Bayesians rejected stopping rules, but then I saw a remark by Gelman that suggests they do not. I haven't tracked down the example in his book yet.
That's interesting. Gelman often has valuable insights, so I'd like to know what he has in mind. The comments on the blog post you linked give some hints, but nothing very clear (to me). They suggest that Gelman does not have in mind that stopping rules are relevant pre-data but not post-data, which is one standard line; rather, he seems to hold that stopping rules are relevant to prediction about future observations but not to inference about parameter values.
I checked out Gelman’s reference. The example he refers to shows that stopping rules are relevant to posterior predictive model checking, and thus not entirely irrelevant to Bayesians as DiNardo asserts. Gelman does not appear to deny the stopping rule principle, which says that the stopping rule is irrelevant to inferences about the parameters of an assumed model.
The example Gelman uses to illustrate posterior predictive model checking is the following sequence of binary outcomes, modeled as iid Bernoulli with Pr(1)=p: 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0. The sequence appears to be autocorrelated, contrary to the iid assumption. A possible posterior predictive model check is to generate by simulation a probability distribution over the number of “switches” from 0 to 1 and 1 to 0 that would occur if the sequence were iid, using the posterior distribution over p generated by the experiment. The observed number of switches (3) has a low P-value in this distribution (0.028), so the iid assumption should be rejected. (I don’t know how the use of the P-value in this case fits with Gelman et al.’s Bayesian commitments.)
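For anyone who wants to try the switch-count check, here is a minimal Python sketch of how such a simulation could be set up. It is not Gelman's code. I am assuming a uniform prior on p (the description above does not say which prior was used), which gives a Beta(8, 14) posterior for 7 ones and 13 zeros, and I am assuming the P-value is the lower-tail probability of seeing as few switches as were observed. The function names are my own.

```python
import random

# The observed sequence from the example above.
OBSERVED = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]


def count_switches(seq):
    """Number of adjacent positions where the value changes (0 to 1 or 1 to 0)."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def posterior_predictive_p_value(observed, n_sims=10000, seed=0):
    """Estimate P(switches in a replicated iid sequence <= observed switches).

    Assumes a uniform Beta(1, 1) prior on p, so the posterior is
    Beta(1 + ones, 1 + zeros). Both the prior and the choice of the
    lower tail are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    n = len(observed)
    ones = sum(observed)
    t_obs = count_switches(observed)
    as_few = 0
    for _ in range(n_sims):
        # Draw p from the posterior, then a replicated iid sequence given p.
        p = rng.betavariate(1 + ones, 1 + n - ones)
        rep = [1 if rng.random() < p else 0 for _ in range(n)]
        if count_switches(rep) <= t_obs:
            as_few += 1
    return as_few / n_sims


if __name__ == "__main__":
    print("observed switches:", count_switches(OBSERVED))
    print("estimated P-value:", posterior_predictive_p_value(OBSERVED))
```

The estimate depends on the seed and the number of simulations, so it should land near the reported figure only to within simulation error, and only if the prior and tail assumptions match the ones used in the book.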
Incidentally, I found amusing Gelman’s comment in the blog post you linked “Forget Urbach (1985), whoever he is.” That comment illustrates the sociological gap between Bayesian statistics and Bayesianism in philosophy far better than Gelman’s complaints about DiNardo’s article.