Wednesday, April 27, 2011

Term Paper on Birnbaum's Proof

Here's a pdf of the term paper described in the previous post.  It has many loose ends, but I think it's a good start on an exciting project.

Tuesday, April 26, 2011

Abstract of a Term Paper on Birnbaum's Proof

I'm writing a term paper that I hope will serve as a preliminary step toward my philosophy comp.  Here's the abstract.

Frequentist methods of statistical inference violate the likelihood principle (L). However, Birnbaum [4] proved that (L) follows from specific versions (S) and (C) of two principles—the sufficiency principle and the conditionality principle, respectively—to which frequentists appear to be committed. In a recent publication [15], Mayo notes that Birnbaum's proof "has generally been accepted by frequentists, likelihoodists, and Bayesians alike" (p. 307). Nevertheless, she argues that the proof is fallacious (chapter 7(III)). Mayo's critique involves replacing Birnbaum's (S) and (C) with different formulations of the principles of sufficiency and conditionality, (S') and (C'). Mayo shows that (S') and (C') do not entail (L) but gives no reason to doubt Birnbaum's theorem that (S) and (C) entail (L). While Mayo thus fails to show that Birnbaum's proof is fallacious, her critique does raise the important question whether (S) and (C) or (S') and (C') are better formulations of the principles of sufficiency and conditionality. I canvass a few arguments that have been offered on either side of this issue. On balance, these arguments appear to favor Birnbaum's position. However, they are not sufficiently compelling to declare the issue resolved.

I think it's a good start.  For the comp, I would like to address other responses to Birnbaum's argument in addition to Mayo's and to have something more definite to say about how we should interpret the sufficiency and conditionality principles.

Friday, April 22, 2011

Birnbaum's Proof Part II: The Details

The notation needed to explain Birnbaum's proof in detail outruns the capabilities of Blogger, so I wrote it up as a pdf.  My conclusions are essentially unchanged from my rough sketch of the argument: Birnbaum's proof is valid, but I suspect that he has not formulated the principles of conditionality and sufficiency properly.  If you interpret the principle of conditionality not as a statement about evidential equivalence but instead as a directive about how to analyze experimental results (which seems appropriate to me at this time), then it is incompatible with the principle of sufficiency as Birnbaum formulates it, and indeed with any principle that can do the work the principle of sufficiency does in Birnbaum's proof.  Another way to undermine Birnbaum's proof would be to insist, as Durbin (1970) does, that a conditional analysis can only condition on a variable that is part of the minimal sufficient statistic, although that move seems less appropriate to me at this time.

In examining Birnbaum's proof, I did realize that it applies only to experiments with discrete sample spaces.  However, I do not think this limitation is serious: because no measurement is completely precise, all real experiments have discrete sample spaces, and a continuous sample space is only a useful idealization.

Saturday, April 16, 2011

Birnbaum's Proof Part 1: The Rough Idea

The centerpiece of Birnbaum's 1962 paper is his proof that the conditionality and sufficiency principles (as he formulates them) entail the likelihood principle (as he formulates it).  This proof is significant, again, because frequentists generally accept conditionality and sufficiency but do not accept the likelihood principle, which follows from Bayesianism and implies many of the consequences of the Bayesian position that frequentists find objectionable.  In a future post, I will delve into the details of Birnbaum's proof; in this post, I just want to display its overall structure and introduce the objections it has received.

Birnbaum considers two experiments that have pairs of respective outcomes—call them “star pairs”—that determine proportional likelihood functions.  He then constructs a hypothetical mixture experiment with these two experiments as its components.  By the conditionality principle, an outcome of either component experiment has the same evidential meaning as the corresponding outcome of the mixture experiment.  Now, there is a sufficient statistic that lumps together outcomes of the mixture experiment corresponding to star pair outcomes of the component experiments.  By the sufficiency principle, then, these outcomes of the mixture experiment have the same evidential meaning.  Given that an outcome of the mixture experiment has the same evidential meaning as the corresponding outcome of a component experiment, and that outcomes of the mixture experiment corresponding to star pair outcomes of the two component experiments have the same evidential meaning as one another, it follows that star pair outcomes of the two component experiments have the same evidential meaning as one another.  That is just what the likelihood principle asserts.

Call the two component experiments E and E’, respectively; call the mixture experiment E*; and let (x*, y*) be a "star pair," with x* an outcome of E and y* an outcome of E’.  Then the following diagram depicts the structure of Birnbaum’s proof, using lines to indicate evidential equivalence and noting above each line which principle is invoked to establish the equivalence:
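In plain text, with C and S marking the principle invoked at each step:

    (E, x*) --C-- (E*, (E, x*)) --S-- (E*, (E', y*)) --C-- (E', y*)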



Several objections to this proof have appeared in the statistics literature (e.g., Durbin 1970, Cox and Hinkley 1974, Kalbfleisch 1974, Joshi 1990) and at least one in the philosophy literature (Mayo 2011), but it is still widely accepted.  I am suspicious of Birnbaum's proof, but I do not yet have confidence in any precise diagnosis of where it goes wrong.

One objection to Birnbaum's proof to which I am sympathetic says that the conditionality principle should be understood not as a claim about evidential equivalence, but as a directive about how to analyze experimental results: thus, the conditionality principle says, "analyze experimental results conditional on which experiment was actually performed."  Understood in this way, the conditionality principle prohibits the use of the sufficient statistic that lumps together results from experiment E and experiment E', blocking Birnbaum's proof.  (Kalbfleisch develops a version of this idea in his 1974, but the specific way in which he develops it may be problematic.)  Birnbaum denies that the conditionality principle is to be understood as a directive (1962, p. 281 and elsewhere), but it is not clear to me that he has good reasons for doing so.

Friday, April 15, 2011

Birnbaum's Likelihood Principle

In this post, I present Birnbaum’s formulation of the likelihood principle and explain why frequentists reject and Bayesians accept this principle. Again, the big picture: frequentists typically accept conditionality and sufficiency principles while rejecting the likelihood principle. The likelihood principle is a central tenet of Bayesianism that follows directly from using Bayes’ theorem as an update rule. Many of the features of Bayesianism that frequentists find objectionable follow from the likelihood principle alone, so it is a short step from accepting the likelihood principle to becoming a Bayesian. Birnbaum argues that the conditionality and sufficiency principles are equivalent to the likelihood principle, putting significant pressure on frequentists to justify their position.


You should not be surprised to learn that the likelihood principle appeals to the notion of a likelihood; or, more precisely, a likelihood function. Birnbaum models an experiment as having a well-defined joint probability density f(x, θ) for all x in its sample space and all θ in its parameter space. This joint density implies a conditional density f_X|Θ(x|θ) for each θ. The likelihood function is this conditional density considered as a function of θ rather than x, defined up to an arbitrary multiplicative constant c: L(θ|x) = c f_X|Θ(x|θ). Roughly speaking, the likelihood function tells you how probable the model makes the data as a function of that model’s parameters.

Most frequentists are happy to use the likelihood function of an experiment in certain specific ways, such as maximum likelihood estimation and likelihood ratio testing. However, they do not accept the likelihood principle, which says, roughly, that all of the information about θ an experiment provides is contained in the likelihood function of θ. As Birnbaum formulates it, the likelihood principle is (like the conditionality and sufficiency principles) a claim about evidential equivalence. Specifically, it asserts that if the two experiments E and E’ with a common parameter space produce respective outcomes x and y that determine proportional likelihood functions, then Ev(E, x)=Ev(E’, y).

Consider two possible experiments. In the first experiment, you decide to spin a coin 12 times, and it comes up heads 3 times. Assuming that the spins are independent and identically distributed Bernoulli trials with the probability of heads on a given trial equal to p, the likelihood function of this outcome (up to an arbitrary multiplicative constant) is L(p|x=3) = (12 choose 3) p^3 (1-p)^9. In the second experiment, you decide to spin the coin until you obtain 3 heads. As it turns out, heads comes up for the third time on the 12th spin. Assuming again that the spins are independent and identically distributed Bernoulli trials with the probability of heads on a given trial equal to p, the likelihood function of this outcome (up to an arbitrary multiplicative constant) is L(p|x=12) = (11 choose 2) p^3 (1-p)^9. The likelihood functions for these two outcomes are proportional, so the likelihood principle says that they have the same evidential meaning.
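The proportionality is easy to check by direct computation. Here is a minimal Python sketch (the function names are my own, chosen just for this illustration):

    from math import comb

    def binomial_likelihood(p):
        # 3 heads in a fixed run of 12 spins
        return comb(12, 3) * p**3 * (1 - p)**9

    def negative_binomial_likelihood(p):
        # 3rd head arrives on the 12th spin:
        # 2 heads in the first 11 spins, then a head
        return comb(11, 2) * p**3 * (1 - p)**9

    # The ratio is constant in p, so the two likelihood functions
    # are proportional
    for p in (0.1, 0.25, 0.5, 0.9):
        print(p, binomial_likelihood(p) / negative_binomial_likelihood(p))  # 4.0

The constant ratio of 4 is just (12 choose 3)/(11 choose 2) = 220/55, the arbitrary multiplicative constant that the likelihood principle tells us to ignore.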

Standard frequentist methods say, contrary to the likelihood principle, that these two experiments do not have the same evidential meaning. (In fact, the second experiment but not the first allows one to reject the null hypothesis p=.5 at the .05 level in a one-sided test.) Many frequentist methods are based on P values, where a P value is, roughly, the probability of a result at least as extreme as the observed result. The likelihood principle implies that only the outcome actually obtained in an experiment is relevant to the evidential interpretation of that experiment. Because P values refer to results other than the result that actually occurred (namely, the unrealized results that are at least as extreme as the observed result), they are incompatible with the likelihood principle.
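The parenthetical claim can also be checked directly; a sketch:

    from math import comb

    # One-sided test of p = .5 against p < .5

    # Binomial design: probability of 3 or fewer heads in 12 spins
    p_binomial = sum(comb(12, k) for k in range(4)) / 2**12

    # Negative binomial design: probability that the 3rd head needs 12
    # or more spins, i.e., at most 2 heads in the first 11 spins
    p_negative_binomial = sum(comb(11, k) for k in range(3)) / 2**11

    print(round(p_binomial, 4))           # 0.073  -- cannot reject at .05
    print(round(p_negative_binomial, 4))  # 0.0327 -- reject at .05

Same data, same likelihood function, different P values: the disagreement comes entirely from which unrealized outcomes count as "at least as extreme."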

This conflict between the likelihood principle and frequentist methods becomes particularly stark when one considers “try and try again” stopping rules, which direct one to continue sampling until one achieves a particular result, such as a specific P value or posterior probability. For instance, a possible stopping rule is to continue collecting data until one achieves a nominally .05 significant result; that is, a result that appears to be significant at the P=.05 level if one analyzes the data as if a fixed-sample-size stopping rule had been used. A frequentist would insist that data gathered according to this stopping rule does not have the same evidential meaning as the same data gathered according to a fixed-sample stopping rule. After all, the experiment with the “try and try again” stopping rule is guaranteed to generate a nominally .05 significant result, so its real P value is not .05, but 1.
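A small simulation sketch makes the point vivid. Here the coin is in fact fair, the analyst checks for nominal .05 significance after every spin (from the 30th on), and each run is capped at an arbitrary 2,000 spins for practicality:

    import numpy as np

    rng = np.random.default_rng(0)
    trials, cap, hits = 1000, 2000, 0

    for _ in range(trials):
        spins = rng.integers(0, 2, size=cap)   # a fair coin throughout
        n = np.arange(1, cap + 1)
        # normal-approximation z test of p = .5 after every spin;
        # |z| > 1.96 is nominally significant at the .05 level
        z = (spins.cumsum() - n / 2) / (0.5 * np.sqrt(n))
        if np.any(np.abs(z[29:]) > 1.96):
            hits += 1

    # Far more than 5% of runs reach a nominally significant result,
    # and the fraction approaches 1 as the cap grows
    print(hits / trials)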

Bayesians argue, on the contrary, that it is absurd to make the evidential meaning of an experiment sensitive to the stopping rule used. Why should the evidential meaning of a result depend on an experimenter’s intentions, which are, after all, “inside his or her head”?

There is much more that could be said about frequentist-Bayesian disputes about the relevance of stopping rules to inference. For present purposes, it is enough to note that the relevance of stopping rules for frequentist tests violates the likelihood principle.

The likelihood principle, while unacceptable to frequentists, is a simple consequence of the use of Bayes’ theorem as an update rule. According to Bayes’ theorem, the posterior probability of a hypothesis is equal to its prior probability times its likelihood, divided by the average of the prior probabilities of all hypotheses in the hypothesis space weighted by their likelihoods. Thus, given a prior distribution over the hypothesis space, the posterior probability of a hypothesis depends only on the likelihood function. (The arbitrary multiplicative constant included in the likelihood can be factored out of both the numerator and the denominator of Bayes’ theorem, so it cancels out.) In short, Bayesianism implies the likelihood principle.
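Here is a grid sketch of that cancellation, using the two coin experiments from earlier in this post with a flat prior (chosen only for simplicity; any prior gives the same agreement):

    from math import comb
    import numpy as np

    p = np.linspace(0.01, 0.99, 99)        # grid over the parameter space
    prior = np.full_like(p, 1 / len(p))    # flat prior over the grid

    # Proportional likelihoods from the two experiments; the constants
    # (12 choose 3) and (11 choose 2) cancel in the normalization
    likelihood = p**3 * (1 - p)**9

    posterior_binomial = prior * comb(12, 3) * likelihood
    posterior_binomial /= posterior_binomial.sum()

    posterior_negative_binomial = prior * comb(11, 2) * likelihood
    posterior_negative_binomial /= posterior_negative_binomial.sum()

    print(np.allclose(posterior_binomial, posterior_negative_binomial))  # True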

Thursday, April 14, 2011

Birnbaum's Sufficiency Principle

In my previous post, I gave an example that motivates the conditionality principle and presented Birnbaum’s formulation of that principle.  In this post, I do likewise for the sufficiency principle.  To recap the big picture: frequentists typically accept conditionality and sufficiency principles while rejecting the likelihood principle.  The likelihood principle is a central tenet of Bayesianism that follows directly from using Bayes’ theorem as an update rule.  Many of the features of Bayesianism that frequentists find objectionable follow from the likelihood principle alone, so it is a short step from accepting the likelihood principle to accepting Bayesianism.  Birnbaum argues that the conditionality and sufficiency principles are equivalent to the likelihood principle, putting significant pressure on frequentists to justify their position.

The sufficiency principle appeals to the notion of a sufficient statistic.  Roughly speaking, a sufficient statistic lumps together some outcomes of an experiment that have the following property: given that some outcome in the lumped-together set occurred, which one of those outcomes occurred is independent of the parameters of the experiment.  For instance, consider the outcome of a series of two coin tosses, where the tosses are assumed to be independent and identically distributed Bernoulli trials with probability p of heads.  The outcome space for this experiment is the set of possible sequences of outcomes of two coin tosses: HH, HT, TH, TT.  A sufficient statistic for this experiment is the number of heads.  This statistic is sufficient because, given the number of heads, the exact sequence of heads and tails is independent of p.
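A minimal sketch of that independence for the two-toss example (the helper function is my own):

    # Probability of an exact sequence of spins when P(heads) = p
    def sequence_probability(sequence, p):
        return p**sequence.count("H") * (1 - p)**sequence.count("T")

    # Conditional probability of HT given exactly one head, for several p
    for p in (0.2, 0.5, 0.8):
        one_head = ("HT", "TH")
        conditional = sequence_probability("HT", p) / sum(
            sequence_probability(s, p) for s in one_head)
        print(p, conditional)  # 0.5 every time: the answer ignores p

Given the count of heads, the probability of any particular sequence is the same no matter what p is, which is exactly what makes the count sufficient.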

The sufficiency principle says, roughly, that a sufficient statistic summarizes the results of an experiment with no loss of information.  In other words, given the value t(x) of a statistic T(X) that is sufficient for θ, you don’t learn any more about θ by learning x.  This claim is very widely accepted and appears to be well-motivated.  The distribution of x given t(x) does not depend on θ, and it’s hard to see how one quantity could provide information about another quantity of which it is independent.  For instance, the sufficiency principle says that, given the number of heads obtained in a sequence of n tosses, you don’t learn any more about p by learning the exact sequence of heads and tails.  (This application of the sufficiency principle requires the assumption that the tosses are independent and identically distributed Bernoulli trials; if the possibility that the tosses are non-independent were on the table, for instance, then information about sequence would be relevant and the number of heads would not be a sufficient statistic.)

Birnbaum formulates the sufficiency principle, like the conditionality principle, as a claim about evidential equivalence.  Take an experiment E with outcome x and a derived experiment E’ with outcome t=t(x), where T(X) is a sufficient statistic; then Ev(E, x)=Ev(E’, t).  In other words, reduction to a sufficient statistic does not change the evidential meaning of an experiment.

My comments at the end of the previous post are also appropriate here: it is a good idea to be wary of general principles insofar as they are motivated merely by the fact that they seem to capture the intuitions at play in simple examples.  However, it is worth keeping in mind that the sufficiency principle is not motivated only (or even, I think, primarily) by its intuitive appeal in simple cases, but also by the general claim that one quantity cannot provide information about another quantity of which it is independent.  Similarly, the conditionality principle is motivated by the general claim that experiments that were not performed are irrelevant to the interpretation of the experiment that was performed.  However, the conditionality principle is formulated to apply to experiments that are “mathematically equivalent” to mixture experiments, so it is not clear that this general claim is general enough to warrant the principle.

Birnbaum's Conditionality Principle

Allan Birnbaum’s 1962 paper “On the Foundations of Statistical Inference” purports to prove that the conditionality and sufficiency principles—which frequentists typically accept—jointly entail the likelihood principle—which frequentists typically reject.  The likelihood principle is an important consequence of Bayesianism.  Moreover, many of the consequences of Bayesianism that frequentists typically find objectionable (e.g., the stopping rule principle) follow from the likelihood principle alone.  Thus, once one accepts the likelihood principle and its consequences, there is little to stop one from becoming a Bayesian.  The prominent Bayesian L. J. Savage said that he began to take Bayesianism seriously “only through the recognition of the likelihood principle.”  As a result, he called the initial presentation of Birnbaum’s paper “really a historic occasion.”

In this post, I will discuss why both frequentists and Bayesians find the conditionality principle attractive, and I will provide Birnbaum’s formulation of that principle.

In rough intuitive terms, the conditionality principle says that only the experiment that was actually performed is relevant for interpreting that experiment’s results.  Stated this way, the principle seems rather obvious.  For instance, suppose your lab contains two thermometers, one of which is more precise than the other.  You share your lab with another researcher, and you both want to use the more precise thermometer for today’s experiments.  You decide to resolve your dispute by tossing a fair coin.  Once you have received your thermometer and run your experiment, there are two kinds of methods you could use to analyze your results.  One kind of method is an unconditional approach, which assigns margins of error in light of the fact that you had a 50/50 chance of using either thermometer, without taking into account which thermometer you actually used.  The other kind is a conditional approach, which assigns margins of error in light of the thermometer you actually used, ignoring the fact that you might have used the other one.  Most statisticians regard it as highly counterintuitive that, in interpreting your data, you should take into account the fact that you might have used a thermometer other than the one you actually used.  Thus, most statisticians favor the conditional approach in this case.  The conditionality principle is designed to capture this intuition.
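A quick simulation sketch of the contrast, with hypothetical measurement standard deviations of 0.1 for the precise thermometer and 1.0 for the other:

    import numpy as np

    rng = np.random.default_rng(0)
    sd_precise, sd_crude = 0.1, 1.0   # hypothetical instrument precisions

    # Mixture experiment: a fair coin picks the thermometer, then we measure
    coin = rng.integers(0, 2, size=100_000)
    sd = np.where(coin == 0, sd_precise, sd_crude)
    errors = rng.normal(0.0, sd)

    print(errors.std())             # unconditional spread, about 0.71
    print(errors[coin == 0].std())  # conditional on the precise one, about 0.1

The unconditional margin of error averages over a measurement you didn’t make; the conditional one reflects the instrument you actually used.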

To express the conditionality principle precisely, it will be necessary to introduce some notation.  Birnbaum models an experiment as having a parameter space Ω of vectors θ, a sample space S of vectors x, and a joint probability distribution f(x, θ) defined for all x and θ.  He writes the outcome x of experiment E as (E, x), and the “evidential meaning” of that outcome as Ev(E, x).  He does not attempt to characterize the notion of evidential meaning beyond the constraints given by the conditionality, sufficiency, and likelihood principles.  Each of those principles states conditions under which two outcomes of experiments have the same evidential meaning.

Birnbaum expresses the conditionality principle in terms of the notion of a mixture experiment.  A mixture experiment E involves first choosing which of a number of possible “component experiments” to perform by observing the value of a random variable h with a known distribution independent of θ, and then taking an observation x_h from the selected component experiment E_h.  One can then represent the outcome of this experiment either as (h, x_h) or, equivalently, as (E_h, x_h).  The conditionality principle says that Ev(E, (E_h, x_h)) = Ev(E_h, x_h).  In words, the evidential meaning of the outcome of a mixture experiment is the same as the evidential meaning of the corresponding outcome of the component experiment that was actually performed.

The above discussion of the conditionality principle follows a pattern that is common in philosophy: start with an intuition-pumping example, then state a principle that seems to capture the source of the intuition at work in that example.  It takes only a little experience with philosophical disputes to become suspicious of this pattern of reasoning.  There are always many general principles that can be used to license judgments about particular cases, and there are typically counterexamples to whatever happens to be the most “obvious” or “natural” general principle.  Take, for instance, theories of causation.  It is easy to give examples to motivate, say, a David Lewis-style counterfactual analysis of causation.  For instance, the Titanic sank because it struck an iceberg.  Analysis: the Titanic struck an iceberg and sank, and if it hadn’t struck that iceberg then it wouldn’t have sunk.  In general: c causes e if and only if c and e both occur, and if c hadn’t occurred then e wouldn’t have occurred.  This analysis seems to capture what’s going on in the Titanic example, but counterexamples abound.  For instance, suppose that (counterfactually, so far as I know) there had been a terrorist on board the Titanic who would have sabotaged it and caused it to sink the next day if it hadn’t struck the iceberg and sunk.  Presumably, one still wants to say in this scenario that the iceberg caused the Titanic to sink.  Nevertheless, the Titanic would have sunk even if it hadn’t struck the iceberg.  Typically in philosophical debates, a counterexample like this one leads to a revision of the original analysis that blocks the counterexample; that revised analysis is then subjected to another counterexample, which leads to further revision; and this counterexample-revision-counterexample cycle iterates until the analysis becomes so complex that the core idea that motivated the original analysis starts to seem hopeless.  That idea is abandoned, a new idea is proposed, and the process is repeated with that new idea.

In short, my training in philosophy inclines me to be suspicious of Birnbaum’s conditionality principle, even though it seems to capture what’s going on in the simple thermometer example.  Because of the conditionality principle’s technical and specialized nature, however, it is not as easy to think of potential counterexamples.  I will table this concern for now; in future posts, I will discuss counterexamples to the conditionality principle that statisticians have proposed, and revisions to that principle they have suggested.

Revised Topics

My philosophy comp topic has evolved gradually, while my history comp topic has changed drastically.


My current philosophy comp project begins with a 1962 paper in which Allan Birnbaum argues that two principles frequentist statisticians typically accept—the conditionality and sufficiency principles—imply a principle they typically reject—the likelihood principle. The likelihood principle is a consequence of Bayes’ theorem, and Bayes’ theorem provides perhaps the simplest way to implement the likelihood principle in statistical inference, so Birnbaum’s argument tends to push frequentists toward Bayesianism.

Birnbaum’s argument is famous among those interested in the philosophy of statistics, but it has been criticized. Several statisticians have argued that Birnbaum’s formulation of either the conditionality principle or the sufficiency principle is too strong, and that replacing it with a suitably weakened principle would not allow Birnbaum’s argument to go through. However, these statisticians disagree among themselves about how Birnbaum’s principles should be weakened, and their specific proposals have been criticized. Joshi and Mayo have raised stronger objections to Birnbaum’s argument, arguing that there is a flaw in Birnbaum’s logic rather than in his premises.

I do not yet know what to say about Birnbaum’s argument, but I think that with enough work I am bound to find something interesting. Whether Birnbaum is right or not, there is work to be done in pinpointing exactly where either he or his critics go wrong, and the results of such an analysis are likely to have significant implications for the foundations of statistics.

I am abandoning my history comp project based on the Millikan oil-drop experiment. Millikan’s notebooks have already received careful scrutiny, and after some preliminary work it is not clear that an experimental approach will yield any significant new insights in time for the comp deadline. Moreover, it has come to my attention that there is a researcher in Germany who is way ahead of me in tracking down and investigating extant versions of Millikan’s apparatus.

Instead, I am planning to write my history comp on a puzzling passage in Darwin’s Origin of Species. I wrote a paper on this topic last year and received encouraging comments on it and suggestions for expanding it. In particular, I am planning to investigate how this passage changed through subsequent editions of the Origin and to look for evidence that might indicate why Darwin made the particular changes he did.