Science and Statistics: July 2012

Monday, July 16, 2012

Methodism and Ockham's Razor

In my previous post I drew a rough-and-ready distinction between evidentialists and methodists in the philosophy of science. Evidentialists focus on evidential relations between data and hypotheses, whereas methodists think that these relations are illusory or unimportant and instead focus on the reliability of our methods.

In this post I want to motivate methodism by telling what I consider to be one of its success stories: justifying Ockham's razor. This story belongs to Kevin Kelly and his students (see their project site here.)

Ockham's razor is very popular among scientists, philosophers of science, and popular science writers. It says, roughly, that we ought to be biased in favor of simple theories. It is easy to motivate Ockham's razor by appealing to historical episodes: Copernicus' heliocentric theory is to be favored over Ptolemy's because it posits fewer circles; Newton's unification of terrestrial and celestial physics is to be favored over Aristotelian alternatives because it posits fewer fundamental laws; Einstein's special relativity is to be favored over its Newtonian predecessors because it does away with the ether and a favored rest state. And so on. Appeals to Ockham's razor are everywhere in popular discussions of physics and sometimes crop up in other areas of science as well.

The apparent importance of Ockham's razor to science is difficult for philosophers of science to explain because it is hard either to explain what "simplicity" is in science or to see how a fixed bias toward simplicity could help guide us toward the truth. As Kelly puts it, if we already know that the truth is simple, then we don't need Ockham's razor. And if we don't already know that the truth is simple, what entitles us to assume that it is?

Kelly and his students have proven that Ockham's razor in some sense converges to the truth more efficiently than other methods. Here is a simple example to illustrate their idea. Imagine a machine that spits out either a green ball or a blue ball once a second for all eternity. Your job is to watch the balls that come out of the machine and to produce a conjecture about how many times the color of the balls it spits out will change. You have no background knowledge about how the machine works beyond the fact that it always spits out a green ball or a blue ball and that the number of color changes is finite.

Suppose that the first ten balls that come out of the machine are green. You might be tempted to conjecture that the color never changes. But suppose that the stakes are high and you decide to wait. After one minute, you still have only seen green balls. Likewise after a day, a week, a month, and a year.

At what point do you give in and conjecture that the color never changes? The urge to do so seems irresistible, but there is a problem: with no background knowledge beyond the fact that the machine always spits out a green ball or a blue ball, you have no basis for saying that the probability that the next ball will be green is higher than the probability that the next ball will be blue.

This is where Kelly's justification of Ockham's razor comes in. Kelly and his students propose that simplicity is a matter of asymmetric underdetermination: the data will never refute a complex theory if a simpler theory is true, but they will eventually refute a simpler theory if a more complex theory is true. For instance, the data will never refute "one color change" if "zero color changes" is true, but they will eventually refute "zero color changes" if "one color change" is true. Similarly, "one color change" is simpler than "two color changes," which is simpler than "three color changes," and so on.
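To make the asymmetry concrete, here is a minimal sketch in Python, assuming the data arrive as a string of "G" and "B" characters and that each hypothesis has the form "exactly k color changes." The function names are hypothetical and chosen for illustration only. A finite data sequence can rule out "exactly k" only by exhibiting more than k changes, which is why the simplest unrefuted answer is just the number of changes seen so far.

```python
def count_changes(seq):
    """Number of color changes observed so far in a string like 'GGGB'."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def refutes(seq, k):
    """True if the finite observation sequence refutes 'exactly k color changes'.

    Finite data can only rule out 'exactly k' by showing more than k changes;
    any shortfall could still be made up by balls that have not come out yet.
    """
    return count_changes(seq) > k


def simplest_compatible(seq):
    """Smallest k such that 'exactly k changes' is not yet refuted."""
    return count_changes(seq)


# If 'zero changes' is true, every prefix is a single color, so 'one change'
# is never refuted; but a single blue ball refutes 'zero changes'.
all_green = "G" * 1000
print(refutes(all_green, 1))            # False
print(refutes(all_green + "B", 0))      # True
print(simplest_compatible("GGGBBBGG"))  # 2
```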

Why follow Ockham's razor by giving the simplest answer compatible with the data? Let M be a method of producing conjectures for this problem that converges to the truth in the limit of inquiry on any possible data sequence. M must eventually take the bait and settle on the answer "zero color changes" given a long enough data sequence with no color changes; otherwise, it would fail to converge to the truth. Suppose that it takes the bait after seeing 1,324,235 consecutive green balls. We have no assurance whatsoever that it won't be forced to retract its answer by a blue ball showing up at any later time. Suppose that the 1,543,213th ball is blue. Then "zero color changes" is refuted. M might then abstain from giving an answer for a long time, having been burned once before. But given a long enough sequence of blue balls, it must eventually take the bait and settle on the answer "one color change," again on pain of failing to converge to the truth. There is no assurance whatsoever that it will not be forced to retract that answer later on.

M must eventually posit the simplest answer compatible with experience, on pain of failing to converge to the truth in the limit. It can then be forced to retract if that answer is refuted. In this way, any method that converges to the truth in the limit can be forced to retract "up the complexity hierarchy." By contrast, only methods that violate Ockham's razor can be forced to retract "down the complexity hierarchy." Nature simply has to wait such methods out.

Thus, by never positing an answer other than the simplest one compatible with experience, a method that conforms to Ockham's razor minimizes the number of retractions it makes in the worst case. That's Kelly et al.'s justification of Ockham's razor.
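The retraction argument can be illustrated (though not proven) with a small simulation. This is a sketch under stated assumptions: a retraction is counted whenever a method's conjecture changes as the balls arrive one at a time, and the non-Ockham rival, `make_anti_ockham`, is a hypothetical method invented for this example. After each color change it guesses one extra change, and it falls back to the Ockham answer only after `patience` balls without a change, since it must eventually do so in order to converge.

```python
def count_changes(seq):
    """Number of color changes in a string of 'G'/'B' observations."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def steps_since_last_change(seq):
    """Observations since the most recent color change (or since the start)."""
    for i in range(len(seq) - 1, 0, -1):
        if seq[i] != seq[i - 1]:
            return len(seq) - i
    return len(seq)


def ockham(seq):
    """Always conjecture the simplest answer compatible with the data."""
    return count_changes(seq)


def make_anti_ockham(patience):
    """Hypothetical non-Ockham method: guess one change more than observed
    until `patience` observations pass without a change, then fall back."""
    def method(seq):
        n = count_changes(seq)
        return n + 1 if steps_since_last_change(seq) < patience else n
    return method


def count_retractions(method, seq):
    """Count how often the conjecture changes as data arrive one ball at a time."""
    retractions = 0
    previous = None
    for t in range(1, len(seq) + 1):
        conjecture = method(seq[:t])
        if previous is not None and conjecture != previous:
            retractions += 1
        previous = conjecture
    return retractions


anti = make_anti_ockham(patience=5)
for data in ["G" * 20, "G" * 10 + "B" * 10]:
    print(count_retractions(ockham, data), count_retractions(anti, data))
# 0 1
# 1 3
```

On the all-green sequence, nature simply waits the rival out and it retracts "down" from "one change" to "zero changes"; the Ockham method never retracts. On the sequence with one change, the Ockham method retracts once, while the rival retracts three times.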

This justification has two important features. First, it says only that Ockham methods are better than non-Ockham methods with respect to worst-case retraction loss. One can't say anything about expected-case retraction loss without imposing a probability distribution on the problem; but if one is in a position to impose a probability distribution on the problem, then one can just perform Bayesian inference based on the posterior distribution without worrying about Ockham's razor. Best-case retraction loss doesn't favor Ockham's razor because a method that starts with a complex answer could get lucky: nature could refute all of the simpler theories in succession and then decline to refute that one for eternity. In addition to the fact that it only concerns worst-case retraction loss, Kelly et al.'s justification is limited in that it provides no assurance at all that a simpler answer is more likely to be correct than a more complex answer. It speaks only about the reliability of methods, and not about the intrinsic belief-worthiness of some hypotheses given some data. It is a methodist justification only, not an evidentialist one.

Despite the weakness of this justification, I consider it a success story for methodism. Kelly et al.'s justification for Ockham's razor may not give us everything we want, but it does seem to give us everything we need. The point is not to get us excited about Ockham's razor. It is, rather, to investigate what, if anything, can be said in favor of this rule that we are already naturally inclined to follow. And it gives us something that seems worth taking into consideration, at least when prior probabilities are unavailable.

It is interesting that frequentist null hypothesis significance tests can be thought of as instances of Ockham's razor if we loosen the notion of refutation a bit so that an answer is considered refuted if it is outside a confidence interval with a specified confidence level. In this sense, you can never refute a complex alternative hypothesis if a point null hypothesis is true, but you can eventually refute the null if the alternative is true; thus, the null is simpler than the alternative. The orthodox frequentist practice of acting as if the null were true until it is refuted thus conforms to Ockham's razor.
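Here is a minimal sketch of that loosened notion of refutation, assuming a two-sided 95% z interval for a normal mean with known standard deviation. To keep the picture deterministic, it also assumes (unrealistically) that the sample mean always equals the true mean; the function names are hypothetical. Under the null the interval always contains nonzero values, so the composite alternative "mu is not 0" is never refuted, while under the alternative the null value 0 is eventually excluded.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Two-sided z interval for a normal mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def is_refuted(mu, interval):
    """Loosened refutation: mu counts as refuted if it lies outside the interval."""
    low, high = interval
    return not (low <= mu <= high)


def first_n_refuting_null(true_mu, sigma=1.0, max_n=10_000):
    """Smallest n at which mu = 0 is refuted, assuming the sample mean equals true_mu."""
    for n in range(1, max_n + 1):
        if is_refuted(0.0, confidence_interval(true_mu, sigma, n)):
            return n
    return None


# Null true: some nonzero value always survives, however large n gets.
for n in (10, 1_000, 1_000_000):
    interval = confidence_interval(0.0, 1.0, n)
    print(n, is_refuted(interval[1] / 2, interval))  # False each time

# Alternative true (mu = 0.5): the null is eventually refuted.
print(first_n_refuting_null(0.5))  # 16
```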

Nice Quote about Bayesian Methods

I am not a Bayesian because I lack self-confidence. That is, you have to have enough self-confidence to have a specific prior on things, and I do not think I know enough about things to have a specific prior. I may have a general prior on some things. I think that a good Bayesian, that is, a Bayesian who picks a prior that has some value to it, is better than a non-Bayesian. And a bad Bayesian who has a prior that is wrong is worse than a non-Bayesian, and I have seen examples of both. What I do not know is how do I know which is which before we evaluate the outcome.

--Clive Granger

Methodism and Frequentism

Some philosophers of science focus on questions like "What are the necessary and sufficient conditions for data D to count as evidence for hypothesis H?" A few think that questions of this kind are misguided and instead focus on questions like "How do the methods used in science help us get closer to the truth?" I will call the first group evidentialists and the second group methodists. (With apologies to mainstream epistemologists, who use these terms in slightly different ways, as well as to the relevant group of mainline Protestants. And also to my readers for the fact that this distinction may not be entirely clear and my terminology across posts may not be entirely uniform. This research is a work in progress!)

Frequentist methods tend to appeal to methodists because they are typically taken to derive their warrant not primarily from intuitions about evidential relations but instead from their long-run operating characteristics. For instance, a uniformly most powerful level α null hypothesis significance test is typically taken to be a good procedure (when it is so taken) because it has the highest probability of rejecting the null hypothesis when it is false among all tests that have probability α or less of rejecting the null hypothesis when it is true. By contrast, Bayesian methods are typically taken to derive their warrant from considerations of coherence or consistency with plausible axioms about rational inference or behavior.
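As an illustration of what "operating characteristics" means here, the sketch below computes the power functions of two level-0.05 z-tests for a normal mean with known standard deviation, assuming the null is mu = 0. The one-sided test is a standard example of a uniformly most powerful test for the alternative mu > 0. Comparing it with the two-sided test at a few values only illustrates that property; it does not prove uniformity. The function names and parameter choices are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided(delta, n, alpha=0.05, sigma=1.0):
    """P(reject mu = 0) for the one-sided z-test (reject if z > z_crit) at true mean delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    shift = delta * sqrt(n) / sigma
    return 1 - STD_NORMAL.cdf(z_crit - shift)


def power_two_sided(delta, n, alpha=0.05, sigma=1.0):
    """P(reject mu = 0) for the two-sided z-test (reject if |z| > z_crit) at true mean delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


for delta in (0.0, 0.2, 0.5):
    print(delta, round(power_one_sided(delta, 25), 3), round(power_two_sided(delta, 25), 3))
```

Both tests reject a true null with probability 0.05, but for positive delta the one-sided test rejects more often. For negative delta the ordering flips, which is one way of seeing why no uniformly most powerful test exists against the two-sided alternative.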

Although the typical way of motivating frequentist methods has a methodist flavor while the typical way of motivating Bayesian methods does not, a methodist need not be a frequentist. The frequentist methods in common use are appealing because they can be proven to have certain desirable operating characteristics when their assumptions are satisfied. However, their desirable operating characteristics are typically much weaker than they seem at first glance and are compatible with terrible performance in important respects. And Bayesian methods often perform better; they just can't be proven to have any particular truth-related virtues (except from the standpoint of the degrees of belief of a particular agent) because their performance depends on the relationship between their priors and the true state of affairs. Methodists are not committed to preferring methods with weak but provable performance characteristics over methods that may or may not perform better depending on the circumstances.

Wednesday, July 11, 2012

A Lesson from Glymour's Theory and Evidence

In his Theory and Evidence, Clark Glymour presents his "bootstrap" theory of confirmation, which is essentially a conceptual analysis of the notion of confirmation (i.e. positive evidential relevance) that attempts to extend the hypothetico-deductive account to allow a theory to be confirmed by evidence that is derived from the very theory in question, in just the cases in which that kind of circularity is unproblematic. Glymour's theory is extremely ingenious but faces at least one decisive objection (given here). Attempts to patch the theory to address that objection did not get very far, and the theory was more or less abandoned.

I suspect that the most important passage in Theory and Evidence is not one in which Glymour presents or defends his theory, but the final paragraph of the book in which he reflects on what he has accomplished. His insight and his candor are remarkable:

There is nothing in this book that corresponds to an attempt to show that the methods I have described are justified or uniquely rational. There are arguments for the methods, arguments that purport to show that the strategy achieves our intuitive demands on confirmation relations better than do competing strategies, but these arguments do not show that the bootstrap strategy will lead us to the truth in the short run or the long run, or will lead us to the truth if anything can, or is required by some more primitive canon of rationality. There are such arguments for other confirmation theories, although none of them are wholly good arguments; perhaps it would be better to have a bad argument of one of these kinds for the bootstrap strategy than to have none at all. But the strategy does not depend on frequencies, and there is as yet no framework of degrees of belief within which to place it, and so none of the usual lines of argument seems relevant at all. One has only the sense that unless the world is perversely complex, the strategy will help us to locate false hypotheses and separate them from true ones. A sense is not an argument. I am partly consoled by the thought that one cannot do everything; sometimes, alas, one cannot do very much at all.

To me the message of this passage and of the fate of Theory and Evidence and many similar projects is that conceptual analysis is an ineffective way to arrive at normative conclusions, even if the concept being analyzed (e.g. confirmation/evidence) has normative import. Showing that a normative claim follows from more basic normative claims is perhaps a more promising approach, but to my mind the best approach of all is to show that abiding by a normative claim will help us achieve some end we care about. If it doesn't do that, then why should we regard it as normative?

Saturday, July 7, 2012

Unfortunately, "Axiomatic" Doesn't Imply "Useful"

But we may look at the purpose of tests from another view-point. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. Here, for example, would be such a "rule of behaviour": to decide whether a hypothesis, H, of a given type be rejected or not, calculate a specified character, x, of the observed facts; if x>x_0, reject H, if x≤x_0, accept H. Such a rule tells us nothing as to whether in a particular case H is true when x≤x_0, or false when x>x_0. But it may often be proved that if we behave according to such a rule, then in the long run we shall reject H when it is true not more, say, than once in a hundred times, and in addition we may have evidence that we shall reject H sufficiently often when it is false. —Jerzy Neyman and Egon Pearson, 1933
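Neyman and Pearson's "rule of behaviour" can be simulated directly. This sketch assumes H says that each of n = 10 observations is normal with mean 0 and variance 1, takes the "specified character" x to be the standardized sample mean, and sets x_0 so that P(x > x_0 | H) = 0.01. The helper names, the sample size, and the alternative mean of 1 are hypothetical choices. The simulation says nothing about any single case; it only tracks how often the rule errs over many repetitions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def statistic(sample):
    """The 'specified character' x: a standardized sample mean (sigma assumed to be 1)."""
    return fmean(sample) * sqrt(len(sample))


def long_run_rejection_rate(true_mu, x0, n=10, trials=20_000, seed=1):
    """Fraction of simulated experiments in which the rule 'reject H if x > x0' fires."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, 1.0) for _ in range(n)]
        if statistic(sample) > x0:
            rejections += 1
    return rejections / trials


x0 = NormalDist().inv_cdf(0.99)  # chosen so that P(x > x0 | H true) = 0.01
print(long_run_rejection_rate(0.0, x0))  # H true: should be near 0.01
print(long_run_rejection_rate(1.0, x0))  # H false: rejects most of the time
```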

A lot of work in philosophy asks a question of the form "What is x?" and attempts to answer that question by positing necessary and sufficient conditions for something to be an x. This sort of work is called conceptual analysis. For instance, many philosophers of science have asked, "What is evidence?" and attempted to answer that question by saying that data D provide evidence for hypothesis H if and only if D and H stand in some relation R relative to one's background knowledge K.

It's typically easy to make a conceptual analysis task sound interesting and important, at least to the philosophically inclined. Here's a way to motivate the question "What is evidence?" that I have used in teaching. Start with the question "What is science?" or more specifically, "How is science different from (and better than?) other ways of attempting to learn about the world, e.g. astrology, reading sacred texts, etc.?" This question is easy to motivate. Take debates about whether or not intelligent design should be taught in public school science classrooms. Typically, these debates center on whether intelligent design is science or not. In the Kitzmiller v. Dover Area School District case, for instance, philosophers of science were brought in to testify on this issue. Thus, the question of what constitutes science is central to a big debate in our society that a lot of us already care about.

According to many (most?) philosophers of science today, the question "What is science?" is a red herring. The views of the judge in the Kitzmiller case notwithstanding, the important question is not whether, e.g., intelligent design is science or not, but something more like whether intelligent design is well supported by our evidence or not. Thus, rather than asking "What is science?" we should be asking "What is evidence?"

In general, it is easy to motivate the task of conceptual analysis by pointing out that the concept being analyzed does some important work for us but is not entirely clear. However, conceptual analysis has many problems. For one, it almost never works. Typically, a conceptual analysis is produced; then a counterexample is presented; then the analysis is reworked to get around that counterexample; then a new counterexample is produced; then the analysis is reworked again to get around the new counterexample; and so on until the analysis becomes so ugly and messy that nobody finds it very appealing any more. At some point in this process, a rival analysis is produced which then undergoes the same process. The result is a collection of ugly and messy analyses none of which really work but each of which has vociferous defenders who engage in interminable debates that to most of us seem boring and pointless, even though the question being debated can be made to seem interesting and important.

Perhaps there is a deeper problem with conceptual analysis, namely that questions of the form "what is x?" are not actually interesting and important after all. Recall the motivation I gave for the question "what is evidence?" I said that many philosophers believe that the question of whether intelligent design should be taught in public school science classes should be focused on the question of whether intelligent design is well supported by evidence or not, rather than the question of whether intelligent design is science or not. Thus, the important question is not, "what is science?" but rather "what is evidence?" However, it's not obvious that we need an answer to the question "what is evidence?" in order to judge whether intelligent design is well supported by evidence or not. In fact, many philosophers regard the assumption that we cannot know whether something is an x until we have a definition of x as an instance of the "Socratic fallacy." We can tell whether something is green or not just by looking at it. We don't have to know what green is. Evidence seems to work the same way, to some extent at least.

Perhaps we don't need a conceptual analysis of "evidence" in order to judge whether intelligent design is well supported by evidence or not, but such an account would nevertheless be useful in clarifying our intuitive judgments and allowing us to articulate and defend them. That seems right, but it's a moot point if attempts to give a conceptual analysis of "evidence" are bound to fail, as nearly all attempts at conceptual analysis do.

If not conceptual analysis, then what should philosophers be doing? One option is to forget about defining terms like "evidence" and instead to develop axiomatic theories of such notions. Philosophers can operate in "Euclidean mode" rather than "Socratic mode," as Glymour puts it. Euclid did give "definitions" of terms like "point" and "line," but they were more like dictionary entries than conceptual analyses. They don't hold up under philosophical scrutiny. But so what? Euclid developed a beautiful and powerful theory about points and lines and such. It's not clear what rigorous definitions of "point" and "line" would add to his theory, practically speaking.

The same is true of probability theory and the theory of causation. Philosophers still debate about what "probability" and "causation" mean, but these questions have little or no importance for working statisticians. On the other hand, the theories based on Kolmogorov's axioms and on the Causal Markov and Faithfulness Conditions (the latter developed by philosophers) are of great importance to practitioners.

Philosophers typically like thinking for its own sake. But they should at least try to do work that matters. Conceptual analysis typically does not matter, especially after the obvious options have been laid out and explored and the field has more or less reached a stalemate.

However, it is not a foregone conclusion that an axiomatic theory will matter either. In fact, I suspect that the Likelihood Principle is part of an axiomatic theory that doesn't matter. Hence my dissatisfaction with my project.

Specifically, the Likelihood Principle is part of an axiomatic theory of evidence, or rather, equivalence of evidential meaning. Allan Birnbaum initiated the development of this theory when he showed that the Likelihood Principle follows from the conjunction of the Sufficiency Principle and the Weak Conditionality Principle. I have attempted to improve it by showing that the Likelihood Principle follows from the conjunction of the Weak Evidential Equivalence Principle and the Experimental Conditionality Principle. I think this proof makes it harder for frequentists to deny that their methods fail to respect equivalence of evidential meaning. But it only follows that we ought not use frequentist methods on the assumption that we ought not use methods that fail to respect equivalence of evidential meaning. That assumption begs the question against Neyman's view of tests as rules of behavior, as given in the quote at the top of this post.

My view is that restrictions on methods are justified only insofar as they help us achieve our practical and epistemic goals. Respecting equivalence of evidential meaning is not for me a basic epistemic goal, nor does it seem to be related in any straightforward way to success in achieving my basic practical and epistemic goals (see my previous post). Thus, I at least seem to have no use for an axiomatic theory of evidential equivalence, and thus, it seems, no use for the Likelihood Principle. Bummer.

Friday, July 6, 2012

Evidential Equivalence and Truth

In my previous post, I discussed a concern I have about the project I had been planning to pursue in my dissertation. Briefly, I was planning to defend the Likelihood Principle, which says that certain sets of experimental outcomes are evidentially equivalent. The Likelihood Principle seems interesting because frequentist methods, which are the statistical methods most widely used in science, do not respect evidential equivalence if the Likelihood Principle is true. Here's the problem: it's not obvious to me that we ought to insist on using statistical methods that respect evidential equivalence.

One might claim that respecting evidential equivalence is an end in itself. All I can say in response to this claim is that it is not an end I personally care about in itself. I care about epistemic goals that involve some sort of correspondence between a proposition and the world, such as truth, truthlikeness, approximate truth, and empirical adequacy. I care about respecting evidential equivalence and other epistemic virtues like coherence that concern the internal ordering of a set of propositions only insofar as they help in achieving those goals.

To my mind, then, we ought to insist on using statistical methods that respect evidential equivalence only to the extent that doing so would help us achieve what we might loosely call correspondence goals.

Would insisting on using statistical methods that respect evidential equivalence help us achieve correspondence goals? There doesn't seem to be any way to answer this question across the board as far as I can see. Frequentist methods (which fail to respect evidential equivalence) have probability one of achieving a certain level of performance with respect to correspondence goals in the indefinite long run when their assumptions are satisfied, while Bayesian methods (which respect evidential equivalence) provide no such guarantees but rather may perform better or worse than frequentist methods in a given respect on a given problem depending on the priors used and the actual state of affairs. This comparison suggests that we ought to use Bayesian methods when we are reasonably confident that our priors are good, and use frequentist methods otherwise (or something like that). That conclusion is not very exciting, and it doesn't give us any interesting story to tell about how respecting evidential equivalence helps us get to the truth.
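A toy calculation shows how "better or worse depending on the priors used and the actual state of affairs" can play out. This sketch assumes a binomial experiment with n = 10 trials, uses mean squared error of a point estimate as a stand-in for a correspondence goal, and compares the sample proportion with the posterior mean under a hypothetical Beta(5, 5) prior. Both error formulas are exact, so no simulation is involved.

```python
def mse_mle(p, n):
    """Mean squared error of the sample proportion X/n when X ~ Binomial(n, p)."""
    return p * (1 - p) / n


def mse_posterior_mean(p, n, a, b):
    """Mean squared error of the Beta(a, b) posterior mean (X + a) / (n + a + b)."""
    denom = n + a + b
    variance = n * p * (1 - p) / denom**2
    bias = (a - p * (a + b)) / denom
    return variance + bias**2


for p in (0.5, 0.05):
    print(p, mse_mle(p, 10), mse_posterior_mean(p, 10, 5, 5))
```

When the true p is 0.5, where the prior is centered, the Bayesian estimate has a quarter of the error of the sample proportion (0.00625 versus 0.025). When the true p is 0.05, the prior pulls the estimate the wrong way and the ordering reverses (roughly 0.0518 versus 0.00475).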

Thursday, July 5, 2012

Prospectus

I have a nearly complete draft of my prospectus that I've been working on for weeks, but I'm starting to have doubts about my project that have led me back to the drawing board, i.e. this blog. In this post I'll explain what I have been trying to do in my prospectus and why I now think it might be a bad idea.

I've been writing about an idea called the Likelihood Principle. The Likelihood Principle says, roughly, that the evidential meaning of a given datum with respect to a set of hypotheses depends only on how probable that datum is according to those hypotheses.

The Likelihood Principle seems important because some statistical methods respect it while others do not. That is, if you take a set of data that are all evidentially equivalent according to the Likelihood Principle, some statistical methods would yield the same output given any datum in that set (assuming that prior probabilities and utilities are held fixed) while others would in some cases yield different outputs for different members of that set. Specifically, the kinds of statistical methods that are most common in science (usually called "classical" or "frequentist") do not respect the Likelihood Principle, while their main rivals (Bayesian methods) as well as some more obscure alternatives (primarily, Likelihoodist methods) do respect the Likelihood Principle.
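A standard textbook illustration makes the point concrete. Suppose 9 heads and 3 tails are observed, either because 12 tosses were fixed in advance (binomial sampling) or because tossing continued until the 3rd tail (negative binomial sampling). The two likelihood functions differ only by a constant factor, so the Likelihood Principle counts the outcomes as evidentially equivalent. The one-sided p-values for testing a fair coin against a heads-biased coin nonetheless differ, and at the 0.05 level only one of them leads to rejection. This is a sketch using exact fractions; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

HEADS, TAILS = 9, 3
HALF = Fraction(1, 2)


def binomial_likelihood(p):
    """12 tosses fixed in advance; 9 heads observed (p = probability of heads)."""
    n = HEADS + TAILS
    return comb(n, HEADS) * p**HEADS * (1 - p)**TAILS


def negative_binomial_likelihood(p):
    """Toss until the 3rd tail; 9 heads observed before it (the last toss is a tail)."""
    return comb(HEADS + TAILS - 1, HEADS) * p**HEADS * (1 - p)**TAILS


def binomial_p_value():
    """P(at least 9 heads in 12 tosses | fair coin)."""
    n = HEADS + TAILS
    return sum(comb(n, k) * HALF**n for k in range(HEADS, n + 1))


def negative_binomial_p_value():
    """P(at least 9 heads before the 3rd tail | fair coin) = 1 - P(at most 8 heads)."""
    return 1 - sum(comb(y + TAILS - 1, TAILS - 1) * HALF**(y + TAILS) for y in range(HEADS))


# The likelihood ratio is the same constant (4) whatever p is.
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(3, 4)):
    print(p, binomial_likelihood(p) / negative_binomial_likelihood(p))

print(binomial_p_value(), float(binomial_p_value()))                    # 299/4096 0.072998046875
print(negative_binomial_p_value(), float(negative_binomial_p_value()))  # 67/2048 0.03271484375
```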

In addition, there are arguments for the Likelihood Principle that are controversial within statistics but have received little attention from philosophers of science. These arguments are, to a point at least, independent of debates about the use of prior probabilities in science that have dominated Bayesian/frequentist debates in the past. Thus, thinking about the Likelihood Principle looks like a promising way to make progress in a debate that has become rather stale but remains very important for statistics, and thus for science, and thus for the philosophy of science.

In my philosophy comprehensive paper, I gave a new proof of the Likelihood Principle that I still believe is a significant improvement on previous proofs such as the relatively well-known proof given by Allan Birnbaum in 1962 (article). I have since refined this paper and have had it accepted to present at the 2012 biennial meeting of the Philosophy of Science Association this November.

The project I have been working on for my prospectus would involve bolstering my proof of the Likelihood Principle in various ways and then considering its implications for statistics, science, and the philosophy of science. The problem I have been struggling with for a few weeks now is that I'm not sure that the Likelihood Principle does have any important implications.

What I've just written probably seems rather puzzling. If frequentist statistics violates the Likelihood Principle, and the Likelihood Principle is true, then doesn't it follow that we shouldn't use frequentist statistics?

Well, no. The Likelihood Principle as I prefer to formulate it says only that certain sets of experimental outcomes are evidentially equivalent. To derive from the claim that the Likelihood Principle is true the conclusion that we ought not use frequentist statistics, one needs to assume that we ought to use only statistical methods that always produce the same output given evidentially equivalent data as inputs. That assumption might seem innocuous, but it isn't, for at least two reasons:

1. It begs the question against frequentist views according to which the idea of an evidential relationship between data and hypothesis is misguided and we should think instead in terms of methods and their operating characteristics.

2. In practice, even subjective Bayesian statisticians violate the Likelihood Principle all the time, e.g. by using prior elicitation methods that involve estimating the parameters of a conjugate prior distribution: the conjugate prior family used depends on the sampling distribution, in violation of the Likelihood Principle.

I'll discuss these points more in a subsequent post.

Back to Blogging

I'm happy to announce that I passed my comps in the fall of 2011 and have since been working on my prospectus. I actually have a draft of a prospectus almost finished, but I've been having doubts about my project that have led me back here. I find that the process of putting my thoughts into writing forces me to clarify them in a way that is very helpful when it's not clear how I should proceed. I can't promise not to abandon this blog again at some point in the future, but I think I'll be posting fairly regularly for a little while at least.
