Monday, July 16, 2012

Methodism and Ockham's Razor

In my previous post I drew a rough-and-ready distinction between evidentialists and methodists in the philosophy of science.  Evidentialists focus on evidential relations between data and hypotheses, whereas methodists think that such relations are illusory or unimportant and instead focus on the reliability of our methods.

In this post I want to motivate methodism by telling what I consider to be one of its success stories: justifying Ockham's razor.  This story belongs to Kevin Kelly and his students (see their project site here).

Ockham's razor is very popular among scientists, philosophers of science, and popular science writers.  It says, roughly, that we ought to be biased in favor of simple theories.  It is easy to motivate Ockham's razor by appealing to historical episodes: Copernicus' heliocentric theory is to be favored over Ptolemy's because it posits fewer circles; Newton's unification of terrestrial and celestial physics is to be favored over Aristotelian alternatives because it posits fewer fundamental laws; Einstein's special relativity is to be favored over its Newtonian predecessors because it does away with the ether and a privileged rest frame.  And so on.  Appeals to Ockham's razor are everywhere in popular discussions of physics and sometimes crop up in other areas of science as well.

The apparent importance of Ockham's razor to science is difficult for philosophers of science to explain because it is hard either to explain what "simplicity" is in science or to see how a fixed bias toward simplicity could help guide us toward the truth.  As Kelly puts it, if we already know that the truth is simple, then we don't need Ockham's razor.  And if we don't already know that the truth is simple, what entitles us to assume that it is?

Kelly and his students have proven that Ockham's razor, in a certain sense, converges to the truth more efficiently than other methods.  Here is a simple example to illustrate their idea.  Imagine a machine that spits out either a green ball or a blue ball once a second, for all eternity.  Your job is to watch the balls that come out of the machine and to conjecture how many times their color will change.  You have no background knowledge about how the machine works beyond the fact that it always spits out either a green ball or a blue ball and that the number of color changes is finite.
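
To make the setup concrete, here is a quick sketch in Python (my own toy rendering of the example, not anything from Kelly et al.'s framework; the names machine and simplest_answer are just labels I made up for this post):

```python
# Toy version of the ball machine and of the simplest answer compatible with the data.

def machine(change_times):
    """Spit out green or blue balls forever, flipping color at the given times (in seconds)."""
    color, t = "green", 0
    while True:
        if t in change_times:
            color = "blue" if color == "green" else "green"
        yield color
        t += 1

def simplest_answer(observed):
    """The simplest hypothesis compatible with the data: the number of changes seen so far."""
    return sum(1 for a, b in zip(observed, observed[1:]) if a != b)

# Watch the first minute of a machine whose color actually changes once, at t = 42.
stream = machine(change_times={42})
observed = [next(stream) for _ in range(60)]
print(simplest_answer(observed))  # 1: the conjecture rises from 0 only when the data force it to
```

The only point of the sketch is that the simplest answer compatible with the data is just the number of color changes actually observed so far.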

Suppose that the first ten balls that come out of the machine are green.  You might be tempted to conjecture that the color never changes.  But suppose that the stakes are high and you decide to wait.  After one minute, you still have only seen green balls.  Likewise after a day, a week, a month, and a year.

At what point do you give in and conjecture that the color never changes?  The urge to do so seems irresistible, but there is a problem: with no background knowledge beyond the fact that the machine always spits out either a green ball or a blue ball, you have no basis for saying that the probability that the next ball will be green is higher than the probability that it will be blue.

This is where Kelly's justification of Ockham's razor comes in.  Kelly and his students propose that simplicity is a matter of asymmetric underdetermination: the data will never refute a more complex theory if a simpler theory is true, but they will eventually refute the simpler theory if the more complex theory is true.  For instance, the data will never refute "one color change" if "zero color changes" is true, but they will eventually refute "zero color changes" if "one color change" is true.  Likewise, "one color change" is simpler than "two color changes," which is simpler than "three color changes," and so on.

Why follow Ockham's razor by giving the simplest answer compatible with the data?  Let M be a method of producing conjectures for this problem that converges to the truth in the limit of inquiry on any possible data sequence.  M must eventually take the bait and settle on the answer "zero color changes" given a long enough data sequence with no color changes; otherwise, it would fail to converge to the truth.  Suppose that it takes the bait after seeing 1,324,235 consecutive green balls.  We have no assurance whatsoever that it won't be forced to retract its answer by a blue ball showing up at any later time.  Suppose that the 1,543,213th ball is blue.  Then "zero color changes" is refuted.  M might then abstain from giving an answer for a long time, having been burned once before.  But given a long enough sequence of blue balls, it must eventually take the bait and settle on the answer "one color change," again on pain of failing to converge to the truth.  There is no assurance whatsoever that it will not be forced to retract that answer later on.

M must eventually posit the simplest answer compatible with experience, on pain of failing to converge to the truth in the limit.  It can then be forced to retract if that answer is refuted.  In this way, any method that converges to the truth in the limit can be forced to retract "up the complexity hierarchy."  By contrast, only methods that violate Ockham's razor can be forced to retract "down the complexity hierarchy."  Nature simply has to wait such methods out.

Thus, by never positing an answer other than the simplest one compatible with experience, a method that conforms to Ockham's razor minimizes the number of retractions it makes in the worst case.  That's Kelly et al.'s justification of Ockham's razor.  
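
To see what this comes to in the ball example, here is a toy simulation (again my own sketch with made-up helper names; Kelly et al. prove their result in a far more general setting than this one example).  It pits the Ockham method against a hypothetical "eager" method that anticipates one extra color change and, in order to remain convergent, falls back to the simple answer after a long wait.  When nature spaces the changes far apart, the eager method is forced to retract both up and down the hierarchy, while the Ockham method retracts only when a new change actually appears:

```python
# Toy retraction count.  A "method" maps the observed prefix of balls to a conjectured
# number of color changes; a retraction is any abandonment of a previously posited answer.

def changes(observed):
    """Number of color changes in the observed prefix."""
    return sum(1 for a, b in zip(observed, observed[1:]) if a != b)

def trailing_run(observed):
    """Length of the final run of constant color."""
    last, run = observed[-1], 0
    for c in reversed(observed):
        if c != last:
            break
        run += 1
    return run

def ockham_method(observed):
    """Always posit the simplest answer compatible with the data."""
    return changes(observed)

def eager_method(observed, patience=50):
    """A non-Ockham method: anticipate one extra change, falling back to the simple
    answer only after `patience` balls with no new change (it must fall back
    eventually, on pain of failing to converge when the simple answer is true)."""
    k = changes(observed)
    return k if trailing_run(observed) > patience else k + 1

def retractions(method, sequence):
    """Count how many times the method changes its answer along the sequence."""
    answers = [method(sequence[:t + 1]) for t in range(len(sequence))]
    return sum(1 for a, b in zip(answers, answers[1:]) if a != b)

def sequence_with_changes(change_times, length):
    """Nature's data stream: green/blue balls whose color flips at the given times."""
    colors, color = [], "green"
    for t in range(length):
        if t in change_times:
            color = "blue" if color == "green" else "green"
        colors.append(color)
    return colors

# Nature waits a long time between changes, so the eager method gets dragged down
# the hierarchy (abandoning its anticipated extra change) as well as up it.
seq = sequence_with_changes({100, 200, 300}, 400)
print("Ockham retractions:", retractions(ockham_method, seq))  # 3: one per true change
print("Eager  retractions:", retractions(eager_method, seq))   # 7: forced up and down
```

The patience parameter is just a device to make the eager method convergent in this toy setting; any other way of making it convergent would expose it to the same extra retractions.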

This justification has two important features.  First, it says only that Ockham methods are better than non-Ockham methods with respect to worst-case retraction loss.  One can't say anything about expected-case retraction loss without imposing a probability distribution on the problem; but if one is in a position to impose a probability distribution, then one can simply perform Bayesian inference on the resulting posterior without worrying about Ockham's razor.  And best-case retraction loss doesn't favor Ockham's razor, because a method that starts with a complex answer could get lucky: nature could refute all of the simpler theories in succession and then decline to refute the complex one for eternity.

Second, the justification provides no assurance at all that a simpler answer is more likely to be correct than a more complex one.  It speaks only about the reliability of methods, not about the intrinsic belief-worthiness of particular hypotheses given particular data.  It is a methodist justification only, not an evidentialist one.

Despite the weakness of this justification, I consider it a success story for methodism.  Kelly et al.'s justification of Ockham's razor may not give us everything we want, but it does seem to give us everything we need.  The point is not to get us excited about Ockham's razor; it is, rather, to investigate what, if anything, can be said in favor of a rule we are already naturally inclined to follow.  And it does give us something worth taking into consideration, at least when prior probabilities are unavailable.

It is interesting that frequentist null hypothesis significance tests can be thought of as instances of Ockham's razor if we loosen the notion of refutation a bit, so that an answer counts as refuted when it falls outside a confidence interval at a specified confidence level.  In this sense, you can never refute a complex alternative hypothesis if a point null hypothesis is true, but you can eventually refute the null if the alternative is true; thus, the null is simpler than the alternative.  The orthodox frequentist practice of acting as if the null were true until it is refuted therefore conforms to Ockham's razor.
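
For concreteness, here is a minimal sketch of that asymmetry, assuming a large-sample one-sample z-test of the point null H0: mu = 0 against the composite alternative H1: mu ≠ 0 (the helper z_test_pvalue is mine, written only for this illustration, not serious statistical practice):

```python
# Rejection at level alpha plays the role of "loosened refutation": under a true
# alternative the point null eventually gets rejected as data accumulate, while under
# a true null the composite alternative is never rejected, no matter how much data we see.
import math
import random

def z_test_pvalue(sample):
    """Two-sided p-value for H0: mu = 0, based on a large-sample z statistic."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = mean / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05

# If the "complex" alternative is true (mu = 0.5), enough data refute the point null
# in the loosened sense: the p-value falls below alpha.
sample_alt = [random.gauss(0.5, 1.0) for _ in range(200)]
print("mu = 0.5, reject null:", z_test_pvalue(sample_alt) < alpha)   # True

# If the point null is true (mu = 0), the test typically fails to reject, but no amount
# of non-rejection ever adds up to refuting the composite alternative.
sample_null = [random.gauss(0.0, 1.0) for _ in range(200)]
print("mu = 0.0, reject null:", z_test_pvalue(sample_null) < alpha)  # usually False (false alarms at rate alpha)
```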
