Science and Statistics: Error Statistics

Deborah Mayo, the best-known defender of frequentist statistical methods among philosophers, has developed a theory of evidence based on frequentist methods that she calls error statistics. I find many of the things Mayo says about the use of statistics in science appealing. For instance, I am in sympathy with the claim, which she often stresses, that science is not well understood in terms of assigning probabilities to hypotheses, especially to high-level theories. It is typically better understood in terms of models that represent the world more or less accurately in various respects, and of attempts to probe those models for errors and to alter or replace them to eliminate those errors. On the other hand, many features of her theory of evidence strike me as obscure and ill-motivated.*

The central claim of Mayo's error statistical theory of evidence is the severity requirement:

Data x in test T provide good evidence for inferring H (just) to the extent that hypothesis H has passed a severe test T with x.

Mayo does not say whether the notion of evidence she has in mind in this principle is absolute or incremental. That is, she doesn’t say whether the fact that a hypothesis has passed a severe test makes it (1) belief-worthy, or (2) more belief-worthy (than it was before the test, or than it would have been if the test hadn’t been performed, or than it would have been if it hadn’t passed the test, or something along those lines). This ambiguity creates many difficulties in Mayo’s discussion and defense of her account.

Obviously, the severity requirement requires a characterization of the notion of a severe test. Mayo offers the following:

Hypothesis H passes a severe test T with x if (and only if):
(i) x agrees with or “fits” H (for a suitable notion of fit), and
(ii) test T would (with very high probability) have produced a result that fits H less well than x does, if H were false or incorrect.

Mayo is deliberately vague about (i), allowing for the notion of “fit” to be cashed out in a variety of ways. (ii) is the key to her theory: whatever suitable notion of fit is used, in order for the result of a test to count in support of a theory, it has to be the case that a result that fits at least as well with the hypothesis as that result would be highly unlikely if the hypothesis were false. She calls (ii) a requirement of “high severity.”

It’s easy to give examples in which high severity seems like the right thing to require in order to regard a test outcome as evidence for a hypothesis. For instance, Mayo gives the example of a high school student, Isaac, taking a test of college readiness. If Isaac passed a test that only contained 6th-grade-level material, his passing that test would not be good evidence that he is college ready; his passing fits the hypothesis that he is college ready, but it is not the case that he would have been unlikely to pass if he were not college-ready. To get good evidence of his college-readiness, you would have to give him a more severe test, one that contained, say, 12th-grade-level material.
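To make condition (ii) concrete for the Isaac example, here is a minimal sketch in Python. Everything numerical in it is my own assumption for illustration, not something Mayo provides: a 20-question test, an observed score of 18, and a single "not college-ready" alternative under which Isaac answers each question correctly with a fixed probability (0.90 on the easy material, 0.50 on the hard material). Treating "H is false" as one simple alternative is itself a simplification, and the function name severity is hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def severity(observed_score, n_questions, p_correct_if_not_ready):
    """Probability of a score that fits H less well than the observed one,
    computed under the assumed 'not college-ready' alternative."""
    return binom_cdf(observed_score - 1, n_questions, p_correct_if_not_ready)


# Assumed per-question success rates for a student who is NOT college-ready.
easy_test = severity(18, 20, 0.90)  # 6th-grade material
hard_test = severity(18, 20, 0.50)  # 12th-grade material

print(f"easy test severity: {easy_test:.4f}")  # about 0.32
print(f"hard test severity: {hard_test:.4f}")  # about 0.9998
```

Under these assumed numbers, the same score of 18 out of 20 passes the hypothesis with low severity on the easy test and with very high severity on the hard one, which matches the intuition behind the example.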

Mayo’s notion of a severe test encapsulates a core idea that is present in all plausible theories of evidence: for something to count as evidence in favor of a hypothesis, it has to be compatible with that hypothesis, and it has to speak against alternative hypotheses. What is at issue is whether Mayo’s way of spelling out this requirement is adequate.

Several philosophers have argued that following Mayo’s theory would lead one to endorse instances of the base-rate fallacy. They typically proceed by giving an example of a case in which, they claim, Mayo’s account would lead one to endorse the claim that given data are evidence for a given hypothesis, even though the hypothesis has a low posterior probability conditional on the evidence. Such examples are easy to cook up—they only require a sufficiently low prior probability. Mayo and other advocates of error statistics give at least two kinds of responses to these arguments. The first kind of response is to argue that the assignment of the posterior probability to the hypothesis is illegitimate because it violates well-motivated frequentist scruples about testing and the use of probabilities. The second kind of response is to admit that the error-statistical account can conflict with the assignment of posterior probabilities and to argue that this discrepancy is a sign of inadequacy in accounts of evidence that make posterior probabilities central, rather than an inadequacy in error statistics. As Mayo puts it, what matters for evidence is not whether a hypothesis is highly probable, but whether it has been highly probed.
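The structure of these base-rate examples can be sketched with a short Bayes' theorem calculation in Python. The numbers are invented for illustration and are not taken from Mayo or her critics: a prior of 0.001 for H, a 0.95 chance of a passing result if H is true, and a 0.01 chance of a passing result if H is false. The sketch also assumes, contrary to the first frequentist response, that assigning a prior to H is legitimate in the first place.

```python
def posterior(prior, p_pass_if_true, p_pass_if_false):
    """Posterior probability of H given a passing result, by Bayes' theorem."""
    numerator = prior * p_pass_if_true
    return numerator / (numerator + (1 - prior) * p_pass_if_false)


prior_h = 0.001          # assumed low base rate for H
p_pass_if_true = 0.95    # assumed chance of passing if H is true
p_pass_if_false = 0.01   # assumed chance of passing if H is false

# With a pass/fail outcome, the probability of a worse-fitting result
# (a fail) if H is false plays the role of the severity in condition (ii).
severity_of_test = 1 - p_pass_if_false
post_h = posterior(prior_h, p_pass_if_true, p_pass_if_false)

print(f"severity:  {severity_of_test:.2f}")  # 0.99
print(f"posterior: {post_h:.3f}")            # about 0.087
```

On these assumptions the test is highly severe, yet the posterior probability of H stays below 0.1. That gap between "highly probed" and "highly probable" is exactly what the critics point to and what the two responses try to address.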

Neither of these responses has much intuitive appeal for me. The first response appeals to scruples that generally strike me as overly strict. I understand not wanting to make relatively unconstrained personal probabilities central to scientific practice, but I don’t understand being totally opposed to any use of epistemic probabilities regardless of how strong the evidential basis for those epistemic probabilities may be. The second response seems radically implausible. If I understand her correctly, Mayo is saying that a hypothesis may be simultaneously highly belief-worthy and highly improbable. That strikes me as a strange thing to say.

I am not sure, however, that I fully understand either of these responses, and I’m not sure that the philosophers who have written articles criticizing the error-statistical account have taken the time to try to understand them either. Perhaps those critics have judged that it simply isn’t worth their time to examine the frequentist responses in detail, since they seem on a cursory examination to be rather confused. If they are right, then examining these responses might be something of a waste of my time. If nothing else, however, I think that a clear, cogent analysis of frequentist responses to the base-rate objection from someone who isn’t firmly in either the frequentist or the Bayesian camp would be a useful contribution to the literature on this topic.

The first response I sketched has been advanced by Aris Spanos in the latest edition of Philosophy of Science. I found this article too difficult to understand by skimming. I am planning to read it closely and to analyze and evaluate Spanos’s argument in a future post. The second has been advanced by Mayo in the article I am currently examining* (and in many other places, I believe). I believe that some advocates of error statistics favor a third kind of response that would bring error-statistical judgments in line with Bayesian posterior probabilities in contexts where Bayesian priors have good frequentist credentials, by somehow incorporating the information that the Bayesian priors provide into their severity assessments. I haven't seen this response in print, however, and I'm not sure exactly how it's supposed to go.

*My remarks in this post are based primarily on Mayo’s article “Evidence as Passing Severe Tests: Highly Probable versus Highly Probed Hypotheses” from the volume Scientific Evidence: Philosophical Theories & Applications, edited by Peter Achinstein.

Wednesday, January 26, 2011
