In my previous post, I presented Mayo’s reconstruction of Howson’s argument against error statistics:
- An abnormal result e is taken as failing to reject H (i.e., as “accepting” H) while rejecting J, the hypothesis that no breast disease exists.
- H passes a severe test and thus H is indicated according to (*).
- But the disease is so rare in the population (from which the patient was randomly sampled) that the posterior probability of H given e is still very low (and that of J is still very high).
- Therefore, “intuitively,” H is not indicated but rather J is.
- Therefore (*) is unsound.
In this post, I will examine the reasons Mayo gives for rejecting premise 2.
Again, in the paper I am presently considering,* Mayo expresses her severity requirement as follows:
- (*): e is a good indication of H to the extent that H has passed a severe test with e.
where a test is “severe” with respect to H if and only if that test has a very low probability of passing H if H is false.
Howson gives an example of a medical test in which the hypothesis that a given patient has the disease in question (which in Mayo’s version of the example is breast cancer) appears to pass a severe test with a positive result, yet the posterior probability of the hypothesis that the patient has the disease conditional on the positive result is low. He takes this case to be a counterexample which shows that Mayo’s severity requirement is unsound.
Mayo responds in part by denying that the severity requirement has been met in this case. That is, she rejects premise 2 in her reconstruction of Howson’s argument. What reasons does she give for doing so?
First, after protesting a bit about the idealized nature of Howson’s example (which seems to me irrelevant to the point at issue), Mayo says that she will try to apply her severity requirement to it. She does so as follows (p. S208):
a. An abnormal result e is a poor indication of the presence of disease more extensive than d if such an abnormal result is probable even with the presence of disease no more extensive than d.
b. An abnormal result e is a good indication of the presence of disease as extensive as d if it is very improbable that such an abnormal result would have occurred if a lesser extent of disease were present.
Mayo has in mind a more realistic case than Howson’s, in which a disease can be present to varying extents. However, if her account aspires to provide a general theory of evidence, then it should apply to binary cases as well. Thus, in the context of this debate it seems unfair of Mayo to change the example. Sticking with Howson’s actual example, Mayo’s (a) and (b) reduce to the claim that a positive test result indicates the presence of disease to the extent that a positive result is improbable if the disease is absent.
At times, Mayo seems to be claiming something weaker for her severity requirement than that it provides a general theory of evidence—for instance, that it provides a reasonable guide for inductive inference when we do not have a strong evidential basis for assigning prior probabilities. Moreover, she claims that this kind of situation is very common in science, which makes understanding her severity requirement and the error-statistical techniques that conform to it quite important for understanding scientific practice. It seems to me that Mayo is on firm ground here. Moreover, I suspect that there is a lot of room here for reconciling her approach with Bayesianism by showing, for example, that frequentist techniques provide reasonably good approximations to Bayesian methods when priors are not known with precision but are known to be not too extreme.
However, Mayo goes further by presenting her account as a rival to Bayesianism, and by arguing not only that Bayesian techniques are hard to apply in many cases, but also that they are vitiated by their frequent dependence on epistemic probabilities. As Clark Glymour points out in his paper “Instrumental Probability,” the claim that Bayesianism and the use of epistemic probabilities are “too subjective” is often motivated by confusing justification with content. An epistemic probability is “subjective” in the sense that it is a property of an individual’s (idealized) belief state (content), but it may nevertheless have a strong “objective” evidential basis (justification). When the “objective” justification for an epistemic probability is strong, I see no reason to object to it on the grounds that its content is subjective.
Returning to Howson’s example, it certainly appears that a positive result satisfies Mayo’s severity requirement, given that a positive result is quite improbable if the disease is absent (P(+ | ~H) = .05 in Howson’s example, though that number can be made as small as one likes so long as the incidence rate/prior probability is adjusted downward to compensate). But Mayo denies that a positive result satisfies the severity requirement.
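Howson’s point is easy to verify with Bayes’ theorem. In the sketch below, the false positive rate of .05 comes from the example; the prevalence (.001) and sensitivity (.95) are illustrative numbers I am assuming only to make the disease suitably rare:

```python
# Bayes' theorem for a Howson-style screening case.
# The false positive rate (.05) is from the example; the prevalence
# (.001) and sensitivity (.95) are illustrative assumptions.
def posterior(prior_h, p_pos_given_h, p_pos_given_not_h):
    """P(H | +) by Bayes' theorem."""
    numerator = prior_h * p_pos_given_h
    denominator = numerator + (1 - prior_h) * p_pos_given_not_h
    return numerator / denominator

p = posterior(prior_h=0.001, p_pos_given_h=0.95, p_pos_given_not_h=0.05)
# p is about 0.019: the posterior stays low despite the severe-looking test
```

So even though a positive result is quite improbable when the disease is absent, the posterior probability of disease given a positive result remains below 2% under these assumed numbers.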
In support of this claim, Mayo points out that, unlike the Neyman-Pearson framework for statistical tests, error statistics does not use automatic accept/reject rules; for instance, within the error-statistical framework one generally would not infer that a point null hypothesis is true from the fact that one fails to reject that null hypothesis at a pre-specified alpha level. The reason for this restraint is clear within the error-statistical framework: significance tests are unlikely to reject the null if the true value of the parameter of interest is close to the null value relative to the power of the test. As a result, one cannot infer with severity from a failure to reject a point null hypothesis that that null hypothesis is true; one can at most infer with severity that the true value of the parameter is close to the null. (For instance, one might estimate a 100(1-alpha)% confidence interval for the parameter value, which would contain the null value.)
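To make the point about point nulls concrete, here is a toy two-sided z-test; every number in it (the known standard deviation, the sample size, the observed mean, and the 1.96 cutoff for alpha = .05) is an assumption chosen for illustration:

```python
import math

# Toy z-test of the point null H0: mu = 0. Failing to reject does not
# license inferring mu = 0 exactly. All numbers are illustrative.
sigma, n, xbar, z_crit = 1.0, 25, 0.1, 1.96  # known sd, sample size, sample mean, 95% cutoff

z = xbar / (sigma / math.sqrt(n))   # test statistic: 0.5, well below 1.96
reject = abs(z) > z_crit            # False: we fail to reject H0

half_width = z_crit * sigma / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)  # 95% CI: roughly (-0.29, 0.49)
```

The interval contains the null value 0, but it also contains values as large as roughly 0.49; this is why severity licenses only the conclusion that the parameter is close to the null, not that it equals the null exactly.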
This move of Mayo’s seems to me a significant improvement on the Neyman-Pearson framework. However, it does not help in the case at hand, in which we are considering not a point null hypothesis about a continuous variable but a hypothesis about a binary variable. On the other hand, understanding this aspect of Mayo’s account does help in understanding her next move: Mayo claims that a failure to reject the hypothesis that the patient has breast cancer given a positive test result does not indicate that the patient has breast cancer so long as there are alternatives to this hypothesis that would very often produce the positive result.
Notice the analogy with a test of a hypothesized value for a parameter, which is presumably motivating Mayo’s claim here: one generally cannot conclude with severity that a point null hypothesis is true, because slight deviations from that point null are effectively indistinguishable from the null in a hypothesis test. In the same way, one cannot conclude with severity that a patient has breast cancer if there are other possible situations that would make a positive test outcome likely.
This requirement is surely too strong. Suppose that there is a non-diseased condition that mimics whatever sign or symptom of breast cancer the test picks up, generating false positives, but that this condition is extremely rare (as rare as one likes). Then there would be an alternative to the hypothesis that the patient in question has breast cancer that would very often produce a positive result, but that one need not rule out in concluding that the patient has breast cancer. Obviously one needs to rule out all possible alternatives to make a deductive inference, but one does not need to rule out incredibly rare/improbable alternatives to make a solid inductive inference.
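A toy calculation makes the point concrete (all of the numbers below are my own illustrative assumptions, not Howson’s or Mayo’s): even though the mimicking condition would produce a positive result 90% of the time, its tiny prior probability barely dents the posterior for the cancer hypothesis.

```python
# A rare condition M mimics the disease and often yields a positive
# result. All priors and likelihoods here are illustrative assumptions.
p_h, p_m = 0.10, 1e-6          # prior for cancer; prior for the rare mimic
p_rest = 1 - p_h - p_m         # everyone else (no disease, no mimic)
lik_h, lik_m, lik_rest = 0.95, 0.90, 0.01  # P(+ | each hypothesis)

numerator = p_h * lik_h
denominator = numerator + p_m * lik_m + p_rest * lik_rest
post_h = numerator / denominator
# post_h is above 0.9: the mimicking alternative would very often
# produce a positive result, yet it is too improbable to matter
```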
There is a sound motivation behind Mayo’s claim that failure to reject H with a particular result does not indicate that H is true as long as there are alternatives to H that would make that result probable: in order to be telling in favor of a hypothesis, evidence must not only agree with that hypothesis but also speak against alternative hypotheses. However, the requirement goes too far. It seems that the sensible approach is to bring in prior probabilities and to require that any alternative hypotheses that would make the test outcome probable be themselves sufficiently improbable that the probability that any one of them is true is very small. Bayesianism implements this approach in a precise and well-motivated way, but in situations in which Bayes’ theorem is difficult to apply one could combine informal considerations of prior probability with the severity requirement to approximate Bayesian reasoning fairly well.
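A minimal sketch of this combined approach, assuming hypothetical cutoffs of my own choosing (a likelihood cutoff of .5 for “would make the outcome probable” and a total prior mass cutoff of .01 for “sufficiently improbable”; the function name and all example numbers are likewise my own):

```python
# Sketch: e indicates H only if the alternatives that would make e
# probable (likelihood >= lik_cut) have small total prior (<= prior_cut).
# The cutoffs and all example numbers are illustrative assumptions.
def indicates(prior_h, lik_h, alternatives, lik_cut=0.5, prior_cut=0.01):
    """alternatives: (prior, P(e | alternative)) pairs covering the ways
    H could be false. Returns (posterior of H, whether e indicates H)."""
    numerator = prior_h * lik_h
    denominator = numerator + sum(p * l for p, l in alternatives)
    risky_mass = sum(p for p, l in alternatives if l >= lik_cut)
    return numerator / denominator, risky_mass <= prior_cut

# A rare high-likelihood alternative does not block the inference...
post1, ok1 = indicates(0.10, 0.95, [(1e-6, 0.90), (0.899999, 0.01)])
# ...but a sufficiently probable high-likelihood alternative does.
post2, ok2 = indicates(0.10, 0.95, [(0.30, 0.90), (0.60, 0.01)])
```

The second call shows the rule agreeing with the Bayesian verdict: when the mimicking alternative is probable rather than rare, the posterior for H drops and the indication is withheld.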
Getting back to the main point, does Mayo have a good argument against premise 2? I think not. In a realistic case, there could be many ways in which the hypothesis that a given person has a given disease could be false, some of which might make it probable that the person would test positive for the disease despite not having it. Mayo would require ruling out such possibilities before declaring that the claim that the person has the disease is well supported. As long as we're considering her account as a theory of evidence, however, we need not be constrained by what would happen in most realistic cases. We can simply stipulate a hypothetical case in which the probability that someone who does not have the disease gets a positive result is .05, and in which there are no further facts that would allow us to partition the set of people who do not have the disease into some who would get a positive result with high probability and some who would not. There are certainly many realistic cases in which we do not know any such facts, even if they exist, so this case is not so far removed from practice as to be wholly uninteresting. In such a case, I do not see how Mayo's objections have any force against premise 2.
*The article I am considering is Deborah Mayo’s “Error Statistics and Learning from Error: Making a Virtue of Necessity,” Philosophy of Science Vol. 64, Supplement: Proceedings of the 1996 Biennial Meeting of the Philosophy of Science Association, Part II: Symposia Papers (Dec., 1997), pp. S195-S212. It is a response to Colin Howson’s “Error Statistics in Error,” pp. S185-S194 in the same issue.