- An abnormal result e is taken as failing to reject H (i.e., as “accepting H”); while rejecting J, that no breast disease exists
- ~J passes a severe test and thus ~J is indicated according to (*). (Modified)
- But the disease is so rare in the population (from which the patient was randomly sampled) that the posterior probability of ~J given e is still very low (and that of ~H is still very high). (Modified)
- Therefore, “intuitively,” ~J is not indicated but rather ~H is. (Modified)
- Therefore, (*) is unsound.
A blog about my research at the intersection of the philosophy of science and statistics.
Saturday, January 29, 2011
Mayo’s Reasons for Rejecting Premise 4 of Howson’s Argument
Mayo’s Reasons for Rejecting Premise 2 of Howson’s Argument
- An abnormal result e is taken as failing to reject H (i.e., as “accepting H”); while rejecting J, that no breast disease exists.
- H passes a severe test and thus H is indicated according to (*).
- But the disease is so rare in the population (from which the patient was randomly sampled) that the posterior probability of H given e is still very low (and that of J is still very high).
- Therefore, “intuitively,” H is not indicated but rather J is.
- Therefore (*) is unsound.
- (*): e is a good indication of H to the extent that H has passed a severe test with e.
Mayo's Reconstruction of Howson's Argument
- e: An “abnormal result,” in this case a positive test result (i.e., one that provides at least incremental evidence of breast cancer).
- H: The hypothesis that the patient has breast cancer.
- J: The hypothesis that breast disease is absent.
- (*): The claim that e is a good indication of H to the extent that H has passed a severe test with e. (Mayo’s “severity requirement”)
UPDATE: I misunderstood Mayo here. J indicates the absence of breast disease of any kind, not just breast cancer, so it is not equivalent to ~H. See subsequent posts.
- An abnormal result e is taken as failing to reject H (i.e., as “accepting H”); while rejecting J, that no breast cancer exists.
- H passes a severe test and thus H is indicated according to (*).
- But the disease is so rare in the population (from which the patient was randomly sampled) that the posterior probability of H given e is still very low (and that of J is still very high).
- Therefore, “intuitively,” H is not indicated but rather J is.
- Therefore (*) is unsound.
The Premises of Howson's Argument
- In the case of a positive test result, the severity requirement yields the conclusion that there is good evidence that the patient has the disease.
- In that same case, Bayes’ theorem yields the conclusion that the probability that the patient has the disease is low.
- The severity requirement and Bayes’ theorem are inference rules.
- The claim that there is good evidence that the patient has the disease is incompatible with the claim that the probability that the patient has the disease is low.
- If two inference rules applied to the same case yield incompatible conclusions, and one of those inference rules is sound in its application to that case, then the other inference rule is unsound.
- Bayes’ theorem is sound in its application to the case of a positive test result.
- Therefore, the severity requirement is unsound.
The argument is deductive, so evaluating it requires only examining the premises.
Let’s start with (1). Here, again, is Mayo’s severity requirement:
- Data x in test T provide good evidence for inferring H (just) to the extent that hypothesis H has passed a severe test T with x.
And here is her characterization of “severe test:”
- Hypothesis H passes a severe test T with x if (and only if):
- (i) x agrees with or “fits” H (for a suitable notion of fit), and
- (ii) test T would (with very high probability) have produced a result that fits H less well than x does, if H were false or incorrect.
Are these conditions satisfied in the case of a positive test result? Here again is the example. It involves a diagnostic test for a given disease. The incidence of the disease in the relevant population is 1 in 1000. The test yields a dichotomous outcome: positive (+) or negative (-). It gives 0% false negatives and 5% false positives, and we are able to estimate these numbers to a high degree of precision from available data.
A particular patient (chosen at random from the relevant population) gets a positive test result. According to Howson, the hypothesis that that patient has the disease passes a severe test with that result. The positive result does seem to satisfy (i): it agrees with or “fits” the hypothesis H that the patient has the disease. Mayo is deliberately vague about the notion of fit, but it’s hard to see how she could deny that the positive test result fits the hypothesis that the patient has the disease without invoking the prior probabilities that her account is deliberately designed to avoid. After all, P_H(+), the probability of a positive result in a hypothetical situation in which H is true, is 1.
The positive result also seems to satisfy (ii): the test would have produced a negative result, which fits H less well than the positive result does, with probability .95, if H were false. (If .95 isn’t high enough, let it be higher. Just lower the prior probability appropriately and the counterexample will still work.)
Given that the positive test result satisfies (i) and (ii), Mayo’s account says that it provides good evidence for inferring H.
One way in which Mayo could defend herself would be to claim that she only means “good evidence” in an incremental sense. The fact that a hypothesis passes a severe test does not mean that the hypothesis is belief-worthy; it only means that the hypothesis is more belief-worthy than it was without the test. That claim would be consistent with the Bayesian result that P(H|E)=.02, which is twenty times P(H)=.001. However, Mayo does not take that approach.
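The arithmetic behind these figures is easy to check. Here is a minimal sketch of the Bayes calculation for Howson’s example (the function and variable names are mine, not anyone’s published code):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(H | positive result) by Bayes' theorem."""
    # Total probability of a positive result, over both hypotheses.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Howson's example: incidence 1/1000, 0% false negatives, 5% false positives.
p = posterior(prior=0.001, sensitivity=1.0, false_positive_rate=0.05)
# p is roughly 0.02: the severity conditions hold, yet the posterior stays low.
```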
I believe that Spanos would deny premise (1) of Howson’s argument, but I have not worked through his argument carefully enough to understand what he is saying. For the moment, Howson appears to be on strong ground with (1).
(2)-(5) all seem to me innocuous. I’m sure one could quibble about the phrase “inference rule,” but I don’t think that doing so would get one anywhere. To deny (4) seems absurd, but I believe that Mayo does so in her paper “Evidence as Passing Severe Tests: Highly Probable versus Highly Probed Hypotheses.” I’ll consider her reasoning in that paper in a later post.
I worded (6) carefully so as not to make Howson’s argument depend on strong claims about the validity of subjective Bayesianism. All he needs to make his case is that it is appropriate to use Bayes’ theorem in this case to calculate the probability that the patient has the disease given that he got a positive result. It seems to me that frequentist scruples have gone too far when they lead one to deny this claim. We have even supposed that the patient in question was randomly chosen, and we may suppose that we know nothing further about him except that he received a positive test result. Mayo denies (6) by claiming that one commits a “fallacy of instantiating probabilities” in moving from the claim that there is a 2% chance that a randomly chosen person with a positive test result has the disease to the claim that this person, who was randomly chosen and has a positive test result, has a probability of 2% of having the disease. After all, this person either has the disease or doesn’t. I see the response, but it seems to me ill-motivated. Why not instantiate the probability and regard it as epistemic? You would seem to do better that way than by using Mayo’s severity requirement.
I am starting to wonder whether the frequentist responses to this kind of argument are worth the trouble to examine, when the argument itself seems quite clear and quite devastating. That’s what the Bayesians who raised this objection have been thinking, it seems. There are four motivations, however, that make me want to continue with this project. First, I hope that by scrutinizing frequentist responses to the base-rate objection I will come to appreciate better the overall error-statistical framework, which I think has some real insights into scientific practice even if its theory of evidence is fatally flawed. Second, if the frequentist responses are fatally flawed, then there should be a paper in the published literature explaining how they are flawed. Third, I am open to the possibility that there is a strong response to Howson’s argument that I have overlooked or not taken sufficiently seriously. Finally, I need a philosophy comp!
Friday, January 28, 2011
Doing the Oil-Drop Experiment
Wednesday, January 26, 2011
Howson's Argument Against the Severity Requirement
- In the case of a positive test result, the severity requirement yields the conclusion that there is good evidence that the patient has the disease.
- In that same case, Bayes’ theorem yields the conclusion that the probability that the patient has the disease is low.
- The severity requirement and Bayes’ theorem are inference rules.
- The claim that there is good evidence that the patient has the disease is incompatible with the claim that the probability that the patient has the disease is low.
- If two inference rules applied to the same case yield incompatible conclusions, and one of those inference rules is sound in its application to that case, then the other inference rule is unsound.
- Bayes’ theorem is sound in its application to the case of a positive test result.
- Therefore, the severity requirement is unsound.
Error Statistics
Obviously, the severity requirement requires a characterization of the notion of a severe test. Mayo offers the following:
Hypothesis H passes a severe test T with x if (and only if):
(i) x agrees with or “fits” H (for a suitable notion of fit), and
(ii) test T would (with very high probability) have produced a result that fits H less well than x does, if H were false or incorrect.
It’s easy to give examples in which high severity seems like the right thing to require in order to regard a test outcome as evidence for a hypothesis. For instance, Mayo gives the example of a high school student, Isaac, taking a test of college readiness. If Isaac passed a test that only contained 6th grade-level material, his passing that test would not be good evidence that he is college ready; his passing fits the hypothesis that he is college ready, but it is not the case that he would have been unlikely to pass if he were not college-ready. To get good evidence of his college-readiness, you would have to give him a more severe test, one that contained, say, 12th grade-level material.
I am not sure, however, that I fully understand either of these responses, and I’m not sure that the philosophers who have written articles criticizing the error-statistical account have taken the time to try to understand them either. Perhaps those critics have judged that it simply isn’t worth their time to examine the frequentist responses in detail, since they seem on a cursory examination to be rather confused. If they are right, then examining these responses might be something of a waste of my time. If nothing else, however, I think that a clear, cogent analysis of frequentist responses to the base-rate objection from someone who isn’t firmly in either the frequentist or the Bayesian camp would be a useful contribution to the literature on this topic.
Tuesday, January 25, 2011
Truth and Value in Philosophy
Many philosophers (including me) tend to operate under the assumption that a claim has value only if it is true. This assumption is propped up by a vague worry that the only way to deny it would be to adopt an unattractive view about truth, such as nihilism, some kind of relativism, or an obscure post-modernism.
This worry is unwarranted. The problem (or at least a problem) with the assumption that a claim has value only if it is true is not that there is no such thing as truth, or that truth is not absolute, or that truth is not an important desideratum; it is that truth is too high a standard to apply across the board to all intellectual endeavors. Truth requires total precision, an absence of ambiguity, and complete fidelity to the facts. It's difficult to think of any claims outside of logic and mathematics that meet this standard.
One might admit that truth is too high a standard, but deny that it is too high an aspiration. That is: we should always aim for the truth, even if we can never reach it.
I agree that, within a particular inquiry, one should aim to be as faithful to the facts as possible, all else being equal. However--and here's the point that I want to emphasize--one should not place such a high premium on truth that one neglects interesting and important approaches to interesting and important topics on the sole grounds that those approaches to those topics will not yield claims that are true, or even very nearly true.
Consider the account of the nature of science that Kuhn presents in The Structure of Scientific Revolutions. Many critics have pointed out that many aspects of Kuhn's view are false. Take, for instance, the claim (implicitly suggested if not explicitly stated in Kuhn's account) that any sufficiently mature field of science is governed by one dominant paradigm at any given time. It would be easy to multiply examples of seemingly mature fields that contain multiple competing paradigms, and perhaps fields that don't seem to be governed by any very unified paradigms at all.
I don't claim that critics should not compare Kuhn's account of science to actual scientific practice and to point out disparities. What I do claim is that one should not be too quick to move from the claim that Kuhn's account is false in many respects to the claim that Kuhn's account is not useful or valuable for understanding science.
What Kuhn gave us is a highly original way to think about science. That framework is imperfect in many ways. If we take it too seriously, it can lead us astray. But by giving us a new way to think about science, it allows us to notice things we would not otherwise have noticed and to draw connections we might not otherwise have drawn. It may be that, even though Kuhn's account is false, we're better off with it than without it.
The same could be said about any number of ambitious projects, particularly in intellectual history. Dewey's The Quest for Certainty and Collingwood's The Idea of Nature are good examples. We philosophers tend not to read such works any more, and we (at least most of us) certainly don't attempt to write them. One reason we shun such large-scale projects, I think, is that we realize that any picture we can paint with such broad strokes will inevitably misrepresent the complex, messy reality of the phenomena we wish to characterize. What we say won't be true. Moreover, counterexample-mongers will advance their careers by pointing out that what we say isn't true.
But maybe that's ok. Maybe what we say doesn't have to be true to be illuminating. Fidelity to the facts matters, but it doesn't matter so much that any significant departure from the facts completely vitiates a project; and it doesn't matter so much that we should only pursue projects for which near-perfect fidelity to the facts is an achievable goal.
Update on the Oil-Drop Apparatus
The plan at this point is to start with the miniaturized apparatus and then to use our experiences with it as the basis for a proposal to work with Millikan's original equipment. At that point, we should be able to show pretty convincingly that we understand the experiment and that we have specific questions about it that working with the original apparatus can help us answer. The biggest question will be whether the good folks at Cal Tech will be willing to let outsiders get their hands on a major piece of physics history.
Monday, January 24, 2011
The Original Millikan Apparatus
Sunday, January 23, 2011
Overview of the Philosophy Comp Project
I would like to write my philosophy comp on a topic in the philosophy of statistics. My leading idea at the moment is to take a close look at some aspect of Deborah Mayo's error-statistical account of statistical inference. The philosophy of statistics (as well as statistics itself) is roughly divided into two camps: frequentist and Bayesian. (I'll ignore likelihoodists for now.) Among philosophers, Mayo is the de facto leader of the frequentist camp. My impression is that the division between the two camps is deep enough that not many people have taken a serious, critical look at Mayo’s work. Those in the frequentist camp already, by and large, agree with her; those in the Bayesian camp can’t be bothered; and people in the philosophy of science who aren’t firmly in either camp, like me, haven’t done much work on the topic.
Bayesian philosophers of science use Bayes’ theorem as the basis for a normative theory of scientific inference. Frequentist objections to this program generally focus on the use of prior probabilities. Frequentists claim that subjective prior probabilities are out of place in science, which they claim ought to be as objective as possible. Some frequentists go further and deny that we can even speak meaningfully about the probability (prior or posterior) of a hypothesis. Orthodox Bayesians, by contrast, interpret all probabilities as subjective degrees of belief, and claim that one cannot draw valid probabilistic inferences without taking prior probabilities into account.
One topic of ongoing debate between the two camps is the claim that frequentist hypothesis testing is subject to the base-rate fallacy. From what I've seen of the literature on this debate, it seems that Bayesians are guilty of a hit-and-run: they've raised the base-rate fallacy objection, but have not made much of an effort to understand and respond to frequentist rejoinders. Bayesians are free, of course, to focus on whatever problems they find most pressing; but someone ought to engage the frequentist position in a more serious way.
In subsequent posts, I will work through some of the central papers in this debate. Here’s my bibliography so far:
- Colin Howson (1997), “Error Probabilities in Error”
- Deborah Mayo (1997), “Error Statistics and Learning from Error: Making a Virtue of Necessity”
- Ronald Giere (1997), “Scientific Inference: Two Points of View”
- Colin Howson (2000), Hume’s Problem
- Peter Achinstein (2001), The Book of Evidence
- -- (2010), “Mill’s Sins or Mayo’s Errors?”
- Colin Howson and Peter Urbach (2005), Scientific Reasoning: The Bayesian Approach
- Deborah Mayo (2010) “Sins of the Epistemic Probabilist: Exchanges with Achinstein”
- Aris Spanos (2010), “Is Frequentist Testing Vulnerable to the Base-Rate Fallacy?”
Recreating the Oil-Drop Experiment
The Oil-Drop Experiment
Here's a diagram of Millikan’s oil-drop apparatus. The main part of the apparatus is a big, hollow metal chamber, shown below:
An atomizer (“A” in the diagram) sprays oil drops into the chamber. Originally, Millikan used the atomizer from a perfume bottle (as in the photograph, I’m guessing), but over time he got fancier and started using a pressure tank and passing the oil through glass wool before it got to the chamber, presumably to eliminate dust.
As the oil drops pass through the atomizer, they pick up net electric charges due to friction, the same way that you and I pick up a net electric charge when we shuffle across the floor wearing wool socks. The charged oil drops then fall under the force of gravity until some of them pass through a small opening so that they are between the two metal plates labeled “M” and “N” in the diagram. Those plates are connected to a series of batteries. Those batteries set up an electric field across the plates. One can tweak the direction and magnitude of this electric field so that it exerts an upward force on a given drop that exactly balances the downward gravitational force on that drop. In this way, one can “capture” a drop and hold it suspended.
In an early version of the oil-drop experiment (actually, at that point it was a water-drop experiment), Millikan would simply balance the electric and gravitational forces on a drop until the drop became suspended, and then calculate the electric charge that the drop would have to have for the two forces to balance, based on an estimate of its mass. That procedure was a good start, but it wasn’t terribly precise and it didn’t allow for direct measurements of changes in charge on a single droplet, which would provide the most convincing evidence that electric charge comes in discrete multiples of a fundamental unit.
Millikan switched from water drops to oil drops because the latter evaporate much more slowly, allowing him to make a long series of observations on a single drop during which the mass of the drop remained essentially unchanged. After he had captured a drop, he would switch off the electric field and record the time it took for the drop to fall a fixed distance under gravity alone. He would then switch on the electric field, with its magnitude adjusted so that it exerted an upward force on the drop that was stronger than the downward force of gravity. He would then record the time it took the drop to rise the same distance as before. From these data, he could calculate the charge on the drop.
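The back-of-the-envelope version of that calculation can be sketched as follows. This uses the standard textbook formulas (Stokes’ law, without the slip correction Millikan actually needed for small drops), and the constants are illustrative values of my own choosing, not Millikan’s:

```python
import math

# Illustrative constants in SI units -- my assumptions, not Millikan's values.
ETA = 1.8e-5   # viscosity of air, Pa*s
RHO = 870.0    # density of the oil, kg/m^3
G = 9.81       # gravitational acceleration, m/s^2

def drop_charge(v_fall, v_rise, e_field):
    """Charge on a drop, from its terminal fall speed (field off),
    its rise speed (field on), and the field strength in V/m."""
    # Stokes' law: at terminal velocity, weight = 6*pi*eta*r*v_fall,
    # which fixes the drop's radius.
    r = math.sqrt(9 * ETA * v_fall / (2 * RHO * G))
    mass = (4.0 / 3.0) * math.pi * r**3 * RHO
    # Rising phase: q*E = m*g + drag, and drag scales linearly with speed,
    # so q*E = m*g*(v_fall + v_rise)/v_fall.
    return mass * G * (v_fall + v_rise) / (e_field * v_fall)
```

With speeds of order 10^-4 m/s and a field of order 10^5 V/m, this yields charges of a few times 10^-19 C, i.e., a handful of electron charges, which is the right ballpark for the experiment.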
Occasionally, during a series of observations an oil drop would pick up a positive or negative ion from the air around it, changing its charge. This change would manifest itself as a change in the speed of the drop during the rising phase of its movement. Millikan used a source of ionizing radiation to make these illuminating events more common. Sometimes a drop would pick up a charge in mid-rise, so that its speed changed suddenly and discretely while Millikan was watching it. Millikan says that it is particularly striking to see a rising drop suddenly stop moving—a phenomenon that the electron theory can explain easily as being due to a drop with one unit of charge picking up a unit charge of the opposite sign. One reason to re-do the experiment is to see these changes in the speed of a drop during its rising phase. Are they really as striking as Millikan says, or is he dramatizing the experiment for rhetorical effect?
Previous experiments had suggested that the fundamental unit of charge, if there is one, has a charge in the vicinity of 3 x 10^-10 e.s.u. Thus, if there is such a fundamental unit, Millikan should find two things:
- First, all of his calculations of the total charge on a drop should yield values that are (approximately) integral multiples of a single number on the order of 10^-10 e.s.u.
- Second, the calculated changes in charge should occur in small integral multiples of the same number.
For the data Millikan reported, these conditions were satisfied beautifully. A few of his runs were anomalous, however, as I will explain in future posts. Millikan calculated a value of 4.774 +/- 0.009 x 10^-10 e.s.u. for the elementary electric charge, which is slightly smaller than a recent value of 4.80320420 +/- 0.00000019 x 10^-10 e.s.u. (The discrepancy is due primarily to the fact that Millikan used an incorrect value for the viscosity of air.)
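Millikan’s two consistency checks are easy to state as a small test. Here is a sketch; the drop charges below are synthetic numbers in units of 10^-10 e.s.u., chosen for illustration, not Millikan’s actual data:

```python
def multiples_of(charges, unit, tol=0.05):
    """True if every charge lies within tol (as a fraction of the unit)
    of an integer multiple of the candidate elementary charge."""
    return all(abs(q / unit - round(q / unit)) < tol for q in charges)

# Synthetic charges in units of 10^-10 e.s.u. -- illustration only.
drops = [4.77, 9.55, 14.33]
consistent = multiples_of(drops, 4.774)  # near 1x, 2x, 3x the candidate unit
```

Note that a check like this can only confirm consistency with a candidate unit: any divisor of the unit passes it too (which is one reason the dispute with Ehrenhaft over subelectrons could not be settled by this condition alone). Hence the importance of the second condition, that the observed changes in charge occur in small integral multiples of the same number.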
Next post: How we will re-create Millikan's experiment.
Overview of the History Comp Project
I’ve worried more about the history comp than I have about the philosophy comp, because I’m primarily a philosopher by disposition and by training. However, I think I have a pretty good start on this one.
My history comp project is a piece of experimental HPS, which involves revisiting past experiments in order to address historical, philosophical, and scientific questions. I am working with Paolo Palmieri, a professor in my department, who has been a pioneer in this field. Most of his experimental HPS work has been on Galileo’s pendulum and inclined-plane experiments. There have been debates among Galileo scholars about whether Galileo actually performed certain of the experiments he reports, or whether they were merely thought experiments. Or perhaps they were somewhere in between: maybe he rolled balls down inclined planes, but he didn’t measure the times of fall precisely enough really to demonstrate the results he claims. Historical reconstructions of Galileo’s experiments can contribute to this debate by, for instance, providing evidence that he could or could not have gotten certain kinds of results using the materials and methods that were available to him.
I am planning to take an experimental HPS approach to Robert Millikan’s oil-drop experiments. These experiments—performed primarily between 1909 and 1912—convinced nearly all physicists that electricity comes in discrete units and provided the most precise measurement of the elementary unit of charge that had ever been performed.
Millikan published his first definitive paper with results from the oil drop technique in 1913. In 1923, he won the Nobel Prize for these experiments and his work on the photoelectric effect. One reason it took ten years for him to win the Nobel was that his work was challenged by Felix Ehrenhaft, a Viennese physicist who claimed to find evidence in experiments similar to Millikan’s for “subelectrons,” units of charge smaller than Millikan’s supposedly fundamental units. Millikan argued that Ehrenhaft was generating artifacts by using bits of metal and other materials that, Millikan claimed, could not have been spherical, unlike Millikan’s oil drops. Ehrenhaft argued that his materials were in fact spherical, and that Millikan begged the question in favor of the existence of a fundamental unit of charge in the way he analyzed his data. Millikan published his last, rather cranky paper on this debate in 1925, while Ehrenhaft continued to argue his case into the 1940s. By that time the mainstream physics community had accepted Millikan’s results and rejected Ehrenhaft. Ehrenhaft did not help his cause by claiming to have created other anomalous effects, such as magnetic monopoles and magnetolysis.
Although Millikan won the day, questions remain about what Ehrenhaft was observing that made him think that he had found subelectrons and about why the physics community chose to accept Millikan’s results and reject Ehrenhaft’s. These questions became more complicated and more pressing in the 1960s, when historian Gerald Holton reviewed Millikan’s laboratory notebooks from the experiments that made it into his 1913 paper. He found that Millikan selectively excluded some of his experimental results from publication. Holton argues that Millikan was just doing what all good experimentalists do—making reasoned judgments about which results were reliable and which were unreliable. However, the results he threw out included a small number of runs that seemed fine except for the fact that they would have seemed to support Ehrenhaft’s subelectron hypothesis. Moreover, Millikan did not just fail to report that he discarded anomalous results; he explicitly says that he did not do so: “It is to be remarked, too, that this is not a selected group of drops but represents all of the drops experimented upon during 60 consecutive days” (Millikan 1913, 138). It is hard to interpret these words in a way that doesn’t make Millikan out to be a liar. David Goodstein has tried, but I’ve yet to understand his explanation.
There is a fairly large literature on the Millikan-Ehrenhaft dispute and on the issue of Millikan’s alleged fraud. I aim to contribute to this literature with insights derived from working with a scaled-down version of Millikan’s apparatus. (I also have high-quality scans of Millikan’s laboratory notebooks that I have begun to examine.) It is hard to know in advance what these insights will be, but I have a few guesses. First, I want to compare the phenomenology of the experiment to Millikan’s remarks on it. Millikan speaks as if the experiment essentially speaks for itself. “One who has seen this experiment,” he says, “has SEEN the electron.” I’ll be interested to find out whether doing the experiment feels like seeing the electron. I’ll explain in a forthcoming post what the experiment involves and why I think it might feel this way. If it does, then it would be a plausible conjecture that the experience of doing the experiment played a major role in convincing Millikan that the theory of a fundamental unit of charge was correct—perhaps to the point that he saw the data as a tool for displaying the correctness of the theory, rather than as the primary epistemic basis on which the theory should rest. I think there is some textual support for this conjecture, but it would be greatly strengthened if the experience of doing the experiment turns out to be rather striking. If the experience of doing the experiment bears out this conjecture, then it would, I think, give some grounds for being a bit more lenient regarding Millikan’s data manipulation than we might otherwise be. Why report data that seemed to speak against the electron hypothesis when he had SEEN the electron, and thus knew that it was real?
Second, it would be quite interesting if we could reproduce anomalous observations akin to those whose results Millikan discarded. For instance, Millikan says at a couple of points in the notebook that a drop “flickers” as though “unsymmetrical in shape.” (He threw out the results from those drops.) What do such drops look like? Is the flickering obvious? Do the flickering drops seem different from the oil drops in other ways? Can we make any reasonable conjectures about what the flickering drops are, e.g. specks of dust or multiple drops stuck together? It would be even more interesting, if we could do it, to run the experiment using some of the materials Ehrenhaft used. Would we get the same anomalous results? Is there any way we could use modern tools to investigate whether these materials were in fact spherical or not? Could we use computer simulations to investigate how departures from sphericality should affect the results?
In the long run, it would be fantastic to do a plausible re-creation of some of Ehrenhaft’s experiments. Trying to explain Ehrenhaft’s results is the kind of project that one would like HPS scholars to pursue—a side-trail in the history of science that’s quite interesting and unresolved, but that working scientists don’t have time for. However, the practical obstacles are daunting. Ehrenhaft’s apparatus was more complicated than Millikan’s and, as I’ll explain in the next post, we’re not even doing a full-fledged re-creation of Millikan’s. I’ll see how this relatively easy Millikan re-creation goes, and then perhaps think about a more ambitious project on Ehrenhaft.
Hey look, I’m already at over 1,000 words—1/7 of the way to my history comp! I’ll be done in a week. Or not.