Saturday, January 29, 2011

Mayo’s Reasons for Rejecting Premise 4 of Howson’s Argument

I realize now that I misunderstood Mayo’s use of J in place of ~H.  ~H says that breast cancer is absent, whereas J says that breast disease (inclusive of breast cancer) is absent.  The idea seems to be that a non-cancerous breast disease is likely to trigger a false-positive result in a test for breast cancer, and that this possibility makes it the case that a positive test result does not pass H severely.

This point does not affect the upshot of my analysis.  Howson can simply stipulate a hypothetical case in which there is no state (such as a non-cancerous breast disease) that would make a false positive likely.  That is enough to show that the severity requirement is unsound in principle.

Moreover, Mayo grants that ~J (which says that breast disease is present) does pass a severe test despite having (we can assume) a low posterior probability.  Thus, she allows that a hypothesis can meet the severity requirement despite having a low posterior, effectively granting premise 2 of Howson’s argument, and turns her attention to premise 4.

Here is Mayo’s reconstruction of Howson’s argument modified to reflect the fact that Mayo denies that H passes a severe test but allows that ~J does so:

  1. An abnormal result is taken as failing to reject H (i.e., as “accepting H”); while rejecting J, that no breast disease exists.
  2. ~J passes a severe test and thus ~J is indicated according to (*). (Modified)
  3. But the disease is so rare in the population (from which the patient was randomly sampled) that the posterior probability of ~J given e is still very low (and that of ~H is still very high). (Modified)
  4. Therefore, “intuitively,” ~J is not indicated but rather ~H is. (Modified)
  5. Therefore, (*) is unsound.
Mayo’s argument against premise 4 is interesting, but an orthodox Bayesian has an easy response.  Mayo points out that both error-statistical and Bayesian tests involve probabilistic calculations that are themselves deductive.  The error-statistical framework only becomes ampliative with the introduction of the severity requirement (*), which goes beyond those calculations to make an assertion about which claims are well supported by tests.  She demands that (*) be compared not against the deductive probabilistic calculations that Bayesians perform, but against a truly ampliative Bayesian rule.  What she is demanding, in effect, is a rule of detachment, which tells a Bayesian when to infer from a statement of the form “the probability of H is p” to the statement “H.”

A Bayesian has at least two possible responses to this maneuver.  First, it is not clear that Bayesian updating is a deductive method of inference.  It uses a rule—Bayes’ theorem—that follows from the axioms of probability, but those axioms are not dictated by classical logic; nor is the normative claim that, for all propositions H and E, the right way to update one’s degree of belief in H upon an experience whose only direct epistemic import is to raise one’s degree of belief in E to 1 is to condition on E.  Second, Mayo has not given any reason why Bayesians should adopt a rule of detachment rather than remaining strict probabilists.  The demand that orange-selling Bayesians provide an apple to compare with her apple would be unfair if part of the Bayesian position were that oranges can do everything apples can do at least as well as apples do it.  (A Bayesian can still approximate high-probability beliefs as full beliefs as a useful heuristic when doing so is not likely to lead to trouble.)

Having demanded (unfairly) that Bayesians adopt a rule of detachment, Mayo ascribes to Howson the following implicit rule:

  • There is a good indication or strong evidence for the correctness of hypothesis H just to the extent that it has a high posterior probability.

She then turns Howson’s example against him, ascribing to him this rule of detachment.  She notes that for a woman in her forties, the posterior probability of breast cancer given an abnormal mammogram is about 2.5%, which makes it very close to Howson’s hypothetical example.  Under an error-statistics approach, the hypothesis that such a woman does not have breast cancer does not pass a severe test with a positive result; nor does the hypothesis that such a woman does have breast cancer.  To provide strong evidence one way or the other, follow-up tests are needed.  Under a Bayesian approach with a rule of detachment, the fact that the posterior probability that the woman has breast cancer is small provides a good indication that breast cancer is absent, “so the follow-up that discovered these cancers would not have been warranted.”
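For concreteness, here is a minimal sketch of the Bayesian calculation behind a figure like that 2.5%.  The prevalence, sensitivity, and false-positive rate below are illustrative assumptions chosen only to land near 2.5%; they are not Mayo’s numbers.

```python
# Bayes' theorem for the mammogram case.  All three inputs are
# illustrative assumptions.
prevalence = 0.004       # P(H): assumed breast-cancer rate for women in their forties
sensitivity = 0.85       # P(+|H): assumed probability of an abnormal result given cancer
false_positive = 0.13    # P(+|~H): assumed probability of an abnormal result given no cancer

posterior = (sensitivity * prevalence) / (
    sensitivity * prevalence + false_positive * (1 - prevalence))
print(f"P(H | abnormal result) = {posterior:.3f}")   # ~0.026
```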

This argument is grossly unfair to the Bayesian position.  For an orthodox Bayesian, whether or not follow-up tests are warranted (and whether or not the initial test was warranted) for a given individual depends on that individual’s expected utilities.  Rounding a probability of 2.5% down to 0% in an expected utility calculation is likely to lead to errors when the utility of the unlikely event is extremely high or extremely low, as in this case.  This fact speaks not against Bayesianism, but against simple rules of detachment.
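A toy expected-utility calculation makes the point vivid (the utilities are invented for illustration): at a posterior of 2.5%, ordering the follow-up can easily maximize expected utility, whereas rounding the probability down to zero would rule it out.

```python
# Toy expected-utility comparison for ordering a follow-up test.
# The utilities are invented for illustration only.
p_cancer = 0.025           # posterior probability of cancer after a positive result
u_missed_cancer = -1000.0  # utility of cancer left undetected (very bad)
u_treated_cancer = -100.0  # utility of cancer caught by the follow-up (bad, but far better)
u_healthy = 0.0            # utility of being healthy
cost_followup = -1.0       # utility cost of the follow-up test itself

eu_no_followup = p_cancer * u_missed_cancer + (1 - p_cancer) * u_healthy
eu_followup = cost_followup + p_cancer * u_treated_cancer + (1 - p_cancer) * u_healthy

print(f"EU(no follow-up) = {eu_no_followup:.2f}")  # -25.00
print(f"EU(follow-up)    = {eu_followup:.2f}")     # -3.50
# Rounding p_cancer down to 0 would make skipping the follow-up look optimal.
```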

In summary, Mayo has shown that error statistics is sometimes more sensible than Bayesianism with a simple-minded rule of detachment.  But a sensible Bayesian would not use such a rule of detachment, so this conclusion has no force against Bayesianism.

Mayo’s Reasons for Rejecting Premise 2 of Howson’s Argument

In my previous post, I presented Mayo’s reconstruction of Howson’s argument against error statistics:
  1. An abnormal result is taken as failing to reject H (i.e., as “accepting H”); while rejecting J, that no breast disease exists.
  2. H passes a severe test and thus H is indicated according to (*).
  3. But the disease is so rare in the population (from which the patient was randomly sampled) that the posterior probability of H given e is still very low (and that of J is still very high).
  4. Therefore, “intuitively,” H is not indicated but rather J is.
  5. Therefore (*) is unsound.
In this post, I will examine the reasons Mayo gives for rejecting premise 2.

Again, in the paper I am presently considering,* Mayo expresses her severity requirement as follows:

  • (*): e is a good indication of H to the extent that H has passed a severe test with e.
where a test is “severe” with respect to H if and only if that test has a very low probability of passing H if H is false.

Howson gives an example of a medical test in which the hypothesis that a given patient has the disease in question (which in Mayo’s version of the example is breast cancer) appears to pass a severe test with a positive result, yet the posterior probability of the hypothesis that the patient has the disease conditional on the positive result is low.  He takes this case to be a counterexample which shows that Mayo’s severity requirement is unsound.

Mayo responds in part by denying that the severity requirement has been met in this case.  That is, she rejects premise 2 in her reconstruction of Howson’s argument.  What reasons does she give for doing so?

First, after protesting a bit about the idealized nature of Howson’s example (which seems to me irrelevant to the point at issue), Mayo says that she will try to apply her severity requirement to it.  She does so as follows (p. S208):
  a. An abnormal result e is a poor indication of the presence of disease more extensive than d if such an abnormal result is probable even with the presence of disease no more extensive than d.
  b. An abnormal result e is a good indication of the presence of disease as extensive as d if it is very improbable that such an abnormal result would have occurred if a lesser extent of disease were present.

Mayo has in mind a more realistic case than Howson’s, in which a disease can be present to varying extents.  However, if her account aspires to provide a general theory of evidence, then it should apply to binary cases as well.  Thus, in the context of this debate it seems unfair of Mayo to change the example.  Sticking with Howson’s actual example, Mayo’s (a) and (b) reduce to the claim that a positive test result indicates the presence of disease to the extent that a positive result is improbable if the disease is absent.

At times, Mayo seems to be claiming something weaker for her severity requirement than that it provides a general theory of evidence—for instance, that it provides a reasonable guide for inductive inference when we do not have a strong evidential basis for assigning prior probabilities.  Moreover, she claims that this kind of situation is very common in science, which makes understanding her severity requirement and the error-statistical techniques that conform to it quite important for understanding scientific practice.  It seems to me that Mayo is on firm ground here.  Moreover, I suspect that there is a lot of room here for reconciling her approach with Bayesianism by showing, for example, that frequentist techniques provide reasonably good approximations to Bayesian methods when priors are not known with precision but are known to be not too extreme. 

However, Mayo goes further by presenting her account as a rival to Bayesianism, and by arguing not only that Bayesian techniques are hard to apply in many cases, but also that they are vitiated by their frequent dependence on epistemic probabilities.  As Clark Glymour points out in his paper “Instrumental Probability,” the claim that Bayesianism and the use of epistemic probabilities are “too subjective” is often motivated by confusing justification with content.  An epistemic probability is “subjective” in the sense that it is a property of an individual’s (idealized) belief state (content), but it may nevertheless have a strong “objective” evidential basis (justification).  When the “objective” justification for an epistemic probability is strong, I see no reason to object to it on the grounds that its content is subjective.

Returning to Howson’s example, it certainly appears that a positive result satisfies Mayo’s severity requirement, given that a positive result is quite improbable if the disease is absent (P~H(+)=.05 in Howson’s example, but that number can be made as small as one likes as long as the incidence rate/prior probability is adjusted downward to compensate).  But Mayo denies that a positive result satisfies the severity requirement.
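To see the parenthetical point in numbers, here is a minimal sketch (the rate pairs are made up): no matter how small the false-positive rate gets, the posterior stays low as long as the prior shrinks along with it.

```python
# P(H|+) for a perfectly sensitive test (P_H(+) = 1) at several
# made-up (prior, false-positive rate) pairs.  Shrinking both
# together keeps the posterior low while the test looks ever
# more severe.
for prior, fp in [(0.001, 0.05), (0.0001, 0.005), (0.00001, 0.0005)]:
    posterior = prior / (prior + fp * (1 - prior))
    print(f"prior = {prior:.0e}, P_~H(+) = {fp:.0e}, P(H|+) = {posterior:.3f}")
```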

In support of this denial, Mayo points out that, unlike the Neyman-Pearson framework for statistical tests, error statistics does not use automatic accept/reject rules; for instance, within the error-statistical framework one generally would not infer that a point null hypothesis is true from the fact that one fails to reject that null hypothesis at a pre-specified alpha level.  The reason for this restraint is clear within the error-statistical framework: significance tests are unlikely to reject the null if the true value of the parameter of interest is close to the null value relative to the power of the test.  As a result, one cannot infer with severity from a failure to reject a point null hypothesis that that null hypothesis is true; one can at most infer with severity that the true value of the parameter is close to the null.  (For instance, one might estimate a 100(1-alpha)% confidence interval for the parameter value, which would contain the null value.)
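A small sketch of the phenomenon that motivates this restraint (the sample size and effect sizes are arbitrary): when the true parameter value is close to the null relative to the test’s power, failing to reject is nearly certain, so non-rejection cannot severely indicate that the null is exactly true.

```python
import math

def phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

z_crit = 1.96              # two-sided critical value at alpha = 0.05
se = 1.0 / math.sqrt(100)  # standard error: sigma = 1, n = 100 (arbitrary)

# Probability that a z-test of H0: mu = 0 fails to reject,
# as a function of the true mean.
for true_mu in [0.0, 0.02, 0.05, 0.3]:
    p_no_reject = phi(z_crit - true_mu / se) - phi(-z_crit - true_mu / se)
    print(f"true mu = {true_mu:<4}  P(fail to reject H0) = {p_no_reject:.3f}")
# Near the null (mu = 0.02 or 0.05) non-rejection is almost certain,
# so it cannot discriminate the point null from these close alternatives.
```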

This move of Mayo’s seems to me a significant improvement on the Neyman-Pearson framework.  However, it does not help in the case at hand, in which we are not considering a point null hypothesis about a continuous variable, but rather a hypothesis about a binary variable.  On the other hand, understanding this aspect of Mayo’s account does help in understanding Mayo’s next move: Mayo claims that a failure to reject the hypothesis that the patient has breast cancer given a positive test result does not indicate that the patient has breast cancer so long as there are alternatives to this hypothesis that would very often produce the positive result.

Notice the analogy with a test of a hypothesized value for a parameter, which is presumably motivating Mayo’s claim here: one generally cannot conclude with severity that a point null hypothesis is true, because slight deviations from that point null are effectively indistinguishable from the null in a hypothesis test.  In the same way, one cannot conclude with severity that a patient has breast cancer if there are other possible situations that would make a positive test outcome likely.

This requirement is surely too strong.  Suppose that there is a non-diseased condition that mimics whatever sign or symptom of breast cancer the test picks up, generating false positives, but that this condition is extremely rare—as rare as one likes.  Then there would be an alternative to the hypothesis that the patient in question has breast cancer that would very often produce a positive result, but that one would have no great need to rule out before concluding that the patient has breast cancer.  Obviously one needs to rule out all possible alternatives to make a deductive inference, but one does not need to rule out incredibly rare/improbable alternatives to make a solid inductive inference.
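Here is the situation in numbers (all of the rates are invented for illustration): even though the mimicking condition would very often produce a positive result, making it rare enough leaves the posterior probability of breast cancer high.

```python
# Three-way partition: cancer (H), a rare non-cancerous mimic (M),
# and neither.  All numbers are invented for illustration.
p_cancer, p_mimic = 0.01, 1e-6
p_neither = 1 - p_cancer - p_mimic
pos_given = {"cancer": 1.0, "mimic": 0.95, "neither": 0.001}

p_pos = (pos_given["cancer"] * p_cancer
         + pos_given["mimic"] * p_mimic
         + pos_given["neither"] * p_neither)
print(f"P(cancer | +) = {pos_given['cancer'] * p_cancer / p_pos:.3f}")
# ~0.910: the mimic produces positives almost whenever it is present,
# but it is far too rare to undermine the inference to cancer.
```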

There is a sound motivation behind Mayo’s claim that failure to reject H with a particular result does not indicate that H is true as long as there are alternatives to H that would make that result probable: in order to be telling in favor of a hypothesis, evidence must not only agree with that hypothesis but also speak against alternative hypotheses.  However, the requirement goes too far.  It seems that the sensible approach is to bring in prior probabilities and to require that any alternative hypotheses that would make the test outcome probable be themselves sufficiently improbable that the probability that any one of them is true is very small.  Bayesianism implements this approach in a precise and well-motivated way, but in situations in which Bayes’ theorem is difficult to apply one could combine informal considerations of prior probability with the severity requirement to approximate Bayesian reasoning fairly well.

Getting back to the main point, does Mayo have a good argument against premise 2?  I think not.  In a realistic case, there could be many ways in which the hypothesis that a given person has a given disease could be false, some of which might make it probable that the person would test positive for the disease despite not having it.  Mayo would require ruling out such possibilities before declaring that the claim that the person has the disease is well supported.  As long as we're considering her account as a theory of evidence, however, we need not be constrained by what would happen in most realistic cases.  We can simply stipulate a hypothetical case in which the probability that someone who does not have the disease gets a negative result is .95, and in which there are no further facts that would allow us to partition the set of people who do not have the disease into some who would get a positive result with high probability and some who would not.  There are certainly many realistic cases in which we do not know any such facts, even if they exist, so this case is not so far removed from practice as to be wholly uninteresting.  In such a case, I do not see how Mayo's objections have any force against premise 2.

*The article I am considering is Deborah Mayo’s 1997 “Error Statistics and Learning from Error: Making a Virtue of Necessity.”  It appeared in Philosophy of Science, Vol. 64, Supplement: Proceedings of the 1996 Biennial Meeting of the Philosophy of Science Association, Part II: Symposia Papers (Dec., 1997), pp. S195-S212.  It is a response to Colin Howson’s “Error Probabilities in Error,” pp. S185-S194 in the same issue.

Mayo's Reconstruction of Howson's Argument

Mayo’s response to Howson* has many threads.  I will focus on the passages in which she responds most directly to Howson’s challenge, beginning with Mayo’s reconstruction of Howson’s argument.  Mayo makes Howson's example more concrete by specifying that the disease being tested for is breast cancer.
  • e: An “abnormal result,” in this case a positive test result (i.e., one that provides at least incremental evidence of breast cancer).
  • H: The hypothesis that the patient has breast cancer.
  • J: The hypothesis that breast disease is absent.
  • (*): The claim that e is a good indication of H to the extent that H has passed a severe test with e.  (Mayo’s “severity requirement”)

Why does Mayo use J for the hypothesis that the disease is absent instead of ~H?  Well, she points out that “not-H” is a disjunction of many different hypotheses and claims that in order to calculate error probabilities, we need to consider specific alternatives that fall under the heading “not-H.”  It is certainly true that in realistic cases involving diseases ~H could be expressed as a disjunction of more specific alternatives.  In general, however, those alternatives could themselves be expressed as disjunctions of even more specific alternatives, ad infinitum.  Thus, the mere fact that ~H could be broken down further does not entail that ~H cannot supply error probabilities.  In a hypothetical example such as Howson’s, there is no reason why one cannot simply stipulate that error probabilities for ~H are available.  Moreover, the hypothesis J, that the disease is absent, is no more specific than the negation of H.  It just is the negation of H.  As a result, Mayo’s use of J here seems to me a needless complication.  Nevertheless, I will follow her notation to lessen the risk of misrepresenting her argument.


UPDATE: I misunderstood Mayo here.  J indicates the absence of breast disease of any kind, not just breast cancer, so it is not equivalent to ~H.  See subsequent posts.

With this notation in place, Mayo reconstructs Howson’s argument as follows: 
  1. An abnormal result e is taken as failing to reject H (i.e., as “accepting H”); while rejecting J, that no breast cancer exists.
  2. H passes a severe test and thus H is indicated according to (*).
  3. But the disease is so rare in the population (from which the patient was randomly sampled) that the posterior probability of H given e is still very low (and that of J is still very high).
  4. Therefore, “intuitively,” H is not indicated but rather J is.
  5. Therefore (*) is unsound.

This reconstruction illustrates a common practice in philosophy that I find annoying: presenting in a non-deductive format an argument that could easily, and without distortion, be presented in a deductive format.  I do not believe that all arguments should be understood deductively (Musgrave-style deductivism), or that all philosophical projects should have at their core a list of premises followed by a conclusion that follows from those premises either deductively or inductively (although I tend to be drawn toward such projects personally).  I am simply saying that when one is either advancing or criticizing an argument that can be put into a deductive format without distortion, it is a salutary practice to present the argument in such a format.  Doing so helps to clarify exactly what conditions must be met for the conclusion to go through.  But I digress.  I think Mayo's reconstruction is adequate for what follows, but I'll keep an eye on how her response fares relative to my own deductive reconstruction.

Mayo rejects premise 2, which she says is a misapplication of her severity requirement (*).  This premise corresponds well to premise 1 from my deductive reconstruction of Howson's argument.  Mayo also rejects premise 4 of her reconstruction, which “assumes the intuitions of Howson’s Bayesian rule.”  This premise corresponds most closely to premise 6 in my reconstruction, although elements of my premises 4 and 5 may be implicated as well.  

Mayo begs the question somewhat in her premise 4 by implying that Bayesianism rests on a bare appeal to intuition—Bayesians could point to a variety of sophisticated arguments that one ought to have degrees of belief that accord with the probability calculus and update those degrees of belief according to Bayes’ theorem.  Whether those arguments succeed or not is a separate question, of course, but they do deserve to be taken seriously.

In my next post, I plan to consider Mayo’s grounds for rejecting premises 2 and 4 of her reconstruction.

*The article I am considering is Deborah Mayo’s 1997 “Error Statistics and Learning from Error: Making a Virtue of Necessity.”  It appeared in Philosophy of Science, Vol. 64, Supplement: Proceedings of the 1996 Biennial Meeting of the Philosophy of Science Association, Part II: Symposia Papers (Dec., 1997), pp. S195-S212.  It is a response to Colin Howson’s “Error Probabilities in Error,” pp. S185-S194 in the same issue.

The Premises of Howson's Argument

In my last post about the philosophy comp, I summarized Howson’s argument against the severity requirement as follows:

  1. In the case of a positive test result, the severity requirement yields the conclusion that there is good evidence that the patient has the disease.
  2. In that same case, Bayes’ theorem yields the conclusion that the probability that the patient has the disease is low.
  3. The severity requirement and Bayes’ theorem are inference rules.
  4. The claim that there is good evidence that the patient has the disease is incompatible with the claim that the probability that the patient has the disease is low.
  5. If two inference rules applied to the same case yield incompatible conclusions, and one of those inference rules is sound in its application to that case, then the other inference rule is unsound.
  6. Bayes’ theorem is sound in its application to the case of a positive test result.
  7. Therefore, the severity requirement is unsound.

The argument is deductive, so evaluating it requires only examining the premises.

Let’s start with (1). Here, again, is Mayo’s severity requirement:

  • Data x in test T provide good evidence for inferring H (just) to the extent that hypothesis H has passed a severe test T with x.

And here is her characterization of “severe test:”

  • Hypothesis H passes a severe test T with x if (and only if):
    • (i) x agrees with or “fits” H (for a suitable notion of fit), and
    • (ii) test T would (with very high probability) have produced a result that fits H less well than x does, if H were false or incorrect.

Are these conditions satisfied in the case of a positive test result? Here again is the example. It involves a diagnostic test for a given disease. The incidence of the disease in the relevant population is 1 in 1000. The test yields a dichotomous outcome: positive (+) or negative (-). It gives 0% false negatives and 5% false positives, and we are able to estimate these numbers to a high degree of precision from available data.

A particular patient (chosen at random from the relevant population) gets a positive test result. According to Howson, the hypothesis that that patient has the disease passes a severe test with that result. The positive result does seem to satisfy (i): it agrees with or “fits” the hypothesis H that the patient has the disease. Mayo is deliberately vague about the notion of fit, but it’s hard to see how she could deny that the positive test result fits the hypothesis that the patient has the disease without invoking the prior probabilities that her account is deliberately designed to avoid. After all, PH(+), the probability of a positive result in a hypothetical situation in which H is true, is 1.

The positive result also seems to satisfy (ii): the test would have produced a negative result, which fits H less well than the positive result does, with probability .95, if H were false. (If .95 isn’t high enough, let it be higher. Just lower the prior probability appropriately and the counterexample will still work.)

Given that the positive test result satisfies (i) and (ii), Mayo’s account says that it provides good evidence for inferring H.

One way in which Mayo could defend herself would be to claim that she only means “good evidence” in an incremental sense. The fact that a hypothesis passes a severe test does not mean that the hypothesis is belief-worthy; it only means that the hypothesis is more belief-worthy than it was without the test. That claim would be consistent with the Bayesian result that P(H|E)=.02, which is twenty times P(H)=.001. However, Mayo does not take that approach.

I believe that Spanos would deny premise (1) of Howson’s argument, but I have not worked through his argument carefully enough to understand what he is saying. For the moment, Howson appears to be on strong ground with (1).

(2)-(5) all seem to me innocuous. I’m sure one could quibble about the phrase “inference rule,” but I don’t think that doing so would get one anywhere. To deny (4) seems absurd, but I believe that Mayo does so in her paper “Evidence as Passing Severe Tests: Highly Probable versus Highly Probed Hypotheses.” I’ll consider her reasoning in that paper in a later post.

I worded (6) carefully so as not to make Howson’s argument depend on strong claims about the validity of subjective Bayesianism. All he needs to make his case is that it is appropriate to use Bayes’ theorem in this case to calculate the probability that the patient has the disease given that he got a positive result. It seems to me that frequentist scruples have gone too far when they lead one to deny this claim. We have even supposed that the patient in question was randomly chosen, and we may suppose that we know nothing further about him except that he received a positive test result. Mayo denies (6) by claiming that one commits a “fallacy of instantiating probabilities” in moving from the claim that there is a 2% chance that a randomly chosen person with a positive test result has the disease to the claim that this person, who was randomly chosen and has a positive test result, has a probability of 2% of having the disease. After all, this person either has the disease or doesn’t. I see the response, but it seems to me ill-motivated. Why not instantiate the probability and regard it as epistemic? You would seem to do better that way than by using Mayo’s severity requirement.

I am starting to wonder whether the frequentist responses to this kind of argument are worth the trouble to examine, when the argument itself seems quite clear and quite devastating. That’s what the Bayesians who raised this objection have been thinking, it seems. There are four motivations, however, that make me want to continue with this project. First, I hope that by scrutinizing frequentist responses to the base-rate objection I will come to appreciate better the overall error-statistical framework, which I think has some real insights into scientific practice even if its theory of evidence is fatally flawed. Second, if the frequentist responses are fatally flawed, then there should be a paper in the published literature explaining how they are flawed. Third, I am open to the possibility that there is a strong response to Howson’s argument that I have overlooked or not taken sufficiently seriously. Finally, I need a philosophy comp!

Friday, January 28, 2011

Doing the Oil-Drop Experiment

Paolo and I picked up the PASCO oil-drop apparatus today, and I had a chance to give it a try.  Setting it up was pretty easy, and on my first try I was able to get some oil drops and to watch their velocities change as I changed the voltage.  I didn't get the beautiful shower of blue drops that appears in this video.  What I saw looked more like this video, except that my oil drops generally didn’t zip around as fast:

[embedded video]

Before long, I was able to figure out how to use the ionizing radiation source to change the charges on the oil drops.  I had forgotten that the drops generally don't pick up ions while the electric field is on, because the electric field causes the ions to disperse rapidly.  It was easy, though, to get the drops to pick up ions when the field was off.  Several times I picked a drop and turned on the field and watched that drop move up; then turned off the field and let the drop fall for a few seconds; then turned the field back on and found that it either had no effect on the droplet or had the opposite effect of causing the droplet to fall more rapidly.  These changes are quite clear and easy to produce.

I did take one (very sloppy) measurement just to see how it would go and to practice doing the calculations to find the charge on a drop, and got something on the right order of magnitude (2.94 x 10^-19 C, whereas a recent value for the charge on a single electron is 1.60 x 10^-19 C.  I hope I was measuring a drop with two net charges, but with the sloppiness of my procedure there’s really no telling).

One thing that has me puzzled right now is that in some of the pages of his notebook Millikan records "Red Drop" or "Blue Drop."  Here's an example:

[image of a notebook page]

Similarly, Harvey Fletcher, a graduate student who worked with Millikan on the oil-drop experiment, reported seeing "little starlets, having all colors of the rainbow."  Why did they see a variety of colors, while I saw only orangish white?  It seems especially strange to me that color is apparently a consistent characteristic of individual drops, which indicates that it isn’t due to prismatic effects and that it isn’t a simple matter of different colored lighting sources.  Evidently Millikan decided that drop color is not important for his purposes, because I don’t believe that he mentioned it in any of his papers.  It’s probably not an important issue, but it is a bit of a puzzle.

Wednesday, January 26, 2011

Howson's Argument Against the Severity Requirement

To my knowledge, Colin Howson was the first person to raise in print the objection that Deborah Mayo’s error-statistical theory of evidence is subject to the base-rate fallacy.  (He is far from the first to have raised the point with regard to frequentist hypothesis testing in general.)  Howson presented his case in a PSA symposium paper that was published in 1997.  Mayo and Ronald Giere also presented papers at this symposium.  I’ll start my attempts to make sense of this debate by examining Howson’s paper.

Howson presents what he takes to be a counterexample to Mayo’s account: a test in which he takes a particular result to satisfy Mayo’s requirements to count as good evidence for H, such that if you infer the correctness of H from that result then you will draw the wrong conclusion nearly every time that result occurs.  Howson’s example involves a diagnostic test for a given disease.  Let H be the hypothesis that the disease is present in a particular, randomly chosen test subject.  Suppose that the test yields a dichotomous outcome: positive (+) or negative (-).  In addition, suppose that the test yields 0% false negatives, and that we have adequate data to estimate that it has 0% false negatives (to within a finite margin of error that one can take to be as small as one likes).

A Bayesian would generally express the stipulation that the test yields 0% false negatives with a conditional probability: P(-|H)=0 (which entails P(+|H)=1, because + and - are mutually exclusive and exhaustive outcomes).  However, Mayo objects to conditioning on hypotheses because in the most standard axiomatizations of the probability calculus, a conditional probability is defined as the probability of the conjunction of the item in question and the item conditioned on, divided by the probability of the item conditioned on: for instance, P(+|H) =df P(+&H)/P(H).  Mayo denies that it is appropriate to speak of probabilities for anything other than physical chances, and the probability of H in this case is not a physical chance.  The physical chance that the patient in question has the disease in question is either 0 or 1: she either has it or she doesn’t.  Intermediate values for P(H) represent not a physical chance, but a degree of belief.

Howson anticipates this objection and avoids it by using an alternative notation to signify that he is not speaking of conditional probabilities, but of probabilities (which one could think of as physical chances) that would obtain in a hypothetical situation.  For instance, he uses PH(+) (rather than P(+|H)) to signify the probability of a positive result for a patient who does in fact have the disease.

Thus, Howson expresses the fact that the test yields 0% false negatives using the notation PH(-)=0 (which is equivalent to PH(+)=1).  He supposes also that the test has a false positive rate of 5%, i.e. P~H(+)=.05, and that again we have sufficient data to estimate this value with as much precision as one likes.  In addition, Howson supposes that the disease has a small incidence—say 1 in 1000—in the population from which the patient in question has been drawn.

In this case, Howson claims, Mayo’s error-statistical theory yields the conclusion that a positive test result provides good evidence that the patient in question has the disease in question.  On the other hand, a Bayesian calculation using the incidence rate of the disease as the prior probability shows that P(H|+) is quite small—less than 2%.  Howson regards Bayesianism as providing a normatively correct logic of probability judgments, so he concludes that the fact that Mayo’s theory endorses a conclusion that a Bayesian calculation assigns a posterior probability of only 2% indicates that Mayo’s theory provides an unsound inference rule.
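The calculation itself is a one-liner; here is a minimal sketch using Howson’s stipulated numbers.

```python
# Howson's example: incidence 1/1000 as the prior, no false
# negatives, 5% false positives.
prior = 0.001
p_pos_given_h = 1.0       # P_H(+): the test never misses the disease
p_pos_given_not_h = 0.05  # P_~H(+): 5% false positives

posterior = (p_pos_given_h * prior) / (
    p_pos_given_h * prior + p_pos_given_not_h * (1 - prior))
print(f"P(H|+) = {posterior:.4f}")   # ~0.0196, i.e., just under 2%
```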

In summary, Howson’s argument has the following structure:
  1. In the case of a positive test result, the severity requirement yields the conclusion that there is good evidence that the patient has the disease.
  2.  In that same case, Bayes’ theorem yields the conclusion that the probability that the patient has the disease is low.
  3. The severity requirement and Bayes’ theorem are inference rules.
  4. The claim that there is good evidence that the patient has the disease is incompatible with the claim that the probability that the patient has the disease is low.
  5. If two inference rules applied to the same case yield incompatible conclusions, and one of those inference rules is sound in its application to that case, then the other inference rule is unsound.
  6. Bayes’ theorem is sound in its application to the case of a positive test result.
  7. Therefore, the severity requirement is unsound.

(7) follows from (1)-(6), so a defender of the severity requirement as a sound inference rule faces a challenge to find a flaw in one or more of (1)-(7).  If what I wrote in a previous post is correct, then Spanos argues against (1), while Mayo argues against (4).  I’ll start working through the premises of Howson's argument in my next post.

Error Statistics

Deborah Mayo, the best-known defender of frequentist statistical methods among philosophers, has developed a theory of evidence based on frequentist methods that she calls error statistics.  I find many of the things Mayo says about the use of statistics in science appealing.  For instance, I am in sympathy with the claim, which she often stresses, that science is not well understood in terms of assigning probabilities to hypotheses, especially to high-level theories.  It is typically better understood in terms of models that represent the world more or less accurately in various respects, and of attempts to probe those models for errors and to alter or replace them in order to eliminate those errors.  On the other hand, many features of her theory of evidence strike me as obscure and ill-motivated.*

The central claim of Mayo's error statistical theory of evidence is the severity requirement:

Data x in test T provide good evidence for inferring H (just) to the extent that hypothesis H has passed a severe test T with x.

Mayo does not say whether the notion of evidence she has in mind in this principle is absolute or incremental.  That is, she doesn’t say whether the fact that a hypothesis has passed a severe test makes it (1) belief-worthy, or (2) more belief-worthy (than it was before the test, or than it would have been if the test hadn’t been performed, or than it would have been if it hadn’t passed the test, or something along those lines).  This ambiguity creates many difficulties in Mayo’s discussion and defense of her account.


Obviously, the severity requirement requires a characterization of the notion of a severe test.  Mayo offers the following:

Hypothesis H passes a severe test T with x if (and only if):
(i)  x agrees with or “fits” H (for a suitable notion of fit), and
(ii)  test T would (with very high probability) have produced a result that fits H less well than x does, if H were false or incorrect.

Mayo is deliberately vague about (i), allowing for the notion of “fit” to be cashed out in a variety of ways.  (ii) is the key to her theory: whatever suitable notion of fit is used, in order for the result of a test to count in support of a theory, it has to be the case that a result that fits at least as well with the hypothesis as that result would be highly unlikely if the hypothesis were false.  She calls (ii) a requirement of “high severity.”


It’s easy to give examples in which high severity seems like the right thing to require in order to regard a test outcome as evidence for a hypothesis.  For instance, Mayo gives the example of a high school student, Isaac, taking a test of college readiness.  If Isaac passed a test that only contained 6th grade-level material, his passing that test would not be good evidence that he is college ready; his passing fits the hypothesis that he is college ready, but it is not the case that he would have been unlikely to pass if he were not college-ready.  To get good evidence of his college-readiness, you would have to give him a more severe test, one that contained, say, 12th grade-level material.

Mayo’s notion of a severe test encapsulates a core idea that is present in all plausible theories of evidence: for something to count as evidence in favor of a hypothesis, it has to be compatible with that hypothesis, and it has to speak against alternative hypotheses.  What is at issue is whether Mayo’s way of spelling out this requirement is adequate.

Several philosophers have argued that following Mayo’s theory would lead one to endorse instances of the base-rate fallacy.  They typically proceed by giving an example of a case in which, they claim, Mayo’s account would lead one to endorse the claim that given data are evidence for a given hypothesis, even though the hypothesis has a low posterior probability conditional on the evidence.  Such examples are easy to cook up—they only require a sufficiently low prior probability.  Mayo and other advocates of error statistics give at least two kinds of responses to these arguments.  The first kind of response is to argue that the assignment of the posterior probability to the hypothesis is illegitimate because it violates well-motivated frequentist scruples about testing and the use of probabilities.  The second kind of response is to admit that the error-statistical account can conflict with the assignment of posterior probabilities and to argue that this discrepancy is a sign of inadequacy in accounts of evidence that make posterior probabilities central, rather than an inadequacy in error statistics.  As Mayo puts it, what matters for evidence is not whether a hypothesis is highly probable, but whether it has been highly probed.

Neither of these responses has much intuitive appeal for me.  The first response appeals to scruples that generally strike me as overly strict.  I understand not wanting to make relatively unconstrained personal probabilities central to scientific practice, but I don’t understand being totally opposed to any use of epistemic probabilities regardless of how strong the evidential basis for those epistemic probabilities may be.  The second response seems radically implausible.  If I understand her correctly, Mayo is saying that a hypothesis may be simultaneously highly belief-worthy and highly improbable.  That strikes me as a strange thing to say.


I am not sure, however, that I fully understand either of these responses, and I’m not sure that the philosophers who have written articles criticizing the error-statistical account have taken the time to try to understand them either.  Perhaps those critics have judged that it simply isn’t worth their time to examine the frequentist responses in detail, since they seem on a cursory examination to be rather confused.  If they are right, then examining these responses might be somewhat of a waste of my time.  If nothing else, however, I think that a clear, cogent analysis of frequentist responses to the base-rate objection from someone who isn’t firmly in either the frequentist or the Bayesian camp would be a useful contribution to the literature on this topic.

The first response I sketched has been advanced by Aris Spanos in the latest issue of Philosophy of Science.  I found this article too difficult to understand by skimming.  I am planning to read it closely and to analyze and evaluate Spanos’s argument in a future post.  The second has been advanced by Mayo in the article I am currently examining* (and in many other places, I believe).  I believe that some advocates of error statistics favor a third kind of response that would bring error-statistical judgments in line with Bayesian posterior probabilities in contexts in which Bayesian priors have good frequentist credentials, by somehow incorporating the information that those priors provide into their severity assessments.  I haven't seen this response in print, however, and I'm not sure exactly how it's supposed to go.

*My remarks in this post are based primarily on Mayo’s article “Evidence as Passing Severe Tests: Highly Probable versus Highly Probed Hypotheses” from the volume Scientific Evidence: Philosophical Theories & Applications, edited by Peter Achinstein.

Tuesday, January 25, 2011

Truth and Value in Philosophy

(This post is not comp-related; it's just an attempt to crystallize something that I was thinking about today.)

Many philosophers (including me) tend to operate under the assumption that a claim has value only if it is true.  This assumption is propped up by a vague worry that the only way to deny it would be to adopt an unattractive view about truth, such as nihilism,  some kind of relativism, or an obscure post-modernism.

This worry is unwarranted.  The problem (or at least a problem) with the assumption that a claim has value only if it is true is not that there is no such thing as truth, or that truth is not absolute, or that truth is not an important desideratum; it is that truth is too high a standard to apply across the board to all intellectual endeavors.  Truth requires total precision, an absence of ambiguity, and complete fidelity to the facts.  It's difficult to think of any claims outside of logic and mathematics that meet this standard.

One might admit that truth is too high a standard, but deny that it is too high an aspiration.  That is: we should always aim for the truth, even if we can never reach it.

I agree that, within a particular inquiry, one should aim to be as faithful to the facts as possible, all else being equal.  However--and here's the point that I want to emphasize--one should not place such a high premium on truth that one neglects interesting and important approaches to interesting and important topics on the sole grounds that those approaches to those topics will not yield claims that are true, or even very nearly true.

Consider the account of the nature of science that Kuhn presents in The Structure of Scientific Revolutions.    Many critics have pointed out that many aspects of Kuhn's view are false.  Take, for instance, the claim (implicitly suggested if not explicitly stated in Kuhn's account) that any sufficiently mature field of science is governed by one dominant paradigm at any given time.  It would be easy to multiply examples of seemingly mature fields that contain multiple competing paradigms, and perhaps fields that don't seem to be governed by any very unified paradigms at all.

I don't claim that critics should not compare Kuhn's account of science to actual scientific practice and point out disparities.  What I do claim is that one should not be too quick to move from the claim that Kuhn's account is false in many respects to the claim that Kuhn's account is not useful or valuable for understanding science.

What Kuhn gave us is a highly original way to think about science.  That framework is imperfect in many ways.  If we take it too seriously, it can lead us astray.  But by giving us a new way to think about science, it allows us to notice things we would not otherwise have noticed and to draw connections we might not otherwise have drawn.  It may be that, even though Kuhn's account is false, we're better off with it than without it.

The same could be said about any number of ambitious projects, particularly in intellectual history.  Dewey's The Quest for Certainty and Collingwood's The Idea of Nature are good examples.  We philosophers tend not to read such works any more, and we (at least most of us) certainly don't attempt to write them.  One reason we shun such large-scale projects, I think, is that we realize that any picture we can paint with such broad strokes will inevitably misrepresent the complex, messy reality of the phenomena we wish to characterize.  What we say won't be true.  Moreover, counterexample-mongers will advance their careers by pointing out that what we say isn't true.

But maybe that's ok.  Maybe what we say doesn't have to be true to be illuminating.  Fidelity to the facts matters, but it doesn't matter so much that any significant departure from the facts completely vitiates a project; and it doesn't matter so much that we should only pursue projects for which near-perfect fidelity to the facts is an achievable goal.

Update on the Oil-Drop Apparatus

On Friday, Paolo and I are meeting with Jim Stango, Pitt's physics demonstrator, to borrow the Physics and Astronomy Department's miniaturized PASCO apparatus.  We'll take it to the HPS lab and start figuring out how to use it.  On February 14 we're planning to use it in a session of the Experimental HPS seminar Paolo is running this semester.

The plan at this point is to start with the miniaturized apparatus and then to use our experiences with it as the basis for a proposal to work with Millikan's original equipment.  At that point, we should be able to show pretty convincingly that we understand the experiment and that we have specific questions about it that working with the original apparatus can help us answer.  The biggest question will be whether the good folks at Cal Tech will be willing to let outsiders get their hands on a major piece of physics history.

Monday, January 24, 2011

The Original Millikan Apparatus

I think CalTech still has Millikan's original apparatus, and they still use it for demonstrations. It would be fantastic if there were some way I could get my hands on it for this project. The idea isn't incredibly far-fetched. I know that one researcher who was working on Goethe's optics got permission to redo Goethe's experiments using his original instruments, which were being held by a museum. The fact that they still use the apparatus for demonstrations shows that they aren't protecting it like a holy relic, and it's evidently a very sturdy piece of equipment. I could probably get at least partial funding for travel and living expenses from the Salmon fund. Being able to work with the original apparatus would elevate my history comp from potentially marginally interesting to very, very cool.

But, ok, it is a little far-fetched to think that all of the relevant factors would come together. First, I need to confirm that CalTech actually has the apparatus and that it is in working order with all of the necessary auxiliary parts available. Then I would have to get permission to work with it, which is not impossible but somewhat unlikely. After that, I would have to get sufficient funding to fly out to California and live there for, I would think, about a month. Finally, I would have to weigh the costs and benefits of either being away from or uprooting my family for a good chunk of the summer.

Still, it's worth looking into. I emailed Paolo about my suspicion that CalTech has the apparatus, and he seemed intrigued. My evidence that they have it comes from two videos online. The first is a brief demonstration using an apparatus that looks just like Millikan's:

[embedded video]

The second is an old educational video that was filmed in the same lecture hall, in which the lecturer points to an identical-looking apparatus in roughly the same spot on the same table and says explicitly that it is Millikan's own apparatus (about 21:40 in):

[embedded video]

Come to think of it, it might actually be better for me if they have a replica of Millikan's apparatus, rather than the original one. The original would have a nice "gee-whiz" factor, but they might be more willing to let a grad student tinker with a replica.

I suggested that Paolo contact Jeff Cady, the demonstrator in the first video, because he is more likely to be taken seriously than I am. We'll see what happens!

Sunday, January 23, 2011

Overview of the Philosophy Comp Project

I would like to write my philosophy comp on a topic in the philosophy of statistics. My leading idea at the moment is to take a close look at some aspect of Deborah Mayo's error-statistical account of statistical inference. The philosophy of statistics (as well as statistics itself) is roughly divided into two camps: frequentist and Bayesian. (I'll ignore likelihoodists for now.) Among philosophers, Mayo is the de facto leader of the frequentist camp. My impression is that the division between the two camps is deep enough that not many people have taken a serious, critical look at Mayo’s work. Those in the frequentist camp already, by and large, agree with her; those in the Bayesian camp can’t be bothered; and people in the philosophy of science who aren’t firmly in either camp, like me, haven’t done much work on the topic.

Bayesian philosophers of science use Bayes’ theorem as the basis for a normative theory of scientific inference. Frequentist objections to this program generally focus on the use of prior probabilities. Frequentists claim that subjective prior probabilities are out of place in science, which they claim ought to be as objective as possible. Some frequentists go further and deny that we can even speak meaningfully about the probability (prior or posterior) of a hypothesis. Orthodox Bayesians, by contrast, interpret all probabilities as subjective degrees of belief, and claim that one cannot draw valid probabilistic inferences without taking prior probabilities into account.

One topic of ongoing debate between the two camps is the claim that frequentist hypothesis testing is subject to the base-rate fallacy. From what I've seen of the literature on this debate, it seems that Bayesians are guilty of a hit-and-run: they've raised the base-rate fallacy objection, but have not made much of an effort to understand and respond to frequentist rejoinders. Bayesians are free, of course, to focus on whatever problems they find most pressing; but someone ought to engage the frequentist position in a more serious way.

In subsequent posts, I will work through some of the central papers in this debate. Here’s my bibliography so far:

  • Colin Howson (1997), “Error Probabilities in Error”
  • Deborah Mayo (1997), “Error Statistics and Learning from Error: Making a Virtue of Necessity”
  • Ronald Giere (1997), “Scientific Inference: Two Points of View”
  • Colin Howson (2000), Hume’s Problem
  • Peter Achinstein (2001), The Book of Evidence
  • -- (2010), “Mill’s Sins or Mayo’s Errors?”
  • Colin Howson and Peter Urbach (2005), Scientific Reasoning: The Bayesian Approach
  • Deborah Mayo (2010), “Sins of the Epistemic Probabilist: Exchanges with Achinstein”
  • Aris Spanos (2010), “Is Frequentist Testing Vulnerable to the Base-Rate Fallacy?”

Recreating the Oil-Drop Experiment

Creating a full-scale reproduction of Millikan's apparatus is not feasible on our budget. For now, at least, we will be using a commercially available PASCO apparatus intended for instructional use in undergraduate physics labs. We will be borrowing the PASCO apparatus from the Physics & Astronomy department, which apparently used to use it for undergraduate labs but no longer does.

The fact that we will be using a commercially available device raises at least two questions. First, why bother to recreate the experiment when thousands of undergrads have already done so? Second, won't differences between the PASCO device and Millikan's apparatus limit what we can learn from the reproduction?

The answer to the first question is that we will be using the PASCO device to address questions that it is not used to address in undergraduate physics labs. We are not revisiting Millikan's experiments because it would be fun to try to replicate his results; we are trying to gain insight into historical questions, such as questions about why Millikan handled his data the way that he did, by learning from firsthand experience what Millikan saw when he looked into the oil-drop chamber.

The answer to the second question is a bit more complicated. The PASCO apparatus is essentially a miniaturized version of Millikan's apparatus. All of the physics is there--including a source of ionizing radiation--but some things will be different. For instance, one of the advantages of the larger size of Millikan's apparatus is that the metal plates across which a charge is placed more closely approximate the idealized infinite parallel-plate capacitor for a longer range of fall distances. As a result, Millikan's electric field was likely more uniform than ours will be. Another issue will be that the phenomena won't look quite the same in our experiment as they did in Millikan's. The inside walls of the oil-drop chamber are made of a different material, the light source that illuminates the drops will be different, the viewing telescope will be different, and so on. In short, we will have to be cautious about extrapolating from what we see to what Millikan would have seen. Ideally, we would like to find textual support in Millikan's writings for any claims we make about what he would have seen based on what we see.

There are special video cameras that would allow us to record the experiment. We are hoping to take videos and post them online, as Paolo has done for his Galileo and Coulomb experiments (here). However, we do not yet know whether Physics & Astronomy has the right kind of camera and, if so, whether they would be willing to lend it to us.

The Oil-Drop Experiment

Here's a diagram of Millikan’s oil-drop apparatus. The main part of the apparatus is a big, hollow metal chamber, shown below:

[diagram of the oil-drop apparatus]

An atomizer (“A” in the diagram) sprays oil drops into the chamber. Originally, Millikan used the atomizer from a perfume bottle (as in the photograph, I’m guessing), but over time he got fancier and started using a pressure tank and passing the oil through glass wool before it got to the chamber, presumably to eliminate dust.

As the oil drops pass through the atomizer, they pick up net electric charges due to friction, the same way that you and I pick up a net electric charge when we shuffle across the floor wearing wool socks. The charged oil drops then fall under the force of gravity until some of them pass through a small opening so that they are between the two metal plates labeled “M” and “N” in the diagram. Those plates are connected to a series of batteries, which set up an electric field across the plates. One can tweak the direction and magnitude of this electric field so that it exerts an upward force on a given drop that exactly balances the downward gravitational force on that drop. In this way, one can “capture” a drop and hold it suspended.

In an early version of the oil-drop experiment (actually, at that point it was a water-drop experiment), Millikan would simply balance the electric and gravitational forces on a drop until the drop became suspended, and then calculate the electric charge that the drop would have to have for the two forces to balance, based on an estimate of its mass. That procedure was a good start, but it wasn’t terribly precise and it didn’t allow for direct measurements of changes in charge on a single droplet, which would provide the most convincing evidence that electric charge comes in discrete multiples of a fundamental unit.

Millikan switched from water drops to oil drops because the latter evaporate much more slowly, allowing him to make a long series of observations on a single drop during which the mass of the drop remained essentially unchanged. After he had captured a drop, he would switch off the electric field and record the time it took for the drop to fall a fixed distance under gravity alone. He would then switch on the electric field, with its magnitude adjusted so that it exerted an upward force on the drop that was stronger than the downward force of gravity. He would then record the time it took the drop to rise the same distance as before. From these times, he could calculate the charge on the drop.
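Here is a sketch of the standard textbook version of that calculation, which uses Stokes’ law to get the drop’s radius from its free-fall speed. It is my reconstruction, not Millikan’s exact procedure (in particular, it ignores his correction to Stokes’ law for very small drops), and every number is again an illustrative assumption:

```python
import math

# Fall/rise method: Stokes' law at terminal velocity fixes the drop's
# radius from its fall speed; the rise speed then gives the charge.
# All constants and example numbers are illustrative assumptions.

ETA = 1.81e-5         # Pa*s, approximate viscosity of air
RHO_OIL = 886.0       # kg/m^3, assumed oil density
RHO_AIR = 1.2         # kg/m^3
G = 9.81              # m/s^2

def drop_charge(fall_time, rise_time, distance, voltage, plate_gap):
    """Charge on a drop from its timed fall (field off) and rise (field on)."""
    v_fall = distance / fall_time    # terminal velocity under gravity alone
    v_rise = distance / rise_time    # terminal velocity with the field on
    # Terminal fall: (4/3)*pi*a^3*(rho_oil - rho_air)*g = 6*pi*eta*a*v_fall
    a = math.sqrt(9 * ETA * v_fall / (2 * G * (RHO_OIL - RHO_AIR)))
    weight = (4 / 3) * math.pi * a**3 * (RHO_OIL - RHO_AIR) * G
    E = voltage / plate_gap          # idealized uniform field
    # Field on: q*E - weight = 6*pi*eta*a*v_rise = weight * v_rise / v_fall
    return weight * (v_fall + v_rise) / (E * v_fall)

# Hypothetical observation: a drop falls 10 mm in 20 s, then rises the
# same 10 mm in 25 s, with 5000 V across a 16 mm plate gap.
q = drop_charge(20.0, 25.0, 0.010, 5000.0, 0.016)
print(f"q ≈ {q:.2e} C ≈ {q / 1.602e-19:.1f} electron charges")
```

The improvement over the balancing method is that the radius, and hence the mass, now comes out of the measured fall time rather than a guess, and repeating the fall/rise cycle on the same drop yields a series of charge values whose differences are exactly the changes Millikan cared most about.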

Occasionally, during a series of observations an oil drop would pick up a positive or negative ion from the air around it, changing its charge. This change would manifest itself as a change in the speed of the drop during the rising phase of its movement. Millikan used a source of ionizing radiation to make these illuminating events more common. Sometimes a drop would pick up a charge in mid-rise, so that its speed changed suddenly and discretely while Millikan was watching it. Millikan says that it is particularly striking to see a rising drop suddenly stop moving—a phenomenon that the electron theory can explain easily as being due to a drop with one unit of charge picking up a unit charge of the opposite sign. One reason to re-do the experiment is to see these changes in the speed of a drop during its rising phase. Are they really as striking as Millikan says, or is he dramatizing the experiment for rhetorical effect?
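The quantitative signature of such an event is simple: the drop itself is unchanged, so the radius inferred from its fall speed stays fixed, and the jump in charge is proportional to the jump in rise speed. A short sketch under the same illustrative assumptions as above:

```python
import math

ETA, G = 1.81e-5, 9.81    # air viscosity (Pa*s) and gravity (m/s^2); assumed
DRHO = 886.0 - 1.2        # oil minus air density (kg/m^3); assumed

def charge_change(v_fall, v_rise_old, v_rise_new, E):
    """Jump in charge when a rising drop suddenly changes speed.
    Same drop, so the Stokes radius inferred from v_fall is unchanged:
    delta_q = 6*pi*eta*a*(v_rise_new - v_rise_old) / E."""
    a = math.sqrt(9 * ETA * v_fall / (2 * G * DRHO))
    return 6 * math.pi * ETA * a * (v_rise_new - v_rise_old) / E

# Hypothetical: a drop falling at 0.5 mm/s speeds up mid-rise
# from 0.40 mm/s to 0.62 mm/s in a field of 3.1e5 V/m.
dq = charge_change(5.0e-4, 4.0e-4, 6.2e-4, 3.1e5)
print(f"Δq ≈ {dq / 1.602e-19:.1f} electron charges")
```

Millikan’s striking case of a rising drop that suddenly stops is the special case where the new rise speed is zero, i.e., the new charge exactly balances the drop’s weight.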

Previous experiments had suggested that the fundamental unit of charge, if there is one, is in the vicinity of 3 x 10^-10 e.s.u. Thus, if there is such a fundamental unit, Millikan should find two things:

  • First, all of his calculations of the total charge on a drop should yield values that are (approximately) integral multiples of a single number on the order of 10^-10 e.s.u.
  • Second, the calculated changes in charge should occur in small integral multiples of the same number.

For the data Millikan reported, these conditions were satisfied beautifully. A few of his runs were anomalous, however, as I will explain in future posts. Millikan calculated a value of 4.774 +/- 0.009 x 10^-10 e.s.u. for the elementary electric charge, which is slightly smaller than a recent value of 4.80320420 +/- 0.00000019 x 10^-10 e.s.u. (The discrepancy is due primarily to the fact that Millikan used an incorrect value for the viscosity of air.)
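For what it’s worth, here is a toy version of the consistency check behind those two conditions. This is my illustration of the underlying idea, not Millikan’s actual procedure, and the charges are made-up numbers: scan candidate values of the unit and pick the one that makes every measured charge closest to an integer multiple.

```python
# Toy integral-multiple check (illustrative only, not Millikan's method):
# find the unit e that minimizes the squared distances of q/e from the
# nearest integers, over a chosen search range.

def best_unit(charges, lo, hi, steps=20000):
    def misfit(e):
        return sum((q / e - round(q / e)) ** 2 for q in charges)
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=misfit)

# Hypothetical drop charges in units of 10^-10 e.s.u.:
charges = [9.55, 14.33, 23.87, 33.43, 47.75]
e = best_unit(charges, 3.0, 6.0)
print(f"best-fit unit: {e:.3f} x 10^-10 e.s.u.")
print("multiples:", [round(q / e, 2) for q in charges])
```

One caveat, which gestures at part of what the Millikan-Ehrenhaft dispute turned on: any integer sub-multiple of the best-fit unit fits the data just as well, so the search range quietly encodes an assumption about how small the fundamental unit could be.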

Next post: How we will re-create Millikan's experiment.

Overview of the History Comp Project

I’ve worried more about the history comp than I have about the philosophy comp, because I’m primarily a philosopher by disposition and by training. However, I think I have a pretty good start on this one.

My history comp project is a piece of experimental HPS, which involves revisiting past experiments in order to address historical, philosophical, and scientific questions. I am working with Paolo Palmieri, a professor in my department, who has been a pioneer in this field. Most of his experimental HPS work has been on Galileo’s pendulum and inclined-plane experiments. There have been debates among Galileo scholars about whether Galileo actually performed certain of the experiments he reports, or whether they were merely thought experiments. Or perhaps they were somewhere in between: maybe he rolled balls down inclined planes, but he didn’t measure the times of fall precisely enough really to demonstrate the results he claims. Historical reconstructions of Galileo’s experiments can contribute to this debate by, for instance, providing evidence that he could or could not have gotten certain kinds of results using the materials and methods that were available to him.

I am planning to take an experimental HPS approach to Robert Millikan’s oil-drop experiments. These experiments—performed primarily between 1909 and 1912—convinced nearly all physicists that electricity comes in discrete units and provided the most precise measurement of the elementary unit of charge that had ever been performed.

Millikan published his first definitive paper with results from the oil drop technique in 1913. In 1923, he won the Nobel Prize for these experiments and his work on the photoelectric effect. One reason it took ten years for him to win the Nobel was that his work was challenged by Felix Ehrenhaft, a Viennese physicist who claimed to find evidence in experiments similar to Millikan’s for “subelectrons,” units of charge smaller than Millikan’s supposedly fundamental units. Millikan argued that Ehrenhaft was generating artifacts by using bits of metal and other materials that, Millikan claimed, could not have been spherical, unlike Millikan’s oil drops. Ehrenhaft argued that his materials were in fact spherical, and that Millikan begged the question in favor of the existence of a fundamental unit of charge in the way he analyzed his data. Millikan published his last, rather cranky paper on this debate in 1925, while Ehrenhaft continued to argue his case into the 1940s. By that time the mainstream physics community had accepted Millikan’s results and rejected Ehrenhaft’s. Ehrenhaft did not help his cause by claiming to have created other anomalous effects, such as magnetic monopoles and magnetolysis.

Although Millikan won the day, questions remain about what Ehrenhaft was observing that made him think that he had found subelectrons and about why the physics community chose to accept Millikan’s results and reject Ehrenhaft’s. These questions became more complicated and more pressing in the 1960s, when historian Gerald Holton reviewed Millikan’s laboratory notebooks from the experiments that made it into his 1913 paper. He found that Millikan selectively excluded some of his experimental results from publication. Holton argues that Millikan was just doing what all good experimentalists do—making reasoned judgments about which results were reliable and which were unreliable. However, the results he threw out included a small number of runs that seemed fine except for the fact that they would have seemed to support Ehrenhaft’s subelectron hypothesis. Moreover, Millikan did not just fail to report that he discarded anomalous results; he explicitly says that he did not do so: “It is to be remarked, too, that this is not a selected group of drops but represents all of the drops experimented upon during 60 consecutive days” (Millikan 1913, 138). It is hard to interpret these words in a way that doesn’t make Millikan out to be a liar. David Goodstein has tried, but I’ve yet to understand his explanation.

There is a fairly large literature on the Millikan-Ehrenhaft dispute and on the issue of Millikan’s alleged fraud. I aim to contribute to this literature with insights derived from working with a scaled-down version of Millikan’s apparatus. (I also have high-quality scans of Millikan’s laboratory notebooks that I have begun to examine.) It is hard to know in advance what these insights will be, but I have a few guesses. First, I want to compare the phenomenology of the experiment to Millikan’s remarks on it. Millikan writes as if the experiment essentially speaks for itself. “One who has seen this experiment,” he says, “has SEEN the electron.” I’ll be interested to find out whether doing the experiment feels like seeing the electron. I’ll explain in a forthcoming post what the experiment involves and why I think it might feel this way. If it does, then it would be a plausible conjecture that the experience of doing the experiment played a major role in convincing Millikan that the theory of a fundamental unit of charge was correct—perhaps to the point that he saw the data as a tool for displaying the correctness of the theory, rather than as the primary epistemic basis on which the theory should rest. I think there is some textual support for this conjecture, but it would be greatly strengthened if the experience of doing the experiment turns out to be rather striking. If the experience of doing the experiment bears out this conjecture, then it would, I think, give some grounds for being a bit more lenient regarding Millikan’s data manipulation than we might otherwise be. Why report data that seemed to speak against the electron hypothesis when he had SEEN the electron, and thus knew that it was real?

Second, it would be quite interesting if we could reproduce anomalous observations akin to some of those whose results Millikan discarded. For instance, Millikan says at a couple of points in the notebook that a drop “flickers” as though “unsymmetrical in shape.” (He threw out the results from those drops.) What do such drops look like? Is the flickering obvious? Do the flickering drops seem different from the other oil drops in other ways? Can we make any reasonable conjectures about what the flickering drops are, e.g. specks of dust or multiple drops stuck together? It would be even more interesting, if we could do it, to run the experiment using some of the materials Ehrenhaft used. Would we get the same anomalous results? Is there any way we could use modern tools to investigate whether these materials were in fact spherical or not? Could we use computer simulations to investigate how departures from sphericality should affect the results?

In the long run, it would be fantastic to do a plausible re-creation of some of Ehrenhaft’s experiments. Trying to explain Ehrenhaft’s results is the kind of project that one would like HPS scholars to pursue—a side-trail in the history of science that’s quite interesting and unresolved, but that working scientists don’t have time for. However, the practical obstacles are daunting. Ehrenhaft’s apparatus was more complicated than Millikan’s and, as I’ll explain in the next post, we’re not even doing a full-fledged re-creation of Millikan’s. I’ll see how this relatively easy Millikan re-creation goes, and then perhaps think about a more ambitious project on Ehrenhaft.

Hey look, I’m already at over 1,000 words—1/7 of the way to my history comp! I’ll be done in a week. Or not.