Friday, February 25, 2011
A phenomenon called spectrum bias might help my argument that advocates of error statistics should take the positive predictive value (PPV) and negative predictive value (NPV) of their tests seriously. Spectrum bias is typically discussed in the context of medical diagnostic tests. Such tests are characterized by their sensitivity and specificity, where a test’s sensitivity is the probability that it yields a positive result if the condition in question is present, and its specificity is the probability that it yields a negative result if the condition in question is absent. PPV and NPV are more clinically relevant than sensitivity and specificity. However, sensitivity and specificity are more popular measures of a test’s performance because, unlike PPV and NPV, they are generally taken to be intrinsic properties of the test, independent of the prevalence of the condition in the population.
Spectrum bias is the phenomenon that sensitivity and specificity are not, in fact, intrinsic properties of medical tests. Like PPV and NPV, they vary when the test is applied to different populations. There are both theoretical and empirical studies supporting the claim that spectrum bias exists. At least one study I have looked at purports to show that sensitivity and specificity vary with features of the population almost as much as PPV and NPV do. At least part of the explanation for this phenomenon is that medical conditions typically are not truly dichotomous; they can be present to varying extents. Misclassification is more likely for individuals who are close to the classification cutoff. As a result, sensitivity and specificity are lower for populations in which many individuals are close to the cutoff than they are for populations without this feature.
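To make the mechanism concrete, here is a minimal simulation sketch in Python. Everything in it is invented for illustration (the cutoff, the noise level, and the two population distributions are not drawn from the studies mentioned above): a continuous severity is dichotomized at a cutoff, the test measures severity with noise, and the same test shows different sensitivity and specificity in a population clustered near the cutoff than in one spread far from it, even though the prevalence is the same in both.

```python
# Illustrative sketch of spectrum bias (all parameters invented):
# a continuous "severity" is dichotomized at a cutoff, and the test measures
# severity with noise, so misclassification happens mostly near the cutoff.
import numpy as np

rng = np.random.default_rng(0)
CUTOFF = 1.0       # severity above this counts as "condition present"
NOISE_SD = 0.5     # measurement error of the diagnostic test

def operating_characteristics(severity):
    present = severity > CUTOFF
    positive = severity + rng.normal(0.0, NOISE_SD, size=severity.shape) > CUTOFF
    sens = np.mean(positive[present])        # P(test + | condition present)
    spec = np.mean(~positive[~present])      # P(test - | condition absent)
    ppv = np.mean(present[positive])         # P(condition present | test +)
    npv = np.mean(~present[~positive])       # P(condition absent | test -)
    return sens, spec, ppv, npv

# Two populations with the same prevalence (about 50%) but different spreads of severity.
pop_spread = rng.normal(loc=1.0, scale=2.0, size=100_000)   # few individuals near the cutoff
pop_narrow = rng.normal(loc=1.0, scale=0.3, size=100_000)   # many individuals near the cutoff

for name, pop in [("spread out", pop_spread), ("near cutoff", pop_narrow)]:
    sens, spec, ppv, npv = operating_characteristics(pop)
    print(f"{name:12s} sens={sens:.2f} spec={spec:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```

In runs like this, the population clustered near the cutoff shows markedly lower sensitivity and specificity, which is the pattern the spectrum-bias literature describes.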
If spectrum bias afflicts error statistical tests generally, then an advocate of error statistics cannot deny the relevance of PPV and NPV on the grounds that they are not intrinsic properties of tests without also impugning their preferred error rates α and β.
I need to find out more about spectrum bias and its prevalence and severity before I can be confident that this argument is a good one. However, it does seem promising and is not likely to have been considered before within the philosophy of science, where spectrum bias seems to be largely unknown.
Wednesday, February 23, 2011
A Refinement of Ioannidis' Argument
I've briefly written up in Word an idea for a refinement of Ioannidis' argument that would yield results that are relevant to error statistics and the base-rate fallacy. You can download the file here. (Unfortunately, the figures don't show up properly in the Google docs viewer that the link brings up--you need to download the file and view it in Word.)
Monday, February 21, 2011
Teaching Reflection (Baranger Award Application Materials III)
The third and final portion of the Baranger Award Application is the Teaching Reflection:
Please submit a brief (300 words) description of how your sample teaching material (submitted below) reflects your teaching philosophy. You may wish to address how this material is useful to students and/or how it contributes to student learning. Additionally, you might consider how you might revise the materials now that you have had an opportunity to use them in the classroom.
The sample teaching material I am providing is here. My description of it is as follows:
I have provided a handout for a writing lesson that I gave in the course Introduction to the Philosophy of Science. I believe that students should receive writing instruction throughout their studies. At the same time, I cannot let a philosophy course turn into a writing course. To balance those demands, I developed a lesson that aims to help students improve their writing as much as possible while only devoting a single session explicitly to writing. This lesson focuses on three simple but powerful tips and gives those tips names so that I can refer to them for the rest of the term.
This lesson reflects lessons I learned while teaching test-prep courses, as I explain in my description of a teaching challenge I faced. In particular, the lesson is structured so that the ideas are bite-sized and uncluttered. In each of the three main sections, I begin by introducing the core idea of that section through an example. I then use a series of additional examples to introduce a few wrinkles into that core idea. Each example is there to make one simple point, and I resist the temptation to comment on an example beyond that simple point. After the examples, I provide a few notes that sum up the points they are meant to illustrate, and then I restate the main point of the section. The lesson is highly interactive, with students reading examples and suggesting revisions throughout. It ends with a drill that gives students a chance to practice improving some bad passages drawn from actual academic writing. The drill is essential because it allows students to start trying to apply the lesson to realistic cases while their peers and I are there to help them when they run into trouble.
Thursday, February 17, 2011
Teaching Philosophy Revised
Here's a new draft of my statement of teaching philosophy, which I am planning to revise and use as part of my application for the Elizabeth Baranger teaching award. Feedback welcome!
Statement of Teaching Philosophy
Philosophy courses provide excellent opportunities to teach skills and habits of mind that are central to a liberal arts education. Those skills include abilities to analyze and evaluate arguments, to formulate reasonable views about complex issues, and to articulate and defend those views both orally and in writing. Such skills help students become responsible citizens, valuable employees, and thoughtful human beings.
The ability to analyze and evaluate arguments is fundamental for many fields, including not only philosophy but also career fields such as science, medicine, and law. It is also essential for formulating reasonable, nuanced beliefs in a time of extremist commentary. Philosophy courses provide excellent opportunities to teach these skills both because philosophy is a highly contentious discipline and because philosophers attend explicitly to the norms of argumentation. I help my students acquire these critical thinking skills in several ways. For instance, early on in a course I teach a lesson about how to analyze and evaluate arguments. That lesson establishes a framework for talking about arguments that I continue to use throughout the term. I also require students to write a number of Reading Responses in which they choose an argument from one of their readings to analyze and evaluate. I use a peer-review system for these assignments in which students receive frequent feedback on their writing from one another in addition to the feedback they receive from me.
Critical thinking skills are essential, but students should also learn to think synthetically and constructively. Philosophy courses are well suited for teaching those skills because they give students opportunities to present their own views both in written work and in class discussions. I prefer essay topics that are related to but not identical to topics we discuss in class, so that students can use ideas that we have discussed but cannot simply restate them. When students receive their first essay assignment, I teach them a few simple ways to improve the clarity of their writing and give those tips simple names so that I can refer to them throughout the term. I also work to ensure that students feel comfortable sharing their ideas in class while at the same time helping them to improve their oral presentation skills. I tell them from the beginning that it is okay to be wrong in a difficult field such as philosophy, and that it is generally more productive to try out a view and see where it leads than to remain forever sitting on the fence. I reinforce this message by taking students’ ideas seriously, pointing out their merits and raising concerns without shooting them down. At the same time, I ask students to avoid selling their ideas short; for instance, I ask them to avoid the weak phrase “I feel like...” in favor of the more forceful “I think that...” and to avoid expressing statements as if they were questions. I work to build student participation into my lessons as much as possible so that students come to class expecting to speak.
Many topics of debate in our society have at their heart philosophical issues. For instance, debates about whether alternatives to the theory of evolution should be taught in public school science classes often turn on the question of what distinguishes science from non-science, which philosophers of science call the problem of demarcation. I aim to help my students develop a more sophisticated perspective on those debates and a greater appreciation for the importance of philosophy by highlighting such connections. In one case, I gave my students a New York Times op-ed piece by Deborah Tannen and asked them to comment on it in light of Karl Popper’s philosophy of science. I was thrilled to see that they were able to identify what appears to be an ad-hoc maneuver by Tannen to save her favored theory--a no-no according to Popper. I then gave them the option to write a Reading Response in which they applied Popper’s philosophy to Tannen’s article and gave their own view about whether they agree with what Popper’s theory says about this case.
I aim to persuade my students that they need philosophy to think about issues they care about. In addition, I aim to give them skills that will allow them to think clearly and carefully about those issues and to be eloquent in sharing their thoughts with others. Such skills are vital not only in the workplace, but also in private life and democratic society.
Monday, February 14, 2011
Teaching Philosophy (Baranger Award Application Materials II)
I solicited and received nominations for the Elizabeth Baranger Excellence in Teaching Award, which aims to "recognize and reward outstanding teaching by graduate students at Pitt." I'll be posting here drafts of my application materials. Feedback is welcome!
The application requires a "Statement of Teaching Philosophy," which the application website describes as follows:
For purposes of the A&S GSO award, you should consider the statement to be an explication of your pedagogical goals, methods, and theories. Although you may reference established pedagogical theories, what we as a committee are most interested in is your own understanding and how you put those ideas into practice, not your knowledge of the current jargon of teaching theory. For the purposes of this award, you do not have to focus exclusively on concrete examples in this component, because your teaching philosophy should be complementary to your example of and reflection on your teaching materials and the other application materials.
One of the primary goals of this statement is for you to demonstrate, or gain, a consciousness of the processes of learning in and out of your teaching environment. However you choose to do it, let your readers know how you think learning happens, what the best ways to facilitate this are, and how you put these ideas into practice. You may choose to address some of the following questions in your philosophy:
- What are your objectives as a teacher? What methods do you use to achieve these goals? How do you assess and evaluate your effectiveness in achieving your objectives?
- What do you believe about teaching? About learning? How is this played out in your classroom?
- Why is teaching important to you?
- Focus on how you go about teaching, with concrete examples where necessary or appropriate, and a reflection on how students react(ed) to concepts and/or innovations. You may reference other materials you have submitted.
- Share insights about teaching in your specific discipline (importance of the field, theoretical grounding when necessary).
- It is acceptable to talk about your mistakes in order to demonstrate what you learned from them.
We recognize that teachers at different levels of teaching will have differing amounts and quality of experiences from which to draw. We are more interested in how well you were able to work within the parameters you were given. Your Teaching Philosophy should not exceed 750 words.
The following is a statement of teaching philosophy that I wrote for a different occasion. I am not entirely happy with it, and I think that it is too focused on the subject of philosophy for the Baranger Award application (as opposed to an application for a job in a philosophy department), but it is a start:
College courses in many disciplines primarily present stories of intellectual triumphs, such as ingenious methods, surprising discoveries, and successful theories. By contrast, a typical philosophy course primarily presents stories of intellectual failures: everyday concepts that resist analysis, simple paradoxes that resist resolution, and compelling questions that resist definitive answers. From a pedagogical perspective, this feature of a typical philosophy course generates both challenges and opportunities. One major challenge is to avoid giving students the impression that philosophy is pointless because it never makes any progress. One major opportunity is to help students become more reflective and critical about their beliefs.
I sympathize with students who complain about the fact that philosophers seem unable to solve any of the major problems they set themselves. I used to respond to students who came to me with this complaint by pointing out that philosophy does make progress of sorts---we now know that many seemingly plausible positions cannot be made to work. However, they often found this answer unsatisfying because all of the progress philosophers make seems to be negative; we know a lot in many cases about which answers will not work, but have learned little about the answers that will. A better response, I now think, is to point out that this feature of philosophy tells us something important: it is exceedingly difficult to develop defensible views about many very basic issues. As a former professor of mine put it, philosophy really teaches you that you can't just say any old thing.
Philosophy is less a body of information than a set of skills and habits of mind. Students in philosophy courses should learn to read a document, understanding its author's position, identifying his or her argument for that position, and evaluating the strength of that argument. They should also learn to develop their own positions on complex issues and to present cogent arguments for those positions. These skills are fundamental for critical thinking and thus are useful not only for philosophy, but also for many professions and for everyday life. Because philosophy is a highly contentious discipline, a philosophy course provides excellent opportunities for teaching these skills. To some extent, students will pick these skills up naturally through the process of learning about and doing philosophy. However, making explicit what these skills involve can accelerate this process. In addition, it can help to give students opportunities to practice these skills and to receive feedback on their performance.
As a philosophy teacher, my primary goal is to help my students acquire the argumentative skills of a good philosopher. In addition, I aim to help my students to understand and internalize the specific content of the course. Educational research suggests that testing students on a given body of material helps them internalize that material more effectively than simply reviewing that material with them, so I give frequent small quizzes. Educational research also suggests that teaching something to someone else is one of the most effective ways to internalize it, so I have my students review their answers with one another before we discuss them together. This peer review method has the additional benefit that weaker students can receive more personalized attention than I can give them. Of course, I also review the material with them myself to prevent misconceptions from taking root and to answer questions that the peer review process fails to resolve.
Another distinctive feature of my teaching, besides peer review, is my interactive style of lecturing. My basic approach is not to provide any information that I could elicit from the students. As a result, in a one-hour recitation with twenty students, nearly every student speaks in every session. I believe that this practice reinforces what the students have learned better than my reciting all of the information to them would. It also keeps the students alert and engaged, which is crucial for their learning.
Teaching philosophy is an excellent opportunity to help students acquire both critical thinking skills and a reflective habit of mind. I have developed some techniques to try to make the most of this opportunity, and I look forward to continuing to improve my pedagogical skills.
Teaching Challenge (Baranger Award Application Materials I)
I solicited and received nominations for the Elizabeth Baranger Excellence in Teaching Award, which aims to "recognize and reward outstanding teaching by graduate students at Pitt." I'll be posting here drafts of my application materials. Feedback is welcome!
The first item I have written is a response to the following prompt:
In an essay of no more than 500 words, please (a) describe a challenge you faced and overcame as a teacher, explaining (b) how you dealt with it and (c) what you learned from it.
With regards to (a), consider answering some of the following questions: Was the challenge one of how you relayed information to students, how you assessed students, how you organized material, or perhaps with what kind of an attitude you approached the course? Were any particular aspects of your teaching philosophy put under scrutiny as a result of facing the challenge? Do you think this type of a challenge is commonly faced by graduate student teachers?
With regards to (b), consider answering some of the following questions: Was there any previous planning (for example, a well-made syllabus or a comprehensive teaching philosophy), which prepared you for the challenge that arose? Did you seek help from other graduate students or faculty members? Did you attempt to overcome the challenge the first time you encountered it, or was it only after realizing that the challenge was an on-going element that you decided to address it?
With regards to (c), consider answering some of the following questions: Is the challenge something to be prevented from semester to semester, or do you look forward to facing it again? Have you rethought your teaching philosophy in light of the experience? How has experiencing the challenge forced you to rethink the attitude you take into the classroom?
Here is my response:
The first course I ever taught aimed to prepare high school students for the ACT college entrance exam. I loved my students and spent hours crafting each lesson. My students liked me too and thought that I was very smart. There was only one problem: from the beginning of the course to the end, my students’ practice test scores barely improved. I was teaching, but my students were not learning. One of my students summed up her experience in the course as follows: “I understand your lessons, and I feel like I’ve learned a lot. But my scores aren’t getting any better, and I can’t figure out why.”
Those results bothered me. I refined my lessons, and in subsequent courses my students did slightly better. The improvements were not dramatic, however, until I received additional training to teach a second test type. My trainer noticed a general problem with my teaching and called me on it repeatedly: I was talking too much. Students would tune out during my elaborate explanations. I needed to identify the most important points of the lesson and punch those points one at a time.
In general, I realized, I was focusing too much on developing thorough, logically correct lessons, and not enough on presenting material in a way that helped students learn. I streamlined my lessons, breaking them down into bite-sized pieces, each of which was focused on one major point. I read about pedagogical techniques and worked to incorporate them into my teaching. (Many of the techniques I use are described in my statement of teaching philosophy.) And I continued to monitor my students’ test results to see what was working and what was not.
This experience taught me, among other things, the importance of getting objective feedback about what my students are actually learning. If I had only received feedback in the form of student evaluations, then I would have thought that the first course I taught was going rather well. However, the data I received from their test results showed otherwise. It is difficult in my field (History and Philosophy of Science) to get data on student learning that is as clear as the data I got from my test-prep courses, but it is possible to get useful information from student responses in class and on written assignments by paying attention to deficiencies in their responses and reflecting on how one’s teaching might have contributed to those problems.
Academic tradition says that a teacher’s responsibility is to present the material, and a student’s responsibility is to learn it. That attitude lets teachers off the hook too easily. Students bear some responsibility for their own learning, of course, but a good educator helps to bridge the gap between where a student is and where he or she should be by the end of a course through skillful pedagogy and sensitivity to the student’s point of view.
Friday, February 11, 2011
Oil and Water
Between 1896 and 1906, J. J. Thomson and his students performed a series of experiments that led to the "cloud method" for measuring the charge on a gaseous ion. Around 1908, Millikan began working to improve upon the cloud method. He first discovered that increasing the voltage he used allowed him to experiment on single droplets rather than an entire cloud, using what he called the "balanced water-drop" method. Depending on whose account you believe, either Millikan or his graduate student Harvey Fletcher thought to use watch oil instead of water because watch oil would evaporate only very slowly.
Today I learned why water drops were so unsatisfactory. It is impossible to experiment on them with an atomizer! I spent several hours trying to figure out why I wasn't getting any water drops in the chamber only to realize that I was getting some, but they disappeared as soon as they arrived. I knew that evaporation was an issue, but I didn't know that it happened so fast.
Sunday, February 6, 2011
Ioannidis' Argument
John Ioannidis is a “meta-researcher” who has written several well-known papers about the reliability of research that uses frequentist hypothesis testing. (The Atlantic published a reasonably good article about Ioannidis and his work in November 2010.) One of his most-cited papers, called “Why Most Published Research Findings Are False,” presents a more general version of the example I presented in my last post. Steven Goodman and Sander Greenland wrote a response arguing that Ioannidis’ analysis is overly pessimistic because it does not take into account the observed significance value of research findings. Ioannidis then responded to defend his original position. I’m planning to work through this exchange, with an eye toward the question whether an argument like Ioannidis’ could be used to present the base-rate fallacy objection to error statistics in a way that is more consistent with frequentist scruples than Howson’s original presentation.
Ioannidis’ argument generalizes the example I gave in my previous post: it uses the same kind of reasoning, but with variables instead of constants. Ioannidis idealizes science as consisting of well-defined fields i=1, …, n, each with a characteristic ratio Ri of “true relationships” (for which the null hypothesis is false) to “no relationships” (the null is true) among those relationships it investigates, and with characteristic Type I and Type II error rates αi and βi. Under these conditions, he shows, the probability that a statistically significant result in field i reflects a true relationship is (1 − βi)Ri/(Ri − βiRi + αi).
It’s somewhat difficult to see where this expression comes from in Ioannidis’ presentation. This blog post by Alex Tabarrok presents Ioannidis’ argument in a way that’s easier to follow, including the following diagram:
Here’s the idea. You start with, say, 1000 hypotheses in a given field, at the top of the diagram. The ratio of true hypotheses to false hypotheses in the field is R, so a simple algebraic manipulation (omitting subscripts) shows that the fraction of all hypotheses that are true is R/(1+R), while the fraction that are false is 1/(1+R). That brings us to the second row from the top in the diagram: if R is, say, ¼ (so that there is one true hypothesis investigated for every four false hypotheses investigated), then on average out of 1000 hypotheses 200 will be true and 800 false. Of the 200 true hypotheses, some will generate statistically significant results, while others will not. The probability that an investigation of a true relationship in this field yields a statistically significant result is the power, 1 − β. If the power is, say, .6 (so β = .4), then on average there will be 120 positive results from the 200 true hypotheses investigated. Similarly, of the 800 false hypotheses, some will generate statistically significant results; the probability that any one of them will do so is α. Letting α = .05, then, on average there will be 40 positive results from the 800 false hypotheses investigated. Thus, the process of hypothesis testing in this field yields on average 40 false positives for every 120 true positives, giving it a PPV of 120/160 = .75. Running through the example with variables instead of numbers, one arrives at a PPV of (1 − β)R/(R − βR + α). This result implies that PPV is greater than .5 if and only if (1 − β)R > α.
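As a sanity check on the algebra, here is a short Python sketch (using the same illustrative numbers as above: R = 1/4, α = .05, and power 1 − β = .6) that computes PPV both by counting expected outcomes and from the closed-form expression.

```python
# Ioannidis-style PPV arithmetic, using the numbers from the example above
# (R = 1/4, alpha = .05, power 1 - beta = .6). Illustrative only.

def ppv_by_counting(n_hypotheses, R, alpha, power):
    true_h = n_hypotheses * R / (1 + R)       # expected number of true relationships
    false_h = n_hypotheses * 1 / (1 + R)      # expected number of null relationships
    true_pos = power * true_h                 # significant results among true relationships
    false_pos = alpha * false_h               # significant results among null relationships
    return true_pos / (true_pos + false_pos)

def ppv_closed_form(R, alpha, beta):
    return (1 - beta) * R / (R - beta * R + alpha)

R, alpha, beta = 0.25, 0.05, 0.4                   # beta = .4, so power = .6
print(ppv_by_counting(1000, R, alpha, 1 - beta))   # 120 / (120 + 40) = 0.75
print(ppv_closed_form(R, alpha, beta))             # 0.75
```

Both routes give .75, matching the 120 true positives and 40 false positives counted above.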
Ioannidis goes on to model the effects of bias on PPV and to develop a number of “corollaries” about factors that affect the probability that a given research finding is true (e.g. “The hotter a scientific field… the less likely the research findings are to be true”). He argues that for most study designs in most scientific fields, the PPV of a published positive result is less than .5. He then makes some suggestions for raising this value. These elaborations are quite interesting, but for the moment I would like to slow down and examine the idealizations in Ioannidis’ argument.
The idea that science consists of well-defined “fields” with uniform Type I and Type II error rates across all experiments is certainly an idealization, but a benign one so far as I can tell. The assumption that each field has an (often rather small) characteristic ratio R of “true relationships” to “no relationships” is more problematic. First, what do we mean by a “true relationship?” One of the most basic kinds of relationship that researchers investigate is simple probabilistic dependence: there is a “true relationship” of this kind between X and Y if and only if P(X & Y) ≠ P(X)*P(Y). However, if probabilistic dependence is representative of the relationships that scientists investigate, then one might reasonably claim that the ratio of “true relationships” to “no relationships” in a given field is always quite high, because nearly everything is probabilistically relevant to nearly everything else, if only very slightly. In fact, a common objection to null hypothesis testing is that (point) null hypotheses are essentially always false, so that testing them serves no useful purpose.
One could avoid this objection by replacing “no relationship” with, say, “negligible relationship” and “true relationship” with “non-negligible relationship.” However, for the argument to go through one would then have to reconceive of α as the probability of rejecting the null given that the true discrepancy from the null is negligible, and of β as the probability of failing to reject the null given that the true discrepancy from the null is non-negligible. Fortunately, these probabilities would generally be nearly the same as the nominal α and β.
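To see why, consider a rough calculation for a two-sided z-test (a sketch of my own, assuming a standard normal test statistic; which discrepancies count as “negligible” is of course a judgment call): the probability of rejection barely moves when the true discrepancy goes from exactly zero to a small fraction of a standard error, so the modified α stays close to the nominal one.

```python
# Rejection probability of a two-sided z-test as a function of the true discrepancy,
# measured in standard-error units. A "negligible" nonzero discrepancy changes the
# rejection probability only slightly from the nominal alpha. Illustrative sketch.
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)

def p_reject(delta):
    # Test statistic ~ N(delta, 1); reject when |Z| > z_crit.
    return norm.sf(z_crit - delta) + norm.cdf(-z_crit - delta)

for delta in [0.0, 0.05, 0.1, 1.0, 3.0]:
    print(f"true discrepancy = {delta:4.2f} SE  ->  P(reject) = {p_reject(delta):.3f}")
# Near zero the rejection probability stays near .05; only for sizable discrepancies
# does it grow into what we would ordinarily call power.
```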
The assumption that each field has a characteristic R is more problematic than the assumption that each field has a characteristic α and β for a second reason as well: ascribing a value for R to a test requires choosing a reference class for that test. The assumption that each field has a characteristic α and β is simply a computational convenience; α and β are defined for a particular test even though this assumption is false. By contrast, the assumption that each field has a characteristic R is more than a convenience: something like it must be at least approximately true for R to be well defined in a particular case. This point seems to me the Achilles’ heel of Ioannidis’ argument, and of attempts to persuade frequentists to treat PPV as a test operating characteristic on par with α and β. A frequentist could reasonably object that there is no principled basis for choosing a particular reference class to use in a particular case in order to estimate R. And even with a particular reference class, there are significant challenges to obtaining a reasonable estimate for R.
Thursday, February 3, 2011
Positive Predictive Value as an Operating Characteristic
The Mayo and Howson papers I examined in my last few posts came out of a symposium at the 1996 meeting of the Philosophy of Science Association. In this post, I turn my attention to the third paper that came out of that symposium, this one by Ronald Giere.
Giere takes a somewhat neutral, third-party stance on the debate between Mayo and Howson, although his sympathies seem to lie more with error statistics. He contrasts Mayo and Howson’s views as follows: Howson attempts to offer a logic of scientific inference, analogous to deductive logic, whereas Mayo aims to describe scientific methods with desirable operating characteristics.
This distinction does seem to capture how Howson and Mayo think about what they are doing. However, it does not make me any more sympathetic to error statistics, because it seems to me a mistake to try to separate method from logic. The operating characteristics of a scientific method are desirable to the extent that they allow one to draw reliable inferences, and the extent to which they allow one to draw reliable inferences depends on logical considerations.
Nevertheless, the logic/method distinction is useful for understanding the perspective of frequentists such as Mayo. In fact, one may be able to use the insight this distinction provides to recast the base-rate fallacy objection in a way that will strike closer to home for a frequentist. The key is to present the objection in terms of the positive predictive value of a test and to argue that positive predictive value (PPV) is an operating characteristic on par with Type I and Type II error rates. In fact, a test can have low rates of Type I and Type II error (low α and β), but still have low positive predictive value. Consider the following (oversimplified) example:
Suppose that in a particular field of research, 9/10 of the null hypotheses tested are true. For simplicity, I will assume that all of the tests in this field use the same α and β levels: the conventional α=.05, and the lousy but fairly common β=.5. The following 2x2 table displays the most probable set of outcomes out of 1000 tests:
|             | Test rejects H0 | Test fails to reject H0 | Total |
|-------------|-----------------|-------------------------|-------|
| H0 is true  | 45              | 855                     | 900   |
| H0 is false | 50              | 50                      | 100   |
| Total       | 95              | 905                     | 1000  |
Intuitively, PPV is the probability that a positive result is genuine. In more frequentist terms, it is the frequency of false nulls among cases in which the test rejects the null. In this example, PPV is 50/95 ≈ .53. As the example shows, a test can have a conventional α and β without having a high PPV, if the base rate of false nulls is sufficiently low. (Thanks to Elizabeth Silver for providing me with this example.)
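The same arithmetic in a few lines of Python (a sketch using the numbers from the table above):

```python
# Expected outcomes of 1000 tests in a field where 9/10 of tested nulls are true,
# with alpha = .05 and beta = .5. Reproduces the table and PPV above.
n_tests, p_true_null, alpha, beta = 1000, 0.9, 0.05, 0.5

true_nulls = n_tests * p_true_null            # 900 tests of true nulls
false_nulls = n_tests * (1 - p_true_null)     # 100 tests of false nulls
false_pos = alpha * true_nulls                # 45 rejections of true nulls
true_pos = (1 - beta) * false_nulls           # 50 rejections of false nulls
ppv = true_pos / (true_pos + false_pos)
print(f"rejections: {true_pos + false_pos:.0f}, PPV = {ppv:.2f}")   # 95 rejections, PPV = 0.53
```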
Superficially, this way of presenting the base-rate objection appears to be closer to the frequentist framework than Howson’s way of presenting it. However, a frequentist might object that PPV is not an operating characteristic of a test in the same way that Type I and Type II error rates are. Type I error rates, one might think, are genuine operating characteristics because they do not depend on any features of the subject matter to which the test is applied. One simply stipulates a Type I error rate and chooses acceptance and rejection regions for one’s test statistic that yield that error rate. By contrast, to calculate PPV one has to take into account the fraction of true nulls within the subject area in question. Thus, PPV is not an intrinsic characteristic of a test, but an extrinsic feature of the test relative to a subject area.
This objection ignores the fact that Type I error rates are calculated on the basis of assumptions about the subject matter under test—most often, assumptions of normality. As a result, Type I error rates are not intrinsic features of tests either, but of tests as applied to subject areas in which (typically) things are approximately normal. Normality assumptions may be more widely applicable and more robust than assumptions about base rates, but they are nonetheless features of the subject matter rather than features of the test itself. Type II error rates are even more obviously features of the test relative to a subject matter, because they are typically calculated for a particular alternative hypothesis that is taken to be plausible or relevant to the case at hand.
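A small simulation can illustrate the point about normality (this is my own sketch; the skewed distribution and the sample size are arbitrary choices, not an example from Giere or Mayo): a one-sample t-test applied to small samples from an exponential distribution, with the null hypothesis true, typically rejects at a rate noticeably different from the nominal α = .05.

```python
# Sketch: the actual Type I error rate of a one-sample t-test depends on the
# population distribution, not just on the nominal alpha. Arbitrary illustrative
# choices: small samples from a skewed (exponential) distribution with true mean 1.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
alpha, n, n_sims = 0.05, 10, 20_000
rejections = 0
for _ in range(n_sims):
    sample = rng.exponential(scale=1.0, size=n)   # true mean is exactly 1, so H0 is true
    _, p_value = ttest_1samp(sample, popmean=1.0)
    rejections += p_value < alpha
print(f"actual rejection rate: {rejections / n_sims:.3f}  (nominal alpha = {alpha})")
```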
A frequentist could respond simply by conceding the point: PPV is an operating characteristic of a test that is relevant to whether one can conclude that the null is false on the basis of a positive result. To do so, however, would be to abandon the severity requirement and to move closer to the Bayesian camp.
The example given above uses the same kind of reasoning that John Ioannidis uses in his paper “Why Most Published Research Findings are False.” It might be useful to move next to that paper and the responses it received.
Before moving on, I'd like to note a couple of other interesting moves Giere makes in his paper. First, he characterizes Bayesianism and error statistics as extensions of the rival research programs that Carnap and Reichenbach were developing around 1950, but without those programs' foundationalist ambitions. Second, Giere emphasizes a point that I think is very important: Bayesianism (as it is typically understood in the philosophy of science) is concerned with the probability that propositions are true. It is not concerned (at least in the first instance) with how close to the truth any false propositions may be. Yet, in many (if not all) scientific applications, the truth is not an attainable goal. Even staunch scientific realists admit that our best scientific theories are very probably false; where they differ from anti-realists is in claiming that our best theories are close to and/or tending toward the truth. One might think that the emphasis in real scientific cases on approximate truth rather than probable truth favors error statistics over Bayesianism. However, when one moves into real scientific cases one should also move into real Bayesian methods, including Bayesian methods of model building. Those methods are Bayesian in that they involve conditioning on priors, but they differ from the Bayesian methods philosophers tend to focus on in that they aim to produce models that are approximately true rather than models that have a high posterior probability. Unlike Bayesian philosophers, perhaps, Bayesian statisticians have developed a variety of methods that can handle the notion of approximate truth just as well as error-statistical methods can.