outcomes in ways that favor our initial expectations. If we are interested in informally testing the effectiveness of vitamin C with the data of our own experience, it may be wise to specify in advance that “success” or “improvement” be defined as a reduction in the number of days with a cold. If not, we run the risk of reading too much into every moment’s respite from post-nasal drip or any temporary reduction in our fever-induced nagging of loved ones.
To stretch this idea a bit further (and pursue a theme introduced earlier), the methods of science protect an investigator from juggling the meaning of different results by deliberately making the investigator rigid and “unintelligent” in the same way that computers are rigid and unintelligent. Experimental results, like the input to a computer, must fall into certain pre-specified slots according to pre-specified rules or they are not processed at all. As scientists, we willingly sacrifice some “intelligence” and flexibility for the benefit of objectivity.
This is not to suggest, of course, that all of science is such a rigid, constrained process. A distinction must be made between the processes involved in generating versus testing ideas; between what philosophers of science have referred to as the “context of discovery” and the “context of justification.” In the context of discovery, “anything goes” in science as in everyday life; it is in the context of justification that scientists become more conservative. As Sir Peter Medawar has noted, science works “… in a rapid reciprocation of guesswork and checkwork, proposal and disposal, conjecture and refutation.” 12 Flashes of inspiration are followed by rigorous test. When asked on a talk show to explain the secret of his success, two-time Nobel Laureate Linus Pauling once replied that “… you need to have a lot of ideas, and then you have to throw away the bad ones.” Much of the scientific enterprise can be construed as the use of formal procedures for determining when to throw out bad ideas, a set of procedures that we might be well advised to adopt in our everyday lives. We humans seem to be extremely good at generating ideas, theories, and explanations that have the ring of plausibility. 13 We may be relatively deficient, however, in evaluating and testing our ideas once they are formed. One of the biggest impediments to doing so is our failure to realize that when we do not precisely specify the kind of evidence that will count as support for our position, we can end up “detecting” too much evidence for our preconceptions.
Another way of stating this is that our expectations can often be confirmed by any of a set of “multiple endpoints” after the fact, some of which we would not be willing to accept as criteria for success beforehand. 14 When a psychic predicts that “a famous politician will die this year,” it is important to specify then and there the range of events that will constitute a success. Otherwise, we are likely to be overly impressed by various tenuous connections between the prediction and any of a number of subsequent events. Suppose Armand Hammer dies within the year: Is that a successful prediction? (He is an industrialist rather than a politician, but he has served as this country’s ambassador-without-portfolio to Moscow for several generations.) Or suppose the President is shot in an unsuccessful assassination attempt: Does that count? Without specifying the meaning of all possible outcomes, the test is no longer objective, and we run the risk that our initial hypotheses will receive apparent support too easily.
The problem of multiple endpoints is most severe when the subject under investigation is inherently fuzzy and hard to define. For instance, suppose someone claims that day care during infancy hinders “personal adjustment” in later life. Well, what is “personal adjustment” and how does one measure it? The number of friends during