IPT Journal - "A Paradigm Shift for Expert Witnesses"

A Paradigm Shift for Expert Witnesses

Ralph Underwager and Hollida Wakefield^*

ABSTRACT: The recent United States Supreme Court decision in Daubert v. Merrell Dow Pharmaceuticals dramatically changes the criteria by which scientific testimony is admitted as evidence in court. The unanimous ruling states that the criterion of the scientific status of a theory is its falsifiability, refutability, or testability. This, in effect, replaces the Frye test with the Popperian principle of falsification as the determinant of scientific knowledge. If properly understood and followed, this ruling is likely to render inadmissible testimony based on such concepts and theories as the child sexual abuse accommodation syndrome and claims that childhood sexual abuse has been repressed.

A nation's justice system determines factual issues that cannot be settled in any other forum. In this search for truth the courts have, over the centuries, developed policies and methods for evaluating arguments and evidence. These methods and policies permit the court to deal with human events within a social, political, and historical context. While neither perfect, nor totally reliable, the justice system has the virtue of being highly flexible, thus allowing the legal professionals who function within it to interpret a great deal of evidence and resolve a wide variety of questions. Consequently, no other profession has thought as much about the nature of evidence and the inferences that properly may be drawn from that evidence.

In recent years, the courts have both accepted and invited greater participation in the justice system by scientists. A special category of evidence, opinion as contrasted with fact, has been developed which permits scientists to offer, as evidence, an opinion. However, across the years, the courts have been frequently troubled with resolving the question of how to determine whether or not an opinion, in a specific instance, has sufficient validity and reliability to be accepted as evidence. Although frequently criticized and understood to be unsatisfactory (e.g. Giannelli, 1980; Imwinkelried, 1992; McCord, 1986) the Frye rule, since its adoption in 1923, has been used by the courts to determine the evidentiary value and therefore, admissibility, of an expert opinion.¹

Understanding how science works is the continuing effort of cliometric metatheoreticians (Meehl, 1992b). This understanding is key to the effective use of science as an adjunct to the truth-seeking process of the courts. Kuhn (1962) introduced a new philosophy of science predicated upon the conceptual scheme of scientific revolutions. This view holds that within a field of science, paradigm shifts occur periodically in which some totally new and different model suddenly emerges and becomes dominant. Before this shift, scientists continue to work out the implications of the prevailing model-what Kuhn calls "normal science." When a scientific revolution occurs and an entirely new model takes over-as when quantum physics superseded Newtonian physics-new questions and problems emerge, new discoveries are made and new techniques are developed. It is just such a paradigm shift-the principle of falsification proposed by Popper (1959)-that forms the basis for a new set of criteria to be applied to decisions about admissibility of scientific expert testimony. The consequence is an analogous revolution of science in the courtroom.

Daubert v. Merrell Dow Pharmaceuticals

A unanimous U.S. Supreme Court decision issued on June 28, 1993 dramatically changes the expectations and roles of scientists who provide testimony as expert witnesses. The Frye rule is now superseded by this decision. The Court's ruling in Daubert v. Merrell Dow Pharmaceuticals² effectively establishes, as the law of the land, the Popperian principle of falsification as the determinant of scientific knowledge.

The initial reactions of the legal profession to the court's decision suggest that the full impact of this decision has not yet been perceived. Both sides in the original dispute viewed the decision as a victory. Commentators do not see that much will change although they expect more litigation over the admissibility of scientific testimony (Reuben, 1993). Such an underestimation of impact is not surprising since it would be unrealistic to expect attorneys and judges to be knowledgeable about the philosophy of science.

In one form or another, most states have adopted the Federal Rules of Evidence as the guide for determining the admissibility of evidence within court proceedings. Consequently, this Supreme Court ruling will have an effect far beyond federal trials. While it may take several years to fully integrate Daubert into case law and precedent, this decision will eventually change, markedly, what is admitted as scientific expert evidence in all trials. The result will be nothing less than a revolutionary paradigm shift that replaces a naive logical positivism with the contemporary understanding of the nature of science. The decision sets forth this momentous shift in this paragraph:

Ordinarily, a key question to be answered in determining whether a theory or technique is scientific knowledge that will assist the trier of fact will be whether it can be (and has been) tested. "Scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified; indeed, this methodology is what distinguishes science from other fields of human inquiry." Green at 645 (1992). See also C. Hempel, Philosophy of Natural Science 49 (1966) ("[T]he statements constituting a scientific explanation must be capable of empirical test"); K. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge 37 (5th ed. 1989) ("[T]he criterion of the scientific status of a theory is its falsifiability, or refutability, or testability") (Daubert, pp. 12-13).

Falsifiability thus specifically replaces the Frye test as the arbiter of what kind of scientific expert witness is admissible. The opinion is based on the judgment that the Federal Rules of Evidence supersede the Frye test. Rule 402 is cited as establishing the baseline for what constitutes admissible evidence:

All relevant evidence is admissible, except as otherwise provided by the Constitution of the United States, by Act of Congress, by these rules, or by other rules prescribed by the Supreme Court pursuant to statutory authority. Evidence which is not relevant is not admissible (Daubert, p. 6).

The Court then specifically applies Rule 702 in its determination of the admissibility of scientific evidence in the resolution of a disputed issue:

If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise (Daubert, p. 8).

The Court then states that the trial judge must initially determine that an expert is proposing to testify regarding ". . . (1) scientific knowledge that (2) will assist the trier of fact to understand or determine a fact in issue" (Daubert, p. 12). While the majority opinion expresses confidence that trial judges have the capacity to determine this question, the partial dissent by Chief Justice Rehnquist concludes:

I defer to no one in my confidence in federal judges; but I am at a loss to know what is meant when it is said that the scientific status of a theory depends on its "falsifiability," and I suspect some of them will be, too (pp. 3-4).

In this instance, Rehnquist is almost assuredly correct. Judges will neither readily understand this principle nor consistently apply it without training and education. Neither judges nor attorneys can acquire an adequate understanding of this paradigm shift in a few hours at a conference seminar. Unfortunately, many psychologists, themselves, have little or no understanding of falsifiability. This may be even more true of psychiatrists and social workers, few of whom will understand this principle and how it changes the rules of the game. Proper and effective assimilation of this revolutionizing construct into the justice system will require a major educational and training effort for all the players involved-scientists, attorneys, and judges.

Consequently, scientists must engage in an educational program to support understanding of this shift and what it means practically in the real world of the courtroom. One such implication is that a major role of scientific testimony may be to proffer an opinion as to the lack of credible scientific data to support a specific claim (Meehl, 1989). To do this properly, the scientist must have a clear understanding of the current state of philosophy of science and be able to communicate this understanding to lay audiences.

Falsifiability

To understand the concept of falsifiability one must recognize the difference between two different meanings of the word "falsifiable" (Popper, 1992, p. xxii). In the first meaning, falsifiable is a logical-technical term related to the criterion of falsifiability. This means that the theory or concept in question must be capable of being falsified-that is, be precise and specific enough to have something count against it. In the second meaning, falsifiable is used in the sense that the theory or concept in question can be "definitively or conclusively or demonstrably falsified (demonstrably falsifiable)."

The first sense refers only to the potential logical possibility of a theory or concept being falsified in principle. The second sense includes a persuasive practical demonstration using scientific procedures to produce a proof of falsity. That is, do credible research studies produce results that mean the theory cannot be maintained? There has been some discussion suggesting that final empirical proofs of falsity cannot be obtained but Popper maintains the potential uncertainty of falsity should not be taken too seriously (Popper, 1992). Thus the principle of falsification provides two basic ways in which a judge may determine that proffered expert testimony does not meet the criterion of falsifiability. The first is whether or not it is, in principle, falsifiable and the second is whether or not it has been falsified.

The practice of bloodletting provides an example that clarifies the two meanings of "falsifiable." For about two hundred years, physicians believed that drawing off blood cured disease. The standard treatment was to open a vein and draw a basin of blood. George Washington died from this treatment. For centuries physicians reasoned that a person who was bled and lived was saved by the procedure. Conversely, a person who died was considered to have been so sick that nothing would have helped anyway. With this line of reasoning, it is impossible to falsify bleeding as a specific treatment for disease. This represents the first meaning of the term "falsifiable." Subsequent research predicated on germ theory and other medical advances eventually proved bloodletting to be a false cure, thus showing the theory was demonstrably falsifiable-the second meaning of the term.

Sir Karl Popper, who advanced the concept of falsifiability and is specifically quoted in the Daubert decision, demonstrates at great length in his two volumes, The Open Society and Its Enemies, that Freudian theory, like Marxism, is unfalsifiable. That Freudian theory has so pervaded psychological and psychiatric theory and thought during this century illustrates the potential of the Daubert decision to fundamentally change what had been traditionally accepted within a field. Popper's critique of Freud draws heavily on the first meaning of the criterion of falsifiability. Freudian theory uses a convoluted conceptual structure that explains all human behavior after the fact. Adherents to Freudian psychoanalytic theory offer authoritative-sounding explanations for all human behaviors, from individual quirks and slips of the tongue to large-scale social phenomena such as religion. It is unscientific because nothing can count against it. There is no point at which it is subject to falsification. This type of theory attempts to provide a post hoc explanation for every possible event; however, it is incapable of predicting any particular event. A successful scientific theory, by contrast, predicts outcomes from a discrete set of events with an ascertainable degree of reliability.

Although Freudian concepts may serve a purpose in psychotherapy, their lack of falsifiability means that psychiatric testimony based on Freudian psychoanalytic concepts should now be inadmissible as scientific evidence. It would hardly be possible for a judge, who must follow this Supreme Court decision, to rule otherwise when the Daubert decision specifically quotes Popper and holds that falsifiability is the determinant of what is scientific, and when that same judge is also confronted with Popper's assessment of Freudian dynamic thought as a premier example of failure to meet the criterion of falsification.

With some variations, American psychiatry is, by and large, Freudian in its orientation. The theories and concepts used by psychiatrists postulate all manner of internal processes and hidden events that cannot be falsified because they are not testable. It is also the case that wherever Freudian theory has been subjected to empirical tests, it has either failed or, at best, been inconclusive as a predictor of human behavior. It could be that the only material about which a Freudian psychiatrist will be permitted to testify is the organic assumptions of chemical imbalance as the origin and cause of emotional distress, and the subject matter of neurology. The use of the American Psychiatric Association's nosology, DSM-III-R (American Psychiatric Association, 1987) may also be questioned on the grounds of its lack of reliability and testability (Kirk & Kutchins, 1992).

Grand theories that are so global, indeterminate, and lacking in precision that they can be used to explain everything and anything are ruled out as science by the principle of falsification. Theories and concepts that are developed to provide emotional security and are not meant to be changed or subjected to empirical evaluation, however comforting they may be, are not scientific and cannot be offered to the courts as such.

There is no shortage of these fuzzy, grand, cosmic explanatory systems. In an era when we are, supposedly, at least somewhat sophisticated about science, all manner of false knowledge abounds. The astrology column's daily horoscope is frequently described as the most read section of the paper. Otherwise intelligent people firmly believe in UFO alien abductions, conspiracy theories, satanic ritual abuse in day care centers, extrasensory perception, and parapsychology. Highly questionable taxonomic entities such as Multiple Personality Disorder-which is most likely a phenomenon generated by the behaviors of the therapist (Aldridge-Morris, 1989; Fahy, 1988; Orne, Dinges, & Orne, 1984; Spanos, Weekes, & Bertrand, 1985; Weissberg, 1993)-are accepted in the official nomenclature. Thought is cheap and any verbally fluent and somewhat intelligent person can come up with an unfalsifiable theory of the universe, a complex, yet untestable, description of human motivations that are plausible sounding formulations used to form postdictions or totally false accounts of prior events.

On the other hand, science is hard. It is imperfect. As Popper says, it is not an episteme, a way of knowing. Real scientists laboriously count and recount, reviewing mounds of data sheets. They do not proffer speculation because they are aware that other, junior scientists will always be collecting more data, looking to challenge the accepted wisdom. Actually producing the theories that serve as scientific explanations of observed phenomena is a difficult task. Nevertheless, the general public can be helped to understand the purpose, principles and methodology of science. It is not necessary to be a working scientist to know what science is.

Attainment of this goal of public education, however, confronts and challenges the tendency of humans to seek confirmation and support of beliefs already held and to ignore or not perceive disconfirming evidence (Arkes & Harkness, 1980; Dawes, 1992; West, 1992). In an experiment comparing physical scientists, psychologists, and fundamentalist clergymen on capacity to use falsification in reasoning, Mahoney and DeMonbreun (1977) report no differences between the groups in reasoning ability and in the generation of confirmatory rather than disconfirmatory experiences to test their hypotheses: "Both scientists and nonscientists showed marked tendencies to confirm rather than disconfirm their hypotheses and, contrary to the popular image, they did not differ in this respect" (p. 236). These findings suggest that an apparent human bias must be overcome for falsification to be understood and used.

The findings about confirmatory bias further suggest that the choice to approach a determination of scientific status by the principle of falsification must be self-consciously and structurally maintained. Even being trained as a scientist does not overcome this confirmatory bias nor guarantee straight thinking (Meehl, 1992b). Therefore, if the decision of the Supreme Court is to be carefully and adequately integrated into American jurisprudence there must be an immediate effort to train judges and attorneys to understand these issues in the philosophy of science and to assimilate carefully, step-by-step, the import of this ruling into case law. Failure in this endeavor could be chaotic and result in a disservice to all parties involved in the resolution of disputes and application of criminal justice.

Expert Testimony in Cases of Alleged Child Sexual Abuse

An area where the principles of falsifiability, testability, and degree of error may have a major impact is the kind of testimony frequently offered by the prosecution in child sexual abuse cases. In case after case, this testimony presents conjecture and speculation as science, and does so in such a way that any observation proves, supports, or is "consistent with" abuse; consequently, it is not falsifiable.

The nature of such testimony-as unfalsifiable-becomes clear when the observations used to support allegations of abuse are delineated. Abuse is supported when:

The child makes any statement which may be broadly interpreted to imply abuse. Since children cannot lie about abuse and cannot talk about things they have not directly experienced, all such statements must be taken at face value, regardless of their face validity or incredibility.

The child initially denies abuse but later discloses after "disclosure therapy." The child will initially deny because the child needs time to overcome shame and embarrassment, develop a trusting relationship with the therapist, and then feel safe enough to disclose the abuse. In the interval, it is perfectly proper to use leading and suggestive questioning, coercion, persuasion, and any other methods of social influence, to assist the child in disclosing the terrible secret.

The child initially acknowledges some abuse, but then recants or retracts earlier statements. Though recanted or retracted, the abuse allegation is still true because the child is under pressure from the perpetrator or the family to recant and is scared. Recantation is described as typical of children who have been abused and, therefore, cannot disprove abuse.

The child "discloses" abuse years after the period encompassed by the allegation. Even given demonstrable adult social influence triggering the disclosure, the assumption is that it is typical of abused children that they will delay disclosure.

The child "discloses" abuse immediately after the period encompassed by the allegation. The child is viewed as overwhelmed by the abuse, or is too fearful to maintain the secret, or has found him/herself in a "safe" situation within which to disclose the abuse.

The medical examination finds genital "trauma" indicative of sexual abuse. Medical findings of genital "trauma" may or may not take into consideration the baseline rates of similar genital findings in non-abused children.

The medical examination finds no genital "trauma" which, given the frequency of fondling and non-penetrative sexual contact as the predominant act of sexual abuse, is consistent with the child having been abused.

The child is calm and cooperative during a genital examination. Such a child is viewed as experienced in having the genitals touched and examined by virtue of the abusive experience (i.e., has been desensitized to genital contact).

The child struggles and resists the genital examination, or becomes emotionally distraught by the exam. Such a child has previously been traumatized by sexual abuse leading to reluctance to having anyone else touch or examine the genitals.

In the absence of physical findings, the adult accompanying the child to the medical exam informs the physician of the abuse history. The physician, though finding no physical evidence of sexual abuse, renders the conclusion, "abuse by history," as a result of the report of abuse by the accompanying adult.

The same inability to falsify the claim of abuse may be found in the interpretation of the behavior of the person accused. Thus abuse is supported when:

The accused proclaims innocence. Such individuals are in denial regarding the abuse.

The accused passes the polygraph or penile plethysmograph. Such individuals are in even greater denial regarding the abuse.

The accused shows little or no emotion when being confronted or questioned about the abuse. The accused is thus either in denial, is sociopathic or is a master manipulator and has little concern for society's mores and values.

The accused becomes emotional or tearful when confronted or questioned about the abuse. The accused is overwhelmed by guilt and disgust over his behavior.

The accused becomes angry or defiant when confronted or questioned about abuse. The accused is defending against his actions by projecting blame and responsibility onto others.

The accused requests to speak to an attorney before questioning. If the accused had nothing to hide, an attorney would not be needed.

The accused cooperates with the interrogator and, in attempting to identify the source of the abuse accusation, talks about innocuous or ambiguous behavior. The statements of the accused are viewed as confessions as he is struggling to admit the true nature of the abusive acts.

As this sampling of typical testimony illustrates, almost any circumstance, behavior or observation can be rationalized as supporting the conclusion that sexual abuse occurred. What makes such testimony, and its underlying theory, not falsifiable is the fact that there is no circumstance, behavior, or observation which could be used to conclude that abuse did not occur. Consequently, there are no circumstances under which one could endeavor to prove the underlying theory false.

The widely disseminated lists of behavioral indicators, many of which are contradictory, are frequently offered as evidence to support the accusation of abuse. In testimony, all manner of behaviors are declared to be typical of abused children, all absent scientific evidence to support the claims. Depression or mania, hyperactivity or hypoactivity, social aggression or social withdrawal, heightened modesty or no modesty, poor hygiene or excessive concern about cleanliness, overly compliant or oppositional behavior: all are offered as evidence of abuse. This is done in spite of the fact that reasoning backward in time from observed symptoms to some prior entity or event is to commit the logical error of affirming the consequent. There are no behavioral indicators, including the absence of any problem behaviors, that can falsify abuse.

Consequently, in "disclosure therapy" a child may draw scribbled shapes on paper and use black or red crayons. These observations may support a therapist's testimony that these are the colors used by children who have been sexually abused and that abused children typically include phallic shapes in their drawings-despite the fact that most children's drawings include elongated shapes which might be interpreted, by one so inclined, as phallic symbols. In hundreds of cases reviewed by the authors, there has never been an instance of a child's drawings being interpreted as supporting the absence of abuse.

To illustrate the extent of the rationalizations which must be offered in support of an abuse conclusion in the absence of evidence-and thus to typify a nonfalsifiable, unscientific theory-a young girl suspected of having been abused drew a picture of herself and her sister with their father. All were smiling and the two girls' arms were raised in the air. The child's description of the behavior reflected in the drawing was that they were cheering at a game. However, the therapist testified, in spite of the child's explanation, that the upraised arms meant the girls were crying for help. She interpreted this as indicating that the child had, in fact, been abused, despite the fact that the child denied abuse.

Testimony of this sort is, in the truest sense, quixotic communication. Don Quixote provides an example of a theory that cannot be disconfirmed. Don Quixote, when confronted by Sancho Panza with disconfirming evidence about Mambrino's helmet, transforms the contradiction into a verification (Lee, 1987). To Don Quixote the helmet is miraculous while to Sancho Panza it is an ordinary barber's basin. Don Quixote reasons to himself as follows:

Mambrino's helmet, that object of immense value, appear[s] to everyone a barber's basin, thus protecting its owner from persecution by all those who would understand its true meaning (Lee, p. 555).

By this line of reasoning all contrary assertions are transformed by mental alchemy into evidence that supports Don Quixote's view and thus nothing appears unexplained, troublesome, or puzzling. Everything proves the reasoner right, and so the subjective effect is to make the erroneous belief stronger and more certain.

When the principle of falsification is properly applied to the courtroom, this kind of testimony should be declared inadmissible and not helpful to the finder of fact. It cannot assist in arriving at the most accurate decision possible. It can only increase the level and magnitude of the error generated.

Syndrome Evidence

There are a number of psychological syndromes about which experts testify in a wide variety of civil and criminal litigation which may not be testable or falsifiable when subjected to the analysis now required by Daubert. Myers (1993) notes that both diseases and syndromes share the medically and forensically important feature of diagnostic value. Both point, with varying degrees of certainty, to particular causes. However, whereas the relationship between symptoms and etiology is clear with many diseases, this relationship is often unclear or unknown with respect to syndromes. The certainty with which a syndrome points to a particular cause varies with the syndrome.

Myers (1993) discusses the difference between two syndromes often offered in expert testimony in cases of alleged child abuse. The battered child syndrome has high certainty since a child with the symptoms of the syndrome is very likely to have suffered nonaccidental injury. In this syndrome, research evidence has accumulated which demonstrates that nonaccidental injuries can be successfully discriminated from accidental injuries by the nature of the injuries. The predictions from this theory, therefore, meet the criterion of falsifiability of the Daubert decision and consequently, evidence regarding this syndrome has high probative value and, in fact, has been approved by every appellate court to consider it.

This may be contrasted with the child sexual abuse accommodation syndrome (CSAAS) which does not point with any certainty to sexual abuse. The fact that a child shows behaviors of the CSAAS does not help determine whether the child was sexually abused since observation of those behaviors does not allow one to reliably discriminate the child who has been abused from a child who has not-both may share similar symptoms. The CSAAS is a nondiagnostic syndrome.
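
The difference between a diagnostic and a nondiagnostic sign can be illustrated with Bayes' theorem. The sketch below is only an illustration: the prior and the two pairs of conditional probabilities are hypothetical numbers chosen for the example, not figures reported by Myers (1993) or by any study cited here. It shows that when a sign is equally likely in abused and non-abused children, observing it leaves the probability of abuse exactly where it started.

```python
def posterior_probability(prior: float,
                          p_sign_if_abused: float,
                          p_sign_if_not_abused: float) -> float:
    """Bayes' theorem: probability of abuse given that the sign is observed."""
    numerator = p_sign_if_abused * prior
    denominator = numerator + p_sign_if_not_abused * (1.0 - prior)
    return numerator / denominator


# All numbers below are hypothetical and serve only to illustrate the logic.
prior = 0.10

# A sign that is common among affected children and rare among others.
diagnostic = posterior_probability(prior, 0.80, 0.02)

# A sign that is equally common in both groups (likelihood ratio of 1).
nondiagnostic = posterior_probability(prior, 0.40, 0.40)

print(f"prior probability:             {prior:.2f}")
print(f"after a diagnostic sign:       {diagnostic:.2f}")
print(f"after a nondiagnostic sign:    {nondiagnostic:.2f}")
```

Under these assumed values the diagnostic sign raises the probability substantially, while the nondiagnostic sign returns the prior unchanged. That is the sense in which a nondiagnostic syndrome cannot help the finder of fact.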

Despite the lack of probative value of the CSAAS, it has been frequently offered by prosecutors as substantive evidence of sexual abuse. However, it does not meet the test of falsifiability when used to support abuse since there is nothing that can count against it. Therefore, Daubert should lead to the judicial decision that use of the CSAAS is inadmissible.

Repression and Claims of Recovered Memories

A careful application of the Daubert decision within the justice system may well also result in expert testimony supporting claims of recovered repressed memories of childhood abuse being declared inadmissible. This type of case provides an example of the second meaning of the criterion of falsifiability. Repression, a Freudian theoretical concept, has been falsified (Bower, 1990; Garry & Loftus, 1993; Holmes, 1990; Wakefield & Underwager, 1992). Although proponents of recovered repressed memories offer three studies to support a claim of repression (Briere & Conte, 1989; Herman & Schatzow, 1987; Williams, 1992), none of them really assesses repression, nor does any of them provide credible scientific evidence.

Faced with the massive weight of over 60 years of research that falsifies the concept of repression, a reasonable judge must rule that testimony based upon the concept is not scientific, cannot be relevant or helpful to the finder of fact, and therefore, it is not admissible. Such rulings would make it impossible for civil suits seeking monetary damages based upon a claim of recovered repressed memories to be pursued. It should result in the repeal of laws that have been passed in a number of states that essentially permit legal actions based on claims of recovered repressed memory whenever abuse is remembered or subsequent to the alleged victim claiming to have recognized that they were damaged by alleged childhood abuse. Any criminal convictions based upon evidence or testimony that derives from a claim of a recovered repressed memory should be reversed and remanded for a new trial or dismissed.

Issues in Psychology as a Science

Popperian falsifiability makes psychology much more vulnerable to the impact of data falsifying its theories than, for example, physics. Meehl (1967, 1978) argues that the dominant method of assessing research in psychology, significance testing, is nothing more than statistical games and that such tests are very soft and weak measures. Meehl (1978) maintains that "the null hypothesis, taken literally, is always false" (p. 822). Furthermore, when one study produces a correlation of .50 and another a correlation of .34, the difference may be statistically significant yet make very little difference in the real world (Meehl, 1978). Therefore, support for theories subjected to statistical significance testing is weak while the meaning of a falsification is much more decisive (Bowers, 1977; Dar, 1987).
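
The weakness of significance testing described above can be sketched numerically. The example below is an illustration under stated assumptions, not a reconstruction of Meehl's own analysis: it uses the standard Fisher z approximation for correlations, and the sample sizes (100, 10,000, and 500 per study) and the trivial correlation of .03 are hypothetical. Only the values .50 and .34 come from the text.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_value_for_correlation(r: float, n: int) -> float:
    """Test the null hypothesis rho = 0 with the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return two_sided_p(z)


def p_value_for_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """Test whether two independent correlations differ (Fisher z)."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / standard_error
    return two_sided_p(z)


# Hypothetical: the same trivial correlation at two sample sizes.
p_small_sample = p_value_for_correlation(0.03, 100)
p_large_sample = p_value_for_correlation(0.03, 10_000)

# The .50 and .34 correlations from the text, with an assumed n of 500 each.
p_difference = p_value_for_difference(0.50, 500, 0.34, 500)

print(f"r = .03, n = 100:     p = {p_small_sample:.4f}")
print(f"r = .03, n = 10,000:  p = {p_large_sample:.4f}")
print(f".50 vs .34, n = 500:  p = {p_difference:.4f}")
```

With these assumed sample sizes, a correlation too small to matter in practice fails to reach the .05 level in the small sample and passes it in the large one. The outcome of the test therefore reflects sample size as much as it reflects the theory under examination.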

Schmidt (1992) describes the great difficulty in getting psychologists to move away from significance testing to a more useful and accurate approach. He asserts that significance testing has led to serious errors in interpreting the meaning of data and retarded the accumulation of knowledge. Meehl's basic criticism is that most journal articles reviewing an area of psychology consist of a listing of all the relevant studies located within some set of specified parameters and then counting those for and those against a particular proposition (Hedges & Olkin, 1980). This nose counting is preposterous, according to Meehl (1978), because by the principle of falsification a single refutation is far more powerful than multitudinous corroborations:

(T)he whole idea of simply counting noses is wrong because a theory that has seven facts for it and three facts against it is not in good shape, and it would not be considered so in any developed science (p. 823).

This argument is essential in dealing with courtroom advocates and judges who may be influenced by a nose counting or box score approach. That professionals are so easily influenced by this type of thinking may be due to the fact that most relatively bright and competent people have learned in school that science proceeds by the inductive method of making observations, testing them, and accepting the results as proof of the hypothesis. This naive view is no longer even partially correct as a description of science. The conceptual advance in the philosophy of science represented by Popperian thought in the principle of falsification simply has not caught on in society at large.

Psychology is responding to these criticisms by developing the methods of meta-analysis and effect size. Meta-analysis (Meehl, 1992a; Mullen et al., 1985) has demonstrated the weakness of individual research studies and pointed out the problems of sampling error, measurement error, and other artifacts in individual studies. Contrary to popular belief, no single study can resolve an issue or answer a question. "Only meta-analytic analysis across studies can control chance and other statistical and measurement artifacts and provide a trustworthy foundation for conclusions" (Schmidt, 1992, pp. 1179-1180). Effect size is a measure of the magnitude of a finding; together with sample size it determines statistical power, the probability that an investigation will lead to statistically significant results. Whereas meta-analysis has been accepted and is being more widely used, effect size and power have been relatively ignored in the conduct of psychological research (Cohen, 1992; Strube, 1985). Both of these approaches to the interpretation of data will be necessary to assess the impact of the Supreme Court's decision on the admissibility of expert testimony based on the science of psychology.
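How pooling across studies controls for sampling error can be sketched with a common bare-bones procedure: a sample-size-weighted mean correlation, compared against the variance that sampling error alone would be expected to produce. This is one simple form of meta-analysis, not necessarily the procedure used in any work cited here, and the three (correlation, sample size) pairs are hypothetical.

```python
def bare_bones_meta_analysis(studies):
    """studies: list of (r, n) pairs, one per study.

    Returns (weighted mean r, observed variance of r,
             variance expected from sampling error, residual variance).
    """
    total_n = sum(n for _, n in studies)
    mean_r = sum(r * n for r, n in studies) / total_n
    observed_var = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    average_n = total_n / len(studies)
    # Approximate sampling-error variance of a correlation at the average n.
    sampling_error_var = (1 - mean_r ** 2) ** 2 / (average_n - 1)
    residual_var = max(observed_var - sampling_error_var, 0.0)
    return mean_r, observed_var, sampling_error_var, residual_var


# Hypothetical studies that appear to disagree when read one at a time.
hypothetical_studies = [(0.50, 40), (0.34, 60), (0.20, 100)]
mean_r, observed_var, sampling_error_var, residual_var = bare_bones_meta_analysis(
    hypothetical_studies
)

print(f"weighted mean r:         {mean_r:.3f}")
print(f"observed variance:       {observed_var:.4f}")
print(f"sampling-error variance: {sampling_error_var:.4f}")
print(f"residual variance:       {residual_var:.4f}")
```

In this made-up example most of the spread among the three correlations is what sampling error alone would produce, so the apparent disagreement between studies says little about the underlying relationship.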

When attempting to deal with scientific testimony that is counter to a proposition advanced in the courtroom, attorneys may ask if there are other studies that contradict that testimony. When the accurate answer is that there are contradictory studies, the questioning stops and the advocate thinks the witness has been impeached and the impact of the scientific testimony lessened or removed. When a judge, who may have a naive view of scientific research, is confronted with opposing experts, the judge may come to view the process as an undesirable battle of the experts rather than as a fact finding process. The result may be that relevant scientific testimony is not admitted since the judge may assume that all studies are on the same footing and have equal validity and reliability. Then one simply adds up the box score to see which side has the greatest number of corroborative studies and that is the winner. This approach to science ignores the existence of both good and bad research, a fact every scientist is aware of. Not all studies are equal, and bad science does not meet the minimal standards of research design, execution, and replication. The box score approach also ignores the greater significance that must be given to falsification.

The Supreme Court decision in Daubert remedies this misunderstanding and makes the box score approach unacceptable. No matter how many corroborative studies there are, a single well-done and credible study that falsifies the theory, concept, or technique may outweigh them all and make the corroborative material inadmissible.

A review of all the briefs, motions, and amicus curiae briefs submitted to the court in this case strongly suggests that the decision is primarily based on the Brief of the Carnegie Commission on Science, Technology, and Government as Amicus Curiae in Support of Neither Party (Berger, Gallagher, & Esty, 1992). This is the only brief that refers to the principles of falsifiability, testability, and replication. The framework this brief suggests as a replacement for the Frye rule is the one adopted by the Supreme Court in its decision.

This brief also suggests the inclusion of the degree of error as a factor for judges to consider when evaluating the scientific quality of claims for admission as testimony. The brief advises the court that a judge may have to consider study design, data collection, or error rate to determine whether the methodology used was so skewed as to justify exclusion. The decision includes the language:

Additionally, in the case of a particular scientific technique, the court ordinarily should consider the known or potential rate of error . . . and the existence and maintenance of standards controlling the technique's operation (Berger, Gallagher, & Esty, 1992, pp. 14-15).

For example, if this factor is considered by judges and applied in a rational manner, much of the testimony based on a medical examination for evaluation of a child sexual abuse allegation will be inadmissible. The best scientific evidence that gives an indication of the potential error rate for medical examinations concludes that the error rate when a physician claims a medical examination supports penile penetration is 63%, for digital penetration 73%, and for a general conclusion of abuse over 70% (Paradise, 1989; Zeitlin, 1987). Such a high error rate, always in the direction of false positives, can only confuse the entire process.

The Supreme Court's inclusion of error rate as a factor in assessing the admissibility of evidence opens the door to the scientific analyses of the error rates of the entire system of child protection, law enforcement, and the justice system in dealing with allegations of child sexual abuse. Any decision-making structure built on error is liable to produce error at an indeterminate, unrecognized, but significant level that causes harm (Gambrill, 1990). Evaluation of the extent of error and thus the potential for harm may best be done by the application of Bayes Theorem (Fischhoff & Beyth-Marom, 1983). Here, across at least 26 years, every scientist who has analyzed the error rate of the decision making process has concluded that the error is always in the direction of an unacceptable level of false positives. The lowest ratio is 3 false positives to every true positive while the highest is an astonishing 2000 to 1 (Horner, 1992).

The Bayes approach rests upon the principle that the degrees of belief in an ideally rational person conform to the mathematical principles of probability theory (Horwich, 1982). This concept should be acceptable to a scientifically trained person or to anyone who respects science. A number of scientists have applied Bayesian inference to child sexual abuse in the interest of assessing the level of error and type of error produced by the system. Every Bayesian analysis of the decisions made by the child abuse system that we have found concludes that the most probable and most frequent type of error is false positive, that is, identifying an individual as abused or an abuser when it is not true (e.g., Altemeier, O'Connor, Vietze, Sandler, & Sherrod, 1984; Caldwell, Bogat, & Davidson, 1988; Gambrill, 1990; Horner, 1992; Horner & Guyer, 1991a, 1991b; Kotelchuck, 1982; Milner, Gold, Ayoub, & Jacewitz, 1984; Paradise, 1989; Realmuto, Jensen, & Wescoe, 1990; Starr, 1979; Wakefield & Underwager, 1988; Zeitlin, 1987). This is true even when a 95% accuracy level for the decision making is assumed, as Gambrill (1990) does. Starr (1979) assumes a procedure that is 83% accurate in correctly identifying abusive situations. Still, he reports a ratio of 20 false positives to one true positive.
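The arithmetic behind ratios of this kind follows directly from Bayes' theorem and can be sketched in a few lines. The base rate of 1% used below is a hypothetical value, not a figure given in this article, and treating "accuracy" as both the sensitivity and the specificity of the procedure is likewise an assumption made for illustration.

```python
def false_to_true_positive_ratio(base_rate, sensitivity, specificity):
    """Expected number of false positives for every true positive."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return false_positives / true_positives


# Hypothetical base rate of 1%; accuracy figures as discussed above,
# read here as both sensitivity and specificity (an assumption).
ratio_at_83 = false_to_true_positive_ratio(0.01, 0.83, 0.83)
ratio_at_95 = false_to_true_positive_ratio(0.01, 0.95, 0.95)

print(f"83% accurate: {ratio_at_83:.1f} false positives per true positive")
print(f"95% accurate: {ratio_at_95:.1f} false positives per true positive")
```

Under these assumptions an 83% accurate procedure yields roughly 20 false positives for each true positive, and even a 95% accurate procedure yields more false positives than true ones. The lower the base rate, the worse the ratio becomes.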

Lindsay and Read (1993) apply Bayesian inference to the issue of recovered repressed memories of childhood sexual abuse and report that, even using the most extreme numbers suggested by the proponents of recovered repressed memories and assuming an unrealistic accuracy of diagnosis of 90%, one-third of the decisions that the memories are accurate are going to be false positives. Going to 80% accuracy, still wildly unrealistic for any real world diagnosis, means 56% of the diagnoses of repressed memories would be wrong.
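The sensitivity of such figures to the assumed accuracy can be explored with a short sketch. The inputs Lindsay and Read actually used are not given here, so the code works backward: it assumes that "accuracy" applies equally to true and false cases, solves for the base rate that would make one-third of positive decisions false at 90% accuracy, and then recomputes the share at 80% accuracy. Both function names and the symmetric-accuracy reading are assumptions for illustration.

```python
def false_positive_share(base_rate, accuracy):
    """Fraction of positive decisions that are false positives."""
    true_pos = base_rate * accuracy
    false_pos = (1 - base_rate) * (1 - accuracy)
    return false_pos / (true_pos + false_pos)


def implied_base_rate(share, accuracy):
    """Base rate at which false positives make up `share` of positive decisions."""
    wrong = (1 - accuracy) * (1 - share)
    return wrong / (share * accuracy + wrong)


# Work backward from "one-third false positives at 90% accuracy".
base_rate = implied_base_rate(1 / 3, 0.90)
share_at_80 = false_positive_share(base_rate, 0.80)

print(f"implied base rate:              {base_rate:.3f}")
print(f"false-positive share at 80%:    {share_at_80:.3f}")
```

Under these assumptions the share of false positives at 80% accuracy rises above one half, in the same range as the figure reported above. Exact agreement should not be expected, because the manuscript's own inputs may differ from the symmetric model assumed here.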

This information on the error rates is established by the Supreme Court as a vital factor in assessing the scientific nature of proffered testimony. It can hardly be ruled that this is not relevant to the finder of fact in weighing and assessing the evidence presented. Even if a judge should ignore the error rate of the entire system and permit testimony by social workers, mental health professionals, and law enforcement agents, the error rate of the system is now relevant to the jury in weighing that evidence.

Although we have selected psychology as the science to discuss relative to the Daubert decision and have examined the effect of this decision on the evidence we are familiar with in claims of childhood sexual abuse, the same basic principles will apply to all sciences and to all trials in which scientific evidence may be a factor. The same situation will pertain in physics, biology, economics, astronomy, sociology, chemistry, and the practice of medicine.

Conclusion

The first step for any advocate and judge is to understand the nature of the criterion of falsifiability and this means understanding the philosophy of science. Understanding the implications of the Daubert decision will most likely require significant effort since this is a revolutionary change that shifts the entire enterprise into new and untried ground. The second step is applying the decision to specific cases. The process of working out the implications of this decision will likely take many years, countless cases, and a multitude of confused attorneys and irate judges. Change never comes easily and the justice system may be more resistant to change than many other institutions. But change it must.

One thing is certain. Scientists who offer expert testimony in the courtroom need to be knowledgeable and skilled in dealing with the philosophy of science and the issues raised by the establishment of the criterion of falsifiability as the determinant of science.

References

Aldridge-Morris, R. (1989). Multiple Personality: An Exercise in Deception. Hillsdale: Lawrence Erlbaum Associates.

Altemeier, W. A., O'Connor, S., Sherrod, K. B., Tucker, D., & Vietze, P. (1986). Outcome of abuse during childhood among pregnant low income women. Child Abuse & Neglect, 10, 319-330.

American Psychiatric Association (1987). DSM-III-R: Diagnostic and Statistical Manual of Mental Disorders (Third Edition-Revised). Washington, DC: Author.

Arkes, H. R., & Harkness, A. R. (1980). Effects of making a diagnosis on subsequent recognition of symptoms. Journal of Experimental Psychology: Human Learning and Memory, 6, 568-575.

Berger, M. A., Gallagher, S. G., & Esty, E. H. (December 2, 1992). No. 92-102 in the Supreme Court of the United States, October term, 1992. Daubert et al. v. Merrell Dow Pharmaceuticals. Brief of the Carnegie Commission on Science, Technology, and Government as Amicus Curiae in Support of Neither Party.

Bowers, K. S. (1977). Science and the limits of logic: A response to the Mahoney-Demonbreun paper. Cognitive Therapy and Research, 1, 239-246.

Bower, G. H. (1990). Awareness, the unconscious, and repression: An experimental psychologist's perspective. In J. L. Singer (Ed.), Repression and Dissociation: Implications for Personality Theory, Psychopathology and Health (pp. 209-231). Chicago: The University of Chicago Press.

Briere, J., & Conte, J. (1989, August). Amnesia in adults molested as children: Testing theories of repression. Paper presented at the Annual Meeting of the American Psychological Association, New Orleans, LA.

Caldwell, R. A., Bogat, G. A., & Davidson, W. S. (1988). The assessment of child abuse potential and prevention of child abuse and neglect: A policy analysis. American Journal of Community Psychology, 16, 609-624.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.

Dar, R. (1987). Another look at Meehl, Lakatos, and the scientific practices of psychologists. American Psychologist, 42, 145-151.

Dawes, R. M. (1992). Why believe that for which there is no good evidence? Issues in Child Abuse Accusations, 4(4), 214-218.

Fahy, T. A. (1988). The diagnosis of multiple personality disorder: A critical review. British Journal of Psychiatry, 153, 597-606.

Fischhoff, B., & Beyth-Marom, R. (1983). Hypothesis evaluation from a Bayesian perspective. Psychological Review, 90, 239-260.

Gambrill, E. (1990). Critical Thinking in Clinical Practice. San Francisco: Jossey-Bass Publishers.

Garry, M., & Loftus, E. F. (1993, April). Women who remember too much. Paper presented at the False Memory Syndrome Foundation conference, Valley Forge, PA.

Giannelli, P. C. (1980). The admissibility of novel scientific evidence: Frye v. United States, a half-century later. Columbia Law Review, 80, 1197-1250.

Green, (1992). Expert witnesses and sufficiency of evidence in toxic substances litigation: The legacy of Agent Orange and Bendectin litigation, 86 Nw U.L. Rev. (Cited in William Daubert, et ux., etc., et al., Petitioners v Merrell Dow Pharmaceutical, Inc., Supreme Court of the United States, No. 92-102, decided on 6/28/93.)

Hedges, L. V. & Olkin, I. (1980). Vote counting methods in research synthesis. Psychological Bulletin, 88, 359-369.

Hempel, C. (1966). Philosophy of natural science. (Cited in William Daubert, et ux., etc., et al., Petitioners v Merrell Dow Pharmaceutical, Inc., Supreme Court of the United States, No. 92-102, decided on 6/28/93.)

Herman, J. L., & Schatzow, E. (1987). Recovery and verification of memories of childhood sexual trauma. Psychoanalytic Psychology, 4(1), 1-14.

Holmes, D. S. (1990). The evidence for repression: An examination of sixty years of research. In J. L. Singer (Ed.), Repression and Dissociation: Implications for Personality Theory, Psychopathology and Health (pp. 85-102). Chicago: The University of Chicago Press.

Horner, T. M. (1992). Expertise in regard to determinations of child sexual abuse. Unpublished manuscript.

Horner, T. M., & Guyer, M. J. (1991a). Prediction, prevention, and clinical expertise in child custody cases in which allegations of child sexual abuse have been made: I. Predictable rates of diagnostic error in relation to various clinical decision making strategies. Family Law Quarterly, 25, 217-252.

Horner, T. M., & Guyer, M. J. (1991b). Prediction, prevention, and clinical expertise in child custody cases in which allegations of child sexual abuse have been made: II. Prevalence rates of child sexual abuse and the precision of 'tests' constructed to diagnose it. Family Law Quarterly, 25, 381-409.

Imwinkelried, E. J. (1992). Attempts to limit the scope of the Frye standard for the admission of scientific evidence: Confronting the real cost of the general acceptance test. Behavioral Sciences and the Law, 10, 441-454.

Kirk, S. A., & Kutchins, H. (1992). The Selling of DSM: The Rhetoric of Science in Psychiatry. New York: Aldine De Gruyter.

Kotelchuck, M. (1982). Child abuse and neglect: Prediction and misclassification. In R. H. Starr, Jr. (Ed.), Child Abuse Prediction: Policy Implications (pp. 67-104). Cambridge, MA: Ballinger.

Kuhn, T. S. (1962). The Structure of Scientific Revolutions. Chicago: The University of Chicago Press.

Lee, A. S. (1987, June). Quixotic communication: The case of expert witness testimony. Knowledge: Creation, Diffusion, Utilization, 8(4), 549-585.

Lindsay, D. S., & Read, J. D. (1993). Psychotherapy and memories of childhood sexual abuse: A cognitive perspective. Unpublished manuscript.

Mahoney, M. J., & DeMonbreun, B. G. (1977). Psychology of the scientist: An analysis of problem-solving bias. Cognitive Therapy and Research, 1, 229-238.

McCord, D. (1987). Syndromes, profiles and other mental exotica: A new approach to the admissibility of nontraditional psychological evidence in criminal cases. Oregon Law Review, 66, 19-108.

Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103-115.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.

Meehl, P. E. (1989). Law and the fireside inductions (with postscript): some reflections of a clinical psychologist. Behavioral Sciences & the Law, 7, 521-550.

Meehl, P. E. (1992a). The miracle argument for realism: An important lesson to be learned by generalizing from Carrier's counter-examples. [Monograph]. Studies in History and Philosophy of Science, 23(2), 267-282.

Meehl, P. E. (1992b). Cliometric metatheory: The actuarial approach to empirical, history-based philosophy of science. Psychological Reports Monographs, 1-V71.

Milner, J. S., Gold, R. G., Ayoub, C., & Jacewitz, M. M. (1984). Predictive validity of the child abuse potential inventory. Journal of Consulting and Clinical Psychology, 52, 879-884.

Mullen, B., Atkins, J. L., Champion, D. S., Edwards, C., Hardy, D., Story, J. E., & Vanderklok, M. (1985). The false consensus effect: A meta-analysis of 115 hypothesis tests. Journal of Experimental Social Psychology, 21, 262-283.

Myers, J. E. B. (1993). Expert testimony describing psychological syndromes. Pacific Law Journal, 24, 1449-1464.

Orne, M. T., Dinges, D. F., & Orne, E. C. (1984). On differential diagnosis of multiple personality in the forensic context. International Journal of Clinical and Experimental Hypnosis, 32(2), 118-169.

Paradise, J. E. (1989). Predictive accuracy and the diagnosis of sexual abuse: A big issue about a little tissue. Child Abuse & Neglect, 13, 169-176.

Popper, K. (1959). The Logic of Scientific Discovery. London: Hutchinson.

Popper, K. R. (1992/1956/1983). Realism and the Aim of Science. New York: Routledge.

Popper, K. (1989). Conjectures and refutations: The growth of scientific knowledge, 5th (Cited in William Daubert, et ux., etc., et al., Petitioners v Merrell Dow Pharmaceutical, Inc., Supreme Court of the United States, No. 92-102, decided on 6/28/93.)

Popper, K. The Open Society and Its Enemies. Lawrenceville, NJ: Princeton University Press.

Realmuto, G., Jensen, J., & Wescoe, S. (1990). Specificity and sensitivity of sexually anatomically correct dolls in substantiating abuse: A pilot study. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 743-746.

Reuben, R. C. (1993, June 29). Justices adopt new scientific evidence test. Los Angeles Daily Journal, pp. 1, 10.

Schmidt, F. L. (1992). What do data really mean? American Psychologist, 47, 1173-1181.

Spanos, N. P., Weekes, J. R., & Bertrand, L. D. (1985). Multiple personality: A social psychological perspective. Journal of Abnormal Psychology, 94, 362-376.

Starr, R. H. (1979). Child abuse. American Psychologist, 34, 872-878.

Strube, M. J. (1985). Power analysis for combining significance levels. Psychological Bulletin, 98, 595-599.

Wakefield, H., & Underwager, R. (1988). Accusations of Child Sexual Abuse. Springfield, IL: CC Thomas.

Wakefield, H., & Underwager, R. (1992). Uncovering Memories of Alleged Sexual Abuse: The Therapists Who Do It. Issues in Child Abuse Accusations, 4(4), 197-213.

Weissberg, M. (1993). Multiple personality disorder and iatrogenesis: The cautionary tale of Anna O. The International Journal of Clinical and Experimental Hypnosis, 41(1), 15-32.

West, R. (1992). Assessment of evidence versus consensus or prejudice. Journal of Epidemiology & Community Health, 46(4), 321-322.

Williams, L. M. (1992). Adult memories of childhood abuse: Preliminary findings from a longitudinal study. APSAC Advisor, Summer, pp. 10-21.

Zeitlin, H. (1987, October 10). Investigation of the sexually abused child. The Lancet, pp. 842-845.

¹ Frye v. United States, 293 F. 1013, 1014, a 1923 decision of the United States Court of Appeals for the D.C. Circuit. Under the Frye test a scientific technique is not admissible unless the technique is "generally accepted" in the scientific community. Giannelli (1980) notes that the Frye rule envisions a process by which a novel technique must pass through an "experimental" stage where it is scrutinized by the scientific community. Only after it has passed successfully through this process and has entered into the "demonstrable" stage can it be admissible. Under the Frye rule it is not enough that a qualified expert or experts believe the technique is valid and reliable; it must be generally accepted by the relevant scientific community.

² William Daubert, et ux., etc., et al., Petitioners v Merrell Dow Pharmaceutical, Inc., Supreme Court of the United States, No. 92-102, decided on 6/28/93.

^* Ralph Underwager and Hollida Wakefield are psychologists at the Institute for Psychological Therapies, 5263 130th Street East, Northfield, MN 55057-4880.
