A Paradigm Shift for Expert Witnesses
Ralph Underwager and Hollida Wakefield*
ABSTRACT: The recent United States Supreme Court decision in Daubert v. Merrell Dow Pharmaceuticals dramatically changes the criteria by which scientific testimony is admitted as evidence in court. The unanimous ruling states that the criterion of the scientific status of a theory is its falsifiability, refutability, or testability. This, in effect, replaces the Frye test with the Popperian principle of falsification as the determinant of scientific knowledge. If properly understood and followed, this ruling is likely to render inadmissible testimony based on such concepts and theories as the child sexual abuse accommodation syndrome and claims that childhood sexual abuse has been repressed.

A nation's justice system determines factual issues that cannot be settled in any other forum. In this search for truth, the courts have, over the centuries, developed policies and methods for evaluating arguments and evidence. These methods and policies permit the court to deal with human events within a social, political, and historical context. Although neither perfect nor totally reliable, the justice system has the virtue of being highly flexible, allowing the legal professionals who function within it to interpret a great deal of evidence and resolve a wide variety of questions. Consequently, no other profession has thought as much about the nature of evidence and the inferences that properly may be drawn from that evidence.

In recent years, the courts have both accepted and invited greater participation in the justice system by scientists. A special category of evidence, opinion as contrasted with fact, has been developed that permits scientists to offer an opinion as evidence. Across the years, however, the courts have frequently been troubled by the question of how to determine whether an opinion, in a specific instance, has sufficient validity and reliability to be accepted as evidence. Although frequently criticized and understood to be unsatisfactory (e.g., Giannelli, 1980; Imwinkelried, 1992; McCord, 1986), the Frye rule, since its adoption in 1923, has been used by the courts to determine the evidentiary value, and therefore the admissibility, of an expert opinion.¹

Understanding how science works is the continuing effort of cliometric metatheoreticians (Meehl, 1992b). This understanding is key to the effective use of science as an adjunct to the truth-seeking process of the courts. Kuhn (1962) introduced a new philosophy of science predicated upon the conceptual scheme of scientific revolutions. This view holds that, within a field of science, paradigm shifts occur periodically in which a totally new and different model suddenly emerges and becomes dominant. Before such a shift, scientists continue to work out the implications of the prevailing model, which Kuhn calls "normal science." When a scientific revolution occurs and an entirely new model takes over (as when quantum physics superseded Newtonian physics), new questions and problems emerge, new discoveries are made, and new techniques are developed. It is just such a paradigm shift, the principle of falsification proposed by Popper (1959), that forms the basis for a new set of criteria to be applied to decisions about the admissibility of scientific expert testimony. The consequence is an analogous revolution of science in the courtroom.

Daubert v. Merrell Dow Pharmaceuticals
A unanimous U.S. Supreme Court decision issued on June 28, 1993, dramatically changes the expectations and roles of scientists who provide testimony as expert witnesses. The Frye rule is now superseded by this decision. The Court's ruling in Daubert v. Merrell Dow Pharmaceuticals² effectively establishes, as the law of the land, the Popperian principle of falsification as the determinant of scientific knowledge.

The initial reactions of the legal profession to the Court's decision suggest that its full impact has not yet been perceived. Both sides in the original dispute viewed the decision as a victory. Commentators do not expect much to change, although they do expect more litigation over the admissibility of scientific testimony (Reuben, 1993). Such an underestimation of impact is not surprising, since it would be unrealistic to expect attorneys and judges to be knowledgeable about the philosophy of science.

In one form or another, most states have adopted the Federal Rules of Evidence as the guide for determining the admissibility of evidence in court proceedings. Consequently, this Supreme Court ruling will have an effect far beyond federal trials. While it may take several years to integrate Daubert fully into case law and precedent, this decision will eventually change markedly what is admitted as scientific expert evidence in all trials. The result will be nothing less than a revolutionary paradigm shift that replaces a naive logical positivism with the contemporary understanding of the nature of science. The decision sets forth this momentous shift in the following paragraph:

Ordinarily, a key question to be answered in determining whether a theory or technique is scientific knowledge that will assist the trier of fact will be whether it can be (and has been) tested. "Scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified; indeed, this methodology is what distinguishes science from other fields of human inquiry." Green at 645 (1992). See also C. Hempel, Philosophy of Natural Science 49 (1966) ("[T]he statements constituting a scientific explanation must be capable of empirical test"); K. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge 37 (5th ed. 1989) ("[T]he criterion of the scientific status of a theory is its falsifiability, or refutability, or testability") (Daubert, pp. 12-13).

Falsifiability thus specifically replaces the Frye test as the arbiter of what kind of scientific expert testimony is admissible. The opinion rests on the judgment that the Federal Rules of Evidence supersede the Frye test. Rule 402 is cited as establishing the baseline for what constitutes admissible evidence:

All relevant evidence is admissible, except as otherwise provided by the Constitution of the United States, by Act of Congress, by these rules, or by other rules prescribed by the Supreme Court pursuant to statutory authority. Evidence which is not relevant is not admissible (Daubert, p. 6).

The Court then specifically applies Rule 702 in its determination of the admissibility of scientific evidence in the resolution of a disputed issue:

If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise (Daubert, p. 8).

The Court then states that the trial judge must initially determine that an expert is proposing to testify regarding ". . . (1) scientific knowledge that (2) will assist the trier of fact to understand or determine a fact in issue" (Daubert, p. 12). While the majority opinion expresses confidence that trial judges have the capacity to determine this question, the partial dissent by Chief Justice Rehnquist concludes:

I defer to no one in my confidence in federal judges; but I am at a loss to know what is meant when it is said that the scientific status of a theory depends on its "falsifiability," and I suspect some of them will be, too (pp. 3-4).

In this instance, Rehnquist is almost certainly correct. Judges will neither readily understand this principle nor consistently apply it without training and education, and neither judges nor attorneys can acquire an adequate understanding of this paradigm shift in a few hours at a conference seminar. Unfortunately, many psychologists themselves have little or no understanding of falsifiability. This may be even more true of psychiatrists and social workers, few of whom will understand this principle and how it changes the rules of the game. Proper and effective assimilation of this revolutionary construct into the justice system will require a major educational and training effort for all the players involved: scientists, attorneys, and judges.

Consequently, scientists must engage in an educational program to support understanding of this shift and of what it means in practice in the real world of the courtroom. One such implication is that a major role of scientific testimony may be to proffer an opinion as to the lack of credible scientific data to support a specific claim (Meehl, 1989). To do this properly, the scientist must have a clear understanding of the current state of the philosophy of science and be able to communicate this understanding to lay audiences.

Falsifiability
To understand the concept of falsifiability, one must recognize the difference between two meanings of the word "falsifiable" (Popper, 1992, p. xxii). In the first meaning, falsifiable is a logical-technical term related to the criterion of falsifiability. This means that the theory or concept in question must be capable of being falsified, that is, be precise and specific enough that something could count against it. In the second meaning, falsifiable is used in the sense that the theory or concept in question can be "definitively or conclusively or demonstrably falsified (demonstrably falsifiable)."

The first sense refers only to the logical possibility of a theory or concept being falsified in principle. The second sense includes a persuasive practical demonstration, using scientific procedures, that produces a proof of falsity. That is, do credible research studies produce results that mean the theory cannot be maintained?

There has been some discussion suggesting that final empirical proofs of falsity cannot be obtained, but Popper maintains that this potential uncertainty should not be taken too seriously (Popper, 1992). Thus the principle of falsification provides two basic ways in which a judge may determine that proffered expert testimony does not meet the criterion of falsifiability. The first is to ask whether it is, in principle, falsifiable; the second is to ask whether it has in fact been falsified.

The practice of bloodletting provides an example that can promote an understanding of these two uses of "falsifiable." For about two hundred years, physicians believed that drawing off blood cured disease. The standard treatment was to open a vein and draw a basin of blood. George Washington died from this treatment. For centuries, physicians reasoned that a person who was bled and lived was saved by the procedure. Conversely, a person who died was considered to have been so sick that nothing would have helped anyway. With this line of reasoning, it is impossible to falsify bleeding as a specific treatment for disease. This illustrates a failure under the first meaning of the term "falsifiable": as it was defended, the theory could not be falsified even in principle. Subsequent research predicated on germ theory and other medical advances eventually proved bloodletting to be a false cure, thus showing the theory was demonstrably falsifiable, the second meaning of the term.

Sir Karl Popper, who advanced the concept of falsifiability and is specifically quoted in the Daubert decision, demonstrates at great length in his two volumes, The Open Society and Its Enemies, that Marxism is unfalsifiable, and he argues (notably in Conjectures and Refutations) that Freudian theory is unfalsifiable in the same way. The fact that Freudian theory has so pervaded psychological and psychiatric theory and thought during this century illustrates the potential of the Daubert decision to change fundamentally what had traditionally been accepted within a field. Popper's critique of Freud draws heavily on the first meaning of the criterion of falsifiability. Freudian theory uses a convoluted conceptual structure that explains all human behavior after the fact. Adherents of Freudian psychoanalytic theory offer authoritative-sounding explanations for all human behaviors, from individual quirks and slips of the tongue to large-scale social phenomena such as religion. It is unscientific because nothing can count against it; there is no point at which it is subject to falsification. This type of theory attempts to provide a post hoc explanation for every possible event; however, it is incapable of predicting any particular event. A successful scientific theory, by contrast, predicts outcomes from a discrete set of events with an ascertainable degree of reliability.

Although Freudian concepts may serve a purpose in psychotherapy, their lack of falsifiability means that psychiatric testimony based on Freudian psychoanalytic concepts should now be inadmissible as scientific evidence. A judge bound by this Supreme Court decision could hardly rule otherwise, since Daubert specifically quotes Popper and holds that falsifiability is the determinant of what is scientific, and the same judge would also be confronted with Popper's assessment of Freudian dynamic thought as a premier example of failure to meet the criterion of falsifiability.

With some variations, American psychiatry is, by and large, Freudian in its orientation. The theories and concepts used by psychiatrists postulate all manner of internal processes and hidden events that cannot be falsified because they are not testable. It is also the case that wherever Freudian theory has been subjected to empirical tests, it has either failed or, at best, been inconclusive as a predictor of human behavior. It could be that the only matters on which a Freudian psychiatrist will be permitted to testify are the organic hypothesis that chemical imbalance is the origin and cause of emotional distress, and the subject matter of neurology. The use of the American Psychiatric Association's nosology, DSM-III-R (American Psychiatric Association, 1987), may also be questioned on the grounds of its lack of reliability and testability (Kirk & Kutchins, 1992).

Grand theories that are so global, indeterminate, and lacking in precision that they can be used to explain everything and anything are ruled out as science by the principle of falsification. Theories and concepts that are developed to provide emotional security and are not meant to be changed or subjected to empirical evaluation, however comforting they may be, are not scientific and cannot be offered to the courts as such.

There is no shortage of these fuzzy, grand, cosmic explanatory systems. In an era when we are, supposedly, at least somewhat sophisticated about science, all manner of false knowledge abounds. The daily horoscope in the astrology column is frequently described as the most-read section of the newspaper. Otherwise intelligent people firmly believe in UFO alien abductions, conspiracy theories, satanic ritual abuse in day care centers, extrasensory perception, and parapsychology. Highly questionable taxonomic entities such as Multiple Personality Disorder, which is most likely a phenomenon generated by the behaviors of the therapist (Aldridge-Morris, 1989; Fahy, 1988; Orne, Dinges, & Orne, 1984; Spanos, Weekes, & Bertrand, 1985; Weissberg, 1993), are accepted in the official nomenclature. Thought is cheap, and any verbally fluent and somewhat intelligent person can come up with an unfalsifiable theory of the universe or a complex yet untestable description of human motivation: plausible-sounding formulations used to construct postdictions or wholly false accounts of prior events.

On the other hand, science is hard. It is imperfect. As Popper says, it is not episteme, certain and final knowledge. Real scientists laboriously count and recount, reviewing mounds of data sheets. They do not proffer speculation because they are aware that other scientists, often junior ones, will always be collecting more data and looking to challenge the accepted wisdom. Actually producing the theories that serve as scientific explanations of observed phenomena is a difficult task. Nevertheless, the general public can be helped to understand the purpose, principles, and methodology of science. It is not necessary to be a working scientist to know what science is.

Attainment of this goal of public education, however, confronts and challenges the human tendency to seek confirmation and support of beliefs already held and to ignore or fail to perceive disconfirming evidence (Arkes & Harkness, 1980; Dawes, 1992; West, 1992). In an experiment comparing physical scientists, psychologists, and fundamentalist clergymen on their capacity to use falsification in reasoning, Mahoney and DeMonbreun (1977) report no differences between the groups in reasoning ability or in the generation of confirmatory rather than disconfirmatory experiments to test their hypotheses: "Both scientists and nonscientists showed marked tendencies to confirm rather than disconfirm their hypotheses and, contrary to the popular image, they did not differ in this respect" (p. 236). These findings suggest that an apparently general human bias must be overcome for falsification to be understood and used.

The findings about confirmatory bias further suggest that the choice to approach a determination of scientific status by the principle of falsification must be self-consciously and structurally maintained. Even training as a scientist neither overcomes this confirmatory bias nor guarantees straight thinking (Meehl, 1992b). Therefore, if the decision of the Supreme Court is to be carefully and adequately integrated into American jurisprudence, there must be an immediate effort to train judges and attorneys to understand these issues in the philosophy of science and to assimilate the import of this ruling into case law carefully, step by step. Failure in this endeavor could produce chaos and do a disservice to all parties involved in the resolution of disputes and the administration of criminal justice.

Expert Testimony in Cases of Alleged Child Sexual Abuse
One area where the principles of falsifiability, testability, and degree of error may have a major impact is the kind of testimony frequently offered by the prosecution in child sexual abuse cases. In case after case, this testimony presents as science conjecture and speculation of such a nature that any observation proves, supports, or is "consistent with" abuse; consequently, it is not falsifiable.

The unfalsifiable nature of such testimony becomes clear when the observations used to support allegations of abuse are delineated. Abuse is supported when:

As this sampling of typical testimony illustrates, almost any circumstance, behavior, or observation can be rationalized as supporting the conclusion that sexual abuse occurred. What makes such testimony, and its underlying theory, unfalsifiable is the fact that there is no circumstance, behavior, or observation that could be used to conclude that abuse did not occur. Consequently, there are no circumstances under which one could attempt to prove the underlying theory false.

The widely disseminated lists of behavioral indicators, many of which are contradictory, are frequently offered as evidence to support the accusation of abuse. In testimony, all manner of behaviors are declared to be typical of abused children, without scientific evidence to support the claims. Depression or mania, hyperactivity or hypoactivity, social aggression or social withdrawal, heightened modesty or no modesty, poor hygiene or excessive concern about cleanliness, overcompliance or oppositional behavior: all are offered as evidence of abuse. This is done in spite of the fact that reasoning backward in time from observed symptoms to some prior entity or event commits the logical error of affirming the consequent. There are no behavioral indicators, including the absence of any problem behaviors, that can falsify abuse.

Thus, in "disclosure therapy," a child may draw scribbled shapes on paper using black or red crayons. These observations may be used to support a therapist's testimony that these are the colors used by children who have been sexually abused and that abused children typically include phallic shapes in their drawings, despite the fact that most children's drawings include elongated shapes that might be interpreted, by one so inclined, as phallic symbols. In hundreds of cases reviewed by the authors, there has never been an instance of a child's drawings being interpreted as supporting the absence of abuse.

To illustrate the extent of the rationalizations that must be offered in support of an abuse conclusion in the absence of evidence, and thus to typify a nonfalsifiable, unscientific theory, consider the following case. A young girl suspected of having been abused drew a picture of herself and her sister with their father. All were smiling, and the two girls' arms were raised in the air. The child explained that they were cheering at a game. However, the therapist testified, in spite of the child's explanation, that the upraised arms meant the girls were crying for help. She interpreted this as indicating that the child had, in fact, been abused, even though the child denied abuse.

Testimony of this sort is, in the truest sense, quixotic communication. Don Quixote provides an example of a theory that cannot be disconfirmed. When confronted by Sancho Panza with disconfirming evidence about Mambrino's helmet, Don Quixote transforms the contradiction into a verification (Lee, 1987). To Don Quixote the helmet is miraculous, while to Sancho Panza it is an ordinary barber's basin. Don Quixote reasons to himself as follows:

Mambrino's helmet, that object of immense value, appear[s] to everyone a barber's basin, thus protecting its owner from persecution by all those who would understand its true meaning (Lee, p. 555).

By this line of reasoning, all contrary assertions are transformed by mental alchemy into evidence that supports Don Quixote's view, and thus nothing appears unexplained, troublesome, or puzzling. Everything proves the reasoner right, and so the subjective effect is to make the erroneous belief stronger and more certain.

When the principle of falsification is properly applied in the courtroom, this kind of testimony should be declared inadmissible and not helpful to the finder of fact. It cannot assist in arriving at the most accurate decision possible; it can only increase the magnitude of the error generated.

Syndrome Evidence
There are a number of psychological syndromes about which experts testify in a wide variety of civil and criminal litigation that may not be testable or falsifiable when subjected to the analysis now required by Daubert. Myers (1993) notes that diseases and syndromes share the medically and forensically important feature of diagnostic value: both point, with varying degrees of certainty, to particular causes. However, whereas the relationship between symptoms and etiology is clear for many diseases, this relationship is often unclear or unknown for syndromes. The certainty with which a syndrome points to a particular cause varies with the syndrome.

Myers (1993) discusses the difference between two syndromes often offered in expert testimony in cases of alleged child abuse. The battered child syndrome has high certainty, since a child with the symptoms of the syndrome is very likely to have suffered nonaccidental injury. For this syndrome, research evidence has accumulated demonstrating that nonaccidental injuries can be successfully discriminated from accidental injuries by the nature of the injuries. The predictions from this theory, therefore, meet the Daubert criterion of falsifiability, and, consequently, evidence regarding this syndrome has high probative value and, in fact, has been approved by every appellate court to consider it.

This may be contrasted with the child sexual abuse accommodation syndrome (CSAAS), which does not point with any certainty to sexual abuse. The fact that a child shows behaviors of the CSAAS does not help determine whether the child was sexually abused, since observation of those behaviors does not allow one to discriminate reliably between a child who has been abused and a child who has not; both may share similar symptoms. The CSAAS is a nondiagnostic syndrome.

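
The difference between a diagnostic and a nondiagnostic syndrome can be put in terms of the odds form of Bayes' theorem. The equation below is a standard identity offered as an illustration, not a formula taken from Myers (1993); here A stands for "the child was abused" and B for "the child shows the behaviors in question."

```latex
\frac{P(A \mid B)}{P(\neg A \mid B)}
  = \underbrace{\frac{P(B \mid A)}{P(B \mid \neg A)}}_{\text{likelihood ratio}}
    \times \frac{P(A)}{P(\neg A)}
```

If abused and nonabused children show the behaviors equally often, the likelihood ratio is 1 and the posterior odds equal the prior odds, so observing the behaviors leaves the probability of abuse exactly where it was before. A syndrome is diagnostic only to the extent that its likelihood ratio departs from 1, which is, in effect, what the research on nonaccidental injuries shows for the battered child syndrome.
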
Despite its lack of probative value, the CSAAS has frequently been offered by prosecutors as substantive evidence of sexual abuse. However, it does not meet the test of falsifiability when used to support abuse, since there is nothing that can count against it. Therefore, Daubert should lead to judicial rulings that testimony based on the CSAAS is inadmissible.

Repression and Claims of Recovered Memories
A careful application of the Daubert decision within the justice system may also result in expert testimony supporting claims of recovered repressed memories of childhood abuse being declared inadmissible. This type of case provides an example of the second meaning of the criterion of falsifiability. Repression, a Freudian theoretical concept, has been falsified (Bower, 1990; Garry & Loftus, 1993; Holmes, 1990; Wakefield & Underwager, 1992). Although proponents of recovered repressed memories offer three studies to support a claim of repression (Briere & Conte, 1989; Herman & Schatzow, 1987; Williams, 1992), none of them actually assesses repression, and none provides credible scientific evidence.

Faced with the massive weight of over 60 years of research that falsifies the concept of repression, a reasonable judge must rule that testimony based upon the concept is not scientific, cannot be relevant or helpful to the finder of fact, and therefore is not admissible. Such rulings would make it impossible to pursue civil suits seeking monetary damages based upon a claim of recovered repressed memories. They should also result in the repeal of laws, passed in a number of states, that essentially permit legal actions based on claims of recovered repressed memory to be brought whenever the abuse is remembered, or after the alleged victim claims to have recognized being damaged by the alleged childhood abuse. Any criminal convictions based upon evidence or testimony derived from a claim of a recovered repressed memory should be reversed and remanded for a new trial or dismissed.

Issues in Psychology as a Science
Popperian falsifiability makes psychology much more vulnerable to the impact of data falsifying its theories than, for example, physics is. Meehl (1967, 1978) argues that the dominant method of assessing research in psychology, significance testing, amounts to nothing more than statistical games and that such tests are very soft and weak measures. Meehl (1978) maintains that "the null hypothesis, taken literally, is always false" (p. 822). Furthermore, if one study produces a correlation of .50 and another a correlation of .34, the difference between them may be statistically significant yet make very little difference in the real world (Meehl, 1978). Therefore, support for theories subjected to statistical significance testing is weak, while the meaning of a falsification is much more decisive (Bowers, 1977; Dar, 1987).

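
Meehl's point can be made concrete with a short calculation. The Python sketch below uses the Fisher z approximation, a standard large-sample method chosen here for illustration rather than one the article prescribes, to show that the same small correlation moves from nonsignificant to highly significant as the sample grows, and that whether correlations of .50 and .34 differ "significantly" depends on the sample sizes rather than on the size of the relationship. The function names and sample sizes are hypothetical.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal statistic, 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def difference_p_value(r1: float, n1: int, r2: float, n2: int) -> float:
    """Two-sided p-value for H0: rho1 = rho2 in two independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # A trivially small correlation (r = .05) at increasing sample sizes.
    for n in (100, 1_000, 10_000):
        print(n, round(correlation_p_value(0.05, n), 4))
    # The same pair of correlations (.50 vs .34) at increasing sample sizes.
    for n in (100, 200, 500):
        print(n, round(difference_p_value(0.50, n, 0.34, n), 4))
```

Nothing about the strength of either relationship changes from one line of output to the next; only n does, which is why a significant result by itself says little about real-world importance.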
Schmidt (1992) describes the great difficulty of getting psychologists to move away from significance testing to a more useful and accurate approach. He asserts that significance testing has led to serious errors in interpreting the meaning of data and has retarded the accumulation of knowledge. Meehl's basic criticism is that most journal articles reviewing an area of psychology consist of a listing of all the relevant studies located within some set of specified parameters, followed by a count of those for and those against a particular proposition (Hedges & Olkin, 1980). This nose counting is preposterous, according to Meehl (1978), because by the principle of falsification a single refutation is far more powerful than multitudinous corroborations:

[T]he whole idea of simply counting noses is wrong because a theory that has seven facts for it and three facts against it is not in good shape, and it would not be considered so in any developed science (p. 823).

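
The contrast between nose counting and falsification can be sketched in a few lines of Python. This illustrates the logic only: the `Study` record and both functions are hypothetical, and `falsification_verdict` deliberately adopts the strong reading of Meehl's argument, treating any single credible disconfirming study as decisive.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    # Hypothetical record: does the result agree with the theory, and does the
    # study meet minimal standards of design, execution, and replication?
    supports_theory: bool
    credible: bool


def box_score(studies: list[Study]) -> tuple[int, int]:
    """Nose counting: tally studies for and against, ignoring their quality."""
    for_count = sum(1 for s in studies if s.supports_theory)
    return for_count, len(studies) - for_count


def falsification_verdict(studies: list[Study]) -> str:
    """Strong Popperian reading: one credible refutation outweighs any number of corroborations."""
    if any(s.credible and not s.supports_theory for s in studies):
        return "refuted"
    return "not yet refuted"


# Meehl's example: seven facts for the theory and three against it.
studies = [Study(True, True) for _ in range(7)] + [Study(False, True) for _ in range(3)]
print(box_score(studies))              # (7, 3)
print(falsification_verdict(studies))  # refuted
```

The box score "wins" seven to three, while the falsification rule reports the theory as refuted; the two procedures answer different questions, and only the second asks whether anything credible counts against the theory.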
This argument is essential in dealing with courtroom advocates and judges who may be influenced by a nose-counting or box-score approach. That professionals are so easily influenced by this type of thinking may be due to the fact that most relatively bright and competent people learned in school that science proceeds by the inductive method of making observations, testing them, and accepting the results as proof of the hypothesis. This naive view is no longer regarded as even a partially correct description of science. The conceptual advance in the philosophy of science represented by the Popperian principle of falsification simply has not caught on in society at large.

Psychology is responding to these criticisms by developing the methods of meta-analysis and effect size. Meta-analysis (Meehl, 1992a; Mullen et al., 1985) has demonstrated the weakness of individual research studies and pointed out the problems of sampling error, measurement error, and other artifacts in individual studies. Contrary to popular belief, no single study can resolve an issue or answer a question. "Only meta-analytic analysis across studies can control chance and other statistical and measurement artifacts and provide a trustworthy foundation for conclusions" (Schmidt, 1992, pp. 1179-1180). Effect size, the magnitude of an observed relationship or difference, is the basis of power analysis, which assesses the probability that an investigation will lead to statistically significant results. Whereas meta-analysis has been accepted and is being more widely used, effect size has been relatively ignored in the conduct of psychological research (Cohen, 1992; Strube, 1985). Both of these approaches to the interpretation of data will be necessary to assess the impact of the Supreme Court's decision on the admissibility of expert testimony based on the science of psychology.

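
The link between effect size and the probability of a significant result can be shown with a short power calculation. The Python sketch below assumes a comparison of two equal-sized groups and uses the normal approximation to the power of a two-sided test at the .05 level; exact t-test calculations, such as those behind Cohen's (1992) tables, differ slightly. The function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approximate_power(d: float, n_per_group: int, z_crit: float = 1.959964) -> float:
    """Approximate power of a two-sided, two-sample test at alpha = .05.

    d is the standardized mean difference (Cohen's d). The normal approximation
    ignores the negligible chance of rejecting in the wrong direction.
    """
    noncentrality = d * math.sqrt(n_per_group / 2.0)
    return normal_cdf(noncentrality - z_crit)


# Cohen's conventional small, medium, and large effects with 64 subjects per group.
for d in (0.2, 0.5, 0.8):
    print(d, round(approximate_power(d, 64), 2))
```

With 64 subjects per group, a medium effect (d = 0.5) has power of roughly .80, close to the figure Cohen (1992) gives for that design, while a small effect (d = 0.2) would be detected only about one time in five. A "failure to replicate" from an underpowered study is therefore weak evidence in either direction.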
When attempting to deal with scientific testimony that is counter to a proposition advanced in the courtroom, attorneys may ask whether there are other studies that contradict that testimony. When the accurate answer is that there are contradictory studies, the questioning stops, and the advocate thinks the witness has been impeached and the impact of the scientific testimony lessened or removed. When a judge who may have a naive view of scientific research is confronted with opposing experts, the judge may come to view the process as an undesirable battle of the experts rather than as a fact-finding process. The result may be that relevant scientific testimony is not admitted, since the judge may assume that all studies are on the same footing and have equal validity and reliability. One then simply adds up the box score to see which side has the greater number of corroborative studies, and that side is the winner. This approach to science ignores the existence of both good and bad research, a fact every scientist is aware of. Not all studies are equal, and bad science does not meet the minimal qualifications of research design, execution, and replication. The box-score approach ignores the greater significance that must be given to falsification.

The Supreme Court decision in Daubert remedies this misunderstanding and makes the box-score approach unacceptable. No matter how many corroborative studies there are, a single well-done and credible study that falsifies the theory, concept, or technique may outweigh them all and make the corroborative material inadmissible.

A review of all the briefs, motions, and amicus curiae briefs submitted to the Court in this case strongly suggests that the decision is primarily based on the brief submitted by the Carnegie Commission on Science, Technology, and Government as Amicus Curiae in Support of Neither Party (Berger, Gallagher, & Esty, 1992). This is the only brief that refers to the principles of falsifiability, testability, and replication, and it is the framework this brief suggests as a replacement for the Frye rule that the Supreme Court adopts in its decision. The brief also suggests including the degree of error as a factor for judges to consider when evaluating the scientific quality of claims offered for admission as testimony. It advises the Court that a judge may have to consider study design, data collection, or error rate to determine whether the methodology used was so skewed as to justify exclusion. The decision includes the language:

Additionally, in the case of a particular scientific technique, the court ordinarily should consider the known or potential rate of error . . . and the existence and maintenance of standards controlling the technique's operation (Berger, Gallagher, & Esty, 1992, pp. 14-15).
For example, if judges consider this factor and apply it rationally, much of the testimony based on medical examinations in the evaluation of child sexual abuse allegations will be inadmissible. The best available scientific evidence on the potential error rate of medical examinations indicates that when a physician claims a medical examination supports penile penetration, the error rate is 63%; for digital penetration it is 73%; and for a general conclusion of abuse it is over 70% (Paradise, 1989; Zeitlin, 1987). Such a high error rate, always in the direction of false positives, can only confuse the entire process.
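An error rate of this kind is the share of positive conclusions that turn out to be wrong, which is a different quantity from overall accuracy. The sketch below makes the distinction concrete with invented counts for 1,000 hypothetical examinations; the counts are not data from Paradise (1989) or Zeitlin (1987), and the class and method names are our own. With these counts the procedure is right 86% of the time overall, yet 60% of its positive findings are false.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Counts from a 2x2 table of examination finding versus actual status."""
    true_pos: int   # finding of abuse, abuse occurred
    false_pos: int  # finding of abuse, no abuse occurred
    false_neg: int  # no finding, abuse occurred
    true_neg: int   # no finding, no abuse occurred

    def accuracy(self) -> float:
        """Share of all examinations whose finding matches reality."""
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.true_neg) / total

    def positive_error_rate(self) -> float:
        """Share of positive findings that are false (1 - positive predictive value)."""
        return self.false_pos / (self.true_pos + self.false_pos)


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1,000 children were abused.
    exams = Outcomes(true_pos=80, false_pos=120, false_neg=20, true_neg=780)
    print(f"overall accuracy:      {exams.accuracy():.2f}")
    print(f"error among positives: {exams.positive_error_rate():.2f}")
```

Because most children in this hypothetical population were not abused, even a modest false positive rate among them produces more wrong positive findings than correct ones.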
The Supreme Court's inclusion of error rate as a factor in assessing the admissibility of evidence opens the door to scientific analysis of the error rates of the entire system of child protection, law enforcement, and justice in dealing with allegations of child sexual abuse. Any decision-making structure built on error is liable to produce error at an indeterminate, unrecognized, but significant level that causes harm (Gambrill, 1990). Evaluating the extent of error, and thus the potential for harm, may best be done by applying Bayes' theorem (Fischhoff & Beyth-Marom, 1983). Over at least 26 years, every scientist who has analyzed the error rate of this decision-making process has concluded that the error is always in the direction of an unacceptable level of false positives. The lowest reported ratio is 3 false positives for every true positive; the highest is an astonishing 2,000 to 1 (Horner, 1992).
The Bayesian approach rests on the principle that the degrees of belief of an ideally rational person conform to the mathematical principles of probability theory (Horwich, 1982). This concept should be acceptable to a scientifically trained person or to anyone who respects science. A number of scientists have applied Bayesian inference to decisions about child sexual abuse in order to assess the level and type of error the system produces. Every Bayesian analysis of the decisions made by the child abuse system that we have found concludes that the most probable and most frequent type of error is the false positive, that is, identifying an individual as abused, or as an abuser, when that is not true (e.g., Altemeier, O'Connor, Vietze, Sandler, & Sherrod, 1984; Caldwell, Bogat, & Davidson, 1988; Gambrill, 1990; Horner, 1992; Horner & Guyer, 1991a, 1991b; Kotelchuck, 1982; Milner, Gold, Ayoub, & Jacewitz, 1984; Paradise, 1989; Realmuto, Jensen, & Wescoe, 1990; Starr, 1979; Wakefield & Underwager, 1988; Zeitlin, 1987). This holds even when, as Gambrill (1990) does, one assumes that the decisions are 95% accurate. Starr (1979) assumes a procedure that correctly identifies abusive situations 83% of the time and still reports a ratio of 20 false positives to one true positive.
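The Bayesian point can be made concrete. For a decision procedure with sensitivity s (the share of abused children it identifies) and specificity c (the share of non-abused children it correctly clears), applied where the base rate of abuse is p, Bayes' theorem gives the expected number of false positives per true positive as (1 - c)(1 - p) / (s * p), and the probability that a positive decision is correct as 1 / (1 + that ratio). The sketch below treats the stated accuracy as both sensitivity and specificity, which is our own simplification; the base rates are illustrative assumptions, since the inputs used by Gambrill (1990) and Starr (1979) are not given here. The function names are hypothetical.

```python
def false_to_true_positive_ratio(sensitivity: float, specificity: float,
                                 base_rate: float) -> float:
    """Expected false positives per true positive when a decision
    procedure is applied to a population with the given base rate."""
    true_pos = sensitivity * base_rate                # P(positive and abused)
    false_pos = (1 - specificity) * (1 - base_rate)   # P(positive and not abused)
    return false_pos / true_pos


def posterior_probability(sensitivity: float, specificity: float,
                          base_rate: float) -> float:
    """Bayes' theorem: P(abused | positive decision)."""
    ratio = false_to_true_positive_ratio(sensitivity, specificity, base_rate)
    return 1 / (1 + ratio)


if __name__ == "__main__":
    # Assumption: stated accuracy is used as both sensitivity and specificity.
    # The base rates are illustrative, not taken from Gambrill or Starr.
    for accuracy in (0.95, 0.83):
        for base_rate in (0.10, 0.05, 0.01):
            r = false_to_true_positive_ratio(accuracy, accuracy, base_rate)
            p = posterior_probability(accuracy, accuracy, base_rate)
            print(f"accuracy={accuracy:.2f} base_rate={base_rate:.2f} "
                  f"FP per TP={r:5.2f} P(abused | positive)={p:.3f}")
```

Under these assumptions, a procedure that is 95% accurate produces more false positives than true ones whenever the base rate falls below 5%, and at an assumed 1% base rate an 83%-accurate procedure yields roughly 20 false positives per true positive, the same order as the ratio Starr reports.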
Lindsay and Read (1993) apply Bayesian inference to the issue of recovered repressed memories of childhood sexual abuse and report that, even using the most extreme numbers suggested by the proponents of recovered repressed memories and assuming an unrealistic diagnostic accuracy of 90%, one-third of the decisions that the memories are accurate will be false positives. At 80% accuracy, still wildly unrealistic for any real-world diagnosis, 56% of the diagnoses of repressed memories would be wrong.
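The way the share of false positives grows as diagnostic accuracy falls can be checked both by Bayes' theorem and by direct simulation. The sketch below holds an assumed base rate fixed (the proportion of recovered memories that reflect actual abuse) and varies accuracy. The base rate of 0.18 is our own illustrative assumption, not the figure Lindsay and Read (1993) used, so the printed proportions do not reproduce their results; accuracy is also assumed to apply equally to true and false memories, and the function names are hypothetical.

```python
import random


def analytic_false_positive_share(accuracy: float, base_rate: float) -> float:
    """Share of 'memory is accurate' decisions that are wrong, by Bayes' theorem."""
    false_pos = (1 - accuracy) * (1 - base_rate)
    true_pos = accuracy * base_rate
    return false_pos / (false_pos + true_pos)


def simulated_false_positive_share(accuracy: float, base_rate: float,
                                   n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity over n hypothetical cases."""
    rng = random.Random(seed)
    judged_accurate = wrong = 0
    for _ in range(n):
        memory_is_true = rng.random() < base_rate
        call_is_correct = rng.random() < accuracy
        # A correct call matches reality; an incorrect call contradicts it.
        says_true = memory_is_true if call_is_correct else not memory_is_true
        if says_true:
            judged_accurate += 1
            if not memory_is_true:
                wrong += 1
    return wrong / judged_accurate


if __name__ == "__main__":
    BASE_RATE = 0.18  # hypothetical; not Lindsay and Read's input
    for accuracy in (0.90, 0.80):
        exact = analytic_false_positive_share(accuracy, BASE_RATE)
        sim = simulated_false_positive_share(accuracy, BASE_RATE)
        print(f"accuracy={accuracy:.2f} analytic={exact:.3f} simulated={sim:.3f}")
```

At this assumed base rate, roughly one "accurate memory" decision in three is a false positive at 90% accuracy, and more than half are at 80%; the simulation agrees with the closed-form value to within sampling error.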
The Supreme Court has established the error rate as a vital factor in assessing the scientific nature of proffered testimony. It can hardly be ruled irrelevant to the finder of fact who must weigh and assess the evidence presented. Even if a judge should ignore the error rate of the entire system and permit testimony by social workers, mental health professionals, and law enforcement agents, the error rate of the system is now relevant to the jury in weighing that evidence.
Although we have selected psychology as the science to discuss in relation to the Daubert decision, and have examined the effect of this decision on the evidence with which we are familiar in claims of childhood sexual abuse, the same basic principles will apply to all sciences and to all trials in which scientific evidence may be a factor. The same will hold in physics, biology, economics, astronomy, sociology, chemistry, and the practice of medicine.
Conclusion
The first step for any advocate and judge is to understand the nature of the criterion of falsifiability, and this means understanding the philosophy of science. Understanding the implications of the Daubert decision will most likely require significant effort, since this is a revolutionary change that moves the entire enterprise onto new and untried ground. The second step is applying the decision to specific cases. Working out the implications of this decision will likely take many years, countless cases, and a multitude of confused attorneys and irate judges. Change never comes easily, and the justice system may be more resistant to change than many other institutions. But change it must.
One thing is certain. Scientists who offer expert testimony in the courtroom need to be knowledgeable and skilled in dealing with the philosophy of science and the issues raised by the establishment of the criterion of falsifiability as the determinant of science.
References
Aldridge-Morris, R. (1989). Multiple Personality: An Exercise in Deception. Hillsdale, NJ: Lawrence Erlbaum Associates.
Altemeier, W. A., O'Connor, S., Sherrod, K. B., Tucker, D., & Vietze, P. (1986). Outcome of abuse during childhood among pregnant low income women. Child Abuse & Neglect, 10, 319-330.
American Psychiatric Association. (1987). Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.) (DSM-III-R). Washington, DC: Author.
Arkes, H. R., & Harkness, A. R. (1980). Effects of making a diagnosis on subsequent recognition of symptoms. Journal of Experimental Psychology: Human Learning and Memory, 6, 568-575.
Berger, M. A., Gallagher, S. G., & Esty, E. H. (1992, December 2). Brief of the Carnegie Commission on Science, Technology, and Government as Amicus Curiae in Support of Neither Party. Daubert et al. v. Merrell Dow Pharmaceuticals, No. 92-102, in the Supreme Court of the United States, October Term, 1992.
Bower, G. H. (1990). Awareness, the unconscious, and repression: An experimental psychologist's perspective. In J. L. Singer (Ed.), Repression and Dissociation: Implications for Personality Theory, Psychopathology and Health (pp. 209-231). Chicago: The University of Chicago Press.
Bowers, K. S. (1977). Science and the limits of logic: A response to the Mahoney-Demonbreun paper. Cognitive Therapy and Research, 1, 239-246.
Briere, J., & Conte, J. (1989, August). Amnesia in adults molested as children: Testing theories of repression. Paper presented at the Annual Meeting of the American Psychological Association, New Orleans, LA.
Caldwell, R. A., Bogat, G. A., & Davidson, W. S. (1988). The assessment of child abuse potential and prevention of child abuse and neglect: A policy analysis. American Journal of Community Psychology, 16, 609-624.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Dar, R. (1987). Another look at Meehl, Lakatos, and the scientific practices of psychologists. American Psychologist, 42, 145-151.
Dawes, R. M. (1992). Why believe that for which there is no good evidence? Issues in Child Abuse Accusations, 4(4), 214-218.
Fahy, T. A. (1988). The diagnosis of multiple personality disorder: A critical review. British Journal of Psychiatry, 153, 597-606.
Fischhoff, B., & Beyth-Marom, R. (1983). Hypothesis evaluation from a Bayesian perspective. Psychological Review, 90, 239-260.
Gambrill, E. (1990). Critical Thinking in Clinical Practice. San Francisco: Jossey-Bass Publishers.
Garry, M., & Loftus, E. F. (1993, April). Women who remember too much. Paper presented at the False Memory Syndrome Foundation conference, Valley Forge, PA.
Giannelli, P. C. (1980). The admissibility of novel scientific evidence: Frye v. United States, a half-century later. Columbia Law Review, 80, 1197-1250.
Green (1992). Expert witnesses and sufficiency of evidence in toxic substances litigation: The legacy of Agent Orange and Bendectin litigation, 86 Nw. U. L. Rev. (Cited in William Daubert, et ux., etc., et al., Petitioners v. Merrell Dow Pharmaceuticals, Inc., Supreme Court of the United States, No. 92-102, decided on 6/28/93.)
Hedges, L. V., & Olkin, I. (1980). Vote counting methods in research synthesis. Psychological Bulletin, 88, 359-369.
Hempel, C. (1966). Philosophy of Natural Science. (Cited in William Daubert, et ux., etc., et al., Petitioners v. Merrell Dow Pharmaceuticals, Inc., Supreme Court of the United States, No. 92-102, decided on 6/28/93.)
Herman, J. L., & Schatzow, E. (1987). Recovery and verification of memories of childhood sexual trauma. Psychoanalytic Psychology, 4(1), 1-14.
Holmes, D. S. (1990). The evidence for repression: An examination of sixty years of research. In J. L. Singer (Ed.), Repression and Dissociation: Implications for Personality Theory, Psychopathology and Health (pp. 85-102). Chicago: The University of Chicago Press.
Horner, T. M. (1992). Expertise in regard to determinations of child sexual abuse. Unpublished manuscript.
Horner, T. M., & Guyer, M. J. (1991a). Prediction, prevention, and clinical expertise in child custody cases in which allegations of child sexual abuse have been made: I. Predictable rates of diagnostic error in relation to various clinical decision making strategies. Family Law Quarterly, 25, 217-252.
Horner, T. M., & Guyer, M. J. (1991b). Prediction, prevention, and clinical expertise in child custody cases in which allegations of child sexual abuse have been made: II. Prevalence rates of child sexual abuse and the precision of 'tests' constructed to diagnose it. Family Law Quarterly, 25, 381-409.
Imwinkelried, E. J. (1992). Attempts to limit the scope of the Frye standard for the admission of scientific evidence: Confronting the real cost of the general acceptance test. Behavioral Sciences and the Law, 10, 441-454.
Kirk, S. A., & Kutchins, H. (1992). The Selling of DSM: The Rhetoric of Science in Psychiatry. New York: Aldine De Gruyter.
Kotelchuck, M. (1982). Child abuse and neglect: Prediction and misclassification. In R. H. Starr, Jr. (Ed.), Child Abuse Prediction: Policy Implications (pp. 67-104). Cambridge, MA: Ballinger.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions. Chicago: The University of Chicago Press.
Lee, A. S. (1987, June). Quixotic communication: The case of expert witness testimony. Knowledge: Creation, Diffusion, Utilization, 8(4), 549-585.
Lindsay, D. S., & Read, J. D. (1993). Psychotherapy and memories of childhood sexual abuse: A cognitive perspective. Unpublished manuscript.
Mahoney, M. J., & DeMonbreun, B. G. (1977). Psychology of the scientist: An analysis of problem-solving bias. Cognitive Therapy and Research, 1, 229-238.
McCord, D. (1987). Syndromes, profiles and other mental exotica: A new approach to the admissibility of nontraditional psychological evidence in criminal cases. Oregon Law Review, 66, 19-108.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103-115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.
Meehl, P. E. (1989). Law and the fireside inductions (with postscript): Some reflections of a clinical psychologist. Behavioral Sciences & the Law, 7, 521-550.
Meehl, P. E. (1992a). The miracle argument for realism: An important lesson to be learned by generalizing from Carrier's counter-examples. Studies in History and Philosophy of Science, 23(2), 267-282.
Meehl, P. E. (1992b). Cliometric metatheory: The actuarial approach to empirical, history-based philosophy of science. Psychological Reports Monographs, 1-V71.
Milner, J. S., Gold, R. G., Ayoub, C., & Jacewitz, M. M. (1984). Predictive validity of the child abuse potential inventory. Journal of Consulting and Clinical Psychology, 52, 879-884.
Mullen, B., Atkins, J. L., Champion, D. S., Edwards, C., Hardy, D., Story, J. E., & Vanderklok, M. (1985). The false consensus effect: A meta-analysis of 115 hypothesis tests. Journal of Experimental Social Psychology, 21, 262-283.
Myers, J. E. B. (1993). Expert testimony describing psychological syndromes. Pacific Law Journal, 24, 1449-1464.
Orne, M. T., Dinges, D. F., & Orne, E. C. (1984). On differential diagnosis of multiple personality in the forensic context. International Journal of Clinical and Experimental Hypnosis, 32(2), 118-169.
Paradise, J. E. (1989). Predictive accuracy and the diagnosis of sexual abuse: A big issue about a little tissue. Child Abuse & Neglect, 13, 169-176.
Popper, K. (1959). The Logic of Scientific Discovery. London: Hutchinson.
Popper, K. R. (1992/1956/1983). Realism and the Aim of Science. New York: Routledge.
Popper, K. (1989). Conjectures and Refutations: The Growth of Scientific Knowledge (5th ed.). (Cited in William Daubert, et ux., etc., et al., Petitioners v. Merrell Dow Pharmaceuticals, Inc., Supreme Court of the United States, No. 92-102, decided on 6/28/93.)
Popper, K. The Open Society and Its Enemies. Lawrenceville, NJ: Princeton University Press.
Realmuto, G., Jensen, J., & Wescoe, S. (1990). Specificity and sensitivity of sexually anatomically correct dolls in substantiating abuse: A pilot study. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 743-746.
Reuben, R. C. (1993, June 29). Justices adopt new scientific evidence test. Los Angeles Daily Journal, pp. 1, 10.
Schmidt, F. L. (1992). What do data really mean? American Psychologist, 47, 1173-1181.
Spanos, N. P., Weekes, J. R., & Bertrand, L. D. (1985). Multiple personality: A social psychological perspective. Journal of Abnormal Psychology, 94, 362-376.
Starr, R. H. (1979). Child abuse. American Psychologist, 34, 872-878.
Strube, M. J. (1985). Power analysis for combining significance levels. Psychological Bulletin, 98, 595-599.
Wakefield, H., & Underwager, R. (1988). Accusations of Child Sexual Abuse. Springfield, IL: CC Thomas.
Wakefield, H., & Underwager, R. (1992). Uncovering Memories of Alleged Sexual Abuse: The Therapists Who Do It. Issues in Child Abuse Accusations, 4(4), 197-213.
Weissberg, M. (1993). Multiple personality disorder and iatrogenesis: The cautionary tale of Anna O. The International Journal of Clinical and Experimental Hypnosis, 41(1), 15-32.
West, R. (1992). Assessment of evidence versus consensus or prejudice. Journal of Epidemiology & Community Health, 46(4), 321-322.
Williams, L. M. (1992). Adult memories of childhood abuse: Preliminary findings from a longitudinal study. APSAC Advisor, Summer, pp. 10-21.
Zeitlin, H. (1987, October 10). Investigation of the sexually abused child. The Lancet, pp. 842-845.
1 Frye v. United States, 293 F. 1013, 1014, a 1923 decision of the United States Court of Appeals for the D.C. Circuit. Under the Frye test a scientific technique is not admissible unless the technique is "generally accepted" in the scientific community. Giannelli (1980) notes that the Frye rule envisions a process by which a novel technique must pass through an "experimental" stage in which it is scrutinized by the scientific community. Only after it has passed successfully through this process and has entered the "demonstrable" stage can it be admissible. Under the Frye rule it is not enough that a qualified expert or experts believe the technique is valid and reliable; it must be generally accepted by the relevant scientific community.
2 William Daubert, et ux., etc., et al., Petitioners v. Merrell Dow Pharmaceuticals, Inc., Supreme Court of the United States, No. 92-102, decided on 6/28/93.
* Ralph Underwager and Hollida Wakefield are psychologists at the Institute for Psychological Therapies, 5263 130th Street East, Northfield, MN 55057-4880.