Reliable Classification vs. Idiosyncratic Opinion: A Reply to Gardner

Terence W. Campbell*

Allow me to begin this reply to Gardner by specifying what I will address, and what I will not.  Until Gardner reads my articles outlining the rumor model for assessing false allegations of sexual abuse (Campbell, 1992a, b), his hypercritical but ill-informed comments are undeserving of any response.  To belabor the obvious, supporting my position regarding the serious shortcomings of Gardner's "Indicators" does not require me to defend my own model.

In his response to my original article, Gardner deserves credit for recognizing his own facility for pedantic excess as he debated obscure issues of grammatical protocol; and I have no desire to imitate him in that regard.  I seriously doubt that such pettiness really interests readers.  Suffice it to say, my use of sic merely conforms to the stylistic requirements of any APA-style journal, including this particular publication.  Perhaps I should reassure Gardner that I used this same term in a previous article that appeared in this very journal (Campbell, 1992c, p. 120).  Given this information, Gardner might feel less singled out for criticism.

Disregarding Gardner's tone of self-righteous animus, and his related penchant for ad hominem arguments, enables me to redirect attention to the fundamental question raised by my original article: Can Gardner's "Indicators of Pedophilia" be used in a sufficiently reliable manner to support expert testimony in a forensic setting?  My answer to this question is obviously an emphatic "No," and surprisingly enough, Gardner seems to agree with me more often than not.  I would furthermore insist that considerations of intellectual honesty and responsible scholarship dictate that Gardner and I confine ourselves to issues that are directly relevant to this question.  To do otherwise merely distracts readers from the issue at hand.

Above all else, Gardner's indicators flounder as a result of the many shortcomings related to clinical judgment.  These shortcomings were clearly outlined in my original article, but Gardner preferred to ignore them.  Despite his preference to the contrary, however, Gardner must contend with the many problems undermining the reliability and validity of clinical judgment in defending his own indicators.  Unfortunately, Gardner's response to my original article neglects to address these problems, and moreover, he manages to compound them.

For example, Gardner scolds me for disregarding his emphasis on "the quality and quantity of the criteria satisfied."  Nevertheless, addressing considerations of the "quality and quantity" of Gardner's indicators again resorts to clinical judgment.  Gardner offers no well-defined decision-making rules specifying exactly how an evaluator should weigh the "quality and quantity" of his indicators.  Without well-defined decision-making rules for guiding the endeavor Gardner recommends, evaluators must rely on their intuitive impressions.  In other words, Gardner's recommendations regarding the "quality and quantity" of his indicators create more problems than they solve.  In fact, an item by item evaluation of Gardner's 24 indicators demonstrates how the vast majority of them suffer the unreliable effects of clinical judgment.

1) History of Family Influences Conducive to the Development of Significant Psychopathology

My major criticism of this particular indicator emphasized: "Without well-specified decision rules for defining family violence, familial alcoholism, psychopathy, and serious psychiatric disturbance, the ill-defined ambiguity of these terms guarantees their inconsistent application in practice."  In response to this criticism, Gardner replied: "I suggest that the most blatant manifestations of family dysfunction be utilized, e.g., 'history of violence, alcoholism, drug abuse, psychopathy, serious psychiatric disturbance, and suicide'."

Unfortunately, Gardner's recommendations regarding "blatant manifestations" merely beg the question.  He provides no well-defined decision-making rules for discriminating between mild, moderate, and "blatant" manifestations of family-of-origin pathology.  Consequently, Gardner has again premised his argument upon the notorious unreliability of clinical judgment; and as a result, this particular indicator cannot withstand well-informed cross-examination in a forensic setting.

2) Longstanding History of Emotional Deprivation

In response to my criticisms of this indicator, Gardner replied: "Again, I am in agreement that this criterion might be difficult to apply in certain cases.  I am in agreement, also, that it might be misinterpreted."  Therefore, I submit that Gardner himself has closed the case for this indicator.  Like the previous indicator, this one also could not withstand well-informed cross-examination in a forensic setting.

3) Intellectual Impairment

In response to my original criticisms of this indicator, Gardner explained: "I recognize that some scientific studies provide support for this criterion and others do not.  I have openly admitted that this is one of the weaker criteria, which is certainly deserving of further study (as are all of them)."  Beyond my unqualified agreement with Gardner's own assessment of the limitations related to this indicator, these very limitations raise another difficult problem associated with practically all of his indicators.

Figure 1
Co-variation Matrix Applied to Gardner's Indicators

                      Indicator Present    Indicator Absent
Pedophilia Present            A                    B
Pedophilia Absent             C                    D

Figure 1 represents a 2 x 2 co-variation matrix (Arkes, 1989).  This matrix illustrates the most serious problems sabotaging the reliability and validity of Gardner's indicators.  Cells A and D in Figure 1 represent classificatory "hits," and cells B and C represent classificatory "misses" or errors.  In particular, cell B corresponds to false-negative errors wherein one of Gardner's criteria indicates that pedophilia is absent, but in fact it is present.  Conversely, cell C corresponds to false-positive errors wherein one of Gardner's criteria indicates that pedophilia is present, but in fact it is absent.
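The tallying that Figure 1 calls for can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function name `covariation_cells` and the case counts are hypothetical, and no such data exist for Gardner's indicators.

```python
from collections import Counter


def covariation_cells(cases):
    """Tally (indicator_present, condition_present) pairs into cells A-D of Figure 1."""
    counts = Counter()
    for indicator_present, condition_present in cases:
        if condition_present:
            # Condition present: indicator present is a hit (A), absent is a false negative (B).
            counts["A" if indicator_present else "B"] += 1
        else:
            # Condition absent: indicator present is a false positive (C), absent is a hit (D).
            counts["C" if indicator_present else "D"] += 1
    return {cell: counts[cell] for cell in "ABCD"}


# Hypothetical sample of 20 cases, invented purely for illustration.
cases = (
    [(True, True)] * 3
    + [(False, True)] * 2
    + [(True, False)] * 4
    + [(False, False)] * 11
)
cells = covariation_cells(cases)
hits = cells["A"] + cells["D"]
misses = cells["B"] + cells["C"]
print(cells, "hits:", hits, "misses:", misses)
```

Only when all four cells are filled from real cases can the frequency of hits be weighed against the frequency of misses.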

Like so many other mental health professionals who ignore the classificatory problems associated with base rates, Gardner seems to focus excessively on the frequency with which cases fall into cells A and D, and he overlooks the frequency with which cases fall into cells B and C.1  Without data available to determine the frequency with which Gardner's various indicators lead to false-positive and false-negative errors, their use in a forensic setting could result in an unacceptable number of classificatory mistakes.

In particular, the relative infrequency with which pedophilia occurs throughout the male population alarmingly increases the probability of Gardner's indicators resulting in false-positive classifications.  Consequently, one can argue that Gardner's indicators have not yet developed beyond an experimental stage (which he seems to suggest himself); and as a result, his indicators cannot satisfy the Frye test (Frye v. U.S., 1923).  The Frye test demands that expert testimony be premised on evidence and principles that enjoy general acceptance in the relevant scientific or professional community.  In other words, the unavailability of reliability and validity data to support Gardner's indicators precludes their use in a forensic setting.  Just as courts have excluded evidence related to Summit's "Child Sexual Abuse Accommodation Syndrome" because of the frequency with which it results in false-positive classifications (Ewing, 1992; Myers et al., 1989), Gardner's indicators deserve the same fate for the same reasons.
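The effect of a low base rate on false-positive classifications can be illustrated with a short calculation.  The sketch below borrows the hypothetical 5% base rate used in the footnote on base rates; the 80% sensitivity and 80% specificity are assumptions chosen only for illustration, since no such figures have been reported for Gardner's indicators, and the function names are likewise hypothetical.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Share of indicator-positive cases in which the condition is truly present."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


def overall_accuracy(base_rate, sensitivity, specificity):
    """Share of all cases classified correctly (cells A and D of Figure 1)."""
    return base_rate * sensitivity + (1 - base_rate) * specificity


def baseline_accuracy(base_rate):
    """Accuracy obtained by always predicting the more common outcome."""
    return max(base_rate, 1 - base_rate)


base_rate = 0.05      # hypothetical figure from the footnote on base rates
sensitivity = 0.80    # assumed for illustration only
specificity = 0.80    # assumed for illustration only

ppv = positive_predictive_value(base_rate, sensitivity, specificity)
accuracy = overall_accuracy(base_rate, sensitivity, specificity)
baseline = baseline_accuracy(base_rate)
print(f"PPV: {ppv:.3f}  accuracy: {accuracy:.3f}  baseline: {baseline:.3f}")
```

Under these assumed figures, fewer than one in five positive classifications would be correct, and overall accuracy (80%) would fall below the 95% obtained by classifying every case as negative.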

The problems of false-positive errors are especially applicable to Gardner's index of "Intellectual Impairment."  Gardner does not specify what he means by intellectual impairment: does this criterion correspond to a below-average IQ of 99 or less, or does it correspond to a formal DSM-III-R (American Psychiatric Association, 1987) diagnosis of mental retardation with an IQ of 70 or below?  Though defining "Intellectual Impairment" in terms of DSM-III-R criteria would reduce the frequency of false-positive errors, using this criterion still creates the risk of an unacceptable frequency of mistaken classifications.

4) Childhood History of Sexual Abuse

It would be more appropriate to disregard Gardner's argumentative rhetoric responding to my criticism of this indicator, and instead apply the co-variation matrix to it.  Once again, Gardner reports no data allowing a court to determine the frequency with which this criterion leads to false-positive and false-negative classifications.  In his response to my original article, Gardner also neglects to address the problems related to defining a childhood history of sexual abuse.  For example, has a child who witnessed the indecent exposure of an adult been sexually abused?  The unavailability of decision-making rules for borderline situations such as these makes this indicator inherently unreliable.

5) Longstanding History of Very Strong Sexual Urges

Despite Gardner's protests to the contrary, this particular indicator still amounts to a "definitional nightmare."  Gardner suggests that specifying the age at which masturbation began allows one to define this indicator more reliably.  Pleased as Gardner seems with his suggestion, it merely leads to more problems involving how to define masturbation.  For example, does the self-stimulatory rocking of most infants qualify as masturbation?  This indicator also overlooks the data reporting significant variations in the frequency of sexual outlet as a result of social class (Berelson & Steiner, 1964; Kinsey et al., 1948).  Thus, what is normative sexual behavior for one social class is not for another; and this consideration further underscores the status of this criterion as a "definitional nightmare."  Though Gardner regards this indicator as particularly significant, there is an alarming likelihood that it could result in an unacceptable frequency of false-positive classifications.

6) Impulsivity

In his response to my criticisms of this indicator, Gardner neglected to address the single, most important problem — "how do we reliably define impulsiveness?"  Without a reliable definition of impulsiveness, Gardner can only rely on the massive shortcomings of clinical judgment to identify what qualifies as impulsivity.  I would argue that resorting to clinical judgment to assess impulsiveness leads to an inordinate number of false-positive and false-negative classifications when attempting to identify pedophiles.

7) Feelings of Inadequacy and Compensatory Narcissism

In response to my criticism of this indicator, Gardner replies: "I recognize the difficulties in objectifying feelings of inadequacy.  The compensatory narcissism that derives from it is easier to assess."  Gardner's confidence in reliably assessing compensatory narcissism amounts to another example of his gratuitous overconfidence.  I would remind Gardner that the diagnostic class of "Personality Disorders," which includes "Narcissistic Personality Disorder," fails to satisfy the recommended inter-rater reliability standards for DSM-III (American Psychiatric Association, 1980, p. 470).2

Though DSM-III provides decision-making criteria for diagnosing Personality Disorders in general — and Narcissistic Personality Disorder in particular — this diagnostic class does not qualify as reliable.  Given the ill-defined criteria that Gardner uses to assess "compensatory narcissism," it is unlikely that the inter-rater reliability for this indicator would fare any better than the DSM-III diagnosis of Personality Disorders.  Therefore, this is another instance of Gardner's faith in his indicators — in this instance the ease with which compensatory narcissism can be assessed — remaining unsupported by relevant data.  Once again, then, we have another indicator that cannot survive objective scrutiny.
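For readers unfamiliar with how inter-rater reliability is quantified, the following sketch computes Cohen's kappa for two raters.  It assumes the two-rater form of the statistic, which may differ from the exact procedure used in the DSM-III field trials; the ratings are invented for illustration, and the .70 threshold is the one cited in the footnote on kappa coefficients.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters judging the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters actually agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, given each rater's own label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical judgments (1 = trait present, 0 = trait absent) on ten cases.
rater_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}, meets .70 standard: {kappa >= 0.70}")
```

In this invented example the raters agree on 8 of 10 cases, yet once chance agreement is removed the coefficient falls below .70, which shows why raw percentage agreement overstates reliability.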

8) Coercive-Dominating Behavior

In reacting to my original article, Gardner insists: "There is very strong evidence in the scientific literature for this type of pedophile."  Whether his assessment of the literature related to this indicator is accurate is not the issue.  Instead, the issue involves the now too-familiar question of how reliably can any evaluator assess "Coercive-Dominating Behavior" using Gardner's criteria?  As pointed out in my original article, "This index involves multiple categories of behavior (anti-social, aggressiveness, overt and covert domination) which are so poorly defined that they defy reliable classification."  Gardner has failed to respond to this criticism; and as a result, I submit that this particular indicator warrants repudiation as inherently unreliable.

9) Passivity and Impaired Self-Assertion

Gardner acknowledges the problems related to reliably assessing the traits associated with this indicator.  Therefore, he seems to agree that this indicator also warrants repudiation by virtue of its inherent unreliability.

10) History of Substance Abuse

In responding to my criticisms of this indicator, Gardner begrudgingly acknowledges, "Of course there are borderline situations."  I could not agree more, and Gardner's failure to develop decision-making rules for these "borderline situations" can only reduce the reliability of this indicator to an unacceptable level.  Applying considerations of base rate data to this indicator also leads to the conclusion that it would result in an unacceptable frequency of false-positive classifications.  Because the incidence of substance abuse far exceeds the incidence of pedophilia, this indicator will inevitably misclassify a large number of non-pedophiles as pedophiles.

11) Poor Judgment

In responding to my criticisms of this indicator, Gardner admits: "I recognize that this is one of the more difficult criteria to objectively assess."  Nevertheless, he proceeds to admonish me for challenging the validity of this criterion.  Gardner needs to carefully review the explanations of reliability and validity in my original article.  Perhaps, then, he will understand that while reliable criteria may be valid, unreliable criteria — by definition — are always invalid (Anastasi, 1982).  Therefore, given the inherent unreliability of this indicator, it can never be established as valid.

12) Impaired Sexual Interest in Age-Appropriate Women

In response to my criticisms of this indicator, Gardner protests my alleged disregard of the references he cites to support it, and he lamely argues that, "Every criterion will have its borderline subjects."  Again, however, Gardner overlooks the fundamental problem undermining almost all of his criteria — how do two or more evaluators reliably use this indicator and the others?  Until Gardner can satisfactorily deal with this question via the development of well-defined decision-making rules, his indicators will continue to qualify only as an experimental procedure.  To belabor the obvious, courts do not typically admit expert testimony premised upon experimental procedures.

13) Presence of Other Sexual Deviations

As I pointed out in my original article, this indicator is more conducive to reliable definition; but that most certainly does not guarantee its validity.  Establishing the validity of this particular indicator, and all of Gardner's other criteria for that matter, necessitates the use of the co-variation matrix presented in Figure 1.  Without the availability of data to indicate the frequency with which this criterion would classify a sample of subjects into cells A, B, C, and D, its validity is yet to be established.

14) Psychosis

My comments for Indicator #13 are equally applicable to this criterion.

15) Immaturity and/or Regression

In responding to my criticisms of this indicator, Gardner contends: "The fact that it may be hard to objectively (or reliably) define immaturity in some individuals, the fact that it may be difficult to provide objective criteria for regression, does not preclude the validity of this criterion."  Unfortunately, Gardner's argument disregards more than 70 years of accumulated data related to the relationship between the reliability and validity of assessment procedures, and as a result, his argument is ill-informed.  Quite simply, the validity of any assessment procedure can never exceed its reliability (Anastasi, 1982; Cronbach, 1970).  Therefore, the unavailability of objective (or reliable) criteria with which to define immaturity and/or regression most certainly does preclude the validity of this indicator.
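The dependence of validity on reliability can be stated compactly in the notation of classical test theory.  The symbols below are assumptions introduced for illustration and do not appear in the cited texts in this exact form: r_xy is the observed validity coefficient, r_xx the reliability of the indicator, r_yy the reliability of the criterion, and rho the correlation between the underlying true scores.

```latex
r_{xy} \;=\; \rho_{T_x T_y}\,\sqrt{r_{xx}\, r_{yy}}
\quad\Longrightarrow\quad
|r_{xy}| \;\le\; \sqrt{r_{xx}\, r_{yy}} \;\le\; \sqrt{r_{xx}}
```

In this formulation the ceiling on validity is the square root of the reliability coefficient, a slightly looser bound than the wording above, but the practical conclusion is the same.  An indicator with a reliability of .36, for example, could not show a validity coefficient above .60 even against a perfectly measured criterion, and as reliability approaches zero so does the maximum attainable validity.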

16) Large Collection of Child Pornographic Materials

This is the indicator on which Gardner and I most likely share the greatest agreement.  Nevertheless, I would suggest that he could strengthen this indicator by defining it in terms of the "Possession of any child pornographic material."  Additionally, "child pornographic material" could be defined as any material that would subject an individual to federal prosecution if he or she were to send it through the U.S. mail.  By virtue of how they have been re-defined, these criteria now qualify as reliable.  Consequently, Gardner would not have to engage in name-calling (e.g., "zealot") directed at evaluators who use this indicator in ways other than he intends.  Such misuse of this indicator is precluded by the redefined criteria related to it.

17) Career Choice That Brings Him in Contact with Children

My comments for Indicator #13 are also applicable to this particular criterion.  Additionally, this is another indicator that would result in an unacceptable frequency of false-positive classifications despite Gardner's affinity for it.  Because the frequency of males who choose careers that bring them into contact with children far exceeds the incidence of pedophilia, this indicator misclassifies many well-adjusted males as pedophiles.

18) Recent Rejection by a Female Peer or Dysfunctional Heterosexual Relationship

My comments for Indicator #13 related to the co-variation matrix are also applicable to this indicator.

Additionally, I would also emphasize that this particular criterion would result in an unacceptable frequency of cases falling into cell C, or being classified as false-positives.  The rationale supporting this assertion is clearly outlined in my original article.

19) Unconvincing Denial

Because of the massive reliability problems related to this indicator, my comments for Indicator #13 related to the co-variation matrix are again applicable to this particular criterion.

20) Use of Rationalizations and Cognitive Distortions That Justify Pedophilia

I suspect that Gardner failed to carefully read my comments in response to this indicator; and therefore, I will repeat them.  I emphasized, "This index does not qualify as an 'indicator' of pedophilia; instead, it conclusively confirms pedophilia when a suspect satisfies it."  Consequently, carefully reading my comments clearly reveals that I neither trivialize nor offhandedly reject this criterion.  I only emphasized that it is much more than a mere indicator.  As a result, it seems Gardner would rather argue than accept the credit I gave him.  Nevertheless, I would still insist that Gardner deserves credit for specifying an important characteristic of pedophiles via this particular criterion.

21) Resistance to Taking a Lie Detector Test

Gardner unfortunately overlooks the most serious problem created by this indicator.  He speaks of a population of "... pedophiles who refuse to take the test for the reason they fear it will disclose their pedophilia," and I would agree with him that this population most certainly does exist.  Gardner should also consider the population of suspects falsely accused of child sexual abuse who are disinclined to undergo polygraph examination because of that device's unreliability.  Then, when we consider the two populations together, we must ask what decision-making rules does Gardner provide for reliably discriminating between these two populations?  To belabor the obvious, Gardner offers no more than clinical judgment for discriminating between these two populations; and as previously emphasized, clinical judgment does not suffice for such discriminations.

22) Lack of Cooperation in the Evaluative Examination

Gardner has offered no well-defined decision-making rules for determining exactly what qualifies as "lack of cooperation in the evaluative examination."  Consequently, this indicator merely invites unreliable speculation and conjecture.

23) Duplicity Unrelated to the Sex-Abuse Denial and Psychopathic Tendencies

Gardner has failed to respond adequately to the major shortcoming of this indicator: how does an evaluator reliably identify "duplicity" and "psychopathic tendencies?"  Without well-defined decision-making rules for these criteria, they also invite the unreliable outcomes of clinical judgment.  I am pleased to know that Gardner rejects the "psychopathic deviant (sic)" scale of the MMPI for reliably discriminating between pedophiles and non-pedophiles.  I would remind him that he "seemed" to conclude otherwise as a result of citing the work of Haugaard and Reppucci (1988) involving the Psychopathic Deviate scale of the MMPI.

24) Excessively Moralistic Attitudes

Despite Gardner's suggestion to the contrary, I have no confidence whatsoever in the validity of this indicator.  Instead, I would insist that assessing this criterion via the co-variation matrix of Figure 1 would most likely result in an unacceptable frequency of false-positive classifications (cell C in Figure 1).  Gardner attempts to dismiss my position by claiming, "... he quickly raises his old argument of the difficulties in objectifying this criterion, the problems of inter-rater reliability and the dangers of one's own values interfering with assessing it."

Allow me to commend Gardner for accurately summarizing my position regarding this indicator, and as a result, perhaps he will deal with these issues more substantively in his subsequent response.  I would also remind him that however "old" my arguments related to this index are, their supposed age does not invalidate them.  Instead, the familiarity of these criticisms merely corresponds to Gardner's facility for relying excessively on clinical judgment again and again.


Horner and Guyer (1991) have carefully examined the classification problems endemic to child sexual abuse litigation.  In their cogent analysis of the classification errors committed by self-styled "validators" of sexual abuse, they emphasized:

Experts who cannot or will not convincingly specify the population with which the targeted individual is being compared and who cannot provide clearly reasoned and documented prevalence rates with which to calculate the likelihood of classification errors, are highly likely to make errors of classification.  In our opinion, such experts should be precluded from testifying in sexual abuse cases on grounds that their testimony is prejudicial and not at all probative of the issue before the court (p. 401).

Though Gardner clearly specifies the population with which he compares accused pedophiles, he most certainly cannot document the prevalence of classification errors attributable to his indicators.  Consequently, expert testimony premised upon Gardner's indicators is too likely to be prejudicial.

Throughout his response to my original article, the sum and substance of Gardner's reactions involves his shrill protests to the effect of: "But this is not how I intended to use the indicators."  Discrepancies between Gardner's thinking, and how others apply his indicators, result in him censuring any evaluator who deviates from what he intended.  It would be more appropriate, however, for Gardner to acknowledge that his criteria rely on such vague and ill-defined terms that they invite distortion and misuse.  Therefore, self-styled "validators" who twist Gardner's indicators to serve their own biased agenda can do so because of the indicators' inherent unreliability.  Ultimately, then, the exceedingly vague and ill-defined terms undermining Gardner's indicators encourage their exploitation.

Reliable classification necessitates clearly defined criteria to reduce the ambiguity that otherwise leads to conjecture and speculation.  Unfortunately, however, Gardner's response to my original article offers only his idiosyncratic opinion as a guide for the use of his indicators.  To belabor the obvious, idiosyncratic opinion is never a satisfactory substitute for reliable classification.

Gardner's willingness to establish his own idiosyncratic opinion as the standard for reliably using his indicators essentially requires that other evaluators attempt to read his mind.  This expectation demands clairvoyance, which is as presumptuous as it is impossible to satisfy.  Thus, I conclude this reply as I concluded my original article: "Gardner's previously acknowledged reputation as a courageous figure deserves continued respect, but his 'Indicators of Pedophilia' do not."


Anastasi, A. (1982). Psychological Testing (5th ed.). New York: The Macmillan Company.

American Psychiatric Association (1987). Diagnostic and Statistical Manual of Mental Disorders (3rd edition-revised). Washington, DC: Author.

American Psychiatric Association (1980). Diagnostic and Statistical Manual of Mental Disorders (3rd edition). Washington, DC: Author.

Arkes, H. R. (1989). Principles in judgment/decision making research pertinent to legal proceedings. Behavioral Sciences & the Law, 7, 429-456.

Berelson, B., & Steiner, G. A. (1964). Human Behavior: An Inventory of Scientific Findings. New York: Harcourt, Brace & World.

Campbell, T. W. (1992a). False allegations of sexual abuse and their apparent credibility. American Journal of Forensic Psychology, 10(4), 21-35.

Campbell, T. W. (1992b). Allegations of sexual abuse II: Case example of a criminal defense. American Journal of Forensic Psychology, 10(4), 37-48.

Campbell, T. W. (1992c). False allegations of sexual abuse and the persuasiveness of play therapy. Issues in Child Abuse Accusations, 4(3), 118-124.

Cronbach, L. J. (1970). Essentials of Psychological Testing. New York: Harper & Row.

Ewing, C. P. (1992 July). Judicial notebook: Child sexual abuse "validation" on trial — and retrial. APA Monitor, p. 14.

Frye v. U.S., 293 Fed. 1013, 1014 (D.C. Cir. 1923).

Haugaard, J. J., & Reppucci, N. D. (1988). The Sexual Abuse of Children. San Francisco: Jossey-Bass.

Horner, T. M., & Guyer, M. J. (1991). Prediction, prevention, and clinical expertise in child custody cases in which allegations of child sexual abuse have been made. II. Prevalence rates of child sexual abuse and the precision of "tests" constructed to diagnose it. Family Law Quarterly, 25, 381-409.

Kinsey, A. C., Pomeroy, W. B., Martin, C. E., & Gebhard, P. (1948). Sexual Behavior in the Human Male. Philadelphia: W.B. Saunders.

Myers, J. E., Bays, J., Becker, J., Berliner, L., Corwin, D. L., & Saywitz, K. J. (1989). Expert testimony in child sexual abuse litigation. Nebraska Law Review, 68, 1-145.

1 The concept of base rate clarifies the enormous problems undermining any attempt to assess or predict events that occur very infrequently.  For example, if 5% of the adult male population are pedophiles, accurately identifying this population subset is exceedingly difficult.  An evaluator who capitalizes on this base rate information can classify all males as non-pedophiles, and claim an accuracy rate of 95%.  Consequently, the merits of any set of indicators depend on whether their use results in greater classificatory accuracy than merely resorting to the relevant base rate.

2 DSM-III specifies a kappa coefficient of .70 or greater as corresponding to an acceptable level of inter-rater reliability for its diagnostic categories (p. 468).  Phase one and phase two of the DSM-III field trials reported kappa coefficients of .56 and .65 respectively for Personality Disorders as a diagnostic class.  Consequently, one can legitimately argue that Narcissistic Personality Disorder (one instance of Personality Disorder) is an inherently unreliable diagnosis.  I should also clarify that it is necessary to cite DSM-III in this regard because DSM-III-R does not report any kappa coefficients corresponding to the inter-rater reliabilities of its diagnostic categories.

* Terence W. Campbell is a clinical and forensic psychologist at 36040 Dequindre, Sterling Heights, MI 48310.


Copyright 1989-2014 by the Institute for Psychological Therapies.