
Law, Probability and Risk (2010) 9, 69-90. Advance Access publication on January 18, 2010. doi:10.1093/lpr/mgp028

Rational bias in forensic science

GLEN WHITMAN, Associate Professor of Economics, Department of Economics, California State University, Northridge, 18111 Nordhoff Street, Northridge, CA 91330-8374, USA (Email: glen.whitman@gmail.com)

ROGER KOPPL, Professor of Economics and Finance, Department of Economics and Finance and Institute for Forensic Science Administration, Silberman School of Business, Fairleigh Dickinson University, Madison, NJ 07940, USA (Email: koppl@fdu.edu)

[Received on 26 May 2009; revised on 1 October 2009]

The current organization of forensic science induces biases in the conduct of forensic science even if forensic scientists are perfectly rational. Assuming forensic examiners are flawless Bayesian statisticians helps us to identify structural sources of error that we might otherwise have undervalued or missed altogether. Specifically, forensic examiners' conclusions are affected not just by objective test results but also by two subjective factors: their prior beliefs about a suspect's likely guilt or innocence and the relative importance they attach to convicting the guilty rather than the innocent. The authorities (police and prosecutors) implicitly convey information to forensic examiners by their very decision to submit samples for testing. This information induces the examiners to update their prior beliefs in a manner that results in a greater tendency to provide testimony that incriminates the defendant. Forensic results are in a sense contaminated by the prosecution and thus do not provide jurors with an independent source of information. Structural reforms to address such problems of rational bias include independence from law enforcement, blind proficiency testing and separation of test from interpretation.

Keywords: forensic science; bias; Bayesian; NAS report; organization.

© The Author [2010]. Published by Oxford University Press. All rights reserved.

1. Introduction

The report of the National Academy of Sciences (NAS), Strengthening Forensic Science in the United States: A Path Forward (NAS Committee on Identifying the Needs of the Forensic Sciences Community, 2009), notes that dependence on law enforcement creates the risk of bias: "Forensic scientists who sit administratively in law enforcement agencies or prosecutors' offices, or who are hired by those units, are subject to a general risk of bias" (p. 6-2). We will examine this and other sources of bias in forensic science as it is currently organized in the USA. The potential biases we examine are largely attributable to the institutional structure of forensic science rather than the cognitive limits of individual forensic scientists.

To draw out the consequences of factors that do not depend on cognitive limits, we will assume that forensic scientists are free of them and in this sense "rational."

The NAS report identifies three key features of the institutional structure of forensic science in the USA today, namely, fragmentation, lack of oversight and dependence on law enforcement. The report neglects, however, the important fact of monopoly. Forensic science today is characterized by a twofold monopoly. First, evidence is typically examined by one crime lab only. In this sense, the crime lab receiving a bit of evidence has a monopoly on examination of that evidence. Second, that same lab will normally be the only one to offer an interpretation of the results of the examination it performs. No other experts in forensic science will be asked to judge what the evidence means. Our analysis implicitly recognizes that dependence on law enforcement is frequently combined with a monopoly in interpretation for the crime lab. We ignore, however, the monopoly in examination. Thus, our analysis is not meant to cover all ways in which current institutions may induce bias in forensic examinations.

The NAS report relies on and contributes to a substantial literature identifying problems in the conduct of forensic science. Much of this literature identifies biases in the work of forensic scientists. Such biases are often regarded as irrational. Risinger et al. (1989), e.g., describe handwriting identification as "irrational evidence" (p. 779, note 213). Much of the most recent literature draws on standard cognitive science results showing that humans are not rational decision makers in the strict sense laid down by modern statistics and decision theory.[1] Dror and Charlton (2006) point to a class of errors by experts that represent "epistemological problems that derive from the mechanisms of human cognition and the workings of the mind" (p. 602). They view such errors as deviations from "objectivity" (p. 614). Dror et al. (2005) show that emotionally charged information, such as a grisly photograph of a murder victim, increases the probability of falsely matching fingerprints; in traditional rational choice models, such affective priming would not alter performance on cognitive tasks. The important article of Risinger et al. (2002) discusses biasing by domain-irrelevant information. While Risinger et al. do not label such induced biases irrational, neither do they acknowledge that some induced biases may be consistent with rational choice.

In this article, we argue that the current organization of forensic science induces biases in the conduct of forensic science even if forensic scientists are perfectly rational. In particular, we will model them as rational Bayesian statisticians. We recognize, indeed affirm, that they may not be and that irrationalities of human cognition may induce other biases or strengthen the biases we identify. Assuming that forensic examiners are flawless Bayesian statisticians, however, helps us to identify structural sources of error that we might otherwise have undervalued or missed altogether. Specifically, forensic examiners' conclusions are affected not just by objective test results but also by two subjective factors: their prior beliefs about a suspect's likely guilt or innocence and the relative importance they attach to convicting the guilty rather than the innocent. The authorities (police and prosecutors) implicitly convey information to forensic examiners by their very decision to submit samples for testing.
This information induces the examiners to update their prior beliefs in a manner that results in a greater tendency to provide testimony that incriminates the defendant.

It may seem unnatural or inappropriate to model forensic scientists as rational in the rather narrow sense of a flawless Bayesian statistician, although there is precedent for it (Phillips et al., 2001). In doing so, however, we are not saying that forensic scientists always understand and consciously apply statistical reasoning to their jobs.

[1] Savage (1972), originally published in 1954, is a classic statement of this conception of rationality.

Many of our fellow economists would defend our model as capturing factors that influence forensic scientists in more or less the way the model suggests, regardless of how those influences are experienced subjectively by the forensic scientist. This idea is a bit like saying that a motorist implicitly calculates a coefficient of friction when deciding whether to pass another motorist in the rain. There is something to this notion, and we have supported a highly qualified version of it in the past (Koppl and Whitman, 2004). In this paper, however, we are making an even weaker claim. We are saying that the biases we identify are hard to get rid of. We are saying the biases we identify would exist even if forensic scientists were perfectly rational in the sense we have identified. We are saying that bias is a larger and more difficult problem than we might already have recognized in the context of the flawed mechanisms of human cognition noted by Dror and Charlton (2006) and others.[2]

We suggest several changes to the organization of forensic science that would help to reduce the influence of such extrinsic factors. We endorse the NAS call for independence of crime labs from law enforcement (p. 6-7). We endorse the call of Koppl et al. (2008) for sequential unmasking, which requires sequencing the laboratory workflow such that "evidentiary samples are interpreted, and the interpretation is fully documented, before reference samples are compared" (Krane et al., 2008, p. 1006). In other words, examiners are exposed to potentially biasing information only after making decisions that might be biased by such information. We support blind proficiency testing. We make a qualified defense of evidence line-ups of the sort Miller (1987) describes. Finally, we follow Koppl (2005) in calling for a separation of test and interpretation. Test results would be forwarded for interpretation to forensic experts for both defense and prosecution.

2. Subjective judgement and categorical testimony in forensic science

Forensic science evidence is often ambiguous, and the analysis is generally subjective. In spite of such ambiguity and subjective judgement, forensic science testimony is generally categorical, essentially conveying a match or no match decision. This tension exists within an institutional structure that frequently puts crime labs under the administration of law enforcement agencies.

2.1 Institutional dependence and subjective judgement

Today, most forensic science work in our criminal justice system is conducted in government crime labs, and most of these labs are organized as a part of a police agency. The NAS report says, "The majority of forensic science laboratories are administered by law enforcement agencies, such as police departments, where the laboratory administrator reports to the head of the agency" (p. 6-1). The current process does not commonly produce multiple examinations by defense experts or independent experts (Thompson, 1995, p. 167). Noting this fact, the NAS report says, "prosecutors usually have an advantage over most defendants in offering expert testimony in criminal cases" (p. S-8). Giannelli (2004) finds that the defense right to expert assistance recognized by the Supreme Court in the 1985 case Ake v. Oklahoma "has not been effectively implemented" (p. 1419).

The NAS report makes repeated references to the extensive role of subjective judgement in forensic science. Forensic examination involves subjective judgement and interpretation at almost every step.
Subjective judgements can go one way or the other. Thus, they may depend on factors extrinsic to the evidence. Importantly for this paper, they allow the match decision to depend on the examiner's preferences and prior beliefs as well as the evidence.

[2] Even the question of whether such mechanisms are flawed is contentious (Cosmides and Tooby, 1994).

The NAS report underplays the importance of subjective judgement in DNA analysis. Even DNA profiling, however, often entails subjective judgement. The NAS report says that several historical facts combined to make DNA typing less subjective than it had been at its inception (p. 5-5). DNA test results are often unambiguous. Nevertheless, DNA tests sometimes produce ambiguous results that are subject to multiple interpretations, and "[w]hen interpreting ambiguous results... human analysts rely heavily on subjective judgments to distinguish signal from noise, explain anomalies, and account for discrepancies" (Thompson and Cole, 2007, p. 34).

Figure 1 illustrates an unambiguous case. The biological sample is prepared and run through a machine called a genetic analyser, which produces data that are represented as an electropherogram, which is then interpreted by a forensic scientist. As Fig. 1 reveals, an electropherogram is a squiggly line. The forensic scientist looks at the squiggly line to see where the peaks are. The pattern of peaks is the person's genetic profile. Because each peak represents an allele, the electropherogram will show one or two peaks at each locus, depending on whether each parent contributed distinct or identical alleles at that locus.[3] The electropherogram of Fig. 1 is unambiguous; there is no substantial question about how many peaks there are and where they appear.

Figure 2 illustrates some of the ambiguities that can enter DNA profiling. When DNA is degraded or the original sample contains a small quantity of DNA, it can be difficult to distinguish between signal and noise. Is the spot labelled 12 in the figure high enough to be a peak? Perhaps, but it might be noise or a technical artifact, i.e. a random blip with no meaning. The spot labelled "OL Allele" would be recognized by DNA analysts as a blob (caused by some random impurity) and not a peak. But we do not know whether the blob hides a peak. With small or degraded samples, allelic dropout may cause some peaks to fall away for no particular reason, and allelic drop-in (contamination) may cause spurious peaks to appear. Thus, many profiles are consistent with the evidence represented in Fig. 2.[4] When dealing with small or degraded samples, it can be hard to know which squiggles of the electropherogram are meaningful peaks and which squiggles are meaningless noise. Thompson (2009) develops this and other examples at length. Under the protocols of most crime labs in the USA, he says, "Whether the comparison of profiles results in a finding of inclusion, exclusion or inconclusive is an entirely subjective determination."

Mixed samples are another source of ambiguity and subjective judgement. They are particularly challenging "when potential contributors have several alleles in common..., when stochastic variations in peak heights occur, or when technical artifacts such as stutter, allelic dropout, and degradation/inhibition occur" (Paoletti et al., 2005, p. 1). DNA analysts typically rely on subjective judgement to disentangle the threads and identify the genetic profiles of the separate DNA contributors. To help them correctly reconstruct the separate DNA profiles of the separate contributors to the biological sample, they generally have a suspect's DNA profile before them. In other words, they go into the test with a "cheat sheet." Koppl et al. (2008) object to this practice.

[3] Each of the marked intervals on the electropherogram corresponds to a locus, a particular stretch of DNA.
Half of a person's DNA (ignoring mitochondrial DNA) comes from the person's mother and half from the father. An allele is the pattern found at a locus on one of those halves. Thus, at each locus, there will be one allele contributed by the mother and one (possibly identical) allele contributed by the father. A particular allele at a locus determines where a peak will appear within the interval of the electropherogram that corresponds to that locus. Thus, a person who is heterozygotic (or "heterozygous") at a given locus will have two distinct alleles and an electropherogram with two peaks in the relevant interval. A person who is homozygotic (or "homozygous") at a given locus will have two copies of the same allele at that locus and, therefore, an electropherogram with one peak in the relevant interval.

[4] Figures 1 and 2 show only a three-locus segment of the full electropherogram.
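To make the interpretive step concrete, the short Python sketch below applies a crude peak-calling rule to hypothetical peak heights. The heights and thresholds are invented for exposition; this is an illustration of the kind of judgement involved, not any laboratory's actual protocol. It shows how a borderline spot, like the one labelled 12 in Fig. 2, falls between the written rule and the analyst's judgement.

# Illustration only: heights and thresholds are invented, not taken from any real protocol.
# A crude peak-calling rule applied to hypothetical electropherogram peak heights (in RFU).

ANALYTICAL_THRESHOLD = 50   # below this, treat a spot as noise (assumed value)
CLEAR_PEAK = 150            # above this, treat a spot as an unambiguous allele peak (assumed value)

def call_spot(height_rfu: float) -> str:
    """Classify a single spot on the electropherogram."""
    if height_rfu >= CLEAR_PEAK:
        return "peak"
    if height_rfu < ANALYTICAL_THRESHOLD:
        return "noise"
    # Heights in between are exactly the ambiguous cases discussed in the text:
    # the written rule runs out, and the analyst's judgement takes over.
    return "ambiguous - analyst judgement required"

if __name__ == "__main__":
    for label, height in [("clear allele", 420.0), ("baseline noise", 12.0), ("spot 12", 95.0)]:
        print(f"{label:>14} ({height:6.1f} RFU): {call_spot(height)}")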

FIG. 1. An unambiguous electropherogram.

Figure 1 reflects the fact that DNA analysts may interpret the crime scene evidence with the genetic profiles of competing suspects before them. This practice makes the DNA examination a multiple-choice test. We do not have data on the relative frequency of ambiguous and unambiguous DNA evidence in criminal casework in the USA. Such data would be difficult to gather, in part because case files can be difficult to access (Gold, 2008) and the underlying evidence is sometimes destroyed (Greene and Moffiet, 2007). Several considerations suggest that DNA evidence may be ambiguous in a non-trivial fraction of cases, however.

FIG. 2. An ambiguous electropherogram.

First, as Thompson (2009) notes, evidentiary samples are often mixtures, and mixtures are often ambiguous. Second, the use of partial samples is increasing (Moore, 2009), and such samples, when arising from degraded evidence, may produce the sort of ambiguity represented by Fig. 2. Finally, there is some evidence that some crime labs may be using very small samples, which gives rise to "Low Copy Number" or "low template" analysis (California v. Hector Espino, no. NA07660). The New York City Office of the Chief Medical Examiner issued a letter in November 2006 saying that the lab has been performing Low Copy DNA testing in criminal casework since January 2006.[5] Overall, it seems fair to say, as we have, that DNA evidence may be ambiguous in a non-trivial fraction of cases. Thompson (2009) provides further evidence supporting our view.

According to the NAS report, subjective judgement also enters other forensic disciplines, including fingerprint examinations (p. 5-9), firearms identification, which is often called ballistics (p. 5-21), shoeprints and tire tracks (pp. 5-15, 5-20), handwriting comparisons (p. 5-29) and bloodstain pattern analysis (p. 5-29). Dror and Rosenthal (2008), Phillips et al. (2001) and Schwartz (2004, 2005) provide further evidence and analysis. The NAS report notes that the extreme disaggregation of forensic science training, practice and organization has produced different professional cultures and standards for performance, which raises "the worrisome prospect that the quality of evidence presented in court, and its interpretation, can vary unpredictably according to jurisdiction" (NAS Committee on Identifying the Needs of the Forensic Sciences Community, 2009, p. S-11). Phillips et al. (2001, p. 299) make a similar argument.

When forensic labs are operated in conjunction with police and prosecutors, forensic examiners know that most or all of their samples will come from the authorities. We will demonstrate later that this fact alone generates a bias in the interpretation of evidence. But we should also note that typically the authorities explicitly inform the forensic examiners about the facts of the case and their theory of the crime (Thompson, 1995, especially pp. 153-154).

2.2 The match or no match assumption

The model to be presented in Section 3 assumes that the forensic lab makes a binary decision after performing an examination: declare either a match or a no match.

[5] A pdf scan of the letter is available from Koppl.

The NAS report notes, "Many terms are used by forensic scientists in scientific reports and in court testimony that describe findings, conclusions, and degrees of association between evidentiary material (e.g. hairs, fingerprints, fibers) and particular people or objects. Such terms include, but are not limited to, 'match,' 'consistent with,' 'identical,' 'similar in all respects tested,' and 'cannot be excluded as the source of'" (p. S-15). Because such terms may imply different degrees of association, our assumption that forensic scientists indicate either match or no match is a simplification. Nevertheless, our assumption reasonably approximates the reality of forensic testimony in American courts today. DNA identifications, e.g., are typically accompanied by testimony about the random match probability, which may be as low as one in several quadrillion, essentially zero.

Over-claiming tends to produce match or no match testimony. Cole clarifies Friedman (2003, pp. 1063-1064) by defining over-claiming as "exaggerating the probative value of knowledge claims" (2007, p. 817). It has been identified as a serious problem in forensic science (Garrett and Neufeld, 2009; Cole, 2007, especially pp. 821-825; Friedman, 2003). Vague language such as "probable" or "very likely" is not necessarily over-claiming, but it easily slips into it and, in any event, often amounts to match or no match testimony. Match or no match testimony is the standard in latent print testimony (Cole, 2007, pp. 820-821; Phillips et al., 2001, p. 298) and may be the norm in toolmark testimony (Cole, 2007, p. 821) and bite mark testimony (Cole, 2007, pp. 821-822; see also Nelson, 2005; McRoberts, 2004). Forensic techniques often lack the supporting research necessary to make statements of confidence levels (Phillips et al., 2001, p. 299). At present, forensic testimony in our courts seems to be characterized by exaggerated claims of certainty, minute random match probabilities and oracular pronouncement. Such testimony leaves the jury able to understand only whether or not there has, supposedly, been a match. For this reason, we consider our assumption that the forensic scientist's testimony consists in declaring either match or no match a reasonable approximation of actual practice.

3. The model

We will now present a model of forensic decision making that shows how decisions by the authorities to submit samples affect the forensic lab's decisions about how to report test results. Arkes and Mellers (2002) have applied a similar model to juries. The main elements in our model were anticipated in Phillips et al. (2002), some of whose conclusions are similar to some of ours (2002, especially p. 299). Our paper has a different set of purposes than Phillips et al. articulate (2002, p. 294), however, and draws out a different set of implications. The literatures on likelihood ratios and Bayesian approaches to forensic evidence are large (Berry, 1991; Koehler, 1996; Aitken, 2000; Champod, 2000; Biedermann et al., 2008, 2009; Bolck et al., 2009). These literatures are devoted to whether and how different statistical methods should be self-consciously applied by forensic scientists, whereas we are using our model in a semi-descriptive manner to build our "even if" argument.

3.1 The forensic lab's decision process

We assume that the authorities provide the forensic lab with a sample from a suspect on which the lab will run a test. Based on the test result, the lab will report either a match or a no match to a sample from the crime scene.

Let δ be the lab's prior probability of the suspect being guilty and (1 − δ) the prior probability of innocence. For an imaginary lab whose personnel always think that the police suspect just has to be guilty, δ would be close to 1. For an imaginary lab whose personnel always think that the police suspect just has to be innocent, δ would be close to 0. Presumably, the truth typically lies between these imaginary extremes. The value of δ depends, perhaps, not only on some overall assessment of the quality of local policing but also on factors such as the perceived overall rate of criminality in the relevant area and the profile of the suspect as perceived by lab personnel.

Guilty and innocent suspects will generate different distributions of test results. Let r be a continuous variable corresponding to the result of the test. For illustrative purposes, it may be helpful to think of r as peak height on an electropherogram and to imagine that a sufficiently small peak height (such as zero) implies no match and a sufficiently large peak height implies match. In a more realistic example, r would be a vector somehow reflecting the differences between the suspect and crime scene electropherograms. For guilty suspects, test results are characterized by the probability density function f_g(r), and for innocent suspects, f_n(r). Assume that the mean test result for guilty suspects is strictly greater than that for innocent suspects. Thus, f_g(r) tells us which values of r are more likely under the hypothesis that the suspect is guilty, and f_n(r) tells us which values of r are more likely under the hypothesis that the suspect is innocent.

It is a simplification to assume that the probability density function for r depends on whether or not the police suspect is guilty. Someone other than the defendant might be the source of an evidentiary sample. For example, a test may be performed to determine whether DNA found on an alleged rape victim belongs to a consensual partner. To discuss source rather than guilt, however, would require us to enter into the complicated relationships that may exist between source and guilt. As important as such complexities may be in an individual case, they do not seem to affect the logic or inferences we develop in this paper.

Given a test result r, the lab must update its belief about the likelihood of guilt. The updated probability will be as follows:

    Pr(guilty | r) = δ f_g(r) / [δ f_g(r) + (1 − δ) f_n(r)].

It follows that the updated probability of innocence is:

    Pr(innocent | r) = (1 − δ) f_n(r) / [δ f_g(r) + (1 − δ) f_n(r)].

These expressions reflect the idea that guilt will seem to lab personnel the more probable hypothesis (relative to innocence) the higher δ is and the higher f_g(r) is relative to f_n(r). If there are some values of r that can only be generated by guilty suspects, and some values of r that can only be generated by innocent suspects, then for those values of r the updated probability of guilt will be either zero or one. But for any r that could in principle be generated by either an innocent or a guilty suspect, the lab needs to decide whether to report the result as a match or not. (Think of the spot labelled 12 in Fig. 2.) We suspect that this is the more typical case in many forensic disciplines. As r gets larger, it becomes more likely that the suspect is guilty, but no r is so high as to be absolutely definitive.
Thus, the lab must decide on a threshold value of r above which the lab will report a match (and below which it will report no match). This is the interpretive role of the forensic scientist.
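As a rough numerical illustration of the updating rule above, the following Python sketch computes Pr(guilty | r) for an ambiguous test result under two different priors. The densities and priors are invented for exposition, not estimates of any real laboratory; the point is only that, for a genuinely ambiguous result, the posterior is driven almost entirely by the prior.

# Illustration only: densities and priors are invented to show how the posterior
# in the text, Pr(guilty | r) = δ f_g(r) / [δ f_g(r) + (1 − δ) f_n(r)], behaves.
from math import exp, pi, sqrt

def normal_pdf(x: float, mean: float, sd: float) -> float:
    return exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

def posterior_guilt(r: float, delta: float) -> float:
    f_g = normal_pdf(r, mean=100.0, sd=30.0)  # assumed density of results for guilty suspects
    f_n = normal_pdf(r, mean=40.0, sd=30.0)   # assumed density of results for innocent suspects
    return delta * f_g / (delta * f_g + (1 - delta) * f_n)

if __name__ == "__main__":
    r = 70.0  # an ambiguous result, midway between the two assumed means
    for delta in (0.5, 0.9):
        print(f"prior δ = {delta:.1f} -> Pr(guilty | r = {r}) = {posterior_guilt(r, delta):.3f}")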

To find the match threshold, we need to know the values the lab places on conviction. Let u be the utility of convicting a guilty suspect, and let d be the disutility of convicting an innocent.[6] We will refer to u and d together as conviction utilities. It is a simplification to interpret u and d as utilities relating to conviction rather than judgements of source. As with our probability density functions discussed earlier, this simplification allows us to ignore the complexities that can arise in the relationship between source and guilt. Moreover, when forensic scientists have outcome preferences other than truth, they seem to be related mostly to questions of guilt and innocence rather than source (Kelly and Wearne, 1998, pp. 15-17; Thompson, 1995, p. 154).

The conviction utilities describe something implicit in the analyst's choices and not necessarily something experienced consciously as known preferences or as feelings of pleasure and pain. In the psychological process of choice, these utility values might be experienced or rationalized ex post as differences in other parameters. For example, an analyst with a low disutility of convicting the innocent may experience low d as the belief that false-positive errors are highly unlikely. This reassuring thought might support two rather different beliefs. On Thursday, it might support the analyst's sincere belief that a spot such as the one labelled 12 in Fig. 2 is a peak reflecting the corresponding peak on the suspect's profile. The following Tuesday, it might support the same analyst's equally sincere conviction that an entirely similar spot is a meaningless artifact consistent with the absence of a peak at the corresponding point on the profile of that day's suspect.[7] As we shall note again later, a rational decision maker may not be consciously aware of all determinants of his or her choices.

For simplicity, let the lab assume that its finding will translate directly into a verdict. Then, the lab will report a match if and only if the expected value of a conviction is greater than zero:

    [δ f_g(r) / (δ f_g(r) + (1 − δ) f_n(r))] u − [(1 − δ) f_n(r) / (δ f_g(r) + (1 − δ) f_n(r))] d > 0,

which simplifies to:

    f_g(r) / f_n(r) > [(1 − δ)/δ] (d/u).

This inequality defines the match threshold. Three results are immediately apparent:

1. The threshold depends on the prior likelihood of guilt. The larger is δ, the lower is the right-hand side, and thus, the more likely the threshold is to be met. This means that it is important to consider the source of the lab's prior. We will address this question in Section 3.2.

2. The threshold depends on the ratio of the disutility of convicting an innocent to the utility of convicting the guilty. The smaller is this ratio (i.e. the more the lab cares about convicting the guilty relative to not convicting the innocent), the more likely is the lab to announce a match.

3. The threshold depends on the relative likelihood of the test result having derived from an innocent versus a guilty person. If we assume, as seems sensible, that increasing r results in a greater relative likelihood of guilt (i.e. the ratio f_g(r)/f_n(r) is increasing in r), then a higher r will make the lab more likely to announce a match.

[6] In other words, u is the utility gain from convicting rather than not convicting a guilty suspect, and d is the utility loss from convicting an innocent rather than not convicting an innocent person.
[7] Thompson (2009) gives several forms of evidence tending to show that forensic DNA analysts sometimes shift their criteria for a match based on the DNA profile of the suspect. Dror and Charlton (2006) and Dror et al. (2006) provide examples of the sort of reversal described in the text.
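To see how the threshold inequality behaves, the short Python sketch below, again with invented normal densities for f_g and f_n (the numbers are ours, not the paper's), locates the smallest test result at which f_g(r)/f_n(r) first exceeds [(1 − δ)/δ](d/u), for two priors and two d/u ratios. A higher prior or a lower d/u lowers the bar the evidence must clear.

# Illustration only: the densities, priors and utility ratios below are assumed
# for exposition. The sketch locates the match threshold r* implied by the
# inequality f_g(r)/f_n(r) > [(1 - δ)/δ] * (d/u) from the text.
from math import exp

def likelihood_ratio(r: float) -> float:
    # Normal densities with equal variance: the ratio f_g(r)/f_n(r) reduces to an
    # exponential in r, so it is increasing in r (as the text assumes).
    mean_g, mean_n, sd = 100.0, 40.0, 30.0
    return exp(((r - mean_n) ** 2 - (r - mean_g) ** 2) / (2 * sd ** 2))

def match_threshold(delta: float, d_over_u: float) -> float:
    cutoff = (1 - delta) / delta * d_over_u
    r = 0.0
    while likelihood_ratio(r) <= cutoff:  # scan upward until the inequality first holds
        r += 0.1
    return r

if __name__ == "__main__":
    for delta in (0.5, 0.9):
        for d_over_u in (3.0, 0.5):
            print(f"δ = {delta:.1f}, d/u = {d_over_u:.1f} -> report 'match' for r above ≈ {match_threshold(delta, d_over_u):.1f}")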

(There are some special cases in which that assumption is not correct. If r is uniformly distributed, it is possible that an interval will exist in which the ratio f_g(r)/f_n(r) is not increasing in r, and thus, higher r does not make the lab more likely to announce a match. If r is discrete rather than continuous, the lab's prior (δ) can be so high or so low that the lab report will be invariant to the test result.)

3.2 How the authorities influence the lab's decision process

In the preceding section, the prior δ was taken as given. But it actually results from the updating of a "prior prior," based on implicit information provided by the authorities, i.e. the police or prosecutors who submitted the sample to the lab. When the authorities have a potential suspect, they will send a sample to the lab some fraction a of the time for guilty suspects and some fraction b of the time for innocent suspects. We assume a > b; i.e. the authorities ask for the lab's input when there is already independent reason to believe that the suspect is guilty, and guilty parties are more likely to have generated such independent reasons.

Some evidence supports our expectation that a > b. Risinger et al. (2002) report that Peterson et al. (1984) found that, on average, fewer than 10% of all reports disassociated a suspect from the crime scene or from connection to the victim (pp. 47-48). Risinger et al. note that, "This high rate of inculpation comes from the fact that each piece of evidence connected with any suspect has a heightened likelihood of being inculpatory, since investigators do not select suspects or evidence at random, but only those they have some reason to think were connected to the crime. Thus, forensic scientists have a continuing expectation that the evidence before them is inculpatory" (p. 48).

Let γ be the lab's prior probability that someone chosen at random from the general population is guilty. (The argument would be the same if we considered relatively broad subpopulations, such as black males or short white women fitting a victim's description or other relevant information.) Thus, γ is the prior for someone without a sample from that person having been submitted. The table below summarizes the distribution of cases as perceived by the lab:

                        Case sent to lab      Case not sent to lab
    Guilty person       γa                    γ(1 − a)
    Innocent person     (1 − γ)b              (1 − γ)(1 − b)

Given this distribution, the lab will update its beliefs about as-yet-untested suspects like so:

    δ = Pr(guilty | submitted) = γa / [γa + (1 − γ)b],

    1 − δ = Pr(innocent | submitted) = (1 − γ)b / [γa + (1 − γ)b].

Thus, the lab's prior belief δ is actually an updated belief which takes into account the implicit information provided by the authorities' decision to submit a sample at all. This updating occurs before forensic testing begins. It is straightforward to prove that δ, and thus δ/(1 − δ), is increasing in a and decreasing in b.

If the authorities were just as likely to send samples from innocent suspects as they were to submit samples from guilty suspects (i.e. if a and b were equal), then we would have δ/(1 − δ) = γ/(1 − γ). But as a increases relative to b, the value of δ/(1 − δ) rises, thereby making the lab's match threshold more likely to be satisfied. In short, the very choice to submit a suspect's sample to the lab makes the lab more inclined (than it would be otherwise) to announce a match, indicating that the suspect is guilty.

4. Information bleed

Forensic science should reduce the error rates that would otherwise exist in the criminal justice system. If even the best forensic evidence were shown to increase error rates, then judges might be tempted to exclude it as prejudicial. To reduce errors in the criminal justice system, forensic scientists must bring new and independent scientific information to the process. As our model shows, however, the current organization of forensic science encourages forensic scientists to discount the scientific information they generate. This can be seen by expressing the match threshold as

    [f_n(r) / f_g(r)] [(1 − δ)/δ] (d/u) < 1.

The scientific information is the ratio f_n(r)/f_g(r). It is discounted by the factor [(1 − δ)/δ](d/u), which is likely to be less than one. In this sense, the organization of forensic science creates an information bleed. The organization of forensic science causes valuable scientific information to bleed out of the system, thus reducing the contribution forensic science makes to the criminal justice system. Information bleed turns a relatively objective scientific result into a relatively subjective personal judgement that depends on non-scientific information in the case file.

As discussed in Section 2, forensic experts typically convey their conclusions to juries, rather than conveying evidence revealed in the testing process. In terms of our model, they tell the jury whether there was a match rather than telling the jury the ratio f_n(r)/f_g(r). Forensic testimony incorporates some information provided by the authorities, which means that forensic testimony does not provide an independent source of information to the jury. As Bikhchandani et al. (1992, p. 1009) observe in the similar context of information cascades, the benefit of diverse information sources is attenuated when decision makers discount their information relative to that of others. When people infer information from the actions of prior decision makers, their own actions convey less information to future decision makers. Banerjee (1992, p. 798) rightly says, "... the very act of trying to use the information contained in the decisions made by others makes each person's decision less responsive to her own information and hence less informative to others." In this case, the lab may ignore or give too little weight to the ratio f_n(r)/f_g(r).

Labs will sometimes engage in a verification procedure in which samples deemed a match by one examiner will be submitted to a second or even a third examiner to confirm the match. This is the case, for instance, with fingerprint matching. Remarkably, a fingerprint verifier is usually informed of the first examiner's conclusion. According to an Office of the Inspector General (OIG) report, "... although the verifier was aware of the fact that the first examiner had made an identification, the verifier would not know which features in the print were relied upon by the initial examiner in reaching his conclusion" (OIG, 2006, p. 115).
Applying the same logic as in our basic model, we predict that the verifier's updated assessment of the likelihood of guilt (δ) will be even larger than the first examiner's. He will therefore be more likely to announce a match than if he thought he was performing a first-round test. Thus, it is no surprise that "a refused verification was as [sic] an extremely unusual event" (OIG, 2006, p. 115).
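To make both updating steps concrete, here is a small Python sketch with invented numbers (γ, a, b and the first examiner's hit rates are ours, chosen only for illustration): the submission of a sample first raises the lab's prior from γ to δ = γa/[γa + (1 − γ)b], and a verifier who also learns that a first examiner has declared a match updates still further.

# Illustration only: γ, a, b and the first examiner's hit rates are invented.
# Step 1: the authorities' decision to submit a sample raises the prior (Section 3.2).
# Step 2: a verifier who knows the first examiner called a match updates again (Section 4).

def posterior(prior: float, p_signal_given_guilty: float, p_signal_given_innocent: float) -> float:
    """Generic Bayes update on observing one binary signal."""
    num = prior * p_signal_given_guilty
    return num / (num + (1 - prior) * p_signal_given_innocent)

if __name__ == "__main__":
    gamma = 0.10       # assumed prior that a person of interest is guilty before any submission
    a, b = 0.60, 0.15  # assumed submission rates for guilty and innocent suspects (a > b)

    delta = posterior(gamma, a, b)  # prior after learning a sample was submitted
    print(f"prior before submission: {gamma:.2f}; after submission: {delta:.2f}")

    # Assumed chances that the FIRST examiner declares a match for guilty vs innocent suspects.
    first_match_given_guilty, first_match_given_innocent = 0.80, 0.20
    delta_verifier = posterior(delta, first_match_given_guilty, first_match_given_innocent)
    print(f"verifier's prior after learning of the first examiner's match: {delta_verifier:.2f}")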

An underappreciated aspect of the ACE-V method of fingerprint identification encourages further information bleed. (ACE-V stands for Analysis, Comparison, Evaluation and Verification.) As the official guidelines of the Scientific Working Group on Friction Ridge Analysis, Study and Technology (SWGFAST) are currently written, they do not prohibit "verification shopping," whereby a failure of verification may be ignored and another verification sought. SWGFAST (2006) guidelines say, "When examiners have conflicting conclusions, a quality review shall be conducted. It is the responsibility of the agency to determine whether corrective action is appropriate." The required quality review must be documented, and such documentation must include "[a] review of case documentation." The case file has all case documentation. Thus, the documentation of conflicting conclusions seems not to be intended for the case file, nor do the SWGFAST guidelines require that such documentation, or even the fact of conflicting conclusions, be included in the case file. While the SWGFAST guidelines sonorously pronounce certain types of conflicting conclusions to imply serious error, they require corrective action only as deemed appropriate (SWGFAST, 2006, p. 6).

We might view the failure of SWGFAST guidelines to prohibit verification shopping as an inconsequential omission if we had reason to believe that practices and procedures in local agencies effectively prevented verification shopping. It seems clear that verification shopping is generally recognized to be inappropriate, even unethical.[8] Nevertheless, there are at least two documented cases in which verification shopping seems to have been tolerated as a matter of policy.

A Seminole County fingerprint scandal erupted in Spring 2007 when latent print examiner Tara Williamson issued a memo accusing her co-worker Donna Birks of misbehavior and incompetence (Stutzman, 2007a,b; Williamson, 2007). One of her specific charges regarded shopping verifications. Birks could not get verification for a particular print from two persons she approached in the lab. The print "was then sent to a retired [fingerprint] examiner [from the same office] who one year earlier medically retired early and had admittedly lost his eye for latent prints. This examiner should not have been deemed competent and not allow to verify the print for such reasons (see SWGFAST Quality Assurance Guidelines for Latent Print Examiners. p. 5. 4.2.4)." (The underlining and grammatical error were both in the original memo.) Note that the problem was seeking verification from someone who was not competent. The problem was not shopping the verification.

The second example comes from an official report on the case of Brandon Mayfield, whom the FBI mistakenly identified as the source of a print left at the scene of the Madrid train bombing: "The [FBI's Latent Print Unit (LPU)] Quality Assurance Manual provided that if the second examiner reached a different conclusion, the matter must be referred to the supervisor and/or the Unit Chief for resolution. No formal statistics regarding the frequency of this occurrence have been maintained by the LPU, but LPU witnesses interviewed by the OIG stated that a refused verification was as an extremely unusual event. One option available to the supervisor was to select another verifier if the first verifier declined to confirm the identification.
In that instance, there was no policy requiring that the first verifier's disagreement be documented in the case file." (OIG, 2006, p. 115)

The report does not suggest that there was verification shopping in the Mayfield case. But it does reveal that it was considered just fine to shop your verifications.

[8] See http://thinkmarkets.wordpress.com/2008/12/06/the-technical-obsolescence-of-forensic-fraud/ for comments suggesting that most fingerprint examiners recognize that verification shopping is inappropriate, but that some instances of it are thought to be legitimate cases of conflict resolution.
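Before returning to the model, a simple probability sketch in Python shows why permitting a supervisor to seek another verifier after a refusal raises the chance that some verifier eventually confirms a mistaken identification. The per-verifier confirmation rate is an assumption of ours (not an estimate from the OIG report or any study), and the sketch assumes verifiers err independently.

# Illustration only: the per-verifier confirmation probability is invented.
# If a verifier confirms an (erroneous) identification with probability q, and a refusal
# can simply be followed by asking another verifier, the chance that SOME verifier
# eventually confirms grows quickly with the number of attempts allowed.

def prob_some_confirmation(q: float, attempts: int) -> float:
    return 1 - (1 - q) ** attempts

if __name__ == "__main__":
    q = 0.30  # assumed chance that any single verifier confirms a mistaken identification
    for attempts in (1, 2, 3):
        print(f"{attempts} verifier(s) tried: P(at least one confirmation) = {prob_some_confirmation(q, attempts):.2f}")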

Verification shopping increases information bleed and increases the importance of subjective factors. In terms of the mathematical symbols of our model, verification shopping causes the ratio f_n(r)/f_g(r) to depend in part on the ratios δ/(1 − δ) and d/u. Verification shopping is more likely the more ambiguous the underlying evidence is, the higher the prior probability of guilt is, and the higher the utility of convicting a guilty suspect is relative to the disutility of convicting an innocent suspect.

5. Applications & extensions

The threshold condition for announcing a match suggests numerous applications and extensions. To summarize again, the threshold condition indicates (a) that a match is more likely to be announced when the prior probability of guilt is larger; (b) that a match is more likely to be announced when the ratio of disutility (from convicting an innocent) to utility (from convicting the guilty) is lower; and (c) that a match is more likely to be announced when the probability of the test result for a guilty person over the probability of the test result for an innocent person is higher.

5.1 Nature of the crime

Forensic conclusions can be affected by the perceived heinousness of the crime. Dror et al. (2005), in a controlled experiment on fingerprint examinations, showed that fingerprint examiners were more likely to find a match when they had been given a description of a crime involving physical harm to a person, such as murder or assault, and they were less likely to find a match when given a description of a crime involving no harm to a person, such as bicycle theft or burglary.[9] As mentioned earlier, these results could be interpreted as a form of irrationality (affective priming) interfering with performance of a cognitive task. But they can also be understood as rational in light of the threshold condition. If the threat of physical violence increases the utility of conviction (u) more than the disutility of conviction (d), which seems likely since violent acts inspire fear and a greater desire to prevent future occurrences, then the d/u ratio should be lower for violent crimes, making a match more likely. Simply put, forensic examiners may have different preferences about different types of crime, and they rationally follow their preferences.

Dror et al. (2005) also find that the greater tendency to find matches for violent crimes only occurred in the context of ambiguous comparisons (such as when fingerprints were smudged or smeared) and not in the context of unambiguous comparisons (when the prints were nearly identical or clearly different by experimental design). This again fits with the present model, as an unambiguous match occurs when either f_g(r) or f_n(r) is zero; as a result, the threshold condition is automatically met or not met, regardless of the value of d/u.

5.2 Racial prejudice and stereotyping

Racial prejudice and stereotyping can affect the forensic examination process in two distinct ways. First, some groups may, correctly or not, be regarded as more likely to commit offenses in the first place. If examiners are made aware of a suspect's membership in such a group, the prior probability of guilt will be revised upward, making a match finding more likely.

[9] This study, unlike Dror and Charlton (2006), used naïve subjects rather than working fingerprint examiners.

Second, the utility and disutility attached to conviction of guilty and innocent suspects may differ across groups. A white examiner might, consciously or unconsciously, care less about convicting an innocent member of a racial minority, especially if he harbors the suspicion that the suspect probably did something wrong anyway, even if not the crime in question. He might also attach greater utility to convicting guilty members of a racial minority if he thinks that group's members are more likely to commit crimes in the future. As a result, d/u will tend to be lower for suspects in disfavored minorities, thereby raising the likelihood of a match being found. By contrast, groups looked upon more favorably and considered less likely to commit crimes than others, such as Koreans, Jews and women, will tend to have fewer matches found.

If ex-post rates of conviction affect the formation of priors about different groups' propensities to commit crimes, then we face the disturbing prospect of a feedback effect in which larger priors lead to higher convictions, which lead to yet larger priors. Small or even non-existent differences between groups could lead to disproportionate differences in conviction outcomes. Bunzel and Marcoul (2008) reach such a result for arrest rates arising from traffic stops, although their model assumes an irrational overconfidence among police to motivate the initial misperception of relative rates of criminal behavior.

5.3 Pro-prosecution bias

If forensic examiners identify with the prosecution more than the defense, perhaps because they are directly employed by the authorities or consider themselves part of that team, then the utility and disutility they attach to convictions will tend to be similar to those of the prosecution, i.e. a smaller d/u ratio. As a result, we should expect more matches to be announced. This form of pro-prosecution bias, it should be noted, is not a cognitive error; again, it is simply a difference in preferences that leads to different results.

Furthermore, the more confidence forensic examiners have in the competence of the authorities, the more likely they will be to announce a match. This follows from the contribution of the parameters a and b to the lab's updated prior. Greater confidence in the authorities' competence corresponds to larger a (greater chance of submitting a sample from a guilty person) and smaller b (lower chance of submitting a sample from an innocent person), both of which will increase the lab's prior δ.

Just as with violent-versus-non-violent crimes, the pro-prosecution bias will naturally have its greatest impact when test results are more ambiguous. When test results are unambiguous, the match threshold is automatically satisfied or falsified, and thus, the ratio of conviction utilities is irrelevant. At the other extreme, when the evidence is thoroughly ambiguous (offering no further evidence of the suspect's guilt or innocence), the lab's prior and conviction utilities will fully determine the outcome. This effect is best illustrated by the case of uniform distributions, where all ambiguous outcomes are interpreted as matches if the lab's prior is large enough.

The variety of pro-prosecution bias described here can also help explain forensic malfeasance, i.e. cases in which a forensic examiner actually falsifies or misreports his test results.
As shown in the case of discrete distributions, a lab's prior and conviction utilities can lead it to report a match despite a negative test result (so long as there is any chance of a false negative). This result might describe what happened in the case of Texas v. George Rodriguez, as summarized in an expert review of the case:[10]

[10] We do not know whether Bolding's apparent error was willful or not. We are claiming only that the hypothesis that it was willful might be explained in part by a pro-prosecution bias.

"[Houston pathologist] Jim Bolding's trial testimony... contains egregious misstatements of conventional serology. These statements reveal that either the witness lacked a fundamental understanding of the most basic principles of blood typing analysis or he knowingly gave false testimony to support the State's case against George Rodriguez. His testimony is completely contrary to generally accepted scientific principles." (Blake et al., 2004, p. 5)

Pro-prosecution bias as a source of malfeasance also seems an accurate explanation for rogue forensic experts such as Steven Hayne and Michael West, both of whom have helped prosecutors convict numerous defendants on the basis of flimsy forensic evidence (Balko, 2007a,b). As Radley Balko reports in an investigative exposé of bad forensics, "During the last two decades, there have been more than a dozen high-profile cases in which dubious forensic witnesses conned state and federal courts, sometimes for many years and in hundreds of cases" (Balko, 2007b, p. 38).

5.4 Bogus techniques

Some techniques arguably have little or no scientific validity. In the extreme, the distributions f_g(r) and f_n(r) will be identical, and thus, the matching threshold will depend entirely on prior beliefs and conviction utilities. For example, the FBI long attributed a greater epistemic value to compositional analysis of bullet lead ("bullet-lead analysis") than now seems justified (National Research Council, 2004). Solomon (2007) suggests that there may be hundreds of American prisoners today who were wrongly convicted on such evidence. Other questionable techniques include lip-print analysis and ear-print analysis (Moenssens et al., 1995; McRoberts and Possley, 2006; State v. Kunze). Now-discredited techniques of fire investigation contributed to the wrongful execution of Cameron Todd Willingham (Mills and Possley, 2004; Grann, 2009).

One explanation for the testimony in these cases is simply that forensic experts mistakenly believed in false or unproven theories. But we suggest a complementary explanation: that some forensic analysts apply the techniques and then reach conclusions driven largely or entirely by their priors and conviction utilities. Their decision to behave in this way might be motivated in part by the aforementioned pro-prosecution bias. Our alternative explanation is complementary because (as we noted earlier) a rational decision maker may not be consciously aware of all determinants of his or her choices.

5.5 Battlefield forensics and the war on terror

The NAS report notes the role of forensic science in homeland security and the extensive forensic-science capabilities of the U.S. Department of Defense (pp. 11-1 to 11-6). As part of the ongoing war on terror, the U.S. military now employs a set of techniques referred to as "battlefield forensics." The training program for battlefield forensics deals with "the proper handling, collection, and processing of combat related forensics." This training provides "an organic capability to provide competent evidentiary exploitation in a variety of situations" (Dudkiewicz, 2006). The goal, as described by Anh Duong of the U.S. Navy, is to "[r]apidly process battlefield evidence in-situ to support judicial, tactical & strategic operations" (Duong, 2007, p. 13). The results of battlefield forensics can be used, among other things, to decide which persons to capture and detain as terrorists or enemy combatants.

Fear of terrorism should be expected to drive down the d/u ratio. This result is especially likely when an invading army deals with the local population, and even more so when suspected terrorists are mostly drawn from a different race or nationality, as is the case in Iraq, Afghanistan and other locations in the Middle East. Placing a very high value on American lives and a low value on foreign lives (and liberties) naturally leads to a low d/u ratio. In addition, distrust of the local population will tend to produce stronger initial priors about the likelihood of guilt, and trust in one's military compatriots will cause the updated prior to be yet larger (via the same mechanism described under pro-prosecution bias, above). As Koppl (2005, p. 262) argues, the institutional context can give rise to a kind of motivated reasoning among forensic workers. Lodge and Taber's (2000, p. 185) simple model of motivated political reasoning lists several factors that tend to produce biased judgements, including (1) weak consequences from being wrong, (2) complex judgemental tasks, (3) ambiguous evidence and (4) time pressure. All four of these factors would naturally figure prominently in military forensics. No irrationality is necessary to explain such motivated reasoning in the case of battlefield forensics; instead, we need only look to the prior beliefs and utilities of the participants. The result is a very low evidentiary threshold for detaining suspected terrorists.

6. Potential remedies

We suggest several possible changes in the way forensic science is performed. The primary goal is to make forensic testimony a better source of independent information to be considered by juries. Koppl (2005) reviews the literature and suggests a suite of reforms. Our discussion is restricted to reforms suggested by our model.

6.1 Independence

Crime labs are typically organized under the police. As we have seen, this mode of organization tends to raise the lab's prior, δ, and reduce its d/u ratio, i.e. the relative cost of a type I error. Organizing crime labs under an independent body such as the court or the state or local health department would tend to combat these two effects. Giannelli (1997) argues for the creation of such independence by organizing crime labs under the office of the medical examiner. The NAS report endorses and recommends independence (p. 6-7). Problems of forensic medical examination in Mississippi (Balko, 2007a,b), however, suggest that such independence is no panacea. As Koppl notes, the Ryan Commission Report (State of Illinois, 2002) and Giannelli (1997) call for independence of forensic labs from law-enforcement agencies. The tendency of independence to reduce bias, however, has been questioned in a minority opinion expressed in the Ryan report: "The reality is that no matter how independent this separate state agency is, the bulk of its work will still be for police agencies and prosecutors" (State of Illinois, 2002, p. 53). The value of independence depends on other simultaneous factors, such as how forensic labor is divided and whether labs are subject to competitive pressure. In the least favorable conditions, the minority opinion in the Ryan report is probably right. But in more favorable conditions, independence may reduce bias.

Independence is not a stand-alone measure. When combined with other reforms, however, it should help to reduce inappropriate biases in crime labs.

6.2 Sequential unmasking and blind proficiency tests

Under the current system, forensic examiners typically know that the authorities are more likely to submit a sample from someone who is guilty, i.e. a > b. The larger the difference between a and b, the greater the impact on the lab's prior and thus the greater the chance of the lab announcing a match. Sequential unmasking and blind proficiency testing would reduce this form of bias.

Sequential unmasking helps to reduce inappropriate biases by sequencing the laboratory workflow such that "evidentiary samples are interpreted, and the interpretation is fully documented, before reference samples are compared" (Krane et al., 2008, p. 1006). The reference sample is not, somehow, a secret to be hidden from the examiner. Rather, in the case of DNA typing, procedure should require the examiner to characterize the evidentiary sample ("call the alleles") before viewing the reference sample. Information unmasked in a sequential manner is not permanently masked or hidden from the examiner. For this reason, Krane et al. (2008) coined the term "sequential unmasking" rather than using terms such as "masking" or "blinding" that inappropriately suggest that secrets must be kept from untrustworthy forensic scientists.

Sequential unmasking would help in part by preventing the authorities from telling the forensic examiners their theory of the case. But this measure may not go far enough, because the very act of submitting a sample conveys information. Blind proficiency tests would reduce the information content of such submissions. In a blind proficiency test, the lab taking the test thinks that it is (or at least may be) getting real crime scene evidence. In reality, however, the evidence has been constructed by the testing authority, which therefore knows what the right answer is. A blind proficiency test might be called a "placebo sample." A medical placebo looks like medicine, but it contains no active ingredients. A placebo sample in forensic science looks like crime scene evidence, but there is no underlying crime because the placebo is constructed to test the lab's performance. Thus, we know the correct outcome for any forensic examination of a placebo sample. If the lab were periodically given blind proficiency tests, the value of b would rise relative to the value of a. Consequently, the submission of samples by the authorities would not increase the lab's prior by as much, and the match threshold would not be satisfied as easily.

6.3 Evidence line-ups

Risinger et al. (2002) call for evidence line-ups to combat "the almost built-in expectation that tested evidence will inculpate" (p. 47). In terms of our model, they believe evidence line-ups would reduce the difference between a and b. An evidence line-up consists of one sample from the suspect plus several others. The lab would be kept ignorant of which sample came from the suspect. For example, instead of comparing the unknown hair found at the crime scene to the known hair from the suspect, the lab would compare the unknown hair to five known hairs, only one of which came from the suspect. As with blind proficiency tests (placebo samples), the predicted result is to reduce the impact of the authorities' submission decision on the lab's prior by shrinking the difference between a and b.
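The following minimal sketch, with purely hypothetical values of a and b, illustrates how the bare fact of submission moves the lab's prior, and how placebo samples or evidence line-ups (by raising b relative to a, or shrinking their difference) would blunt that movement:

```python
# Hypothetical illustration: a = P(sample submitted | suspect is the source),
# b = P(sample submitted | suspect is not the source).  The lab's prior,
# conditional on the sample having been submitted at all, follows from Bayes' rule.

def prior_given_submission(baseline_prior, a, b):
    """Lab's updated prior after conditioning on the fact of submission."""
    return (a * baseline_prior) / (a * baseline_prior + b * (1 - baseline_prior))

baseline = 0.5
print(prior_given_submission(baseline, a=0.9, b=0.1))   # ~0.90: submission alone is highly informative
# Routine placebo samples raise b relative to a, so the same act of
# submission shifts the lab's prior far less:
print(prior_given_submission(baseline, a=0.9, b=0.6))   # ~0.60
```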
Forensic scientist Larry Miller (1987) conducted an experimental study of evidence line-ups for microscopic hair analysis. The results were dramatic: false positives fell from 30.4% under the traditional approach to 3.8% using line-ups (Miller, 1987, p. 160). This result is consistent with our model's predictions. It is important to design the implementation of this reform properly, as illustrated by the similar case of police line-ups for eyewitnesses (Schacter et al., 2007). A properly designed system would ensure, for example, that samples not taken from the suspect are appropriately similar to the suspect's sample. If the suspect's sample is too dissimilar from the others, the forensic scientist may mistakenly declare a match to what is merely the most salient sample.

6.4 Separation of test and interpretation

A great deal of subjective interpretation happens in the process of performing a test and drawing conclusions from it. As we have seen, the decisions of crime labs are affected by factors unrelated to the physical or chemical outcome of the test. We suggest that, at least with some types of forensic analysis, it would be possible to separate the test from its interpretation. A lab could perform DNA testing in a case, for instance, without interpreting the electropherogram. Any conclusions would be drawn by two forensic consultants. (Giannelli, 2004, reviews legal arguments supporting a defense right to forensic expertise.) One forensic consultant would work (and possibly testify) for the prosecution, the other for the defense. Each could, and probably would, be biased; but this is a virtue, because the adversarial process would force them to reveal the assumptions on which their conclusions are based. Froeb and Kobayashi (1996) argue that, in the context of civil trials, the adversarial process helps to eliminate juror biases and produce full-information decisions, so long as both sides may present evidence (see also Milgrom and Roberts, 1986). Koppl and Cowan (2010) argue that a similar result holds in the criminal context. But even if the forensic consultants' competing biases would not promote true judgements by jurors, we should not fool ourselves into thinking that no bias affects the conclusions of forensic examiners in the status quo.

In terms of this paper's model, we are essentially suggesting that forensic examiners should reveal the ratio f_n(r)/f_g(r) to the jury, rather than announcing "match" or "no match." The adversarial process will help to expose ambiguities in the evidence and subject them to open dispute. This approach leaves jurors free to apply their own priors and conviction utilities rather than implicitly relying on those of the crime lab. In today's system, by contrast, priors, conviction utilities and test results are all packed into a single Delphic affirmation of "match" or "no match" by a forensic scientist. This practice strips the jurors of their role as triers of fact and hands that function over to the crime lab. We recognize, of course, that much forensic analysis is highly technical in nature, and juries may have difficulty understanding the claims made. Jurors could be confused by presentations of, for example, electropherograms and discussion of peak heights, blobs and so on. The current system, however, dodges this problem only by shrouding the evidence in the perceived (but illusory) objectivity of the forensic expert. Koppl and Cowan (2010), citing Froeb and Kobayashi (1996) and Milgrom and Roberts (1986), argue that the competition between strongly opposed forensic consultants would tend to induce the full-information decision by the naïve decision makers of the jury.
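As a small illustration of how jurors could combine the reported densities (or their ratio) with their own priors, consider the following sketch. All numbers are invented, and the sketch handles only the juror's prior; conviction utilities would enter the juror's decision rule in an analogous way.

```python
# Hypothetical sketch: the lab reports the two densities f_g(r) and f_n(r)
# (or their ratio), and each juror applies Bayes' rule with his or her own prior.

def juror_posterior(juror_prior, f_g, f_n):
    """Juror's posterior probability that the suspect is the source."""
    prior_odds = juror_prior / (1 - juror_prior)
    posterior_odds = prior_odds * (f_g / f_n)
    return posterior_odds / (1 + posterior_odds)

f_g, f_n = 0.8, 0.1   # invented values: the result is 8 times likelier if the suspect is the source
for prior in (0.05, 0.30, 0.70):
    print(prior, round(juror_posterior(prior, f_g, f_n), 3))
# Prints roughly 0.296, 0.774 and 0.949: the same evidence, but the prior
# (and hence the final judgement) stays with the juror rather than the lab.
```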
Footnote 11: Creating a defense right to forensic experts may be necessary to ensure the effectiveness of other reforms. The beneficial effects of any reform tend to fade as the affected parties learn compensating behaviors. But because a body of defense experts similar to, and allied with, public defenders would act to ensure their own continued existence, the good effects of this reform are less likely to fade. The existence at trial of strongly opposed forensic consultants would tend to apply upstream pressure that would help upstream reforms retain their salutary effects.

7. Conclusions

Forensic science aspires to objectivity, but objectivity is a chimera. Subjective interpretation is unavoidable in the process of performing forensic work and reporting the results. The fact that forensic labs are generally part of the police organization and receive their samples almost exclusively from the authorities induces forensic examiners to alter their beliefs and thus their behavior: they become more likely to announce incriminating matches than they would be otherwise. Consequently, juries as triers of fact are robbed of a potential source of independent information. This result occurs even if forensic examiners are perfectly rational Bayesians; the result could be yet worse if they are subject to cognitive errors and biases. We recommend reforms that reduce the influence of the conviction utilities and Bayesian priors of forensic scientists, thereby making them better sources of information for the criminal justice system.

8. Acknowledgements

For helpful comments, the authors would like to thank David Croson, Pierre Garrouste, Keith Inman, Jay Koehler, Dan Krane, Michael Risinger, Mario Rizzo, Norah Rudin, Michael Saks, two anonymous referees, participants in the Colloquium on Market Institutions and Economic Processes of New York University, the 2007 meeting of the Southern Economic Association and the 2008 Forensic Bioinformatics Conference, "The Science of DNA Profiling: A National Expert Forum." We thank William Thompson for providing Figs 1 and 2. We thank Dan Krane and Jason Gilder of Forensic Bioinformatics for special help in understanding DNA typing.

REFERENCES

AITKEN, C. G. G. (2000) Statistical Interpretation of Evidence/Bayesian Analysis. In: Siegel, J., Knupfer, G. & Saukko, P. (eds), Encyclopedia of Forensic Sciences. New York: Academic Press, pp. 717–724.
ARKES, H. R. & MELLERS, B. A. (2002) Do Juries Meet Our Expectation? 26(6) Law and Human Behavior 625–639.
BALKO, R. (2007a) Indeed, and Without a Doubt. Reason Online, 2 August 2007. Downloaded from http://reason.com/news/show/121671.html.
BALKO, R. (2007b) CSI: Mississippi. Reason Magazine, November 2007, 36–48.
BANERJEE, A. V. (1992) A Simple Model of Herd Behavior. 107(3) Quarterly Journal of Economics 797–817.
BERNSTEIN, D. E. (2008) Expert Witnesses, Adversarial Bias, and the (Partial) Failure of the Daubert Revolution. 93(2) Iowa Law Review 451–490.
BERRY, D. A. (1991) Inferences Using DNA Profiling in Forensic Identification and Paternity Cases. 6(2) Statistical Science 175–205.
BIEDERMANN, A., BOZZA, S. & TARONI, F. (2008) Decision Theoretic Properties of Forensic Identification: Underlying Logic and Argumentative Principles. 177 Forensic Science International 120–132.
BIEDERMANN, A., BOZZA, S. & TARONI, F. (2009) Probabilistic Evidential Assessment of Gunshot Residue Particle Evidence (Part I): Likelihood Ratio Calculation and Case Pre-assessment Using Bayesian Networks. 191 Forensic Science International 24–35.
BIKHCHANDANI, S., HIRSHLEIFER, D. & WELCH, I. (1992) A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades. 100(5) Journal of Political Economy 992–1026.

BLAKE, E. T., NEWALL, P., SENSABAUGH, G., SHALER, R., SINGER, R. L. & STOLOROW, M. D. (2004) Peer Review Report: Texas v. George Rodriguez. Washington, DC: Government Printing Office. Available from the authors (Whitman & Koppl) on request.
BOLCK, A., WEYERMANN, C., DUJOURDY, L., ESSEIVA, P. & VAN DEN BERG, J. (2009) Different Likelihood Ratio Approaches to Evaluate the Strength of Evidence of MDMA Tablet Comparisons. 191 Forensic Science International 42–51.
BUNZEL, H. & MARCOUL, P. (2008) Can Racially Unbiased Police Perpetuate Long-run Discrimination? 68 Journal of Economic Behavior and Organization 36–47.
CHAMPOD, C. (2000) Overview and Meaning of ID. In: Siegel, J., Knupfer, G. & Saukko, P. (eds), Encyclopedia of Forensic Sciences. New York: Academic Press, pp. 1077–1084.
COLE, S. (2003) Fingerprinting: The First Junk Science? 28(1) Oklahoma City University Law Review 73–92.
COLE, S. (2007) Where the Rubber Meets the Road: Thinking About Expert Evidence as Expert Testimony. 803 Villanova Law Review 819–824.
COSMIDES, L. & TOOBY, J. (1994) Better than Rational: Evolutionary Psychology and the Invisible Hand. 84(2) American Economic Review 327–332.
DROR, I. E. & CHARLTON, D. (2006) Why Experts Make Errors. 56(4) Journal of Forensic Identification 600–616.
DROR, I., CHARLTON, D. & PÉRON, A. E. (2006) Contextual Information Renders Experts Vulnerable to Making Erroneous Identifications. 156 Forensic Science International 74–78.
DROR, I. E., PÉRON, A. E., HIND, S.-L. & CHARLTON, D. (2005) When Emotions Get the Better of Us: The Effects of Contextual Top-down Processing on Matching Fingerprints. 19 Applied Cognitive Psychology 799–809.
DROR, I. E. & ROSENTHAL, R. (2008) Meta-analytically Quantifying the Reliability and Biasability of Forensic Experts. 53(4) Journal of Forensic Sciences 900–903.
DUDKIEWICZ, S. (2006) Battlefield Forensics Training. Army G2 Information Technology: Note to the Field, September 2006. Downloaded from http://www.universityofmilitaryintelligence.us/mipb/article.asp?articleid=540&issueid=40.
DUONG, A. N. (2007) U.S. Navy Biometrics: An Overview. Downloaded 27 October 2007 from http://www.biometrics.org/bc2007/presentations/Thu Sep 13/Session I/13 Duong DOD.pdf.
FRIEDMAN, R. (2003) Squeezing Daubert Out of the Picture. 33 Seton Hall Law Review 1047–1070.
FROEB, L. M. & KOBAYASHI, B. H. (1996) Naive, Biased, yet Bayesian: Can Juries Interpret Selectively Produced Evidence? 12 Journal of Law, Economics, and Organization 257–276.
GARRETT, B. L. & NEUFELD, P. J. (2009) Invalid Forensic Science Testimony and Wrongful Convictions. 95(1) Virginia Law Review 1–97.
GIANNELLI, P. C. (1997) The Abuse of Scientific Evidence in Criminal Cases: The Need for Independent Crime Laboratories. 4 Virginia Journal of Social Policy and the Law 439–478.
GIANNELLI, P. C. (2004) Ake v. Oklahoma: The Right to Expert Assistance in a Post-Daubert, Post-DNA World. 89(6) Cornell Law Review 1305–1419.
GOLD, M. (2008) Va. DNA Project Is In Uncharted Territory: Legal Experts Seek More Transparency. Washington Post, 17 August 2008. Downloaded 20 August 2008 from http://washingtonpost.com.
GRANN, D. (2009) Trial by Fire. New Yorker, 7 September 2009. Downloaded 7 September from http://www.newyorker.com/reporting/2009/09/07/090907fa fact grann?currentpage=all.
GREENE, S. & MOFFIET, M. (2007) Bad Faith Difficult to Prove. The Denver Post, 22 July 2007. Downloaded 28 January 2009 from http://www.denverpost.com/evidence/ci 6429277.
KELLY, J. F. & WEARNE, P. (1998) Tainting Evidence: Inside the Scandals at the FBI Crime Lab. New York: The Free Press.