BOSTON UNIVERSITY SCHOOL OF MEDICINE. Thesis CHARACTERIZATION OF ERROR TRADEOFFS IN HUMAN IDENTITY COMPARISONS: DETERMINING A COMPLEXITY


BOSTON UNIVERSITY
SCHOOL OF MEDICINE

Thesis

CHARACTERIZATION OF ERROR TRADEOFFS IN HUMAN IDENTITY COMPARISONS: DETERMINING A COMPLEXITY THRESHOLD FOR DNA MIXTURE INTERPRETATION

By

JACOB SAMUEL GORDON
A.B., Harvard University, 2005

Submitted in partial fulfillment of the requirements for the degree of Master of Science

2012

Copyright by JACOB SAMUEL GORDON 2012

Approved by

First Reader: Catherine M. Grgicak, M.S.F.S., Ph.D.
Instructor, Biomedical Forensic Sciences

Second Reader: Robin W. Cotton, Ph.D.
Associate Professor, Biomedical Forensic Sciences

ACKNOWLEDGEMENTS

Catherine, you made it your personal mission to ensure that this thesis was a success. You were willing to take me on as your mentee with a compressed timeline, introduced me to an engaging problem that I enjoyed attacking, and accommodated many post-work conferences. Thank you for helping me finish and for providing me with direction throughout the last couple of months. Dr. Cotton, I have received nothing short of endless support since you became my academic adviser. You helped steer me through the program and are always thinking of interesting people with whom I should speak and churning away at ideas to help guide my future. Thank you for helping me refine my thesis thoughts down the stretch. Mom and Dad, I truly would have nothing without you. You have given me everything of yours and sacrificed throughout my life so that I never missed out on anything. I have learned every important skill and life strategy from you and have not achieved anything without your endless support, calming patience, guidance, and fajitas. Melissa and Amy, you have always been blindly loyal to me through triumphs and setbacks, and the only times you haven't been in my corner were when you were battling me yourselves to ensure that I was imbued with just the right amount of pervasive humbling. You are my best friends. Because of you four, I am the happiest guy I know.

CHARACTERIZATION OF ERROR TRADEOFFS IN HUMAN IDENTITY COMPARISONS: DETERMINING A COMPLEXITY THRESHOLD FOR DNA MIXTURE INTERPRETATION

JACOB SAMUEL GORDON

Boston University School of Medicine, 2012

Major Professor: Catherine M. Grgicak, M.S.F.S., Ph.D., Instructor, Biomedical Forensic Sciences

ABSTRACT

DNA analysts considering a forensic evidence sample and a reference sample (e.g., from a suspect) have three options when deciding whether the samples are consistent: exclusion, inclusion, or inconclusive. This determination is complicated because DNA profiles from forensic evidence may not be fully observed. Allelic drop-out, overlapping alleles, or both can hide part of a profile. Analyst inclinations and laboratory standards for informing this decision vary. Typically, and particularly for samples showing some degree of allelic drop-out, a reference sample with less than 100% allelic overlap is not automatically precluded from inclusion in an evidence sample.

Tolerating the absence of some of a reference sample's alleles from an evidence sample creates the potential for two kinds of error. If an individual could not have contributed to an evidence sample, there is the potential for false inclusion. If an individual could have contributed, there is the potential for false exclusion. Selecting a particular decision criterion for inclusion or exclusion therefore entails a tradeoff between these antagonistic errors. A lax criterion minimizes false exclusions at the expense of false inclusions. A strict criterion eschews false inclusions at the expense of greater numbers of false exclusions. The choice of criterion matters most for low-template samples and for mixtures of multiple contributors. Both are likely to experience allelic drop-out and thus to occupy a gray area between certain exclusion and likely inclusion.

In this study, databases of simulated mixtures and laboratory mixtures are compared with databases of simulated excluded and included individuals. To generate credible genetic profiles, allelic drop-out and profile mixing are modeled. Within this framework, the universe of possible decision criteria is explored. Receiver Operating Characteristic (ROC) curves, a type of analysis originally applied to assessing World War II radar performance, are adopted as a paradigm for summarizing the tradeoff between the two types of error.

A laboratory's standard operating procedures specify how these errors are to be balanced a priori, and that balance defines a complexity threshold. The threshold determines whether a mixture or low copy number sample holds any evidentiary value. This determination is made before sample interpretation is undertaken or inclusion/exclusion statistics are calculated. When a sample does hold evidentiary value, the complexity threshold also specifies a predetermined decision point for determinations of exclusion versus inclusion.

TABLE OF CONTENTS

TITLE PAGE
COPYRIGHT PAGE
READER APPROVAL PAGE
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
    Comparing Forensic Evidence Samples and Reference Samples
    DNA Profile Comparison
    Conclusions Arising from Profile Comparisons
    Inclusion Statistics
    Consideration of Error Rates
    Error Rate Analysis Using the Paradigm of Receiver Operating Characteristics
2 METHODS
    Overview
    Error Analysis Study Using Simulated Mixture Data
    Simulation Materials
    Simulation Model
    Modeling Allele Drop-out
    Generating Populations for Comparison
    Simulating Mixtures
    Simulating Excluded Individuals
    Simulating Included Individuals
    Comparing Populations and Counting Allelic Discrepancies
    Validation Study Using Laboratory Mixture Data
    Profile Typing Materials and Methods
    Data Interpretation Framework
    Organizing Comparison Results
    Making Determinations of Exclusion or Inclusion
3 RESULTS AND DISCUSSION
    Error Analysis Study Using Simulated Data
    Comparing Simulated Mixtures to Simulated Excluded Individuals
    Comparing Simulated Mixtures to Simulated Included Individuals
    Validation Study Using Laboratory Mixture Data
    Comparing Laboratory Mixtures to Simulated Excluded Individuals
    Comparing Laboratory Mixtures to Simulated Included Individuals
    Impact of Analyst Decision Threshold on Expected Errors
    Receiver Operating Characteristic (ROC) Analysis
    Rollup ROC Results for Simulated Mixture Data
    Rollup ROC Results for Laboratory Mixture Data
4 CONCLUSION
REFERENCES
VITA

LIST OF TABLES

Table 1: Example Profile Comparison between Three Reference Samples and an Evidence Sample
Table 2: Inclusion/Exclusion Criteria from The German Stain Commission: Recommendations for the Interpretation of Mixed Stains (reproduced from [7])
Table 3: Steps to Mixture Interpretation from the DNA Commission of the International Society of Forensic Genetics (reproduced from [10])
Table 4: Mixture Classification Scheme from German Stain Commission and SWGDAM (Characteristics below quoted from [7])
Table 5: Guidelines for Interpretation of Likelihood Ratios (reproduced from [21])
Table 6: Contingency Matrix Correlating Analyst Decision with the Underlying Reality
Table 7: Example of Excluded-Individual-to-Mixture Comparison
Table 8: Notional Table of False Positive & True Positive Rates for Different Decision Thresholds

LIST OF FIGURES

Figure 1: Dirac Delta Function Plots of Genetic Model Used in Simulating Profiles
Figure 2: Dirac Delta Function Representation of an Example Single-Source Profile
Figure 3: Graphical Matrix Representation of a Representative Single-Source Profile
Figure 4: Dirac Delta Function Representation of Example Mixture Profile from Person 1 and Person
Figure 5: Graphical Matrix Representation of Example Mixture Profile from Person 1 and Person
Figure 6: Flow Describing Pristine Mixture Profile Generation
Figure 7: Flow Used to Generate Perturbed Mixture Profiles
Figure 8: Integrated Flow Describing Generation of Pristine Mixture Profiles & Excluded Individuals
Figure 9: Flow Describing Generation of Included Individuals (From Simulated Mixture Profiles)
Figure 10: Example Histogram Tabulating Discrepancies Between 10,000 References and 1 Mixture
Figure 11: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies
Figure 12: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies (two-color)
Figure 13: Summary histograms: Excluded-Individual-to-Mixture Comparisons (No Drop-Out)
Figure 14: Results Comparing Excluded Individuals to Simulated Mixtures (No Drop-Out)
Figure 15: Results Comparing Excluded Individuals to Simulated Mixtures, Pr(D) φ=
Figure 16: Results Comparing Included Individuals to Simulated Mixtures (No Drop-Out)
Figure 17: Results Comparing Included Individuals to Simulated Mixtures, Pr(D) φ=
Figure 18: Results Comparing Excluded Individuals to 1:1 Laboratory Mixtures
Figure 19: Results Comparing Included Individuals to Laboratory Mixtures
Figure 20: Example Simultaneous Visualization of Exclusion & Inclusion Comparisons
Figure 21: Example ROC Plot
Figure 22: Compilation of Exclusion & Inclusion Data Using Simulated Mixtures
Figure 23: Error Analysis Results: ROC Rollup Using Simulated Mixtures
Figure 24: Notional Complexity Threshold: Simulated Mixture Results
Figure 25: More Realistic Complexity Threshold: Simulated Mixture Results
Figure 26: Complexity Threshold Based on Blackstone's Ratio: Simulated Mixture Results
Figure 27: Compilation of Exclusion & Inclusion Data Using Laboratory Mixtures
Figure 28: Complexity Threshold: Laboratory Mixture Results

1 INTRODUCTION

When considering DNA evidence, an analyst aims to determine whether a reference sample could have contributed to the genetic profile expressed in an evidence sample. The final determination as to whether the two samples could share a source may significantly affect an ongoing investigation. It may also shape how compelling a jury finds a prosecutor's theory of a crime or, alternatively, a suspect's defense. At its core, DNA profiling identifies individuals by examining particular regions (i.e., loci) of the genome that are highly variable (i.e., polymorphic) among individuals. These loci lie on two sets of structures (i.e., chromosomes) that carry a person's DNA, with one set inherited from each parent. A person's DNA sequence is composed of four nucleotides: adenine, cytosine, guanine, and thymine, conventionally abbreviated as A, C, G, and T, respectively.

The use of DNA as a genetic fingerprint was introduced in 1985 by Alec Jeffreys et al. [1]. They described a method of simultaneously detecting hypervariable minisatellite regions in human DNA. In other words, they identified a collection of exploitable polymorphic loci that could collectively be used to individuate a person [2]. Around the same time, Kary Mullis et al. [3] published their work on the polymerase chain reaction (PCR), a method that allowed the trace amounts of DNA often encountered in forensic work to be amplified. More modern methods have added automation [4] and efficiency [5] to the process. Nevertheless, the seminal works of Jeffreys et al. and Mullis et al. formed the foundation on which DNA profiling has grown.

The introduction of PCR technology into forensic laboratories brought a shift away from large minisatellite regions, which consist of thousands to tens of thousands of bases. Laboratories moved instead to short tandem repeats (STRs), regions that typically range from ~70 to ~500 base pairs. STRs resemble variable number of tandem repeat (VNTR) regions in that both consist of tandemly repeating units of DNA. In both, the number of repeat units, which defines the allele, varies across the population. However, STRs are shorter, which allows these regions to be amplified efficiently and increases the sensitivity of DNA profiling.

At its heart, successfully typing or profiling an individual is a signal detection problem. First, biological evidence is collected from a crime scene, evidence sample, or person of interest. Sources of biological evidence that can yield DNA samples include, but are not limited to, blood, semen, saliva, urine, feces, teeth, bone, hair, skin cells, and other biological tissue [6]. DNA is then extracted from the biological sample and quantified using quantitative polymerase chain reaction (qPCR). Next, a separate round of PCR amplifies the STR loci of interest. For detection, the amplicons are separated by size and their fluorescent dye labels are recorded. The alleles expressed in the sample are then identified by applying a threshold in the electropherogram. This threshold separates fluorescently detected signals (allele peak heights or areas) from background noise. Finally, the results are interpreted and an unknown (i.e., the evidence) is compared with a known (i.e., the standard).
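The thresholding step described above can be sketched in Python. This is only a minimal illustration under stated assumptions and not the laboratory procedure used in this study. The analytical threshold of 50 RFU, the function name `call_alleles`, and the peak heights are all hypothetical placeholders.

```python
# Minimal sketch: separating allele peaks from baseline noise at one STR locus
# by applying an analytical threshold to electropherogram peak heights.
# The threshold value and the peak heights below are hypothetical placeholders.

ANALYTICAL_THRESHOLD_RFU = 50.0  # hypothetical threshold, relative fluorescence units


def call_alleles(peaks, threshold=ANALYTICAL_THRESHOLD_RFU):
    """Return allele designations whose peak height meets or exceeds the threshold.

    `peaks` maps allele designations (strings such as "9" or "9.3") to peak
    heights in RFU. The detected alleles are returned sorted numerically.
    """
    detected = [allele for allele, height in peaks.items() if height >= threshold]
    return sorted(detected, key=float)


# Hypothetical peak heights (RFU) observed at a single locus.
locus_peaks = {"9": 812.0, "10": 35.0, "11": 640.0, "12": 48.0}
print(call_alleles(locus_peaks))  # ['9', '11']
```

Raising the threshold suppresses noise but can discard low peaks that are true alleles. Lowering it does the reverse. This is the same detection tradeoff that the ROC paradigm adopted in this thesis is designed to summarize.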

Producing a DNA profile from a sample involves many nuanced steps, from collecting and storing an evidence sample to extracting and amplifying its genetic material. Even so, the DNA profile that is ultimately produced consists of the STR alleles at each locus tested, where STR loci are generally typed during forensic DNA testing.

1.1 Comparing Forensic Evidence Samples and Reference Samples

1.1.1 DNA Profile Comparison

Before evidence and reference samples can be compared, the evidence sample's loci of interest are amplified and the alleles present are interpreted. This process yields an evidence profile, which is a list of the alleles observed at each amplified locus within the evidence sample. The reference sample is likewise amplified and its observed alleles noted, yielding a reference profile. Comparing the two involves determining which of the reference alleles are present in the evidence sample. Table 1 provides an example comparison of the alleles detected at four loci for three reference samples and for an evidence sample.

Table 1: Example Profile Comparison between Three Reference Samples and an Evidence Sample
The alleles present at each locus in the Evidence Sample are compared with those present at the same locus for a reference sample. Each reference sample is considered in turn.

Alleles            Locus 1   Locus 2      Locus 3     Locus 4
Evidence Sample    5, 10     5, 6, 7, 8   9, 10, 11   4, 9, 13
Reference 1        5, 10     5, 8         10, 11      9, 9
Reference 2        5, 10     7, 8         9, 11       4, 13
Reference 3        5, 10     6, 6         10, 10      8, 9

Assuming all alleles are detected in the evidence sample, a locus-by-locus comparison with each reference sample leads an analyst to a conclusion for that reference. For example, the alleles present in Reference 1 are also present at every locus in the Evidence Sample. If all alleles are detected and peak heights and contributor ratios are not considered, Reference 1 would be included as a potential contributor to the DNA mixture in the Evidence Sample. Reference 2 would also be included, since every allele in that individual's genotype is also present in the evidence profile. In contrast, not all of Reference 3's alleles are present in the Evidence Sample. All of its alleles at Loci 1, 2, and 3 are present, but Allele 8 at Locus 4 is not. A strict insistence that every reference allele be present in the evidence would lead an analyst to exclude Reference 3. However, if allelic drop-out is suspected due to low-level amplification conditions, Reference 3 may be included as a potential contributor despite the allelic discrepancy at Locus 4.

Under ideal circumstances, including or excluding an individual as a contributor to an item of evidence would be straightforward. A reference could be included if and only if 100% of its alleles are detected in the evidence sample, and excluded otherwise. In reality, allelic drop-out during low-template DNA amplification complicates such determinations, as the Reference 3 example in Table 1 illustrates.
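The comparison of Table 1 can be expressed as a short Python sketch. It counts, for each reference, the reference alleles missing from the evidence. It then applies a decision criterion that tolerates a chosen number of such discrepancies. The counting rule is an assumption made for illustration: each distinct missing allele counts once, so a homozygous reference contributes at most one discrepancy per locus. This rule may not match the one used later in this thesis. The function names are hypothetical.

```python
# Minimal sketch: counting allelic discrepancies between the reference profiles
# and the evidence profile of Table 1, then applying a decision criterion that
# tolerates up to `max_discrepancies` missing reference alleles.
# Assumed (hypothetical) counting rule: a discrepancy is a distinct reference
# allele absent from the evidence at the same locus.

EVIDENCE = {
    "Locus 1": {5, 10},
    "Locus 2": {5, 6, 7, 8},
    "Locus 3": {9, 10, 11},
    "Locus 4": {4, 9, 13},
}

REFERENCES = {
    "Reference 1": {"Locus 1": (5, 10), "Locus 2": (5, 8), "Locus 3": (10, 11), "Locus 4": (9, 9)},
    "Reference 2": {"Locus 1": (5, 10), "Locus 2": (7, 8), "Locus 3": (9, 11), "Locus 4": (4, 13)},
    "Reference 3": {"Locus 1": (5, 10), "Locus 2": (6, 6), "Locus 3": (10, 10), "Locus 4": (8, 9)},
}


def missing_alleles(reference, evidence):
    """Map each locus with a discrepancy to the sorted reference alleles absent from the evidence."""
    result = {}
    for locus, genotype in reference.items():
        missing = set(genotype) - evidence[locus]
        if missing:
            result[locus] = sorted(missing)
    return result


def count_discrepancies(reference, evidence):
    """Total number of distinct reference alleles not observed in the evidence."""
    return sum(len(alleles) for alleles in missing_alleles(reference, evidence).values())


def decide(reference, evidence, max_discrepancies=0):
    """Include if the discrepancy count does not exceed the tolerance (0 = strict criterion)."""
    return "include" if count_discrepancies(reference, evidence) <= max_discrepancies else "exclude"


for name, ref in REFERENCES.items():
    print(name, missing_alleles(ref, EVIDENCE),
          decide(ref, EVIDENCE), decide(ref, EVIDENCE, max_discrepancies=1))
```

Under the strict criterion, Reference 3 is excluded because Allele 8 is missing at Locus 4. Tolerating one discrepancy includes it, which mirrors the drop-out scenario discussed above. The tolerance parameter is only a simple stand-in for the family of decision criteria this study explores.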

1.1.2 Conclusions Arising from Profile Comparisons

Given profiles generated from evidence and reference samples, a DNA analyst compares the two. The analyst then attempts to determine whether the contributor of the reference sample could have contributed to the evidence sample. Thoughtful comparison of the two samples leads an analyst to one of three conclusions. The criteria for these conclusions have been delineated by The German Stain Commission [7] and are contained in Table 2.

Table 2: Inclusion/Exclusion Criteria from The German Stain Commission: Recommendations for the Interpretation of Mixed Stains (reproduced from [7])

Conclusion: Inclusion
Criteria: If all alleles of a person in question are uniformly present in a mixed stain, the person shall be considered a possible contributor to the stain.

Conclusion: Exclusion
Criteria: If alleles of a person in question are not present in a mixed stain, the person shall not be considered as a possible contributor to the stain.

Conclusion: Gray Area between Inclusion and Exclusion
Criteria: The following effects may occur in [mixtures with no major component(s) and evidence of stochastic effects] due to imbalances between the mixture components and may cause difficulties in reaching an unambiguous decision about inclusion or exclusion across all analyzed DNA systems:
- Locus drop out and allelic drop out (e.g., caused by the sensitivity of the amplification system, as well as by stochastic effects).
- Allelic drop out is more likely to occur for longer than for shorter alleles, and in particular for DNA systems with long amplicon sizes.

No reliable comparison can be made between reference and evidence profiles under certain conditions. These include samples that do not amplify efficiently, evidence stains with too many contributors, samples with too little starting template, or any combination of these. An analyst's ability to conclude accurately whether a reference sample should be included as a possible contributor therefore depends on the quality or complexity of the sample. Currently, the literature offers no standard for determining such a quality factor.

Suppose both profiles amplify and produce readable electropherograms whose discriminatory power is not overwhelmingly diluted by the presence of many contributors. Assessing allelic commonality is then trivial when the samples share no alleles: the individual could not have contributed to the evidence sample. The assessment is equally trivial in the case of complete allelic overlap: the individual almost certainly contributed. The case of interest is the one in which more than zero but fewer than all of the alleles are in common. This case is also the most relevant to forensic evidence profiles, which are often subject to some combination of complicating factors. One metric that describes the degree of exclusion between two profiles is the number of allelic discrepancies between them. Here, degree of exclusion is defined as the extent to which sets of alleles from different samples fail to overlap. As the probability of drop-out increases, the number of discrepancies between a true contributor and an evidence sample is expected to increase. Work by Tvedebrink et al. [8] has attempted to model the probability of drop-out as a function of an allele's average electropherogram peak height (corrected for diploidy). Peak heights are taken to be robust indicators of the quantity of DNA contributed [9]. However, little work has been published on what drop-out implies for an analyst's ability to include or exclude a contributor accurately.
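The link between drop-out probability and expected discrepancies can be illustrated with a small sketch. It assumes a logistic drop-out model in log expected peak height, written in the spirit of Tvedebrink et al. [8] but not reproducing their fitted model. The coefficients `B0` and `B1`, the function names, and the example peak heights are hypothetical placeholders. Two further assumptions are made: each allele drops out independently, and no allele of the true contributor is masked by another contributor.

```python
import math

# Minimal sketch of a logistic drop-out model in log expected peak height.
# B0 and B1 are hypothetical placeholder coefficients, not fitted values;
# B1 < 0 so that drop-out becomes less likely as the expected peak height
# (a proxy for template quantity) grows.
B0, B1 = 10.0, -2.5


def pr_dropout(expected_height_rfu, b0=B0, b1=B1):
    """Probability that an allele with the given expected peak height drops out."""
    eta = b0 + b1 * math.log(expected_height_rfu)
    return 1.0 / (1.0 + math.exp(-eta))


def expected_discrepancies(expected_heights_rfu):
    """Expected number of undetected alleles for a true contributor.

    Assumes independent drop-out per allele and no masking of the
    contributor's alleles by alleles from other contributors.
    """
    return sum(pr_dropout(h) for h in expected_heights_rfu)


# A hypothetical contributor, heterozygous at 8 loci (16 alleles), at two template levels.
low_template = [60.0] * 16
high_template = [600.0] * 16
print(round(expected_discrepancies(low_template), 2),
      round(expected_discrepancies(high_template), 2))
```

With these placeholder coefficients, the low-template contributor is expected to show several discrepancies and the high-template contributor almost none. A true contributor can therefore look increasingly "excluded" as template quantity falls.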

The steps to interpreting a mixture according to the DNA Commission of the International Society of Forensic Genetics [10] are contained in Table 3.

Table 3: Steps to Mixture Interpretation from the DNA Commission of the International Society of Forensic Genetics (reproduced from [10])

Interpretation Step   Action
Step 1                Identify the presence of a mixture
Step 2                Designation of allelic peaks
Step 3                Identify the number of contributors in the mixture
Step 4                Estimation of the mixture proportion or ratio of the individuals contributing to the mixture
Step 5                Consideration of all possible genotype combinations
Step 6                Compare reference samples

However, a decision is made, explicitly or implicitly, before Step 3: will this mixture sample lead to a credible interpretation? In other words, is the mixture of sufficient quality (i.e., not too complex) to be confidently interpreted? Mixture classification schemes have also been proposed, and these generally divide mixtures into three main types. Table 4 contains an example scheme from the German Stain Commission [7] and the Scientific Working Group on DNA Analysis Methods (SWGDAM) Mixture Interpretation Guidelines [11].
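As an illustration of Step 3 in Table 3, the sketch below applies the maximum-allele-count heuristic to the Evidence Sample of Table 1. This common rule of thumb is swapped in here for illustration only; it is not a method prescribed by [10] or adopted in this thesis. Each diploid contributor shows at most two alleles per locus, so the locus with the most alleles sets a lower bound on the number of contributors. The function name is hypothetical.

```python
import math

# Minimal sketch of the maximum-allele-count heuristic for Step 3 of Table 3.
# Drop-out and allele sharing can make the true number of contributors larger
# than this bound; drop-in or artifacts can instead inflate the allele count.


def min_contributors(evidence):
    """Lower bound on contributors: ceil(max alleles observed at any locus / 2)."""
    max_alleles = max(len(alleles) for alleles in evidence.values())
    return math.ceil(max_alleles / 2)


# Evidence Sample alleles from Table 1.
EVIDENCE = {
    "Locus 1": {5, 10},
    "Locus 2": {5, 6, 7, 8},
    "Locus 3": {9, 10, 11},
    "Locus 4": {4, 9, 13},
}
print(min_contributors(EVIDENCE))  # 2
```

Locus 2 shows four alleles, so the Evidence Sample in Table 1 has at least two contributors. A bound like this says nothing about whether the mixture is simple enough to interpret, which is the prior decision raised above.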

Table 4: Mixture Classification Scheme from German Stain Commission and SWGDAM (Characteristics below quoted from [7])

Mixture Type: Type A Mixtures / Indistinguishable Mixtures
Characteristics: No obvious major contributor with no evidence of stochastic effects

Mixture Type: Type B Mixtures / Distinguishable Mixtures
Characteristics: Clearly distinguishable major and minor DNA components; consistent peak height ratios of approximately 4:1 (major to minor component) across all heterozygous systems, and no evidence of stochastic effects

Mixture Type: Type C Mixtures / Uninterpretable Mixtures
Characteristics: No major component(s) and evidence of stochastic effects

1.1.3 Inclusion Statistics

Any legal declaration of consistency between a reference standard and an evidentiary profile, dubbed "inclusion" in the forensics community, must be contextualized statistically [11]. In DNA analysis, this statistic, however calculated, aims to inform the jury of one of two quantities. The first is the probability that a randomly selected individual could not be eliminated as a contributor to a given stain. The second is how much more probable the evidence is under the prosecution's hypothesis than under the defense's hypothesis. The former is calculated via the Random Man Not Excluded (RMNE) method [12], and the latter via the Likelihood Ratio (LR) method [13].

The RMNE statistic, or synonymously the Combined Probability of Inclusion (CPI), is calculated by considering all combinations of feasible genotypes that could have contributed to the evidence sample. Only evidence sample loci whose alleles are above the stochastic threshold are considered in the computation. Notably, when unrestricted CPI is invoked as the statistical method, quantitative information such as peak height or area is ignored. In theory, the number of contributors has no bearing on whether CPI can be used. It does, however, strongly affect the assumption that all alleles have been detected. With this caveat, the RMNE approach may be considered appropriate for an indistinguishable evidence mixture in which drop-out of alleles from all contributors is deemed unlikely [14,15].

Consider a sample consisting of $m$ loci, where each locus $L$ contains $n$ alleles $\{\alpha_{L,1}, \alpha_{L,2}, \ldots, \alpha_{L,n}\}$ with associated probabilities of occurrence $p(\alpha_{L,1}), p(\alpha_{L,2}), \ldots, p(\alpha_{L,n})$. For such a sample, the Random Man Not Excluded (RMNE) statistic is given by Equation 1.

$$\mathrm{RMNE} = \mathrm{CPI} = \prod_{L=1}^{m} \mathrm{PI}_L = \prod_{L=1}^{m} \left( \sum_{A=1}^{n} p(\alpha_{L,A}) \right)^{2}$$

Equation 1: Random Man Not Excluded (RMNE) Statistic
$m$: total number of loci contained in sample
$n$: number of alleles contained at a particular locus
$\alpha_{L,A}$: a particular allele $A$ at a particular locus $L$
$p(\alpha_{L,A})$: probability of $\alpha_{L,A}$ occurring in population
$\mathrm{PI}_L$: probability of inclusion at locus $L$
$\mathrm{CPI}$: combined probability of inclusion

Recommendation 4.1 of the Second National Research Council Report (NRC-II) [16] requires an assumption of within-locus independence based on Hardy-Weinberg equilibrium, together with the associated inbreeding coefficient $\theta$. Accepting these stipulations yields Equation 2. Unlike the RMNE statistic, the LR must assume a number of mixture contributors to the evidence sample and employs knowledge of a reference sample's genotype.
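Equation 1 can be computed directly, as in the minimal sketch below. The allele frequencies and function names are hypothetical. The sketch assumes the caller has already removed any locus that should not be used, such as one with alleles below the stochastic threshold. The $\theta$ correction of Equation 2 is deliberately left out.

```python
from math import prod

# Minimal sketch of Equation 1: CPI = product over loci of the squared sum of
# the population frequencies of the alleles observed at that locus.
# Allele frequencies below are hypothetical; no theta correction is applied.


def probability_of_inclusion(allele_freqs):
    """PI_L for one locus: the squared sum of the observed alleles' frequencies."""
    return sum(allele_freqs) ** 2


def combined_probability_of_inclusion(loci):
    """CPI (= RMNE): product of PI_L over loci; `loci` maps locus -> list of frequencies."""
    return prod(probability_of_inclusion(freqs) for freqs in loci.values())


# Hypothetical frequencies p(alpha_{L,A}) of the alleles observed at three loci.
observed = {
    "Locus A": [0.10, 0.20],
    "Locus B": [0.05, 0.15, 0.10],
    "Locus C": [0.25],
}
cpi = combined_probability_of_inclusion(observed)
print(f"CPI = {cpi:.3g} (about 1 in {1 / cpi:,.0f})")
```

Each $\mathrm{PI}_L$ is at most 1, so adding a locus can only lower the CPI. A locus at which the mixture shows many common alleles has a $\mathrm{PI}_L$ close to 1 and therefore adds little discriminating power.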

$$\mathrm{RMNE} = \prod_{L=1}^{m} \left[ \left( \sum_{A=1}^{n} p(\alpha_{L,A}) \right)^{2} + \theta \sum_{A=1}^{n} p(\alpha_{L,A}) \left[ 1 - p(\alpha_{L,A}) \right] \right]$$

Equation 2: RMNE Using NRC-II Recommendation 4.1
$m$: total number of loci contained in sample
$n$: number of alleles contained at a particular locus
$\alpha_{L,A}$: a particular allele $A$ at a particular locus $L$
$p(\alpha_{L,A})$: probability of $\alpha_{L,A}$ occurring in population
$\theta$: inbreeding coefficient

An LR can be calculated with or without taking peak height information into account, and with or without taking the probability of allelic drop-out into account. The likelihood ratio is the ratio of the conditional probabilities of observing an evidence profile under competing prosecution and defense hypotheses. The likelihood ratio makes more robust use of more information (e.g., a reference sample's genotype). However, it is harder to calculate and fundamentally reliant on assumptions that can be difficult to justify, such as the number of mixture contributors and the defense hypotheses. In the most complex scenarios, conclusions for several different numbers of contributors may need to be stated for a single item of evidence. This would result in two or more LRs with no clear indication of which LR is the best estimate [15,17-19].

Consider a given evidentiary profile $E$ and two competing hypotheses: the prosecution's hypothesis $H_P$ and the defense's hypothesis $H_D$. The likelihood ratio LR is then given by Equation 3.

$$\mathrm{LR} = \frac{\Pr(E \mid H_P)}{\Pr(E \mid H_D)}$$

Equation 3: Likelihood Ratio (LR) Statistic
$\Pr(E \mid H_P)$: probability of encountering an evidence profile $E$ given prosecution hypothesis $H_P$
$\Pr(E \mid H_D)$: probability of encountering an evidence profile $E$ given defense hypothesis $H_D$

Equation 4 is based on the general formulation by Weir et al. [20] for computing the likelihood ratio when $x$ unknown contributors carry a set of alleles $U$. These alleles are included in evidence sample $E$ under a given hypothesis $H$.

$$\mathrm{LR} = \frac{\Pr_x(U_{H_P} \mid E)}{\Pr_x(U_{H_D} \mid E)}$$

Equation 4: General Formulation of Likelihood Ratio (LR)
$E$: evidence sample containing a set of alleles
$H_P$: prosecution hypothesis
$H_D$: defense hypothesis
$U_H$: set of alleles contained in evidence sample $E$ but not contributed by the contributors specified by hypothesis $H$
$\Pr_x(U_H \mid E)$: probability, given $x$ contributors, of observing allele set $U_H$ given evidence sample $E$ and the contributors specified by hypothesis $H$

Computing a statistic involves mathematical rigor, and the numerical result appears precise. Both belie the difficulty of asking analysts, attorneys, judges, and juries to interpret the number that is ultimately calculated. If the evidence is neither overwhelmingly strong nor underwhelmingly weak, deciding what weight to assign to it can be subjective. Evett and Weir [21] furnish a framework that attempts to guide interpretation of the statistical result. It links a quantitative likelihood ratio with a qualitative indication of how strongly the evidence supports the prosecution's hypothesis; the prescriptions are found in Table 5. This scheme removes some subjectivity from the assessment of evidential support. It does not, however, connect qualitative notions of evidence strength with quantitative notions of error incidence. "Limited" support, for instance, may correspond to unacceptably high error rates in determining a reference sample's inclusion.

Table 5: Guidelines for Interpretation of Likelihood Ratios (reproduced from [21])

Likelihood Ratio Range   Degree of Support Provided by Evidence for Prosecution's Hypothesis
1 to 10                  Limited
10 to 100                Moderate
100 to 1000              Strong
1000 and greater         Very Strong

Word notes, "The statement of inclusion under these scenarios may have little meaning. 'Inconclusive' or 'insufficient for comparison purposes' may be the more appropriate conclusion in some cases" [22]. Despite the interpretive difficulties of inclusion statistics, the literature contains an abundance of scholarly work on reporting inclusion probabilities and likelihood ratios for multifarious scenarios. These include cases where only qualitative allelic information is considered [23], where the number of contributors is ambiguous [24,25], and where multiple hypotheses are entertained [26]. They also include cases where a reference sample is identified via a database search [27] and where reporting prior odds affects interpretation of the LR [28,29]. By contrast, little has been published on exclusion criteria or complexity criteria. The LR approach attempts to handle allelic loss by incorporating the probability of drop-out into the computation [30]. Even so, minimal work has been published on a criterion that analysts could use to decide when a profile contains usable information. That is, is there a point at which an evidentiary profile contains too many contributors, too much drop-out, or both, to be considered a reliable item of evidence for comparison purposes?

Blackstone's ratio, a central moral and legal principle, asserts that it is "better that ten guilty persons escape, than that one innocent suffer" [31]. This principle is bolstered by the following notion: "The DNA tests that we routinely use in our laboratories are designed to be exclusionary tests. That is, testing is performed under the premise that an individual who is not the source of the DNA with a single-source profile or who is not one of the sources in a mixture of DNA is expected to be excluded from the DNA sample" [22]. The allure of increased detection of criminals must be chastened by consideration of the specious detection of innocents. Invoking this reverse perspective may seem pedantic until one considers two things: the technological advances employed in DNA testing, and the implications of DNA testing for criminal justice policy and practice. Advances include the emerging ability to detect DNA in minute quantities of sample, for example, from fingerprints [32], latex gloves [33], fingernail remains [34], skin cells [35], single hairs [36], clothing [37], and cigarettes [38]. They also include the ability to discern and analyze mixture samples [20,23,39-42]. New, more sensitive amplification chemistries are increasingly being commercialized [43,44]. As a result, laboratories more often attempt to extract potentially individualizing information from degraded or partial profiles. They also attempt it more often for individual profiles whose characteristics may be masked by the presence of another's profile.
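Equation 3 and the verbal scale of Table 5 can be illustrated with a deliberately simple case. This is a textbook single-source special case, not the mixture formulation of Equation 4. The sketch assumes the evidence is a single-source profile that matches the reference, so $\Pr(E \mid H_P) = 1$. Under $H_D$, an unrelated random person is assumed to be the source, so $\Pr(E \mid H_D)$ is the Hardy-Weinberg genotype frequency without a $\theta$ correction, multiplied across independent loci. The allele frequencies and function names are hypothetical. The mapping to Table 5 assumes each range includes its lower bound.

```python
from math import prod

# Minimal sketch: a single-source special case of Equation 3, mapped onto the
# verbal scale of Table 5. Assumptions: Pr(E | H_P) = 1 for a matching profile;
# Pr(E | H_D) is the Hardy-Weinberg genotype frequency (no theta correction),
# multiplied across independent loci. Allele frequencies are hypothetical.


def genotype_frequency(p, q=None):
    """p^2 for a homozygote (q omitted), 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q


def single_source_lr(genotype_frequencies):
    """LR = Pr(E | H_P) / Pr(E | H_D) = 1 / product of per-locus genotype frequencies."""
    return 1.0 / prod(genotype_frequencies)


def verbal_support(lr):
    """Map an LR onto Table 5, assuming each range includes its lower bound."""
    if lr < 1:
        return "Not covered by Table 5 (evidence favors the defense hypothesis)"
    if lr < 10:
        return "Limited"
    if lr < 100:
        return "Moderate"
    if lr < 1000:
        return "Strong"
    return "Very Strong"


# Hypothetical two-locus match: a heterozygote (p=0.1, q=0.2) and a homozygote (p=0.3).
lr = single_source_lr([genotype_frequency(0.1, 0.2), genotype_frequency(0.3)])
print(f"LR = {lr:.1f}: {verbal_support(lr)}")
```

The verbal label attaches no error rate to the number. An LR labeled "Strong" here says nothing about how often such a value would arise from a non-contributor compared against a complex mixture, which is the gap this thesis addresses.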

Touch and other low-level samples are now more routinely submitted for DNA typing, but testing these low-template samples raises a number of issues. By its nature, attempting to detect scant quantities of an evidentiary sample risks not observing all the alleles that truly comprise the sample under investigation. This is the phenomenon of allelic drop-out [13]. At the same time, the detection threshold must be relaxed, and extra amplification cycles might be needed to produce a discernible signal. This creates a concurrent risk of detecting spurious evidentiary signal(s) due to contamination, stochastic variation, or adventitious deposition [45,46]. This is the phenomenon of allelic drop-in. Issues relating to heterozygous peak-height balance and the increased prevalence of stutter products also need to be considered [47]. This mélange of complications¹ has significant ramifications for evidence interpretation. For example, if a laboratory uses the CPI statistic, loci with alleles below the stochastic threshold cannot be used for inclusion. If all alleles of an evidence profile fell below the stochastic threshold, the evidence theoretically could not be used for comparison. The likely outcome would be that no reliable comparison can be made. Similarly, incorporating the probability of drop-out (Pr(D)) into the LR would decrease the weight of the evidence, since more random individuals could be included.

¹ Error rates associated with sample collection, extraction, and the amplification process itself have been well documented [62-64]. These errors range in pervasiveness from the mislabeling of laboratory sample tubes to the contamination of an evidence collection kit by a crime scene investigator. They precede the types of profile interpretation errors assessed in Sections 2 and 3 and are not considered in this analysis.
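The stochastic-threshold restriction described above can be sketched as a filter that runs before any CPI calculation. The sketch assumes a laboratory rule under which every detected peak at a locus must meet the stochastic threshold for that locus to be used. The 200 RFU threshold, the peak heights, and the function name are hypothetical placeholders; laboratories validate their own thresholds and rules.

```python
# Minimal sketch: selecting the loci of an evidence profile that may enter a
# CPI calculation, assuming a rule that every detected peak at a locus must
# meet the stochastic threshold. Threshold and peak heights are hypothetical.

STOCHASTIC_THRESHOLD_RFU = 200.0


def loci_usable_for_cpi(profile, stochastic_threshold=STOCHASTIC_THRESHOLD_RFU):
    """Return loci (in input order) whose detected peaks all meet the stochastic threshold."""
    return [
        locus
        for locus, peaks in profile.items()
        if peaks and all(height >= stochastic_threshold for height in peaks.values())
    ]


# Hypothetical low-template evidence profile: allele -> peak height (RFU) per locus.
evidence_peaks = {
    "Locus 1": {"5": 950.0, "10": 870.0},
    "Locus 2": {"5": 410.0, "6": 120.0, "8": 385.0},
    "Locus 3": {"9": 150.0, "11": 90.0},
}

usable = loci_usable_for_cpi(evidence_peaks)
print(usable)  # ['Locus 1']
if not usable:
    print("No locus qualifies: no reliable CPI-based comparison can be made.")
```

Only one of the three hypothetical loci survives the filter in this example. For a still weaker profile the list would be empty, which corresponds to the "no reliable comparison" outcome described above.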

1.2 Consideration of Error Rates
Despite the growing literature on determination of the LR, little consideration has been given to whether a profile should be analyzed at all; concomitantly missing is a treatment of the Type I and Type II errors associated with a given likelihood ratio. Further, models that use linear mixture analysis [41,48], Markov chain Monte Carlo [49], or least-squares deconvolution [50] have been proposed, but in each case the interpretability of the profile has been presumed. Whether to embark on the analysis path towards sample comparison needs to be decided before comparison to a reference sample, whether qualitatively by the analyst or quantitatively via computational software. Furthermore, it is critical to determine how to contextualize an inclusion statistic (either RMNE or LR) within a framework of error rates. Accordingly, the error rates associated with interpreting successfully amplified DNA profiles need to be investigated. The focus is on uncovering a complexity threshold that helps constrain the conditions under which interpretation of a low-copy or multiple-contributor profile is even attempted. A method to accomplish these goals, employing Receiver Operating Characteristic analysis, is proposed and detailed in subsequent sections.

Error Rate Analysis Using the Paradigm of Receiver Operating Characteristics
Receiver Operating Characteristic (ROC) analysis originated in World War II as a means for radar operators to set their detection thresholds in a manner that optimized the tradeoff between false alarms (i.e., spurious target detections) and leakage (i.e., undetected targets) [51]. The ROC parameter space of a classification scheme can be organized in a contingency table, or confusion matrix. This table maps data instances into one of four classes according to the actual and determined classes of those instances.

Within the realm of DNA analysis, the classification scheme involves including or excluding a reference profile from an evidentiary profile, and the data points consist of reference-to-evidence comparisons. The true or actual classes depend on whether a reference sample really is included in an evidentiary sample, while the determined classes represent the conclusions of an analyst. This is distinct from the issue of whether a reference sample ought to be included analytically. As outlined in Section 2.4.2, this study models an analyst's determination of exclusion or inclusion deterministically. In doing so, different determinations, each based on specific decision criteria, specify when an analyst ought to include or exclude a reference as a contributor to an evidence sample. The agreements and deviations between those prescriptions (i.e., "ought to be included" versus "ought to be excluded") and reality (i.e., "possible contributor" versus "non-contributor") form the basis of this study.

Considering whether a standard is a possible contributor versus an actual contributor is a subtle but important distinction. For example, if the mixture sample contains alleles 14, 15, and 16 at a particular locus, then individuals who ought to be included could have any of the following allele pairings at that locus: (14,14); (14,15); (14,16); (15,15); (15,16); or (16,16). This is different from confining the error analysis to those individuals who actually contributed to a particular mixture. An analyst usually cannot know the exact genotype of a particular contributor among the universe of possible included genotype combinations. Thus, the focus is on the decision criteria themselves. Potential errors committed by an analyst acting independently of the prescriptions of his laboratory's standard operating procedures are not modeled.

Table 6 shows the four possible outcomes of an analyst's decision when a given reference sample is compared with an unknown.

Table 6: Contingency Matrix Correlating Analyst Decision with the Underlying Reality
H0: The mixture profile does not contain the individual's profile (i.e., the individual is excluded).
H1: The mixture profile contains the individual's profile (i.e., the individual is included).

| Reality \ Analyst action | Fail to reject H0; accept H0 | Reject H0; accept H1 |
|---|---|---|
| H0 false; H1 true | False negative (Type II error) | True positive |
| H0 true; H1 false | True negative | False positive (Type I error) |

A false negative occurs when an included individual is improperly excluded as a contributor to a mixture, while a true positive occurs when an included individual is correctly included as a mixture contributor. A true negative occurs when an excluded individual is correctly excluded as a mixture contributor, while a false positive occurs when an excluded individual is improperly included as a contributor to a mixture.

Comparisons between an evidence sample and an individual who ought to have been excluded can result either in a correctly excluded individual (i.e., a true negative) or in an incorrectly included individual (i.e., a false positive). Thus, all comparisons that fall within the bottom row of Table 6 involve individuals who could not have contributed to the mixtures to which they are being compared.
Comparisons between an evidence sample and an included individual (i.e., a possible contributor) can result either in a correctly included individual (i.e., a true positive) or in an incorrectly excluded individual (i.e., a false negative). Thus, all comparisons that fall within the top row of Table 6 involve individuals who could have contributed to the mixtures to which they are being compared.

Within the realm of error analysis, a decision that results in a truly negative sample being judged positive (i.e., a false positive) is known as a Type I error, or error of the first kind. The rate at which false positives occur, typically denoted by α, is the conditional probability of including an individual as a contributor to a mixture given that the individual ought to be excluded. The specificity of a test is equivalent to the true negative rate and is given by 1 − α.

A decision that results in a truly positive sample being judged negative (i.e., a false negative) is known as a Type II error, or error of the second kind. The rate at which false negatives occur, typically denoted by β, is the conditional probability of excluding an individual as a contributor to a mixture given that the individual ought to be included. The sensitivity of a test, also known as the hit rate or recall, is equivalent to the true positive rate and is given by 1 − β.

Initially applied to radar detection thresholds, ROC analysis has found utility in general signal-detection problems across multiple disciplines. These include speech recognition and music detection [52], face detection and recognition [53], and vibration-based structural health monitoring of bolt loosening [54], among many others. It has also been successfully applied in clinical settings involving disease diagnosis [55], and it provides a useful, reductive lens through which to process the allelic discrepancy data contained in Sections 3.1 and 3.2. The databases of mixtures, excluded individuals, and included individuals originate from simulation (Section 2.2) and from the laboratory (Section 2.3). Given these databases, relative error rates will be assessed for varying decision criteria for declaring a reference profile either excluded from or included in a given evidence mixture sample. The decision criterion depends on the number of discrepant alleles between the two samples.

2 METHODS
2.1 Overview
Two studies were conducted: an error analysis study that considered a simulated mixture database (Section 2.2) and a validation study that considered a laboratory mixture database (Section 2.3). Results for these studies (Sections 3.1 and 3.2, respectively) were generated by comparing each mixture database to two different simulated individual databases. One consisted of individuals verified to be excluded as mixture contributors, and the other consisted of individuals verified to be included as mixture contributors. In this study, the mixture profiles were taken to be the forensic evidence samples, and the individual profiles represented the reference samples.
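The error-rate bookkeeping behind Table 6 can be sketched in a few lines. The rule is assumed to have the form "include the reference if it has at most k discrepant alleles." For each k, the sketch tallies the four outcomes and reports α (false positive rate), β (false negative rate), sensitivity (1 − β), and specificity (1 − α). Both the rule's form and the discrepancy counts are hypothetical illustrations; Section 2.4.2 specifies the study's own decision criteria.

```python
from dataclasses import dataclass

@dataclass
class Rates:
    k: int
    alpha: float        # false positive rate: P(include | ought to be excluded)
    beta: float         # false negative rate: P(exclude | ought to be included)
    sensitivity: float  # true positive rate, 1 - beta
    specificity: float  # true negative rate, 1 - alpha

def error_rates(excluded_counts, included_counts, k):
    """Apply the hypothetical rule 'include if discrepant alleles <= k'.

    excluded_counts: discrepancy counts for comparisons where H0 is true.
    included_counts: discrepancy counts for comparisons where H1 is true.
    """
    fp = sum(1 for d in excluded_counts if d <= k)  # Type I errors
    tn = len(excluded_counts) - fp
    tp = sum(1 for d in included_counts if d <= k)
    fn = len(included_counts) - tp                  # Type II errors
    alpha = fp / (fp + tn)
    beta = fn / (fn + tp)
    return Rates(k, alpha, beta, 1 - beta, 1 - alpha)

# Hypothetical discrepancy counts (not study data).
excluded = [0, 1, 2, 3, 4, 5, 6, 7]
included = [0, 0, 0, 1, 1, 2, 3, 4]
for k in range(5):
    print(error_rates(excluded, included, k))
```

Sweeping k traces the ROC tradeoff. Tolerating more discrepant alleles lowers β, which reflects fewer contributors wrongly excluded because of drop-out. It also raises α, which reflects more non-contributors wrongly included.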

2.2 Error Analysis Study Using Simulated Mixture Data
For the error analysis study using simulated mixture data, the population under study was a database of simulated mixtures. These were compared to a database of simulated excluded individuals and a database of simulated included individuals.

Simulation Materials
All simulated genetic profiles and subsequent analyses were generated using MATLAB (R2007b) (MathWorks, Natick, Massachusetts).

Simulation Model
Within the simulation framework, profiles were represented as a collection of alleles determined to be present at each of the 15 autosomal loci contained in the AmpFlSTR Identifiler Amplification Kit (Applied Biosystems, Foster City, CA): D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA [43]. Amelogenin, the gender-determining locus, was not considered because it is not a hypervariable locus providing discriminatory power. The alleles observed and tabulated by Butler et al. [56] in their 2003 population study using Identifiler were taken as the universe of realizable alleles for the purposes of simulating profiles. Butler et al. observed 59 distinct allele calls over all autosomal STR loci: 5, 6, 7, 8, 8.1, 9, 9.3, 10, 10.3, 11, 12, 12.2, 13, 13.2, 14, 14.2, 15, 15.2, 16, 16.2, 17, 17.2, 18, 18.2, 19, 19.2, 20, 21, 21.2, 22, 22.2, 22.3, 23, 23.2, 24, 24.2, 25, 25.2, 26, 27, 28, 29, 29.2, 30, 30.2, 31, 31.2, 32, 32.2, 33, 33.1, 33.2, 34, 34.2, 35, 36, 37, 38, 39. Figure 1 diagrams the collection of alleles observed at each locus, along with each allele's associated subpopulation frequency among Caucasians.

Figure 1: Dirac Delta Function Plots of Genetic Model Used in Simulating Profiles. The genetic model employed when simulating profiles was based on Butler et al.'s 2003 subpopulation study of Caucasians using the Identifiler kit [56]. All observed alleles (i.e., the common alleles) at each of the 15 autosomal STR loci are represented by the allele value topping each vertical line. Each allele's associated subpopulation frequency is represented by the height of its respective vertical line.

Profiles of individuals were generated by randomly selecting two (not necessarily distinct) alleles for each of the 15 autosomal loci in the Identifiler kit. For any given locus, a list was constructed of all alleles with non-zero subpopulation frequencies among the 302 Caucasians observed by Butler et al. Two alleles were then selected at random from this locus-specific list according to the subpopulation frequencies observed by Butler et al. (and represented in Figure 1). For instance, alleles 8, 9, 10, 11, 12, 13, and 14 were observed at the CSF1PO locus, each with an associated subpopulation frequency in Caucasians (denoted here as $p_8, p_9, \ldots, p_{14}$). The simulation selects a 9 allele 1.159% of the time and a 12 allele $p_{12} \times 100$% of the time. A genotype consisting of alleles 8 and 13 at the CSF1PO locus would be selected $2 p_8 p_{13} \times 100$% of the time², whereas a homozygous CSF1PO genotype consisting of alleles 11 and 11 would be selected $p_{11}^2 \times 100$% of the time. This random selection is carried out for each of the 15 autosomal loci in the Identifiler kit according to Butler et al.'s observed frequencies for each allele at each locus. Each individual profile, then, consists of two alleles at each of the 15 loci.

² The factor of 2 is included in the product of heterozygous allele frequencies to account for the combinatorial fact that a genotype consisting of allele 8 and allele 13 (in that order) is equivalent to a genotype consisting of allele 13 and allele 8 (in that order).

Recasting this information as a matrix served a dual purpose: it summarized an individual's profile logically, and it arranged the data in a computationally efficient form. The matrix representation leveraged MATLAB's vectorized environment to facilitate fast data manipulation in essential operations, such as summing individuals' profiles to simulate mixtures and comparing two profiles for the presence of common alleles. Thus, the amplification result for an individual was represented as a 59 × 15 matrix. The 59 rows correspond to the universe of possible alleles at all loci, and the 15 columns correspond to the loci. (Row 1 corresponded to allele 5, and the rows proceeded in monotonically increasing allele order through row 59, which corresponded to allele 39.) The loci followed the order listed above, with column 1 corresponding to the D8S1179 locus and column 15 corresponding to the FGA locus.

The matrix entries represented relative allele prevalences for that profile. For single-source profiles, relative presence allows a simple model of allele expression: each allele is absent, present heterozygously, or present homozygously. For mixed profiles, the model does not take into account signal intensity, the number of contributors, or relative contributor ratios. For example, a given single-source profile matrix consisted of entries of relative prevalence 0, 1, or 2:
- An entry of zero corresponded to the absence of that allele at that locus. For example, a zero in the 8th row and 1st column indicates that the reference did not have a 10 allele at the D8S1179 locus.
- An entry of unity corresponded to the presence of a heterozygous allele at a locus. For example, a one in the 42nd row and 2nd column indicates that the reference's D21S11 locus is heterozygous and that one and only one of its two alleles at this locus is a 29.
- An entry of 2 represented a homozygous allele at the specified locus. For example, a two in the 11th row and 3rd column indicates that the reference has two 12 alleles at the D7S820 locus.

The profiles of all individuals were assumed to have exactly two alleles at a given locus. For an individual profile I with 15 loci L, each consisting of two alleles $\alpha_{L,1}$ and $\alpha_{L,2}$, Equation 5 models the resulting profile.
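The genotype sampling and the 59 × 15 matrix encoding described above can be sketched in Python; the study itself used MATLAB. The allele axis and locus order come from the text. The frequency table `TOY_FREQS` is a hypothetical placeholder covering only two loci, because Butler et al.'s values are not reproduced in this transcription. Loci without an entry stay empty in this sketch.

```python
import numpy as np

# 59 allele designations (matrix rows), in the order listed in the text.
ALLELES = [5, 6, 7, 8, 8.1, 9, 9.3, 10, 10.3, 11, 12, 12.2, 13, 13.2, 14, 14.2,
           15, 15.2, 16, 16.2, 17, 17.2, 18, 18.2, 19, 19.2, 20, 21, 21.2, 22,
           22.2, 22.3, 23, 23.2, 24, 24.2, 25, 25.2, 26, 27, 28, 29, 29.2, 30,
           30.2, 31, 31.2, 32, 32.2, 33, 33.1, 33.2, 34, 34.2, 35, 36, 37, 38, 39]
# 15 Identifiler autosomal loci (matrix columns), in the order listed in the text.
LOCI = ["D8S1179", "D21S11", "D7S820", "CSF1PO", "D3S1358", "TH01", "D13S317",
        "D16S539", "D2S1338", "D19S433", "vWA", "TPOX", "D18S51", "D5S818", "FGA"]
ROW = {a: i for i, a in enumerate(ALLELES)}
COL = {locus: j for j, locus in enumerate(LOCI)}

# Hypothetical frequencies for two loci only (placeholders, not Butler et al.).
TOY_FREQS = {
    "CSF1PO": {10: 0.25, 11: 0.30, 12: 0.35, 13: 0.10},
    "TH01": {6: 0.20, 7: 0.20, 9.3: 0.60},
}

def simulate_individual(freqs, rng):
    """Return a 59 x 15 matrix of relative allele prevalences (0, 1, or 2).

    Two alleles are drawn independently (with replacement) at each locus in
    `freqs`, weighted by the normalized frequencies; a homozygote gets a 2.
    """
    profile = np.zeros((len(ALLELES), len(LOCI)), dtype=int)
    for locus, table in freqs.items():
        alleles = list(table)
        p = np.array([table[a] for a in alleles], dtype=float)
        for d in rng.choice(len(alleles), size=2, p=p / p.sum()):
            profile[ROW[alleles[d]], COL[locus]] += 1
    return profile

rng = np.random.default_rng(0)
person = simulate_individual(TOY_FREQS, rng)
print(person.shape, person.sum(axis=0))  # (59, 15) [0 0 0 2 0 2 0 0 0 0 0 0 0 0 0]
```

The zero-based row indices `ROW[10]`, `ROW[29]`, and `ROW[12]` are 7, 41, and 10. These match the text's one-based examples of rows 8, 42, and 11.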

$$I = \sum_{L=1}^{15}\left(I^{\alpha_{L,1}} + I^{\alpha_{L,2}}\right) = \sum_{L=1}^{15}\sum_{A=1}^{2} I^{\alpha_{L,A}}$$
Equation 5: Model of Individual Profile
where $I^{\alpha_{L,A}}$ is allele A contained at locus L for individual I, and I is the (complete) individual profile.

Figure 2 shows a graphic representation of an example single-source profile using the Dirac delta function.

Figure 2: Dirac Delta Function Representation of an Example Single-Source Profile. The alleles at each of the 15 autosomal loci are represented by the integer topping each vertical line. Each allele's associated subpopulation frequency is represented by the height of its respective vertical line.

This example individual's profile is equivalently represented in Figure 3 as a matrix.

Figure 3: Graphical Matrix Representation of a Representative Single-Source Profile. The loci names on the x-axis have been abbreviated. The colors of the abbreviated loci names correspond to their respective fluorescent dye colors in the Identifiler kit, except for the black font, which corresponds to the yellow dye. Because of space limitations, only the first and last allele values are identified on the y-axis. In place of potentially illegible numbers, a relative allele prevalence of 0 is represented as whitespace, a red box indicates a relative prevalence of 1, and a blue box indicates a relative prevalence of 2.

Mixtures were generated by summing the matrices of a given number of contributors. A mixture of two people could have matrix entries corresponding to relative allele prevalences from 0 to 4, depending on the degree of allelic overlap between contributors. (For example, if two contributors were homozygous for the same allele at a particular locus, the resulting mixture matrix entry for that allele at that locus would be 4.)

Therefore, in general, consider a mixture $M_1$ created by contributions from two individuals $I_1$ and $I_2$, with 15 loci L each consisting of two alleles $\alpha_{L,1}$ and $\alpha_{L,2}$. The alleles expressed at each locus in the mixture profile are modeled by the simple sum shown in Equation 6.

$$M_1 = I_1 + I_2 = \sum_{L=1}^{15}\left(I_{S_1}^{\alpha_{L,1}} + I_{S_1}^{\alpha_{L,2}} + I_{S_2}^{\alpha_{L,1}} + I_{S_2}^{\alpha_{L,2}}\right) = \sum_{c=1}^{2}\sum_{L=1}^{15}\sum_{A=1}^{2} I_{S_c}^{\alpha_{L,A}}$$
Equation 6: Model of a Two-Person Mixture Profile
where $\alpha_{L,A}$ is allele A contained at locus L, $S_c$ is the single-source profile for individual c, and $M_1$ is the (complete) profile for mixture 1.

Figure 4 and Figure 5 show graphic representations of an example mixture of Person 1's and Person 2's profiles as a collection of Dirac delta function plots and as a matrix, respectively.

Figure 4: Dirac Delta Function Representation of Example Mixture Profile from Person 1 and Person 2. The alleles at each of the 15 autosomal loci are represented by the integer topping each vertical line. Each allele's associated subpopulation frequency is represented by the height of its respective vertical line.
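Because each profile is a matrix of relative prevalences, Equation 6 reduces to element-wise addition. The sketch below builds a mixture from distinct contributors by sampling without replacement. This plays the role of the quality check that stops one person from contributing twice. The small matrices (3 alleles × 2 loci rather than 59 × 15) and the function name `make_mixture` are hypothetical illustrations.

```python
import numpy as np

def make_mixture(population, n_contribs, rng):
    """Sum the profile matrices of n_contribs distinct individuals (Equation 6).

    population: integer array of shape (n_people, n_alleles, n_loci).
    Sampling without replacement prevents one person contributing twice.
    """
    idx = rng.choice(len(population), size=n_contribs, replace=False)
    return population[idx].sum(axis=0), idx

# Hypothetical toy profiles: 3 candidate alleles x 2 loci, two alleles per locus.
population = np.array([
    [[2, 0], [0, 1], [0, 1]],   # homozygous for allele row 0 at locus 0
    [[2, 0], [0, 0], [0, 2]],   # also homozygous for allele row 0 at locus 0
    [[1, 1], [1, 0], [0, 1]],
])
rng = np.random.default_rng(1)
mixture, contributors = make_mixture(population, 2, rng)
print(contributors, mixture.tolist())
```

Every column of a two-person mixture sums to 4, and a single entry reaches 4 only when both contributors are homozygous for the same allele. Persons 0 and 1 above are an example of this at allele row 0, locus 0.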

Figure 5: Graphical Matrix Representation of Example Mixture Profile from Person 1 and Person 2. The loci names on the x-axis have been abbreviated. The colors of the abbreviated loci names correspond to their respective fluorescent dye colors in the Identifiler kit, except for the black font, which corresponds to the yellow dye. The first and last common alleles are identified on the y-axis. A relative allele prevalence of 0 is represented as whitespace; a red box indicates a relative prevalence of 1; a blue box, 2; a green box, 3; and a magenta box, 4.

In this scenario, allele detection has effectively been reduced to a binary system in which each allele is deterministically either present or absent. This ultimately manifests itself as a relative prevalence number instead of a peak height or area. Modeling different mixture ratios between contributors would be required to fully encompass the potential effects of allelic drop-out. Because the drop-out model employed in this study is based on relative allele prevalence, all results assume a 1:1 mixture ratio.

Modeling Allele Drop-out
If no drop-out is assumed, summing the single-source matrices results in a mixture profile with all contributed alleles detected. This was considered a pristine mixture profile. It is akin to casework instances in which the mixture proportion ratio is 1:1 and the total DNA mass input into the amplification process is greater than 0.5 ng [57]. To account for instances in which lower amounts of target DNA are amplified, allele drop-out needed to be modeled. To accomplish this, pristine mixtures were perturbed with proportions of drop-out for a heterozygous allele ranging from 0 to 0.9 in increments of 0.1.

Here, a drop-out level of 0 means that all of the alleles were detected; in this case, the perturbed mixture is identical to the pristine mixture. A non-zero drop-out level corresponded to the proportion of time a heterozygously present allele (i.e., an allele with a relative prevalence of 1) was not detected. For example, in a single-source sample with a drop-out proportion of 0.1, each allele with a relative prevalence of 1 had a 10% chance of not being detected. Random numbers drawn separately for each allele at every locus, according to the specified proportion of drop-out, determined whether a given allele actually dropped out.

Some alleles within a profile were contributed multiple times (i.e., had a relative prevalence greater than unity), either because an individual contributor was homozygous at that locus or because alleles overlapped between contributors. For these alleles, the increased prevalence diminished the probability that the allele would drop out. Therefore, an allele that is twice as prevalent in a mixture is half as likely to drop out completely, while an allele that is four times as prevalent is one-quarter as likely to drop out.

Thus, consider a particular mixture allele α at locus L with relative prevalence φ, and a specified probability of drop-out for a heterozygous allele $\Pr(D)_{\varphi=1}$. The probability of drop-out $\Pr(D_{\alpha_{L,A}})_{\varphi}$ for that allele is given by Equation 7.

$$\Pr(D_{\alpha_{L,A}})_{\varphi} = \frac{\Pr(D)_{\varphi=1}}{\varphi}, \qquad \varphi \ge 1$$
Equation 7: Probability of Allele Drop-out
where $\alpha_{L,A}$ is allele A at locus L; φ is the relative prevalence of the allele (e.g., 0, 1, 2, 3, 4); $\Pr(D)_{\varphi=1}$ is the specified probability of drop-out for a heterozygous allele; and $\Pr(D_{\alpha_{L,A}})_{\varphi}$ is the realized probability of drop-out for allele A at locus L.

Whether an allele actually dropped out or remained observable was determined through a random number draw weighted with the appropriate probability of drop-out, $\Pr(D_{\alpha_{L,A}})_{\varphi}$.

Generating Populations for Comparison

Simulating Mixtures
First, profiles for 100,000 single-source samples were simulated to serve as possible mixture contributors. Random profiles from this set were selected two at a time and combined into a mixture. A quality check ensured that no individual's profile was selected more than once for the same mixture, which would have led to a single person contributing twice to a single mixture. As described previously, the two profiles were combined by summing the contributors' matrix profiles. The resulting matrix profile represented the pristine mixture under the condition of no allele drop-out. This selection of mixture contributors and summing of contributor profiles was repeated 10,000 times.

The final collection of pristine mixture profiles comprised part of the simulated mixture database; these 10,000 mixture profiles (with no drop-out) collectively represented one mixture set. Perturbations of the pristine mixtures, generated by applying varying levels of allelic drop-out (i.e., 0.10 to 0.90 in increments of 0.10), contributed the remaining mixture sets. This resulted in a total of 10,000 mixtures/set × 10 sets = 100,000 simulated mixtures. These simulated mixtures were later compared to simulated excluded and included single-source profiles. Figure 6 provides a flow chart of the process for simulating pristine mixture profiles, along with the associated inputs that defined the parameters of this study.
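The drop-out perturbation can be sketched as follows. Each present allele with relative prevalence φ ≥ 1 is removed with probability $\Pr(D)_{\varphi=1}/\varphi$ (Equation 7), using an independent uniform draw for each allele at each locus. One perturbed set is produced for each drop-out level from 0 to 0.9. This Python sketch assumes that a dropped allele is recorded as an entry of 0, i.e., it disappears completely from the perturbed mixture. The function names and the toy mixture are hypothetical.

```python
import numpy as np

def dropout_probability(prevalence, pr_d_het):
    """Equation 7: Pr(D)_phi = Pr(D)_{phi=1} / phi for phi >= 1 (0 where phi == 0)."""
    prevalence = np.asarray(prevalence, dtype=float)
    out = np.zeros_like(prevalence)
    present = prevalence >= 1
    out[present] = pr_d_het / prevalence[present]
    return out

def perturb(mixture, pr_d_het, rng):
    """Return a copy of `mixture` in which each present allele independently
    drops out with its Equation 7 probability.
    Assumption: a dropped allele is removed entirely (entry set to 0)."""
    p = dropout_probability(mixture, pr_d_het)
    dropped = rng.random(mixture.shape) < p
    return np.where(dropped, 0, mixture)

# Hypothetical pristine mixture: 4 allele rows x 1 locus, prevalences 1, 2, 4, 0.
pristine = np.array([[1], [2], [4], [0]])
print(dropout_probability(pristine, 0.8).ravel())  # [0.8 0.4 0.2 0. ]

rng = np.random.default_rng(7)
levels = [round(0.1 * i, 1) for i in range(10)]   # 0.0, 0.1, ..., 0.9 -> 10 sets
mixture_sets = {lvl: perturb(pristine, lvl, rng) for lvl in levels}
```

At level 0 the perturbed mixture equals the pristine one, as the text requires. At every level an allele is either kept at its original prevalence or removed.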

Figure 6: Flow Describing Pristine Mixture Profile Generation. Results contained in Sections 3.1 and 3.3 are for npop (number of simulated potential contributor individuals) = 100,000, ncontribs (number of mixture contributors) = 2, and nmix (number of mixtures to simulate) = 10,000.

From this master set of pristine mixture profiles, perturbed mixtures were generated, each modeling a different level of allelic drop-out. This flow is depicted in Figure 7.

Figure 7: Flow Used to Generate Perturbed Mixture Profiles. Results included in Sections 3.1 and 3.3 incorporate a range of allele drop-out rates increasing from 0% to 90% in increments of 10%. $\Pr(D)_{\varphi=1}$: specified probability of drop-out for a heterozygous allele.

Simulating Excluded Individuals
For the exclusion component of the simulation study, 10,000 individual profiles were simulated to create a database, in the same manner as for the mixture contributors. Subsequently, whenever one of these excluded types was compared with a particular mixture, a quality check was applied. It ensured that a reference profile deemed to be excluded did not in fact contain a collection of alleles that overlapped 100% with the mixture profile under consideration. In other words, at least one of the reference's alleles had to be absent from the mixture. This confirmed that the correct relationship between the references and the mixtures was in fact exclusion.

Figure 8: Integrated Flow Describing Generation of Pristine Mixture Profiles & Excluded Individuals. Results provided in Sections 3.1 and 3.3 are for npop (number of simulated potential contributor individuals) = 100,000, nindivs (number of simulated included individuals) = 10,000, ncontribs (number of mixture contributors) = 2, and nmix (number of mixtures to simulate) = 10,000.
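With the matrix representation, the exclusion check asks whether any allele present in the reference (entry > 0) is absent from the mixture (entry 0). Counting such alleles also gives a discrepant-allele count of the kind on which the decision criteria depend. The sketch counts a missing homozygous allele once rather than twice; this is an assumption, since the transcription does not specify it. The function names and toy matrices are hypothetical, not taken from the original MATLAB code.

```python
import numpy as np

def discrepant_alleles(reference, mixture):
    """Count distinct reference alleles (entries > 0) absent from the mixture
    (entries == 0). A missing homozygous allele counts once (assumption)."""
    return int(np.count_nonzero((reference > 0) & (mixture == 0)))

def is_confirmed_excluded(reference, mixture):
    """True if the reference does NOT overlap 100% with the mixture."""
    return discrepant_alleles(reference, mixture) > 0

# Hypothetical 4 alleles x 2 loci.
mixture = np.array([[1, 0], [2, 1], [1, 2], [0, 1]])
ref_in  = np.array([[0, 0], [2, 1], [0, 1], [0, 0]])  # every allele present in mixture
ref_out = np.array([[0, 1], [1, 0], [0, 0], [1, 1]])  # (row 0, locus 1) and (row 3, locus 0) missing
print(discrepant_alleles(ref_in, mixture), discrepant_alleles(ref_out, mixture))  # 0 2
```

In the simulation, a randomly generated reference that fails `is_confirmed_excluded` for a given mixture would not be used as an excluded individual for that comparison.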

Figure 8 shows the expanded simulation methodology and the relationship between the mixture contributor profiles and the excluded reference profiles, along with the relevant simulation inputs.

Simulating Included Individuals
The inclusion component of the simulation study was designed to have the same number of single-source-to-mixture comparisons as the exclusion component. In the exclusion component of the simulated mixture study, each of the 10,000 excluded individuals was compared with each of the 10,000 mixtures. This resulted in a total of 10,000 individuals × 10,000 mixtures = 100,000,000 comparisons. In the exclusion component, a single population of individuals was simultaneously excluded from all mixtures. By contrast, the inclusion component needs a separate population of included individuals for every mixture. In other words, for 10,000 mixtures, 10,000 distinct sets (each containing 10,000 individuals) are needed, with each set appropriate for comparison to a single mixture. Thus, to arrive at an equivalent number of comparisons, a total of 10,000 included individuals per mixture × 10,000 mixtures = 100,000,000 included individuals are needed.

To generate the set of included individuals for a given mixture, that mixture's matrix profile was considered. The universe of possible alleles for reference profile generation had previously been set to mirror Butler et al.'s observed subpopulation frequencies. In the case of no allelic drop-out, this universe was collapsed to include only those alleles represented in the mixture. Once all of the alleles for each locus had been identified, the frequencies of allele incidence at a given locus were renormalized to sum to 1 using Butler et al.'s published frequencies [56]. This ensured that the relative subpopulation frequencies between alleles were maintained while limiting the pool of possible alleles. In general, at locus L for a mixture containing n alleles $\{\alpha_{L,1}, \alpha_{L,2}, \ldots, \alpha_{L,n}\}$ with corresponding subpopulation frequencies $\{f_{L,1}, f_{L,2}, \ldots, f_{L,n}\}$, the renormalized frequency $R_{L,m}$ of an allele $\alpha_{L,m}$ is given by Equation 8.

$$R_{L,m} = \frac{f_{L,m}}{\sum_{i=1}^{n} f_{L,A_i}}$$
Equation 8: Renormalized Allele Frequency for Generation of Included Individuals
where A is the collection of alleles at locus L with non-zero allele frequencies; $A_i$ is the ith allele in set A; n is the number of alleles in A; $f_{L,m}$ is the subpopulation frequency of allele m at locus L; and $R_{L,m}$ is the renormalized subpopulation frequency of allele m at locus L.

For the D16S539 locus, for example, the observed alleles were 8, 9, 10, 11, 12, 13, and 14, each with its own subpopulation frequency. Suppose a given mixture contained only alleles 9, 10, and 14. The renormalized frequencies for the generation of individuals would then be $R_9 = f_9/(f_9+f_{10}+f_{14})$, $R_{10} = f_{10}/(f_9+f_{10}+f_{14})$, and $R_{14} = f_{14}/(f_9+f_{10}+f_{14})$. Every included individual would contain one of the following genotypes (α₁, α₂)³: (9,9), (9,10), (9,14), (10,10), (10,14), or (14,14).

³ (α₁, α₂) is genotypically equivalent to (α₂, α₁).

It should be noted that restricting the included individuals to the mixture contributors who actually contributed to these simulated mixtures would provide an insufficiently large population (i.e., only two individuals) from which to make comparisons. The actual mixture contributors represent only a subset of all individuals who could have contributed to a given profile. That is, the combination of $C_1$'s and $C_2$'s profiles may have produced the collection of alleles in mixture $M_1$, but this does not exclude the possibility that the combination of $C_3$'s and $C_4$'s profiles could produce the same collection of alleles. In fact, the likelihood that any given person could reasonably have contributed to a mixture increases as the number of contributors, and thus the total collection of mixture alleles, increases. Simulating extraordinary numbers of individuals until one was serendipitously included in a given mixture would be computationally burdensome. To avoid this, the prescribed methodology constrains the simulation space to produce included individuals more efficiently. Individuals whose profiles are forced by simulation to be included in a given mixture should be determined by an analyst to be potential contributors. Figure 9 depicts the simulation process for generating included individuals, along with the simulation inputs used in this study.

Figure 9: Flow Describing Generation of Included Individuals (From Simulated Mixture Profiles). Results provided in Sections 3.1 and 3.3 are for nindivs (number of simulated included individuals) = 10,000, ncontribs (number of mixture contributors) = 2, and nmix (number of mixtures to simulate) = 10,000.
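Equation 8 and the constrained sampling of included individuals can be sketched for the no-drop-out case described above. At a locus, the sketch keeps only the alleles present in the mixture and renormalizes their population frequencies. It then draws two alleles from the renormalized distribution, so every simulated reference is included by construction. The D16S539 frequencies below are hypothetical placeholders, since Butler et al.'s values are not reproduced in this transcription. The function names are also hypothetical.

```python
import numpy as np

def renormalize(freqs, mixture_alleles):
    """Equation 8: R_{L,m} = f_{L,m} / (sum of f over the mixture's alleles at L)."""
    kept = {a: f for a, f in freqs.items() if a in mixture_alleles and f > 0}
    total = sum(kept.values())
    return {a: f / total for a, f in kept.items()}

def sample_included_genotype(freqs, mixture_alleles, rng):
    """Draw two alleles (with replacement) from the renormalized distribution,
    so the genotype can only contain alleles observed in the mixture."""
    r = renormalize(freqs, mixture_alleles)
    alleles = list(r)
    idx = rng.choice(len(alleles), size=2, p=[r[a] for a in alleles])
    return tuple(sorted(alleles[i] for i in idx))  # (a1, a2) is the same as (a2, a1)

# Hypothetical D16S539 frequencies (placeholders, not Butler et al.'s values).
d16 = {8: 0.0625, 9: 0.125, 10: 0.0625, 11: 0.25, 12: 0.3125, 13: 0.125, 14: 0.0625}
print(renormalize(d16, {9, 10, 14}))  # {9: 0.5, 10: 0.25, 14: 0.25}

rng = np.random.default_rng(42)
genotypes = {sample_included_genotype(d16, {9, 10, 14}, rng) for _ in range(1000)}
```

Every sampled genotype falls within the six pairings listed for the D16S539 example. Repeating this at all 15 loci yields one included individual for the mixture.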


More information

Quantifiler Human DNA Quantification Kit Quantifiler Y Human Male DNA Quantification Kit

Quantifiler Human DNA Quantification Kit Quantifiler Y Human Male DNA Quantification Kit Product Bulletin Human Identification Quantifiler Human DNA Quantification Kit Quantifiler Y Human Male DNA Quantification Kit The Quantifiler kits produce reliable and reproducible results, helping to

More information

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

More information

PATHOGEN DETECTION SYSTEMS BY REAL TIME PCR. Results Interpretation Guide

PATHOGEN DETECTION SYSTEMS BY REAL TIME PCR. Results Interpretation Guide PATHOGEN DETECTION SYSTEMS BY REAL TIME PCR Results Interpretation Guide Pathogen Detection Systems by Real Time PCR Microbial offers real time PCR based systems for the detection of pathogenic bacteria

More information

Real-time PCR: Understanding C t

Real-time PCR: Understanding C t APPLICATION NOTE Real-Time PCR Real-time PCR: Understanding C t Real-time PCR, also called quantitative PCR or qpcr, can provide a simple and elegant method for determining the amount of a target sequence

More information

Mitochondrial DNA Analysis

Mitochondrial DNA Analysis Mitochondrial DNA Analysis Lineage Markers Lineage markers are passed down from generation to generation without changing Except for rare mutation events They can help determine the lineage (family tree)

More information

The Techniques of Molecular Biology: Forensic DNA Fingerprinting

The Techniques of Molecular Biology: Forensic DNA Fingerprinting Revised Fall 2011 The Techniques of Molecular Biology: Forensic DNA Fingerprinting The techniques of molecular biology are used to manipulate the structure and function of molecules such as DNA and proteins

More information

Y Chromosome Markers

Y Chromosome Markers Y Chromosome Markers Lineage Markers Autosomal chromosomes recombine with each meiosis Y and Mitochondrial DNA does not This means that the Y and mtdna remains constant from generation to generation Except

More information

ASSURING THE QUALITY OF TEST RESULTS

ASSURING THE QUALITY OF TEST RESULTS Page 1 of 12 Sections Included in this Document and Change History 1. Purpose 2. Scope 3. Responsibilities 4. Background 5. References 6. Procedure/(6. B changed Division of Field Science and DFS to Office

More information

6 Scalar, Stochastic, Discrete Dynamic Systems

6 Scalar, Stochastic, Discrete Dynamic Systems 47 6 Scalar, Stochastic, Discrete Dynamic Systems Consider modeling a population of sand-hill cranes in year n by the first-order, deterministic recurrence equation y(n + 1) = Ry(n) where R = 1 + r = 1

More information

Forensic Statistics. From the ground up. 15 th International Symposium on Human Identification

Forensic Statistics. From the ground up. 15 th International Symposium on Human Identification Forensic Statistics 15 th International Symposium on Human Identification From the ground up UNTHSC John V. Planz, Ph.D. UNT Health Science Center at Fort Worth Why so much attention to statistics? Exclusions

More information

CRIME SCENE INVESTIGATION THROUGH DNA TRACES USING BAYESIAN NETWORKS

CRIME SCENE INVESTIGATION THROUGH DNA TRACES USING BAYESIAN NETWORKS CRIME SCENE INVESTIGATION THROUGH DNA TRACES USING BAYESIAN NETWORKS ANDRADE Marina, (PT), FERREIRA Manuel Alberto M., (PT) Abstract. The use of biological information in crime scene identification problems

More information

Validation and Calibration. Definitions and Terminology

Validation and Calibration. Definitions and Terminology Validation and Calibration Definitions and Terminology ACCEPTANCE CRITERIA: The specifications and acceptance/rejection criteria, such as acceptable quality level and unacceptable quality level, with an

More information

Crime Scenes and Genes

Crime Scenes and Genes Glossary Agarose Biotechnology Cell Chromosome DNA (deoxyribonucleic acid) Electrophoresis Gene Micro-pipette Mutation Nucleotide Nucleus PCR (Polymerase chain reaction) Primer STR (short tandem repeats)

More information

DNA Detection. Chapter 13

DNA Detection. Chapter 13 DNA Detection Chapter 13 Detecting DNA molecules Once you have your DNA separated by size Now you need to be able to visualize the DNA on the gel somehow Original techniques: Radioactive label, silver

More information

HLA data analysis in anthropology: basic theory and practice

HLA data analysis in anthropology: basic theory and practice HLA data analysis in anthropology: basic theory and practice Alicia Sanchez-Mazas and José Manuel Nunes Laboratory of Anthropology, Genetics and Peopling history (AGP), Department of Anthropology and Ecology,

More information

NATIONAL GENETICS REFERENCE LABORATORY (Manchester)

NATIONAL GENETICS REFERENCE LABORATORY (Manchester) NATIONAL GENETICS REFERENCE LABORATORY (Manchester) MLPA analysis spreadsheets User Guide (updated October 2006) INTRODUCTION These spreadsheets are designed to assist with MLPA analysis using the kits

More information

Collecting a Buccal Swab An Art or a Cinch? By Chantel Marie Giamanco, Forensic Scientist Human Identification Technologies, Inc.

Collecting a Buccal Swab An Art or a Cinch? By Chantel Marie Giamanco, Forensic Scientist Human Identification Technologies, Inc. Collecting a Buccal Swab An Art or a Cinch? By Chantel Marie Giamanco, Forensic Scientist Human Identification Technologies, Inc. An increasing number of cases tried in the courtroom involve DNA evidence.

More information

Paternity Testing. Chapter 23

Paternity Testing. Chapter 23 Paternity Testing Chapter 23 Kinship and Paternity DNA analysis can also be used for: Kinship testing determining whether individuals are related Paternity testing determining the father of a child Missing

More information

How do we build and refine models that describe and explain the natural and designed world?

How do we build and refine models that describe and explain the natural and designed world? Strand: A. Understand Scientific Explanations : Students understand core concepts and principles of science and use measurement and observation tools to assist in categorizing, representing, and interpreting

More information

Organizing Your Approach to a Data Analysis

Organizing Your Approach to a Data Analysis Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

More information

Popstats Unplugged. 14 th International Symposium on Human Identification. John V. Planz, Ph.D. UNT Health Science Center at Fort Worth

Popstats Unplugged. 14 th International Symposium on Human Identification. John V. Planz, Ph.D. UNT Health Science Center at Fort Worth Popstats Unplugged 14 th International Symposium on Human Identification John V. Planz, Ph.D. UNT Health Science Center at Fort Worth Forensic Statistics From the ground up Why so much attention to statistics?

More information

Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial

Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds Overview In order for accuracy and precision to be optimal, the assay must be properly evaluated and a few

More information

Single Nucleotide Polymorphisms (SNPs)

Single Nucleotide Polymorphisms (SNPs) Single Nucleotide Polymorphisms (SNPs) Additional Markers 13 core STR loci Obtain further information from additional markers: Y STRs Separating male samples Mitochondrial DNA Working with extremely degraded

More information

Introduction to Post PCR Cleanup

Introduction to Post PCR Cleanup Matt Kramer Introduction to Post PCR Cleanup Overview Why post PCR amplification cleanup? Enhancing human identity testing Introduction to QIAGEN MinElute post PCR cleanup technologies MinElute as a tool

More information

MASCOT Search Results Interpretation

MASCOT Search Results Interpretation The Mascot protein identification program (Matrix Science, Ltd.) uses statistical methods to assess the validity of a match. MS/MS data is not ideal. That is, there are unassignable peaks (noise) and usually

More information

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

More information

The Chinese University of Hong Kong School of Life Sciences Biochemistry Program CUGEN Ltd.

The Chinese University of Hong Kong School of Life Sciences Biochemistry Program CUGEN Ltd. The Chinese University of Hong Kong School of Life Sciences Biochemistry Program CUGEN Ltd. DNA Forensic and Agarose Gel Electrophoresis 1 OBJECTIVES Prof. Stephen K.W. Tsui, Dr. Patrick Law and Miss Fion

More information

Biology Behind the Crime Scene Week 4: Lab #4 Genetics Exercise (Meiosis) and RFLP Analysis of DNA

Biology Behind the Crime Scene Week 4: Lab #4 Genetics Exercise (Meiosis) and RFLP Analysis of DNA Page 1 of 5 Biology Behind the Crime Scene Week 4: Lab #4 Genetics Exercise (Meiosis) and RFLP Analysis of DNA Genetics Exercise: Understanding how meiosis affects genetic inheritance and DNA patterns

More information

Introduction To Real Time Quantitative PCR (qpcr)

Introduction To Real Time Quantitative PCR (qpcr) Introduction To Real Time Quantitative PCR (qpcr) SABiosciences, A QIAGEN Company www.sabiosciences.com The Seminar Topics The advantages of qpcr versus conventional PCR Work flow & applications Factors

More information

Forensic Science International: Genetics

Forensic Science International: Genetics Forensic Science International: Genetics 4 (2009) 1 10 Contents lists available at ScienceDirect Forensic Science International: Genetics journal homepage: www.elsevier.com/locate/fsig Interpreting low

More information

DNA & CRIME VICTIMS: WHAT VICTIMS NEED TO KNOW

DNA & CRIME VICTIMS: WHAT VICTIMS NEED TO KNOW DNA & CRIME VICTIMS: WHAT VICTIMS NEED TO KNOW DNA & CRIME VICTIMS: What Victims Need to Know The increasing use of DNA evidence in criminal cases gives victims of crime new hope that offenders will be

More information

Willmar Public Schools Curriculum Map

Willmar Public Schools Curriculum Map Subject Area Science Senior High Course Name Forensics Date June 2010 Timeline Content Standards Addressed Skills/Benchmarks Essential Questions Assessments 1-2 Introduction History and Development of

More information

2.500 Threshold. 2.000 1000e - 001. Threshold. Exponential phase. Cycle Number

2.500 Threshold. 2.000 1000e - 001. Threshold. Exponential phase. Cycle Number application note Real-Time PCR: Understanding C T Real-Time PCR: Understanding C T 4.500 3.500 1000e + 001 4.000 3.000 1000e + 000 3.500 2.500 Threshold 3.000 2.000 1000e - 001 Rn 2500 Rn 1500 Rn 2000

More information

Validation of measurement procedures

Validation of measurement procedures Validation of measurement procedures R. Haeckel and I.Püntmann Zentralkrankenhaus Bremen The new ISO standard 15189 which has already been accepted by most nations will soon become the basis for accreditation

More information

APPENDIX N. Data Validation Using Data Descriptors

APPENDIX N. Data Validation Using Data Descriptors APPENDIX N Data Validation Using Data Descriptors Data validation is often defined by six data descriptors: 1) reports to decision maker 2) documentation 3) data sources 4) analytical method and detection

More information

Guidance for Industry

Guidance for Industry Guidance for Industry Q2B Validation of Analytical Procedures: Methodology November 1996 ICH Guidance for Industry Q2B Validation of Analytical Procedures: Methodology Additional copies are available from:

More information

Executive Summary. Summary - 1

Executive Summary. Summary - 1 Executive Summary For as long as human beings have deceived one another, people have tried to develop techniques for detecting deception and finding truth. Lie detection took on aspects of modern science

More information

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th Standard 3: Data Analysis, Statistics, and Probability 6 th Prepared Graduates: 1. Solve problems and make decisions that depend on un

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

FIVS 316 BIOTECHNOLOGY & FORENSICS Syllabus - Lecture followed by Laboratory

FIVS 316 BIOTECHNOLOGY & FORENSICS Syllabus - Lecture followed by Laboratory FIVS 316 BIOTECHNOLOGY & FORENSICS Syllabus - Lecture followed by Laboratory Instructor Information: Name: Dr. Craig J. Coates Email: ccoates@tamu.edu Office location: 319 Heep Center Office hours: By

More information

Step-by-Step Analytical Methods Validation and Protocol in the Quality System Compliance Industry

Step-by-Step Analytical Methods Validation and Protocol in the Quality System Compliance Industry Step-by-Step Analytical Methods Validation and Protocol in the Quality System Compliance Industry BY GHULAM A. SHABIR Introduction Methods Validation: Establishing documented evidence that provides a high

More information

Rapid DNA Instrument Update & Enhancement Plans for CODIS

Rapid DNA Instrument Update & Enhancement Plans for CODIS Rapid DNA Instrument Update & Enhancement Plans for CODIS Biometrics Consortium Conference 2013 September 19, 2013 Tampa, Florida Thomas Callaghan PhD FBI Laboratory Rapid DNA Analysis (Law Enforcement

More information

Solving Simultaneous Equations and Matrices

Solving Simultaneous Equations and Matrices Solving Simultaneous Equations and Matrices The following represents a systematic investigation for the steps used to solve two simultaneous linear equations in two unknowns. The motivation for considering

More information

DNA & CRIME VICTIMS: WHAT VICTIM ASSISTANCE PROFESSIONALS NEED TO KNOW

DNA & CRIME VICTIMS: WHAT VICTIM ASSISTANCE PROFESSIONALS NEED TO KNOW DNA & CRIME VICTIMS: WHAT VICTIM ASSISTANCE PROFESSIONALS NEED TO KNOW What Victim Assistance Professionals Need to Know 1 DNA & CRIME VICTIMS: What Victim Assistance Professionals Need to Know As the

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

DNA PROFILING IN FORENSIC SCIENCE

DNA PROFILING IN FORENSIC SCIENCE DA PROFILIG I FORESIC SCIECE DA is the chemical code that is found in every cell of an individual's body, and is unique to each individual. Because it is unique, the ability to examine DA found at a crime

More information

DRAFT RESEARCH SUPPORT BUILDING AND INFRASTRUCTURE MODERNIZATION RISK MANAGEMENT PLAN. April 2009 SLAC I 050 07010 002

DRAFT RESEARCH SUPPORT BUILDING AND INFRASTRUCTURE MODERNIZATION RISK MANAGEMENT PLAN. April 2009 SLAC I 050 07010 002 DRAFT RESEARCH SUPPORT BUILDING AND INFRASTRUCTURE MODERNIZATION RISK MANAGEMENT PLAN April 2009 SLAC I 050 07010 002 Risk Management Plan Contents 1.0 INTRODUCTION... 1 1.1 Scope... 1 2.0 MANAGEMENT

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Real-Time PCR Vs. Traditional PCR

Real-Time PCR Vs. Traditional PCR Real-Time PCR Vs. Traditional PCR Description This tutorial will discuss the evolution of traditional PCR methods towards the use of Real-Time chemistry and instrumentation for accurate quantitation. Objectives

More information

Are you collecting all the available DNA from touched objects?

Are you collecting all the available DNA from touched objects? International Congress Series 1239 (2003) 803 807 Are you collecting all the available DNA from touched objects? R.A.H. van Oorschot a, *, D.G. Phelan a,b, S. Furlong a,b, G.M. Scarfo a,b, N.L. Holding

More information

A guide to the analysis of KASP genotyping data using cluster plots

A guide to the analysis of KASP genotyping data using cluster plots extraction sequencing genotyping extraction sequencing genotyping extraction sequencing genotyping extraction sequencing A guide to the analysis of KASP genotyping data using cluster plots Contents of

More information

Data Analysis for Ion Torrent Sequencing

Data Analysis for Ion Torrent Sequencing IFU022 v140202 Research Use Only Instructions For Use Part III Data Analysis for Ion Torrent Sequencing MANUFACTURER: Multiplicom N.V. Galileilaan 18 2845 Niel Belgium Revision date: August 21, 2014 Page

More information

Aurora Forensic Sample Clean-up Protocol

Aurora Forensic Sample Clean-up Protocol Aurora Forensic Sample Clean-up Protocol 106-0008-BA-D 2015 Boreal Genomics, Inc. All rights reserved. All trademarks are property of their owners. http://www.borealgenomics.com support@borealgenomics.com

More information

The College of Forensic Sciences at NAUSS: The pioneer of Forensics in the Arab world

The College of Forensic Sciences at NAUSS: The pioneer of Forensics in the Arab world 12 Arab Journal of Forensic Sciences and Forensic Medicine 2014; Volume 1 Issue (0), 12-16 Naif Arab University for Security Sciences Arab Journal of Forensic Sciences and Forensic Medicine www.nauss.edu.sa

More information

Forensic. Sciences. Forensic Sciences. Specialties. Programs. Career Pathways

Forensic. Sciences. Forensic Sciences. Specialties. Programs. Career Pathways Forensic Sciences Specialties Programs Prof. R. E. Gaensslen Director of Graduate Studies Forensic Science University of Illinois - Chicago Career Pathways Forensic Sciences 1 The Hype... the TV version

More information

Schedule Adherence. and Rework Walt Lipke, PMI Oklahoma City Chapter. PUBLISHER s CHOICE

Schedule Adherence. and Rework Walt Lipke, PMI Oklahoma City Chapter. PUBLISHER s CHOICE Schedule Adherence and Rework Walt Lipke, PMI Oklahoma City Chapter Abstract. When project performance is such that the product is delivered with expected functionality at the time and price agreed between

More information

2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99.

2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99. 1. True or False? A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. True 2. True or False? The sequence

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Inequality, Mobility and Income Distribution Comparisons

Inequality, Mobility and Income Distribution Comparisons Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime

More information

Blood Stain Analysis Part One

Blood Stain Analysis Part One Hughes Undergraduate Biological Science Education Initiative HHMI Blood Stain Analysis Part One Investigators often find blood stains during their examination of a crime scene. They also find stains that

More information

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Validation Guide for the DNA IQ Reference Sample Kit for Maxwell 16 Printed in USA. 9/06 Part# GE181

Validation Guide for the DNA IQ Reference Sample Kit for Maxwell 16 Printed in USA. 9/06 Part# GE181 REFERENCE MANUAL Validation Guide for the DNA IQ Reference Sample Kit for Maxwell 16 9/06 Validation Guide for the DNA IQ Reference Sample Kit for Maxwell 16 All technical literature is available on the

More information

DNA: FORENSIC AND LEGAL APPLICATIONS By: Lawrence Koblinsky, Thomas F. Liotti, Jamel Oeser-Sweat

DNA: FORENSIC AND LEGAL APPLICATIONS By: Lawrence Koblinsky, Thomas F. Liotti, Jamel Oeser-Sweat DNA: FORENSIC AND LEGAL APPLICATIONS By: Lawrence Koblinsky, Thomas F. Liotti, Jamel Oeser-Sweat Citation: LAWRENCE KOBLINSKY ET AL., DNA: FORENSIC AND LEGAL APPLICATIONS (John Wiley & Sons, Inc., 2005).

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Hardy-Weinberg Equilibrium Problems

Hardy-Weinberg Equilibrium Problems Hardy-Weinberg Equilibrium Problems 1. The frequency of two alleles in a gene pool is 0.19 (A) and 0.81(a). Assume that the population is in Hardy-Weinberg equilibrium. (a) Calculate the percentage of

More information

CHAPTER VII CONCLUSIONS

CHAPTER VII CONCLUSIONS CHAPTER VII CONCLUSIONS To do successful research, you don t need to know everything, you just need to know of one thing that isn t known. -Arthur Schawlow In this chapter, we provide the summery of the

More information

The Image Deblurring Problem

The Image Deblurring Problem page 1 Chapter 1 The Image Deblurring Problem You cannot depend on your eyes when your imagination is out of focus. Mark Twain When we use a camera, we want the recorded image to be a faithful representation

More information

Consistent Assay Performance Across Universal Arrays and Scanners

Consistent Assay Performance Across Universal Arrays and Scanners Technical Note: Illumina Systems and Software Consistent Assay Performance Across Universal Arrays and Scanners There are multiple Universal Array and scanner options for running Illumina DASL and GoldenGate

More information

Format for Experiment Preparation and Write-Up

Format for Experiment Preparation and Write-Up Format for Experiment Preparation and Write-Up Scientists try to answer questions by applying consistent, logical reasoning to describe, explain, and predict observations; and by performing experiments

More information

Essentials of Real Time PCR. About Sequence Detection Chemistries

Essentials of Real Time PCR. About Sequence Detection Chemistries Essentials of Real Time PCR About Real-Time PCR Assays Real-time Polymerase Chain Reaction (PCR) is the ability to monitor the progress of the PCR as it occurs (i.e., in real time). Data is therefore collected

More information

Blood Stains at the Crime Scene Forensic Investigation

Blood Stains at the Crime Scene Forensic Investigation Blood Stains at the Crime Scene Forensic Investigation Introduction Blood stains at a crime scene can be crucial in solving the crime. Numerous analytical techniques can be used to study blood stains.

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

More information

Example #1: Controller for Frequency Modulated Spectroscopy

Example #1: Controller for Frequency Modulated Spectroscopy Progress Report Examples The following examples are drawn from past student reports, and illustrate how the general guidelines can be applied to a variety of design projects. The technical details have

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

TOOLS FOR T-RFLP DATA ANALYSIS USING EXCEL

TOOLS FOR T-RFLP DATA ANALYSIS USING EXCEL TOOLS FOR T-RFLP DATA ANALYSIS USING EXCEL A collection of Visual Basic macros for the analysis of terminal restriction fragment length polymorphism data Nils Johan Fredriksson TOOLS FOR T-RFLP DATA ANALYSIS

More information

For example, estimate the population of the United States as 3 times 10⁸ and the

For example, estimate the population of the United States as 3 times 10⁸ and the CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number

More information

Introduction to Hypothesis Testing OPRE 6301

Introduction to Hypothesis Testing OPRE 6301 Introduction to Hypothesis Testing OPRE 6301 Motivation... The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about

More information

CURRICULUM GUIDE. When this Forensics course has been completed successfully, students should be able to:

CURRICULUM GUIDE. When this Forensics course has been completed successfully, students should be able to: CURRICULUM GUIDE NAME OF COURSE: FORENSICS COURSE NUMBER: SCI 40 WRITTEN / REVISED: SEPTEMBER, 2011 LEVEL OF COURSE: REPLACMENT NUMBER OF CREDITS: SIX (6) PREREQUISITES: BIOLOGY GRADE LEVELS OFFERED TO:

More information

GUIDANCE FOR ASSESSING THE LIKELIHOOD THAT A SYSTEM WILL DEMONSTRATE ITS RELIABILITY REQUIREMENT DURING INITIAL OPERATIONAL TEST.

GUIDANCE FOR ASSESSING THE LIKELIHOOD THAT A SYSTEM WILL DEMONSTRATE ITS RELIABILITY REQUIREMENT DURING INITIAL OPERATIONAL TEST. GUIDANCE FOR ASSESSING THE LIKELIHOOD THAT A SYSTEM WILL DEMONSTRATE ITS RELIABILITY REQUIREMENT DURING INITIAL OPERATIONAL TEST. 1. INTRODUCTION Purpose The purpose of this white paper is to provide guidance

More information

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals Modified from the lecture slides of Lami Kaya (LKaya@ieee.org) for use CECS 474, Fall 2008. 2009 Pearson Education Inc., Upper

More information