Elsevier Editorial System(tm) for Forensic Science International: Genetics Manuscript Draft



Similar documents
Comparison of Y-chromosomal lineage dating using either evolutionary

DNA Genealogy, Mutation Rates, and Some Historical Evidences Written in Y-Chromosome. I. Basic Principles and the Method

Y Chromosome Markers

I Have the Results of My Genetic Genealogy Test, Now What?

Paternity Testing. Chapter 23

Genetic Variation and Human Evolution Lynn B. Jorde, Ph.D. Department of Human Genetics University of Utah School of Medicine.

Analysis of 16 Y STR loci in the Finnish population reveals a local reduction in the diversity of male lineages

Y-STR haplotype diversity and population data for Central Brazil: implications for environmental forensics and paternity testing

The sample is taken with a simple mouth swab no blood is involved. There will be instructions included on how to take the sample.

What s in a name? Y chromosomes, surnames, and the genetic genealogy. Department of Genetics, University of Leicester, University Road, Leicester

Some ancestors are difficult to trace. Adoptions illegitimacies, name changes and migrations can present brick walls in our research that seem

Forensic Science International: Genetics Journal Status Report

NORSE VIKING HERITAGE

Matthew Kaplan and Taylor Edwards. University of Arizona Tucson, Arizona

SNP Essentials The same SNP story

The Story of Human Evolution Part 1: From ape-like ancestors to modern humans

Online Y-chromosomal Short Tandem Repeat Haplotype Reference Database (YHRD) for U.S. Populations*

What Do I Do With My DNA Results.in 10 Easy Steps By Roberta Estes (copyright 2008)

DXS8378, DXS7132, HPRTB,

Mitochondrial DNA Analysis

Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the

Online reference database of European Y-chromosomal short tandem repeat STR) haplotypes

DNA Sequence Alignment Analysis

GENETIC GENEALOGY AND DNA TESTING

Commonly Used STR Markers

From Africa to Aotearoa Part 1: Out of Africa

REPORT An African American Paternal Lineage Adds an Extremely Ancient Root to the Human Y Chromosome Phylogenetic Tree

SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis

Estimating Scandinavian and Gaelic Ancestry in the Male Settlers of Iceland

Significant genetic differentiation between Poland and Germany follows present-day political borders, as revealed by Y-chromosome analysis

Identifying Genetic Traces of Historical Expansions: Phoenician Footprints in the Mediterranean

Approximating the Coalescent with Recombination. Niall Cardin Corpus Christi College, University of Oxford April 2, 2007

FHF prosjekt :

Rules for conducting ISAG Comparison Tests (CT) for animal DNA testing.

Genomic insights on the ethno-history of the Maya and the Ladinos from Guatemala

Forensic DNA Testing Terminology

Rethinking Polynesian Origins: Human Settlement of the Pacific

(1-p) 2. p(1-p) From the table, frequency of DpyUnc = ¼ (p^2) = #DpyUnc = p^2 = ¼(1-p)^2 + ½(1-p)p + ¼(p^2) #Dpy + #DpyUnc

DnaSP, DNA polymorphism analyses by the coalescent and other methods.

ELECTRONIC EDITIONS WHICH WE HAVE MADE AND WHICH WE WANT TO MAKE

Single Nucleotide Polymorphisms (SNPs)

DNA Testing for Genealogy - What Can It Do For You??

Genetic modification for cell pedigree labels to aid disease treatment

Thesis Guidelines Handbook

Estimation of Population Divergence Times from Non- Overlapping Genomic Sequences: Examples from Dogs and Wolves. Research article.

Lecture 10 Friday, March 20, 2009

Original article: DXS10079, DXS10074 and DXS10075: new alleles and SNP occurrence

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

A Primer of Genome Science THIRD

Tutorial on gplink. PLINK tutorial, December 2006; Shaun Purcell,

DNA as a Biometric. Biometric Consortium Conference 2011 Tampa, FL

Phillips DNA News. Project News. April 2013 Volume 5 Issue 4 Editor: Nancy Kiser

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

Bayesian coalescent inference of population size history

GOBII. Genomic & Open-source Breeding Informatics Initiative

Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

REVIEWS. Computer programs for population genetics data analysis: a survival guide FOCUS ON STATISTICAL ANALYSIS

How To Find Rare Variants In The Human Genome

(Refer Slide Time: 2:03)

Geographic Patterns of Haplogroup R1b in the British Isles

14.3 Studying the Human Genome

DNA for Defense Attorneys. Chapter 6

ADVANCES IN BOTANICAL RESEARCH

Molecular typing of VTEC: from PFGE to NGS-based phylogeny

Edlund, H., Allen, M. (2009) Y chromosomal STR analysis using Pyrosequencing technology. Forensic Science International: Genetics, 3(2):

Worksheet - COMPARATIVE MAPPING 1

Seeing Faces and History through Human Genome Sequences

DNA Tribes is on Facebook. Find us at DNA Tribes Digest February 1, 2013 Page 1 of 10

Got Lactase? The Co-evolution of Genes and Culture

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Prepare your result file for input into SPSS

Editors Comparison (NetBeans IDE, Eclipse, IntelliJ IDEA)

SNPbrowser Software v3.5

Written by Roberta Estes, Copyright

Milk protein genetic variation in Butana cattle

Annex to the Accreditation Certificate D-PL according to DIN EN ISO/IEC 17025:2005

PRINCIPLES OF POPULATION GENETICS

Training and Support of Lab Demonstrators at Imperial College.

The Human Genome Project

SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE

Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs)

Human Genome and Human Genome Project. Louxin Zhang

Guide to Writing a Project Report

Section 3 Comparative Genomics and Phylogenetics

Phylogenetic Trees Made Easy

Computer with GeneMapper ID (version or most current) software Microsoft Excel, Word Print2PDF software

Reduced-Median-Network Analysis of Complete Mitochondrial DNA Coding-Region Sequences for the Major African, Asian, and European Haplogroups

The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations

Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering

Evolution (18%) 11 Items Sample Test Prep Questions

DNA Detection. Chapter 13

UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015

Replacing TaqMan SNP Genotyping Assays that Fail Applied Biosystems Manufacturing Quality Control. Begin

Contrasting patterns of Y chromosome variation in Ashkenazi Jewish and host non-jewish European populations

Gene mutation and molecular medicine Chapter 15

ENCODED EVIDENCE: DNA IN FORENSIC ANALYSIS

Asexual Versus Sexual Reproduction in Genetic Algorithms 1

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics

Transcription:

Elsevier Editorial System(tm) for Forensic Science International: Genetics Manuscript Draft Manuscript Number: Title: A comment on the Paper: A comparison of Y-chromosomal lineage dating using either resequencing or Y-SNP plus Y-STR genotyping By W. Wei, Q. Ayub, Y. Xue, C. Tyler-Smith Forensic Sci. Int. Genet. (2013) http://dx.doi.org/10.1016/j.fscigen.2013.03.014 Article Type: Letter to the Editor Keywords: Y-STR; Y-SNP; mutation rate constant; haplotype; haplogroup; back mutations. Corresponding Author: Prof. Anatole Alex Klyosov, PhD, D. Sc. Corresponding Author's Institution: The Academy of DNA Genealogy First Author: Anatole Alex Klyosov, PhD, D. Sc. Order of Authors: Anatole Alex Klyosov, PhD, D. Sc. Abstract: Suggested Reviewers:

Title Page (with authors & addresses) Forensic Science International: Genetics Letter to the Editor A comment on the Paper: A comparison of Y-chromosomal lineage dating using either resequencing or Y-SNP plus Y-STR genotyping By W. Wei, Q. Ayub, Y. Xue, C. Tyler-Smith Forensic Sci. Int. Genet. (2013) http://dx.doi.org/10.1016/j.fscigen.2013.03.014 Anatole A. Klyosov Newton, Massachusetts. aklyosov@comcast.net http://aklyosov.home.comcast.net

*Manuscript Click here to view linked References Forensic Science International: Genetics Letter to the Editor A comment on the Paper: A comparison of Y-chromosomal lineage dating using either resequencing or Y-SNP plus Y-STR genotyping By W. Wei, Q. Ayub, Y. Xue, C. Tyler-Smith Forensic Sci. Int. Genet. (2013) http://dx.doi.org/10.1016/j.fscigen.2013.03.014 Anatole A. Klyosov Newton, Massachusetts. aklyosov@comcast.net http://aklyosov.home.comcast.net

The cited paper, unfortunately, presents a striking example in the area of mismanagement of mutation rate constants in population genetics. Chris Tyler-Smith and his team have made at least five errors in their work on different levels of significance. First, their selection of the SNP lineages was greatly distorted. It might be acceptable for their SNP analysis, which gave 101-115 kya for a common ancestor of the selection of A-DT lineages; however, the A and DT nodes are shown in the paper at equal level position in time. This tells us that something is wrong in their assumptions/calculations. An earlier STR-based analysis (Klyosov and Rozhanskii, 2012) gave 160,000 ya for the split, and showed a very different level for the A, BT and DT split from the main evolution Y-DNA tree. Nonetheless, let it be 100,000 ya in this particular case. The Tyler-Smith STR analysis does not fit, and it is strikingly incorrect. When one "averages" the STR lineages being calculated and aims at the TMRCA, they should be balanced; the dataset should be examined for different branches descending from different common ancestors. If they are mixed in the dataset, they should be separated and analyzed separately. If this is not done, the largest haplotype branch would "pull the blanket" on itself, and instead of a balanced combined haplotype dataset we would have obtained the system totally shifted to the largest branch and its common ancestor. In other words, the largest branch would play the "ancestral" (base) role, and the TMRCA would be phantom one. We cannot compare a flock of sheep and a horse, however, we can compare a sheep and a horse for a meaningful study. In addition, the 33 haplotype dataset in the cited study contained four identical haplotypes, of a father and his three sons. All of it added to the distortion of the dataset, as described above. Now, let us examine their STR selection for the study. Out of 33 individuals, 13 were R1b, 11 were E, three I, and the rest were singular haplogroups (their particular subclades, in fact). The dataset was greatly distorted. In fact, the paper has calculated the phantom TMRCA shifted towards R1b/E1b, which should give the TMRCA somewhere around 20-40 kya and not anything close to 100 kya. This was the greatest principal error in his study. Secondly, they did not introduce a proper correction for back mutations, though in the paper, they talked much about "saturation" of allele values. This is still terra incognita in population genetics, but developed well in DNA genealogy (e.g., Klyosov, 2009a; Klyosov, 2012). A quick look at paper s sample of haplotypes and a number of their mutations (removing the haplotypes of the three sons, see above) reveals that there are around 1000 mutations altogether in 30 haplotypes. It gives 1000/30/23 = 1.45 mutations per marker in the dataset. This is an impermissibly high degree of mutations

in any dataset for any meaningful calculation. The reason is simple - the haplotypes contained many "fast" markers that should have been eliminated. It is recommended for calculations of TMRCA for ancient common ancestors to calculate the "slow" 22 marker panel, or even the slower panels (Rozhanskii and Klyosov, 2011). "Fast" markers "saturate" the system, indeed, and the TMRCA is always underestimated. That is why the Tyler-Smith paper obtained much lower STR-based TMRCA (around 20 kya) compared to the SNP-based TMRCA (around 100 kya). The underestimation was dual - the distorted, unbalanced dataset, plus the lack of proper corrections for back mutations. Thirdly, the authors made an improper selection of markers for their haplotype (STR) study, and it is partly explained above. They should have focused on the slow markers only. There are a sufficient number of them to choose from; and they should have calibrated them (see below). Fourth, they choose mutation rate constants uncritically. They blindly picked some markers with poorly determined mutation rate constant (in the father-son study), since there were too few mutations in them to be statistically sound. Here are examples from their choice: (a) 1 mutation in 1213 father-son pairs (it is not statistics). How one can calculate reliably the mutation rate constant from just one mutation? Well, the authors did: 8.244 x 10-4 [!]. (Notice the given accuracy). (b) 3 mutations in 403 pairs; (c) 1 mutation in 555 pairs; (d) 2 mutations in 555 pairs; (e) 2 mutations in 4565 pairs; (f) 0 mutation in 555 pairs - why choose such a marker? It was useless; (g) 5 mutations (h) 6 mutations... (i) 6 mutations. In fact, half of the markers picked were statistically mute. What can be said on the accuracy of their calculations? In their particular case it is not really important, they killed their study anyway, and the sloppiness of the study is shocking. Fifth, they employed the "evolution mutation rate" by Zhivotovsky et al (2004), which was a grave mistake. This "rate" was (and continues to be) a disaster for practically all population genetic studies which includes calculations for TMRCA in the last decade (see critique in, e.g., Klyosov, 2009b). The damage it brought to population genetics is beyond comprehension. The evolution mutation rate assigns the same rate - 0.00069 mutation per marker per 25 years - to each marker equally. We have seen above how different the markers are in terms of their mutation rate, but 0.00069 - always? No matter which markers are chosen? Let us reexamine to see what happened. The summary mutation rate (per haplotype) in those 23 markers (actually, 21, since two markers were eliminated by the authors) [see the table below] is 0.09369 (my calculation, it is not in the paper); that is divided by 21 and yields 0.00446 mutation per marker per. The "evolution/zhivotovsky" rate is 0.00069 mutation/marker/25 years, and that is 6.5 times slower. Naturally, when they used the "evolution" rate they obtained the TMRCA much higher. That is an

explanation why they obtained a "fit" between the SNP- and STR- based TMRCA when the "evolution" rate was used. They just automatically increased the TMRCA. The "recalibrated evolutionary mutation rate" was the same thing, in just one marker (out of 21) instead of the "evolution" rate of 0.00069, they changed it to 0.000351. The rest was the same 0.00069. Clearly, nothing has changed overall. Table. A comparison of the mutation rate constants for a number of loci employed by the authors of the cited paper from father-son pairs (Burgarella et all, 2011), from Chandler (2006), and from Zhivotovsky et al (2004). The table show how different those values are. The authors of the cited paper did not provide any support for the chosen mutation rate constants (Burgarella et al, 2011) from any other source, such the Chandler (2006) data. Number of fatherson pairs Number of mutations Mutation rate constant k x 10 5 per marker per Chandler constants (2006) k x 10 5 per marker Zhivotovsky Population, or evolution mutation rate constant (2004) k x 10 5 per marker DYS576 555 9 1622 1022 69 DYS389 I 7,864 20 254 226 69 DYS448 1,213 1 82 135 69 DYS389 b 7,842 28 357* 242 69 DYS19 9,840 23 234 151 69 DYS391 9,279 25 269 265 69 DYS481 403 3 744 69 DYS549 555 1 180 69 DYS533 555 2 360 69 DYS438 4,565 2 44 55 69 DYS437 4,381 6 137 99 69

DYS570 555 7 1261 790 69 DYS635 1,920 12 625 69 DYS390 9,340 22 236 311 69 DYS439 4,542 28 616 477 69 DYS392 9,264 5 54 52 69 DYS643 555 0 0 69 DYS393 7,835 6 77 76 69 DYS458 1,243 13 1046 814 69 DYS456 1,243 8 643 735 69 Y-GATA-H4 2,083 11 528 208 69 As a result, the paper is a total embarrassment. Other minor things: I could not find in the paper the mutation rate constant they employed for SNPs (1 x 10-9 per nucleotide per year? Any other figure?). I also could not find how the authors converted s (from father-son studies) into years. References Burgarella, С., Navascues, М. (2011). Mutation rate estimates for 110 Y chromosome STRs combining population and father-son pair data. European Journal of Human Genetics, 19, 70-75. Chandler, J. F. (2006). Estimating per-locus mutation rates. Journal of Genetic Genealogy, 2, 27-33. Klyosov, A.A. (2009a) DNA Genealogy, mutation rates, and some historical evidences written in Y-chromosome. I. Basic principles and the method. J. Genetic Genealogy, 5, 186-216. Klyosov, A.A. (2009b) A comment on the paper: Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish Priesthood. Human Genetics, 126, 719-724. Klyosov, A.A., Rozhanskii, I.L. (2012) Re-examining the Out of Africa theory and the origin of Europeoids (Caucasoids) in light of DNA genealogy. Adv. Anthropol. 2, 80-86. Klyosov, A.A. (2012) Ancient history of the Arbins, bearers of haplogroup R1b, from Central Asia to Europe, 16,000 to 1500 years before present. Advances in Anthropology, 2, 87-105.

Rozhanskii, I.L., Klyosov, A.A. (2011) Mutation rate constants in DNA genealogy (Y chromosome). Adv. Anthropol. Vol. 1, No. 2, 26-34. Zhivotovsky, L. A., Underhill, P. A., Cinnioglu, C., Kayser, M., Morar, B., Kivisild, T. et al. (2004). The effective mutation rate at Y chromosome short tandem repeats, with application to human population divergence time. The Amer. J. Hum. Genetics, 74, 50-61.

Table Table. A comparison of the mutation rate constants for a number of loci employed by the authors of the cited paper from father-son pairs (Burgarella et all, 2011), from Chandler (2006), and from Zhivotovsky et al (2004). The table show how different those values are. The authors of the cited paper did not provide any support for the chosen mutation rate constants (Burgarella et al, 2011) from any other source, such the Chandler (2006) data. Number of fatherson pairs Number of mutations Mutation rate constant k x 10 5 per marker per Chandler constants (2006) k x 10 5 per marker Zhivotovsky Population, or evolution mutation rate constant (2004) k x 10 5 per marker DYS576 555 9 1622 1022 69 DYS389 I 7,864 20 254 226 69 DYS448 1,213 1 82 135 69 DYS389 b 7,842 28 357* 242 69 DYS19 9,840 23 234 151 69 DYS391 9,279 25 269 265 69 DYS481 403 3 744 69 DYS549 555 1 180 69 DYS533 555 2 360 69 DYS438 4,565 2 44 55 69 DYS437 4,381 6 137 99 69 DYS570 555 7 1261 790 69 DYS635 1,920 12 625 69 DYS390 9,340 22 236 311 69 DYS439 4,542 28 616 477 69

DYS392 9,264 5 54 52 69 DYS643 555 0 0 69 DYS393 7,835 6 77 76 69 DYS458 1,243 13 1046 814 69 DYS456 1,243 8 643 735 69 Y-GATA-H4 2,083 11 528 208 69