Automated binning of microsatellite alleles: problems and solutions
|
|
|
- Hillary Jennings
- 10 years ago
- Views:
Transcription
1 Molecular Ecology Notes (2007) 7, doi: /j x Blackwell Publishing Ltd TECHNICAL ARTICLE Automated binning of microsatellite alleles: problems and solutions W. AMOS,* J. I. HOFFMAN,* A. FRODSHAM, L. ZHANG, S. BEST and A. V. S. HILL *Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK, The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK Abstract As genotyping methods move ever closer to full automation, care must be taken to ensure that there is no equivalent rise in allele-calling error rates. One clear source of error lies with how raw allele lengths are converted into allele classes, a process referred to as binning. Standard automated approaches usually assume collinearity between expected and measured fragment length. Unfortunately, such collinearity is often only approximate, with the consequence that alleles do not conform to a perfect 2-, 3- or 4-base-pair periodicity. To account for these problems, we introduce a method that allows repeat units to be fractionally shorter or longer than their theoretical value. Tested on a large human data set, our algorithm performs well over a wide range of dinucleotide repeat loci. The size of the problem caused by sticking to whole numbers of bases is indicated by the fact that the effective repeat length was within 5% of the assumed length only 68.3% of the time. Keywords: ABI, allele calling, fragment size, genotyper, genotyping error. Received 20 April 2006; revision accepted 21 August 2006 Despite increasing interest in single nucleotide polymorphisms, microsatellites remain arguably the most important class of markers for use in population genetic studies (Schlotterer 2004). Since microsatellites were first used as markers around 1990, there has been a steady switch in the way alleles are visualized, from radioactive incorporation to fluorescent dye technology. In theory, fluorescent methods allow much greater levels of automation, with internal size markers aiding size estimation and various algorithms assigning fragments to allele classes. However, the reality appears to be that progression from raw fragment lengths to allele sizes remains an important challenge (Ginot et al. 1996; Ewen et al. 2000; Weeks et al. 2002). Several strategies have been developed for allele assignment. Where allele size distributions are already well documented, for example in European human populations, any particular fragment can be compared against a database of expected lengths and assigned to the closest. This Correspondence: William Amos, Fax: ; [email protected]. Present address: Angela Frodsham, Cambridge Genetics Knowledge Park, Public Health Genetics Unit, Strangeways Research Laboratory, Worts Causeway, Cambridge CB1 8RN, UK seems to work well, but becomes prone to error when new alleles are found, and is of course inappropriate for most nonhuman populations and loci where reference standards are not available. Consequently, in most cases alleles need to be binned, with fragments assigned to allele categories according to their lengths. Alternative methods of binning include fitting observed allele lengths to bins defined by the repeat unit of the locus being considered (e.g. dinucleotide repeats being matched to a two base-pair periodicity) or simply rounding fragment lengths to the nearest base. Although standard binning methods appear to work, the goal of fully automated binning still seems some way off. For example, one study using the program genotyper to place 816 alleles from 12 loci in 38 samples (Ginot et al. 1996) reported an error rate of 5%, far higher than with manual approaches (Hoffman & Amos 2005a). Similarly, in a study using a commercial panel of 400 human primer pairs and 60 nonoptimized fine-mapping markers (Ewen et al. 2000), binning errors made by genotyper accounted for 21.05% and 39.62% of call errors respectively. Elsewhere, Weeks et al. (2002) compared results from two laboratories and found that 82.8% of discrepancies in dinucleotide repeat microsatellite scores and 50% of tetranucleotide
2 TECHNICAL ARTICLE 11 scores were due to different decisions about placing alleles in bins. Unfortunately, the overwhelming majority of studies that report and/or quantify binning errors belong to the field of human genetics, bringing two disadvantages: first, these results may well not reach the broader molecular ecology community, and second, they may underestimate the problem because human systems are by and large well-optimized. Binning errors may arise in many ways, but the most important and often overlooked problem is that DNA fragment mobility depends both on length and on sequence (Rosenblum et al. 1997; Wenz et al. 1998). Thus, DNA fragments with a high GC content migrate differently compared with equivalent length pieces having a low GC content. Consequently, depending on the GC content of the flanking sequence, alleles may show periodicities that are close to, but not exactly the same as the underlying repeat unit. At loci where the effective repeat length differs more than negligibly from the nominal repeat length, automatic fixed binning will often place bin boundaries within rather than between the clusters of lengths that represent each allele. Typically, predetermined bin widths work nicely at one end of the length distribution but poorly at the other end (Ewen et al. 2000). In addition to the problems of fixed-sized bins, there exist at least two further ways in which the binning process can be disrupted. First, mutations in the flanking sequence can create heterogeneity in the effective length of an allele, thereby broadening the length distribution and reducing the accuracy of binning. Second, while most mutations appear to involve changes of whole numbers of repeats, occasionally alleles change in length by other sizes, for example, a dinucleotide locus losing or gaining a single nucleotide. Such changes present a challenge for the binning process (Ewen et al. 2000) because the alleles they create may lie on what would otherwise be a valid bin boundary. Poor binning has a number of potentially detrimental consequences. First, even small numbers of resulting errors can seriously impact individual identification (Creel et al. 2003; Waits & Leberg 2003) and parental exclusion (Hoffman & Amos 2005a, b). Moreover, by splitting fragments that in reality represent one allele into two different length classes, the validity of allele frequency estimates will be undermined. This will adversely affect studies of population structure and distort estimates of key parameters such as heterozygosity. In the worst-case scenario, the outcome of binning will depend on the particular gel or experiment being run. For example, length estimates can vary with the speed at which polymerase chain reaction products run through a gel or capillary relative to the standard fragments, and this can in turn vary with stochastic variation in ambient temperature and with different qualities of the separation matrix (Ghosh et al. 1997; Rosenblum et al. 1997; Bruland et al. 1999; Williams et al. 1999; Davidson & Chiba 2003). When this happens, an allele run on one day may be consistently scored as length X when exactly the same allele in the same set of samples run on a different day would be scored as X ± 1 repeat or base pair. Since samples are often analysed in batches from similar times and places, this has the potential to create artefactual genetic differences. Most of the problems described above can be (and often are) addressed by careful manual scrutiny and adjustment where poor binning is apparent. However, in our experience this is not always the case, sometimes because the operator is unaware that automated binning is far from foolproof, and sometimes because of time constraints. Even where manual checking is carried out, the process is often tedious and time-consuming. To improve both the accuracy and speed of the binning process, we have therefore developed a simple algorithm that combines a flexible approach to finding the effective repeat unit size with an output that allows rigorous checking for any other problems such as intermediate allele sizes. To test the approach, we used a published data set from a whole genome screen in humans for hepatitis B. Materials and methods We have designed a program, flexibin to implement automated binning. The program is available on Bill Amos website ( and can also be obtained by contacting [email protected]. The basic algorithm is simple and has been coded in Visual Basic as an Excel Macro. First, to ensure that allele lengths reflect as much as possible the number of repeats they contain, all lengths are first trimmed to a length X min + 3, where X is the original length, min is the minimum allele length recorded and the value of 3 is added to eliminate problems of negative allele lengths (see below). The next step is to explore all likely binning patterns. In terms of actual mobility on a gel, each dinucleotide repeat unit will contribute approximately, but seldom exactly, 2 base pairs (bp). We refer to this as the effective repeat unit length, EL, assumed to lie between 1.7 to 2.3 bp for dinucleotides. In addition to some integer number of repeats, there is also likely to be a remainder, termed the offset, O. Ideally, all alleles will have an expected mobility of O + nel, where n is the number of repeat units. To find the best bin locations for a set of observed mobilities, we exhaustively explore all possible combinations of values for O and EL. For each combination, the program considers every observed fragment in turn and determines the nearest expected length, equivalent to the centre of the bin in which that allele would be placed. As a measure of fit, the value OL EL is stored, where OL is the observed length. Over all alleles, the goodness of fit to the
3 12 TECHNICAL ARTICLE Fig. 2 Frequency distribution of effective repeat sizes obtained for 276 human autosomal dinucleotide markers. The data set As a test of the binning process, data from a whole genome screen for hepatitis B were analysed. These data include raw allele lengths for 318 individuals typed for 276 autosomal markers. All samples were run on an ABI 377 machine with the manufacturer s internal size standards. Fig. 1 Representative cumulative allele length distributions. (a) Pattern obtained from a standard locus (d7s798) showing a clear, discrete allele size distribution; (b) one of the worst cases found (locus d7s2465), showing indistinct allele size classes where even manual binning would struggle to identify allele class boundaries. Alternate allele classes as inferred are shown in contrasting colours. current binning parameters is taken as the sum of the squared deviations between the observed length and the nearest bin centre. The smaller this sum, the greater is the tendency for fragments to lie towards the middle rather than the edge of each bin. To find the best binning parameter, the program steps through all possible parameter combinations, seeking to find the set that achieves the best fit. For maximum resolution, the search is conducted in two phases, one coarse and then a second, fine-grained search centred on the parameter values generated by the coarse search. When the bestfit values are found, all alleles are replaced with their repeat unit equivalents and an output is produced containing summary statistics, including the standard deviation of alleles in each bin, the mean and expected lengths of alleles in each bin, the estimated size of offset and the best-fit effective repeat length. To aid interpretation, a graphical output is also generated in which alternate alleles are colour-coded and plotted as a cumulative distribution. Any misplaced alleles are then spotted with ease (Fig. 1). Results Overall, the binning process experienced few problems, and most alleles were binned satisfactorily. Figure 1(a) illustrates a standard, accurate binning, with a graphical output for the cumulative allele length distribution. Figure 1(b) illustrates one of the worst cases found and reveals the potential problems faced by any binning program. Although some alleles reveal to nice, discrete length distributions, others merge with each other, apparently due to allele mobility heterogeneity. There is also the problem that allele lengths do not always conform to consecutive numbers of repeat units, but instead have large length gaps that will tend to exacerbate any mismatch between the effective and applied repeat unit length. Figure 2 depicts the observed distribution of effective repeat lengths. The range we find runs from 1.77 up to 2.23 bp, peaking clearly in the size category 1.9 to 1.95 bp, somewhat under 2 bp. Importantly, 30% of all alleles have an estimated effective repeat length of 1.9 bp or less, implying that an allele length range of 10 repeat units or more will see a proportion of alleles falling on bin boundaries if an exact 2-bp periodicity is assumed (the effective length is 0.1 bases shorter than 2 bp, so alleles 10 repeats apart will have lost one full base pair relative to a 2-bp scale). Figure 3 gives the overall distribution of standard deviations across all allele bins at all loci. The standard
4 TECHNICAL ARTICLE 13 Fig. 3 Relationship between mean allele length (bp) and standard deviation summarized over 276 autosomal markers showing that the majority of alleles have a low standard deviation (< 0.4) making misclassification unlikely. deviation for each allele bin is calculated as the square root of the variance of the mobilities of all fragments placed within that bin. Several features are apparent. First, the overwhelming majority of alleles have standard deviations less than 0.5, the level at which 95% of all alleles will be binned correctly (a standard deviation of 0.5 indicates that 95% of alleles lie within 1 bp of the bin centre). In practice, resolution will tend to be a little better than this because the deviations are taken relative to the predicted mean rather than the actual mean. Consequently, alleles whose mean does not match exactly the expected mean will tend to show a slightly inflated standard deviation. Satisfyingly, 98.3% of all alleles have a standard deviation less than 0.5, where misclassification is unlikely. The second trend is, overall, for the standard deviation to increase with increasing allele length. Finally, there is a pronounced periodicity, with standard deviations tending to be lower at lengths such as 100, 150 and 200 base pairs. These lengths correspond to the lengths of the internal standard fragments, and suggest that fragment length estimation is noticeably more accurate when the fragment lies near to a standard band. Again, this is to be expected and has been noticed previously as a tendency for between-laboratory consistency to be greater for alleles lying close to size standard bands (Jones et al. 1997). Discussion Here we describe a simple algorithm aimed at speeding up and increasing the accuracy of microsatellite allele-binning. Applied to a wide range of dinucleotide loci in humans, the results are encouraging, with a very high proportion of all alleles being apparently correctly assigned. Importantly, the generation of output statistics concerning the accuracy of the binning process allows a rapid check as to whether there are problems. We are hopeful that our approach will help to increase the level of automation in allele-calling (Bonin et al. 2004; Pompanon et al. 2005) without compromising the error rate. The extent to which binning is viewed as a serious problem varies between studies. Where allele diversity is low and/or the loci being typed are tri- or tetranucleotides, standard approaches may be both rapid and effective. However, dinucleotide markers remain widely used for a number of reasons, including the fact that they are generally easier to clone in numbers (Zane et al. 2002) and were often preferred over higher order repeats historically. Even with dinucloetide repeats, experienced researchers will know to spend the necessary time making manual adjustments. Most problems therefore arise either through lack of awareness that binning is by no means trivial, or because too little time is available or spent on manual rechecking. Perhaps the most striking aspect of this study is the range of effective repeat lengths found. The peak size estimate was in fact not two repeats, but a little under 2 bp, while the estimates ranged from as low as 1.77 bp through to an upper limit of 2.23 bp. This range corresponds approximately to the limits of the possible range enforced by the input search criteria, but this is no coincidence. During algorithm optimization, wider ranges were explored, including the use of single base periodicities. On wellbehaved loci, the same answer emerged regardless of the range allowed. However, on problematic loci, reduced constraints sometimes resulted in obviously unrealistic solutions, with good fits being achieved simply by having more allele length classes. The current parameter limits appear successful for two reasons. First, the distribution of effective repeat sizes is clearly bell-shaped with welldefined tails that fit within the allowable range. Second, the fit criteria indicate that the overwhelming majority of alleles binned fall close to the bin centres and rarely approach the interbin length boundaries. In the current study, only a tiny fraction of all alleles draw from a handful of loci revealing unusably poor binning, and most of these are easily rectifiable following manual inspection. This high success rate could be because the loci had been preselected for having relatively few problems due to, for example, alleles having both odd and even numbers of bases. However, this in no way implies that such problems do not arise or are even rare. For this reason, post hoc scrutiny remains a vital part of the whole allele calling procedure. To help alleviate this problem, possible problematic loci are flagged and a more detailed output generated. The most widely used automated system is genemapper which uses two stages of binning. At the start of a study, allele classes are defined using a combination of an automated procedure and manual input, adjusting suggested
5 14 TECHNICAL ARTICLE bin sizes and bin centres. Once all bins have been allocated, alleles are then called by finding the nearest bin centre. While we do not use genemapper ourselves, talking to colleagues reveals that three common problems seem apparent. First, none of the researchers we spoke to used the automated initial stage but instead focused on defining bins manually. Second, we also received reports of bin creep, a tendency for the migration rate of particular alleles to vary somewhat over time, in some cases causing a mismatch with the original bin definitions. Third, new alleles falling outside the expected range caused problems and were placed with poor accuracy. All three of these problems are largely negated by the system we use, where no reliance is placed on predefining a fixed series of bins. This offers a particular advantage for lower throughput, nonhuman systems where sample sizes are seldom large enough to be confident that all alleles have been found. In conclusion, we demonstrate an effective algorithm for binning the output from automatic genotyping systems. Although demonstrated for dinucleotide repeats, the most challenging class, our program will also operate on data from tri- and tetranucleotides. Applied to real data, we highlight the common problem that nominal repeat length frequently fails to match the effective repeat length as estimated by an automated sequencer. We hope that our algorithm will help to decrease errors associated with automated allele-calling. Acknowledgements Many thanks to Josephine Pemberton and Mathieu Joron for their helpful comments about using genemapper. The genotyping work was funded by the Wellcome Trust. Joe Hoffman is currently supported by the Isaac Newton Trust and the Balfour Fund. References Bonin A, Bellemain E, Bronken Eidesen P, Pompanon F, Brochmann C, Taberlet P (2004) How to track and assess genotyping errors in population genetic studies. Molecular Ecology, 13, Bruland O, Almqvist EW, Goldberg YP, Broman H, Hayden MR, Knappskog PM (1999) Accurate determination of the number of CAG repeats in the Huntington disease gene using a sequencespecific internal DNA standard. Clinical Genetics, 55, Creel S, Spong G, Sands JL et al. (2003) Population size estimation in Yellowstone wolves with error-prone noninvasive microsatellite genotypes. Molecular Ecology, 12, Davidson A, Chiba S (2003) Laboratory temperature variation is a previously unrecognised source of genotyping error during capillary electrophoresis. Molecular Ecology Notes, 3, Ewen KR, bahlo M, Treloar SA et al. (2000) Identification and analysis of error types in high-throughput genotyping. American Journal of Human Genetics, 67, Ghosh S, Karanjawala ZE, Hauser ER et al. (1997) Methods for precise sizing, automated binning of alleles, and reduction of error rates in large-scale genotyping using fluorescently labelled dinucleotide markers. FUSION (Finland US Investigation of NIDDM Genetics) Study Group. Genome Research, 7, Ginot F, Bordelais I, Nguyen S, Gyapay G (1996) Correction of some genotyping errors in automated fluorescent microsatellite analysis by enzymatic removal of one base overhangs. Nucleic Acids Research, 24, Hoffman JI, Amos W (2005a) Microsatellite genotyping errors: detection approaches, common sources and consequences for paternal exclusion. Molecular Ecology, 14, Hoffman JI, Amos W (2005b) Does kin selection influence fostering behaviour in Antarctic fur seals (Arctocephalus Gazella)? Proceedings of the Royal Society of London. Series B, Biological Sciences, 272, Jones CJ, Edwards KJ, Castaglione S et al. (1997) Reproducibility testing of RAPD, AFLP and SSR markers in plants by a network of European laboratories. Molecular Breeding, 3, Pompanon F, Bonin A, Bellemain E, Taberlet P (2005) Genotyping errors: causes, consequences and solutions. Nature Reviews Genetics, 6, Rosenblum BB, Oaks F, Menchen S, Johnson B (1997) Improved single-stranded DNA sizing accuracy in capillary electrophoresis. Nucleic Acids Research, 25, Schlotterer C (2004) The evolution of molecular markers-just a matter of fashion? Nature Reviews Genetics, 5, Waits JL, Leberg PL (2003) Biases associated with population estimation using molecular tagging. Animal Conservation, 3, Weeks DE, Conley YP, Ferrell RE, Mah TS, Gorin MB (2002) A tale of two genotypes: consistency between two high-throughput genotyping centres. Genome Research, 12, Wenz HM, Robertson JM, Menchen S et al. (1998) High-precision genotyping by denaturing capillary electrophoresis. Genome Research, 8, Williams LC, Hegde MR, Herrera G, Stapleton PM, Love DR (1999) Comparative semi-automated analysis of (CAG) repeats in Huntington disease gene: use of internal standards. Molecular and Cellular Probes, 13, Zane L, Bargelloni L, Patarnello T (2002) Strategies for microsatellite isolation: a review. Molecular Ecology, 11, 1 16.
Gene Mapping Techniques
Gene Mapping Techniques OBJECTIVES By the end of this session the student should be able to: Define genetic linkage and recombinant frequency State how genetic distance may be estimated State how restriction
Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company
Genetic engineering: humans Gene replacement therapy or gene therapy Many technical and ethical issues implications for gene pool for germ-line gene therapy what traits constitute disease rather than just
Forensic DNA Testing Terminology
Forensic DNA Testing Terminology ABI 310 Genetic Analyzer a capillary electrophoresis instrument used by forensic DNA laboratories to separate short tandem repeat (STR) loci on the basis of their size.
Lecture 13: DNA Technology. DNA Sequencing. DNA Sequencing Genetic Markers - RFLPs polymerase chain reaction (PCR) products of biotechnology
Lecture 13: DNA Technology DNA Sequencing Genetic Markers - RFLPs polymerase chain reaction (PCR) products of biotechnology DNA Sequencing determine order of nucleotides in a strand of DNA > bases = A,
Sanger Sequencing and Quality Assurance. Zbigniew Rudzki Department of Pathology University of Melbourne
Sanger Sequencing and Quality Assurance Zbigniew Rudzki Department of Pathology University of Melbourne Sanger DNA sequencing The era of DNA sequencing essentially started with the publication of the enzymatic
2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99.
1. True or False? A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. True 2. True or False? The sequence
Introduction To Real Time Quantitative PCR (qpcr)
Introduction To Real Time Quantitative PCR (qpcr) SABiosciences, A QIAGEN Company www.sabiosciences.com The Seminar Topics The advantages of qpcr versus conventional PCR Work flow & applications Factors
Single Nucleotide Polymorphisms (SNPs)
Single Nucleotide Polymorphisms (SNPs) Additional Markers 13 core STR loci Obtain further information from additional markers: Y STRs Separating male samples Mitochondrial DNA Working with extremely degraded
Focusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
A guide to the analysis of KASP genotyping data using cluster plots
extraction sequencing genotyping extraction sequencing genotyping extraction sequencing genotyping extraction sequencing A guide to the analysis of KASP genotyping data using cluster plots Contents of
Artisan Scientific is You~ Source for: Quality New and Certified-Used/Pre:-awned ECJuiflment
Looking for more information? Visit us on the web at http://www.artisan-scientific.com for more information: Price Quotations Drivers Technical Specifications. Manuals and Documentation Artisan Scientific
Biology Behind the Crime Scene Week 4: Lab #4 Genetics Exercise (Meiosis) and RFLP Analysis of DNA
Page 1 of 5 Biology Behind the Crime Scene Week 4: Lab #4 Genetics Exercise (Meiosis) and RFLP Analysis of DNA Genetics Exercise: Understanding how meiosis affects genetic inheritance and DNA patterns
Commonly Used STR Markers
Commonly Used STR Markers Repeats Satellites 100 to 1000 bases repeated Minisatellites VNTR variable number tandem repeat 10 to 100 bases repeated Microsatellites STR short tandem repeat 2 to 6 bases repeated
TOOLS FOR T-RFLP DATA ANALYSIS USING EXCEL
TOOLS FOR T-RFLP DATA ANALYSIS USING EXCEL A collection of Visual Basic macros for the analysis of terminal restriction fragment length polymorphism data Nils Johan Fredriksson TOOLS FOR T-RFLP DATA ANALYSIS
14.3 Studying the Human Genome
14.3 Studying the Human Genome Lesson Objectives Summarize the methods of DNA analysis. State the goals of the Human Genome Project and explain what we have learned so far. Lesson Summary Manipulating
Data Analysis for Ion Torrent Sequencing
IFU022 v140202 Research Use Only Instructions For Use Part III Data Analysis for Ion Torrent Sequencing MANUFACTURER: Multiplicom N.V. Galileilaan 18 2845 Niel Belgium Revision date: August 21, 2014 Page
Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs)
Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs) Single nucleotide polymorphisms or SNPs (pronounced "snips") are DNA sequence variations that occur
Hidden Markov Models
8.47 Introduction to omputational Molecular Biology Lecture 7: November 4, 2004 Scribe: Han-Pang hiu Lecturer: Ross Lippert Editor: Russ ox Hidden Markov Models The G island phenomenon The nucleotide frequencies
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis
SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis Goal: This tutorial introduces several websites and tools useful for determining linkage disequilibrium
Computer with GeneMapper ID (version 3.2.1 or most current) software Microsoft Excel, Word Print2PDF software
Procedure for GeneMapper ID for Casework 1.0 Purpose-This procedure specifies the steps for performing analysis on DNA samples amplified with AmpFlSTR Identifiler Plus using the GeneMapper ID (GMID) software.
Becker Muscular Dystrophy
Muscular Dystrophy A Case Study of Positional Cloning Described by Benjamin Duchenne (1868) X-linked recessive disease causing severe muscular degeneration. 100 % penetrance X d Y affected male Frequency
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL
What mathematical optimization can, and cannot, do for biologists Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL Introduction There is no shortage of literature about the
Isolation and characterization of nine microsatellite loci in the Pale Pitcher Plant. MARGARET M. KOOPMAN*, ELIZABETH GALLAGHER, and BRYAN C.
Page 1 of 28 1 1 2 3 PERMANENT GENETIC RESOURCES Isolation and characterization of nine microsatellite loci in the Pale Pitcher Plant Sarracenia alata (Sarraceniaceae). 4 5 6 MARGARET M. KOOPMAN*, ELIZABETH
DNA Detection. Chapter 13
DNA Detection Chapter 13 Detecting DNA molecules Once you have your DNA separated by size Now you need to be able to visualize the DNA on the gel somehow Original techniques: Radioactive label, silver
Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
NATIONAL GENETICS REFERENCE LABORATORY (Manchester)
NATIONAL GENETICS REFERENCE LABORATORY (Manchester) MLPA analysis spreadsheets User Guide (updated October 2006) INTRODUCTION These spreadsheets are designed to assist with MLPA analysis using the kits
Real-time PCR: Understanding C t
APPLICATION NOTE Real-Time PCR Real-time PCR: Understanding C t Real-time PCR, also called quantitative PCR or qpcr, can provide a simple and elegant method for determining the amount of a target sequence
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s
The Techniques of Molecular Biology: Forensic DNA Fingerprinting
Revised Fall 2011 The Techniques of Molecular Biology: Forensic DNA Fingerprinting The techniques of molecular biology are used to manipulate the structure and function of molecules such as DNA and proteins
How many of you have checked out the web site on protein-dna interactions?
How many of you have checked out the web site on protein-dna interactions? Example of an approximately 40,000 probe spotted oligo microarray with enlarged inset to show detail. Find and be ready to discuss
The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics
The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics The GS Junior System The Power of Next-Generation Sequencing on Your Benchtop Proven technology: Uses the same long
Mitochondrial DNA Analysis
Mitochondrial DNA Analysis Lineage Markers Lineage markers are passed down from generation to generation without changing Except for rare mutation events They can help determine the lineage (family tree)
Technical Note. Roche Applied Science. No. LC 18/2004. Assay Formats for Use in Real-Time PCR
Roche Applied Science Technical Note No. LC 18/2004 Purpose of this Note Assay Formats for Use in Real-Time PCR The LightCycler Instrument uses several detection channels to monitor the amplification of
The Biotechnology Education Company
EDVTEK P.. Box 1232 West Bethesda, MD 20827-1232 The Biotechnology 106 EDV-Kit # Principles of DNA Sequencing Experiment bjective: The objective of this experiment is to develop an understanding of DNA
Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the
Chapter 5 Analysis of Prostate Cancer Association Study Data 5.1 Risk factors for Prostate Cancer Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the disease has
Innovations in Molecular Epidemiology
Innovations in Molecular Epidemiology Molecular Epidemiology Measure current rates of active transmission Determine whether recurrent tuberculosis is attributable to exogenous reinfection Determine whether
Gene Expression Assays
APPLICATION NOTE TaqMan Gene Expression Assays A mpl i fic ationef ficienc yof TaqMan Gene Expression Assays Assays tested extensively for qpcr efficiency Key factors that affect efficiency Efficiency
Molecular typing of VTEC: from PFGE to NGS-based phylogeny
Molecular typing of VTEC: from PFGE to NGS-based phylogeny Valeria Michelacci 10th Annual Workshop of the National Reference Laboratories for E. coli in the EU Rome, November 5 th 2015 Molecular typing
DNA PROFILING IN FORENSIC SCIENCE
DA PROFILIG I FORESIC SCIECE DA is the chemical code that is found in every cell of an individual's body, and is unique to each individual. Because it is unique, the ability to examine DA found at a crime
Copyright 2007 Casa Software Ltd. www.casaxps.com. ToF Mass Calibration
ToF Mass Calibration Essentially, the relationship between the mass m of an ion and the time taken for the ion of a given charge to travel a fixed distance is quadratic in the flight time t. For an ideal
How To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
MASCOT Search Results Interpretation
The Mascot protein identification program (Matrix Science, Ltd.) uses statistical methods to assess the validity of a match. MS/MS data is not ideal. That is, there are unassignable peaks (noise) and usually
Annex to the Accreditation Certificate D-PL-13372-01-00 according to DIN EN ISO/IEC 17025:2005
Deutsche Akkreditierungsstelle GmbH German Accreditation Body Annex to the Accreditation Certificate D-PL-13372-01-00 according to DIN EN ISO/IEC 17025:2005 Period of validity: 26.03.2012 to 25.03.2017
PreciseTM Whitepaper
Precise TM Whitepaper Introduction LIMITATIONS OF EXISTING RNA-SEQ METHODS Correctly designed gene expression studies require large numbers of samples, accurate results and low analysis costs. Analysis
The Human Genome Project
The Human Genome Project Brief History of the Human Genome Project Physical Chromosome Maps Genetic (or Linkage) Maps DNA Markers Sequencing and Annotating Genomic DNA What Have We learned from the HGP?
Northumberland Knowledge
Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about
6 Scalar, Stochastic, Discrete Dynamic Systems
47 6 Scalar, Stochastic, Discrete Dynamic Systems Consider modeling a population of sand-hill cranes in year n by the first-order, deterministic recurrence equation y(n + 1) = Ry(n) where R = 1 + r = 1
DNA Sequence Analysis
DNA Sequence Analysis Two general kinds of analysis Screen for one of a set of known sequences Determine the sequence even if it is novel Screening for a known sequence usually involves an oligonucleotide
Essentials of Real Time PCR. About Sequence Detection Chemistries
Essentials of Real Time PCR About Real-Time PCR Assays Real-time Polymerase Chain Reaction (PCR) is the ability to monitor the progress of the PCR as it occurs (i.e., in real time). Data is therefore collected
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
Molecular and Cell Biology Laboratory (BIOL-UA 223) Instructor: Ignatius Tan Phone: 212-998-8295 Office: 764 Brown Email: ignatius.tan@nyu.
Molecular and Cell Biology Laboratory (BIOL-UA 223) Instructor: Ignatius Tan Phone: 212-998-8295 Office: 764 Brown Email: [email protected] Course Hours: Section 1: Mon: 12:30-3:15 Section 2: Wed: 12:30-3:15
2.500 Threshold. 2.000 1000e - 001. Threshold. Exponential phase. Cycle Number
application note Real-Time PCR: Understanding C T Real-Time PCR: Understanding C T 4.500 3.500 1000e + 001 4.000 3.000 1000e + 000 3.500 2.500 Threshold 3.000 2.000 1000e - 001 Rn 2500 Rn 1500 Rn 2000
Multiplex your most important
Multiplex your most important genetic assays on one platform GenomeLab GeXP Genetic Analysis System Blood Banking Capillary Electrophoresis Centrifugation Flow Cytometry Genomics Lab Automation Lab Tools
Introduction to next-generation sequencing data
Introduction to next-generation sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/research-centres/cem/ Outline History of DNA sequencing NGS
Objectives: Vocabulary:
Introduction to Agarose Gel Electrophoresis: A Precursor to Cornell Institute for Biology Teacher s lab Author: Jennifer Weiser and Laura Austen Date Created: 2010 Subject: Molecular Biology and Genetics
Rapid Acquisition of Unknown DNA Sequence Adjacent to a Known Segment by Multiplex Restriction Site PCR
Rapid Acquisition of Unknown DNA Sequence Adjacent to a Known Segment by Multiplex Restriction Site PCR BioTechniques 25:415-419 (September 1998) ABSTRACT The determination of unknown DNA sequences around
July 7th 2009 DNA sequencing
July 7th 2009 DNA sequencing Overview Sequencing technologies Sequencing strategies Sample preparation Sequencing instruments at MPI EVA 2 x 5 x ABI 3730/3730xl 454 FLX Titanium Illumina Genome Analyzer
The Chinese University of Hong Kong School of Life Sciences Biochemistry Program CUGEN Ltd.
The Chinese University of Hong Kong School of Life Sciences Biochemistry Program CUGEN Ltd. DNA Forensic and Agarose Gel Electrophoresis 1 OBJECTIVES Prof. Stephen K.W. Tsui, Dr. Patrick Law and Miss Fion
Content Sheet 7-1: Overview of Quality Control for Quantitative Tests
Content Sheet 7-1: Overview of Quality Control for Quantitative Tests Role in quality management system Quality Control (QC) is a component of process control, and is a major element of the quality management
Factors for success in big data science
Factors for success in big data science Damjan Vukcevic Data Science Murdoch Childrens Research Institute 16 October 2014 Big Data Reading Group (Department of Mathematics & Statistics, University of Melbourne)
Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing
STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAAC GTGCAC GTGAAC Wouter Coppieters Head of the genomics core facility GIGA center, University of Liège Bioruptor NGS: Unbiased DNA
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
Step-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER
Step-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER JMP Genomics Step-by-Step Guide to Bi-Parental Linkage Mapping Introduction JMP Genomics offers several tools for the creation of linkage maps
Variables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
TIPS DATA QUALITY STANDARDS ABOUT TIPS
2009, NUMBER 12 2 ND EDITION PERFORMANCE MONITORING & EVALUATION TIPS DATA QUALITY STANDARDS ABOUT TIPS These TIPS provide practical advice and suggestions to USAID managers on issues related to performance
Basic Analysis of Microarray Data
Basic Analysis of Microarray Data A User Guide and Tutorial Scott A. Ness, Ph.D. Co-Director, Keck-UNM Genomics Resource and Dept. of Molecular Genetics and Microbiology University of New Mexico HSC Tel.
Chapter 13: Meiosis and Sexual Life Cycles
Name Period Chapter 13: Meiosis and Sexual Life Cycles Concept 13.1 Offspring acquire genes from parents by inheriting chromosomes 1. Let s begin with a review of several terms that you may already know.
Using Digital Photography to Supplement Learning of Biotechnology. Methods
RESEARCH ON LEARNING Using Digital Photography to Supplement Learning of Biotechnology Fran n orf l u s AbstrAct The author used digital photography to supplement learning of biotechnology by students
SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE
AP Biology Date SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE LEARNING OBJECTIVES Students will gain an appreciation of the physical effects of sickle cell anemia, its prevalence in the population,
DNA Sequencing Handbook
Genomics Core 147 Biotechnology Building Ithaca, New York 14853-2703 Phone: (607) 254-4857; Fax (607) 254-4847 Web: http://cores.lifesciences.cornell.edu/brcinfo/ Email: [email protected] DNA Sequencing
DNA and Forensic Science
DNA and Forensic Science Micah A. Luftig * Stephen Richey ** I. INTRODUCTION This paper represents a discussion of the fundamental principles of DNA technology as it applies to forensic testing. A brief
Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals
Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Xiaohui Xie 1, Jun Lu 1, E. J. Kulbokas 1, Todd R. Golub 1, Vamsi Mootha 1, Kerstin Lindblad-Toh
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
NEUROEVOLUTION OF AUTO-TEACHING ARCHITECTURES
NEUROEVOLUTION OF AUTO-TEACHING ARCHITECTURES EDWARD ROBINSON & JOHN A. BULLINARIA School of Computer Science, University of Birmingham Edgbaston, Birmingham, B15 2TT, UK [email protected] This
SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications
Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each
Chapter 6 DNA Replication
Chapter 6 DNA Replication Each strand of the DNA double helix contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand. Each strand can therefore
Simple Random Sampling
Source: Frerichs, R.R. Rapid Surveys (unpublished), 2008. NOT FOR COMMERCIAL DISTRIBUTION 3 Simple Random Sampling 3.1 INTRODUCTION Everyone mentions simple random sampling, but few use this method for
The following chapter is called "Preimplantation Genetic Diagnosis (PGD)".
Slide 1 Welcome to chapter 9. The following chapter is called "Preimplantation Genetic Diagnosis (PGD)". The author is Dr. Maria Lalioti. Slide 2 The learning objectives of this chapter are: To learn the
A Primer of Genome Science THIRD
A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:
Measurement Information Model
mcgarry02.qxd 9/7/01 1:27 PM Page 13 2 Information Model This chapter describes one of the fundamental measurement concepts of Practical Software, the Information Model. The Information Model provides
Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis
Genetic Analysis Phenotype analysis: biological-biochemical analysis Behaviour under specific environmental conditions Behaviour of specific genetic configurations Behaviour of progeny in crosses - Genotype
VLLM0421c Medical Microbiology I, practical sessions. Protocol to topic J10
Topic J10+11: Molecular-biological methods + Clinical virology I (hepatitis A, B & C, HIV) To study: PCR, ELISA, your own notes from serology reactions Task J10/1: DNA isolation of the etiological agent
PolyLens: Software for Map-based Visualization and Analysis of Genome-scale Polymorphism Data
PolyLens: Software for Map-based Visualization and Analysis of Genome-scale Polymorphism Data Ryhan Pathan Department of Electrical Engineering and Computer Science University of Tennessee Knoxville Knoxville,
American Association for Laboratory Accreditation
Page 1 of 12 The examples provided are intended to demonstrate ways to implement the A2LA policies for the estimation of measurement uncertainty for methods that use counting for determining the number
Distributed Bioinformatics Computing System for DNA Sequence Analysis
Global Journal of Computer Science and Technology: A Hardware & Computation Volume 14 Issue 1 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.
An Overview of DNA Sequencing
An Overview of DNA Sequencing Prokaryotic DNA Plasmid http://en.wikipedia.org/wiki/image:prokaryote_cell_diagram.svg Eukaryotic DNA http://en.wikipedia.org/wiki/image:plant_cell_structure_svg.svg DNA Structure
DNA FRAGMENT ANALYSIS by Capillary Electrophoresis
DNA FRAGMENT ANALYSIS by Capillary Electrophoresis USER GUIDE DNA Fragment Analysis by Capillary Electrophoresis Publication Number 4474504 Rev. A Revision Date September 2012 For Research Use Only. Not
Innovative Techniques and Tools to Detect Data Quality Problems
Paper DM05 Innovative Techniques and Tools to Detect Data Quality Problems Hong Qi and Allan Glaser Merck & Co., Inc., Upper Gwynnedd, PA ABSTRACT High quality data are essential for accurate and meaningful
Real-time qpcr Assay Design Software www.qpcrdesign.com
Real-time qpcr Assay Design Software www.qpcrdesign.com Your Blueprint For Success Informational Guide 2199 South McDowell Blvd Petaluma, CA 94954-6904 USA 1.800.GENOME.1(436.6631) 1.415.883.8400 1.415.883.8488
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
Crime Scenes and Genes
Glossary Agarose Biotechnology Cell Chromosome DNA (deoxyribonucleic acid) Electrophoresis Gene Micro-pipette Mutation Nucleotide Nucleus PCR (Polymerase chain reaction) Primer STR (short tandem repeats)
A greedy algorithm for the DNA sequencing by hybridization with positive and negative errors and information about repetitions
BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES, Vol. 59, No. 1, 2011 DOI: 10.2478/v10175-011-0015-0 Varia A greedy algorithm for the DNA sequencing by hybridization with positive and negative
IIID 14. Biotechnology in Fish Disease Diagnostics: Application of the Polymerase Chain Reaction (PCR)
IIID 14. Biotechnology in Fish Disease Diagnostics: Application of the Polymerase Chain Reaction (PCR) Background Infectious diseases caused by pathogenic organisms such as bacteria, viruses, protozoa,
