SNP and destroy: a discussion of a weighted distance-based SNP selection algorithm


David A. Hall
Rodney A. Lea

November 14, 2005

Abstract

Recent developments in bioinformatics have introduced a number of methods for quickly typing a large number of Single Nucleotide Polymorphisms (SNPs) in the human genome. Due to time and cost constraints, carrying out similar typing in large populations may not be viable. Furthermore, because of linkage between SNPs, typing done at one location may be fully predictive of typing carried out at an adjacent SNP. A Java program (SNPBlaster) has been developed that carries out a weighted iterative SNP selection procedure, which may be useful for weeding out SNPs that are unlikely to be useful in a population SNP screen, based on frequency differences between two populations. Included in this discussion is an application of the algorithm to finding SNPs on chromosome 4, approximately 1MB apart, that are likely to be informative for ancestry. While the program is useful in its current form, a few cautions related to its prototype / test nature should be taken into account.

1 Introduction

An increasingly large number of Single Nucleotide Polymorphisms (SNPs) are being found in the human genome, with the help of large-scale genotyping projects such as HapMap (The International HapMap Consortium, 2003). However, due to cost constraints, most studies will need to select a much smaller number of SNPs, preferably ones with a high information content.

A Java application, SNPBlaster, has been designed that attempts to remove SNPs from a large set, retaining only those SNPs that would be useful or informative. The program combines positional information with an arbitrary (user-defined) information measure, referred to in this paper as a weighting factor. It is assumed that this measure has already been calculated for each SNP; the program uses it to determine, within a small region of the chromosome (the window), which SNPs should be removed. This paper gives an example of the use of SNPBlaster with population differences as the measure and a window size of 1MB.

2 Algorithm

2.1 Summary of the algorithm

Figure 1: A hypothetical situation in which a number of equally weighted markers (labelled a to o) are given to the SNPBlaster program to remove. The SNPBlaster program will remove markers b, c, e, g, i, j, k, l and o with the given window size.

The algorithm begins with a group of SNPs with a high weighting factor ( ), removing those that are closer than a certain distance (the window size) to other SNPs. If more than one SNP could be removed, the algorithm attempts to remove the SNP whose removal results in a better distribution of SNPs, such that the variance in distance between SNPs is low. Once this group has been dealt with, the algorithm locks the remaining SNPs so that they cannot be removed, then continues with the next group of SNPs, until all SNPs have been tested for potential removal from the marker set.

2.2 Detail

The SNPBlaster program uses an O(n) iterative algorithm to select SNPs from a list for a given window size, given each SNP's location and an associated weighting factor. The algorithm begins with an empty working list and, at the beginning of each iteration, moves SNPs from the complete list into the working list. Each group consists of every SNP still in the complete list whose weight factor is greater than or equal to the threshold value (which starts at 1.0 and is reduced by 0.1 after each iterative step). After each step, the SNPs that remain in the working list are locked, so that they will not be removed even if another SNP with a more appropriate location is found later; the assumption is that a high weight factor is more important than the location of the SNP.

The core part of the algorithm is a sweep through the chromosome, selecting up to four adjacent SNPs in order to decide which to remove. For the purpose of explanation, they are named p2, p1, c1 and c2 ("previous" and "current"), with the main decision being which of p1 or c1 (if any) to remove. After each decision, the markers are shifted as necessary based on the decision that has been made. Looking at figure 1, it may help to note that p2, p1, c1 and c2 start out as a, b, c and d respectively.

There are a few trivial cases, which are worth pointing out before discussing the more complex ones. Firstly, if p1 and c1 are further apart than the window size, and p1 and p2 are also further apart than the window size, then no SNP is removed. Also, no SNP is removed if the closest markers (within a window) are locked, a situation that should only occur if the SNPs are explicitly locked by the user.

In cases where p1 is closer than the window size to either c1 or p2 (and at least one of the pair is unlocked), a SNP will be removed. If one SNP is locked, then the other will be removed (with a decision being made on the p1, c1 pair first, if possible). If c1 and p1 are closer than the window size to each other, and neither is locked, then the SNP that is removed will be the one whose removal results in the most even spread between the remaining three (of the four) markers. In the example in figure 1, b will be the first marker removed, because the distance between a and b is less than the distance between c and d.

3 Application

The following details an example application of the SNPBlaster algorithm to SNPs recorded in the HapMap database (The International HapMap Consortium, 2003).
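As an illustration of the procedure described in section 2, the following is a minimal Python sketch of a thresholded, lock-as-you-go sweep. It is not the SNPBlaster source (which is a Java program and is not reproduced in this paper); the names Snp, imbalance, sweep and select_snps are hypothetical, the evenness rule is simplified to the difference between the two gaps around the kept marker, and ties are broken by removing c1, which is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snp:
    name: str
    pos: int             # base-pair position on the chromosome
    weight: float        # information measure, assumed to lie in [0, 1]
    locked: bool = False


def imbalance(left: Optional[Snp], kept: Snp, right: Optional[Snp]) -> int:
    """Difference between the gaps either side of `kept` (0 = perfectly even)."""
    if left is None or right is None:
        return 0  # at an end of the list there is nothing to balance
    return abs((kept.pos - left.pos) - (right.pos - kept.pos))


def sweep(working: List[Snp], window: int) -> None:
    """Remove unlocked SNPs until no unlocked SNP has a neighbour within `window`."""
    i = 1
    while i < len(working):
        p1, c1 = working[i - 1], working[i]
        if c1.pos - p1.pos >= window or (p1.locked and c1.locked):
            i += 1
            continue
        p2 = working[i - 2] if i >= 2 else None
        c2 = working[i + 1] if i + 1 < len(working) else None
        if p1.locked:
            drop_p1 = False
        elif c1.locked:
            drop_p1 = True
        else:
            # Keep whichever of p1/c1 leaves the more even spacing.
            drop_p1 = imbalance(p2, c1, c2) < imbalance(p2, p1, c2)
        if drop_p1:
            del working[i - 1]
            i = max(i - 1, 1)  # re-check c1 against its new left neighbour
        else:
            del working[i]     # p1 is compared with the old c2 next


def select_snps(snps: List[Snp], window: int, steps: int = 10) -> List[Snp]:
    """Lower the weight threshold step by step, sweeping and locking each time."""
    remaining = sorted(snps, key=lambda s: s.pos)
    working: List[Snp] = []
    for step in range(steps, -1, -1):
        threshold = step / steps  # 1.0, 0.9, ..., 0.0 when steps == 10
        working += [s for s in remaining if s.weight >= threshold]
        remaining = [s for s in remaining if s.weight < threshold]
        working.sort(key=lambda s: s.pos)
        sweep(working, window)
        for s in working:
            s.locked = True  # survivors can no longer be removed
    return working


# Invented example data: positions and weights are not taken from HapMap.
demo = [Snp("a", 0, 0.95), Snp("b", 50, 0.95), Snp("c", 120, 0.95),
        Snp("d", 300, 0.5), Snp("e", 310, 0.95)]
print([s.name for s in select_snps(demo, window=100)])  # ['a', 'c', 'e']
```

In this example d is dropped in a later iteration even though it arrives after e, because e was locked while the threshold was still 0.9; this mirrors the assumption that a high weight factor takes priority over location.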
3.1 Preparation

All available non-redundant SNPs for chromosome 4 as at 15 August 2005 were loaded into a database. The difference in genotype frequency for the non-reference ("rare") alleles between the CEU population (Utah residents with ancestry from northern and western Europe) and the CHB population (Han Chinese in Beijing, China) was used to weight the usefulness of the alleles: a value of 1 indicated a 100% difference in allele frequencies, while 0 indicated no difference in allele frequencies (a more rigorous study may take the minimum difference for all population combinations). The rs#, chromosome position and frequency difference for each SNP were exported into a text file, and the SNPBlaster header (with a chromosome length of base pairs) was added to prepare the file for the program.

3.2 Running

Figure 2: A plot of the SNPs on chromosome 4 chosen by SNPBlaster that appear to be informative for ancestry. The measure (given on the y axis) is the allele frequency difference between two populations, labelled CEU and CHB in the HapMap project.

After the input file was prepared, the program was run with the prepared input file and a window size of 1MB. From an input file of approximately markers, an output file containing 146 markers was generated, giving an average SNP separation distance of about 1.35MB. A graphical representation of the SNPs that were selected is shown in figure 2.

3.3 Speed

Loading all the HapMap information for chromosome 4 into the database took the longest time. The process took approximately 17 minutes, but it may be possible to reduce this to around 2 to 5 minutes depending on the database format and the program used to import the data. In comparison, the SNPBlaster algorithm was significantly faster, typically taking 2 to 5 seconds for the loading process and a similar time for the iterative algorithm. This suggests that the algorithm is reasonably fast, and is likely to be very useful for carrying out SNP selection tasks in the future.

4 Discussion

4.1 Distance-based information

[Something about crossover etc, centimorgans]

One paper has determined that a crossover frequency of 1% is sufficient for detecting recent population structure. This corresponds to a base pair distance of approximately one megabase [is it reasonable to say that we can treat the mutation rate as effectively zero? does the mutation rate matter? does it remove information at a SNP, or add information to a SNP?].

Regardless of the approach, it is reasonable to assume that within a certain distance, two SNP mutations are likely to have a high degree of linkage: a specific mutation in one SNP will almost always correspond to a specific mutation in the other SNP. For this reason, it is not useful to record mutations at both of those SNPs in a study. This minimum allowed distance between SNPs will be referred to as the window size.

The more SNPs that are typed, the greater the cost per individual, meaning that fewer individuals can be typed for the same amount of money. With a large window size, the cost per individual will be low, but it is also likely that there will be a loss of information, because some of the variation will be missed by not typing enough SNPs.
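The cost trade-off can be made concrete with a little arithmetic. The sketch below is only illustrative: the function name individuals_affordable, the budget and the cost per genotype are invented, and only the panel size of 146 SNPs comes from section 3.2.

```python
def individuals_affordable(budget: float, cost_per_genotype: float, n_snps: int) -> int:
    """Number of individuals that can be fully typed for a fixed budget."""
    return int(budget // (cost_per_genotype * n_snps))


BUDGET = 10_000            # hypothetical budget, arbitrary currency units
COST_PER_GENOTYPE = 0.5    # hypothetical cost of typing one SNP in one individual

# 146 SNPs is the panel size reported for the 1MB window in section 3.2;
# doubling the panel stands in for choosing a smaller window.
for n_snps in (146, 292):
    print(n_snps, individuals_affordable(BUDGET, COST_PER_GENOTYPE, n_snps))
```

Under these assumed figures, doubling the number of SNPs typed halves the number of individuals that can be typed, from 136 to 68.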
4.2 Measure-based information

Another approach to determining the information derived from specific SNPs is to apply some function to each SNP that gives an idea of the information content of that SNP. This function, regardless of its method, will typically identify the SNPs that will be the most useful in an investigation. It would be expected that a function that did this would also be able to choose which of two SNPs would be more appropriate for an investigation.

4.2.1 Population differences

One way to estimate the information content with respect to the ancestry of an individual is to determine differences between populations at specific SNPs. An example of this (the example used in this paper) is one that compares allele frequency differences of a specific mutation of a SNP. The reasoning behind this is as follows: if a certain mutation is always present in one population (or more correctly a small subset of that population), but never present in another population, then that SNP will be informative in determining the proportion of ancestry [or something similar] that an unknown individual has relative to those population groups. In this sense, a SNP with a high frequency difference between populations will be considered useful, and one that has a low difference between populations will not be useful.

Other methods for determining the information content of SNPs are available. A reader who is interested in these may like to read [cite some relevant papers].

4.3 Binocular vision

There are issues involved with choosing only one of these two information procedures in SNP selection. Working purely on distance-based information is likely to mean that some of the SNPs that are chosen will not be informative enough for the investigation, even if a more appropriate SNP is available nearby. Working purely on measure-based information may mean that some variation will be missed, and there may be a lot of unnecessary content.
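A small Python sketch may help to show both the frequency-difference measure of section 4.2.1 and the risk of relying on the measure alone. The function name frequency_difference and all positions and allele frequencies below are invented for illustration; they are not HapMap values.

```python
def frequency_difference(freq_a: float, freq_b: float) -> float:
    """Absolute allele-frequency difference: 0 = identical, 1 = fixed difference."""
    return abs(freq_a - freq_b)


# (name, position in bp, frequency in population A, frequency in population B)
example = [
    ("s1", 100_000, 0.95, 0.05),
    ("s2", 150_000, 0.90, 0.10),
    ("s3", 400_000, 0.85, 0.10),
    ("s4", 2_500_000, 0.60, 0.30),
    ("s5", 4_800_000, 0.55, 0.35),
]
WINDOW = 1_000_000  # 1MB window, as in section 3.2

weighted = [(name, pos, frequency_difference(fa, fb)) for name, pos, fa, fb in example]

# Pure measure-based selection: take the three highest-weighted SNPs.
top3 = sorted(weighted, key=lambda s: s[2], reverse=True)[:3]
positions = sorted(pos for _, pos, _ in top3)
span = positions[-1] - positions[0]

print([name for name, _, _ in top3], span, span < WINDOW)
```

In this invented data set the three highest-weighted SNPs all lie within 300,000 base pairs of each other, well inside a single 1MB window, while the rest of the region is left uncovered.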
To further explain this, if many highly informative SNPs were in a single area of the chromosome (none further apart from each other than the window size), then all the SNPs would be expected to carry linked mutations: any one of them could be used to infer the mutations at the other locations. In addition, some parts of the chromosome may be missed out, because those parts only have SNPs with a very low measure-based information content.

It is probably worth noting that some SNP information measures also take local distance information into account. However, the algorithms used to generate these measures may be very processor intensive, because they require recalculation after each SNP removal. The SNPBlaster algorithm gets around the issue of complexity due to recalculation by grouping SNPs of the same measure together, and from then on using distance methods to select SNPs. While the measure is the key factor in determining which SNP is chosen locally, the window size is typically more important at the whole-chromosome level.

4.4 Similar programs

Another application, CHOISS (Lee and Kang, 2004), is currently available to carry out a similar SNP selection process, choosing SNPs (either by stating an interval, or by stating the required number of SNPs) to minimise variance. When the web-based version of CHOISS was tried with a data set derived from chromosome 4 (approximately 50,000 markers), the web interface timed out at fifteen minutes.

The algorithmic complexity of CHOISS is reported to be O(n^2), while the complexity of SNPBlaster is O(n). For small to medium numbers of SNPs (possibly up to around 5000), this may not make a significant difference, but above that level, the solution time for SNPBlaster is likely to be significantly less than that of CHOISS. However, there may be situations in which CHOISS is more appropriate, most likely those where a rough guess at the best SNPs is not sufficient.

After downloading CHOISS, it was noticed that the algorithm did in fact run in a reasonably short period of time (2 minutes, compared with around 5 seconds for SNPBlaster). However, the output was not what was expected.
When SNPs in the input file were in a random order, the CHOISS algorithm selected approximately 2000 SNPs, with a reported average distance of 1MB for a chromosome of total length around 200MB. When the SNPs were sorted, the algorithm did not select any SNPs. It is likely that this behaviour had more to do with range overflows (i.e. numbers being too large) than with problems in the algorithm itself, but it will be difficult to work out for sure without a more thorough analysis of the program. [I should probably contact the authors then]

4.5 Caveats

While the algorithm as described is useful for quick, large-scale selection of SNPs, there are a number of factors that may make it less reliable than would be expected.

At the moment, the program will only work for SNPs on a single chromosome. If more chromosomes are required (as would be expected for a full genome selection process), then it is necessary to work on each chromosome individually.

The algorithm is iterative, but does not have soft divisions between different weight factors. This means that SNPs with a weighting at the low end of a grouping (e.g. 0.91) are considered to be just as important as SNPs at the high end of that grouping (e.g. 0.99). In some cases, this may be alleviated by increasing the number of iterative divisions in the algorithm, but this would remove the linking between similar weights. An alternative procedure would attempt to determine a relationship between the distance between markers and the weighting factor, although it is likely that such a procedure could only be applied to specific types of studies. Such an approach will certainly increase the amount of processing required (as all SNPs within a window will need to be tested, rather than just the closest), but as long as the window size is small, this increase is unlikely to make the algorithm unmanageably slow.

The window size defined in the algorithm refers to the minimum allowed distance between SNPs. If there is enough coverage, then the maximum distance between SNPs will be just under twice the window size. Knowledge of this variation in SNP distance may be important in choosing a window size for the algorithm, because the average SNP distance is likely to be closer to 1.5 times the window size.
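One way to check these spacing properties on a selected panel is to summarise the gaps between adjacent SNPs. The helper below is a hypothetical sketch (gap_summary is not part of SNPBlaster), and the positions are invented; for comparison, the chromosome 4 run in section 3.2 gave an average separation of about 1.35MB with a 1MB window.

```python
from statistics import mean
from typing import Dict, List


def gap_summary(positions: List[int], window: int) -> Dict[str, float]:
    """Minimum, maximum and mean gap between adjacent selected SNPs."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return {
        "min": min(gaps),
        "max": max(gaps),
        "mean": mean(gaps),
        "mean_in_windows": mean(gaps) / window,  # mean gap as a multiple of the window
    }


WINDOW = 1_000_000
selected = [0, 1_100_000, 2_900_000, 4_300_000, 5_600_000]  # invented positions

summary = gap_summary(selected, WINDOW)
print(summary)

# With enough coverage, every gap is expected to lie in [window, 2 * window).
assert WINDOW <= summary["min"] and summary["max"] < 2 * WINDOW
```

A mean gap well above 1.5 times the window size, or a maximum gap of twice the window size or more, would suggest regions where coverage in the input was too sparse.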
An increase in the number of iterative divisions is likely to increase the average distance between SNPs, as it is less likely that two SNPs within one of the weight ranges will be close to each other.

4.6 Other Applications

While the algorithm was designed for the purpose of removing SNPs in order to obtain a panel of useful markers for further studies, attempts have been made to make the algorithm as generic as possible. There is potential for similar processes to be carried out in other areas: anywhere in which a well-distributed selection is required along a linear track, and there are a limited number of choices along that track.

References

Lee, S. and Kang, C. (2004). CHOISS for selection of single nucleotide polymorphism markers on interval regularity, Bioinformatics 20(4):

Liu, Z. and Lin, S. (2005). Multilocus LD measure and tagging SNP selection with generalized mutual information, Genetic Epidemiology ?(?): 1-10?

Sebastiani, P., Lazarus, R., Weiss, S. T., Kunkel, L. M., Kohane, I. S. and Ramoni, M. F. (2003). Minimal haplotype tagging, PNAS 100(17):

The International HapMap Consortium (2003). The International HapMap Project, Nature 426:
More informationOnline Resource 6. Estimating the required sample size
Online Resource 6. Estimating the required sample size Power calculations help program managers and evaluators estimate the required sample size that is large enough to provide sufficient statistical power
More informationDNA PHENOTYPING: PREDICTING ANCESTRY AND PHYSICAL APPEARANCE FROM FORENSIC DNA
DNA PHENOTYPING: PREDICTING ANCESTRY AND PHYSICAL APPEARANCE FROM FORENSIC DNA Ellen McRae Greytak, PhD* and Steven Armentrout, PhD Parabon NanoLabs, Inc., 11260 Roger Bacon Dr., Suite 406, Reston, VA
More informationFocusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
More informationResearch Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement
Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.
More informationAsexual Versus Sexual Reproduction in Genetic Algorithms 1
Asexual Versus Sexual Reproduction in Genetic Algorithms Wendy Ann Deslauriers (wendyd@alumni.princeton.edu) Institute of Cognitive Science,Room 22, Dunton Tower Carleton University, 25 Colonel By Drive
More informationMinitab Guide. This packet contains: A Friendly Guide to Minitab. Minitab StepByStep
Minitab Guide This packet contains: A Friendly Guide to Minitab An introduction to Minitab; including basic Minitab functions, how to create sets of data, and how to create and edit graphs of different
More informationLab 4: 26 th March 2012. Exercise 1: Evolutionary algorithms
Lab 4: 26 th March 2012 Exercise 1: Evolutionary algorithms 1. Found a problem where EAs would certainly perform very poorly compared to alternative approaches. Explain why. Suppose that we want to find
More informationHigh Throughput Testing (HTT) Overview of ProTest and Praxis
High Throughput Testing (HTT) Overview of ProTest and Praxis HTT Overview High Throughput Testing (HTT) is a new technology which provides a solution to the problem of excessive test cases and/or poorly
More informationThe effect of population history on the distribution of the Tajima s D statistic
The effect of population history on the distribution of the Tajima s D statistic Deena Schmidt and John Pool May 17, 2002 Abstract The Tajima s D test measures the allele frequency distribution of nucleotide
More information1 One Dimensional Horizontal Motion Position vs. time Velocity vs. time
PHY132 Experiment 1 One Dimensional Horizontal Motion Position vs. time Velocity vs. time One of the most effective methods of describing motion is to plot graphs of distance, velocity, and acceleration
More informationAssessment Schedule 2013 Biology: Demonstrate understanding of genetic variation and change (91157)
NCEA Level 2 Biology (91157) 2013 page 1 of 5 Assessment Schedule 2013 Biology: Demonstrate understanding of genetic variation and change (91157) Assessment Criteria with with Excellence Demonstrate understanding
More informationA Robust Method for Solving Transcendental Equations
www.ijcsi.org 413 A Robust Method for Solving Transcendental Equations Md. Golam Moazzam, Amita Chakraborty and Md. AlAmin Bhuiyan Department of Computer Science and Engineering, Jahangirnagar University,
More informationGenetic diagnostics the gateway to personalized medicine
Micronova 20.11.2012 Genetic diagnostics the gateway to personalized medicine Kristiina Assoc. professor, Director of Genetic Department HUSLAB, Helsinki University Central Hospital The Human Genome Packed
More informationHints for Success on the AP Statistics Exam. (Compiled by Zack Bigner)
Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner) The Exam The AP Stat exam has 2 sections that take 90 minutes each. The first section is 40 multiple choice questions, and the second
More informationSample Size Determination
Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)
More informationCAP BIOINFORMATICS SuShing Chen CISE. 10/5/2005 SuShing Chen, CISE 1
CAP 55108 BIOINFORMATICS SuShing Chen CISE 10/5/2005 SuShing Chen, CISE 1 Genomic Mapping & Mapping Databases High resolution, genomewide maps of DNA markers. Integrated maps, genome catalogs and comprehensive
More informationSNP Data Integration and Analysis for Drug Response Biomarker Discovery
B. Comp Dissertation SNP Data Integration and Analysis for Drug Response Biomarker Discovery By Chen Jieqi Pauline Department of Computer Science School of Computing National University of Singapore 2008/2009
More information(Refer Slide Time: 00:00:56 min)
Numerical Methods and Computation Prof. S.R.K. Iyengar Department of Mathematics Indian Institute of Technology, Delhi Lecture No # 3 Solution of Nonlinear Algebraic Equations (Continued) (Refer Slide
More informationMemory Allocation Technique for Segregated Free List Based on Genetic Algorithm
Journal of AlNahrain University Vol.15 (2), June, 2012, pp.161168 Science Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Manal F. Younis Computer Department, College
More informationBASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi110 012 seema@iasri.res.in Genomics A genome is an organism s
More informationAssessment Schedule 2014 Biology: Demonstrate understanding of genetic variation and change (91157) Evidence Statement
NCEA Level 2 Biology (91157) 2014 page 1 of 5 Assessment Schedule 2014 Biology: Demonstrate understanding of genetic variation and change (91157) Evidence Statement NCEA Level 2 Biology (91157) 2014 page
More informationGenetic Drift Simulation. Experimental Question: How do random events cause evolution (a change in the gene pool)?
Genetic Drift Simulation Experimental Question: How do random events cause evolution (a change in the gene pool)? Hypothesis: Introduction: What is Genetic Drift? Let's examine a simple model of a population
More informationMonotone Partitioning. Polygon Partitioning. Monotone polygons. Monotone polygons. Monotone Partitioning. ! Define monotonicity
Monotone Partitioning! Define monotonicity Polygon Partitioning Monotone Partitioning! Triangulate monotone polygons in linear time! Partition a polygon into monotone pieces Monotone polygons! Definition
More informationCOMPLEX GENETIC DISEASES
COMPLEX GENETIC DISEASES Date: Sept 28, 2005* Time: 9:30 am 10:20 am* Room: G202 Biomolecular Building Lecturer: David Threadgill 4340 Biomolecular Building dwt@med.unc.edu Office Hours: by appointment
More informationOneSample ttest. Example 1: Mortgage Process Time. Problem. Data set. Data collection. Tools
OneSample ttest Example 1: Mortgage Process Time Problem A faster loan processing time produces higher productivity and greater customer satisfaction. A financial services institution wants to establish
More informationApplication Note. Introduction AN2395/D 12/2002. PC Master Software Usage
Application Note 12/2002 PC Master Software Usage By Milan Brejl and Pavel Kania S 3 L Applications Engineerings MCSL Roznov pod Radhostem Introduction The PC master software is a PC Windows based application
More informationConfidence Intervals for One Standard Deviation Using Standard Deviation
Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from
More informationCASSI: GenomeWide Interaction Analysis Software
CASSI: GenomeWide Interaction Analysis Software 1 Contents 1 Introduction 3 2 Installation 3 3 Using CASSI 3 3.1 Input Files................................... 4 3.2 Options....................................
More informationUse Excel to Analyse Data. Use Excel to Analyse Data
Introduction This workbook accompanies the computer skills training workshop. The trainer will demonstrate each skill and refer you to the relevant page at the appropriate time. This workbook can also
More informationChapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
More informationSOP 3 v2: webbased selection of oligonucleotide primer trios for genotyping of human and mouse polymorphisms
W548 W552 Nucleic Acids Research, 2005, Vol. 33, Web Server issue doi:10.1093/nar/gki483 SOP 3 v2: webbased selection of oligonucleotide primer trios for genotyping of human and mouse polymorphisms Steven
More informationPhasing the Chromosomes of a Family Group When One Parent is Missing
Journal of Genetic Genealogy, 6(1), 2010 Phasing the Chromosomes of a Family Group When One Parent is Missing T. Whit Athey Abstract A technique is presented for the phasing of sets of SNP data collected
More informationExpected values, standard errors, Central Limit Theorem. Statistical inference
Expected values, standard errors, Central Limit Theorem FPP 1618 Statistical inference Up to this point we have focused primarily on exploratory statistical analysis We know dive into the realm of statistical
More information3 An Illustrative Example
Objectives An Illustrative Example Objectives  Theory and Examples 2 Problem Statement 2 Perceptron  TwoInput Case 4 Pattern Recognition Example 5 Hamming Network 8 Feedforward Layer 8 Recurrent
More informationUsing CrunchIt (http://bcs.whfreeman.com/crunchit/bps4e) or StatCrunch (www.calvin.edu/go/statcrunch)
Using CrunchIt (http://bcs.whfreeman.com/crunchit/bps4e) or StatCrunch (www.calvin.edu/go/statcrunch) 1. In general, this package is far easier to use than many statistical packages. Every so often, however,
More informationCCCR Outreach FAQ and User Manual
CCCR Outreach FAQ and User Manual Q.1 What is the CCCR Outreach Application used for? The CCCR Outreach Application is a Web interface for displaying data. The CCCR Outreach Application Access enables
More informationBill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce
More informationY Chromosome Markers
Y Chromosome Markers Lineage Markers Autosomal chromosomes recombine with each meiosis Y and Mitochondrial DNA does not This means that the Y and mtdna remains constant from generation to generation Except
More informationWeek 4: Standard Error and Confidence Intervals
Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.
More information