# Simon Tavarg Statistics Department Colorado State University Fort Collins, CO ABSTRACT

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Simon Tavarg Statistics Department Colorado State University Fort Collins, CO ABSTRACT Some models for the estimation of substitution rates from pairs of functionally homologous DNA sequences are compared. A novel feature here, motivated by the observed asymmetry in the data, is that the substituion processes in each arm of the tree are allowed to differ. The statistical method involves maximum likelihood estimation for multinomial trials, whose underlying cell probabilities are determined by cont-inuous-time Markov chains. I. INTRODUCTION Consider two functionally homologous nucleotide sequences of length n, taken one from each of two species. The sequences are aligned, to give data of the form: A T T C G A A Species Y: G T T C..... G A T. Species X: We form a contingency table, with entries {N } given JY ij N ij = number of times an aligned site has a base of type i in species X, and of type j in species Y; the bases A, T, C, G are labelled 1, 2, 3, 4 respectively. Now assume that the two species in question diverged from a common ancestor T years ago, and that after divergence the two species behaved independently, the bases in each

2 90 sequence being changed through time by the substitution of one base for another. Under these assumptions, the probability f that a ij site has a base of type i in species X, and j in species Y is where sc is the probability that the ancestral base is e, X Y and pei (resp., pci) is the probability that in species X (resp., Y), a base C at divergence is of type i a time T later. Most authors have used a Markov model to specify X the probabilities {pei}, so that where Qx is the generator of the (irreducible) X-process.. In addition, it is assumed that (a) Qx = Q, -: Q, implying that Px = py (b) E'Q = &I, implying that == ( S ~, S ~, S ~ ~ (1.3) S ~ ) ~ is the stationary distribution of Q. See Lanave et al. (1984), Tajima and Nei (1984) for some recent work in this area. The assumption (1.3a) above means that the cell probability matrix F = {fij} is symmetric; the data matrix N should be consistent with such symmetry. There is ample evidence that this is not the case for many pairs of sequences, particularly for those arising from the third codon positions; Tavarg (1985). This paper therefore focuses on methods of estimation for models like (1.2) in which (a) Qx and Q, may be different. (1.4) (b) s is not necessarily the stationary distribution of either Q, or Q,. 11. MODELS AND STATISTICAL METHODS The cell probability matrix F is 15-dimensional, and a general Q-matrix in (1.2) is 12-dimensional. Thus the most general model will have dimension = 27..

3 91 Thus we need to restrict the form of the Q-matrices to be used. There are several candidates that might be useful; this paper focusses on just two possibilities. Model (K) (Kimura (198l), Gojobori et al. (1982)) Here, the Q-matrices are of the form A T C G where the dlagonal elements are determined by Ql-= 0. Model (TK) (Takahata and Kimura (1981)) A T C G where once more Ql-= &determines the diagonal elements. In both of these examples, all the parameters indicated must be positive. The statistical problem is to estlmate the parameters of Q and Q, (and possibly the X inltial dlstrlbutlon %if thls 1s assumed unknown) using the data matrix N, and cell probabilities determined by (1.11, (1.2) and (1.4). Assuming that sites evolve independently of one another, the data matrix N has the structure of a 16-cell multinomlal trlals experiment, and we wlsh to estimate the parameters x = (x..., x ) of the cell probabilities F = 1' P {fij(cl)g If %must be estimated, then p = 15 for model (K), and p = 11 for model (TK). We chose to estimate the parameters by maximum likelihood (or, equivalently, 2 minimum Y l methods. The theory of maximum likelihood estimation in multinomial trials is, of course, well docdmented; a good review is given by Cox (1984). We also estimated the asymptotic variance-covariance matrix of the estimates for large sample size (= sequence length), n..

4 92 The parameter of particular interest here is the mean number Kx (resp., 5) of substitutions in the X (resp., Y) species in the interval [0, TI. Recall that if a Markov chain has generator Q = {qij} with qi := -qii, and initial distribution then the mean number of changes of state in [O, TI is Notice that if =is a stationary distribution for Q, then (2.1) reduces to Ke = T I siqi, i the 'el denoting equilibrium value. Notice also that if a Markov chain has generator QT, then the mean number of jumps it makes in [O, 11 is given by (2.1), with T = 1, Q = QT. The parameter T is confounded in this estimation problem; from now on, we absorb T into Q, and so take T = 1 in (2.1) and (2.2). In the present context, then, we want to use our estimates of the parameters in Q, Q, and s to estimate KX and Ky, and their joint asymptotic - distribution COMPUTATIONAL ASPECTS The maximum likelihood estimates of the parameters of the model may be found using a constrained optimisation package. The constraints involve positivity of the parameters in the two Q-matrices, and, if the initial probabilities =are to be estimated, then 0 s s1 + s2 + s 3 1; sl, S 2' s 2 0 must hold. Our approach was to transform out the constraints, converting the problem into an unconstrained one, and then use one of the IMSL optimlsation algorithms, ZXMIN or ZXSSQ. From the point of view of asymptotic theory needed here, it seems to be very hard to prove that a unique solution of the likelihood equations exists. Further, the computational work often found solutions in which some elements of the Q-matrices were (algorithmically) zero. The approach adopted here was an exploratory one: We started the algorithm from several initial positions, and compared the results. Any parameters in the Q-matrices that were computationally zero (say, I were set to

5 93 zero, and not used as parameters. This reduces the dimension p of the problem, and thus increases the degrees of freedom for the goodness of fit test of. the fitted model to the data. All derivatives were calculated using multipoint forward difference formulae (to avoid negative parameter values), as no expliclt formulae for such derivatives seem to be useful. The values of the cell probabilities involve calculation of matrix exponentials, exp{qx} and exp{qy); recall (1.1) and (1.2). For the model (K), we used the exact results of Gojobori et al. (1982). For model (TK), we used a diagonalisation method, falling back on an efficient series algorithm if this faflee. IV. AN EXAMPLE The data here are taken from the EMBL sequence library. The sequences are from bovine (X) and mouse (Y) mitochondrial genomes [Anderson et al. (1982), Bibb et al. (198l)], and come from the sequence of third base positions of the genes cytochrome B, cytochrome oxidase I, I1 and 111, and Atpase 6. The base length is n =l601, and the data matrix N is given by A T C G C G We ran five repetitions of the program, obtaining the same solutions for each run. For the model (K), the estimated Q-matrlces were Qx=( , ,097.OS6.056.O O -.I12.056, O O O O while for model (TK), the estimated Q-matrices were

6 ( x Q ( :::') ) o,041.O ,087 Qy=.O , O O The estimated base composition of the ancestral sequence was (in order A, T, C, G) for model (K): (.406,.070,,520,.004), while for model (TK): (.351,.039,.604,.006). The estimated average number of substitutions since divergence is given via (2.1) by: Kx f std. error Ky f std. error Model (K).463 f f.027 Model (TK).401 f f.051 The average number of substitutions per site, KX + Ky, was estimated as: (KX + Yy) f std. error Model (K) 904 f.042 Model (TK).807 f.043 For model (K), the goodness-of-fit statistic was y 2 = 18.3 (6 df.) for model (K), and y2 = 11.5 (5 df.), both of which are reasonable. The mean number of substitutions per site as estimated by either method is similar to estimates obtained for models in which assumptions (1.3) hold. For example, the reversible model of Tavare (1985) or Lanave et al. (1984) gave 1.09 f.13. In this last case, the estimate of the ancestral composition was (.436,.231,.296,.037), which should be contrasted with the estimates from the asymmetric models. Finally, despite the asymmetric shape of N, the estimated Q-matrices are qualitatively similar, and no significant difference can be found between KX and Ky. However, the asymptotic substitution rate Ke in (2.3) does differ significantly between the species: KX e f std. error \$ f std error Model (K) Model (TK).311 f f, f f.034 A more extensive study of the substitution process - among a variety of mitochondrial sequences is being prepared; this will also discuss problems of heterogeneity (due to almalgamating different coding regions, or due to

7 heterogeneity within a single coding region), the estimation of transition and transversion rates, and the independence of bases assumption used here. IV CONCLUSIONS Thls note has focused on statistical methods for the estimation of substitution rates from pairs of DNA sequences. The analysis has allowed for the observed asymmetry in the data. The estimation methods are computational in nature, rather than the analytic "method-of-moments" approaches usually used. It should be pointed out the the restriction to a small class of models was necessary because in the two-sequence case only 15 degrees of freedom are available. The methods developed here are also useful for analysing multiple-sequence data sets, in which general models of the type (1.2) may be fitted. REFERENCES Anderson, S., H. L. DeBruijn, A. R. Coulson, I. C. Eperon, F. Sanger and I. G. Young (1982). Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J. Mol. Biol., 156, Bibb, M. J., R. A. Van Etten, C. T. Wright, M. W. Walberg and D. A. Clayton (1981). Sequence and gene organization of mouse mitochondrial DNA. Cell. 26, Cox, C. (1984). An elementary introduction to maximum likelihood estimation for multinomial models: Birch's theorem and the delta method. Ameri can Statistician, 38, Gojobori, T., K. Ishii and M. Nei (1982). Estimation of average number of nucleotide substitutions when the rate of substitution varies with nucleotide. J. Mol. Evol., ls, Kimura, M. (1981). Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Natl. Acad. Sci. 78, Lanave, C.8 G. Preparata, C. Saccone and G. Serio (1984). - A new method for calculating evolutionary substitution rates. J. Mol. Evol., 20,

8 . 96 Tajima, F., and M. Nei (1984). Estimation of evolutionary distance between nucleotide sequences. Mol. Biol. Evol., Takahata, N., and M. Kimura (1981). A model of evolutionary base substitutions and Its application with special reference to rapid change of pseudogenes. Genetics, 98, Tavare, / S. (1985). Some probabilistic and statistical problems in the analysis of DNA sequences. In "Lectures on mathematics in the life sciences", Vol. 17. Amer. Math. SOC., in press.

### MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788

MATCH Communications in Mathematical and in Computer Chemistry MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788 ISSN 0340-6253 Three distances for rapid similarity analysis of DNA sequences Wei Chen,

### How to Build a Phylogenetic Tree

How to Build a Phylogenetic Tree Phylogenetics tree is a structure in which species are arranged on branches that link them according to their relationship and/or evolutionary descent. A typical rooted

### Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1

Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1 Ziheng Yang Department of Animal Science, Beijing Agricultural University Felsenstein s maximum-likelihood

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Some ergodic theorems of linear systems of interacting diffusions

Some ergodic theorems of linear systems of interacting diffusions 4], Â_ ŒÆ êæ ÆÆ Nov, 2009, uà ŒÆ liuyong@math.pku.edu.cn yangfx@math.pku.edu.cn 1 1 30 1 The ergodic theory of interacting systems has

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need

### (Quasi-)Newton methods

(Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

### Optimization of Gene Prediction via More Accurate Phylogenetic Substitution Models

Washington University in St. Louis Washington University Open Scholarship All Computer Science and Engineering Research Computer Science and Engineering Report Number: WUCSE-2011-78 2011 Optimization of

### Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

### Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations

Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations SCENARIO You have responded, as a result of a call from the police to the Coroner s Office, to the scene of the death of

### Elasticity Theory Basics

G22.3033-002: Topics in Computer Graphics: Lecture #7 Geometric Modeling New York University Elasticity Theory Basics Lecture #7: 20 October 2003 Lecturer: Denis Zorin Scribe: Adrian Secord, Yotam Gingold

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### A branch-and-bound algorithm for the inference of ancestral. amino-acid sequences when the replacement rate varies among

A branch-and-bound algorithm for the inference of ancestral amino-acid sequences when the replacement rate varies among sites Tal Pupko 1,*, Itsik Pe er 2, Masami Hasegawa 1, Dan Graur 3, and Nir Friedman

### POWER SETS AND RELATIONS

POWER SETS AND RELATIONS L. MARIZZA A. BAILEY 1. The Power Set Now that we have defined sets as best we can, we can consider a sets of sets. If we were to assume nothing, except the existence of the empty

### Building a Smooth Yield Curve. University of Chicago. Jeff Greco

Building a Smooth Yield Curve University of Chicago Jeff Greco email: jgreco@math.uchicago.edu Preliminaries As before, we will use continuously compounding Act/365 rates for both the zero coupon rates

### BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

### Equilibria and Dynamics of. Supply Chain Network Competition with Information Asymmetry. and Minimum Quality Standards

Equilibria and Dynamics of Supply Chain Network Competition with Information Asymmetry in Quality and Minimum Quality Standards Anna Nagurney John F. Smith Memorial Professor and Dong (Michelle) Li Doctoral

### Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### Lecture 8 : Coordinate Geometry. The coordinate plane The points on a line can be referenced if we choose an origin and a unit of 20

Lecture 8 : Coordinate Geometry The coordinate plane The points on a line can be referenced if we choose an origin and a unit of 0 distance on the axis and give each point an identity on the corresponding

### Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Hidden Markov Models

8.47 Introduction to omputational Molecular Biology Lecture 7: November 4, 2004 Scribe: Han-Pang hiu Lecturer: Ross Lippert Editor: Russ ox Hidden Markov Models The G island phenomenon The nucleotide frequencies

### Review for Calculus Rational Functions, Logarithms & Exponentials

Definition and Domain of Rational Functions A rational function is defined as the quotient of two polynomial functions. F(x) = P(x) / Q(x) The domain of F is the set of all real numbers except those for

### Solving Systems of Linear Equations

LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how

### Basics of Polynomial Theory

3 Basics of Polynomial Theory 3.1 Polynomial Equations In geodesy and geoinformatics, most observations are related to unknowns parameters through equations of algebraic (polynomial) type. In cases where

### Lecture 2: Universality

CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal

### A Brief Study of the Nurse Scheduling Problem (NSP)

A Brief Study of the Nurse Scheduling Problem (NSP) Lizzy Augustine, Morgan Faer, Andreas Kavountzis, Reema Patel Submitted Tuesday December 15, 2009 0. Introduction and Background Our interest in the

### 1 VECTOR SPACES AND SUBSPACES

1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

### DNA Insertions and Deletions in the Human Genome. Philipp W. Messer

DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.

### A teaching experience through the development of hypertexts and object oriented software.

A teaching experience through the development of hypertexts and object oriented software. Marcello Chiodi Istituto di Statistica. Facoltà di Economia-Palermo-Italy 1. Introduction (1) This paper is concerned

### SOLVING POLYNOMIAL EQUATIONS

C SOLVING POLYNOMIAL EQUATIONS We will assume in this appendix that you know how to divide polynomials using long division and synthetic division. If you need to review those techniques, refer to an algebra

### A comparison of methods for estimating the transition:transversion ratio from DNA sequences

Molecular Phylogenetics and Evolution 32 (2004) 495 503 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev A comparison of methods for estimating the transition:transversion ratio from

### Worksheet - COMPARATIVE MAPPING 1

Worksheet - COMPARATIVE MAPPING 1 The arrangement of genes and other DNA markers is compared between species in Comparative genome mapping. As early as 1915, the geneticist J.B.S Haldane reported that

### Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office

### Analysis of a Production/Inventory System with Multiple Retailers

Analysis of a Production/Inventory System with Multiple Retailers Ann M. Noblesse 1, Robert N. Boute 1,2, Marc R. Lambrecht 1, Benny Van Houdt 3 1 Research Center for Operations Management, University

### Network Protocol Analysis using Bioinformatics Algorithms

Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### The Inverse of a Square Matrix

These notes closely follow the presentation of the material given in David C Lay s textbook Linear Algebra and its Applications (3rd edition) These notes are intended primarily for in-class presentation

### #1 Make sense of problems and persevere in solving them.

#1 Make sense of problems and persevere in solving them. 1. Make sense of problems and persevere in solving them. Interpret and make meaning of the problem looking for starting points. Analyze what is

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Solving Systems of Linear Equations

LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how

### Exam Introduction Mathematical Finance and Insurance

Exam Introduction Mathematical Finance and Insurance Date: January 8, 2013. Duration: 3 hours. This is a closed-book exam. The exam does not use scrap cards. Simple calculators are allowed. The questions

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Inner Product Spaces

Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### A Brief Introduction to Systems Biology: Gene Regulatory Networks Rajat K. De

A Brief Introduction to Systems Biology: Gene Regulatory Networks Rajat K. De Machine Intelligence Unit, Indian Statistical Institute 203 B. T. Road Kolkata 700108 email id: rajat@isical.ac.in 1 Informatics

### Lecture 1: Systems of Linear Equations

MTH Elementary Matrix Algebra Professor Chao Huang Department of Mathematics and Statistics Wright State University Lecture 1 Systems of Linear Equations ² Systems of two linear equations with two variables

### A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML

9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### DETERMINANTS. b 2. x 2

DETERMINANTS 1 Systems of two equations in two unknowns A system of two equations in two unknowns has the form a 11 x 1 + a 12 x 2 = b 1 a 21 x 1 + a 22 x 2 = b 2 This can be written more concisely in

### Nonlinear Programming Methods.S2 Quadratic Programming

Nonlinear Programming Methods.S2 Quadratic Programming Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard A linearly constrained optimization problem with a quadratic objective

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February

### MATHEMATICS (CLASSES XI XII)

MATHEMATICS (CLASSES XI XII) General Guidelines (i) All concepts/identities must be illustrated by situational examples. (ii) The language of word problems must be clear, simple and unambiguous. (iii)

### 1 Review of Newton Polynomials

cs: introduction to numerical analysis 0/0/0 Lecture 8: Polynomial Interpolation: Using Newton Polynomials and Error Analysis Instructor: Professor Amos Ron Scribes: Giordano Fusco, Mark Cowlishaw, Nathanael

### Estimation of the Number of Nucleotide Substitutions in the Control Region of Mitochondrial DNA in Humans and Chimpanzees

Estimation of the Number of Nucleotide Substitutions in the Control Region of Mitochondrial DNA in Humans and Chimpanzees Koichiro Tamura and Masatoshi Nei Department of Biology and Institute of Molecular

### a 1 x + a 0 =0. (3) ax 2 + bx + c =0. (4)

ROOTS OF POLYNOMIAL EQUATIONS In this unit we discuss polynomial equations. A polynomial in x of degree n, where n 0 is an integer, is an expression of the form P n (x) =a n x n + a n 1 x n 1 + + a 1 x

### Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products

Chapter 3 Cartesian Products and Relations The material in this chapter is the first real encounter with abstraction. Relations are very general thing they are a special type of subset. After introducing

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

### Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene

Analysis of DNA sequence data p. 1 Analysis of DNA sequence data using MEGA and DNAsp. Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene The first two computer classes

### The basic unit in matrix algebra is a matrix, generally expressed as: a 11 a 12. a 13 A = a 21 a 22 a 23

(copyright by Scott M Lynch, February 2003) Brief Matrix Algebra Review (Soc 504) Matrix algebra is a form of mathematics that allows compact notation for, and mathematical manipulation of, high-dimensional

### TOPIC 3: CONTINUITY OF FUNCTIONS

TOPIC 3: CONTINUITY OF FUNCTIONS. Absolute value We work in the field of real numbers, R. For the study of the properties of functions we need the concept of absolute value of a number. Definition.. Let

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

### Computing divisors and common multiples of quasi-linear ordinary differential equations

Computing divisors and common multiples of quasi-linear ordinary differential equations Dima Grigoriev CNRS, Mathématiques, Université de Lille Villeneuve d Ascq, 59655, France Dmitry.Grigoryev@math.univ-lille1.fr

### Confidence intervals, t tests, P values

Confidence intervals, t tests, P values Joe Felsenstein Department of Genome Sciences and Department of Biology Confidence intervals, t tests, P values p.1/31 Normality Everybody believes in the normal

### 1 if 1 x 0 1 if 0 x 1

Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

### Coefficient of Potential and Capacitance

Coefficient of Potential and Capacitance Lecture 12: Electromagnetic Theory Professor D. K. Ghosh, Physics Department, I.I.T., Bombay We know that inside a conductor there is no electric field and that

### THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

### Solving Mass Balances using Matrix Algebra

Page: 1 Alex Doll, P.Eng, Alex G Doll Consulting Ltd. http://www.agdconsulting.ca Abstract Matrix Algebra, also known as linear algebra, is well suited to solving material balance problems encountered

### Monte Carlo-based statistical methods (MASM11/FMS091)

Monte Carlo-based statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February 5, 2013 J. Olsson Monte Carlo-based

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### Systems of Linear Equations

Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

### Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances

Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances It is possible to construct a matrix X of Cartesian coordinates of points in Euclidean space when we know the Euclidean

### Deriving character tables: Where do all the numbers come from?

3.4. Characters and Character Tables 3.4.1. Deriving character tables: Where do all the numbers come from? A general and rigorous method for deriving character tables is based on five theorems which in

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

### Final Project Report

CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

### Amino Acids and Their Properties

Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that

### III. Reaction Kinetics

III. Reaction Kinetics Lecture 13: Butler-Volmer equation Notes by ChangHoon Lim (and MZB) 1. Interfacial Equilibrium At lecture 11, the reaction rate R for the general Faradaic half-cell reaction was

### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

This material is posted here with permission of the IEEE Such permission of the IEEE does not in any way imply IEEE endorsement of any of Helsinki University of Technology's products or services Internal

### DnaSP, DNA polymorphism analyses by the coalescent and other methods.

DnaSP, DNA polymorphism analyses by the coalescent and other methods. Author affiliation: Julio Rozas 1, *, Juan C. Sánchez-DelBarrio 2,3, Xavier Messeguer 2 and Ricardo Rozas 1 1 Departament de Genètica,

### , for x = 0, 1, 2, 3,... (4.1) (1 + 1/n) n = 2.71828... b x /x! = e b, x=0

Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after Simeon-Denis Poisson (1781 1840). In addition, poisson is French for fish. In this chapter we will

### NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

### 4: SINGLE-PERIOD MARKET MODELS

4: SINGLE-PERIOD MARKET MODELS Ben Goldys and Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2015 B. Goldys and M. Rutkowski (USydney) Slides 4: Single-Period Market

### On the representability of the bi-uniform matroid

On the representability of the bi-uniform matroid Simeon Ball, Carles Padró, Zsuzsa Weiner and Chaoping Xing August 3, 2012 Abstract Every bi-uniform matroid is representable over all sufficiently large

### CSE8393 Introduction to Bioinformatics Lecture 3: More problems, Global Alignment. DNA sequencing

SE8393 Introduction to Bioinformatics Lecture 3: More problems, Global lignment DN sequencing Recall that in biological experiments only relatively short segments of the DN can be investigated. To investigate

### Cofactor Expansion: Cramer s Rule

Cofactor Expansion: Cramer s Rule MATH 322, Linear Algebra I J. Robert Buchanan Department of Mathematics Spring 2015 Introduction Today we will focus on developing: an efficient method for calculating

### Sensitivity analysis of utility based prices and risk-tolerance wealth processes

Sensitivity analysis of utility based prices and risk-tolerance wealth processes Dmitry Kramkov, Carnegie Mellon University Based on a paper with Mihai Sirbu from Columbia University Math Finance Seminar,

### Chapter 4 One Dimensional Kinematics

Chapter 4 One Dimensional Kinematics 41 Introduction 1 4 Position, Time Interval, Displacement 41 Position 4 Time Interval 43 Displacement 43 Velocity 3 431 Average Velocity 3 433 Instantaneous Velocity

### Epipolar Geometry. Readings: See Sections 10.1 and 15.6 of Forsyth and Ponce. Right Image. Left Image. e(p ) Epipolar Lines. e(q ) q R.

Epipolar Geometry We consider two perspective images of a scene as taken from a stereo pair of cameras (or equivalently, assume the scene is rigid and imaged with a single camera from two different locations).

### CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples