CONSTRUCTING PAM MATRICES


WINFRIED JUST

In this note we will use several different matrices to describe the same phenomenon, so we must be very careful about distinguishing them by name. The standing assumption will be that we have a fixed alphabet of characters $\{c_1, c_2, \ldots, c_n\}$, and that we are looking at a process (evolution in our case) by which some of these characters mutate into each other. We will assume that for a fixed unit of time $T$ the probability that character $c_i$ mutates into character $c_j$ is a fixed number $m_{ij}$. The matrix $M = [m_{ij}]_{1 \le i,j \le n}$ that lists these probabilities will be referred to as the mutation probability matrix. Note that the mutation probability matrix has two important properties: (1) all entries are nonnegative numbers and (2) the sum of the numbers in each row is 1. Thus $M$ is a stochastic matrix.

There is another underlying assumption here, namely that the probability that character $c_i$ mutates into character $c_j$ over a time interval $T$ does not depend on the prior history of the process. Thus we model evolution as a Markov process. In general, a (discrete time) Markov process or Markov chain traces a system through a sequence of steps. At each step the system can be in one of a fixed number of states. In our case, the states are the characters $c_1, \ldots, c_n$. If the system is in state $i$, then it switches to state $j$ with probability $t_{ij}$, called the transition probability. Note that the matrix of transition probabilities is exactly the same thing that we called the matrix of mutation probabilities.

Here is an interesting observation: If $M$ is a stochastic $n \times n$ matrix, then there is usually a unique vector $[q_1, \ldots, q_n]$ such that

(1)    $[q_1, \ldots, q_n] \cdot M = [q_1, \ldots, q_n]$

and

(2)    $\lim_{k \to \infty} M^k = \begin{pmatrix} q_1 & q_2 & \cdots & q_n \\ q_1 & q_2 & \cdots & q_n \\ \vdots & \vdots & & \vdots \\ q_1 & q_2 & \cdots & q_n \end{pmatrix}.$

The operation $\cdot$ is matrix multiplication; $M^k$ is obtained by multiplying $M$ with itself $k$ times. For our purposes it is not necessary to know the details of how matrix multiplication is performed; it suffices to know that it is a fairly standard operation that can be done on any reasonably powerful graphing calculator or computer algebra system, such as MATLAB.

The vector $[q_1, \ldots, q_n]$ in the above observation is called the steady state or equilibrium vector of the process. It has a very intuitive interpretation: The matrix multiplication equation $[p_1, \ldots, p_n] \cdot M = [r_1, \ldots, r_n]$ says that if we have a long sequence of characters in which $c_1$ is present in proportion $p_1$, $c_2$ in proportion $p_2$, ..., $c_n$ in proportion $p_n$, and we let the sequence evolve for $T$ units of time, then in the resulting sequence $c_1$ will be present in proportion $r_1$, $c_2$ in proportion $r_2$, ..., $c_n$ in proportion $r_n$. Thus equation (1) tells us that evolution will not change the proportions of characters as long as these proportions are in the steady state. Equation (2) tells us that if we start in any state and let the process run for enough steps, it will spend a proportion of about $q_i$ of the time in state $i$. Alternatively, if we start with a long enough sequence of characters and let it evolve long enough, then the proportions of characters in the sequence will get very close to the numbers in the steady state vector. For this reason, we will refer to the numbers $q_i$ in the equilibrium vector as the target frequencies of the process.
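To make this observation concrete, here is a small numerical sketch. The 2-state stochastic matrix below is an illustrative stand-in (it is not taken from the note); the sketch shows that high powers of a stochastic matrix converge to a matrix whose rows all equal the equilibrium vector, and that this vector can also be found directly as a left eigenvector for eigenvalue 1.

```python
import numpy as np

# Illustrative 2-state stochastic matrix (an assumed example, not from the note).
M = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Equation (2): high powers of M have (nearly) identical rows.
M_high = np.linalg.matrix_power(M, 100)
print(M_high)            # every row is approximately the equilibrium vector

# Equation (1): the equilibrium vector q satisfies q @ M = q.
# It is a left eigenvector of M for eigenvalue 1, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(M.T)   # left eigenvectors of M = eigenvectors of M^T
q = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
q = q / q.sum()
print(q)                 # approximately [0.667, 0.333] for this example
print(q @ M)             # equals q, up to rounding
```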

This leads us to an assumption that often underlies the construction of scoring matrices for sequence comparison and alignment and that is seldom clearly spelled out: It is assumed that the observed character frequencies in the data set that is used to construct the scoring matrix are the target frequencies of the mutation probability matrix for the sequences to be scored.

A Markov process is reversible if it looks the same when run forwards or backwards. For example, the process described by matrix (3) below is obviously reversible, while the process described by matrix (4) is not.

(3), (4)    [two example stochastic matrices; their numerical entries are omitted in this transcription]

Theorem 1. Let $M$ be a mutation probability matrix with target frequencies $[q_1, \ldots, q_n]$. Then $M$ describes a reversible Markov process if and only if for all $1 \le i, j \le n$ we have

(5)    $q_i m_{ij} = q_j m_{ji}$.

A priori there seems to be no biological reason to assume that molecular evolution is a reversible Markov process. However, the assumption of time-reversibility simplifies the study of molecular evolution (see e.g. [5], page 69), and it is being made for this very reason. The assumption is almost certainly wrong. However, as long as we always compare two sequences from different extant organisms, we are really looking at evolutionary time having run backwards from one organism to the last common ancestor and then forwards to the other organism (see below), and since our treatment of the two organisms is symmetric, the assumption of time-reversibility should lead to realistic results.

As far as I can see, in the original construction of the PAM matrices in [2], time-reversibility was not assumed. The assumptions about the evolutionary process that underlie the construction in [2] are not clearly spelled out in the paper, and the implicit assumptions that I see there do not seem more plausible to me than the assumption of time-reversibility.
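Condition (5) is easy to test numerically: it says that the "flux" matrix with entries $q_i m_{ij}$ must be symmetric. The sketch below checks detailed balance for two small matrices; both matrices and the target frequencies are hypothetical stand-ins, since the numerical examples (3) and (4) of the note are not reproduced above.

```python
import numpy as np

def is_reversible(M, q, tol=1e-12):
    """Check detailed balance (5): q_i * m_ij == q_j * m_ji for all i, j.
    Here q must be the vector of target frequencies (equilibrium vector) of M."""
    flux = np.diag(q) @ M          # entry (i, j) is q_i * m_ij
    return np.allclose(flux, flux.T, atol=tol)

# Hypothetical stand-ins with uniform target frequencies (assumed values,
# not the matrices (3) and (4) of the note).
q = np.array([1/3, 1/3, 1/3])

M_rev = np.array([[0.8, 0.1, 0.1],      # symmetric fluxes, hence reversible
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])

M_irrev = np.array([[0.5, 0.5, 0.0],    # probability flows around a cycle,
                    [0.0, 0.5, 0.5],    # so the process looks different when
                    [0.5, 0.0, 0.5]])   # run backwards

print(is_reversible(M_rev, q))      # True
print(is_reversible(M_irrev, q))    # False
```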

In [3], Dan Gusfield gives a description of the PAM matrices that, in his words (page 383), "roughly, but not exactly" reflects Dayhoff's construction. Gusfield's exposition does assume time-reversibility (implicitly), and I will base the following exposition on Gusfield's description.

Now let us define what PAM means. The acronym stands for "percent accepted mutation". Ideally, two sequences $s$ and $t$ are defined as being one PAM unit diverged if a series of accepted point mutations (and no indels) has converted $s$ to $t$ with an average of one accepted point mutation per one hundred sequence positions. The term "accepted" here means a mutation that was incorporated into the molecular sequence and passed on to its progeny. Note that in very long sequences that are one PAM unit diverged we will see slightly less than 1% character substitutions, since some loci may have undergone multiple mutations. Fortunately, if the distance is only a few PAMs, the discrepancy is very small and can be ignored in the construction of our matrices.

In the original construction of PAM matrices, Dayhoff started with pairs of protein sequences that were each at most 15% diverged. Here we will explain the process in a simplified way. While the real families of PAM (and BLOSUM) matrices are used for scoring amino acid sequences, we will construct here a family of baby-PAM matrices for nucleotide sequences as a means of illustrating the construction.

Let us assume we have a collection of perfectly aligned sequences of nucleotides such that in each pair we see character substitutions at about 2% of the loci. We may think of each pair of sequences in this family as being 2 PAMs diverged. Now assume that by counting the observed percentages of character substitutions in this family, we get the following character substitution matrix $A_2 = [a_{ij}]_{1 \le i,j \le 4}$:

(6)    [the 4 × 4 matrix $A_2$, with rows and columns indexed by A, C, G, T; its numerical entries are omitted in this transcription]

The meaning of the entries in the above matrix is the following: Picking randomly a pair $\langle s, t \rangle$ of sequences from our collection, and picking randomly a locus $k$, the probability that $s[k] = \mathrm{A} = t[k]$ is 0.195; the probability that $s[k] = \mathrm{A}$ and $t[k] = \mathrm{C}$ is 0.001; etc. Note that the probabilities for pairs of different characters sum up to 0.02. This is why we say that the sequences are roughly 2 PAM units diverged. Let us also assume that the average C-G content in our collection of sequences is 60%, and that A's are as frequent as T's and C's are as frequent as G's.

The first step in our construction of baby-PAM matrices will be to reconstruct the mutation probability matrix $M_2$ that gave rise to the matrix $A_2$. Note that the letters $s[k], t[k]$ at a given locus in two of our sequences are derived from a letter $r[k]$ of an ancestral sequence $r$ that will have an evolutionary distance of about 1 PAM from each of the observed sequences $s$ and $t$. Now here is a beautiful consequence of the assumption that molecular evolution is a time-reversible process: This detail does not matter! We can think of $s$ evolving into $t$ by going back through the ancestral sequence and then moving forward in time to become $t$, or we can think of $t$ evolving backwards into $r$ and then into $s$, and the derived mutation probability matrix will always be the same.

Note that by our other assumption, the target frequencies for the matrix $M_2$ will be $[q_1, q_2, q_3, q_4] = [0.2, 0.3, 0.3, 0.2]$. Let us now treat $s$ as the ancestral sequence. Then the probability $a_{12}$ of $s[k] = \mathrm{A}$ and $t[k] = \mathrm{C}$ is equal to 0.001. This probability must be equal to $q_1 m_{12}$, where $m_{12}$ is the probability that a given A mutates to C. Thus we can calculate $m_{12} = a_{12}/q_1 = 0.001/0.2 = 0.005$. In general we will have $m_{ij} = a_{ij}/q_i$. These calculations lead to the following matrix $M_2$ of character mutation probabilities:

(7)    [the 4 × 4 matrix $M_2$, with rows and columns indexed by A, C, G, T; its numerical entries are omitted in this transcription]
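The division $m_{ij} = a_{ij}/q_i$ is easy to carry out for the whole matrix at once. In the sketch below the matrix $A_2$ is a stand-in: only $a_{11} = 0.195$ and $a_{12} = 0.001$ are given in the text, so the remaining entries are hypothetical, chosen so that the matrix is symmetric, its off-diagonal entries sum to 0.02, and each row $i$ sums to the target frequency $q_i$ of $[0.2, 0.3, 0.3, 0.2]$.

```python
import numpy as np

# Target frequencies for A, C, G, T (60% C+G, A as frequent as T, C as frequent as G).
q = np.array([0.2, 0.3, 0.3, 0.2])

# Stand-in for the character substitution matrix A_2 of equation (6).
# Only a_11 = 0.195 and a_12 = 0.001 come from the note; the other entries are
# hypothetical, chosen to be symmetric, to have off-diagonal entries summing
# to 0.02 (2 PAMs), and to have row i summing to q_i.
A2 = np.array([[0.195, 0.001, 0.003, 0.001],
               [0.001, 0.295, 0.001, 0.003],
               [0.003, 0.001, 0.295, 0.001],
               [0.001, 0.003, 0.001, 0.195]])

# m_ij = a_ij / q_i: divide row i of A_2 by q_i.
M2 = A2 / q[:, None]

print(M2[0, 1])        # 0.001 / 0.2 = 0.005, as computed in the text
print(M2.sum(axis=1))  # each row sums to 1, so M_2 is a stochastic matrix
```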

Now here comes the big question: How do we derive scoring matrices for distantly related sequences from data about closely related sequences? In the PAM model, this problem is solved as follows: Evolutionary changes over a long period happen one generation at a time. Thus if we know the matrix $M_2$ of character mutation probabilities for sequences that are 2 PAMs apart, then we can construct the matrix $M_4$ of character mutation probabilities for sequences that are 4 PAMs apart by taking the product $M_2 \cdot M_2$. In general, the matrix $M_{2k}$ of character mutation probabilities for sequences that are $2k$ PAMs apart can be obtained by taking the $k$-th power of $M_2$ (that is, multiplying $M_2$ by itself $k$ times). Thus $M_{2k} = (M_2)^k$.

For example, the character mutation probability matrix for constructing our baby-PAM120 will be the matrix $M_{120} = (M_2)^{60}$, which looks as follows:

(8)    [the 4 × 4 matrix $M_{120}$, with rows and columns indexed by A, C, G, T; its numerical entries are omitted in this transcription]

The character mutation probability matrix for constructing our baby-PAM250 will be the matrix $M_{250} = (M_2)^{125}$, which looks as follows:

(9)    [the 4 × 4 matrix $M_{250}$, with rows and columns indexed by A, C, G, T; its numerical entries are omitted in this transcription]
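Computing these powers is a one-liner once $M_2$ is available. Continuing with the hypothetical stand-in values for $A_2$ and $q$ from the previous sketch:

```python
import numpy as np

# Hypothetical stand-in values carried over from the previous sketch.
q = np.array([0.2, 0.3, 0.3, 0.2])
A2 = np.array([[0.195, 0.001, 0.003, 0.001],
               [0.001, 0.295, 0.001, 0.003],
               [0.003, 0.001, 0.295, 0.001],
               [0.001, 0.003, 0.001, 0.195]])
M2 = A2 / q[:, None]                      # m_ij = a_ij / q_i

# M_{2k} = (M_2)^k: mutation probability matrices at 120 and 250 PAMs.
M120 = np.linalg.matrix_power(M2, 60)     # 120 PAMs = 60 steps of 2 PAMs each
M250 = np.linalg.matrix_power(M2, 125)    # 250 PAMs = 125 steps of 2 PAMs each

np.set_printoptions(precision=3, suppress=True)
print(M120)
print(M250)  # as the power grows, every row drifts toward the target frequencies
```

Note that any rounding of the entries of $M_2$ is compounded by the repeated multiplications, which is the source of the slight inaccuracies mentioned below for matrices (8)-(10).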

As a next step in the construction of baby-PAM120 we must reconstruct the character substitution matrix $A_{120}$ from the character mutation probability matrix $M_{120}$. Entry $a_{ij}$ in $A_{120}$ will be the probability that in two sequences $s$ and $t$ that have an evolutionary distance of 120 PAMs, at a randomly chosen locus $k$, we will have $s[k] = c_i$ and $t[k] = c_j$. If we treat $s$ as the ancestral sequence (as the assumption of time-reversibility allows us to do), then we can get $a_{ij}$ by multiplying $q_i$ (the probability of finding $c_i$ in position $s[k]$) by $m_{ij}$ (the probability that $c_i$ mutates to $c_j$). In other words, $A_{120}$ can be obtained by multiplying the C and G rows of $M_{120}$ by 0.3 and the A and T rows of $M_{120}$ by 0.2. We get the following matrix $A_{120}$:

(10)    [the 4 × 4 matrix $A_{120}$, with rows and columns indexed by A, C, G, T; its numerical entries are omitted in this transcription]

Note that the above matrix is slightly inaccurate; in particular, the probabilities for A-G pairs and C-T pairs should be exactly the same. The discrepancy is due to rounding errors in the process of matrix multiplication. (The matrices (8) and (9) suffer from the same defect.)

Finally, we are ready to construct the baby-PAM scoring matrix $S_{120}$ itself. Recall that an entry $s_{ij}$ of the latter matrix should be the log-odds score comparing the probability of finding $(c_i, c_j)$ in a correctly aligned column with the probability of finding $(c_i, c_j)$ in randomly aligned sequences. In other words, we should have

$s_{ij} = \log_2\!\left(\frac{a_{ij}}{p_i p_j}\right).$

For example, in our case, $s_{12} = \log_2\!\left(\frac{a_{12}}{(0.2)(0.3)}\right)$. Thus we get the matrix:

(11)    [the 4 × 4 scoring matrix $S_{120}$, with rows and columns indexed by A, C, G, T; its numerical entries are omitted in this transcription]

Let us remark that in the real PAM matrices, the scores are multiplied by two and rounded to the nearest integer.

Let us look at two very important characteristics of the matrix (11). First let us ask ourselves: If we score an ungapped alignment of two random strings of length $m$ each with matrix (11), what is the expected value of the alignment score? This expected value can be calculated as $m$ times the expected score for a pair of random loci. The formula for the latter is

$E = \sum_{1 \le i,j \le n} s_{ij}\, q_i q_j.$

In our case, this means we have to add up the sixteen terms $s_{ij} q_i q_j$ with the scores $s_{ij}$ taken from matrix (11), starting with $(0.73)(0.2)^2$ for the A-A pair; the total is a negative number. It is not hard to see why the average score for a column in a random alignment should be negative: If the average score for a random character pair were positive, then the scores for random extensions of a local alignment by random letters would tend to rise rather than fall, which would result in a lot of long, completely spurious local alignments.

Now let us consider the related quantity

$H = \sum_{1 \le i,j \le n} s_{ij}\, a_{ij}.$

This is the average score of a correctly aligned character pair. Since our matrix (11) gives the scores in bits, we can think of $H$ as the average amount of information supplied by a correctly aligned character pair. Accordingly, $H$ is known as the (relative) entropy of the scoring matrix. For scoring matrix (11) the entropy is equal to $H = (0.73)(0.06646) + (0.66)(0.03792) + \cdots + (0.73)(0.06646) = 0.134$.
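Putting the last few steps together, the sketch below (still using the hypothetical stand-in for $A_2$, so the numbers will differ from those in the note) builds $A_{120}$, the log-odds scoring matrix $S_{120}$ in bits, the expected random-pair score $E$, and the relative entropy $H$. For any log-odds matrix constructed this way, $E$ comes out negative and $H$ positive.

```python
import numpy as np

# Hypothetical stand-in data carried over from the earlier sketches.
q = np.array([0.2, 0.3, 0.3, 0.2])                 # target frequencies of A, C, G, T
A2 = np.array([[0.195, 0.001, 0.003, 0.001],
               [0.001, 0.295, 0.001, 0.003],
               [0.003, 0.001, 0.295, 0.001],
               [0.001, 0.003, 0.001, 0.195]])
M2 = A2 / q[:, None]                                # m_ij = a_ij / q_i
M120 = np.linalg.matrix_power(M2, 60)               # 120 PAMs

# Character substitution matrix: a_ij = q_i * m_ij (multiply row i of M120 by q_i).
A120 = q[:, None] * M120

# Log-odds scores in bits: s_ij = log2(a_ij / (q_i * q_j)).
S120 = np.log2(A120 / np.outer(q, q))

# Expected score of a randomly aligned character pair, and relative entropy
# (average score of a correctly aligned pair).
E = np.sum(S120 * np.outer(q, q))
H = np.sum(S120 * A120)

np.set_printoptions(precision=2, suppress=True)
print(S120)
print(E, H)   # E < 0 < H for a log-odds matrix of this kind
```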

The entropy of a scoring matrix allows us to answer the following question: Given a query sequence of length $m$ and a database of length $l$, how many letters must a local alignment contain, on average, to reach a statistically significant score? One can show that the score of a significant local alignment should be at least $\log_2(m) + \log_2(l)$ bits. If one correctly aligned character pair contributes on average $H$ units of information, then a local alignment with a statistically significant score should be at least

$\frac{\log_2(m) + \log_2(l)}{H}$

character pairs long. For example, if we search a nucleotide sequence of length 1,000 bp against the whole human genome using the scoring matrix (11), then a statistically significant local alignment would usually have to be at least $(\log_2(1000) + \log_2(l))/0.134 = 242$ bp long, where $l$ is the length of the human genome in bp.

Homework 1: Find the mutation probability matrix $M_{40}$ that can be derived from the matrix $M_2$ given above and that corresponds to an evolutionary distance of 40 PAMs.

Homework 2: (1) Construct the character substitution matrix $A_{250}$ that corresponds to the character mutation probability matrix $M_{250}$ given above. (2) Use $A_{250}$ to construct the corresponding baby-PAM250 scoring matrix $S_{250}$ with scores given in bits. (3) Find the expected score for random character mismatches and the entropy of $S_{250}$.

References

[1] S. Altschul. Amino acid substitution matrices from an information-theoretic perspective. J. Mol. Biol., 219:555-565, 1991.
[2] M. O. Dayhoff, R. M. Schwartz and B. C. Orcutt. A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure, 5 (suppl. 3):345-352, 1978.
[3] D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
[4] S. Henikoff and J. G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA, 89:10915-10919, 1992.
[5] W.-H. Li. Molecular Evolution. Sinauer Associates, 1997.

Department of Mathematics, Ohio University, Athens, Ohio 45701, U.S.A.
E-mail address: just@math.ohiou.edu
