CONSTRUCTING PAM MATRICES
WINFRIED JUST

In this note we will use several different matrices to describe the same phenomenon, so we must be very careful to distinguish them by name. The standing assumption will be that we have a fixed alphabet of characters {c_1, c_2, ..., c_n}, and that we are looking at a process (evolution, in our case) by which some of these characters mutate into each other. We will assume that for a fixed unit of time T the probability that character c_i mutates into character c_j is a fixed number m_ij. The matrix M = [m_ij], 1 <= i, j <= n, that lists these probabilities will be referred to as the mutation probability matrix.

Note that the mutation probability matrix has two important properties: (1) all entries are nonnegative numbers, and (2) the sum of the numbers in each row is 1. Thus M is a stochastic matrix.

There is another underlying assumption here, namely that the probability that character c_i mutates into character c_j over a time interval T does not depend on the prior history of the process. Thus we model evolution as a Markov process. In general, a (discrete-time) Markov process or Markov chain traces a system through a sequence of steps. At each step the system can be in one of a fixed number of states; in our case, the states are the characters c_1, ..., c_n. If the system is in state i, then it switches to state j with probability t_ij, called the transition probability. Note that the matrix of transition probabilities is exactly the same thing that we called a matrix of mutation probabilities.

Here is an interesting observation: If M is a stochastic n x n matrix, then there is usually a unique vector [q_1, ..., q_n] such that

(1) [q_1, ..., q_n] M = [q_1, ..., q_n]

and

(2) lim_{k -> infinity} M^k =
    | q_1 q_2 ... q_n |
    | q_1 q_2 ... q_n |
    |       ...       |
    | q_1 q_2 ... q_n |

The operation here is matrix multiplication; M^k is obtained by multiplying M by itself k times.
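Properties (1) and (2) are easy to check numerically. The following sketch uses a made-up 2-state stochastic matrix (an illustration, not one of the matrices from these notes) and NumPy's matrix power:

```python
import numpy as np

# A made-up 2-state stochastic matrix: each row lists the probabilities
# of moving from that state to every state, so each row sums to 1.
M = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Property (2): a high power of M has (nearly) identical rows, and each
# row is the equilibrium vector [q_1, ..., q_n].
Mk = np.linalg.matrix_power(M, 200)
q = Mk[0]
print(q)  # approximately [2/3, 1/3]

# Property (1): q M = q, so evolution leaves the steady state unchanged.
print(np.allclose(q @ M, q))          # True
print(np.allclose(Mk[0], Mk[1]))      # True: all rows of the limit agree
```

For this particular M the equilibrium vector can also be found by hand: q_1(0.9) + (1 - q_1)(0.2) = q_1 gives q_1 = 2/3.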
Date: 05/03/05 newpam.tex.

For our purposes it is not necessary to know the details of how matrix multiplication is performed; it suffices to know that it is a fairly standard operation that can be done on any reasonably powerful graphing calculator or computer algebra system, such as MATLAB. The vector [q_1, ..., q_n] in the above observation is called the steady state or equilibrium vector of the process. It has a very intuitive interpretation: The matrix
multiplication equation

[p_1, ..., p_n] M = [r_1, ..., r_n]

says that if we have a long sequence of characters in which c_1 is present in proportion p_1, c_2 in proportion p_2, ..., c_n in proportion p_n, and we let the sequence evolve for T units of time, then in the resulting sequence c_1 will be present in proportion r_1, c_2 in proportion r_2, ..., c_n in proportion r_n. Thus equation (1) tells us that evolution will not change the proportions of characters as long as these proportions are in the steady state. Equation (2) tells us that if we start in any state and let the process run for enough steps, it will spend a proportion of about q_i of the time in state i. Alternatively, if we start with a long enough sequence of characters and let it evolve long enough, then the proportions of characters in the sequence will get very close to the numbers in the steady-state vector. For this reason, we will refer to the numbers q_i in the equilibrium vector as the target frequencies of the process.

This leads us to an assumption that often underlies the construction of scoring matrices for sequence comparison and alignment and that is seldom clearly spelled out: It is assumed that the observed character frequencies in the data set used to construct the scoring matrix are the target frequencies of the mutation probability matrix for the sequences to be scored.

A Markov process is reversible if it looks the same when run forwards or backwards. For example, the process described by matrix (3) below is obviously reversible; the process described by matrix (4) is not.

(3)

(4)

Theorem 1. Let M be a mutation probability matrix with target frequencies [q_1, ..., q_n]. Then M describes a reversible Markov process if and only if for all 1 <= i, j <= n we have

(5) q_i m_ij = q_j m_ji.
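Theorem 1 gives a concrete numerical test for reversibility: the "probability flow" q_i m_ij between each pair of states must be symmetric in i and j. Here is a sketch of that test; the two matrices below are illustrative stand-ins, since the entries of the example matrices (3) and (4) did not survive in this copy.

```python
import numpy as np

def equilibrium(M, k=1000):
    # Approximate [q_1, ..., q_n] as a row of a high power of M.
    return np.linalg.matrix_power(M, k)[0]

def is_reversible(M, tol=1e-9):
    # Theorem 1: reversible  <=>  q_i * m_ij == q_j * m_ji for all i, j,
    # i.e. the matrix flow[i, j] = q_i * m_ij is symmetric.
    q = equilibrium(M)
    flow = q[:, None] * M
    return np.allclose(flow, flow.T, atol=tol)

# A symmetric stochastic matrix is reversible (its steady state is uniform):
M_rev = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])

# A biased "cycle" (0 -> 1 -> 2 -> 0 preferred) is stochastic but looks
# different when run backwards, so it is not reversible:
M_irrev = np.array([[0.1, 0.8, 0.1],
                    [0.1, 0.1, 0.8],
                    [0.8, 0.1, 0.1]])

print(is_reversible(M_rev))    # True
print(is_reversible(M_irrev))  # False
```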
A priori there seems to be no biological reason to assume that molecular evolution is a reversible Markov process. However, the assumption of time-reversibility simplifies the study of molecular evolution (see e.g. [5], page 69), and is being made for this very reason. The assumption is almost certainly wrong. However, as long as we always compare two sequences of different extant organisms, we are really looking at evolutionary time having run backwards from one organism to the last common ancestor and then forwards to the other organism (see below), and since our treatment of the two organisms is symmetric, the assumption of time-reversibility should lead to realistic results.

As far as I can see, in the original construction of the PAM matrices in [2], time-reversibility was not assumed. The assumptions about the evolutionary process that underlie the construction in [2] are not clearly spelled out in the paper, and the implicit assumptions that I see there do not seem more plausible to me than the assumption of time-reversibility. In [3], Dan Gusfield gives a description of the PAM
matrices that, in his words (page 383), "roughly, but not exactly" reflects Dayhoff's construction. Gusfield's exposition does assume time-reversibility (implicitly), and I will base the following exposition on Gusfield's description.

Now let us define what PAM means. The acronym stands for "percent accepted mutation." Ideally, two sequences s and t are defined as being one PAM unit diverged if a series of accepted point mutations (and no indels) has converted s to t with an average of one accepted point mutation per one hundred sequence positions. The term "accepted" here means a mutation that was incorporated into the molecular sequence and passed on to its progeny. Note that in very long sequences that are one PAM unit diverged we will see slightly less than 1% character substitutions, since some loci may have undergone multiple mutations. Fortunately, if the distance is only a few PAMs, the discrepancy is very small and can be ignored in the construction of our matrices.

In the original construction of PAM matrices, Dayhoff started with pairs of protein sequences that were each at most 15% diverged. Here we will explain the process in a simplified way. While the real families of PAM (and BLOSUM) matrices are used for scoring amino acid sequences, we will construct here a family of baby-PAM matrices for nucleotide sequences as a means of illustrating the construction.

Let us assume we have a collection of perfectly aligned sequences of nucleotides such that in each pair we see character substitutions at about 2% of the loci. We may think of each pair of sequences in this family as being 2 PAMs diverged. Now assume that by counting the observed percentages of character substitutions in this family, we get the following character substitution matrix A_2 = [a_ij], 1 <= i, j <= 4.
(6)

A C G T

The meaning of the entries in the above matrix is the following: picking randomly a pair <s, t> of sequences in our collection, and picking randomly a locus k, the probability that s[k] = A = t[k] is 0.195; the probability that s[k] = A and t[k] = C is 0.001; etc. Note that the probabilities for pairs of different characters sum up to 0.02. This is why we say that the sequences are roughly 2 PAM units diverged. Let us also assume that the average C-G content in our collection of sequences is 60%, and that A's are as frequent as T's and C's are as frequent as G's.

The first step in our construction of baby-PAM matrices will be to reconstruct the mutation probability matrix M_2 that gave rise to the matrix A_2. Note that the letters s[k], t[k] at a given locus in two of our sequences are derived from a letter r[k] of an ancestral sequence r that will have an evolutionary distance of about 1 PAM from each of the observed sequences s and t. Now here is a beautiful consequence of the assumption that molecular evolution is a time-reversible process: this detail does not matter! We can think of s evolving into t by going back through the ancestral sequence and then moving forward in time to become t, or we can think of t evolving backwards into r and then into s, and the derived mutation probability matrix will always be the same.
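The claim that the placement of the ancestor does not matter can be checked numerically for any reversible chain. In the sketch below, the joint pair probabilities come out the same whether we root at s or at the common ancestor r; the 1-PAM matrix used here is invented for illustration (symmetric joint probabilities with the stated target frequencies), not taken from these notes.

```python
import numpy as np

# Invented reversible 1-step (1 PAM) setup with target frequencies
# q = [0.2, 0.3, 0.3, 0.2] for A, C, G, T. A1 is a symmetric joint
# probability matrix; its rows sum to q, so M1 below is stochastic.
q = np.array([0.2, 0.3, 0.3, 0.2])
A1 = np.array([[0.1975, 0.0005, 0.0015, 0.0005],
               [0.0005, 0.2975, 0.0005, 0.0015],
               [0.0015, 0.0005, 0.2975, 0.0005],
               [0.0005, 0.0015, 0.0005, 0.1975]])
M1 = A1 / q[:, None]   # reversible: q_i * m_ij = a_ij = a_ji = q_j * m_ji

# Rooting at s: s is at equilibrium and evolves 2 steps into t.
J_from_s = q[:, None] * np.linalg.matrix_power(M1, 2)

# Rooting at the ancestor r: r is at equilibrium and evolves 1 step
# into s and, independently, 1 step into t.
J_from_r = M1.T @ np.diag(q) @ M1

print(np.allclose(J_from_s, J_from_r))  # True
```

The equality is exactly the detailed-balance condition of Theorem 1 at work: diag(q) M = M^T diag(q), so M^T diag(q) M = diag(q) M^2.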
Note that by our other assumption, the target frequencies for the matrix M_2 will be [q_1, q_2, q_3, q_4] = [0.2, 0.3, 0.3, 0.2]. Let us now treat s as the ancestral sequence. Then the probability a_12 of s[k] = A and t[k] = C is equal to 0.001. This probability must be equal to q_1 m_12, where m_12 is the probability that a given A mutates to C. Thus we can calculate m_12 = a_12/q_1 = 0.001/0.2 = 0.005. In general we will have m_ij = a_ij/q_i. These calculations lead to the following matrix M_2 of character mutation probabilities:

(7)

A C G T

Now here comes the big question: how do we derive scoring matrices for distantly related sequences from data about closely related sequences? In the PAM model, this problem is solved as follows: evolutionary changes over a long period happen one generation at a time. Thus if we know the matrix M_2 of character mutation probabilities for sequences that are 2 PAMs apart, then we can construct the matrix M_4 of character mutation probabilities for sequences that are 4 PAMs apart by taking the product M_2 M_2. In general, the matrix M_2k of character mutation probabilities for sequences that are 2k PAMs apart can be obtained by taking the k-th power of M_2 (that is, multiplying M_2 by itself k times). Thus M_2k = (M_2)^k.

For example, the character mutation probability matrix for constructing our baby-PAM120 will be the matrix M_120 = (M_2)^60, which looks as follows:

(8)

A C G T

The character mutation probability matrix for constructing our baby-PAM250 will be the matrix M_250 = (M_2)^125, which looks as follows:

(9)

A C G T

As a next step in the construction of baby-PAM120 we must reconstruct the character substitution matrix A_120 from the character mutation probability matrix M_120. Entry a_ij in A_120 will be the probability that in two sequences s and t that have an evolutionary distance of 120 PAM, at a randomly chosen locus k, we will have s[k] = c_i and t[k] = c_j.
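Since the numerical entries of matrices (6)-(9) did not survive in this copy, here is the whole extrapolation pipeline on an invented stand-in for A_2. Only a_11 = 0.195 and a_12 = 0.001 come from the text; the other entries are chosen to satisfy the stated constraints (symmetric, rows summing to the target frequencies [0.2, 0.3, 0.3, 0.2], off-diagonal entries summing to 0.02).

```python
import numpy as np

# Invented stand-in for the 2-PAM substitution matrix A2 (rows/columns
# A, C, G, T); only a_11 = 0.195 and a_12 = 0.001 appear in the text.
q = np.array([0.2, 0.3, 0.3, 0.2])
A2 = np.array([[0.195, 0.001, 0.003, 0.001],
               [0.001, 0.295, 0.001, 0.003],
               [0.003, 0.001, 0.295, 0.001],
               [0.001, 0.003, 0.001, 0.195]])

# Recover mutation probabilities: m_ij = a_ij / q_i.
M2 = A2 / q[:, None]
print(M2[0, 1])        # 0.005 (= 0.001 / 0.2)
print(M2.sum(axis=1))  # every row sums to 1 (stochastic)

# Extrapolate to larger distances: M_2k = (M_2)^k.
M120 = np.linalg.matrix_power(M2, 60)    # baby-PAM120 mutation matrix
M250 = np.linalg.matrix_power(M2, 125)   # baby-PAM250 mutation matrix

# The observed fraction of differing loci saturates with distance
# because of multiple hits, instead of growing by 1% per PAM:
diff = lambda k: 1 - np.sum(q * np.diag(np.linalg.matrix_power(M2, k)))
print(diff(1), diff(125))  # about 0.02 at 2 PAM; far below 1.25 at 250 PAM
```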
If we treat s as the ancestral sequence (as the assumption of time-reversibility allows us to do), then we can get a_ij by multiplying q_i (the probability of finding c_i in position s[k]) by m_ij (the probability that c_i mutates to c_j). In other words, A_120 can be obtained by multiplying the C and G
rows of M_120 by 0.3 and the A and T rows of M_120 by 0.2. We get the following matrix A_120:

(10)

A C G T

Note that the above matrix is slightly inaccurate; in particular, the probabilities for A-G pairs and C-T pairs should be exactly the same. The discrepancy is due to rounding errors in the process of matrix multiplication. (The matrices (8) and (9) suffer from the same defect.)

Finally, we are ready to construct the baby-PAM scoring matrix S_120 itself. Recall that an entry s_ij of the latter matrix should be the log-odds score comparing the probability of finding (c_i, c_j) in a correctly aligned column with the probability of finding (c_i, c_j) in randomly aligned sequences. In other words, we should have

s_ij = log_2(a_ij/(p_i p_j)).

For example, in our case, s_12 = log_2(a_12/((0.2)(0.3))). Thus we get the matrix:

(11)

A C G T

Let us remark that in the real PAM matrices, the scores are multiplied by two and rounded to the nearest integer.

Let us look at two very important characteristics of the matrix (11). First let us ask ourselves: if we score an ungapped alignment of two random strings of length m each with matrix (11), what is the expected value of the alignment score? This expected value can be calculated as m times the expected score for a pair of random loci. The formula for the latter is

E = sum_{1 <= i,j <= n} s_ij q_i q_j.

In our case, this means we have to calculate the sum (.73)(.2)^2 + (.66)(.2)(.3) + (.15)(.2)(.3) + ... + (.73)(.2)^2, which comes out negative. It is not hard to see why the average score for a column in a random alignment should be negative: if the average score for a random character pair were positive, then the scores for random extensions of a local alignment by random letters would tend to rise rather than fall, which would result in a lot of long, completely spurious local alignments.

Now let us consider the related quantity

H = sum_{1 <= i,j <= n} s_ij a_ij.

This is the average score of a correctly aligned character pair.
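The final steps can be sketched end to end. Since the real entries of (10) and (11) were lost in this copy, the sketch again uses an invented A_2 consistent with the constraints stated in the text, and then computes the bit scores together with the two characteristics just described, E and H:

```python
import numpy as np

# Invented 2-PAM substitution matrix (A, C, G, T) consistent with the
# constraints in the text; the resulting scores are illustrative only.
q = np.array([0.2, 0.3, 0.3, 0.2])
A2 = np.array([[0.195, 0.001, 0.003, 0.001],
               [0.001, 0.295, 0.001, 0.003],
               [0.003, 0.001, 0.295, 0.001],
               [0.001, 0.003, 0.001, 0.195]])
M2 = A2 / q[:, None]                                # m_ij = a_ij / q_i
A120 = q[:, None] * np.linalg.matrix_power(M2, 60)  # a_ij at 120 PAM

# Scoring matrix in bits: s_ij = log2(a_ij / (q_i q_j)).
S = np.log2(A120 / np.outer(q, q))

# Expected score of a random character pair: E = sum_ij s_ij q_i q_j.
E = np.sum(S * np.outer(q, q))

# Relative entropy: H = sum_ij s_ij a_ij, the average score (information
# in bits) of a correctly aligned character pair.
H = np.sum(S * A120)

print(E < 0 < H)  # True: random pairs score negative on average,
                  # correctly aligned pairs positive
```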
Since our matrix (11) gives the scores in bits, we can think of H as the average amount of information
supplied by a correctly aligned character pair. Accordingly, H is known as the (relative) entropy of the scoring matrix. For scoring matrix (11) the entropy is equal to H = (.73)(.06646) + (.66)(.03792) + ... + (.73)(.06646) = .134.

The entropy of a scoring matrix allows us to answer the following question: given a query sequence of length m and a database of length l, how many letters in a local alignment will on average be needed to reach a statistically significant level of alignment? One can show that the score of a significant local alignment should be at least log_2(m) + log_2(l) bits. If one correctly aligned character pair contributes on average H units of information, then a local alignment with a statistically significant score should be at least

(log_2(m) + log_2(l))/H

character pairs long. For example, if we search a nucleotide sequence of length 1,000 bp against the whole human genome of length l using the scoring matrix (11), then a statistically significant local alignment would usually have to be at least (log_2(1000) + log_2(l))/.134 = 242 bp long.

Homework 1: Find the mutation probability matrix M_40 that can be derived from the matrix M_2 given above and that corresponds to an evolutionary distance of 40 PAMs.

Homework 2: (1) Construct the character substitution matrix A_250 that corresponds to the character mutation probability matrix M_250 given above. (2) Use A_250 to construct the corresponding baby-PAM250 scoring matrix S_250 with scores given in bits. (3) Find the expected score for random character mismatches and the entropy of S_250.

References

[1] S. Altschul. Amino acid substitution matrices from an information-theoretic perspective. J. Mol. Biol., 219, 1991.
[2] M. O. Dayhoff, R. M. Schwartz and B. C. Orcutt. A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure, 5.
[3] D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press.
[4] S. Henikoff and J. G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA, 89.
[5] W.-H. Li. Molecular Evolution. Sinauer Associates.

Department of Mathematics, Ohio University, Athens, Ohio 45701, U.S.A.
E-mail address: just@math.ohiou.edu
More informationAbstract: We describe the beautiful LU factorization of a square matrix (or how to write Gaussian elimination in terms of matrix multiplication).
MAT 2 (Badger, Spring 202) LU Factorization Selected Notes September 2, 202 Abstract: We describe the beautiful LU factorization of a square matrix (or how to write Gaussian elimination in terms of matrix
More information5.5. Solving linear systems by the elimination method
55 Solving linear systems by the elimination method Equivalent systems The major technique of solving systems of equations is changing the original problem into another one which is of an easier to solve
More informationSolving Systems of Linear Equations
LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how
More informationZeros of Polynomial Functions
Zeros of Polynomial Functions The Rational Zero Theorem If f (x) = a n x n + a n-1 x n-1 + + a 1 x + a 0 has integer coefficients and p/q (where p/q is reduced) is a rational zero, then p is a factor of
More informationNormal distribution. ) 2 /2σ. 2π σ
Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a
More informationClick on the links below to jump directly to the relevant section
Click on the links below to jump directly to the relevant section What is algebra? Operations with algebraic terms Mathematical properties of real numbers Order of operations What is Algebra? Algebra is
More informationA PRELIMINARY REPORT ON A GENERAL THEORY OF INDUCTIVE INFERENCE
A PRELIMINARY REPORT ON A GENERAL THEORY OF INDUCTIVE INFERENCE R. J. Solomonoff Abstract Some preliminary work is presented on a very general new theory of inductive inference. The extrapolation of an
More informationOperation Count; Numerical Linear Algebra
10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point
More informationSYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89. by Joseph Collison
SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89 by Joseph Collison Copyright 2000 by Joseph Collison All rights reserved Reproduction or translation of any part of this work beyond that permitted by Sections
More information8 Square matrices continued: Determinants
8 Square matrices continued: Determinants 8. Introduction Determinants give us important information about square matrices, and, as we ll soon see, are essential for the computation of eigenvalues. You
More informationHill s Cipher: Linear Algebra in Cryptography
Ryan Doyle Hill s Cipher: Linear Algebra in Cryptography Introduction: Since the beginning of written language, humans have wanted to share information secretly. The information could be orders from a
More informationHidden Markov Models
8.47 Introduction to omputational Molecular Biology Lecture 7: November 4, 2004 Scribe: Han-Pang hiu Lecturer: Ross Lippert Editor: Russ ox Hidden Markov Models The G island phenomenon The nucleotide frequencies
More information8 Primes and Modular Arithmetic
8 Primes and Modular Arithmetic 8.1 Primes and Factors Over two millennia ago already, people all over the world were considering the properties of numbers. One of the simplest concepts is prime numbers.
More informationBLAST. Anders Gorm Pedersen & Rasmus Wernersson
BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise
More informationIndependent samples t-test. Dr. Tom Pierce Radford University
Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of
More informationWhat is Linear Programming?
Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to
More informationFinancial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2
Financial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2 Due Date: Friday, March 11 at 5:00 PM This homework has 170 points plus 20 bonus points available but, as always, homeworks are graded
More informationOne-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate
1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
More informationCURVE FITTING LEAST SQUARES APPROXIMATION
CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship
More informationRETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
More informationHands-On Math Algebra
Hands-On Math Algebra by Pam Meader and Judy Storer illustrated by Julie Mazur Contents To the Teacher... v Topic: Ratio and Proportion 1. Candy Promotion... 1 2. Estimating Wildlife Populations... 6 3.
More informationKapitel 1 Multiplication of Long Integers (Faster than Long Multiplication)
Kapitel 1 Multiplication of Long Integers (Faster than Long Multiplication) Arno Eigenwillig und Kurt Mehlhorn An algorithm for multiplication of integers is taught already in primary school: To multiply
More information3 Some Integer Functions
3 Some Integer Functions A Pair of Fundamental Integer Functions The integer function that is the heart of this section is the modulo function. However, before getting to it, let us look at some very simple
More informationIntroduction to Hill cipher
Introduction to Hill cipher We have explored three simple substitution ciphers that generated ciphertext C from plaintext p by means of an arithmetic operation modulo 26. Caesar cipher: The Caesar cipher
More informationMatrix Calculations: Applications of Eigenvalues and Eigenvectors; Inner Products
Matrix Calculations: Applications of Eigenvalues and Eigenvectors; Inner Products H. Geuvers Institute for Computing and Information Sciences Intelligent Systems Version: spring 2015 H. Geuvers Version:
More informationRandom variables, probability distributions, binomial random variable
Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that
More informationSolutions to Problem Set 1
YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CPSC 467b: Cryptography and Computer Security Handout #8 Zheng Ma February 21, 2005 Solutions to Problem Set 1 Problem 1: Cracking the Hill cipher Suppose
More informationRecursive Algorithms. Recursion. Motivating Example Factorial Recall the factorial function. { 1 if n = 1 n! = n (n 1)! if n > 1
Recursion Slides by Christopher M Bourke Instructor: Berthe Y Choueiry Fall 007 Computer Science & Engineering 35 Introduction to Discrete Mathematics Sections 71-7 of Rosen cse35@cseunledu Recursive Algorithms
More information1. The RSA algorithm In this chapter, we ll learn how the RSA algorithm works.
MATH 13150: Freshman Seminar Unit 18 1. The RSA algorithm In this chapter, we ll learn how the RSA algorithm works. 1.1. Bob and Alice. Suppose that Alice wants to send a message to Bob over the internet
More informationThe Basics of FEA Procedure
CHAPTER 2 The Basics of FEA Procedure 2.1 Introduction This chapter discusses the spring element, especially for the purpose of introducing various concepts involved in use of the FEA technique. A spring
More informationHidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006
Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm
More information