LECTURE 13: Cross-validation

1 LECTURE 13: Cross-validation
- Resampling methods
  - Cross validation
  - Bootstrap
- Bias and variance estimation with the Bootstrap
- Three-way data partitioning
Introduction to Pattern Analysis, Ricardo Gutierrez-Osuna, Texas A&M University

2 Introduction (1)
Almost invariably, all the pattern recognition techniques that we have introduced have one or more free parameters:
- The number of neighbors in a kNN classification rule
- The bandwidth of the kernel function in kernel density estimation
- The number of features to preserve in a subset selection problem
Two issues arise at this point:
- Model selection: how do we select the optimal parameter(s) for a given classification problem?
- Validation: once we have chosen a model, how do we estimate its true error rate? The true error rate is the classifier's error rate when tested on the ENTIRE POPULATION.
If we had access to an unlimited number of examples, these questions would have a straightforward answer: choose the model that provides the lowest error rate on the entire population. And, of course, that error rate is the true error rate.
However, in real applications only a finite set of examples is available. This number is usually smaller than we would hope for! Why? Data collection is a very expensive process.

3 Introduction (2)
One may be tempted to use the entire training data to select the optimal classifier, then estimate the error rate. This naïve approach has two fundamental problems:
- The final model will normally overfit the training data: it will not be able to generalize to new data. The problem of overfitting is more pronounced with models that have a large number of parameters.
- The error rate estimate will be overly optimistic (lower than the true error rate). In fact, it is not uncommon to have 100% correct classification on training data.
The techniques presented in this lecture will allow you to make the best use of your (limited) data for:
- Training
- Model selection
- Performance estimation

4 The holdout method
Split the dataset into two groups:
- Training set: used to train the classifier
- Test set: used to estimate the error rate of the trained classifier
[Figure: the total set of examples divided into a training set and a test set]
The holdout method has two basic drawbacks:
- In problems where we have a sparse dataset, we may not be able to afford the luxury of setting aside a portion of the dataset for testing.
- Since it is a single train-and-test experiment, the holdout estimate of the error rate will be misleading if we happen to get an unfortunate split.
The limitations of the holdout can be overcome with a family of resampling methods, at the expense of higher computational cost:
- Cross Validation
  - Random Subsampling
  - K-Fold Cross-Validation
  - Leave-one-out Cross-Validation
- Bootstrap
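As a minimal sketch of the holdout split described above, the following Python snippet shuffles the examples once and sets a fraction aside for testing. The function name `holdout_split`, the 30% test fraction and the fixed seed are assumptions made for illustration; the slides do not prescribe them.

```python
import random

def holdout_split(examples, test_fraction=0.3, seed=0):
    """Shuffle the examples once and split them into (train, test) lists."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled examples are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(10), test_fraction=0.3)
print(len(train), len(test))
```

Because there is only one split, a different seed can give a noticeably different error estimate, which is the second drawback listed above.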

5 Random Subsampling
Random Subsampling performs K data splits of the entire dataset.
- Each data split randomly selects a (fixed) number of examples without replacement.
- For each data split we retrain the classifier from scratch with the training examples and then estimate $E_i$ with the test examples.
[Figure: the total set of examples shown for Experiments 1, 2 and 3, each with a different random selection of test examples]
The true error estimate is obtained as the average of the separate estimates $E_i$. This estimate is significantly better than the holdout estimate.
$$E = \frac{1}{K}\sum_{i=1}^{K} E_i$$
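The procedure above can be sketched in a few lines of Python. The classifier here is a deliberately trivial stand-in (it always predicts the most frequent training label), and the names `majority_error` and `random_subsampling_error` are hypothetical; any function that trains on one list and returns an error rate on another could be plugged in.

```python
import random
from collections import Counter

def majority_error(train, test):
    """Toy classifier (an assumption for illustration): always predict the
    most frequent training label and return the error rate on the test set."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label != majority for _, label in test) / len(test)

def random_subsampling_error(data, train_and_test, k, n_test, seed=0):
    """Average the test error E_i over k independent random splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(k):
        # Draw n_test test indices without replacement; the rest are training.
        test_idx = set(rng.sample(range(len(data)), n_test))
        train = [ex for i, ex in enumerate(data) if i not in test_idx]
        test = [ex for i, ex in enumerate(data) if i in test_idx]
        errors.append(train_and_test(train, test))  # retrain from scratch
    return sum(errors) / k

data = [(x, "a") for x in range(6)] + [(x, "b") for x in range(6, 10)]
estimate = random_subsampling_error(data, majority_error, k=3, n_test=3)
print(estimate)
```

Note that the splits are drawn independently, so some examples may appear in several test sets and others in none.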

6 K-Fold Cross-validation
Create a K-fold partition of the dataset.
- For each of K experiments, use K-1 folds for training and a different fold for testing.
- This procedure is illustrated in the following figure for K=4.
[Figure: the total set of examples shown for Experiments 1 to 4, each using a different fold as test examples]
K-Fold Cross validation is similar to Random Subsampling. The advantage of K-Fold Cross validation is that all the examples in the dataset are eventually used for both training and testing.
As before, the true error is estimated as the average error rate on test examples:
$$E = \frac{1}{K}\sum_{i=1}^{K} E_i$$
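A minimal sketch of K-fold cross-validation follows, using a one-dimensional nearest-neighbor rule as an illustrative classifier. The helper names and the toy dataset are assumptions; the folds are formed by interleaving indices, which presumes the examples are already in random order.

```python
def k_fold_indices(n, k):
    """Return k disjoint index lists that together cover range(n)."""
    return [list(range(i, n, k)) for i in range(k)]

def nearest_neighbor_error(train, test):
    """Error rate of a 1-NN rule on 1-D examples given as (x, label) pairs."""
    errors = 0
    for x, label in test:
        _, predicted = min(train, key=lambda ex: abs(ex[0] - x))
        errors += predicted != label
    return errors / len(test)

def k_fold_error(data, train_and_test, k):
    """Average test error over k experiments, one per held-out fold."""
    errors = []
    for test_idx in k_fold_indices(len(data), k):
        held_out = set(test_idx)
        train = [ex for i, ex in enumerate(data) if i not in held_out]
        test = [data[i] for i in test_idx]
        errors.append(train_and_test(train, test))
    return sum(errors) / k

data = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (0.3, "a"),
        (1.0, "b"), (1.1, "b"), (1.2, "b"), (1.3, "b")]
print(k_fold_error(data, nearest_neighbor_error, k=4))  # well-separated classes
```

Every example is used for testing exactly once and for training K-1 times, which is the advantage over random subsampling noted above.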

7 Leave-one-out Cross Validation
Leave-one-out is the degenerate case of K-Fold Cross Validation, where K is chosen as the total number of examples.
- For a dataset with N examples, perform N experiments.
- For each experiment use N-1 examples for training and the remaining example for testing.
[Figure: the total set of examples shown for Experiments 1, 2, 3, ..., N, each with a single test example]
As usual, the true error is estimated as the average error rate on test examples:
$$E = \frac{1}{N}\sum_{i=1}^{N} E_i$$
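The sketch below runs leave-one-out on a small, made-up one-dimensional dataset with one mislabeled-looking point, and compares it with the error measured on the training data itself. The dataset and function names are assumptions for illustration; the 1-NN rule is chosen because it always classifies its own training points correctly.

```python
def nn_label(train, x):
    """Label of the training example closest to x (1-NN rule, 1-D inputs)."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def resubstitution_error(data):
    """Error rate when testing on the same examples used for training."""
    return sum(nn_label(data, x) != y for x, y in data) / len(data)

def leave_one_out_error(data):
    """N experiments: train on N-1 examples, test on the one left out."""
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors += nn_label(train, x) != y
    return errors / len(data)

data = [(0.0, "a"), (0.2, "a"), (0.3, "b"), (0.4, "a"),
        (0.8, "b"), (1.0, "b"), (1.2, "b")]
print(resubstitution_error(data))   # 0.0: each point is its own nearest neighbor
print(leave_one_out_error(data))    # larger: 3 of the 7 left-out points are missed
```

The gap between the two numbers illustrates why an error rate measured on training data is overly optimistic.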

8 How many folds are needed?
With a large number of folds:
- (+) The bias of the true error rate estimator will be small (the estimator will be very accurate)
- (-) The variance of the true error rate estimator will be large
- (-) The computational time will be very large as well (many experiments)
With a small number of folds:
- (+) The number of experiments and, therefore, computation time are reduced
- (+) The variance of the estimator will be small
- (-) The bias of the estimator will be large (conservative, i.e. higher than the true error rate, because each classifier is trained on fewer examples)
In practice, the choice of the number of folds depends on the size of the dataset:
- For large datasets, even 3-Fold Cross Validation will be quite accurate
- For very sparse datasets, we may have to use leave-one-out in order to train on as many examples as possible
A common choice for K-Fold Cross Validation is K=10.

9 The bootstrap (1)
The bootstrap is a resampling technique with replacement. From a dataset with N examples:
- Randomly select (with replacement) N examples and use this set for training.
- The remaining examples that were not selected for training are used for testing. The number of test examples is likely to change from fold to fold.
- Repeat this process for a specified number of folds (K).
- As before, the true error is estimated as the average error rate on test examples.
[Figure: a complete dataset of five examples and, for Experiments 1, 2, 3, ..., K, a training set of five examples drawn with replacement (so some examples repeat) next to a test set made of the examples that were not drawn]
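A minimal sketch of one bootstrap split, under the assumption that examples are addressed by index: N indices are drawn with replacement for training, and whatever was never drawn becomes the test set. The name `bootstrap_split` and the seed are illustrative choices.

```python
import random

def bootstrap_split(n, rng):
    """Draw n training indices with replacement; unselected indices form the test set."""
    train_idx = [rng.randrange(n) for _ in range(n)]
    selected = set(train_idx)
    test_idx = [i for i in range(n) if i not in selected]
    return train_idx, test_idx

rng = random.Random(0)
splits = [bootstrap_split(5, rng) for _ in range(4)]
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)

# The test-set size is not fixed: it depends on how many examples were never drawn.
test_sizes = [len(test_idx) for _, test_idx in splits]
```

In rare cases every example is drawn at least once and the test set is empty; a practical implementation would need to decide how to handle such an experiment (for instance by redrawing), which the slides do not specify.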

10 The bootstrap (2)
Compared to basic cross-validation, the bootstrap increases the variance that can occur in each fold [Efron and Tibshirani, 1993]. This is a desirable property since it is a more realistic simulation of the real-life experiment from which our dataset was obtained.
- Consider a classification problem with C classes, a total of N examples and $N_i$ examples for each class $\omega_i$.
- The a priori probability of choosing an example from class $\omega_i$ is $N_i/N$.
- Once we choose an example from class $\omega_i$, if we do not replace it for the next selection, then the a priori probabilities will have changed, since the probability of choosing an example from class $\omega_i$ will now be $(N_i-1)/(N-1)$.
- Thus, sampling with replacement preserves the a priori probabilities of the classes throughout the random selection process.
An additional benefit of the bootstrap is its ability to obtain accurate measures of BOTH the bias and variance of the true error estimate.

11 Bias and variance of a statistical estimate
Consider the problem of estimating a parameter $\alpha$ of an unknown distribution G. To emphasize the fact that $\alpha$ concerns G, we will refer to it as $\alpha(G)$.
- We collect N examples $X=\{x_1, x_2, \ldots, x_N\}$ from the distribution G.
- These examples define a discrete distribution $\hat{G}$ with mass 1/N at each of the examples.
- We compute the statistic $\hat{\alpha}=\alpha(\hat{G})$ as an estimator of $\alpha(G)$. In the context of this lecture, $\alpha(\hat{G})$ is the estimate of the true error rate for our classifier.
How good is this estimator? The goodness of a statistical estimator is measured by:
- BIAS: how much it deviates from the true value
$$\text{Bias} = E_G[\hat{\alpha}] - \alpha(G), \quad \text{where } E_G[X] = \int_{-\infty}^{+\infty} x\,g(x)\,dx$$
- VARIANCE: how much variability it shows for different samples $X=\{x_1, x_2, \ldots, x_N\}$ of the population G
$$\text{Var} = E_G\left[\left(\hat{\alpha} - E_G[\hat{\alpha}]\right)^2\right]$$
Example: if we try to estimate the mean of the population with the sample mean $\bar{x}$:
- The bias of the sample mean is known to be ZERO.
- From elementary statistics, the standard deviation of the sample mean is equal to
$$\text{std}(\bar{x}) = \sqrt{\frac{1}{N(N-1)}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
- This term is also known in statistics as the STANDARD ERROR.
Unfortunately, there is no such neat algebraic formula for almost any estimate other than the sample mean.
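The standard error formula above is easy to check numerically. The sketch below evaluates it on a small illustrative dataset (an assumption, chosen only to keep the arithmetic traceable) and cross-checks it against the equivalent form "sample standard deviation divided by the square root of N".

```python
import math
import statistics

def standard_error(xs):
    """Standard deviation of the sample mean: sqrt(sum((x - mean)^2) / (N (N - 1)))."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n * (n - 1)))

xs = [3, 5, 2, 1, 7]                    # illustrative data, mean 3.6
se = standard_error(xs)
# Same quantity written as s / sqrt(N), with s the (N-1)-normalized sample std.
se_check = statistics.stdev(xs) / math.sqrt(len(xs))
print(se, se_check)
```

For this dataset the squared deviations sum to 23.2, so the standard error is $\sqrt{23.2/20} \approx 1.08$.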

12 Bias and variance estimates with the bootstrap
The bootstrap, with its elegant simplicity, allows us to estimate bias and variance for practically any statistical estimate, be it a scalar or vector (matrix). Here we will only describe the estimation procedure. If you are interested in more details, the textbook "Advanced algorithms for neural networks" [Masters, 1995] has an excellent introduction to the bootstrap.
The bootstrap estimate of bias and variance:
- Consider a dataset of N examples $X=\{x_1, x_2, \ldots, x_N\}$ from the distribution G. This dataset defines a discrete distribution $\hat{G}$.
- Compute $\hat{\alpha}=\alpha(\hat{G})$ as our initial estimate of $\alpha(G)$.
- Let $\{x_1^*, x_2^*, \ldots, x_N^*\}$ be a bootstrap dataset drawn from $X=\{x_1, x_2, \ldots, x_N\}$. Estimate the parameter $\alpha$ using this bootstrap dataset: $\alpha^*(G^*)$.
- Generate K bootstrap datasets and obtain K estimates $\{\alpha_1^*(G^*), \alpha_2^*(G^*), \ldots, \alpha_K^*(G^*)\}$.
The rationale in the bootstrap method is that the effect of generating a bootstrap dataset from the distribution $\hat{G}$ is similar to the effect of obtaining the dataset $X=\{x_1, x_2, \ldots, x_N\}$ from the original distribution G. In other words, the distribution $\{\alpha_1^*(G^*), \alpha_2^*(G^*), \ldots, \alpha_K^*(G^*)\}$ is related to the initial estimate $\hat{\alpha}$ in the same fashion as multiple estimates $\hat{\alpha}$ are related to the true value $\alpha$, so the bias and variance estimates of $\hat{\alpha}$ are:
$$\text{Bias}(\hat{\alpha}) = \bar{\alpha}^* - \hat{\alpha}$$
$$\text{Var}(\hat{\alpha}) = \frac{1}{K-1}\sum_{i=1}^{K}\left(\alpha_i^* - \bar{\alpha}^*\right)^2, \quad \text{where } \bar{\alpha}^* = \frac{1}{K}\sum_{i=1}^{K}\alpha_i^*$$

13 Example: estimating bias and variance
Assume a small dataset x={3,5,2,1,7}. We want to compute the bias and variance of the sample mean $\hat{\alpha}=3.6$.
We generate a number of bootstrap samples (three in this case):
- Assume that the first bootstrap yields the dataset {7,3,2,3,1}. We compute the sample mean $\alpha_1^*=3.2$.
- The second bootstrap sample yields the dataset {5,1,1,3,7}. We compute the sample mean $\alpha_2^*=3.4$.
- The third bootstrap sample yields the dataset {2,2,7,1,3}. We compute the sample mean $\alpha_3^*=3.0$.
- We average these estimates and obtain an average of $\bar{\alpha}^*=3.2$.
What are the bias and variance of the sample mean $\hat{\alpha}$?
- Bias($\hat{\alpha}$) = 3.2 - 3.6 = -0.4. Therefore, we conclude that the re-sampling process introduces a downward bias on the mean, so we would be inclined to use 3.6 + 0.4 = 4.0 as an unbiased estimate of $\alpha$.
- Variance($\hat{\alpha}$) = ½ [(3.2-3.2)² + (3.4-3.2)² + (3.0-3.2)²] = 0.04
NOTES
- We have done this exercise for the sample mean (so you could trace the computations), but $\alpha$ could be any other statistical operator. Here lies the real power of this procedure!!
- How many bootstrap samples should we use? As a rule of thumb, several hundred re-samples will be sufficient for most problems.
Adapted from [Masters, 1995]
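The worked example can be traced in code. The sketch below first recomputes the bias and variance from the three bootstrap samples listed in the example, then shows how the same estimates could be generated with random resampling. The function names, the seed and the choice of K=500 resamples are assumptions for illustration.

```python
import random
from statistics import mean

def bootstrap_bias_variance(estimates, alpha_hat):
    """Bias = mean(alpha*_i) - alpha_hat; Var uses the 1/(K-1) normalization."""
    k = len(estimates)
    alpha_bar = sum(estimates) / k
    bias = alpha_bar - alpha_hat
    variance = sum((a - alpha_bar) ** 2 for a in estimates) / (k - 1)
    return bias, variance

def bootstrap_estimates(data, statistic, k, seed=0):
    """Apply `statistic` to k datasets of size N drawn from data with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic([rng.choice(data) for _ in range(n)]) for _ in range(k)]

x = [3, 5, 2, 1, 7]
alpha_hat = mean(x)                                   # sample mean of the data, 3.6

# The three bootstrap samples given in the example.
samples = [[7, 3, 2, 3, 1], [5, 1, 1, 3, 7], [2, 2, 7, 1, 3]]
bias, variance = bootstrap_bias_variance([mean(s) for s in samples], alpha_hat)
print(bias, variance)                                 # about -0.4 and 0.04

# The same procedure with several hundred random resamples.
many = bootstrap_estimates(x, mean, k=500)
print(bootstrap_bias_variance(many, alpha_hat))
```

Passing a different function in place of `mean` (a median, or a classifier's error rate) leaves the rest of the procedure unchanged.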

14 Three-way data splits (1)
If model selection and true error estimates are to be computed simultaneously, the data needs to be divided into three disjoint sets [Ripley, 1996]:
- Training set: a set of examples used for learning, that is, to fit the parameters of the classifier. In the MLP case, we would use the training set to find the optimal weights with the back-prop rule.
- Validation set: a set of examples used to tune the parameters of a classifier. In the MLP case, we would use the validation set to find the optimal number of hidden units or determine a stopping point for the back-propagation algorithm.
- Test set: a set of examples used only to assess the performance of a fully-trained classifier. In the MLP case, we would use the test set to estimate the error rate after we have chosen the final model (MLP size and actual weights).
Why separate test and validation sets?
- The error rate estimate of the final model on validation data will be biased (smaller than the true error rate) since the validation set is used to select the final model.
- After assessing the final model on the test set, YOU MUST NOT tune the model any further!
Procedure outline:
1. Divide the available data into training, validation and test sets
2. Select architecture and training parameters
3. Train the model using the training set
4. Evaluate the model using the validation set
5. Repeat steps 2 through 4 using different architectures and training parameters
6. Select the best model and train it using data from the training and validation sets
7. Assess this final model using the test set
This outline assumes a holdout method. If Cross-Validation or Bootstrap are used, steps 3 and 4 have to be repeated for each of the K folds.
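The procedure outline can be sketched with a holdout split and a kNN classifier whose free parameter is the number of neighbors. Everything concrete here is an assumption for illustration: the one-dimensional toy data, the candidate values of k and the function names. Step 1 (the split itself) is taken as given.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training examples closest to x (1-D inputs)."""
    neighbors = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def error_rate(train, test, k):
    return sum(knn_predict(train, x, k) != y for x, y in test) / len(test)

def select_and_assess(train, val, test, candidate_ks):
    # Steps 2-5: evaluate each candidate model on the validation set.
    val_errors = {k: error_rate(train, val, k) for k in candidate_ks}
    # Step 6: pick the best model, then retrain on training + validation data.
    best_k = min(val_errors, key=val_errors.get)
    final_train = train + val
    # Step 7: assess the final model once on the test set; no tuning afterwards.
    return best_k, val_errors, error_rate(final_train, test, best_k)

train = [(0.0, "a"), (0.2, "a"), (0.3, "b"), (0.4, "a"),
         (0.8, "b"), (1.0, "b"), (1.2, "b")]
val = [(0.28, "a"), (0.1, "a"), (0.9, "b"), (1.1, "b")]
test = [(0.05, "a"), (0.35, "a"), (0.95, "b"), (1.3, "b")]

best_k, val_errors, test_error = select_and_assess(train, val, test, [1, 3, 5])
print(best_k, val_errors, test_error)
```

On this toy data k=1 is misled by the isolated "b" point at 0.3, so a larger k is selected. The validation error of the selected model was used to choose it, which is why the final error is reported on the untouched test set instead.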

15 Three-way data splits (2)
[Figure: the data are divided into a training set, a validation set and a test set. Models 1 to 4 are trained on the training set and their errors (Error 1 to Error 4) are measured on the validation set; the model with the minimum error is selected as the final model (model selection), and its final error is measured on the test set (error rate).]


Chapter 5: Inner Product Spaces Chapter 5: Ier Product Spaces Chapter 5: Ier Product Spaces SECION A Itroductio to Ier Product Spaces By the ed of this sectio you will be able to uderstad what is meat by a ier product space give examples

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals Overview Estimatig the Value of a Parameter Usig Cofidece Itervals We apply the results about the sample mea the problem of estimatio Estimatio is the process of usig sample data estimate the value of

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

1 Correlation and Regression Analysis

1 Correlation and Regression Analysis 1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

Optimal Adaptive Bandwidth Monitoring for QoS Based Retrieval

Optimal Adaptive Bandwidth Monitoring for QoS Based Retrieval 1 Optimal Adaptive Badwidth Moitorig for QoS Based Retrieval Yizhe Yu, Iree Cheg ad Aup Basu (Seior Member) Departmet of Computig Sciece Uiversity of Alberta Edmoto, AB, T6G E8, CAADA {yizhe, aup, li}@cs.ualberta.ca

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

Chapter 14 Nonparametric Statistics

Chapter 14 Nonparametric Statistics Chapter 14 Noparametric Statistics A.K.A. distributio-free statistics! Does ot deped o the populatio fittig ay particular type of distributio (e.g, ormal). Sice these methods make fewer assumptios, they

More information

A Test of Normality. 1 n S 2 3. n 1. Now introduce two new statistics. The sample skewness is defined as:

A Test of Normality. 1 n S 2 3. n 1. Now introduce two new statistics. The sample skewness is defined as: A Test of Normality Textbook Referece: Chapter. (eighth editio, pages 59 ; seveth editio, pages 6 6). The calculatio of p values for hypothesis testig typically is based o the assumptio that the populatio

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

Domain 1: Designing a SQL Server Instance and a Database Solution

Domain 1: Designing a SQL Server Instance and a Database Solution Maual SQL Server 2008 Desig, Optimize ad Maitai (70-450) 1-800-418-6789 Domai 1: Desigig a SQL Server Istace ad a Database Solutio Desigig for CPU, Memory ad Storage Capacity Requiremets Whe desigig a

More information

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature.

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature. Itegrated Productio ad Ivetory Cotrol System MRP ad MRP II Framework of Maufacturig System Ivetory cotrol, productio schedulig, capacity plaig ad fiacial ad busiess decisios i a productio system are iterrelated.

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 haupt@ieee.org Abstract:

More information

Subject CT5 Contingencies Core Technical Syllabus

Subject CT5 Contingencies Core Technical Syllabus Subject CT5 Cotigecies Core Techical Syllabus for the 2015 exams 1 Jue 2014 Aim The aim of the Cotigecies subject is to provide a groudig i the mathematical techiques which ca be used to model ad value

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL. Auities Uder Radom Rates of Iterest II By Abraham Zas Techio I.I.T. Haifa ISRAEL ad Haifa Uiversity Haifa ISRAEL Departmet of Mathematics, Techio - Israel Istitute of Techology, 3000, Haifa, Israel I memory

More information

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker Parameter estimatio for oliear models: Numerical approaches to solvig the iverse problem Lecture 11 04/01/2008 Sve Zeker Review: Trasformatio of radom variables Cosider probability distributio of a radom

More information

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff,

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff, NEW HIGH PERFORMNCE COMPUTTIONL METHODS FOR MORTGGES ND NNUITIES Yuri Shestopaloff, Geerally, mortgage ad auity equatios do ot have aalytical solutios for ukow iterest rate, which has to be foud usig umerical

More information

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number.

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number. GCSE STATISTICS You should kow: 1) How to draw a frequecy diagram: e.g. NUMBER TALLY FREQUENCY 1 3 5 ) How to draw a bar chart, a pictogram, ad a pie chart. 3) How to use averages: a) Mea - add up all

More information

(VCP-310) 1-800-418-6789

(VCP-310) 1-800-418-6789 Maual VMware Lesso 1: Uderstadig the VMware Product Lie I this lesso, you will first lear what virtualizatio is. Next, you ll explore the products offered by VMware that provide virtualizatio services.

More information

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics

More information

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat

More information

Intelligent Sensor Placement for Hot Server Detection in Data Centers - Supplementary File

Intelligent Sensor Placement for Hot Server Detection in Data Centers - Supplementary File Itelliget Sesor Placemet for Hot Server Detectio i Data Ceters - Supplemetary File Xiaodog Wag, Xiaorui Wag, Guoliag Xig, Jizhu Che, Cheg-Xia Li ad Yixi Che The Ohio State Uiversity, USA Michiga State

More information

Time Value of Money. First some technical stuff. HP10B II users

Time Value of Money. First some technical stuff. HP10B II users Time Value of Moey Basis for the course Power of compoud iterest $3,600 each year ito a 401(k) pla yields $2,390,000 i 40 years First some techical stuff You will use your fiacial calculator i every sigle

More information

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009) 18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

This document contains a collection of formulas and constants useful for SPC chart construction. It assumes you are already familiar with SPC.

This document contains a collection of formulas and constants useful for SPC chart construction. It assumes you are already familiar with SPC. SPC Formulas ad Tables 1 This documet cotais a collectio of formulas ad costats useful for SPC chart costructio. It assumes you are already familiar with SPC. Termiology Geerally, a bar draw over a symbol

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics We leared to describe data sets graphically. We ca also describe a data set umerically. Measures of Locatio Defiitio The sample mea is the arithmetic average of values. We deote

More information

One-sample test of proportions

One-sample test of proportions Oe-sample test of proportios The Settig: Idividuals i some populatio ca be classified ito oe of two categories. You wat to make iferece about the proportio i each category, so you draw a sample. Examples:

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information

Topic 5: Confidence Intervals (Chapter 9)

Topic 5: Confidence Intervals (Chapter 9) Topic 5: Cofidece Iterval (Chapter 9) 1. Itroductio The two geeral area of tatitical iferece are: 1) etimatio of parameter(), ch. 9 ) hypothei tetig of parameter(), ch. 10 Let X be ome radom variable with

More information

Queuing Systems: Lecture 1. Amedeo R. Odoni October 10, 2001

Queuing Systems: Lecture 1. Amedeo R. Odoni October 10, 2001 Queuig Systems: Lecture Amedeo R. Odoi October, 2 Topics i Queuig Theory 9. Itroductio to Queues; Little s Law; M/M/. Markovia Birth-ad-Death Queues. The M/G/ Queue ad Extesios 2. riority Queues; State

More information

3 Basic Definitions of Probability Theory

3 Basic Definitions of Probability Theory 3 Basic Defiitios of Probability Theory 3defprob.tex: Feb 10, 2003 Classical probability Frequecy probability axiomatic probability Historical developemet: Classical Frequecy Axiomatic The Axiomatic defiitio

More information

Chapter XIV: Fundamentals of Probability and Statistics *

Chapter XIV: Fundamentals of Probability and Statistics * Objectives Chapter XIV: Fudametals o Probability ad Statistics * Preset udametal cocepts o probability ad statistics Review measures o cetral tedecy ad dispersio Aalyze methods ad applicatios o descriptive

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

CHAPTER 11 Financial mathematics

CHAPTER 11 Financial mathematics CHAPTER 11 Fiacial mathematics I this chapter you will: Calculate iterest usig the simple iterest formula ( ) Use the simple iterest formula to calculate the pricipal (P) Use the simple iterest formula

More information

Data-Enhanced Predictive Modeling for Sales Targeting

Data-Enhanced Predictive Modeling for Sales Targeting Data-Ehaced Predictive Modelig for Sales Targetig Saharo Rosset Richard D. Lawrece Abstract We describe ad aalyze the idea of data-ehaced predictive modelig (DEM). The term ehaced here refers to the case

More information

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find 1.8 Approximatig Area uder a curve with rectagles 1.6 To fid the area uder a curve we approximate the area usig rectagles ad the use limits to fid 1.4 the area. Example 1 Suppose we wat to estimate 1.

More information