# CHAPTER 14 MORE ABOUT REGRESSION


We learned in Chapter 5 that often a straight line describes the pattern of a relationship between two quantitative variables. For instance, in Example 5.1 we explored the relationship between the hand-spans (cm) and heights (inches) of 167 college students, and found that the pattern of the relationship in this sample could be described by the equation

Average hand-span = -3 + 0.35 Height

An equation like the one relating hand-span to height is called a regression equation, and the term simple regression is sometimes used to describe the analysis of a straight-line relationship (linear relationship) between a response variable (y-variable) and an explanatory variable (x-variable). In Chapter 5, we only used regression methods to describe a sample and did not make statistical inferences about the larger population. Now, we consider how to make inferences about a relationship in the population represented by the sample. Some questions involving the population that we might ask when analyzing a relationship are:

1. Does the observed relationship also occur in the population? For example, is the observed relationship between hand-span and height strong enough to conclude that the relationship also holds in the population?
2. For a linear relationship, what is the slope of the regression line in the population? For example, in the larger population, what is the slope of the regression line that connects hand-spans to heights?
3. What is the mean value of the response variable (y) for individuals with a specific value of the explanatory variable (x)? For example, what is the mean hand-span in a population of people 65 inches tall?
4. What interval of values predicts the value of the response variable (y) for an individual with a specific value of the explanatory variable (x)? For example, what interval predicts the hand-span of an individual 65 inches tall?

### 14.1 Sample and Population Regression Models

A regression model describes the relationship between a quantitative response variable (the y-variable) and one or more explanatory variables (x-variables).
The y-variable is sometimes called the dependent variable, and because regression models may be used to make predictions, the x-variables may be called the predictor variables. The labels response variable and explanatory variable may be used for the variables on the y-axis and x-axis, respectively, even if there is not an obvious way to assign these labels in the usual sense.

Any regression model has two important components. The most obvious component is the equation that describes how the mean value of the y-variable is connected to specific values of the x-variable. The equation stated before for the connection between hand-span and height, Average hand-span = -3 + 0.35 Height, is an example. In this chapter, we focus on linear relationships so a straight-line equation will be used, but it is important to note that some relationships are curvilinear.

The second component of a regression model describes how individuals vary from the regression line. Figure 14.1, which is identical to Figure 5.6, displays the raw data for the sample of n = 167 hand-spans and heights along with the regression line that estimates how the mean hand-span is connected to specific heights. Notice that most individuals vary from the line. When we examine sample data, we will find it useful to estimate the general size of the deviations from the line. When we consider a model for the relationship within the population represented by a sample, we will state assumptions about the distribution of deviations from the line.

If the sample represents a larger population, we need to distinguish between the regression line for the sample and the regression line for the population. The observed data can be used to determine the regression line for the sample, but the regression line for the population can only be imagined. Because we do not observe the whole population, we will not know numerical values for the intercept and slope of the regression line in the population. As in nearly every statistical problem, the statistics from a sample are used to estimate the unknown population parameters, which in this case are the slope and intercept of the regression line.

Figure 14.1 Regression Line Linking Hand-Span and Height for a Sample of College Students

#### The Regression Line for the Sample

In Chapter 5, we introduced this notation for the regression line that describes sample data: $\hat{y} = b_0 + b_1 x$. In any given situation, the sample is used to determine values for $b_0$ and $b_1$.

- $\hat{y}$ is spoken as "y-hat" and it is also referred to either as predicted y or estimated y.
- $b_0$ is the intercept of the straight line. The intercept is the value of $\hat{y}$ when x = 0.
- $b_1$ is the slope of the straight line. The slope tells us how much of an increase (or decrease) there is for $\hat{y}$ when the x-variable increases by one unit. The sign of the slope tells us whether $\hat{y}$ increases or decreases when x increases. If the slope is 0, there is no linear relationship between x and y because $\hat{y}$ is the same for all values of x.

The equation describing the relationship between hand-span and height for the sample of college students can be written as $\hat{y} = -3 + 0.35x$. In this equation:

- $\hat{y}$ estimates the average hand-span for any specific height x. If height = 70 inches, for instance, $\hat{y} = -3 + 0.35(70) = 21.5$ cm.

- The intercept is $b_0 = -3$. While necessary for the line, this value does not have a useful statistical interpretation in this example. It estimates the average hand-span for individuals who have height = 0 inches, an impossible height far from the range of the observed heights. It also is an impossible hand-span.
- The slope is $b_1 = 0.35$. This value tells us that the average increase in hand-span is 0.35 centimeters for every one-inch increase in height.

**Reminder: The Least-Squares Criterion.** In Chapter 5, we described the least-squares criterion. This mathematical criterion is used to determine numerical values of the intercept and slope of a sample regression line. The least-squares line is the line, among all possible lines, that has the smallest sum of squared differences between the sample values of y and the corresponding values of $\hat{y}$.

#### Deviations from the Regression Line in the Sample

The terms random error, residual variation, and residual error all are used as synonyms for the term deviation. Most commonly, the word residual is used to describe the deviation of an observed y-value from the sample regression line. A residual is easy to compute. It simply is the difference between the observed y-value for an individual and the value of $\hat{y}$ determined from the x-value for that individual.

**Example 1. Residuals in the Hand-Span and Height Regression.** Consider a person 70 inches tall whose hand-span is 23 centimeters. The sample regression line is $\hat{y} = -3 + 0.35x$, so $\hat{y} = -3 + 0.35(70) = 21.5$ cm for this person. The residual = observed y - predicted y = $y - \hat{y} = 23 - 21.5 = 1.5$ cm. Figure 14.2 illustrates this residual.

For an observation $y_i$ in the sample, the residual is $e_i = y_i - \hat{y}_i$.

- $y_i$ = the value of the response variable for the observation.
- $\hat{y}_i = b_0 + b_1 x_i$, where $x_i$ is the value of the explanatory variable for the observation.

**Technical Note:** The sum of the residuals is 0 for any least-squares regression line. The "least squares" formulas for determining the equation always result in $\sum y_i = \sum \hat{y}_i$, so $\sum e_i = 0$.
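The residual calculation and the Technical Note can be checked numerically. The sketch below first reproduces the residual from Example 1, then fits a least-squares line to a small set of made-up heights and hand-spans (these toy values are an assumption for illustration, not the 167-student sample) and confirms that the residuals sum to essentially zero.

```python
# Sketch: residuals for a least-squares line.
# The toy data below are hypothetical, not the textbook sample.

def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residual(y, x, b0, b1):
    """Residual e = observed y minus predicted y-hat."""
    return y - (b0 + b1 * x)

# Example 1: height 70 in, hand-span 23 cm, line y-hat = -3 + 0.35x
example_residual = residual(23, 70, -3, 0.35)

# Hypothetical toy sample (inches, centimeters)
heights = [62, 65, 67, 70, 72, 74]
spans = [18.5, 20.0, 20.5, 22.5, 21.5, 23.5]
b0, b1 = least_squares(heights, spans)
residuals = [residual(y, x, b0, b1) for x, y in zip(heights, spans)]
residual_sum = sum(residuals)  # zero up to floating-point rounding

print(example_residual, residual_sum)
```

Any other data set could be substituted for the toy lists; the sum of residuals stays at zero because the least-squares line always passes through the point of means.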

Figure 14.2 Residual for a person 70 inches tall with a hand-span = 23 centimeters. The residual is the difference between observed y = 23 and $\hat{y}$ = 21.5, the predicted value for a person 70 inches tall.

#### The Regression Line for the Population

The regression equation for a simple linear relationship in a population can be written as:

$$E(Y) = \beta_0 + \beta_1 x$$

- $E(Y)$ represents the mean or expected value of y for individuals in the population who all have the same particular value of x. Note that $\hat{y}$ is an estimate of $E(Y)$.
- $\beta_0$ is the intercept of the straight line in the population.
- $\beta_1$ is the slope of the line in the population. Note that if the slope $\beta_1 = 0$, there is no linear relationship in the population.

Unless we measure the entire population, we cannot know the numerical values of $\beta_0$ and $\beta_1$. These are population parameters that we estimate using the corresponding sample statistics. In the hand-span and height example, $b_1 = 0.35$ is a sample statistic that estimates the population parameter $\beta_1$, and $b_0 = -3$ is a sample statistic that estimates the population parameter $\beta_0$.

#### Deviations from the Regression Line in the Population

To make statistical inferences about the population, two assumptions about how the y-values vary from the population regression line are necessary. First, we assume that the general size of the deviation of y-values from the line is the same for all values of the explanatory variable (x), an assumption called the constant variance assumption. This assumption may or may not be correct in any particular situation, and a scatter plot should be examined to see if it is reasonable or not. In Figure 14.1, the constant variance assumption looks reasonable because the magnitude of the deviation from the line appears to be about the same across the range of observed heights.

The second assumption about the population is that for any specific value of x, the distribution of y-values is a normal distribution. Equivalently, this assumption is that deviations from the population regression line have a normal curve distribution.
Figure 14.3 illustrates this assumption along with the other elements of the population regression model for a linear relationship. The line $E(Y) = \beta_0 + \beta_1 x$ describes the mean of y, and the normal curves describe deviations from the mean.

Figure 14.3 Regression Model for Population

#### Summary of the Simple Regression Model

A useful format for expressing the components of the population regression model is

Y = MEAN + DEVIATION.

This conceptual equation states that for any individual, the value of the response variable (y) can be constructed by combining two components:

- The MEAN, which in the population is the line $E(Y) = \beta_0 + \beta_1 x$ if the relationship is linear. There are other possible relationships, such as curvilinear, a special case of which is a quadratic relationship, $E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$. Relationships that are not linear will not be discussed in this book.
- The individual's DEVIATION = y - MEAN, which is what is left unexplained after accounting for the mean y-value at that individual's x-value.

This format also applies to the sample, although technically we should use the term "estimated mean" when referring to the sample regression line.

**Example 1 Continued. MEAN and DEVIATION for Height and Hand-Span Regression.** Recall that the sample regression line for hand-spans and heights is $\hat{y} = -3 + 0.35x$. Although it is not likely to be true, let's assume for convenience that this equation also holds in the population. If your height is x = 70 inches and your hand-span is y = 23 cm, then MEAN = -3 + 0.35(70) = 21.5, DEVIATION = Y - MEAN = 23 - 21.5 = 1.5, and y = 23 = MEAN + DEVIATION = 21.5 + 1.5. In other words, your hand-span is 1.5 cm above the mean for people with your height.
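The Y = MEAN + DEVIATION format can be made concrete with a small simulation. The sketch below borrows the line from Example 1 Continued as if it held in the population and draws normal deviations with a constant standard deviation. The value `SIGMA = 1.3` and the seed are assumptions chosen only for illustration; the text does not give the population standard deviation for hand-spans.

```python
import random

# Population line borrowed from Example 1 Continued (assumed to hold in the population)
BETA0, BETA1 = -3.0, 0.35
SIGMA = 1.3  # hypothetical standard deviation of the deviations, in cm

def simulate_hand_span(height, rng):
    """Return (mean, deviation, y) for one simulated person of the given height."""
    mean = BETA0 + BETA1 * height      # MEAN component: E(Y) at this x
    deviation = rng.gauss(0.0, SIGMA)  # DEVIATION component: normal, mean 0
    return mean, deviation, mean + deviation

rng = random.Random(14)  # fixed seed so the run is repeatable
draws = [simulate_hand_span(70, rng) for _ in range(10_000)]

avg_deviation = sum(d for _, d, _ in draws) / len(draws)
avg_y = sum(y for _, _, y in draws) / len(draws)

print(round(avg_deviation, 3), round(avg_y, 3))
```

With many simulated people of the same height, the average deviation settles near 0 and the average hand-span settles near the MEAN of 21.5 cm, which is what the population model describes.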

In the theoretical development of procedures for making statistical inferences for a regression model, the collection of all DEVIATIONS in the population is assumed to have a normal distribution with mean 0 and standard deviation $\sigma$ (so, the variance is $\sigma^2$). The value of the standard deviation $\sigma$ is an unknown population parameter that is estimated using the sample. This standard deviation can be interpreted in the usual way that we interpret a standard deviation. It is, roughly, the average distance between individual values of y and the mean of y as described by the regression line. In other words, it is roughly the size of the average deviation across all individuals in the range of x-values.

Keeping the regression notation straight for populations and samples can be confusing. Although we have not yet introduced all relevant notation, a summary at this stage will help you keep it straight.

**Simple Linear Regression Model**

For $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, a sample of n observations of the explanatory variable x and the response variable y from a large population, the simple linear regression model describing the relationship between y and x is:

Population version

- Mean: $E(Y) = \beta_0 + \beta_1 x$
- Individual: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i = E(Y_i) + \varepsilon_i$

The deviations $\varepsilon_i$ are assumed to follow a normal distribution with mean 0 and standard deviation $\sigma$.

Sample version

- Mean: $\hat{y} = b_0 + b_1 x$
- Individual: $y_i = b_0 + b_1 x_i + e_i = \hat{y}_i + e_i$

where $e_i$ is the residual for individual i. The sample statistics $b_0$ and $b_1$ estimate the population parameters $\beta_0$ and $\beta_1$. The mean of the residuals is 0, and the residuals can be used to estimate the population standard deviation $\sigma$.

### 14.2 Estimating the Standard Deviation From the Mean

Recall that the standard deviation in the regression model measures, roughly, the average deviation of y-values from the mean (the regression line). Expressed another way, the standard deviation for regression measures the general size of the residuals.
This is an important and useful statistic for describing individual variation in a regression problem, and it also provides information about how accurately the regression equation might predict y-values for individuals. A relatively small standard deviation from the regression line indicates that individual data points generally fall close to the line, so predictions based on the line will be close to the actual values.

The calculation of the estimate of standard deviation is based on the sum of the squared residuals for the sample. This quantity is called the sum of squared errors and is denoted by SSE. Synonyms for sum of squared errors are residual sum of squares or sum of squared residuals. To find the SSE, residuals are calculated for all observations, then the residuals are squared and summed. The standard deviation for the sample is

$$s = \sqrt{\frac{\text{Sum of Squared Residuals}}{n-2}} = \sqrt{\frac{SSE}{n-2}},$$

and this sample statistic estimates the population standard deviation $\sigma$.

**Estimating the Standard Deviation for a Simple Regression Model**

$$SSE = \sum (y_i - \hat{y}_i)^2 = \sum e_i^2$$

$$s = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$$

The statistic s is an estimate of the population standard deviation $\sigma$. Remember that in the regression context, $\sigma$ is the standard deviation of the y-values at each x, not the standard deviation of the whole population of y-values.

**Example 2. Relationship Between Height and Weight for College Men.** Figure 14.4 displays regression results from the Minitab program and a scatter plot for the relationship between y = weight (pounds) and x = height (inches) in a sample of n = 43 men in a Penn State statistics class. The regression line for the sample is $\hat{y}$ = […] + […] x, and this line is drawn onto the plot. We see from the plot that there is considerable variation from the line at any given height. The standard deviation, shown in the row of computer output immediately above the plot, is "S = 24.00." This value roughly measures, for any given height, the general size of the deviations of individual weights from the mean weight for the height.

The standard deviation from the regression line can be interpreted in conjunction with the Empirical Rule for bell-shaped data stated in Section 2.7. Recall, for instance, that about 95% of individuals will fall within two standard deviations of the mean. As an example, consider men who are 72 inches tall. For men with this height, the estimated average weight determined from the regression equation is […] + […](72) = 186 pounds. The estimated standard deviation from the regression line is s = 24 pounds, so we can estimate that about 95% of men 72 inches tall have weights within 2 × 24 = 48 pounds of 186 pounds, which is 186 ± 48, or 138 to 234 pounds. Think about whether this makes sense for all the men you know who are 72 inches (6 feet) tall.
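The two calculations in this section are short enough to sketch directly. The first function computes s from a list of residuals using the n - 2 divisor; the second applies the Empirical Rule around an estimated mean. The list `toy_residuals` is hypothetical; the values 186 and 24 come from Example 2.

```python
import math

def regression_sd(residuals):
    """Estimate sigma as s = sqrt(SSE / (n - 2)) from a list of residuals."""
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (len(residuals) - 2))

def empirical_rule_range(mean_at_x, s, k=2):
    """Range covering roughly 95% of individuals at a given x when k = 2."""
    return mean_at_x - k * s, mean_at_x + k * s

# Example 2: men 72 inches tall, estimated mean 186 lb, s = 24 lb
low, high = empirical_rule_range(186, 24)

# Hypothetical residuals (they sum to zero, as least-squares residuals do)
toy_residuals = [1.0, -2.0, 0.5, 1.5, -1.0]
s_toy = regression_sd(toy_residuals)

print(low, high, round(s_toy, 3))
```

The divisor is n - 2 rather than n - 1 because two quantities, the intercept and the slope, are estimated before the residuals can be computed.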

Figure 14.4 The Relationship Between Weight and Height for n = 43 College Men

[Minitab output and scatter plot. The regression equation is Weight = […] + […] Height. Coefficient table columns: Predictor, Coef, SE Coef, T, P, with rows Constant and Height; numerical entries not preserved. S = 24.00, R-Sq = 32.3%, R-Sq(adj) = 30.7%.]

#### The Proportion of Variation Explained by x

In Chapter 5, we learned that a statistic denoted as $r^2$ is used to measure how well the explanatory variable actually does explain the variation in the response variable. This statistic is also denoted as $R^2$ (rather than $r^2$), and the value is commonly expressed as a percent. Researchers typically use the phrase proportion of variation explained by x in conjunction with the value of $r^2$. For example, if $r^2 = 0.60$ (or 60%), the researcher may write that the explanatory variable explains 60% of the variation in the response variable.

The formula for $r^2$ presented in Chapter 5 was

$$r^2 = \frac{SSTO - SSE}{SSTO}$$

The quantity SSTO is the sum of squared differences between observed y values and the sample mean $\bar{y}$. It measures the size of the deviations of the y-values from the overall mean of y, whereas SSE measures the deviations of the y-values from the predicted values $\hat{y}$.
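As a rough illustration of the formula, the sketch below computes SSTO, SSE, and $r^2$ for a small made-up data set and compares the result with the squared correlation. The heights and weights are hypothetical stand-ins, not the 43 men in Example 2.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def proportion_explained(xs, ys):
    """Return (r_squared, SSTO, SSE) using r^2 = (SSTO - SSE) / SSTO."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ssto = sum((y - y_bar) ** 2 for y in ys)  # deviations from overall mean
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # deviations from line
    return (ssto - sse) / ssto, ssto, sse

def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds)
heights = [66, 68, 69, 70, 71, 72, 74, 75]
weights = [150, 165, 155, 180, 170, 200, 185, 210]

r2, ssto, sse = proportion_explained(heights, weights)
r = correlation(heights, weights)
print(round(r2, 3), round(r ** 2, 3))
```

For simple regression the two printed values agree, which is why the statistic is written as $r^2$.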

**Example 2 Continued. $R^2$ for Heights and Weights of College Men.** In Figure 14.4, we can find the information "R-Sq = 32.3%" for the relationship between weight and height. A researcher might write "the variable height explains 32.3% of the variation in the weights of college men." This isn't a particularly impressive statistic. As we noted before, there is substantial deviation of individual weights from the regression line, so a prediction of a college man's weight based on height may not be particularly accurate.

**Example 3. Driver Age and Highway Sign Reading Distance.** In Example 5.2, we examined data for the relationship between y = maximum distance (feet) at which a driver can read a highway sign and x = the age of the driver. There were n = 30 observations in the data set. Figure 14.5 displays Minitab regression output for these data. The equation describing the linear relationship in the sample is

Average distance = […] - 3.01 Age

From the output, we learn that the standard deviation from the regression line is s = 49.76 and R-Sq = 64.2%. Roughly, the average deviation from the regression line is about 50 feet, and the proportion of variation in sign reading distances explained by age is 0.642, or 64.2%.

Figure 14.5 Minitab Output: Sign Reading Distance and Driver Age

[Minitab output. The regression equation is Distance = […] - 3.01 Age. Coefficient table columns: Predictor, Coef, SE Coef, T, P, with rows Constant and Age. S = 49.76, R-Sq = 64.2%, R-Sq(adj) = 62.9%. Analysis of Variance table columns: Source, DF, SS, MS, F, P, with rows Regression, Residual Error, and Total. Unusual Observations table columns: Obs, Age, Distance, Fit, SE Fit, Residual, St Resid; R denotes an observation with a large standardized residual. Numerical entries not preserved.]

The "Analysis of Variance" table provides the pieces needed to compute $r^2$ and s:

- SSE = 69334, so $s = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{69334}{28}} = 49.76$
- SSTO = […], so SSTO - SSE = […] - 69334 = […]
- $r^2 = \dfrac{SSTO - SSE}{SSTO} = 0.642$, or 64.2%
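The arithmetic for s can be verified in a few lines. The sketch below uses SSE = 69334 and n = 30 from the text. Because the SSTO entry is not legible here, the code only back-calculates an approximate SSTO from the rounded $r^2$ = 0.642; that figure is an assumption derived from rounded inputs, not the value printed in the Minitab table.

```python
import math

SSE = 69334   # residual sum of squares, from the text
N = 30        # number of drivers
R_SQ = 0.642  # rounded R-Sq reported in the output

# Standard deviation from the regression line
s = math.sqrt(SSE / (N - 2))

# Rearranging r^2 = (SSTO - SSE) / SSTO gives SSTO = SSE / (1 - r^2).
# Approximate only, because R_SQ is rounded to three decimals.
implied_ssto = SSE / (1 - R_SQ)

print(round(s, 2), round(implied_ssto))
```

The first printed value matches the S = 49.76 quoted in the example.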

### 14.3 Inference about the Linear Regression Relationship

When researchers do a regression analysis, they occasionally know based on past research or common sense that the variables are indeed related. In some instances, however, it may be necessary to do a hypothesis test in order to make the generalization that two variables are related in the population represented by the sample. The statistical significance of a linear relationship can be evaluated by testing whether or not the slope is 0. Recall that if the slope is 0 in a simple regression model, the two variables are not related because changes in the x-variable will not lead to changes in the y-variable.

The usual null and alternative hypotheses about $\beta_1$, the slope of the population line $E(Y) = \beta_0 + \beta_1 x$, are:

- $H_0: \beta_1 = 0$ (the population slope is 0, so y and x are not linearly related.)
- $H_a: \beta_1 \neq 0$ (the population slope is not 0, so y and x are linearly related.)

The alternative hypothesis may be one-sided or two-sided, although most statistical software uses the two-sided alternative.

The test statistic used to do the hypothesis test is a t statistic with the same general format that we saw in Chapter 13. That format, and its application to this situation, is

$$t = \frac{\text{sample statistic} - \text{null value}}{\text{standard error}} = \frac{b_1 - 0}{\text{s.e.}(b_1)}$$

This is a standardized statistic for the difference between the sample slope and 0, the null value. Notice that a large value of the sample slope (either positive or negative) relative to its standard error will give a large value of t. If the mathematical assumptions about the population model described in Section 14.1 are correct, the statistic has a t distribution with n - 2 degrees of freedom. The p-value for the test is determined using that distribution.

Calculating the sample slope and its standard error by hand is cumbersome. Fortunately, the regression analysis of most statistical software includes a t-statistic and a p-value for this significance test.
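For readers curious about what the software does, here is a minimal sketch of the by-hand computation using the standard simple-regression formulas $b_1 = r\,s_y/s_x$ and $\text{s.e.}(b_1) = s/\sqrt{\sum (x_i - \bar{x})^2}$. The ages and distances are hypothetical toy values, not the 30 drivers from Example 3, and the function name `slope_inference` is made up for this illustration.

```python
import math

def slope_inference(xs, ys):
    """Return slope, its standard error, the t statistic, r, and df = n - 2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    s_x = math.sqrt(sxx / (n - 1))  # sample standard deviation of x
    s_y = math.sqrt(syy / (n - 1))  # sample standard deviation of y
    r = sxy / math.sqrt(sxx * syy)  # correlation between x and y

    b1 = r * s_y / s_x              # sample slope
    b0 = y_bar - b1 * x_bar         # sample intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))    # standard deviation from the line
    se_b1 = s / math.sqrt(sxx)      # standard error of the slope
    t = (b1 - 0) / se_b1            # test statistic for H0: beta1 = 0
    return {"b1": b1, "se_b1": se_b1, "t": t, "r": r, "df": n - 2}

# Hypothetical ages (years) and sign reading distances (feet)
ages = [18, 25, 33, 41, 50, 58, 66, 75]
distances = [510, 520, 480, 460, 420, 430, 380, 360]

result = slope_inference(ages, distances)
print({k: round(v, 3) for k, v in result.items()})
```

The p-value would then come from a t distribution with `df` degrees of freedom, which is the step usually left to software or a t table.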
**Technical Note:** In case you ever need to compute the values by hand, here are the formulas for the sample slope and its standard error:

$$b_1 = r\,\frac{s_y}{s_x} \qquad\qquad \text{s.e.}(b_1) = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}, \quad \text{where } s = \sqrt{\frac{SSE}{n-2}}$$

In the formula for the sample slope, $s_x$ and $s_y$ are the sample standard deviations of the x and y values respectively, and r is the correlation between x and y.

**Example 3 Continued: Driver Age and Highway Sign Reading Distance.** Figure 14.5 presents the Minitab output for the regression of sign reading distance and driver age. The sample estimate of the slope is $b_1 = -3.01$. This sample slope is different from 0, but is it different enough to enable us to generalize that a linear relationship exists in the population represented by this sample?

The part of the Minitab output that can be used to test the statistical significance of the relationship is shown in bold in Figure 14.5, and the relevant p-value is underlined (by the authors of this text, not by Minitab). This line of the output provides information about the sample slope, the standard error of the sample slope, the t statistic for testing statistical significance and the p-value for the test of:

- $H_0: \beta_1 = 0$ (the population slope is 0, so y and x are not linearly related.)
- $H_a: \beta_1 \neq 0$ (the population slope is not 0, so y and x are linearly related.)

The test statistic is:

$$t = \frac{\text{sample statistic} - \text{null value}}{\text{standard error}} = \frac{b_1 - 0}{\text{s.e.}(b_1)} = -7.09$$

The p-value is, to 3 decimal places, 0.000. This means the probability is virtually 0 that the observed slope could be as far from 0 or farther than it is if there is no linear relationship in the population. So, as we might expect for these variables, we can conclude that the relationship between the two variables in the sample represents a real relationship in the population.

#### Confidence Interval for the Population Slope

The significance test of whether or not the population slope is 0 only tells us if we can declare the relationship to be statistically significant. If we decide that the true slope is not 0, we might ask, "What is the value of the slope?" We can answer this question with a confidence interval for $\beta_1$, the population slope.

The format for this confidence interval is the same as the general format used in Chapters 10 and 12, which is

sample estimate ± multiplier × standard error

The estimate of the population slope $\beta_1$ is $b_1$, the slope of the least-squares regression line for the sample. As shown already, the standard error formula is complicated and we'll usually rely on statistical software to determine this value. The multiplier will be labeled t* and is determined using a t-distribution with df = n - 2. Table 12.1 can be used to find the multiplier for the desired confidence level.

**Formula for Confidence Interval for $\beta_1$, the Population Slope.** A confidence interval for $\beta_1$ is

$$b_1 \pm t^* \times \text{s.e.}(b_1)$$

The multiplier t* is found using a t-distribution with n - 2 degrees of freedom, and is such that the probability between -t* and +t* equals the confidence level for the interval.

**Example 3 Continued. 95% Confidence Interval for Slope Between Age and Sign Reading Distance.** In Figure 14.5, we see that the estimated slope is $b_1 = -3.01$ and s.e.($b_1$) = […]. There are n = 30 observations, so df = 28 for finding t*.
For a 95% confidence level, t* = 2.05 (see Table 12.1). The 95% confidence interval for the population slope is

-3.01 ± 2.05 × s.e.($b_1$) = -3.01 ± 0.87, or -3.88 to -2.14

With 95% confidence, we can estimate that in the population of drivers represented by this sample, the mean sign reading distance decreases somewhere between 2.14 and 3.88 feet for each one-year increase in age.
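The interval and the earlier test statistic can be reproduced from summary numbers alone. In the sketch below, $b_1$ = -3.01 and t* = 2.05 come from the example. The standard error is not legible in this copy, so `SE_B1` is back-calculated from the interval's half-width of 0.87 (about 0.424); treat that value as an assumption rather than the figure printed by Minitab.

```python
def t_statistic(b1, se_b1, null_value=0.0):
    """Standardized distance of the sample slope from the null value."""
    return (b1 - null_value) / se_b1

def slope_confidence_interval(b1, se_b1, t_star):
    """Confidence interval b1 +/- t* x s.e.(b1)."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

B1 = -3.01             # sample slope from Example 3
T_STAR = 2.05          # 95% multiplier for df = 28, as given in the text
SE_B1 = 0.87 / T_STAR  # assumption: recovered from the interval half-width

low, high = slope_confidence_interval(B1, SE_B1, T_STAR)
t = t_statistic(B1, SE_B1)

print(round(low, 2), round(high, 2), round(t, 2))
```

Because the whole interval lies below 0, it tells the same story as the significance test: a slope of 0 is not a plausible value for the population.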

#### Testing Hypotheses about the Correlation Coefficient

In Chapter 5, we learned that the correlation coefficient is 0 when the regression line is horizontal. In other words, if the slope of the regression line is 0, the correlation is 0. This means that the results of a hypothesis test for the population slope can also be interpreted as applying to equivalent hypotheses about the correlation between x and y in the population.

As we did for the regression model, we use different notation to distinguish between a correlation computed for a sample and a correlation within a population. It is commonplace to use the symbol $\rho$ (pronounced "rho") to represent the correlation between two variables within a population. Using this notation, null and alternative hypotheses of interest are:

- $H_0: \rho = 0$ (x and y are not correlated)
- $H_a: \rho \neq 0$ (x and y are correlated)

The results of the hypothesis test described before for the population slope $\beta_1$ can be used for these hypotheses as well. If we reject $H_0: \beta_1 = 0$, we also reject $H_0: \rho = 0$. If we decide in favor of $H_a: \beta_1 \neq 0$, we also decide in favor of $H_a: \rho \neq 0$.

Many statistical software programs, including Minitab, will give a p-value for testing whether the population correlation is 0 or not. This p-value will be the same as the p-value given for testing whether the population slope is 0 or not. In the following Minitab output for the relationship between pulse rate and weight in a sample of 35 college women, notice that […] is given as the p-value for testing that the slope is 0 (look under P in the regression results) and for testing that the correlation is 0. Because this is not a small p-value, we cannot reject the null hypotheses for the slope and the correlation.

[Minitab output. Regression Analysis: Pulse versus Weight. The regression equation is Pulse = […] + […] Weight. Coefficient table columns: Predictor, Coef, SE Coef, T, P, with rows Constant and Weight. Correlations: Pulse, Weight. Pearson correlation of Pulse and Weight = 0.183, P-Value = […]. Other numerical entries not preserved.]

#### The Effect of Sample Size on Significance

The size of a sample always affects whether a specific observed result achieves statistical significance.
For example, r = 0.183 is not a statistically significant correlation for a sample size of n = 35, as in the pulse and weight example, but it would be statistically significant if n = 1,000. With very large sample sizes, weak relationships with low correlation values can be statistically significant. The moral of the story here is that with a large sample size, it may not be saying much to say that two variables are significantly related. This only means that we think the correlation is not 0. To assess the practical significance of the result, we should carefully examine the observed strength of the relationship.
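The sample-size effect can be seen by converting the same correlation into a t statistic at two sample sizes. The sketch uses the identity $t = r\sqrt{n-2}/\sqrt{1-r^2}$, a standard result that is equivalent to the slope test but is not stated in the text above. The cutoff of 2.0 is a rough stand-in for the two-sided 5% critical value, which depends on the degrees of freedom.

```python
import math

def t_for_correlation(r, n):
    """t statistic for H0: rho = 0, equivalent to the test of zero slope."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

R = 0.183           # correlation from the pulse and weight example
ROUGH_CUTOFF = 2.0  # assumption: approximate two-sided 5% critical value

t_small = t_for_correlation(R, 35)     # n = 35, as in the example
t_large = t_for_correlation(R, 1000)   # hypothetical n = 1,000

for n, t in ((35, t_small), (1000, t_large)):
    verdict = "significant" if abs(t) > ROUGH_CUTOFF else "not significant"
    print(n, round(t, 2), verdict)
```

The correlation is identical in both rows; only n changes, and that alone moves the result from not significant to clearly significant.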

### 14.4 Predicting the Value of Y for an Individual

An important use of a regression equation is to estimate or predict the unknown value of a response variable for an individual with a known specific value of the explanatory variable. Using the data described in Example 3, for instance, we can predict the maximum distance at which an individual can read a highway sign by substituting his or her age for x in the sample regression equation. Consider a person 21 years old. The predicted distance is approximately $\hat{y}$ = […] - 3.01(21) = 514 feet.

There will be variation among 21-year-olds with regard to the sign reading distance, so the predicted distance of 514 feet is not likely to be the exact distance for the next 21-year-old who views the sign. Rather than predicting that the distance will be exactly 514 feet, we should instead predict that the distance will be within a particular interval of values.

A 95% prediction interval for the value of the response variable (y) accounts for the variation among individuals with a particular value of x. This interval can be interpreted in two equivalent ways.

- The 95% prediction interval estimates the central 95% of the values of y for members of the population with a specified value of x.
- The probability is 0.95 that a randomly selected individual from the population with a specified value of x falls into the corresponding 95% prediction interval.

Notice that a prediction interval differs conceptually from a confidence interval. A confidence interval estimates an unknown population parameter, which is a numerical characteristic or summary of the population. An example in this chapter is a confidence interval for the slope of the population line. A prediction interval, however, does not estimate a parameter; instead it estimates the potential data value for an individual. Equivalently, it describes an interval into which a specified percentage of the population may fall.

As with most regression calculations, the by-hand formulas for prediction intervals are formidable. Statistical software can be used to create the interval.
Figure 14.6 shows Minitab output that includes the 95% prediction intervals for three different ages (21 years old, 30 years old, and 45 years old). The intervals are toward the bottom right side of the display in a column labeled "95% PI" and are highlighted with bold type. (Note: The term "Fit" is a synonym for $\hat{y}$, the estimate of the average response at the specific x value.) Here is what we can conclude:

- The probability is 0.95 that a randomly selected 21-year-old will read the sign at somewhere between roughly 407 and 620 feet.
- The probability is 0.95 that a randomly selected 30-year-old will read the sign at somewhere between roughly 381 and 592 feet.
- The probability is 0.95 that a randomly selected 45-year-old will read the sign at somewhere between roughly 338 and 545 feet.

We can also interpret each interval as an estimate of the sign reading distances for the central 95% of a population of drivers with a specified age. For instance, about 95% of all drivers 21 years old will be able to read the sign at a distance somewhere between 407 and 620 feet.

Figure 14.6 Minitab output showing prediction interval of distance

[Minitab output. The regression equation is Distance = […] - 3.01 Age. Coefficient table, S = 49.76, R-Sq = 64.2%, R-Sq(adj) = 62.9%, Analysis of Variance table, and Unusual Observations table as in Figure 14.5. Predicted Values for New Observations, with columns New Obs, Fit, SE Fit, 95.0% CI, 95.0% PI, for three new observations. Values of Predictors for New Observations, with columns New Obs, Age. Numerical entries not preserved.]

We're not limited to using only 95% prediction intervals. With Minitab, we can describe any central percentage of the population that we wish. For example, here are 50% prediction intervals for the sign reading distance at the three specific ages we considered above.

[Table with columns Age, Fit, 50.0% PI for the three ages; numerical entries not preserved.]

For each specific age, the 50% prediction interval estimates the central 50% of the maximum sign reading distances in a population of drivers with that age. For example, we can estimate that 50% of drivers 21 years old would have a maximum sign reading distance somewhere between about 478 feet and 549 feet. The distances for the other 50% of 21-year-old drivers would be predicted to be outside this range, with 25% beyond 549 feet and 25% below 478 feet.

**Interpretation of a Prediction Interval.** A prediction interval estimates the value of y for an individual with a particular value of x, or equivalently, the range of values of the response variable for a specified central percentage of a population with a particular value of x.
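To show roughly where the "Fit", "SE Fit", "CI", and "PI" columns come from, here is a sketch based on the standard simple-regression formulas $\text{s.e.(fit)} = s\sqrt{1/n + (x - \bar{x})^2/\sum (x_i - \bar{x})^2}$ and $\hat{y} \pm t^*\sqrt{s^2 + [\text{s.e.(fit)}]^2}$. The data are hypothetical toy values, and the multiplier 2.447 is the 95% t* for df = 6, matching the eight toy observations rather than the 30 drivers in the example.

```python
import math

def fit_intervals(xs, ys, x_new, t_star):
    """Return fit, s.e.(fit), a CI for the mean, and a PI for an individual at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    fit = b0 + b1 * x_new
    se_fit = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    ci_half = t_star * se_fit                            # uncertainty in the mean only
    pi_half = t_star * math.sqrt(s ** 2 + se_fit ** 2)   # adds individual variation
    return {
        "fit": fit,
        "se_fit": se_fit,
        "ci": (fit - ci_half, fit + ci_half),
        "pi": (fit - pi_half, fit + pi_half),
    }

# Hypothetical ages (years) and sign reading distances (feet)
ages = [18, 25, 33, 41, 50, 58, 66, 75]
distances = [510, 520, 480, 460, 420, 430, 380, 360]
T_STAR = 2.447  # 95% multiplier for df = n - 2 = 6

near_mean = fit_intervals(ages, distances, 45, T_STAR)
far_from_mean = fit_intervals(ages, distances, 21, T_STAR)
print(near_mean)
print(far_from_mean)
```

In the output the prediction interval is always wider than the confidence interval at the same age, and both widen as the chosen age moves away from the mean age in the data.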

**Technical Note:** The formula for the prediction interval for y at a specific x is:

$$\hat{y} \pm t^* \sqrt{s^2 + [\text{s.e.(fit)}]^2} \qquad \text{where} \qquad \text{s.e.(fit)} = s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

The multiplier t* is found using a t-distribution with n - 2 degrees of freedom, and is such that the probability between -t* and +t* equals the desired level for the interval.

Note: The s.e.(fit), and thus the width of the interval, depends upon how far the specified x-value is from $\bar{x}$. The further the specific x is from the mean, the wider the interval. When n is large, s.e.(fit) will be small, and the prediction interval will be approximately $\hat{y} \pm t^* s$.

### 14.5 Estimating the Mean Y at a Specified X

In the previous section, we focused on the estimation of the values of the response variable for individuals. A researcher may instead want to estimate the mean value of the response variable for individuals with a particular value of the explanatory variable. We might ask, "What is the mean weight for college men who are 6 feet tall?" This question only asks about the mean weight in a group with a common height, and it is not concerned with the deviations of individuals from that mean.

In technical terms, we wish to estimate the population mean E(Y) for a specific value of x that is of interest to us. To make this estimate, we use a confidence interval. The format for this confidence interval is again:

sample estimate ± multiplier × standard error

The sample estimate of E(Y) is the value of $\hat{y}$ determined by substituting the x-value of interest into $\hat{y} = b_0 + b_1 x$, the least-squares regression line for the sample. The standard error of $\hat{y}$ is the s.e.(fit) shown in the Technical Note in the previous section, and its value is usually provided by statistical software. The multiplier is found using a t-distribution with df = n - 2, and Appendix A-3 can be used to determine its value.

**Example 2 Revisited. Estimating Mean Weight of College Men at Various Heights.** Based on the sample of n = 43 college men in Example 2, let's estimate the mean weight in the population of college men for each of three different heights: 68 inches, 70 inches, and 72 inches.
Figure 14.7 shows Minitab output that includes the three different confidence intervals for these three different heights. These intervals are toward the bottom of the display in a column labeled "95% CI." The first entry in that column is the estimate of the population mean weight for men who are 68 inches tall. With 95% confidence, we can estimate that the mean weight of college men 68 inches tall is somewhere between […] and […] pounds. The second row under "95% CI" contains the information that the 95% confidence interval for the mean weight of college men 70 inches tall is […] to […] pounds. The 95% confidence interval for the mean weight for men 72 inches tall is […] to […] pounds.

Again, it is important to realize that the confidence intervals for E(Y) do not describe the variation among individuals. They only are estimates of the mean weights for specific heights. The prediction intervals for individual responses describe the variation among individuals. You may have noticed that 95% prediction intervals, labeled "95% PI," are next to the confidence intervals in the output. Among men 70 inches tall, for instance, we would estimate that 95% of the individual weights would be in the interval from about 122 to about 221 pounds.

Figure 14.7 Minitab Output with Confidence Intervals For Mean Weight

[Minitab output. The regression equation is Weight = […] + […] Height. Coefficient table columns: Predictor, Coef, SE Coef, T, P, with rows Constant and Height. S = 24.00, R-Sq = 32.3%, R-Sq(adj) = 30.7%. Some output omitted. Predicted Values for New Observations, with columns New Obs, Fit, SE Fit, 95.0% CI, 95.0% PI, for three new observations. Values of Predictors for New Observations, with columns New Obs, Height. Numerical entries not preserved.]

### 14.6 Checking Conditions for Using Regression Models for Inference

There are a few conditions that should be at least approximately true when we use a regression model to make an inference about a population. Of the five conditions that follow, the first two are particularly crucial.

**Conditions for Linear Regression**

1. The form of the equation that links the mean value of y to x must be correct. For instance, we won't make proper inferences if we use a straight line to describe a curved relationship.
2. There should not be any extreme outliers that influence the results unduly.
3. The standard deviation of the values of y from the mean y is the same regardless of the value of the x variable. In other words, y values are similarly spread out at all values of x.
4. For individuals in the population with the same particular value of x, the distribution of the values of y is a normal distribution. Equivalently, the distribution of deviations from the mean value of y is a normal distribution. This condition can be relaxed if the sample size is large.
5. Observations in the sample are independent of each other.

Checking the Conditions with Plots

A scatter plot of the raw data and plots of the residuals provide information about the validity of the assumptions. Remember that a residual is the difference between an observed value and the predicted value for that observation, and that some assumptions made for a linear regression model have to do with how y-values deviate from the regression line. If the properties of the residuals for the sample appear to be consistent with the mathematical assumptions made about deviations within the population, we can use the model to make statistical inferences.

Conditions 1, 2 and 3 can be checked using two useful plots:

- A scatter plot of y versus x for the sample (y vs x)
- A scatter plot of the residuals versus x for the sample (resids vs x)

If Condition 1 holds for a linear relationship, then:

- The plot of y vs x should show points randomly scattered around an imaginary straight line.
- The plot of resids vs x should show points randomly scattered around a horizontal line at resid = 0.

If Condition 2 holds, extreme outliers should not be evident in either plot.

If Condition 3 holds, neither plot should show increasing or decreasing spread in the points as x increases.

Example 2 Continued. Checking the Conditions for the Weight and Height Problem

Figure 14.4 displayed a scatter plot of the weights and heights of n = 43 college men. In that plot, it appears that a straight line is a suitable model for how mean weight is linked to height. In Figure 14.8 there is a plot of the residuals ($e_i$) versus the corresponding values of height for these 43 men. This plot is further evidence that the right model has been used. If the right model has been used, the way in which individuals deviate from the line (residuals) will not be affected by the value of the explanatory variable. The somewhat random-looking blob of points in Figure 14.8 is the way a plot of residuals versus x should look if the right equation for the mean has been used. Both plots (Figures 14.4 and 14.8) also show that there are no extreme outliers and that the weights have approximately the same variance across the range of heights in the sample. Therefore, Conditions 2 and 3 appear to be met.

Figure 14.8 Plot of Residuals versus X for Example 2. The Absence of a Pattern Indicates the Right Model Has Been Used
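The residual checks described above are visual, but a small numeric companion can help when reading a plot. The sketch below, using hypothetical height and weight values and made-up helper names (`fit_line`, `residuals`, `rms`), computes the residuals and compares their typical size in the lower and upper halves of the x range. It is only a crude stand-in for looking at the plot, not a formal test of Condition 3.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(xs, ys):
    """Residual = observed y minus predicted y for each observation."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def rms(values):
    """Root-mean-square size of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical data, for illustration only
heights = [65, 66, 68, 69, 70, 71, 73, 74]
weights = [148, 160, 158, 171, 166, 180, 176, 190]

resid = residuals(heights, weights)

# Split the sample at the middle height and compare the residual spread
cutoff = sorted(heights)[len(heights) // 2]
low = [e for x, e in zip(heights, resid) if x < cutoff]
high = [e for x, e in zip(heights, resid) if x >= cutoff]

print("residuals:", [round(e, 2) for e in resid])
print("typical residual size, shorter half:", round(rms(low), 2))
print("typical residual size, taller half:", round(rms(high), 2))
```

Least-squares residuals always sum to zero (up to rounding), so the useful information is in their pattern against x: a clear trend, a large isolated value, or very different spreads in the two halves would each be a reason to look more closely at the plots.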

Condition 4, which is that deviations from the regression line are normally distributed, is difficult to verify, but it is also the least important of the conditions because the inference procedures for regression are robust. This means that if there are no major outliers or extreme skewness, the inference procedures work well even if the distribution of y-values is not a normal distribution. In Chapters 12 and 13, we saw that confidence intervals and hypothesis tests for a mean or a difference between two means also were robust.

To examine the distribution of the deviations from the line, a histogram of the residuals is useful, although for small samples a histogram may not be informative. A more advanced plot called a normal probability plot can also be used to check whether the residuals are normally distributed, but we do not provide the details in this text. Figure 14.9 displays a histogram of the residuals for Example 2. It appears that the residuals are approximately normally distributed, so Condition 4 is met.

Figure 14.9 Histogram of Residuals for Example 2

Condition 5 follows from the data collection process. It is met as long as the units are measured independently. It would not be met if the same individuals were measured across the range of x-values, such as if x = average speed and y = gas mileage were to be measured for multiple tanks of gas on the same cars. More complicated models are needed for dependent observations, and those models will not be discussed in this book.

Corrections When Conditions Are Not Met

There are some steps that can be taken if Conditions 1, 2 or 3 are not met. If Condition 1 is not met, more complicated models can be used. For instance, Figure 14.10 shows a typical plot of residuals that occurs when a straight-line model is used to describe data that are curvilinear. It may help to think of the residuals as prediction errors that would occur if we use the regression line to predict the value of y for the individuals in the sample. In the plot shown in Figure 14.10, the prediction errors are all negative in the central region of X and nearly all positive for outer values of X. This occurs because the wrong model is being used to make the predictions. A curvilinear model, such as the quadratic model discussed earlier, may be more appropriate.

Figure 14.10 A Residual Plot Indicating the Wrong Model Has Been Used
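The residual pattern described for a curvilinear relationship can be reproduced with a tiny made-up data set. In the sketch below, the y-values follow an exact quadratic (y = x², chosen purely for illustration) and a straight line is fitted anyway. The helper name `fit_line` is an assumption of this sketch.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up curvilinear data: y is exactly x squared
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# Fit the (wrong) straight-line model and compute the prediction errors
b0, b1 = fit_line(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, resid):
    print(f"x = {x:2d}   residual = {e:5.1f}")
```

The residuals are negative in the middle of the x range and positive at both ends, which is the kind of systematic pattern that signals the straight-line equation for the mean is wrong.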

Condition 2, that there are no influential outliers, can be checked graphically with the scatter plot of y versus x and the plot of residuals versus x. The appropriate correction if there are outliers depends on the reason for the outliers. The same considerations and corrective action discussed in Chapter 2 would be taken, depending on the cause of the outlier. For instance, a scatter plot and a residual plot for the data of Exercise 38 in Chapter 5 both show a potential outlier. In this example, the x-variable is weight and the y-variable is time to chug a beverage. The outlier probably represents a legitimate data value. The relationship appears to be linear for weights ranging up to about 210 pounds, but then it appears to change. It could either become quadratic, or it could level off. We do not have enough data to determine what happens for higher weights. The solution in this case would be to remove the outlier, and use the linear regression relationship only for body weights under about 210 pounds. Determining the relationship for higher body weights would require a larger sample of individuals in that range.

Figure: Scatter Plot and Residual Plot With an Outlier

If either Condition 1 or Condition 3 is not met, a transformation may be required. This is equivalent to using a different model. Fortunately, often the same transformation will correct problems with Conditions 1, 3, and 4. For instance, when the response variable is monetary, such as salaries, it is often more appropriate to use the relationship

$$\ln(y) = b_0 + b_1 x + e$$

In other words, to assume that there is a linear relationship between the natural log of y and the x-values. This is called a log transformation on the y's. We will not pursue transformations further in this book.
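A minimal sketch of the log transformation follows. The "salary" values are hypothetical and are generated to grow exponentially with x (with no random deviations), so that the effect of the transformation is easy to see. The helper name `fit_line` and the constants 1.0 and 0.5 are assumptions made for this illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical response that grows exponentially: ln(y) = 1.0 + 0.5 * x
xs = [0, 1, 2, 3, 4]
salaries = [math.exp(1.0 + 0.5 * x) for x in xs]

# Straight-line fit to the raw y-values: residuals show a curved pattern
b0_raw, b1_raw = fit_line(xs, salaries)
raw_resid = [y - (b0_raw + b1_raw * x) for x, y in zip(xs, salaries)]

# Straight-line fit to ln(y): the relationship is now linear
log_salaries = [math.log(y) for y in salaries]
b0_log, b1_log = fit_line(xs, log_salaries)
log_resid = [ly - (b0_log + b1_log * x) for x, ly in zip(xs, log_salaries)]

print("raw-scale residuals:", [round(e, 3) for e in raw_resid])
print("log-scale fit: b0 =", round(b0_log, 3), " b1 =", round(b1_log, 3))
print("largest log-scale residual:", max(abs(e) for e in log_resid))
```

On the raw scale the line misses in a systematic way, while on the log scale the fitted intercept and slope match the values used to generate the data. Real data would include random deviations, so the log-scale residuals would not be essentially zero as they are here.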


MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to