A. Karpinski


Chapter 5
Simple Linear Regression
Regression Diagnostics and Remedial Measures

1. Residuals and regression assumptions
2. Residual plots to detect lack of fit
3. Residual plots to detect homogeneity of variance
4. Residual plots to detect non-normality
5. Identifying outliers and influential observations
6. Remedial Measures: An overview of alternative regression models
7. Remedial Measures: Transformations

Simple Linear Regression: Regression Diagnostics and Remedial Measures

1. Residuals and regression assumptions

The regression assumptions can be stated in terms of the residuals: $\varepsilon_i \sim NID(0, \sigma^2)$
o All observations are independent and randomly selected from the population (or equivalently, the residual terms, the $\varepsilon_i$'s, are independent)
o The residuals are normally distributed at each level of X
o The variance of the residuals is constant across all levels of X

We must also assume that the regression model is the correct model:
o The relationship between the predictor and outcome variable is linear
o No relevant variables have been omitted
o There is no error in the measurement of the predictor variables

Types of residuals
o (Unstandardized) residuals, $e_i$:
$$e_i = Y_i - \hat{Y}_i$$
  - A residual is the deviation of the observed value from the predicted value on the original scale of the data.
  - If the regression model fit the data perfectly, there would be no residuals. In practice, we always have residuals, but the presence of many large residuals can indicate that the model does not fit the data well.
  - If the residuals are normally distributed, then we would expect to find about 5% of residuals more than 2σ from the mean, about 1% more than 2.5σ from the mean, and about 0.3% more than 3σ from the mean.
  - It can be difficult to eyeball standard deviations from the mean, so we often turn to standardized residuals.

o Standardized residuals, $\tilde{e}_i$:
$$\tilde{e}_i = \frac{Y_i - \hat{Y}_i}{\hat{\sigma}} = \frac{Y_i - \hat{Y}_i}{\sqrt{MSE}}$$
  Standardized residuals are z-scores. Why?
  - The average of the residuals is zero: $\bar{e} = \frac{\sum e_i}{n} = 0$
  - The variance of the residuals is MSE: $\frac{\sum (e_i - \bar{e})^2}{n-2} = \frac{\sum e_i^2}{n-2} = \frac{SSE}{n-2} = MSE$
  So a standardized residual is given by:
$$\tilde{e}_i = \frac{e_i - \bar{e}}{\sqrt{MSE}} = \frac{e_i}{\sqrt{MSE}} = \frac{Y_i - \hat{Y}_i}{\sqrt{MSE}}$$
  Because standardized residuals are z-scores, we can easily detect outliers. When examining standardized residuals, we should find about 5% of the $|\tilde{e}_i|$'s greater than 2, about 1% greater than 2.5, and about 0.3% greater than 3.
o Studentized residuals, $e_i^*$
  - MSE is the overall variance of the residuals. It turns out that the variance of an individual residual is a bit more complicated: each residual has its own variance, depending on its distance from $\bar{X}$.
  - When residuals are standardized using residual-specific standard deviations, the resulting residual is called a studentized residual.
  - In large samples, it makes little difference whether standardized or studentized residuals are used. However, in small samples, studentized residuals give more accurate results.
  - Because SPSS makes studentized residuals easy to obtain, it is good practice to examine studentized residuals rather than standardized residuals.
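The following minimal Python sketch (standard library only, hypothetical data) computes the three kinds of residuals side by side. It assumes the common internally studentized form $e_i/\sqrt{MSE(1-h_{ii})}$, where $h_{ii} = 1/n + (X_i-\bar{X})^2/\sum_j (X_j-\bar{X})^2$ is the leverage of observation i (see Section 5). We believe this is what SPSS saves as SRESID, but treat that correspondence as an assumption.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def residual_types(x, y):
    """Return raw, standardized, and studentized residuals plus MSE."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]    # e_i = Y_i - Yhat_i
    mse = sum(ei ** 2 for ei in e) / (n - 2)             # SSE / (n - 2)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]       # leverage of each case
    standardized = [ei / math.sqrt(mse) for ei in e]
    studentized = [ei / math.sqrt(mse * (1 - hi)) for ei, hi in zip(e, h)]
    return e, standardized, studentized, mse

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.5]
e, z, r, mse = residual_types(x, y)
print([round(v, 2) for v in z])
print([round(v, 2) for v in r])
```

Because $0 < 1 - h_{ii} < 1$, every studentized residual is slightly larger in absolute value than the matching standardized residual. The gap is largest for observations far from $\bar{X}$.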

Obtaining residuals in SPSS

    REGRESSION
      /DEPENDENT dollars
      /METHOD=ENTER miles
      /SAVE RESID (resid) ZRESID (zresid) SRESID (sresid).

o RESID produces unstandardized residuals
o ZRESID produces standardized residuals
o SRESID produces studentized residuals
o Each residual appears in a new column in the Data Editor

[SPSS Data Editor: saved columns RESID, ZRESID, and SRESID]

You can see that the difference between standardized and studentized residuals is small, but it can make a difference in how the model fit is interpreted.

Because all the regression assumptions can be stated in terms of the residuals, examining residuals and residual plots can be very useful in verifying the assumptions.
o In general, we will rely on residual plots to evaluate the regression assumptions rather than on statistical tests of those assumptions.

2. Residual plots to detect lack of fit

There are several reasons why a regression model might not fit the data well, including:
o The relationship between X and Y might not be linear
o Important variables might be omitted from the model

To detect non-linearity in the relationship between X and Y, you can:
o Create a scatterplot of X against Y
  - Look for non-linear relationships between X and Y
  - This scatterplot is not informative regarding the other regression assumptions.
o Plot the residuals against the X values
  - The residuals have the linear association between X and Y removed. If X and Y are linearly related, then all that should remain for the residuals to capture is random error.
  - Thus, any departure from a random scatterplot indicates problems.
  - In general, this graph is easier to interpret than the simple scatterplot. An added advantage of this graph (if studentized residuals are used) is that you can easily spot outliers.

In simple linear regression, a plot of e vs. X is identical to a plot of e vs. $\hat{Y}$, apart from the scaling of the horizontal axis. Thus, there is no need to examine both of these plots. The predicted values are the part of the Ys that has a linear relationship with X, so $\hat{Y}$ and X will always be perfectly correlated when there is only one predictor. In multiple regression, different information may be obtained from a plot of e vs. X and from a plot of e vs. $\hat{Y}$.
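The claim that the e vs. X and e vs. $\hat{Y}$ plots carry the same information can be checked numerically. In this sketch (hypothetical data), the fitted values are an exact linear function of X. Their correlation with X is therefore 1 (or -1 when the slope is negative), and the OLS residuals are uncorrelated with X by construction.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

# Hypothetical data
x = [1.0, 2.0, 4.0, 5.0, 7.0, 9.0]
y = [2.3, 2.9, 5.1, 5.4, 8.2, 9.6]
b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, y_hat)]
print(pearson(y_hat, x))   # 1.0 up to rounding (b1 > 0 here)
print(pearson(resid, x))   # 0.0 up to rounding
```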

Example #1: Good linear regression model
o Scatterplot of X against Y

    GRAPH /SCATTERPLOT(BIVAR)=x WITH y.

[Figure: scatterplot of y against x]
o The X-Y relationship looks linear

Plot the residuals against the X values

    GRAPH /SCATTERPLOT(BIVAR)=x WITH sresid.

[Figure: studentized residual against x]
o The plot looks random, so we have evidence that there is no non-linear relationship between X and Y
o We also see that no outliers are present
o This graph is as good as it gets!

Example #2: Non-linear relationship between X and Y
o Scatterplot of X against Y

    GRAPH /SCATTERPLOT(BIVAR)=x WITH y.

[Figure: scatterplot of y against x]
o The X-Y relationship looks mostly linear

Plot the residuals against the X values

    GRAPH /SCATTERPLOT(BIVAR)=x WITH sresid.

[Figure: studentized residual against x]
o This graph has a slight U-shape, suggesting the possibility of a non-linear relationship between X and Y
o We also see one outlier

Example #3: A second non-linear relationship between X and Y
o Scatterplot of X against Y

    GRAPH /SCATTERPLOT(BIVAR)=x WITH y.

[Figure: scatterplot of y against x]
o The X-Y relationship looks slightly curvilinear in this case.

Plot the residuals against the X values

    GRAPH /SCATTERPLOT(BIVAR)=x WITH sresid.

[Figure: studentized residual against x]
o This graph has a strong U-shape, indicating a non-linear relationship between X and Y
o Notice that it is easier to detect the non-linearity in the residual plot than in the scatterplot

o You cannot determine lack of fit/non-linearity from the significance tests on the regression parameters

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA ZPP
      /DEPENDENT y
      /METHOD=ENTER x.

In this case, we find evidence for a strong linear relationship between X and Y, b = .887, t(98) = 8.95, p < .001 [r = .94].

[SPSS output: Coefficients table (unstandardized and standardized coefficients, t, Sig., zero-order, partial, and part correlations); dependent variable: Y]

This linear relationship between X and Y accounts for 88.5% of the variance in Y.

[SPSS output: Model Summary (R = .94); predictors: (Constant), X]

Yet from the residual plot, we know that this linear model is incorrect and does not fit the data well. Despite the level of significance and the large percentage of the variance accounted for, we should not report this erroneous model.

Detecting the omission of an important variable by looking at the residuals is very difficult!
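Here is a small deterministic illustration of the same point, using hypothetical data with a purely quadratic relationship, $Y = X^2$. The straight-line fit has an $R^2$ of about .95. Yet the residuals are positive at both ends and negative in the middle, which is the U-shape that the residual plot reveals.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

x = list(range(1, 11))
y = [xi ** 2 for xi in x]                  # hypothetical, purely quadratic
b0, b1 = fit_line(x, y)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
my = sum(y) / len(y)
r2 = 1 - sum(ei ** 2 for ei in e) / sum((yi - my) ** 2 for yi in y)
print(round(r2, 3))   # 0.95
print(e)              # [12.0, 4.0, -2.0, -6.0, -8.0, -8.0, -6.0, -2.0, 4.0, 12.0]
```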

3. Residual plots to detect homogeneity of variance

We assume that the variance of the residuals is constant across all levels of the predictor variable(s).

To examine whether the residuals are homoscedastic, we can plot the residuals against the predicted values.
o If the residuals are homoscedastic, then their variability should be constant over the range of predicted values.

[Figure: three residual plots labeled GOOD, BAD, BAD]

o As previously mentioned, plotting residuals against fitted values ($\hat{Y}$) or against the predictor (X) produces the same plots when there is only one X variable. In multiple regression, a plot of the residuals against fitted values ($\hat{Y}$) is generally preferred, but in this case it makes no difference.
o The raw residuals and the standardized residuals do not take into account the fact that the variance of each residual is different (and depends on its distance from the mean of X). For plots to examine homogeneity, it is particularly important to use the studentized residuals.

Example #1: Homoscedastic model

    GRAPH /SCATTERPLOT(BIVAR)=sresid WITH pred.

[Figure: studentized residual against unstandardized predicted value]
o The band of residuals is constant across the entire range of the observed predicted values.

Example #2: Heteroscedastic model

    GRAPH /SCATTERPLOT(BIVAR)=sresid WITH pred.

[Figure: studentized residual against unstandardized predicted value]
o This pattern, where the variance increases as the predicted value increases, is a common form of heteroscedasticity.
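Residual plots remain the primary tool here. As a rough numeric companion only (our own illustration, not a procedure from these notes and not a formal test), the sketch below sorts cases by fitted value. It then compares the residual standard deviation in the lower and upper halves; a ratio far from 1 echoes the fan shape of Example #2. The data are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def residual_spread_by_half(x, y):
    """SD of residuals for the lower and upper halves of the fitted values."""
    b0, b1 = fit_line(x, y)
    # (fitted value, residual) pairs, ordered by fitted value
    pairs = sorted((b0 + b1 * xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]
    return statistics.stdev(lower), statistics.stdev(upper)

# Hypothetical data whose error spread grows with X
x = list(range(1, 21))
y = [2 + 3 * xi + 0.5 * xi * (-1) ** xi for xi in x]
low_sd, high_sd = residual_spread_by_half(x, y)
print(round(high_sd / low_sd, 2))   # well above 1 for these data
```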

o In this case, the heteroscedasticity is also apparent from the X-Y scatterplot. But in general, violations of the variance assumption are easier to spot in the residual plots.

    GRAPH /SCATTERPLOT(BIVAR)=y WITH x.

[Figure: scatterplot of y and x]
o As in the case of looking for non-linearity, examining the regression model provides no clues that the model assumptions have been violated.

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA ZPP
      /DEPENDENT y
      /METHOD=ENTER x.

[SPSS output: Model Summary and Coefficients tables; predictors: (Constant), X; dependent variable: Y]

4. Residual plots to detect non-normality

As for ANOVA, symmetry is more important than normality.

There are a number of techniques that we can use to check normality of the residuals. In general, these are the same techniques we used to check normality in ANOVA:
o Boxplots or histograms of residuals
o A normal P-P plot of the residuals
o Coefficients of skewness/kurtosis may also be used

Normality is difficult to check and can be influenced by other violations of assumptions. A good strategy is to check and address all other assumptions first, and then turn to checking normality.

These tests are not foolproof:
o Technically, we assume that the residuals are normally distributed at each level of the predictor variable(s).
o It is possible (but unlikely) that the distribution of residuals might be left-skewed for some values of X and right-skewed for other values so that, on average, the residuals appear normal.
o If you are concerned about this possibility and you have a very large sample, you could divide the Xs into a categories of equal size and check normality separately for each of the a subsamples (you would want at least 30-50 observations per group). In general, this is not necessary.

Example #1: Normally distributed residuals

    EXAMINE VARIABLES=sresid
      /PLOT BOXPLOT HISTOGRAM NPPLOT.

[SPSS output: Descriptives for Studentized Residual (mean, 5% trimmed mean, median, variance, std. deviation, minimum, maximum, range, interquartile range, skewness, kurtosis, with std. errors)]
o The mean is approximately equal to the median
o The coefficients of skewness and kurtosis are relatively small
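To show what the skewness and kurtosis coefficients measure, the sketch below uses the adjusted sample estimators that many packages report (the same formulas as Excel's SKEW and KURT). We assume, without having checked every version, that SPSS's Descriptives output uses the same formulas. Values near 0 for both are consistent with normality.

```python
import statistics

def skewness(v):
    """Adjusted Fisher-Pearson skewness (same formula as Excel's SKEW)."""
    n = len(v)
    m, s = statistics.fmean(v), statistics.stdev(v)
    return n / ((n - 1) * (n - 2)) * sum(((a - m) / s) ** 3 for a in v)

def excess_kurtosis(v):
    """Sample excess kurtosis (same formula as Excel's KURT); near 0 for normal data."""
    n = len(v)
    m, s = statistics.fmean(v), statistics.stdev(v)
    fourth = sum(((a - m) / s) ** 4 for a in v)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

symmetric = [-2, -1, 0, 1, 2]       # hypothetical residuals
right_skewed = [0, 0, 0, 1, 5]      # hypothetical residuals with a long right tail
print(skewness(symmetric), excess_kurtosis(symmetric))   # 0.0 and -1.2, up to rounding
print(skewness(right_skewed))                             # positive
```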

[Figure: boxplot of the studentized residuals]
[SPSS output: Tests of Normality for Studentized Residual (Shapiro-Wilk statistic, df, Sig.)]

o Plots can also be obtained directly from the REGRESSION command:

    REGRESSION
      /DEPENDENT y
      /METHOD=ENTER z
      /RESIDUALS HIST(SRESID) NORM(SRESID)
      /SAVE SRESID (sresid).

[Figure: histogram and normal P-P plot (observed vs. expected cumulative probability) of the regression studentized residual; dependent variable: Y]
o The histogram and P-P plot are as good as they get. There are no problems with the normality assumption.

Example #2: Non-normally distributed residuals

    REGRESSION
      /DEPENDENT y
      /METHOD=ENTER z
      /RESIDUALS HIST(ZRESID) NORM(ZRESID)
      /SAVE ZRESID (zresid).

[Figure: histogram and normal P-P plot of the regression standardized residual; dependent variable: Y]

    EXAMINE VARIABLES=zresid
      /PLOT BOXPLOT HISTOGRAM NPPLOT
      /STATISTICS DESCRIPTIVES.

[SPSS output: Descriptives and Tests of Normality (Shapiro-Wilk) for Standardized Residual]
o All signs point to non-normal, non-symmetrical residuals. There is a violation of the normality assumption in this case.

5. Identifying outliers and influential observations

Observations with large residuals are called outliers. But remember, when the residuals are normally distributed, we expect a small percentage of residuals to be large:
o We expect about 5% of the $|\tilde{e}_i|$'s to be greater than 2
o We expect about 1% of the $|\tilde{e}_i|$'s to be greater than 2.5
o We expect about 0.3% of the $|\tilde{e}_i|$'s to be greater than 3

[Table: expected number of residuals with $|\tilde{e}_i| > 2$, $> 2.5$, and $> 3$ for different numbers of observations]

o Many people use $|\tilde{e}_i| > 2$ as a check for outliers, but this criterion results in too many observations being identified as outliers. In large samples, we expect a large number of observations to have residuals greater than 2.
o A more reasonable cut-off for outliers is $|\tilde{e}_i| > 2.5$ or even $|\tilde{e}_i| > 3$.
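These percentages come from the standard normal distribution, $P(|Z| > c) = \text{erfc}(c/\sqrt{2})$. The sketch below computes them, along with the expected number of large residuals for a few illustrative sample sizes. The sizes are our own choice, not the values from the original table.

```python
import math

def prop_beyond(c):
    """Two-tailed standard normal probability P(|Z| > c)."""
    return math.erfc(c / math.sqrt(2))

cutoffs = (2, 2.5, 3)
print([round(prop_beyond(c), 4) for c in cutoffs])
for n in (50, 100, 1000):              # illustrative sample sizes
    expected = [n * prop_beyond(c) for c in cutoffs]
    print(n, [round(v, 1) for v in expected])
```

With 1,000 well-behaved observations, roughly 45 residuals are expected beyond $\pm 2$. This is why $|\tilde{e}_i| > 2$ flags too many cases in large samples.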

There are multiple kinds of outliers.

[Figure: scatterplot with four outlying points labeled #1 to #4]
o #1 is a Y outlier
o #2 is an X outlier
o #3 and #4 are outliers for both X and Y

When we examine extreme observations, we want to know:
o Is it an outlier? (i.e., Does it differ from the rest of the observed data?)
o Is it an influential observation? (i.e., Does it have an impact on the regression equation?)

Clearly, each of the values highlighted on the graph is an outlier, but how will each influence estimation of the regression line?
o Outlier #1
  - Influence on the fitted values:
  - Influence on the slope:
o Outlier #2
  - Influence on the fitted values:
  - Influence on the slope:
o Outlier #3
  - Influence on the fitted values:
  - Influence on the slope:
o Outlier #4
  - Influence on the fitted values:
  - Influence on the slope:

Not all outliers are equally influential. It is not enough to identify outliers; we must also consider the influence each may have (particularly on the estimation of the slope).

Methods of identifying outliers and influential points:
o Examination of the studentized residuals
o A scatterplot of studentized residuals with X
o Examination of the studentized deletion residuals
o Examination of leverage values
o Examination of Cook's distance (Cook's D)
o Examination of DFBETAs

Detecting extremity on the Y values: Studentized Deletion Residuals
o A deletion residual is the difference between the observed $Y_i$ and the predicted value $\hat{Y}_{i(i)}$ based on a model with the ith observation deleted:
$$d_i = Y_i - \hat{Y}_{i(i)}$$
o The deletion residual is a measure of how much the ith observation influences the overall regression equation (that is, the fitted values).
o If the ith observation has no influence on the regression line, then $Y_i = \hat{Y}_{i(i)}$ and $d_i = 0$.
o The greater the influence of the observation, the greater the deletion residual.
o Note that we cannot determine whether the observation influences the estimation of the intercept or of the slope. We can only tell that it has an influence on at least one of the parameters in the regression equation.
o The size of the deletion residuals will be determined, in part, by the scale of the Y values. In order to create deletion residuals that do not depend on the scale of Y, we can divide $d_i$ by its standard deviation to obtain a studentized deletion residual:
$$\tilde{d}_i = \frac{d_i}{s(d_i)} = \frac{Y_i - \hat{Y}_{i(i)}}{s(d_i)}$$
o Studentized deletion residuals can be interpreted like z-scores (or more precisely, like t-scores).
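Deletion residuals can be computed literally, by refitting the line n times, or from a single fit. This sketch on hypothetical data does both and checks that they agree. The literal version uses $s(d_i)^2 = MSE_{(i)}/(1-h_{ii})$. The single-fit version uses the standard algebraic identity $\tilde{d}_i = e_i\sqrt{(n-p-1)/[SSE(1-h_{ii}) - e_i^2]}$.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def deleted_residuals_bruteforce(x, y, p=2):
    """Refit without each case i; return (d_i, studentized d_i) pairs."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    out = []
    for i in range(n):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        b0, b1 = fit_line(xs, ys)
        d = y[i] - (b0 + b1 * x[i])                        # d_i = Y_i - Yhat_i(i)
        mse_i = sum((yj - (b0 + b1 * xj)) ** 2
                    for xj, yj in zip(xs, ys)) / (n - 1 - p)
        h = 1 / n + (x[i] - mx) ** 2 / sxx                 # full-data leverage
        out.append((d, d / math.sqrt(mse_i / (1 - h))))
    return out

def studentized_deleted_shortcut(x, y, p=2):
    """Same statistic from a single fit of all n cases."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(ei ** 2 for ei in e)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [ei * math.sqrt((n - p - 1) / (sse * (1 - hi) - ei ** 2))
            for ei, hi in zip(e, h)]

# Hypothetical data; the last case is an outlier on Y
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.2, 15.8, 18.1, 30.0]
brute = deleted_residuals_bruteforce(x, y)
fast = studentized_deleted_shortcut(x, y)
print([round(t, 2) for t in fast])
```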

Detecting extremity on the X values: Leverage values
o It can be shown (proof omitted) that the predicted value for the ith observation can be written as a linear combination of the observed Y values:
$$\hat{Y}_i = h_{i1}Y_1 + h_{i2}Y_2 + \dots + h_{in}Y_n$$
  where $h_{i1}, h_{i2}, \dots, h_{in}$ are known as leverage values or leverage weights. The leverage of the ith observation itself is $h_{ii}$.
o The leverage values are computed using only the X value(s).
o Because the $h_{ij}$'s are computed using only the X value(s), $h_{ij}$ measures the role of the X value(s) in determining how important $Y_j$ is in affecting $\hat{Y}_i$.
o Thus, leverage values are helpful in identifying outlying X observations that influence $\hat{Y}$.
o To identify large leverage values, we compare $h_{ii}$ to the average leverage value. The standard rule of thumb is that if $h_{ii}$ is 2-3 times as large as the average leverage value, then the X observation(s) for the ith participant should be examined.
  - The average leverage value is $\bar{h} = p/n$, where p = the number of parameters in the regression model (2 for simple linear regression) and n = the number of participants.
  - And so the rule-of-thumb cutoff value is $h_{ii} > 2p/n$ (large samples) or $h_{ii} > 3p/n$ (small samples).
o Other common cut-off values include $h_{ii} > .5$ (large leverage) and $.2 < h_{ii} < .5$ (moderate leverage).
o Look for a large gap in the distribution of the $h_{ii}$'s.
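For simple linear regression the leverage has a closed form, $h_{ii} = 1/n + (X_i - \bar{X})^2 / \sum_j (X_j - \bar{X})^2$, and the $h_{ii}$'s always sum to p. The short sketch below applies the 2p/n and 3p/n rules of thumb to hypothetical X values. Some packages save a centered leverage, $h_{ii} - 1/n$, instead, so check which version your software reports before applying these cutoffs.

```python
def leverages(x):
    """h_ii = 1/n + (X_i - mean X)^2 / sum_j (X_j - mean X)^2 (simple regression)."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]

x = [2, 3, 3, 4, 5, 5, 6, 7, 20]      # hypothetical; the last X is extreme
h = leverages(x)
p, n = 2, len(x)
print(round(sum(h), 6))                # the leverages sum to p
large_sample_flags = [i for i, hi in enumerate(h) if hi > 2 * p / n]
small_sample_flags = [i for i, hi in enumerate(h) if hi > 3 * p / n]
print(large_sample_flags, small_sample_flags)   # [8] [8]
```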

Detecting influence on the fitted values: Cook's Distance (1979)
o Cook's D is another measure of the influence an outlying observation has on the regression coefficients. It combines residuals and leverage values into a single number:
$$D_i = \frac{e_i^2}{p \cdot MSE}\left[\frac{h_{ii}}{(1 - h_{ii})^2}\right]$$
  where $e_i$ is the (unstandardized) residual for the ith observation, p is the number of parameters in the regression model, and $h_{ii}$ is the leverage for the ith observation.
o $D_i$ for each observation depends on two factors:
  - The residual: larger residuals lead to larger $D_i$'s
  - The leverage: larger leverage values lead to larger $D_i$'s
o The ith observation can be influential (have a large $D_i$) by:
  - Having a large $e_i$ and only a moderate $h_{ii}$
  - Having a moderate $e_i$ and a large $h_{ii}$
  - Having a large $e_i$ and a large $h_{ii}$
o $D_i$ is considered to be large (indicating an influential observation) if it falls at or above the 50th percentile of the F-distribution, $D_{crit} = F(\alpha = .50, df_n, df_e)$, where:
  - $df_n$ = number of parameters in the model = p (2 for simple linear regression)
  - $df_e$ = degrees of freedom for error = N - p

For example, with a simple linear regression model (p = 2) with 45 observations ($df_e$ = 45 - 2 = 43):
$$D_{crit} = F(\alpha = .50, 2, 43) = .704$$
In this case, observations with Cook's D values greater than .704 should be investigated as possibly being influential.
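Here is a sketch of Cook's D for simple regression, together with its critical value. When $df_n = 2$, as in simple linear regression, the F distribution has the closed-form survival function $P(F > f) = (1 + 2f/df_e)^{-df_e/2}$. Its median is therefore $(df_e/2)(2^{2/df_e} - 1)$, which gives $F(.50, 2, 43) \approx .704$ without F tables. The data are hypothetical, and the median shortcut is valid only for $df_n = 2$.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def cooks_distance(x, y, p=2):
    """D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2 for each case."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(ei ** 2 for ei in e) / (n - p)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [ei ** 2 / (p * mse) * hi / (1 - hi) ** 2 for ei, hi in zip(e, h)]

def f_median_dfn2(dfe):
    """50th percentile of F(2, dfe), from the closed-form CDF available when dfn = 2."""
    return (dfe / 2) * (2 ** (2 / dfe) - 1)

# Hypothetical data; the last case has extreme X and falls far below the trend
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0, 14.1, 15.9, 18.0, 10.0]
D = cooks_distance(x, y)
print(round(f_median_dfn2(43), 3))     # 0.704
print([round(d, 2) for d in D])
```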

Detecting influence on the slope parameter: DFBETAs
o DFBETA is a sensitivity analysis of an individual observation's influence on the slope parameter. For each observation, fit two regression models:
  - with all observations: $Y = \hat{b}_0 + \hat{b}_1 X + \varepsilon$
  - with observation i removed: $Y = \hat{b}_{0(i)} + \hat{b}_{1(i)} X + \varepsilon_{(i)}$
  DFBETA is the studentized difference in the slope estimates:
$$DFBETA_i = \frac{\hat{b}_1 - \hat{b}_{1(i)}}{SE(\hat{b}_{1(i)})}$$
o Because DFBETAs are studentized, they can be interpreted like z-scores (or more precisely, like t-scores): an observation is flagged when $|DFBETA_i| > DFBETA_{crit}$.
o When run in SPSS, you will obtain a DFBETA for each parameter in the model (a DFBETA for the intercept and a DFBETA for the slope). In general, we would only be interested in the DFBETA for the slope.

These methods of identifying outliers and influence often work well, but can be ineffective at times. Ideally, the different procedures would identify the same cases, but this does not always happen. The use of these procedures requires thought and good judgment on the part of the analyst.

Once influential points are identified:
o Check to make sure there has not been a data coding or data entry error.
o Conduct a sensitivity analysis to see how much your conclusions would change if the outlying points were dropped.
o Never drop data points without telling your audience why those observations were omitted. In general, it is not advisable to drop observations from your analysis.
o The presence of many outliers may indicate an improper model:
  - Perhaps the relationship is not linear
  - Perhaps the outliers are due to a variable omitted from the model
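Here is a sketch of the slope DFBETA computed by refitting without each case (hypothetical data). It uses the common Belsley-Kuh-Welsch scaling, $s_{(i)}/\sqrt{\sum_j (X_j - \bar{X})^2}$, which combines the deleted-case error SD with the full-data spread of X. We assume this is close to what SPSS saves through SDBETA, but the exact standardization in any given package may differ.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def dfbetas_slope(x, y, p=2):
    """(b1 - b1_(i)) / (s_(i) / sqrt(Sxx)) for each case i."""
    n = len(x)
    _, b1 = fit_line(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    out = []
    for i in range(n):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(xs, ys)                     # slope without case i
        mse_i = sum((yj - (c0 + c1 * xj)) ** 2
                    for xj, yj in zip(xs, ys)) / (n - 1 - p)
        out.append((b1 - c1) / math.sqrt(mse_i / sxx))
    return out

# Hypothetical data; the last case pulls the slope down
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0, 14.1, 15.9, 18.0, 10.0]
dfb = dfbetas_slope(x, y)
print([round(v, 2) for v in dfb])
```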

Baseline example: No outliers included

[Figure: scatterplot of the data with the candidate outliers #1 to #4 marked]

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA ZPP
      /DEPENDENT y
      /METHOD=ENTER x
      /CASEWISE PLOT(SRESID) ALL
      /SAVE PRED (pred) SRESID (sresid).

o The regression model: $r_{XY} = .876$, $R^2 = .767$
o Examining residuals

[SPSS output: Casewise Diagnostics (case number, studentized residual, Y, predicted value, residual); dependent variable: Y]
[Figure: studentized residual against x]

Look for studentized residuals larger than 2.5 in absolute value.

o Examining influence statistics

    REGRESSION
      /DEPENDENT y
      /METHOD=ENTER x
      /SAVE COOK (cook) LEVER (level) SDRESID (sdresid) SDBETA.
    LIST VARIABLES = ID cook level sdresid sdb1_1.

[SPSS output: listing of id, cook, level, sdresid, and SDB1_1 for each case]
o Critical values
  - Cook's D: $D_{crit} = F(\alpha = .50, 2, 23) = .714$
  - Leverage: $h_{ii} > \frac{2p}{N} = \frac{4}{25} = .16$
  - Studentized deletion residuals: $|\tilde{d}_i| > 2.5$
  - Studentized DFBETA: $|DFBETA_i| > DFBETA_{crit}$
o In this case, we do not identify any outliers or influential observations.

Example #1: Outlier #1 included

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA ZPP
      /DEPENDENT y
      /METHOD=ENTER x
      /CASEWISE PLOT(SRESID) ALL
      /SAVE PRED (pred) SRESID (sresid).

o The regression model: $r_{XY} = .809$, $R^2 = .654$ (slope is unchanged)
o Examining residuals

[SPSS output: Casewise Diagnostics (case number, studentized residual, Y, predicted value, residual); dependent variable: Y]
[Figure: studentized residual against x]

Look for studentized residuals larger than 2.5 in absolute value. Observation #26 looks problematic.

o Examining influence statistics

    REGRESSION
      /DEPENDENT y
      /METHOD=ENTER x
      /SAVE COOK (cook) LEVER (level) SDRESID (sdresid) SDBETA.
    LIST VARIABLES = ID cook level sdresid sdb1_1.

[SPSS output: listing of id, cook, level, sdresid, and SDB1_1 for each case]
o Critical values
  - Cook's D: $D_{crit} = F(\alpha = .50, 2, 24) = .714$
  - Leverage: $h_{ii} > \frac{2p}{N} = \frac{4}{26} = .154$
  - Studentized deletion residuals: $|\tilde{d}_i| > 2.5$
  - Studentized DFBETA: $|DFBETA_i| > DFBETA_{crit}$
o Observation #26
  - Has a large residual and deletion residual
  - Has OK Cook's D, leverage, and DFBETA

Example #2: Only outlier #2 included

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA ZPP
      /DEPENDENT y
      /METHOD=ENTER x
      /CASEWISE PLOT(SRESID) ALL
      /SAVE PRED (pred) SRESID (sresid).

o The regression model: $r_{XY} = .76$, $R^2 = .58$
o Examining residuals

[Figure: studentized residual against x]

Look for studentized residuals larger than 2.5 in absolute value. Observation #27 looks problematic.

o Examining influence statistics

    REGRESSION
      /DEPENDENT y
      /METHOD=ENTER x
      /SAVE COOK (cook) LEVER (level) SDRESID (sdresid) SDBETA.
    LIST VARIABLES = ID cook level sdresid sdb1_1.

[SPSS output: listing of id, cook, level, sdresid, and SDB1_1 for each case]
o Critical values
  - Cook's D: $D_{crit} = F(\alpha = .50, 2, 24) = .714$
  - Leverage: $h_{ii} > \frac{2p}{N} = \frac{4}{26} = .154$
  - Studentized deletion residuals: $|\tilde{d}_i| > 2.5$
  - Studentized DFBETA: $|DFBETA_i| > DFBETA_{crit}$
o Observation #27
  - Has a large residual, deletion residual, Cook's D, leverage, and DFBETA

Example #3: Only outlier #3 included

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA ZPP
      /DEPENDENT y
      /METHOD=ENTER x
      /CASEWISE PLOT(SRESID) ALL
      /SAVE PRED (pred) SRESID (sresid).

o The regression model: $r_{XY} = .90$, $R^2 = .81$ (no change in slope or intercept)
o Examining residuals

[Figure: studentized residual against x]

Look for studentized residuals larger than 2.5 in absolute value. All observations are OK.

o Examining influence statistics

    REGRESSION
      /DEPENDENT y
      /METHOD=ENTER x
      /SAVE COOK (cook) LEVER (level) SDRESID (sdresid) SDBETA.
    LIST VARIABLES = ID cook level sdresid sdb1_1.

[SPSS output: listing of id, cook, level, sdresid, and SDB1_1 for each case]
o Critical values
  - Cook's D: $D_{crit} = F(\alpha = .50, 2, 24) = .714$
  - Leverage: $h_{ii} > \frac{2p}{N} = \frac{4}{26} = .154$
  - Studentized deletion residuals: $|\tilde{d}_i| > 2.5$
  - Studentized DFBETA: $|DFBETA_i| > DFBETA_{crit}$
o Observation #28
  - Has large leverage
  - Has OK residual, deletion residual, Cook's D, and DFBETA

Example #4: Only outlier #4 included

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA ZPP
      /DEPENDENT y
      /METHOD=ENTER x
      /CASEWISE PLOT(SRESID) ALL
      /SAVE PRED (pred) SRESID (sresid).

o The regression model: $r_{XY} = .40$, $R^2 = .16$
o Examining residuals

[Figure: studentized residual against x]

Look for studentized residuals larger than 2.5 in absolute value. Observation #29 is clearly problematic.

o Examining influence statistics

    REGRESSION
      /DEPENDENT y
      /METHOD=ENTER x
      /SAVE COOK (cook) LEVER (level) SDRESID (sdresid) SDBETA.
    LIST VARIABLES = ID cook level sdresid sdb1_1.

[SPSS output: listing of id, cook, level, sdresid, and SDB1_1 for each case]
o Critical values
  - Cook's D: $D_{crit} = F(\alpha = .50, 2, 24) = .714$
  - Leverage: $h_{ii} > \frac{2p}{N} = \frac{4}{26} = .154$
  - Studentized deletion residuals: $|\tilde{d}_i| > 2.5$
  - Studentized DFBETA: $|DFBETA_i| > DFBETA_{crit}$
o Observation #29
  - Has a large residual, deletion residual, Cook's D, leverage, and DFBETA

Summary and comparison:

[Figure: scatterplot with the four candidate outliers labeled #1 to #4]

Problematic according to each diagnostic?

| Obs | $r_{XY}$ | $\tilde{e}$ | $d$ | $D$ | $h$ | DFBETA |
|---|---|---|---|---|---|---|
| Baseline | .876 | No | No | No | No | No |
| #1 | .809 | Yes | Yes | No | No | No |
| #2 | .76 | Yes | Yes | Yes | Yes | Yes |
| #3 | .90 | No | No | No | Yes | No |
| #4 | .40 | Yes | Yes | Yes | Yes | Yes |

o Using a combination of all the methods, we (properly) identify outliers #2 and #4 as problematic. Outlier #1 may or may not be problematic, depending on our purposes.

6. Remedial Measures: An overview of alternative regression models

When regression assumptions are violated, you have two options:
o Explore transformations of X or Y so that the simple linear regression model can be used appropriately
o Abandon the simple linear regression model and use a more appropriate model.

More complex regression models are beyond the scope of this course, but let me highlight some possible alternative models that could be explored.

Polynomial regression
o When the regression function is not linear, a model that has non-linear terms may better fit the data:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$$
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \dots + \beta_k X^k + \varepsilon$$
o In these models, we are fitting/estimating non-linear regression lines that correspond with polynomial curves. This approach is very similar to the trend contrasts that we conducted in ANOVA, except:
  - For polynomial regression, the predictor variable (IV) is continuous; in ANOVA it is categorical.
  - In polynomial regression, we obtain the actual equation of the (polynomial) line that best fits the data.

Weighted least squares regression
o The regression approach we have been using is known as ordinary least squares (OLS) regression. In the OLS framework, we solve for the model parameters by minimizing the sum of the squared residuals (squared deviations from the predicted line).

$$SSE = \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - b_0 - b_1 X_i)^2$$
o When we minimize the residuals, solve for the parameters, and compute p-values, we need the residuals to have equal variance across the range of X values. If this equal variance assumption is violated, the OLS parameter estimates remain unbiased. However, their standard errors (and therefore the p-values) will be biased, and the estimates are no longer the most precise ones available.
o In OLS regression, each observation is treated equally. But if some observations are more precise than others (i.e., they have smaller variance), it makes sense that they should be given more weight than the less precise values.
o In weighted least squares regression, each observation is weighted by the inverse of its variance:
$$w_i = \frac{1}{\sigma_i^2} \qquad SSE_w = \sum w_i e_i^2 = \sum w_i (Y_i - \hat{Y}_i)^2 = \sum w_i (Y_i - b_0 - b_1 X_i)^2$$
  Observations with a large variance are given a small weight; observations with a small variance are given a big weight.
o Issues with weighted least squares regression:
  - We do not know the variances of the residuals; these values must be estimated. The process of estimating these variances is not trivial, particularly in small datasets.
  - $R^2$ is uninterpretable for weighted least squares regression (but that does not stop most programs from printing it out!).
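Here is a minimal sketch of weighted least squares for one predictor. It assumes, unrealistically as noted above, that each observation's error SD $\sigma_i$ is known. With all weights equal it reduces to OLS. The data and SDs are hypothetical.

```python
def wls_line(x, y, w):
    """Intercept and slope minimizing sum w_i * (Y_i - b0 - b1*X_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw     # weighted mean of X
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw     # weighted mean of Y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

x = [1, 2, 3, 4]
y = [3.0, 5.0, 7.0, 20.0]
sigma = [1.0, 1.0, 1.0, 100.0]         # hypothetical, assumed-known error SDs
w = [1 / s ** 2 for s in sigma]        # w_i = 1 / sigma_i^2
print(wls_line(x, y, [1.0] * 4))       # equal weights: ordinary least squares
print(wls_line(x, y, w))               # the imprecise 4th point barely matters
```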

Robust regression
o When assumptions are violated and/or outliers are present, OLS regression parameters may be biased. Robust regression is a family of approaches that compute estimates of regression parameters using techniques that are robust to violations of OLS assumptions.
o Least absolute residual (LAR) regression estimates regression parameters by minimizing the sum of the absolute deviations of the Y observations from the regression line:
$$\sum |e_i| = \sum |Y_i - \hat{Y}_i| = \sum |Y_i - b_0 - b_1 X_i|$$
  This approach reduces the influence of outliers.
o Least median of squares (LMS) regression estimates regression parameters by minimizing the median of the squared deviations of the Y observations from the regression line:
$$\text{median}(e_i^2) = \text{median}\left[(Y_i - \hat{Y}_i)^2\right] = \text{median}\left[(Y_i - b_0 - b_1 X_i)^2\right]$$
  This approach also reduces the influence of outliers.
o Iterative reweighted least squares (IRLS) regression is a form of weighted least squares regression in which the weight for each observation is computed from its residual using an M-estimator (Huber and Tukey bisquare estimators are the most commonly used M-estimators). This approach reduces the influence of outliers.
o Disadvantages of these approaches include:
  - They are not commonly included in statistical software.
  - They are robust to outliers, but they require that other regression assumptions be satisfied.
  - They are not commonly used in psychology and, thus, psychologists may regard these methods skeptically.
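Of the robust approaches above, IRLS is the easiest to sketch. The code below is a hypothetical illustration using Huber weights with the commonly used tuning constant k = 1.345 and a MAD-based scale estimate (MAD/0.6745). Production implementations differ in details such as convergence checks and how the scale is updated.

```python
import statistics

def wls_line(x, y, w):
    """Intercept and slope minimizing sum w_i * (Y_i - b0 - b1*X_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def huber_irls(x, y, k=1.345, iters=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    b0, b1 = wls_line(x, y, [1.0] * len(x))              # start from OLS
    for _ in range(iters):
        e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        med = statistics.median(e)
        scale = statistics.median([abs(ei - med) for ei in e]) / 0.6745  # MAD-based SD
        if scale == 0:
            break
        # Huber weights: 1 for small scaled residuals, k/|u| beyond k
        w = [1.0 if abs(ei / scale) <= k else k / abs(ei / scale) for ei in e]
        b0, b1 = wls_line(x, y, w)
    return b0, b1

x = list(range(10))
y = [1 + 2 * xi + 0.5 * (-1) ** xi for xi in x]   # hypothetical: true slope 2
y[9] += 30                                         # one gross outlier
ols = wls_line(x, y, [1.0] * len(x))
robust = huber_irls(x, y)
print("OLS slope:", round(ols[1], 3), " Huber slope:", round(robust[1], 3))
```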

Non-parametric regression
o Non-parametric regression techniques do not estimate model parameters. For a regression analysis, we are usually interested in estimating a slope; thus, non-parametric methods are of limited utility from an inferential perspective.
o In general, non-parametric techniques can be used to explore the shape of the regression line. These methods provide a smoothed curve that fits the data, but do not provide an equation for this line or allow for inference on this line.
o Previously, we examined the following data and concluded that the relationship between X and Y was non-linear (see Example #3 in Section 2):

[Figure: scatterplot of y against x from Example #3]

Let's examine these data with non-parametric techniques to explore the relationship between X and Y.

Method of moving averages
o The method of moving averages can be used to obtain a smoothed curve that fits the data.
o To use this method, you must specify a window (w), the number of observations you will average across.
  - First, average the Y values associated with the first w responses (the w smallest X values), and plot that point.
  - Next, discard the first value, add the next point along the X axis, compute the average of this set of w Y values, and plot that point.
  - Continue moving down the X axis until you have used all the points.
  - Then draw a line to connect the smoothed average points.

o For example, if w = 3, then you take the three smallest X values, average the Y values associated with these points, and plot that point. Next, take the 2nd, 3rd, and 4th smallest X values and repeat the process...
o An example of the method of moving averages, comparing different w values:

[Figure: four panels of moving-average smooths of Y against X, each with a different smoothing interval (window size)]

If the window is too small, it does not smooth the data enough; if the window is too large, it can smooth too much and you lose the shape of the data. In this case, w = 5 looks about right.
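The moving-window procedure described above can be sketched as follows. Plotting each smoothed value at the mean X of its window is our assumption, since the text only says "plot that point". Passing `center=statistics.median` gives the method of moving medians mentioned below. The data are hypothetical.

```python
import statistics

def moving_smoother(x, y, w, center=statistics.fmean):
    """Slide a window of w points along the sorted X axis.

    Returns one (mean X, center of Y) point per window position."""
    pts = sorted(zip(x, y))
    smoothed = []
    for start in range(len(pts) - w + 1):
        window = pts[start:start + w]
        smoothed.append((statistics.fmean([p[0] for p in window]),
                         center([p[1] for p in window])))
    return smoothed

x = [5, 1, 3, 2, 4]
y = [10, 2, 6, 4, 8]
print(moving_smoother(x, y, 3))    # [(2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
print(moving_smoother([1, 2, 3, 4, 5], [2, 4, 100, 8, 10], 3,
                      center=statistics.median))   # the outlier 100 is ignored
```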

o Issues regarding the method of moving averages:
  - Averages are affected by outliers, so it can be preferable to use the method of moving medians.
  - The method of moving averages is particularly useful for time-series data.
  - If the data are unequally spaced and/or have gaps along the X axis, the method of moving averages can provide some wacky results.
  - In Excel, you can use the method of moving averages and you can specify w.

Loess smoothing
o Loess stands for locally weighted scatterplot smoothing (the w got dropped somewhere along the way).
o Loess smoothing is a more sophisticated method of smoothing than the method of moving averages.
  - In each neighborhood, a regression line is estimated and the fitted line is used for the smoothed line.
  - This regression is weighted to give cases further from the middle X value less weight.
  - The process of fitting a linear regression line in each neighborhood is then repeated, so that observations with large residuals in the previous iteration receive less weight.

[Figure: loess smoothing of y against x]
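The loess steps above can be sketched in a few dozen lines. This is a simplified, hypothetical implementation in the spirit of Cleveland's LOWESS: tricube distance weights, a weighted straight line in each neighborhood, then bisquare robustness weights on later passes. The defaults `frac=0.5` and `robust_iters=2` are our own choices, and this is not the exact algorithm any particular package uses.

```python
import math
import statistics

def tricube(u):
    """Tricube distance weight: (1 - |u|^3)^3 inside the neighbourhood, 0 outside."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def bisquare(u):
    """Bisquare robustness weight: (1 - u^2)^2 for |u| < 1, else 0."""
    u = abs(u)
    return (1 - u ** 2) ** 2 if u < 1 else 0.0

def local_line(x0, xs, ys, ws):
    """Weighted least-squares line through (xs, ys), evaluated at x0."""
    sw = sum(ws)
    if sw == 0:
        return float("nan")                       # no usable neighbours
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        return my                                 # all weight on a single X value
    b1 = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / sxx
    return my + b1 * (x0 - mx)

def lowess(x, y, frac=0.5, robust_iters=2):
    """Return the sorted X values and their smoothed Y values."""
    pts = sorted(zip(x, y))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    n = len(xs)
    r = min(n, max(2, math.ceil(frac * n)))       # points per neighbourhood
    robust = [1.0] * n
    fitted = []
    for it in range(robust_iters + 1):
        fitted = []
        for x0 in xs:
            radius = sorted(abs(xj - x0) for xj in xs)[r - 1] or 1e-12
            ws = [tricube((xj - x0) / radius) * rj for xj, rj in zip(xs, robust)]
            fitted.append(local_line(x0, xs, ys, ws))
        if it == robust_iters:
            break
        res = [yj - fj for yj, fj in zip(ys, fitted)]
        s = statistics.median([abs(v) for v in res])
        if s <= 1e-10 * (1 + max(abs(v) for v in ys)):
            break                                 # (near-)perfect fit: nothing to downweight
        robust = [bisquare(v / (6 * s)) for v in res]
    return xs, fitted

# Sanity check on hypothetical, exactly linear data: the smooth reproduces the line
xs, smooth = lowess(list(range(10)), [1 + 2 * v for v in range(10)])
print([round(f, 6) for f in smooth])
```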

o One particularly useful application of loess smoothing is to confirm a fitted regression function:
  - Fit a regression function and graph 95% confidence bands for the fitted line.
  - Fit a loess smoothed curve through the data.
  - If the loess curve stays within the confidence bands, the fit of the regression line is good.
  - If the loess curve strays from the confidence bands, the fit of the regression line is not good.

[Figure: two panels of y against x with the fitted line, its confidence bands, and a loess curve, labeled Good Fit and Poor Fit]

7. Remedial Measures: Transformations

If the data do not satisfy the regression assumptions, a transformation applied to either the X variable or the Y variable may make the simple linear regression model appropriate for the transformed data.

General rules of thumb:
o Transformations on X
  - Can be used to linearize a non-linear relationship
o Transformations on Y
  - Can be used to fix problems of non-normality and unequal error variances
  - Once normality and homoscedasticity are achieved, it may be necessary to transform X to achieve linearity

Prototypical patterns and transformations of X

Non-linear relationship #1
o Try $X' = \ln(X)$ or $X' = \sqrt{X}$

[Figure: ln transformation]

Non-linear relationship #2
o Try $X' = X^2$ or $X' = \exp(X)$

[Figure: $X^2$ transformation]

Non-linear relationship #3
o Try $X' = 1/X$ or $X' = \exp(-X)$

[Figure: 1/X transformation]

Transforming Y
o If the data are non-normal and/or heteroscedastic, a transformation on Y may be useful.
o It can be very difficult to determine the most appropriate transformation of Y to fix the data.
o One popular class of transformations is the family of power transformations, $Y' = Y^{\lambda}$:
  - λ = 2: $Y' = Y^2$
  - λ = 1: $Y' = Y$
  - λ = .5: $Y' = \sqrt{Y}$
  - λ = 0: $Y' = \ln(Y)$ (by definition)
  - λ = -.5: $Y' = 1/\sqrt{Y}$
  - λ = -1: $Y' = 1/Y$
  - λ = -2: $Y' = 1/Y^2$
o To determine a λ that works:
  - Guess (trial and error)
  - Use the Box-Cox procedure (unfortunately not implemented in SPSS)

Warnings and cautions about transformations
o Do not transform the data because of a small number of outliers.
o After transforming the data, recheck the fit of the regression model using residual analysis.
o Once the data are transformed, and a regression run on the transformed data, $b_0$ and $b_1$ apply to the transformed data and not to the original data/scale.
o For psychological data, if the original data are not linear but the transformed data are, it can often be very difficult to interpret the results.
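Because SPSS does not provide the Box-Cox procedure, here is a minimal sketch of its core idea. For each candidate λ, transform Y with $(Y^\lambda - 1)/\lambda$ (or $\ln Y$ when λ = 0) and regress it on X. Then compare the profile log-likelihoods $-\frac{n}{2}\ln(SSE_\lambda/n) + (\lambda - 1)\sum \ln Y_i$; the largest wins. The scaled form differs from $Y^\lambda$ only by a linear rescaling, so it picks among the same ladder of powers. All Y values must be positive, and the grid and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def boxcox(y, lam):
    """Box-Cox power transform (every y must be > 0); lambda = 0 means ln(y)."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

def profile_loglik(x, y, lam):
    """Log-likelihood (up to a constant) of regressing the transformed Y on X."""
    n = len(y)
    z = boxcox(y, lam)
    b0, b1 = fit_line(x, z)
    sse = sum((zi - (b0 + b1 * xi)) ** 2 for xi, zi in zip(x, z))
    # The Jacobian term (lam - 1) * sum(ln y) makes different lambdas comparable
    return -n / 2 * math.log(sse / n) + (lam - 1) * sum(math.log(v) for v in y)

x = list(range(10))
y = [math.exp(1 + 0.3 * xi + 0.05 * (-1) ** xi) for xi in x]   # hypothetical, log-linear
grid = [-2, -1, -0.5, 0, 0.5, 1, 2]                              # the ladder of powers
best = max(grid, key=lambda lam: profile_loglik(x, y, lam))
print(best)   # 0, i.e., a log transformation of Y
```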

Transformations: An example
o Let's examine the relationship between the number of credits taken in a minor and interest in taking further coursework in that discipline.
o A university collects data on students
   - X = Number of credits completed in the minor
   - Y = Interest in taking another course in the minor discipline
o First, let's plot the data
[Figure: scatterplot of Interest in further coursework against Number of credits, showing the OLS regression line and a loess curve]
   - This relationship looks non-linear. We can try a square root or a log transformation to achieve linearity
[Figures: Interest in further coursework plotted against sqrtx and against lnx]
o The log transformation appears to work, so we should check the remaining assumptions
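The first plot above compares the OLS line with a loess curve by eye. The confidence-band check described earlier in these notes can be sketched in plain Python, as below. This is a minimal illustration under stated assumptions, not SPSS's implementation:

- The smoother is a single-pass local linear fit with tricube weights, that is, loess without robustness iterations.
- `span` is an assumed smoothing fraction.
- The band is pointwise.
- `crit` is the t critical value t(.975; n - 2), which you supply because the standard library has no t quantile function.
- At least 3 observations are assumed, and all function names are hypothetical.

```python
import math


def ols(xs, ys):
    """Simple linear regression: return (b0, b1, mse, xbar, sxx)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse / (n - 2), xbar, sxx


def band_at(fit, n, x0, crit):
    """Pointwise band for the mean response at x0:
    Yhat(x0) +/- crit * sqrt(MSE * (1/n + (x0 - xbar)^2 / Sxx))."""
    b0, b1, mse, xbar, sxx = fit
    yhat = b0 + b1 * x0
    se = math.sqrt(mse * (1.0 / n + (x0 - xbar) ** 2 / sxx))
    return yhat - crit * se, yhat + crit * se


def local_linear(xs, ys, x0, span=0.5):
    """Loess-style smooth at x0: a tricube-weighted straight-line fit to the
    nearest span*n points (one pass, no robustness iterations)."""
    k = max(3, math.ceil(span * len(xs)))
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # k-th nearest distance
    ws = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
    sw = sum(ws)
    xw = sum(w * x for w, x in zip(ws, xs)) / sw
    yw = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xw) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xw) * (y - yw) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return yw + slope * (x0 - xw)


def loess_within_band(xs, ys, crit, span=0.5):
    """True if the smooth stays inside the band at every observed X value."""
    fit = ols(xs, ys)
    for x0 in sorted(set(xs)):
        lo, hi = band_at(fit, len(xs), x0, crit)
        if not lo <= local_linear(xs, ys, x0, span) <= hi:
            return False
    return True
```

For strongly curved data such as Y = X² on X = 1, ..., 20, the smooth leaves the band and the function returns False. This mirrors the "Poor Fit" case.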

REGRESSION
  /DEPENDENT ybr
  /METHOD=ENTER lnx
  /RESIDUALS HISTOGRAM(SRESID) NORMPROB(SRESID)
  /SAVE RESID(resid) ZRESID(zresid) SRESID(sresid) PRED(pred).
[Figures: histogram and normal P-P plot of the regression studentized residuals (dependent variable: Interest in further coursework); scatterplot of the studentized residuals against the unstandardized predicted values]
   - The residuals appear to be normally distributed
   - We might worry about an outlier, but homoscedasticity seems OK
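The /SAVE subcommand above stores unstandardized, standardized, and studentized residuals. For a simple regression these can also be computed by hand, as in the sketch below.

- It assumes that SPSS's SRESID is the internally studentized residual e_i / √(MSE(1 - h_ii)).
- It uses the leverage h_ii = 1/n + (X_i - X̄)²/Σ(X - X̄)², which is the usual textbook definition.
- The function name is hypothetical, so compare its output against the values SPSS saves before relying on it.

```python
import math


def residual_diagnostics(xs, ys):
    """Return (resid, zresid, sresid, leverage) for a simple linear regression.

    resid  = e_i = Y_i - Yhat_i
    zresid = e_i / sqrt(MSE)                (standardized residual)
    sresid = e_i / sqrt(MSE * (1 - h_ii))   (internally studentized residual)
    """
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in resid) / (n - 2)
    lev = [1.0 / n + (x - xbar) ** 2 / sxx for x in xs]  # leverage h_ii
    zresid = [e / math.sqrt(mse) for e in resid]
    sresid = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, lev)]
    return resid, zresid, sresid, lev
```

To mirror the model above, pass the natural logs of the credit counts as `xs`. Because 1 - h_ii is below 1, each studentized residual is at least as large in magnitude as its standardized counterpart. The difference is largest for cases far from the mean of X.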

o Now, we can analyze the ln-transformed data.
[SPSS output: Model Summary and Coefficients tables; predictors: (Constant), lnx; dependent variable: Interest in further coursework]
   - There is a strong linear relationship between the ln of credits taken and interest in taking additional courses in the discipline, β = .81, t(98) = 13.83, p < .001, R² = .66 (adjusted R² = .66)
o But in this case, the non-linear relationship is interesting (and interpretable). We would be better off with an approach that models the non-linearity directly than with this approach, in which we try to transform to linearity.
o In this case, polynomial regression may be very useful.
o Note that if we had not graphed or explored our data, we would have missed the non-linear relationship altogether!
[SPSS output for the untransformed model: Model Summary (R = .749) and Coefficients tables; predictors: (Constant), Number of credits; dependent variable: Interest in further coursework]
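Because b0 and b1 refer to the ln scale, it can help to translate b1 back into credits. For a model of the form Ŷ = b0 + b1 ln(X), multiplying X by a factor k changes the prediction by b1 ln(k), whatever the starting value of X. This is a generic worked derivation; no value of b1 from the output above is assumed.

```latex
\begin{aligned}
\hat{Y} &= b_0 + b_1 \ln X \\
\hat{Y}(kX) - \hat{Y}(X) &= \bigl[b_0 + b_1 \ln(kX)\bigr] - \bigl[b_0 + b_1 \ln X\bigr] = b_1 \ln k \\
k = 2:&\quad b_1 \ln 2 \approx 0.693\, b_1 \\
k = 1.01:&\quad b_1 \ln 1.01 \approx 0.00995\, b_1 \approx b_1 / 100
\end{aligned}
```

So doubling the number of credits shifts predicted interest by about 0.69 b1 units, and a 1% increase shifts it by roughly b1/100. This is also why equal steps in credits matter less and less as credits accumulate.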


More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

Part 1: quick summary 5. Part 2: understanding the basics of ANOVA 8

Part 1: quick summary 5. Part 2: understanding the basics of ANOVA 8 Statstcs Rudolf N. Cardnal Graduate-level statstcs for psychology and neuroscence NOV n practce, and complex NOV desgns Verson of May 4 Part : quck summary 5. Overvew of ths document 5. Background knowledge

More information

RECENT DEVELOPMENTS IN QUANTITATIVE COMPARATIVE METHODOLOGY:

RECENT DEVELOPMENTS IN QUANTITATIVE COMPARATIVE METHODOLOGY: Federco Podestà RECENT DEVELOPMENTS IN QUANTITATIVE COMPARATIVE METHODOLOGY: THE CASE OF POOLED TIME SERIES CROSS-SECTION ANALYSIS DSS PAPERS SOC 3-02 INDICE 1. Advantages and Dsadvantages of Pooled Analyss...

More information

World currency options market efficiency

World currency options market efficiency Arful Hoque (Australa) World optons market effcency Abstract The World Currency Optons (WCO) maket began tradng n July 2007 on the Phladelpha Stock Exchange (PHLX) wth the new features. These optons are

More information

An Empirical Study of Search Engine Advertising Effectiveness

An Empirical Study of Search Engine Advertising Effectiveness An Emprcal Study of Search Engne Advertsng Effectveness Sanjog Msra, Smon School of Busness Unversty of Rochester Edeal Pnker, Smon School of Busness Unversty of Rochester Alan Rmm-Kaufman, Rmm-Kaufman

More information

Analysis of Demand for Broadcastingng servces

Analysis of Demand for Broadcastingng servces Analyss of Subscrpton Demand for Pay-TV Manabu Shshkura * Norhro Kasuga ** Ako Tor *** Abstract In ths paper, we wll conduct an analyss from an emprcal perspectve concernng broadcastng demand behavor and

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

General Iteration Algorithm for Classification Ratemaking

General Iteration Algorithm for Classification Ratemaking General Iteraton Algorthm for Classfcaton Ratemakng by Luyang Fu and Cheng-sheng eter Wu ABSTRACT In ths study, we propose a flexble and comprehensve teraton algorthm called general teraton algorthm (GIA)

More information

ENVIRONMENTAL MONITORING Vol. II - Statistical Analysis and Quality Assurance of Monitoring Data - Iris Yeung

ENVIRONMENTAL MONITORING Vol. II - Statistical Analysis and Quality Assurance of Monitoring Data - Iris Yeung STATISTICAL ANALYSIS AND QUALITY ASSURANCE OF MONITORING DATA Irs Yeung Cty Unversty of Hong Kong, Kowloon, Hong Kong Keywords: AIC, ARIMA model, BIC, cluster analyss, dscrmnant analyss, factor analyss,

More information

A machine vision approach for detecting and inspecting circular parts

A machine vision approach for detecting and inspecting circular parts A machne vson approach for detectng and nspectng crcular parts Du-Mng Tsa Machne Vson Lab. Department of Industral Engneerng and Management Yuan-Ze Unversty, Chung-L, Tawan, R.O.C. E-mal: edmtsa@saturn.yzu.edu.tw

More information

Hedging Interest-Rate Risk with Duration

Hedging Interest-Rate Risk with Duration FIXED-INCOME SECURITIES Chapter 5 Hedgng Interest-Rate Rsk wth Duraton Outlne Prcng and Hedgng Prcng certan cash-flows Interest rate rsk Hedgng prncples Duraton-Based Hedgng Technques Defnton of duraton

More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

Media Mix Modeling vs. ANCOVA. An Analytical Debate

Media Mix Modeling vs. ANCOVA. An Analytical Debate Meda M Modelng vs. ANCOVA An Analytcal Debate What s the best way to measure ncremental sales, or lft, generated from marketng nvestment dollars? 2 Measurng ROI From Promotonal Spend Where possble to mplement,

More information

This study examines whether the framing mode (narrow versus broad) influences the stock investment decisions

This study examines whether the framing mode (narrow versus broad) influences the stock investment decisions MANAGEMENT SCIENCE Vol. 54, No. 6, June 2008, pp. 1052 1064 ssn 0025-1909 essn 1526-5501 08 5406 1052 nforms do 10.1287/mnsc.1070.0845 2008 INFORMS How Do Decson Frames Influence the Stock Investment Choces

More information