A Review and Comparison of Methods for Detecting Outliers in Univariate Data Sets
by Songwon Seo
BS, Kyunghee University

Submitted to the Graduate Faculty of Graduate School of Public Health in partial fulfillment of the requirements for the degree of Master of Science

University of Pittsburgh
2006
UNIVERSITY OF PITTSBURGH
Graduate School of Public Health

This thesis was presented by Songwon Seo

It was defended on April 6, 2006 and approved by:

Laura Cassidy, Ph D, Assistant Professor, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh

Ravi K. Sharma, Ph D, Assistant Professor, Department of Behavioral and Community Health Sciences, Graduate School of Public Health, University of Pittsburgh

Thesis Director: Gary M. Marsh, Ph D, Professor, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh
Gary M. Marsh, Ph D

Songwon Seo, M.S.
University of Pittsburgh, 2006

Most real-world data sets contain outliers that have unusually large or small values when compared with others in the data set. Outliers may cause a negative effect on data analyses, such as ANOVA and regression, based on distribution assumptions, or may provide useful information about data when we look into an unusual response to a given study. Thus, outlier detection is an important part of data analysis in the above two cases. Several outlier labeling methods have been developed. Some methods are sensitive to extreme values, like the SD method, and others are resistant to extreme values, like Tukey's method. Although these methods are quite powerful with large normal data, it may be problematic to apply them to nonnormal data or small sample sizes without knowledge of their characteristics in these circumstances. This is because each labeling method has different measures to detect outliers, and expected outlier percentages change differently according to the sample size or distribution type of the data. Many kinds of data regarding public health are often skewed, usually to the right, and lognormal distributions can often be applied to such skewed data, for instance, surgical procedure times, blood pressure, and assessment of toxic compounds in environmental analysis. This paper reviews and compares several common and less common outlier labeling methods and presents information that shows how the percent of outliers changes in each method according to the skewness and sample size of lognormal distributions through simulations and application to real data sets. These results may help establish guidelines for the choice of outlier detection methods in skewed data, which are often seen in the public health field.
TABLE OF CONTENTS

1. INTRODUCTION
1.1 BACKGROUND
1.2 OUTLIER DETECTION METHOD
2. STATEMENT OF PROBLEM
3. OUTLIER LABELING METHOD
3.1 STANDARD DEVIATION (SD) METHOD
3.2 Z-SCORE
3.3 THE MODIFIED Z-SCORE
3.4 TUKEY'S METHOD (BOXPLOT)
3.5 ADJUSTED BOXPLOT
3.6 MADe METHOD
3.7 MEDIAN RULE
4. SIMULATION STUDY AND RESULTS FOR THE FIVE SELECTED LABELING METHODS
5. APPLICATION
6. RECOMMENDATIONS
7. DISCUSSION AND CONCLUSIONS
APPENDIX A: THE EXPECTATION, STANDARD DEVIATION AND SKEWNESS OF A LOGNORMAL DISTRIBUTION
APPENDIX B: MAXIMUM Z SCORE
APPENDIX C: CLASSICAL AND MEDCOUPLE (MC) SKEWNESS
APPENDIX D: BREAKDOWN POINT
APPENDIX E: PROGRAM CODE FOR OUTLIER LABELING METHODS
BIBLIOGRAPHY
LIST OF TABLES

Table 1: Basic Statistic of a Simple Data Set
Table 2: Basic Statistic After Changing 7 into 77 in the Simple Data Set
Table 3: Computation and Masking Problem of the Z-Score
Table 4: Computation of Modified Z-Score and its Comparison with the Z-Score
Table 5: The Average Percentage of Left Outliers, Right Outliers and the Average Total Percent of Outliers for the Lognormal Distributions with the Same Mean and Different Variances (mean=0, variance=0.2², 0.4², 0.6², 0.8², 1.0²) and the Standard Normal Distribution with Different Sample Sizes
Table 6: Interval, Left, Right, and Total Number of Outliers According to the Five Outlier Methods
LIST OF FIGURES

Figure 1: Probability density function for a normal distribution according to the standard deviation
Figure 2: Theoretical Change of Outliers Percentage According to the Skewness of the Lognormal Distributions in the SD Method and Tukey's Method
Figure 3: Density Plot and Dotplot of the Lognormal Distribution (sample size=50) with Mean=1 and SD=1, and its Logarithm, Y=log(x)
Figure 4: Boxplot for the Example Data Set
Figure 5: Boxplot and Dotplot. (Note: No outlier shown in the boxplot)
Figure 6: Change of the Intervals of Two Different Boxplot Methods
Figure 7: Standard Normal Distribution and Lognormal Distributions
Figure 8: Change in the Outlier Percentages According to the Skewness of the Data
Figure 9: Change in the Total Percentages of Outliers According to the Sample Size
Figure 10: Histogram and Basic Statistics of Case 1-Case
Figure 11: Flowchart of Outlier Labeling Methods
Figure 12: Change of the Two Types of Skewness Coefficients According to the Sample Size and Data Distribution. (Note: These results came from the previous simulation. All the values are in Table 5)
1. INTRODUCTION

This chapter consists of two sections: the Background and Outlier Detection Method. In the Background, basic ideas of an outlier are discussed, such as definitions, features, and reasons to detect outliers. In the Outlier Detection Method section, characteristics of the two kinds of outlier detection methods are described briefly: formal and informal tests.

1.1 BACKGROUND

Observed variables often contain outliers that have unusually large or small values when compared with others in a data set. Some data sets may come from homogeneous groups; others from heterogeneous groups that have different characteristics regarding a specific variable, such as height data not stratified by gender. Outliers can be caused by incorrect measurements, including data entry errors, or by coming from a different population than the rest of the data. If the measurement is correct, it represents a rare event. Two aspects of an outlier can be considered.

The first aspect to note is that outliers cause a negative effect on data analysis. Osborne and Overbay (2004) briefly categorized the deleterious effects of outliers on statistical analyses:
1) Outliers generally serve to increase error variance and reduce the power of statistical tests.
2) If non-randomly distributed, they can decrease normality (and in multivariate analyses, violate assumptions of sphericity and multivariate normality), altering the odds of making both Type I and Type II errors.
3) They can seriously bias or influence estimates that may be of substantive interest.

The following example shows how a single outlier can strongly distort the mean, variance, and 95% confidence interval for the mean. Suppose there is a simple data set composed of data points 1, 2, 3, 4, 5, 6, 7, whose basic statistics are shown in Table 1. Now,
let's replace data point 7 with 77. As shown in Table 2, the mean and variance of the data are much larger than those of the original data set due to one unusual data value, 77. The 95% confidence interval for the mean is also much broader because of the large variance. This may cause potential problems when a data analysis that is sensitive to a mean or variance is conducted.

Table 1: Basic Statistic of a Simple Data Set
Mean | Median | Variance | 95% Confidence Interval for the mean
4.00 | 4.00 | 4.67 | [2.00 to 6.00]

Table 2: Basic Statistic After Changing 7 into 77 in the Simple Data Set
Mean | Median | Variance | 95% Confidence Interval for the mean
14.00 | 4.00 | 774.67 | [-11.74 to 39.74]

The second aspect of outliers is that they can provide useful information about data when we look into an unusual response to a given study. They could be the extreme values sitting apart from the majority of the data regardless of distribution assumptions. The following two cases are good examples of outlier analysis in terms of the second aspect of an outlier: 1) to identify medical practitioners who under- or over-utilize specific procedures or medical equipment, such as an x-ray instrument; 2) to identify Primary Care Physicians (PCPs) with inordinately high Member Dissatisfaction Rates (MDRs) (MDRs = the number of member complaints / PCP practice size) compared to other PCPs. 3

In summary, there are two reasons for detecting outliers. The first reason is to find outliers which influence assumptions of a statistical test, for example, outliers violating the normal distribution assumption in an ANOVA test, and deal with them properly in order to improve statistical analysis. This could be considered a preliminary step for data analysis. The second reason is to use the outliers themselves for the purpose of obtaining certain critical information about the data, as was shown in the above examples.
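The effect of the single value 77 on the summary statistics can be checked with a few lines of Python. This is only a sketch: the helper name `summarize` is hypothetical, and the t critical value 2.447 (two-sided 95%, 6 degrees of freedom) is hard-coded as an assumption rather than computed.

```python
import math
import statistics

T_975_DF6 = 2.447  # assumed: two-sided 95% t critical value for 6 degrees of freedom


def summarize(data, t_crit):
    """Return mean, median, sample variance and a t-based 95% CI for the mean."""
    n = len(data)
    mean = statistics.mean(data)
    var = statistics.variance(data)  # sample variance (n - 1 denominator)
    half_width = t_crit * math.sqrt(var / n)
    return mean, statistics.median(data), var, (mean - half_width, mean + half_width)


original = [1, 2, 3, 4, 5, 6, 7]
modified = [1, 2, 3, 4, 5, 6, 77]  # data point 7 replaced by 77

for label, data in (("original", original), ("modified", modified)):
    mean, median, var, (ci_low, ci_high) = summarize(data, T_975_DF6)
    print(f"{label}: mean={mean:.2f} median={median} "
          f"variance={var:.2f} CI=[{ci_low:.2f}, {ci_high:.2f}]")
```

The median is unchanged by the replacement, while the mean, the variance, and the width of the confidence interval all grow sharply.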
1.2 OUTLIER DETECTION METHOD

There are two kinds of outlier detection methods: formal tests and informal tests. Formal and informal tests are usually called tests of discordancy and outlier labeling methods, respectively.

Most formal tests need test statistics for hypothesis testing. They usually assume some well-behaved distribution and test whether the target extreme value is an outlier of that distribution, i.e., whether or not it deviates from the assumed distribution. Some tests are for a single outlier and others for multiple outliers. Selection of these tests mainly depends on the number and type of target outliers, and the type of data distribution. 1 Many tests, organized by the choice of distribution, are discussed in Barnett and Lewis (1994) and Iglewicz and Hoaglin (1993). Iglewicz and Hoaglin (1993) reviewed and compared five selected formal tests which are applicable to the normal distribution, such as the Generalized ESD, Kurtosis statistics, Shapiro-Wilk, the Boxplot rule, and the Dixon test, through simulations. Even though formal tests are quite powerful under well-behaved statistical assumptions such as a distribution assumption, most distributions of real-world data may be unknown or may not follow specific distributions such as the normal, gamma, or exponential. Another limitation is that they are susceptible to masking or swamping problems. Acuna and Rodriguez (2004) define these problems as follows:

Masking effect: It is said that one outlier masks a second outlier if the second outlier can be considered as an outlier only by itself, but not in the presence of the first outlier. Thus, after the deletion of the first outlier the second instance is emerged as an outlier.

Swamping effect: It is said that one outlier swamps a second observation if the latter can be considered as an outlier only under the presence of the first one. In other words, after the deletion of the first outlier the second observation becomes a non-outlying observation.
Many studies regarding these problems have been conducted by Barnett and Lewis (1994), Iglewicz and Hoaglin (1993), Davies and Gather (1993), and Bendre and Kale (1987).

On the other hand, most outlier labeling methods, the informal tests, generate an interval or criterion for outlier detection instead of hypothesis testing, and any observation beyond the interval or criterion is considered an outlier. Various location and scale parameters are employed in each labeling method to define a reasonable interval or criterion for outlier detection. There are two reasons for using an outlier labeling method. One is to find possible outliers as a screening device before conducting a formal test. The other is to find the extreme values away
from the majority of the data regardless of the distribution.

While the formal tests usually require test statistics based on the distribution assumptions and a hypothesis to determine if the target extreme value is a true outlier of the distribution, most outlier labeling methods present the interval using the location and scale parameters of the data. Although the labeling method is usually simple to use, some observations outside the interval may turn out to be falsely identified outliers after a formal test when outliers are defined only as observations that deviate from the assumed distribution. However, if the purpose of the outlier detection is not a preliminary step to find the extreme values violating the distribution assumptions of the main statistical analyses such as the t-test, ANOVA, and regression, but mainly to find the extreme values away from the majority of the data regardless of the distribution, the outlier labeling methods may be applicable. In addition, for a large data set that is statistically problematic, e.g., when it is difficult to identify the distribution of the data or transform it into a proper distribution such as the normal distribution, labeling methods can be used to detect outliers.

This paper focuses on outlier labeling methods. Chapter 2 presents the possible problems when labeling methods are applied to skewed data. In Chapter 3, seven outlier labeling methods are outlined. In Chapter 4, the average percentages of outliers in the standard normal and lognormal distributions with the same mean and different variances are computed to compare the outlier percentage of the selected five outlier labeling methods according to the degree of the skewness and different sample sizes. In Chapter 5, the five selected methods are applied to real data sets.
2. STATEMENT OF PROBLEM

Outlier-labeling methods such as the Standard Deviation (SD) method and the boxplot are commonly used and are easy to use. These methods are quite reasonable when the data distribution is symmetric and mound-shaped, such as the normal distribution. Figure 1 shows that about 68%, 95%, and 99.7% of the data from a normal distribution are within 1, 2, and 3 standard deviations of the mean, respectively. If data follow a normal distribution, this helps to estimate the likelihood of having extreme values in the data 3, so that an observation two or three standard deviations away from the mean may be considered an outlier in the data.

Figure 1: Probability density function for a normal distribution according to the standard deviation.

The boxplot, which was developed by Tukey (1977), is another very helpful method since it makes no distributional assumptions, nor does it depend on a mean or standard deviation. 19 The lower quartile (q1) is the 25th percentile, and the upper quartile (q3) is the 75th percentile of the data. The inter-quartile range (IQR) is defined as the interval between q1 and q3.
Tukey (1977) defined q1-(1.5*iqr) and q3+(1.5*iqr) as inner fences, q1-(3*iqr) and q3+(3*iqr) as outer fences, the observations between an inner fence and its nearby outer fence as outside, and anything beyond outer fences as far out. 31 High (2000) renamed the outside observations potential outliers and the far out observations problematic outliers. 19 The outside and far out observations can also be called possible outliers and probable outliers, respectively. This method is quite effective, especially when working with large continuous data sets that are not highly skewed. 19

Although Tukey's method is quite effective when working with large data sets that are fairly normally distributed, many distributions of real-world data do not follow a normal distribution. They are often highly skewed, usually to the right, and in such cases the distributions are frequently closer to a lognormal distribution than a normal one. 1 The lognormal distribution can often be applied to such data in a variety of forms, for instance, personal income, blood pressure, and assessment of toxic compounds in environmental analysis.

In order to illustrate how the theoretical percentage of outliers changes according to the skewness of the data in the SD method (Mean ± 2 SD, Mean ± 3 SD) and Tukey's method, lognormal distributions with the same mean (0) but different standard deviations (0.2, 0.4, 0.6, 0.8, 1.0, 1.2) are used for the data sets with different degrees of skewness, and the standard normal distribution is used for the data set whose skewness is zero. The computation of the mean, standard deviation, and skewness in a lognormal distribution is in Appendix A. According to Figure 2, the two methods show different patterns, e.g., the outlier percentage of Tukey's method increases with skewness, unlike that of the SD method. This shows that the results of outlier detection may change depending on the outlier detection method or the distribution of the data.
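The theoretical percentages described above can be sketched directly from the lognormal distribution function. The code below is an illustration under stated assumptions, not the thesis's own program: it takes the mean (0) and the standard deviations as log-scale parameters, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_outside(lower, upper, sigma):
    """P(X < lower) + P(X > upper) for X = exp(sigma * Z), Z standard normal."""
    left = STD_NORMAL.cdf(math.log(lower) / sigma) if lower > 0 else 0.0
    right = 1.0 - STD_NORMAL.cdf(math.log(upper) / sigma)
    return left + right


def tukey_outside(sigma, k=1.5):
    """Theoretical share of a lognormal(0, sigma) beyond Tukey's k*IQR fences."""
    q1 = math.exp(sigma * STD_NORMAL.inv_cdf(0.25))
    q3 = math.exp(sigma * STD_NORMAL.inv_cdf(0.75))
    iqr = q3 - q1
    return lognormal_outside(q1 - k * iqr, q3 + k * iqr, sigma)


def sd_outside(sigma, k=2.0):
    """Theoretical share of a lognormal(0, sigma) beyond mean +/- k SD."""
    mean = math.exp(sigma ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return lognormal_outside(mean - k * sd, mean + k * sd, sigma)


for sigma in (0.2, 0.4, 0.6, 0.8, 1.0, 1.2):
    print(f"sigma={sigma}: Tukey 1.5 IQR={100 * tukey_outside(sigma):.2f}%  "
          f"2 SD={100 * sd_outside(sigma):.2f}%")
```

Under these assumptions the Tukey percentage rises steadily as the log-scale standard deviation grows, while the 2 SD percentage stays within a narrow band, which is the contrast the paragraph above describes.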
Figure 2: Theoretical Change of Outliers Percentage According to the Skewness of the Lognormal Distributions in the SD Method and Tukey's Method (outlier percentage plotted against skewness for the 2 SD Method (Mean ± 2 SD), the 3 SD Method (Mean ± 3 SD), Tukey's Method (1.5 IQR), and Tukey's Method (3 IQR))

When data are highly skewed or in other respects depart from a normal distribution, transformation to normality is a common step in order to identify outliers using a method which is quite effective in a normal distribution. Such a transformation could be useful when the identification of outliers is conducted as a preliminary step for data analysis, and it also helps in selecting appropriate statistical procedures for estimating and testing. 1 However, if an outlier itself is a primary concern in a given study, as in the previous example of identifying medical practitioners who under- or over-utilize medical equipment such as x-ray instruments, a transformation of the data could affect our ability to identify outliers.

For example, 50 random samples (x) are generated through statistical software R in order to show the effect of the transformation. The random variable X has a lognormal distribution (Mean=1, SD=1), and its logarithm, Y=log(x), has a normal distribution. If the observations which are beyond the mean by two standard deviations are considered outliers, the expected outliers before and after transformation are totally different. As shown in Figure 3, while three observations which have large values are considered outliers in the original 50 random samples (x), after log transformation of these samples, two observations of small values appear to be outliers, and the former large valued observations are no longer considered to be outliers. The vertical lines in each graph represent cutoff values (Mean ± 2*SD). Lower and
upper cutoff values are ( , ) and ( , .76336), respectively, in the lognormal data (x) and its logarithm (y). Although this approach is less affected by extreme values, because the cutoffs no longer depend on the extreme observations after transformation, an artificial transformation may reshape the data so that true outliers are not detected or other observations are falsely identified as outliers. 1

Figure 3: Density Plot and Dotplot of the Lognormal Distribution (sample size=50) with Mean=1 and SD=1, and its Logarithm, Y=log(x).

Several methods to identify outliers have been developed. Some methods are sensitive to extreme values, like the SD method, and others are resistant to extreme values, like Tukey's method. The objective of this paper is to review and compare several common and less common labeling methods for identifying outliers and to present information that shows how the average percentage of outliers changes in each method according to the degree of skewness and sample size of the data, in order to help establish guidelines for the choice of outlier detection methods in skewed data when an outlier itself is a primary concern in a given study.
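The effect of the log transformation on the two-standard-deviation rule discussed in this chapter can be illustrated with a small sketch. Instead of random draws, it uses 50 evenly spaced quantiles of a normal distribution with mean 1 and SD 1 on the log scale as a deterministic stand-in for a sample; that substitution, and the helper name `two_sd_flags`, are assumptions made only for this illustration.

```python
import math
import statistics
from statistics import NormalDist


def two_sd_flags(values):
    """Mark observations farther than two sample SDs from the sample mean."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [abs(v - mean) > 2 * sd for v in values]


n = 50
log_scale = NormalDist(mu=1, sigma=1)
# Deterministic stand-in for a random sample: evenly spaced quantiles.
y = [log_scale.inv_cdf((i + 0.5) / n) for i in range(n)]
x = [math.exp(v) for v in y]  # lognormal-shaped data on the original scale

flagged_raw = [v for v, f in zip(x, two_sd_flags(x)) if f]
flagged_log = [math.exp(v) for v, f in zip(y, two_sd_flags(y)) if f]

print("flagged on the original scale:", [round(v, 2) for v in flagged_raw])
print("flagged after log transform  :", [round(v, 2) for v in flagged_log])
```

On the original scale only large values are flagged, whereas after the transformation the smallest observation is flagged as well, which mirrors the behaviour described for Figure 3.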
3. OUTLIER LABELING METHOD

This chapter reviews seven outlier labeling methods and gives examples of simple numerical computations for each method.

3.1 STANDARD DEVIATION (SD) METHOD

The simple classical approach to screen outliers is to use the SD (Standard Deviation) method. It is defined as

2 SD Method: x̄ ± 2 SD
3 SD Method: x̄ ± 3 SD,

where x̄ is the sample mean and SD is the sample standard deviation. The observations outside these intervals may be considered outliers. According to the Chebyshev inequality, if a random variable X with mean μ and variance σ² exists, then for any k > 0,

$$P[\,|X - \mu| \ge k\sigma\,] \le \frac{1}{k^2}, \qquad P[\,|X - \mu| < k\sigma\,] \ge 1 - \frac{1}{k^2}, \quad k > 0.$$

The bound 1 - (1/k)² enables us to determine what proportion of our data will be within k standard deviations of the mean 3. For example, at least 75%, 89%, and 94% of the data are within 2, 3, and 4 standard deviations of the mean, respectively. These results may help us determine the likelihood of having extreme values in the data 3. Although Chebyshev's theorem is true for any data from any distribution, it is limited in that it only gives the smallest proportion of observations within k standard deviations of the mean. In the case when the distribution of a
random variable is known, a more exact proportion of observations centering around the mean can be computed. For instance, if certain data follow a normal distribution, approximately 68%, 95%, and 99.7% of the data are within 1, 2, and 3 standard deviations of the mean, respectively; thus, the observations beyond two or three SD above and below the mean of the observations may be considered outliers in the data.

The example data set, X, for a simple example of this method is as follows: 3.2, 3.4, 3.7, 3.7, 3.8, 3.9, 4, 4, 4.1, 4.2, 4.7, 4.8, 14, 15. For the data set, x̄ = 5.46, SD = 3.86, and the intervals of the 2 SD and 3 SD methods are (-2.25, 13.18) and (-6.11, 17.04), respectively. Thus, 14 and 15 are beyond the interval of the 2 SD method and there are no outliers in the 3 SD method.

3.2 Z-SCORE

Another method that can be used to screen data for outliers is the Z-Score, using the mean and standard deviation.

$$Z_i = \frac{x_i - \bar{x}}{sd}, \quad \text{where } X_i \sim N(\mu, \sigma^2) \text{ and } sd \text{ is the standard deviation of the data.}$$

The basic idea of this rule is that if X follows a normal distribution, N(μ, σ²), then Z follows a standard normal distribution, N(0, 1), and Z-scores that exceed 3 in absolute value are generally considered outliers. This method is simple, and it is the same formula as the 3 SD method when the criterion of an outlier is an absolute value of a Z-score of at least 3. It presents a reasonable criterion for identification of the outlier when data follow the normal distribution. According to Shiffler (1988), the maximum possible Z-score depends on the sample size n, and it is computed as $(n-1)/\sqrt{n}$. The proof is given in Appendix B. Since no z-score exceeds 3 in a sample size less than or equal to 10, the z-score method is not very good for outlier labeling, particularly in small data sets 1. Another limitation of this rule is that the standard deviation can be inflated by a few or even a single observation having an extreme value. Thus it can cause a masking problem, i.e., the less extreme outliers go undetected because of the most extreme outlier(s), and vice versa.
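The quantities used so far in this chapter can be reproduced with a short sketch: the Chebyshev lower bound, the exact normal coverage, the 2 SD and 3 SD intervals for the example data set, and Shiffler's maximum Z-score. The example values 3.2 and 4.2 are taken as the readings that reproduce the stated mean of 5.46 and SD of 3.86; the function names are hypothetical.

```python
import math
import statistics

x = [3.2, 3.4, 3.7, 3.7, 3.8, 3.9, 4, 4, 4.1, 4.2, 4.7, 4.8, 14, 15]


def chebyshev_bound(k):
    """Smallest possible share of data within k SDs of the mean (any distribution)."""
    return 1 - 1 / k ** 2


def normal_coverage(k):
    """Share of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


def sd_interval(data, k):
    """Interval mean +/- k * SD using the sample standard deviation."""
    mean, sd = statistics.mean(data), statistics.stdev(data)
    return mean - k * sd, mean + k * sd


def max_abs_z(n):
    """Largest attainable |Z| in a sample of size n: (n - 1) / sqrt(n)."""
    return (n - 1) / math.sqrt(n)


for k in (2, 3, 4):
    print(f"k={k}: Chebyshev >= {chebyshev_bound(k):.3f}, normal = {normal_coverage(k):.4f}")

for k in (2, 3):
    low, high = sd_interval(x, k)
    flagged = [v for v in x if v < low or v > high]
    print(f"{k} SD interval: ({low:.2f}, {high:.2f}), flagged: {flagged}")

print("max |Z| for n=10:", round(max_abs_z(10), 3), " n=11:", round(max_abs_z(11), 3))
```

The last line shows why a cutoff of 3 can never be exceeded for sample sizes of 10 or fewer.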
When masking occurs, the outliers may be neighbors. Table 3 shows a computation and masking problem of the Z-Score method using the previous example data set, X.

Table 3: Computation and Masking Problem of the Z-Score
Columns: i | Case 1 (x̄ = 5.46, sd = 3.86): x_i, Z-Score | Case 2 (x̄ = 4.73, sd = 2.82): x_i, Z-Score

For case 1, with all of the example data included, it appears that the values 14 and 15 are outliers, yet no observation exceeds the absolute value of 3. For case 2, with the most extreme value, 15, excluded from the example data, 14 is considered an outlier. This is because multiple extreme values have artificially inflated the standard deviation.

3.3 THE MODIFIED Z-SCORE

Two estimators used in the Z-Score, the sample mean and sample standard deviation, can be affected by a few extreme values or even by a single extreme value. To avoid this problem, the median and the median of the absolute deviation of the median (MAD) are employed in the
modified Z-Score instead of the mean and standard deviation of the sample, respectively (Iglewicz and Hoaglin, 1993).

$$\mathrm{MAD} = \operatorname{median}\{\,|x_i - \tilde{x}|\,\}, \quad \text{where } \tilde{x} \text{ is the sample median.}$$

The modified Z-Score ($M_i$) is computed as

$$M_i = \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}}, \quad \text{where } E(\mathrm{MAD}) = 0.675\,\sigma \text{ for large normal data.}$$

Iglewicz and Hoaglin (1993) suggested that observations be labeled outliers when $|M_i| > 3.5$, based on a simulation with pseudo-normal observations for sample sizes of 10, 20, and 40. 1 The $M_i$ score is effective for normal data in the same way as the Z-score.

Table 4: Computation of Modified Z-Score and its Comparison with the Z-Score
Columns: i | x_i | Z-Score | modified Z-Score

Table 4 shows the computation of the modified Z-Score and its comparison with the Z-Score of the previous example data set. While no observation is detected as an outlier by the Z-Score, two extreme values, 14 and 15, are detected as outliers at the same time by the modified Z-Score since this method is less susceptible to the extreme values.
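The masking effect and the behaviour of the modified Z-Score can be reproduced for the example data set with the sketch below. The data values follow the example set as read in Section 3.1 (with 3.2 and 4.2), and the function names are hypothetical; the sketch does not guard against a MAD of zero, which would need separate handling.

```python
import statistics

x = [3.2, 3.4, 3.7, 3.7, 3.8, 3.9, 4, 4, 4.1, 4.2, 4.7, 4.8, 14, 15]


def z_scores(data):
    """Classical Z-scores based on the sample mean and sample SD."""
    mean, sd = statistics.mean(data), statistics.stdev(data)
    return [(v - mean) / sd for v in data]


def modified_z_scores(data):
    """Modified Z-scores based on the median and the MAD."""
    med = statistics.median(data)
    mad = statistics.median([abs(v - med) for v in data])
    return [0.6745 * (v - med) / mad for v in data]


case1 = z_scores(x)        # all 14 observations
case2 = z_scores(x[:-1])   # most extreme value, 15, excluded

print("Case 1: largest |Z| =", round(max(abs(z) for z in case1), 2))
print("Case 2: Z for 14    =", round(case2[-1], 2))

flagged = [v for v, m in zip(x, modified_z_scores(x)) if abs(m) > 3.5]
print("Modified Z-Score flags:", flagged)
```

With both extreme values present no classical Z-score reaches 3, but once 15 is removed the Z-score of 14 exceeds 3; the modified Z-Score flags both values in a single pass.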
3.4 TUKEY'S METHOD (BOXPLOT)

Tukey's (1977) method, constructing a boxplot, is a well-known simple graphical tool to display information about continuous univariate data, such as the median, lower quartile, upper quartile, lower extreme, and upper extreme of a data set. It is less sensitive to extreme values of the data than the previous methods using the sample mean and standard deviation because it uses quartiles, which are resistant to extreme values. The rules of the method are as follows:

1. The IQR (Inter Quartile Range) is the distance between the lower (Q1) and upper (Q3) quartiles.
2. Inner fences are located at a distance 1.5 IQR below Q1 and above Q3 [Q1-1.5 IQR, Q3+1.5 IQR].
3. Outer fences are located at a distance 3 IQR below Q1 and above Q3 [Q1-3 IQR, Q3+3 IQR].
4. A value between the inner and outer fences is a possible outlier. An extreme value beyond the outer fences is a probable outlier.

There is no statistical basis for the reason that Tukey uses 1.5 and 3 regarding the IQR to make inner and outer fences.

For the previous example data set, Q1=3.725, Q3=4.575, and IQR=0.85. Thus, the inner fence is [2.45, 5.85] and the outer fence is [1.18, 7.13]. Two extreme values, 14 and 15, are identified as probable outliers in this method. Figure 4 is a boxplot generated using the statistical software STATA for the example data set.

Figure 4: Boxplot for the Example Data Set
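The four rules above can be sketched as follows. The quartile convention is an assumption: Python's `statistics.quantiles` with `method="inclusive"` happens to reproduce Q1=3.725 and Q3=4.575 for the example data, but other software may use a different quartile definition and give slightly different fences. The function names are hypothetical.

```python
import statistics

x = [3.2, 3.4, 3.7, 3.7, 3.8, 3.9, 4, 4, 4.1, 4.2, 4.7, 4.8, 14, 15]


def tukey_fences(data):
    """Return (inner, outer) fences as (low, high) pairs."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer


def classify(data):
    """Split observations into possible and probable outliers."""
    inner, outer = tukey_fences(data)
    probable = [v for v in data if v < outer[0] or v > outer[1]]
    possible = [v for v in data
                if (v < inner[0] or v > inner[1]) and v not in probable]
    return possible, probable


inner, outer = tukey_fences(x)
possible, probable = classify(x)
print("inner fences:", tuple(round(f, 2) for f in inner))
print("outer fences:", tuple(round(f, 2) for f in outer))
print("possible outliers:", possible, " probable outliers:", probable)
```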
While previous methods are limited to mound-shaped and reasonably symmetric data such as the normal distribution 1, Tukey's method is applicable to skewed or non mound-shaped data since it makes no distributional assumptions and does not depend on a mean or standard deviation. However, Tukey's method may not be appropriate for a small sample size 1. For example, suppose that a data set consists of data points 145, 147, 9, 93, 418, 158, and 9. A simple distribution of the data using a Boxplot and Dotplot is shown in Figure 5. Although 158 and 9 may appear to be outliers in the dotplot, no observation is shown as an outlier in the boxplot.

Figure 5: Boxplot and Dotplot. (Note: No outlier shown in the boxplot)

3.5 ADJUSTED BOXPLOT

Although the boxplot proposed by Tukey (1977) may be applicable for both symmetric and skewed data, the more skewed the data, the more observations may be detected as outliers, 3 as shown in Figure 2. This results from the fact that this method is based on robust measures such as the lower and upper quartiles and the IQR without considering the skewness of the data. Vandervieren and Hubert (2004) introduced an adjusted boxplot taking into account the medcouple (MC) 3, a robust measure of skewness for a skewed distribution.
When $X = \{x_1, x_2, \ldots, x_n\}$ is a data set independently sampled from a continuous univariate distribution and it is sorted such that $x_1 \le x_2 \le \ldots \le x_n$, the MC of the data is defined as

$$MC(x_1, \ldots, x_n) = \operatorname{med} \frac{(x_j - med_k) - (med_k - x_i)}{x_j - x_i},$$

where $med_k$ is the median of X, and i and j have to satisfy $x_i \le med_k \le x_j$ and $x_i \ne x_j$. The interval of the adjusted boxplot is as follows (G. Brys et al. (2005)):

$$[L, U] = [\,Q_1 - 1.5\,e^{-3.5\,MC}\,IQR,\; Q_3 + 1.5\,e^{4\,MC}\,IQR\,] \quad \text{if } MC \ge 0$$
$$[L, U] = [\,Q_1 - 1.5\,e^{-4\,MC}\,IQR,\; Q_3 + 1.5\,e^{3.5\,MC}\,IQR\,] \quad \text{if } MC \le 0,$$

where L is the lower fence and U is the upper fence of the interval. The observations which fall outside the interval are considered outliers.

The value of the MC ranges between -1 and 1. If MC=0, the data is symmetric and the adjusted boxplot becomes Tukey's box plot. If MC>0, the data has a right skewed distribution, whereas if MC<0, the data has a left skewed distribution. 3 A simple example for computation of MC and a brief comparison of classical and MC skewness are in Appendix C.

For the previous example data set, Q1=3.725, Q3=4.575, IQR=0.85, and MC=0.43. Thus, the interval of the adjusted boxplot is [3.44, 11.62]. Two extreme values, 14 and 15, and the two smallest values, 3.2 and 3.4, are identified as outliers in this method. Figure 6 shows the change of the intervals of two boxplot methods, Tukey's method and the adjusted boxplot, for the example data set. The vertical dotted lines are the lower and upper bounds of the interval of each method. Although the example data set is artificial and is not large enough to explain their difference, we can see a general trend that the interval of the adjusted boxplot, especially the upper fence, moves to the side of the skewed tail, compared to Tukey's method.
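The medcouple definition and the adjusted interval can be sketched for the example data as follows. This is a naive O(n²) reading of the definition as stated here (all pairs with x_i ≤ median ≤ x_j and x_i ≠ x_j); dedicated implementations treat observations tied with the median through a special kernel and may give slightly different values. The quartile convention (`method="inclusive"`) and the function names are assumptions.

```python
import math
import statistics

x = [3.2, 3.4, 3.7, 3.7, 3.8, 3.9, 4, 4, 4.1, 4.2, 4.7, 4.8, 14, 15]


def medcouple_naive(data):
    """Median of the kernel over all pairs x_i <= med <= x_j with x_i != x_j."""
    med = statistics.median(data)
    left = [v for v in data if v <= med]
    right = [v for v in data if v >= med]
    kernel = [((xj - med) - (med - xi)) / (xj - xi)
              for xi in left for xj in right if xi != xj]
    return statistics.median(kernel)  # raises if every observation is identical


def adjusted_fences(data):
    """Return (MC, lower fence, upper fence) of the adjusted boxplot."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple_naive(data)
    if mc >= 0:
        lower = q1 - 1.5 * math.exp(-3.5 * mc) * iqr
        upper = q3 + 1.5 * math.exp(4 * mc) * iqr
    else:
        lower = q1 - 1.5 * math.exp(-4 * mc) * iqr
        upper = q3 + 1.5 * math.exp(3.5 * mc) * iqr
    return mc, lower, upper


mc, lower, upper = adjusted_fences(x)
flagged = [v for v in x if v < lower or v > upper]
print(f"MC={mc:.2f}, interval=[{lower:.2f}, {upper:.2f}], flagged={flagged}")
```

Compared with Tukey's inner fences for the same data, the upper fence moves far to the right while the lower fence moves up toward Q1, which is why the two smallest values are flagged here.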
Figure 6: Change of the Intervals of Two Different Boxplot Methods (Tukey's Method vs. the Adjusted Boxplot)
Legend: inner fences of Tukey's method (Q1-1.5*IQR, Q3+1.5*IQR); outer fences of Tukey's method (Q1-3*IQR, Q3+3*IQR); single fence of the adjusted boxplot (Q1-1.5 * exp(-3.5MC) * IQR, Q3+1.5 * exp(4MC) * IQR)

Vandervieren and Hubert (2004) computed the average percentage of outliers beyond the lower and upper fence of two types of boxplots, the adjusted boxplot and Tukey's boxplot, for several distributions and different sample sizes. In the simulation, fewer observations, especially in the right tail, are classified as outliers compared to Tukey's method when the data are skewed to the right. 3 In the case of a mildly right-skewed distribution, the lower fence of the interval may move to the right and more observations on the left side will be classified as outliers compared to Tukey's method. This difference mainly comes from a decrease in the lower fence and an increase in the upper fence from Q1 and Q3, respectively. 3
3.6 MADe METHOD

The MADe method, using the median and the Median Absolute Deviation (MAD), is one of the basic robust methods which are largely unaffected by the presence of extreme values in the data set. 11 This approach is similar to the SD method. However, the median and MADe are employed in this method instead of the mean and standard deviation. The MADe method is defined as follows:

2 MADe Method: Median ± 2 MADe
3 MADe Method: Median ± 3 MADe,

where MADe = 1.483 × MAD for large normal data.

MAD is an estimator of the spread in a data set, similar to the standard deviation 11, but has an approximately 50% breakdown point like the median 1. The notion of breakdown point is delineated in Appendix D.

$$\mathrm{MAD} = \operatorname{median}(\,|x_i - \operatorname{median}(x)|\,), \quad i = 1, 2, \ldots, n$$

When the MAD value is scaled by a factor of 1.483, it is similar to the standard deviation in a normal distribution. This scaled MAD value is the MADe.

For the example data set, the median=4, MAD=0.3, and MADe=0.44. Thus, the intervals of the 2 MADe and 3 MADe methods are [3.11, 4.89] and [2.67, 5.33], respectively.

Since this approach uses two robust estimators having a high breakdown point, i.e., it is not unduly affected by extreme values even though a few observations make the distribution of the data skewed, the interval is seldom inflated, unlike the SD method.

3.7 MEDIAN RULE

The median is a robust estimator of location having an approximately 50% breakdown point. It is the value that falls exactly in the center of the data when the data are arranged in order.
That is, if $x_1, x_2, \ldots, x_n$ is a random sample sorted by order of magnitude, then the median is defined as:

$$\tilde{x} = x_m \quad \text{when } n \text{ is odd}$$
$$\tilde{x} = (x_m + x_{m+1})/2 \quad \text{when } n \text{ is even, where } m = \text{round up}(n/2).$$

For a skewed distribution like income data, the median is often used in describing the average of the data. The median and mean have the same value in a symmetrical distribution.

Carling (1998) introduces the median rule for identification of outliers through studying the relationship between target outlier percentage and Generalized Lambda Distributions (GLDs). GLDs with different parameters are used for various moderately skewed distributions 1. The median substitutes for the quartiles of Tukey's method, and a different scale of the IQR is employed in this method. It is more resistant, and its target outlier percentage is less affected by sample size than Tukey's method in the non-Gaussian case 1. The scale of IQR can be adjusted depending on which target outlier percentage and GLD are selected. In my paper, 2.3 is chosen as the scale of IQR; when the scale is applied to the normal distribution, the outlier percentage turns out to be between that of Tukey's method with 1.5 IQR and that with 3 IQR, i.e., 0.2%. It is defined as:

$$[C_1, C_2] = Q_2 \pm 2.3\,IQR, \quad \text{where } Q_2 \text{ is the sample median.}$$

For the example data set, Q2=4, and IQR=0.85. Thus, the interval of this method is [2.05, 5.96].
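The two median-based intervals from Sections 3.6 and 3.7 can be sketched together for the example data set. The function names are hypothetical, and the quartile convention (`method="inclusive"`) is assumed because it reproduces the IQR of 0.85 quoted in the text.

```python
import statistics

x = [3.2, 3.4, 3.7, 3.7, 3.8, 3.9, 4, 4, 4.1, 4.2, 4.7, 4.8, 14, 15]


def made_interval(data, k):
    """Median +/- k * MADe, with MADe = 1.483 * MAD."""
    med = statistics.median(data)
    mad = statistics.median([abs(v - med) for v in data])
    made = 1.483 * mad
    return med - k * made, med + k * made


def median_rule_interval(data, scale=2.3):
    """Median +/- scale * IQR (scale 2.3 as chosen in the text)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q2 - scale * iqr, q2 + scale * iqr


for label, (low, high) in (
    ("2 MADe", made_interval(x, 2)),
    ("3 MADe", made_interval(x, 3)),
    ("Median rule", median_rule_interval(x)),
):
    flagged = [v for v in x if v < low or v > high]
    print(f"{label}: [{low:.2f}, {high:.2f}] flagged={flagged}")
```

All three intervals stay narrow despite the two extreme values, because neither the median, the MAD, nor the IQR is pulled toward 14 and 15.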
4. SIMULATION STUDY AND RESULTS FOR THE FIVE SELECTED LABELING METHODS

Most intervals or criteria to identify possible outliers in outlier labeling methods are effective under the normal distribution. For example, in the case of well-known labeling methods such as the 2 SD and 3 SD methods and the Boxplot (1.5 IQR), the expected percentages of observations outside the interval are 5%, 0.3%, and 0.7%, respectively, under large normal samples. Although these methods are quite powerful with large normal data, it may be problematic to apply them to non-normal data or small sample sizes without information about their characteristics in these circumstances. This is because each labeling method has different measures to detect outliers, and expected outlier percentages change differently according to the sample size or distribution type of the data.

The purpose of this simulation is to present the expected percentage of the observations outside of the interval of several labeling methods according to the sample size and the degree of the skewness of the data, using the lognormal distribution with the same mean and different variances. Through this simulation, we can learn not only the possible outlier percentage of several labeling methods but also which method is more robust with respect to the above two factors, skewness and sample size.

The simulation proceeds as follows:

Five labeling methods are selected: the SD Method, the MADe Method, Tukey's Method (Boxplot), Adjusted Boxplot, and the Median Rule. The Z-Score and modified Z-Score are not considered because their criteria to define an outlier are based on the normal distribution.

Average outlier percentages of five labeling methods in the standard normal (0, 1) and lognormal distributions with the same mean and different variances (mean=0, variance=0.2², 0.4², 0.6², 0.8², 1.0²) are computed. For each distribution, 1000 replications of sample sizes 20 and 50, 300 replications of the sample size 100, and 100 replications of the sample sizes 300 and 500 are considered.
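The procedure above can be sketched in miniature. The code below is a simplified illustration, not the program used for the thesis: it covers only two of the five methods (Tukey's 1.5 IQR rule and the 2 SD rule), uses a single sample size with a reduced number of replications, and fixes an arbitrary seed so that runs are repeatable. All function names are hypothetical.

```python
import random
import statistics


def pct_outside_tukey(sample, k=1.5):
    """Percentage of a sample beyond Tukey's k*IQR fences."""
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return 100 * sum(v < low or v > high for v in sample) / len(sample)


def pct_outside_sd(sample, k=2):
    """Percentage of a sample beyond mean +/- k sample SDs."""
    mean, sd = statistics.mean(sample), statistics.stdev(sample)
    return 100 * sum(abs(v - mean) > k * sd for v in sample) / len(sample)


def average_pct(method, sigma, n, reps, rng):
    """Average outlier percentage over `reps` lognormal(0, sigma) samples of size n."""
    total = 0.0
    for _ in range(reps):
        sample = [rng.lognormvariate(0, sigma) for _ in range(n)]
        total += method(sample)
    return total / reps


rng = random.Random(2006)  # arbitrary seed, chosen only for repeatability
for sigma in (0.2, 0.6, 1.0):
    tukey = average_pct(pct_outside_tukey, sigma, n=50, reps=200, rng=rng)
    sd2 = average_pct(pct_outside_sd, sigma, n=50, reps=200, rng=rng)
    print(f"sigma={sigma}: Tukey 1.5 IQR={tukey:.2f}%  2 SD={sd2:.2f}%")
```

Extending the sketch to the remaining methods, sample sizes, and replication counts only requires adding further `method` functions and looping over `n` and `reps`.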
To illustrate the shape of each distribution, i.e., the degree of skewness of the data, 500 random observations were generated from the distributions, and their density plots and skewness are as shown in Figure 7.

[Figure 7: Standard Normal Distribution and Lognormal Distributions (cs = classical skewness, mc = medcouple skewness). Panels: Standard Normal, Lognormal(0, 0.2), Lognormal(0, 0.4), Lognormal(0, 0.6), Lognormal(0, 0.8), Lognormal(0, 1.0). Axes: x, Density Value.]

Figures 8 and 9 visually show the characteristics of the five labeling methods according to the sample size and skewness of the data using the lognormal distribution. All the values in the Figures, including the standard errors of the average percentages, are reported in Table 5. The results of this simulation are as follows:

1. The MADe method classifies more observations as outliers than any other method. In large normal data this method approaches the SD method, because its location and scale measures (the median and MADe) become the same as the mean and standard deviation of the SD method when the data follow a normal distribution with a large sample size; however, as the data increase in skewness, the difference in outlier percentages between the MADe method and the SD method becomes larger. The MADe method, Tukey's method, and the Median rule increase in the total average percentage of outliers the more skewed the data, while the SD method and adjusted boxplot seldom change over the different sample sizes.

2. The Median rule classifies fewer observations than Tukey's 1.5 IQR method and more observations than Tukey's 3 IQR method.

3. The decrease in the total outlier percentage of the adjusted boxplot as the sample size increases is larger than that of the other methods.

4. Most methods except the adjusted boxplot show similar patterns in the average outlier percentages on the left side of the distribution. Their left outlier percentages decrease rapidly the more skewed the data, especially for the MADe and SD methods; the adjusted boxplot, however, decreases slowly in sample sizes over 300. The different patterns of the adjusted boxplot, e.g., the increase in left outlier percentage in small sample sizes, may be due to the following: the left fence of the interval may move to the right because of the MC skewness, and a few observations may fall outside the left fence by chance. Although the number of such observations is small, their ratio in a small sample could be large. This may increase the average percentage of outliers on the left of the distribution. The adjusted boxplot may still detect observations on the left side of the distribution in right-skewed data, especially mildly skewed data; however, the average percentages are quite low.

5. The MADe method, Tukey's method, and the Median rule increase in the percentage of outliers on the right side of the distribution as the skewness of the data increases, while the SD method and adjusted boxplot seldom change in each sample size (the SD method increases slightly and plateaus). The right fences of the intervals of both the SD method and the adjusted boxplot move to the right side of the distribution as the skewness of the data increases. Since the adjusted boxplot takes into account the skewness of the data, its right fence moves further toward the skewed tail, here the right side of the distribution, as the skewness increases. On the other hand, the interval of the SD method is simply inflated by the extreme values.
[Figure 8: Change in the Outlier Percentages According to the Skewness of the Data. Panels: sample sizes 20, 50, 100, 300, and 500.]

[Figure 9: Change in the Total Percentages of Outliers According to the Sample Size.]
Table 5: The Average Percentage of Left Outliers, Right Outliers, and the Average Total Percent of Outliers for the Lognormal Distributions with the Same Mean and Different Variances (mean = 0; variance = 0.2², 0.4², 0.6², 0.8², 1.0²) and the Standard Normal Distribution with Different Sample Sizes.

Rows: the distributions SN, LN(0, 0.2), LN(0, 0.4), LN(0, 0.6), LN(0, 0.8), and LN(0, 1.0), each at the five sample sizes, with the classical skewness (CS) and medcouple (MC) of the samples.

Columns (Left, Right, and Total for each interval):
SD Method: Mean ± 2 SD; Mean ± 3 SD
MADe Method: Median ± 2 MADe; Median ± 3 MADe
Tukey's Method: Q1 − 1.5 IQR / Q3 + 1.5 IQR; Q1 − 3 IQR / Q3 + 3 IQR
Adjusted Boxplot: Q1 − 1.5 exp(−3.5 mc) IQR / Q3 + 1.5 exp(4 mc) IQR
Median Rule: Q2 ± 2.3 IQR

Values in parentheses are the standard errors of the average percentages of outliers.
5. APPLICATION

In this chapter, the five selected outlier labeling methods are applied to three real data sets and to one modified version of one of them. The real data sets were provided by Gateway Health Plan, a managed care alternative to the Department of Public Welfare's Medical Assistance Program in Pennsylvania. These data sets are part of the basic Primary Care Provider (PCP) information needed to identify providers associated with Member Dissatisfaction Rates (MDRs = the number of member complaints / PCP practice size) that are unusually high compared with other PCPs of similar sized practices [3].

Case 1 (data set 1) is "visit per 1 office med", and its distribution is not very different from the normal distribution. Case 2 (data set 2) is "Scripts per 1 Rx", and its distribution is mildly skewed to the right. Case 3 (data set 3) is "Svcs per 1 early child im", and its distribution is highly skewed to the right because of one observation with an extremely large value. Case 4 (data set 4) is data set 3 with the most extreme value excluded, in order to see the possible effect of that one extreme outlier on the outlier labeling methods. Figure 10 shows the basic statistics and distribution of each data set (Case 1 to Case 4).

[Figure 10: Histogram and Basic Statistics of Case 1-Case 4. Each panel shows a density histogram together with the minimum, first quartile, mean, median, third quartile, maximum, total N, variance, standard deviation, standard error and confidence limits of the mean, skewness, kurtosis, and medcouple skewness. Total N: 209 (Case 1), 209 (Case 2), 127 (Case 3), 126 (Case 4).]
Table 6 shows the left, right, and total number of outliers identified in each data set after applying the five outlier labeling methods. Sample programs for Case 4 are given in APPENDIX E.

Table 6: Interval, Left, Right, and Total Number of Outliers According to the Five Outlier Methods (percentages of N in parentheses)

Case 1 (Data set 1): N = 209
Method | Interval | Left | Right | Total
2 SD Method | (131.49, ) | 2 (0.96) | 6 (2.87) | 8 (3.83)
3 SD Method | ( , ) | 0 (0) | 1 (0.48) | 1 (0.48)
Tukey's Method (1.5 IQR) | (376.81, ) | 1 (0.48) | 2 (0.96) | 3 (1.44)
Tukey's Method (3 IQR) | (-49.6, 17.3) | 0 (0) | 0 (0) | 0 (0)
Adjusted Boxplot | (95.41, ) | 1 (0.48) | 1 (0.48) | 2 (0.96)
2 MADe Method | (131.5, 656.1) | 4 (1.91) | 11 (5.26) | 15 (7.18)
3 MADe Method | (74.1, 749.5) | 1 (0.48) | 2 (0.96) | 3 (1.44)
Median Rule | (-43.87, ) | 0 (0) | 1 (0.48) | 1 (0.48)

Case 2 (Data set 2): N = 209
Method | Interval | Left | Right | Total
2 SD Method | ( , ) | 0 (0) | 8 (3.83) | 8 (3.83)
3 SD Method | ( , ) | 0 (0) | 4 (1.91) | 4 (1.91)
Tukey's Method (1.5 IQR) | (169.66, ) | 0 (0) | 8 (3.83) | 8 (3.83)
Tukey's Method (3 IQR) | ( , ) | 0 (0) | 3 (1.44) | 3 (1.44)
Adjusted Boxplot | (858.85, ) | 5 (2.39) | 2 (0.96) | 7 (3.35)
2 MADe Method | ( , ) | 4 (1.91) | 20 (9.57) | 24 (11.48)
3 MADe Method | ( , 488.4) | 0 (0) | 6 (2.87) | 6 (2.87)
Median Rule | ( , 4939.) | 0 (0) | 5 (2.39) | 5 (2.39)

Case 3 (Data set 3): N = 127
Method | Interval | Left | Right | Total
2 SD Method | ( , ) | 0 (0) | 1 (0.79) | 1 (0.79)
3 SD Method | ( , 53.38) | 0 (0) | 1 (0.79) | 1 (0.79)
Tukey's Method (1.5 IQR) | (-96.38, ) | 0 (0) | 3 (2.36) | 3 (2.36)
Tukey's Method (3 IQR) | ( , ) | 0 (0) | 2 (1.57) | 2 (1.57)
Adjusted Boxplot | ( , ) | 1 (0.79) | 2 (1.57) | 3 (2.36)
2 MADe Method | (114.7, ) | 1 (0.79) | 6 (4.72) | 7 (5.51)
3 MADe Method | ( , ) | 0 (0) | 3 (2.36) | 3 (2.36)
Median Rule | ( , ) | 0 (0) | 3 (2.36) | 3 (2.36)
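The entries of Table 6 are counts of observations outside each interval, with the count expressed as a percentage of N in parentheses. A minimal Python sketch of that bookkeeping is given below. The function name is hypothetical, and treating an observation as an outlier only when it lies strictly outside the interval is an assumption, since the handling of values exactly on a fence is not stated here.

```python
def label_outliers(values, lower, upper):
    """Count observations below `lower` and above `upper`, with percentages of N."""
    n = len(values)
    left = sum(v < lower for v in values)    # strictly below the left fence
    right = sum(v > upper for v in values)   # strictly above the right fence
    total = left + right

    def pct(count):
        return round(100.0 * count / n, 2)

    return {
        "left": (left, pct(left)),
        "right": (right, pct(right)),
        "total": (total, pct(total)),
    }


# Toy data, not one of the thesis data sets.
print(label_outliers([1, 2, 3, 4, 100], lower=0, upper=10))

# Percentage arithmetic for 8 outliers among N = 209 observations.
print(round(100 * 8 / 209, 2))
```

The last line shows the percentage arithmetic for a row such as the 2 SD method in Case 1, where 8 outliers among 209 observations correspond to the reported 3.83%.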
More informationChapter XIV: Fundamentals of Probability and Statistics *
Objectives Chapter XIV: Fudametals o Probability ad Statistics * Preset udametal cocepts o probability ad statistics Review measures o cetral tedecy ad dispersio Aalyze methods ad applicatios o descriptive
More informationApproximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find
1.8 Approximatig Area uder a curve with rectagles 1.6 To fid the area uder a curve we approximate the area usig rectagles ad the use limits to fid 1.4 the area. Example 1 Suppose we wat to estimate 1.
More informationMEI Structured Mathematics. Module Summary Sheets. Statistics 2 (Version B: reference to new book)
MEI Mathematics i Educatio ad Idustry MEI Structured Mathematics Module Summary Sheets Statistics (Versio B: referece to ew book) Topic : The Poisso Distributio Topic : The Normal Distributio Topic 3:
More informationNow here is the important step
LINEST i Excel The Excel spreadsheet fuctio "liest" is a complete liear least squares curve fittig routie that produces ucertaity estimates for the fit values. There are two ways to access the "liest"
More informationChapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:
Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries
More informationPROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM
PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics
More informationModified Line Search Method for Global Optimization
Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o
More informationwhere: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return
EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The
More informationHypergeometric Distributions
7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you
More informationCHAPTER 3 THE TIME VALUE OF MONEY
CHAPTER 3 THE TIME VALUE OF MONEY OVERVIEW A dollar i the had today is worth more tha a dollar to be received i the future because, if you had it ow, you could ivest that dollar ad ear iterest. Of all
More informationTrigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is
0_0605.qxd /5/05 0:45 AM Page 470 470 Chapter 6 Additioal Topics i Trigoometry 6.5 Trigoometric Form of a Complex Number What you should lear Plot complex umbers i the complex plae ad fid absolute values
More informationLecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)
18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the
More information3. Greatest Common Divisor - Least Common Multiple
3 Greatest Commo Divisor - Least Commo Multiple Defiitio 31: The greatest commo divisor of two atural umbers a ad b is the largest atural umber c which divides both a ad b We deote the greatest commo gcd
More informationSection 11.3: The Integral Test
Sectio.3: The Itegral Test Most of the series we have looked at have either diverged or have coverged ad we have bee able to fid what they coverge to. I geeral however, the problem is much more difficult
More informationCHAPTER 3 DIGITAL CODING OF SIGNALS
CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity
More informationPractice Problems for Test 3
Practice Problems for Test 3 Note: these problems oly cover CIs ad hypothesis testig You are also resposible for kowig the samplig distributio of the sample meas, ad the Cetral Limit Theorem Review all
More informationPage 1. Real Options for Engineering Systems. What are we up to? Today s agenda. J1: Real Options for Engineering Systems. Richard de Neufville
Real Optios for Egieerig Systems J: Real Optios for Egieerig Systems By (MIT) Stefa Scholtes (CU) Course website: http://msl.mit.edu/cmi/ardet_2002 Stefa Scholtes Judge Istitute of Maagemet, CU Slide What
More informationMulti-server Optimal Bandwidth Monitoring for QoS based Multimedia Delivery Anup Basu, Irene Cheng and Yinzhe Yu
Multi-server Optimal Badwidth Moitorig for QoS based Multimedia Delivery Aup Basu, Iree Cheg ad Yizhe Yu Departmet of Computig Sciece U. of Alberta Architecture Applicatio Layer Request receptio -coectio
More informationTheorems About Power Series
Physics 6A Witer 20 Theorems About Power Series Cosider a power series, f(x) = a x, () where the a are real coefficiets ad x is a real variable. There exists a real o-egative umber R, called the radius
More informationProject Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments
Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please
More informationOMG! Excessive Texting Tied to Risky Teen Behaviors
BUSIESS WEEK: EXECUTIVE EALT ovember 09, 2010 OMG! Excessive Textig Tied to Risky Tee Behaviors Kids who sed more tha 120 a day more likely to try drugs, alcohol ad sex, researchers fid TUESDAY, ov. 9
More informationHow to read A Mutual Fund shareholder report
Ivestor BulletI How to read A Mutual Fud shareholder report The SEC s Office of Ivestor Educatio ad Advocacy is issuig this Ivestor Bulleti to educate idividual ivestors about mutual fud shareholder reports.
More informationSAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx
SAMPLE QUESTIONS FOR FINAL EXAM REAL ANALYSIS I FALL 006 3 4 Fid the followig usig the defiitio of the Riema itegral: a 0 x + dx 3 Cosider the partitio P x 0 3, x 3 +, x 3 +,......, x 3 3 + 3 of the iterval
More informationPENSION ANNUITY. Policy Conditions Document reference: PPAS1(7) This is an important document. Please keep it in a safe place.
PENSION ANNUITY Policy Coditios Documet referece: PPAS1(7) This is a importat documet. Please keep it i a safe place. Pesio Auity Policy Coditios Welcome to LV=, ad thak you for choosig our Pesio Auity.
More informationSequences and Series
CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their
More informationTHE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction
THE ARITHMETIC OF INTEGERS - multiplicatio, expoetiatio, divisio, additio, ad subtractio What to do ad what ot to do. THE INTEGERS Recall that a iteger is oe of the whole umbers, which may be either positive,
More informationIrreducible polynomials with consecutive zero coefficients
Irreducible polyomials with cosecutive zero coefficiets Theodoulos Garefalakis Departmet of Mathematics, Uiversity of Crete, 71409 Heraklio, Greece Abstract Let q be a prime power. We cosider the problem
More informationTaking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling
Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria
More informationBasic Elements of Arithmetic Sequences and Series
MA40S PRE-CALCULUS UNIT G GEOMETRIC SEQUENCES CLASS NOTES (COMPLETED NO NEED TO COPY NOTES FROM OVERHEAD) Basic Elemets of Arithmetic Sequeces ad Series Objective: To establish basic elemets of arithmetic
More informationODBC. Getting Started With Sage Timberline Office ODBC
ODBC Gettig Started With Sage Timberlie Office ODBC NOTICE This documet ad the Sage Timberlie Office software may be used oly i accordace with the accompayig Sage Timberlie Office Ed User Licese Agreemet.
More information4.3. The Integral and Comparison Tests
4.3. THE INTEGRAL AND COMPARISON TESTS 9 4.3. The Itegral ad Compariso Tests 4.3.. The Itegral Test. Suppose f is a cotiuous, positive, decreasig fuctio o [, ), ad let a = f(). The the covergece or divergece
More informationPUBLIC RELATIONS PROJECT 2016
PUBLIC RELATIONS PROJECT 2016 The purpose of the Public Relatios Project is to provide a opportuity for the chapter members to demostrate the kowledge ad skills eeded i plaig, orgaizig, implemetig ad evaluatig
More informationInverse Gaussian Distribution
5 Kauhisa Matsuda All rights reserved. Iverse Gaussia Distributio Abstract Kauhisa Matsuda Departmet of Ecoomics The Graduate Ceter The City Uiversity of New York 65 Fifth Aveue New York NY 6-49 Email:
More information