Exploratory Data Analysis
- Herbert Day
1 Exploratory Data Analysis

Exploratory data analysis is often the first step in a statistical analysis, for it helps in understanding the main features of the particular sample that an analyst is using. Intelligent descriptions or summaries of the data may sometimes be sufficient to fulfill the purposes for which the data were gathered. Effective summaries can also point to "bad" data or unexpected aspects that might go unnoticed if data are blindly crunched by computers. Further, exploratory data analysis suggests possible probability models for the data and helps in understanding the population features that a good model ought to be able to reproduce.

Here we shall briefly discuss ways of summarizing three features of the distribution of a batch of data $Z = (Z_1, \ldots, Z_n)$: its center, its spread and its shape. All three concepts are deliberately kept vague. For further references see Hoaglin et al. (1983) and Mosteller and Tukey (1977). The term batch is used to emphasize the fact that at this stage no commitment to a statistical model is being made.

1.1 MEASURES OF CENTER

A very popular measure of center (or location) is the (arithmetic) mean

$$\bar{Z} = n^{-1} \sum_{i=1}^{n} Z_i = n^{-1} \iota_n^{\top} Z,$$

where $\iota_n$ denotes an $n$-vector of ones. Notice that the mean need not coincide with any of the observations in the batch.

The use of the mean is partly justified by its linearity property, that is, if $X$ and $Y$ are batches of data of equal size and $Z$ is such that $Z_i = \mu + X_i + Y_i$, then

$$\bar{Z} = \mu + \bar{X} + \bar{Y}.$$

Notice however that if $Z_i = g(X_i, Y_i)$, where $g$ is an arbitrary function, then it is generally not true that $\bar{Z} = g(\bar{X}, \bar{Y})$.

The mean is very sensitive to adding and dropping observations. In particular, it is very sensitive to even a single outlier, that is, an arbitrarily large or small data point. To see this, let $\bar{Z}_{n-1}$ denote the mean of a batch of $n-1$ observations.
The mean of the batch of $n$ observations obtained by adding the value $z$ to the initial batch is equal to

$$\bar{Z}_n = n^{-1} \Big( \sum_{i=1}^{n-1} Z_i + z \Big) = \Big( 1 - \frac{1}{n} \Big) \bar{Z}_{n-1} + \frac{1}{n}\, z,$$

that is, $\bar{Z}_n$ is a weighted average of $\bar{Z}_{n-1}$ and $z$. For any fixed $n$,

$$|\bar{Z}_n - \bar{Z}_{n-1}| = n^{-1} |z - \bar{Z}_{n-1}| \to \infty$$

as $|z| \to \infty$. Since a single outlier is enough to take $\bar{Z}_n$ arbitrarily far from $\bar{Z}_{n-1}$, we say that the mean is not a robust measure of center. On the other hand, for any fixed finite $z$,

$$\bar{Z}_n - \bar{Z}_{n-1} = n^{-1} (z - \bar{Z}_{n-1}) \to 0$$

as $n \to \infty$, and so the effect of a single outlier vanishes as the size of the batch gets arbitrarily large. The normalized difference

$$SC(z) = n (\bar{Z}_n - \bar{Z}_{n-1}) = z - \bar{Z}_{n-1},$$

viewed as a function of $z$, is called the sensitivity curve of the mean. The fact that this function is unbounded simply reflects the lack of robustness of the mean.

A closely related concept is J.W. Tukey's empirical influence function. Let $\bar{Z}_n$ denote the mean of a batch of data of size $n$, and let $\bar{Z}_{(i)}$ denote the mean of the batch of size $n-1$ obtained by deleting the $i$-th data point $Z_i$. It is easy to verify that

$$\bar{Z}_n - \bar{Z}_{(i)} = \frac{Z_i - \bar{Z}_n}{n-1}, \qquad i = 1, \ldots, n.$$

The empirical influence function of the mean is an $n$-vector with $i$-th element equal to $\bar{Z}_n - \bar{Z}_{(i)}$. An influential observation is one for which the difference $|\bar{Z}_n - \bar{Z}_{(i)}|$ is large or, equivalently, the residual $Z_i - \bar{Z}_n$ is large in absolute value.

To robustify the mean, let us sort the data in ascending order. The ordered data $Z_{(1)}, Z_{(2)}, \ldots, Z_{(n)}$, where $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$, are called the set of order statistics of the batch. A reasonable measure of center is the (symmetric) $\alpha$-trimmed mean, defined as

$$\bar{Z}_\alpha = \frac{Z_{([n\alpha]+1)} + \cdots + Z_{(n-[n\alpha])}}{n - 2[n\alpha]}, \qquad 0 \le \alpha < .5,$$

where $[n\alpha]$ denotes the greatest integer less than or equal to $n\alpha$. Thus $\bar{Z}_\alpha$ is obtained by dropping the $[n\alpha]$ largest and $[n\alpha]$ smallest data points and then taking the average of the rest. The mean is the extreme case corresponding to $\alpha = 0$.
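The trimmed mean and the empirical influence function of the mean are easy to check numerically. The following is a minimal sketch using only the Python standard library; the function names `trimmed_mean` and `empirical_influence` and the example batch are illustrative assumptions, not part of the original notes.

```python
import math


def mean(z):
    """Arithmetic mean of a batch."""
    return sum(z) / len(z)


def trimmed_mean(z, alpha):
    """Symmetric alpha-trimmed mean, 0 <= alpha < 0.5.

    Drops the [n*alpha] smallest and [n*alpha] largest points and
    averages the remaining n - 2[n*alpha] order statistics.
    """
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    n = len(z)
    k = math.floor(n * alpha)
    kept = sorted(z)[k:n - k]
    return sum(kept) / (n - 2 * k)


def empirical_influence(z):
    """Vector whose i-th element is mean(z) minus the mean with z[i] deleted."""
    n = len(z)
    total = sum(z)
    zbar = total / n
    return [zbar - (total - zi) / (n - 1) for zi in z]


# Hypothetical batch with one large outlier.
batch = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0, 100.0]
print(mean(batch))                # 17.75, pulled up by the outlier
print(trimmed_mean(batch, 0.25))  # 6.25, average of 4, 5, 7, 9
print(empirical_influence(batch))
```

With $\alpha = 0$ the function reduces to the ordinary mean, and each element of the influence vector agrees with $(Z_i - \bar{Z}_n)/(n-1)$ up to rounding.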
To compare the robustness properties of the mean and an $\alpha$-trimmed mean, we introduce the concept of breakdown point. Let $T(Z)$ be a measure of center for a batch $Z$ of size $n$, and let $T(Z^*)$ be the same measure for a new batch $Z^*$ obtained by replacing any $m$ of the original data points by arbitrary values. Let

$$b(m; T, Z) = \sup_{Z^*} |T(Z^*) - T(Z)|,$$

where the supremum is taken over all possible $Z^*$. If $b(m; T, Z)$ is infinite, this means that $m$ outliers can have an arbitrarily large effect on $T$, which may be expressed by saying that $T$ "breaks down". Therefore, the breakdown point of $T$ is defined by

$$\epsilon(T, Z) = \min \Big\{ \frac{m}{n} : b(m; T, Z) = \infty \Big\}.$$

In other words, the breakdown point is the smallest fraction of contamination that can cause $T(Z^*)$ to take on values arbitrarily far from $T(Z)$. It is straightforward to verify that the breakdown point of the mean is equal to $1/n$, whereas the breakdown point of the $\alpha$-trimmed mean is equal to $([n\alpha] + 1)/n$.

The median may be viewed as the extreme case of an $\alpha$-trimmed mean corresponding to $\alpha \to .5$. When the number $n$ of data points in $Z$ is odd, the median $\tilde{Z}$ is unique and is equal to $Z_{([n+1]/2)}$. When $n$ is even, a median is any point in the interval $[Z_{(n/2)}, Z_{([n/2]+1)}]$. This lack of uniqueness is conventionally resolved by defining

$$\tilde{Z} = \begin{cases} Z_{([n+1]/2)}, & \text{if } n \text{ is odd,} \\ .5\,[Z_{(n/2)} + Z_{([n/2]+1)}], & \text{if } n \text{ is even.} \end{cases}$$

Notice that if $n$ is odd, the median exactly coincides with one of the observations. If $n$ is even, the median is the average of two adjacent order statistics. It is easy to verify that if $g$ is any increasing function and $X$ is such that $X_i = g(Z_i)$, then $\tilde{X} = g(\tilde{Z})$ when $n$ is odd; when $n$ is even, $g(\tilde{Z})$ is still a median of $X$, although it need not equal the conventional midpoint unless $g$ is linear. The breakdown point of the median is equal to $1/2$ if $n$ is even, and is equal to $(1 + n^{-1})/2$ if $n$ is odd.

With little loss of generality, let $\tilde{Z}_{n-1}$ be the median of a batch of size $n-1$, where $n-1 = 2k$ is even. Thus, $\tilde{Z}_{n-1} = .5\,[Z_{(k)} + Z_{(k+1)}]$. The median of the batch of size $n$ obtained by adding the value $z$ to the previous batch is equal to

$$\tilde{Z}_n = \begin{cases} Z_{(k)}, & \text{if } z < Z_{(k)}, \\ z, & \text{if } Z_{(k)} \le z \le Z_{(k+1)}, \\ Z_{(k+1)}, & \text{if } z > Z_{(k+1)}. \end{cases}$$
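The breakdown points quoted above can be illustrated with a rough numerical probe. The sketch below is only a heuristic under stated assumptions: instead of the supremum over all contaminated batches, it replaces the $m$ largest points by one very large value (`big`) and reports the smallest fraction $m/n$ for which the estimate moves by more than an arbitrary `threshold`. The names `probe_breakdown`, `big` and `threshold` are hypothetical.

```python
import math
import statistics


def trimmed_mean(z, alpha):
    """Symmetric alpha-trimmed mean: drop [n*alpha] points from each tail."""
    n = len(z)
    k = math.floor(n * alpha)
    kept = sorted(z)[k:n - k]
    return sum(kept) / len(kept)


def probe_breakdown(estimator, z, big=1e12, threshold=1e6):
    """Smallest m/n such that replacing the m largest points by `big`
    shifts the estimator by more than `threshold`.

    A finite probe standing in for the supremum in the definition,
    so it only suggests (does not prove) the breakdown point.
    """
    n = len(z)
    ordered = sorted(z)
    baseline = estimator(ordered)
    for m in range(1, n + 1):
        contaminated = ordered[:n - m] + [big] * m
        if abs(estimator(contaminated) - baseline) > threshold:
            return m / n
    return None


batch = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0, 13.0]  # n = 8
print(probe_breakdown(statistics.mean, batch))                   # 0.125 = 1/n
print(probe_breakdown(lambda z: trimmed_mean(z, 0.25), batch))   # 0.375 = ([n*alpha] + 1)/n
print(probe_breakdown(statistics.median, batch))                 # 0.5
```

For an odd batch size the same probe returns $(1 + n^{-1})/2$ for the median; with $n = 7$ this is $4/7$.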
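To compare the sensitivity curves of the mean and the median, consider the case when $\bar{Z}_{n-1} = \tilde{Z}_{n-1}$. Then

$$SC(z; \bar{Z}) = z - \bar{Z}_{n-1},$$

while

$$SC(z; \tilde{Z}) = \begin{cases} n\,(Z_{(k)} - \bar{Z}_{n-1}), & \text{if } z < Z_{(k)}, \\ n\,(z - \bar{Z}_{n-1}), & \text{if } Z_{(k)} \le z \le Z_{(k+1)}, \\ n\,(Z_{(k+1)} - \bar{Z}_{n-1}), & \text{if } z > Z_{(k+1)}. \end{cases}$$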
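The two sensitivity curves can be compared numerically. The sketch below assumes the general definition $SC(z) = n\,(T_n - T_{n-1})$ used above and a hypothetical initial batch of size $n - 1 = 4$ whose mean and median both equal 4; the function name `sensitivity_curve` is illustrative.

```python
import statistics


def sensitivity_curve(estimator, base, z):
    """SC(z) = n * (T(base + [z]) - T(base)), with n = len(base) + 1."""
    n = len(base) + 1
    return n * (estimator(base + [z]) - estimator(base))


# Initial batch with n - 1 = 2k = 4, so Z_(k) = 3 and Z_(k+1) = 5;
# its mean and median coincide (both equal 4).
base = [1.0, 3.0, 5.0, 7.0]

for z in (-100.0, 3.5, 4.5, 100.0):
    sc_mean = sensitivity_curve(statistics.mean, base, z)
    sc_median = sensitivity_curve(statistics.median, base, z)
    print(f"z = {z:7.1f}   SC mean = {sc_mean:8.2f}   SC median = {sc_median:6.2f}")
```

In this example the curve of the mean is $z - 4$ and grows without bound, whereas the curve of the median stays between $n\,(Z_{(k)} - 4) = -5$ and $n\,(Z_{(k+1)} - 4) = 5$.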
Instead of choosing a single measure of center, it is often more informative to compute and compare several measures. For example, comparing the mean and the median gives an indication about the presence of skewness in the data (skewness is another vague concept!). If the data are symmetric, then the mean and the median coincide. If the data are skewed to the right, then the mean is typically greater than the median. If the data are skewed to the left, then the median is typically greater than the mean.

1.2 MEASURES OF SPREAD

Two measures of spread (or scale) based on order statistics are the range

$$\text{range} = \max_i \{Z_i\} - \min_i \{Z_i\} = Z_{(n)} - Z_{(1)}$$

and the interquartile range

$$\text{IQR} = \text{upper quartile} - \text{lower quartile},$$

where the upper quartile is the median of the data greater than or equal to the median, and the lower quartile is the median of the data smaller than or equal to the median.

Two other common measures of spread are the mean squared deviation from the mean

$$\hat{\sigma}^2 = n^{-1} \sum_{i=1}^{n} (Z_i - \bar{Z})^2,$$

or its square root $\hat{\sigma}$, called the standard deviation, and the mean absolute deviation from the mean

$$\tilde{\sigma} = n^{-1} \sum_{i=1}^{n} |Z_i - \bar{Z}|.$$

The first is just the mean of the squared deviations $(Z_i - \bar{Z})^2$, while the second is the mean of the absolute deviations $|Z_i - \bar{Z}|$. Because of their mean-like nature, neither measure is robust. It is easily seen that

$$\hat{\sigma}^2 = n^{-1} \sum_{i=1}^{n} Z_i^2 - \bar{Z}^2.$$

Further, if $X$ is such that $X_i = a + b Z_i$, $b \ne 0$, then

$$\hat{\sigma}_X = |b|\, \hat{\sigma}_Z, \qquad \tilde{\sigma}_X = |b|\, \tilde{\sigma}_Z.$$

A highly robust estimate of spread is the median absolute deviation from the median

$$\text{MAD} = \operatorname{Med}_i |Z_i - \tilde{Z}|.$$

1.3 MEASURES OF SHAPE

One measure of center and one measure of spread are often all one needs to concisely summarize the data. Just a pair of summary statistics, however, does not provide an accurate description of the data, in the sense that arbitrarily different batches of data may result in exactly the same description.

J.W. Tukey suggested the use of a box-plot, a graphical procedure that combines a measure of location (the median) and a measure of spread (the IQR), shows the presence of possible outliers, and gives some indication about the shape of the distribution of the data in terms of their symmetry or skewness. Construction of a box-plot proceeds as follows:

1. Horizontal lines are drawn at the median and at the upper and lower quartiles, and are joined by vertical lines to produce the box.
2. A vertical line is drawn up from the upper quartile to the most extreme data point that is within a distance of 1.5 IQR from the upper quartile. A similarly defined vertical line is drawn down from the lower quartile. Short horizontal lines are added to mark the ends of these vertical lines.
3. Each data point beyond the ends of the vertical lines is marked with an asterisk or a dot.

Symmetry or asymmetry is revealed by the location of the median relative to the upper and lower quartiles.

If a large batch of data is available, one can study its shape in more detail. The main tool is the empirical distribution function (edf) $F_n(z)$, defined as the fraction of data points less than or equal to $z$. Let $1\{A\}$ denote the indicator function of the event $A$, that is,

$$1\{A\} = \begin{cases} 1, & \text{if } A \text{ occurs,} \\ 0, & \text{otherwise.} \end{cases}$$

Then we can simply write

$$F_n(z) = n^{-1} \sum_{i=1}^{n} 1\{Z_i \le z\}.$$

Notice that $F_n$ is a non-decreasing step function, bounded between 0 and 1, with jumps of height $1/n$ at each distinct point $Z_i$. If a data value is repeated $m$ times, the jump is equal to $m/n$. The edf summarizes all the information contained in a batch of data, except the order in which the observations enter the batch. Notice that in some cases, such as time series, time order may be important.

There exists a simple relationship between the edf and the set of order statistics. By the definition of order statistic, and in the absence of ties, the number of data points less than or equal to $Z_{(i)}$ is equal to $i$.
Thus

$$F_n(Z_{(i)}) = \frac{i}{n},$$

and we say that $Z_{(i)}$ is the $i/n$-quantile of the empirical distribution of $Z$.

Sometimes it is useful to compare two edf's by means of Q-Q plots. In a Q-Q plot, the quantiles of one batch of data are plotted against those of another.
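The relationship between the edf and the order statistics can be verified directly. The following is a minimal sketch using the standard-library `bisect` module; the helper names `edf` and `qq_points` are illustrative, and `qq_points` assumes the simplest case of two batches of equal size, where the points of the Q-Q plot are just the paired order statistics.

```python
import bisect


def edf(batch):
    """Return the empirical distribution function F_n of the batch."""
    data = sorted(batch)
    n = len(data)

    def F(z):
        # bisect_right counts the data points less than or equal to z.
        return bisect.bisect_right(data, z) / n

    return F


def qq_points(x, z):
    """Pairs (Z_(i), X_(i)) for a Q-Q plot of two batches of equal size."""
    if len(x) != len(z):
        raise ValueError("this sketch only handles batches of equal size")
    return list(zip(sorted(z), sorted(x)))


# Hypothetical batch without ties: F_n(Z_(i)) equals i/n.
batch = [3.1, 0.5, 2.2, 4.8, 1.7]
F = edf(batch)
n = len(batch)
for i, z_i in enumerate(sorted(batch), start=1):
    print(i, z_i, F(z_i), i / n)

# With a value repeated m = 2 times, the jump at that value is m/n.
tied = edf([1, 2, 2, 3])
print(tied(2) - tied(1.5))  # 0.5
```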
To interpret a Q-Q plot, the following result is useful. If $X$ and $Z$ are batches of data such that $X_i = a + b Z_i$, $0 < b < \infty$, then $X_{(i)} = a + b Z_{(i)}$. This implies that a Q-Q plot of $X$ against $Z$ is a straight line with slope equal to $b$ and intercept equal to $a$.

Instead of working with the edf, it is sometimes convenient to work with an equivalent representation, namely the empirical survival function

$$S_n(z) = n^{-1} \sum_{i=1}^{n} 1\{Z_i > z\} = 1 - F_n(z).$$

This is just the fraction of data points greater than $z$. Clearly, $S_n(Z_{(i)}) = (n - i)/n$. The empirical survival function is often used in the case of data on time until failure or death, such as individual lifetimes or unemployment duration data.

An alternative way of displaying the shape of a batch of data is by means of a histogram. To construct a histogram, partition the range of the data into intervals or bins of a certain (possibly unequal) bin width. A histogram is then obtained by plotting the fraction of observations in each bin divided by the bin width. Thus, if the bin width is constant and equal to $\delta$, the height $h_n(z; \delta)$ of a histogram at a point $z$ is equal to the number of data points in the bin containing $z$ divided by $n\delta$. Thus, $\delta\, h_n(z; \delta)$ is just the relative frequency of data in the bin containing $z$. If there are $m$ bins of equal size, then

$$\delta = \frac{Z_{(n)} - Z_{(1)}}{m}.$$

Viewed as a function, $h_n(\cdot\,; \delta)$ is non-negative and integrates to one, that is, $h_n(\cdot\,; \delta) \ge 0$ and $\int h_n(z; \delta)\, dz = 1$.

A crucial problem in constructing a histogram is the choice of the number of bins. Too many bins (or, equivalently if the bin width is constant, too small a bin width) make a histogram look too ragged; too few bins (too large a bin width) make the histogram look oversmoothed. If the data are not uniformly distributed, it may be useful to let the bin width vary with the local density of the data. In this case, wider bins will be chosen where the data are more sparse, and narrower bins where the data are more dense.

REFERENCES

Hoaglin D.C., Mosteller F. and Tukey J.W. (1983) Understanding Robust and Exploratory Data Analysis, Wiley, New York.

Mosteller F. and Tukey J.W. (1977) Data Analysis and Regression: A Second Course in Statistics, Addison-Wesley, Reading, MA.

Tukey J.W. (1977) Exploratory Data Analysis, Addison-Wesley, Reading, MA.
More informationParametric (theoretical) probability distributions. (Wilks, Ch. 4) Discrete distributions: (e.g., yes/no; above normal, normal, below normal)
6 Parametric (theoretical) probability distributios. (Wilks, Ch. 4) Note: parametric: assume a theoretical distributio (e.g., Gauss) No-parametric: o assumptio made about the distributio Advatages of assumig
More informationConfidence Intervals
Cofidece Itervals Cofidece Itervals are a extesio of the cocept of Margi of Error which we met earlier i this course. Remember we saw: The sample proportio will differ from the populatio proportio by more
More informationSequences and Series
CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their
More informationFOUNDATIONS OF MATHEMATICS AND PRE-CALCULUS GRADE 10
FOUNDATIONS OF MATHEMATICS AND PRE-CALCULUS GRADE 10 [C] Commuicatio Measuremet A1. Solve problems that ivolve liear measuremet, usig: SI ad imperial uits of measure estimatio strategies measuremet strategies.
More informationSoving Recurrence Relations
Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree
More informationTradigms of Astundithi and Toyota
Tradig the radomess - Desigig a optimal tradig strategy uder a drifted radom walk price model Yuao Wu Math 20 Project Paper Professor Zachary Hamaker Abstract: I this paper the author iteds to explore
More informationIrreducible polynomials with consecutive zero coefficients
Irreducible polyomials with cosecutive zero coefficiets Theodoulos Garefalakis Departmet of Mathematics, Uiversity of Crete, 71409 Heraklio, Greece Abstract Let q be a prime power. We cosider the problem
More informationHow To Understand The Theory Of Coectedess
35 Chapter 1: Fudametal Cocepts Sectio 1.3: Vertex Degrees ad Coutig 36 its eighbor o P. Note that P has at least three vertices. If G x v is coected, let y = v. Otherwise, a compoet cut off from P x v
More information7. Concepts in Probability, Statistics and Stochastic Modelling
7. Cocepts i Probability, Statistics ad Stochastic Modellig 1. Itroductio 169. Probability Cocepts ad Methods 170.1. Radom Variables ad Distributios 170.. Expectatio 173.3. Quatiles, Momets ad Their Estimators
More information, a Wishart distribution with n -1 degrees of freedom and scale matrix.
UMEÅ UNIVERSITET Matematisk-statistiska istitutioe Multivariat dataaalys D MSTD79 PA TENTAMEN 004-0-9 LÖSNINGSFÖRSLAG TILL TENTAMEN I MATEMATISK STATISTIK Multivariat dataaalys D, 5 poäg.. Assume that
More informationPROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM
PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics
More informationParameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker
Parameter estimatio for oliear models: Numerical approaches to solvig the iverse problem Lecture 11 04/01/2008 Sve Zeker Review: Trasformatio of radom variables Cosider probability distributio of a radom
More informationPractice Problems for Test 3
Practice Problems for Test 3 Note: these problems oly cover CIs ad hypothesis testig You are also resposible for kowig the samplig distributio of the sample meas, ad the Cetral Limit Theorem Review all
More informationTaking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling
Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria
More informationEscola Federal de Engenharia de Itajubá
Escola Federal de Egeharia de Itajubá Departameto de Egeharia Mecâica Pós-Graduação em Egeharia Mecâica MPF04 ANÁLISE DE SINAIS E AQUISÇÃO DE DADOS SINAIS E SISTEMAS Trabalho 02 (MATLAB) Prof. Dr. José
More informationPartial Di erential Equations
Partial Di eretial Equatios Partial Di eretial Equatios Much of moder sciece, egieerig, ad mathematics is based o the study of partial di eretial equatios, where a partial di eretial equatio is a equatio
More informationOur aim is to show that under reasonable assumptions a given 2π-periodic function f can be represented as convergent series
8 Fourier Series Our aim is to show that uder reasoable assumptios a give -periodic fuctio f ca be represeted as coverget series f(x) = a + (a cos x + b si x). (8.) By defiitio, the covergece of the series
More information4. Trees. 4.1 Basics. Definition: A graph having no cycles is said to be acyclic. A forest is an acyclic graph.
4. Trees Oe of the importat classes of graphs is the trees. The importace of trees is evidet from their applicatios i various areas, especially theoretical computer sciece ad molecular evolutio. 4.1 Basics
More informationInference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval
Chapter 8 Tests of Statistical Hypotheses 8. Tests about Proportios HT - Iferece o Proportio Parameter: Populatio Proportio p (or π) (Percetage of people has o health isurace) x Statistic: Sample Proportio
More informationFactoring x n 1: cyclotomic and Aurifeuillian polynomials Paul Garrett <garrett@math.umn.edu>
(March 16, 004) Factorig x 1: cyclotomic ad Aurifeuillia polyomials Paul Garrett Polyomials of the form x 1, x 3 1, x 4 1 have at least oe systematic factorizatio x 1 = (x 1)(x 1
More information1. C. The formula for the confidence interval for a population mean is: x t, which was
s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value
More informationAnalysis Notes (only a draft, and the first one!)
Aalysis Notes (oly a draft, ad the first oe!) Ali Nesi Mathematics Departmet Istabul Bilgi Uiversity Kuştepe Şişli Istabul Turkey aesi@bilgi.edu.tr Jue 22, 2004 2 Cotets 1 Prelimiaries 9 1.1 Biary Operatio...........................
More information