1 Measures for location and dispersion of a sample




Statistical Geophysics, WS 2008/09, 7..2008
Christian Heumann and Helmut Küchenhoff

In the following: a variable $X$ and a sample of size $n$ with values $x_1, x_2, \ldots, x_n$. We consider summary measures for the location or dispersion of empirical distributions.

1.1 Measures for location or central tendency

Definition: Mode
The most frequently occurring value or category of $X$.
Example: the mode of (2, 2, 3, 3, 4, 5, 5, 5) is 5.
The mode is not always well defined: (2, 2, 2, 3, 4, 4, 4) has two modes.

Definition: Median
Order the sample $x_1, x_2, \ldots, x_n$ to get the order statistics $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$.
The median divides the ordered data values such that at least 50% of the data values are lower than or equal to the median and at least 50% are greater than or equal to the median.
Calculate the median $x_{0.5}$ as
\[
x_{0.5} =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd} \\
\frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even}
\end{cases}
\]

Definition: Arithmetic mean or average of a sample
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\]
The mean of a population is often denoted by $\mu$, with
\[
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\]
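The three location measures defined above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `modes`, `median` and `mean` are chosen here and do not come from the slides.

```python
from collections import Counter


def modes(sample):
    """Return all most frequent values; a sample may have more than one mode."""
    counts = Counter(sample)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


def median(sample):
    """Median computed from the order statistics x_(1) <= ... <= x_(n)."""
    xs = sorted(sample)
    n = len(xs)
    if n % 2 == 1:
        # x_((n+1)/2), shifted by one because Python lists are 0-based
        return xs[(n + 1) // 2 - 1]
    # average of x_(n/2) and x_(n/2+1)
    return 0.5 * (xs[n // 2 - 1] + xs[n // 2])


def mean(sample):
    """Arithmetic mean: sum of the values divided by the sample size."""
    return sum(sample) / len(sample)


print(modes([2, 2, 3, 3, 4, 5, 5, 5]))   # single mode
print(modes([2, 2, 2, 3, 4, 4, 4]))      # two modes
print(median([4, 1, 3, 2]), mean([1, 2, 3, 4]))
```

Returning a list from `modes` is one way to handle the case where the mode is not unique, as in the second example from the slides.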

Other measures for location or central tendency: geometric mean, harmonic mean, weighted mean/average, truncated or trimmed mean, Winsorized mean.

Definition: Quantile
The quantile generalizes the median. Let $\alpha \in [0, 1]$.
The $\alpha$-quantile divides the ordered data values such that at least $100\alpha\%$ of the values are lower than or equal to the $\alpha$-quantile and at least $100(1-\alpha)\%$ of the values are greater than or equal to the $\alpha$-quantile.
Calculation (usually software dependent, here the book version):
\[
x_{\alpha} =
\begin{cases}
x_{(k)} & \text{if } n\alpha \text{ is not an integer, where } k \text{ is the smallest integer} > n\alpha \\
\frac{1}{2}\left(x_{(n\alpha)} + x_{(n\alpha+1)}\right) & \text{if } n\alpha \text{ is an integer}
\end{cases}
\]

Special quantiles
$\alpha \in \{0.1, 0.2, 0.3, \ldots, 0.9\}$: deciles
$\alpha \in \{0.25, 0.75\}$: first and third quartile
$\alpha = 0$: minimum
$\alpha = 1$: maximum

1.2 Measures for dispersion

Definition: Variance of a sample
The variance of a sample is defined as
\[
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
\quad \text{or} \quad
s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
\]
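A minimal sketch of the "book version" of the quantile and of the two variance formulas follows. The function names, the `ddof` parameter and the floating-point tolerance `eps` used to decide whether $n\alpha$ is an integer are assumptions made for this illustration; other software uses other quantile conventions and will give different results.

```python
import math


def quantile(sample, alpha, eps=1e-9):
    """Alpha-quantile, book version; alpha = 0 and alpha = 1 give min and max."""
    xs = sorted(sample)
    n = len(xs)
    if alpha <= 0:
        return xs[0]
    if alpha >= 1:
        return xs[-1]
    na = n * alpha
    if abs(na - round(na)) < eps:          # n*alpha is (numerically) an integer
        k = int(round(na))
        return 0.5 * (xs[k - 1] + xs[k])   # average of x_(n*alpha) and x_(n*alpha+1)
    k = math.floor(na) + 1                 # smallest integer greater than n*alpha
    return xs[k - 1]                       # x_(k), shifted to 0-based indexing


def variance(sample, ddof=1):
    """ddof=1 gives s^2 (divide by n-1); ddof=0 gives s_n^2 (divide by n)."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / (n - ddof)


data = list(range(1, 11))                  # 1, 2, ..., 10
print(quantile(data, 0.25), quantile(data, 0.5), quantile(data, 0.75))
print(variance([2, 4, 4, 4, 5, 5, 7, 9], ddof=0))
```

The tolerance matters because a product such as `10 * 0.3` is not exactly 3 in floating-point arithmetic.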

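Dividing by $n-1$ reflects the loss of one degree of freedom, since $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
$\bar{x}$ minimizes the average of the squared deviations.
The standard deviation $s$ is then $s = \sqrt{s^2}$.
Note: $s$ is not the same as the standard error of the mean (SEM).
The variance of a population is
\[
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\quad \text{with} \quad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\]

Definition: Variance decomposition
Consider $k$ groups $(x_{11}, x_{21}, \ldots, x_{n_1,1}), \ldots, (x_{1k}, x_{2k}, \ldots, x_{n_k,k})$ with
\[
\bar{x}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ij}, \quad j = 1, \ldots, k
\]
and
\[
s_j^2 = \frac{1}{n_j} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2, \quad j = 1, \ldots, k
\]
Then
\[
s_n^2 = \frac{1}{n} \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 + \frac{1}{n} \sum_{j=1}^{k} n_j s_j^2
\]
with
\[
n = \sum_{j=1}^{k} n_j
\quad \text{and} \quad
\bar{x} = \frac{1}{n} \sum_{j=1}^{k} n_j \bar{x}_j
\]
The first term is the variance between the groups, the second the variance within the groups.

Definition: Median absolute deviation
The median absolute deviation (MAD) is
\[
\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - x_{0.5}|
\]
that is, the average absolute deviation about the median (not the median of the absolute deviations, which some texts also call MAD).
$x_{0.5}$ minimizes the average of the absolute deviations.

Definition: Range
The range is
\[
\text{Range} = x_{(n)} - x_{(1)}
\]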
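The variance decomposition and the MAD as defined above can be checked numerically. The sketch below uses made-up numbers and illustrative function names (`var_n`, `decompose`, `mad_about_median`); the group variances divide by the group size, as in the decomposition of $s_n^2$.

```python
def mean(xs):
    return sum(xs) / len(xs)


def var_n(xs):
    """Variance with divisor n (the s_n^2 form)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def decompose(groups):
    """Return (between-group part, within-group part) of s_n^2."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(len(g) * mean(g) for g in groups) / n
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / n
    within = sum(len(g) * var_n(g) for g in groups) / n
    return between, within


def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 == 1 else 0.5 * (s[n // 2 - 1] + s[n // 2])


def mad_about_median(xs):
    """Average absolute deviation about the median."""
    med = median(xs)
    return sum(abs(x - med) for x in xs) / len(xs)


groups = [[1, 2, 3], [4, 6], [10, 10, 11, 13]]   # hypothetical data
pooled = [x for g in groups for x in g]
between, within = decompose(groups)
print(between + within, var_n(pooled))           # the two numbers should agree
print(mad_about_median([1, 2, 3, 4, 100]), max(pooled) - min(pooled))
```

The sum of the two parts agrees with the variance of the pooled sample up to rounding error, which is what the decomposition states.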

Definition: Interquartile range (IQR)
The IQR is
\[
\mathrm{IQR} = x_{0.75} - x_{0.25}
\]

Definition: Coefficient of variation
\[
v = \frac{s}{\bar{x}}
\]
Assumption: $X$ has positive values.

1.3 Measure for skewness

[Figure 1: Distributions which are symmetric, negatively skewed and positively skewed]

Definition: Skewness
The skewness of a sample is
\[
g = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left( \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \right)^{3}}
\]

1.4 Boxplot

Graphical display of the five point summary of a sample.
Five point summary: minimum, first quartile, median, third quartile, maximum.
The simple version of a boxplot displays these measures.
The extended version is given on the next slide.
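As a small illustration of the skewness and the coefficient of variation, the following sketch evaluates both formulas on made-up samples. The function names are chosen for this example; the skewness uses the divisor $n$ in numerator and denominator, and the coefficient of variation uses $s$ with divisor $n-1$.

```python
import math


def skewness(xs):
    """Third central moment divided by the 3/2 power of the second central moment."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5      # undefined (division by zero) for a constant sample


def coefficient_of_variation(xs):
    """v = s / mean; only meaningful for positive-valued data."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return s / m


print(skewness([1, 2, 3, 4, 5]))         # symmetric sample
print(skewness([1, 1, 1, 2, 10]))        # long right tail: positive skew
print(skewness([-10, -2, -1, -1, -1]))   # long left tail: negative skew
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
```

The sign of the result matches the picture in the figure: a long right tail gives a positive value, a long left tail a negative one.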

Boxplot, extended version

Calculate $x_{0.25}$, $x_{0.5}$, $x_{0.75}$ and the IQR.
Draw a box bounded by $x_{0.25}$ and $x_{0.75}$, and mark $x_{0.5}$ with a line.
Any observation which lies more than 1.5 IQR below the first quartile or more than 1.5 IQR above the third quartile is considered an outlier.
Indicate the smallest value that is not an outlier by connecting it to the box with a line or "whisker"; likewise for the largest value that is not an outlier.
Indicate outliers by open and closed dots (or stars).
"Extreme" outliers, those which lie more than 3 IQR below the first quartile or above the third quartile, are indicated by a closed dot or star.
"Mild" outliers, those which lie more than 1.5 IQR from the first or third quartile but are not extreme outliers, are indicated by an open dot.

[Figure: Design of a boxplot. Labels: extreme outlier (*), (mild) outlier (o), whisker, third quartile, median, first quartile, minimal value which is no outlier]

[Figure: Magnitudes data]
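The construction rules for the extended boxplot can be sketched as a function that returns the box, the whisker ends and the two kinds of outliers. The function names, the dictionary keys and the sample data are assumptions for this illustration, and the quartiles follow the "book version" of the quantile; plotting libraries typically use other quantile conventions and may place the box slightly differently.

```python
import math


def quantile(xs_sorted, alpha, eps=1e-9):
    """Book-version quantile for 0 < alpha < 1 on an already sorted list."""
    na = len(xs_sorted) * alpha
    if abs(na - round(na)) < eps:                  # n*alpha is an integer
        k = int(round(na))
        return 0.5 * (xs_sorted[k - 1] + xs_sorted[k])
    return xs_sorted[math.floor(na)]               # x_(k), k = floor(n*alpha) + 1


def boxplot_stats(sample):
    xs = sorted(sample)
    q1, med, q3 = (quantile(xs, a) for a in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # outlier limits
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # extreme-outlier limits
    inside = [x for x in xs if inner_lo <= x <= inner_hi]
    extreme = [x for x in xs if x < outer_lo or x > outer_hi]
    mild = [x for x in xs
            if (x < inner_lo or x > inner_hi) and outer_lo <= x <= outer_hi]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0],     # smallest value that is not an outlier
        "whisker_high": inside[-1],   # largest value that is not an outlier
        "mild": mild, "extreme": extreme,
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50])
print(stats)
```

For this hypothetical sample the value 20 lies between the 1.5 IQR and 3 IQR limits and is a mild outlier, while 50 lies beyond the 3 IQR limit and is an extreme outlier.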

[Figure: Noise data]

Boxplots: some notes

Boxplots are useful to compare distributions (e.g. for different groups).
Boxplots are more useful than error bars (e.g. $\bar{x} \pm 2\,\mathrm{SEM}$).
Boxplots give a hint about the shape of the distribution (symmetric or not).
Multiple modes cannot be detected.
See http://en.wikipedia.org/wiki/box_plot for alternative forms.
Boxplots need no tuning values such as the number of bins in a histogram or the bandwidth in a kernel density estimate.