Chapter 3 3-1. Chapter Goals. Summary Measures. Chapter Topics. Measures of Center and Location. Notation Conventions

Chapter 3
Numerical Descriptive Measures

Chapter Goals
After completing this chapter, you should be able to:
- Compute and interpret the mean, median, and mode for a set of data
- Find the range, variance, and standard deviation and know what these values mean
- Construct and interpret a box and whiskers plot
- Compute and explain the coefficient of variation
- Use numerical measures along with graphs, charts, and tables to describe data

adell GSBA 50 Chap 3-1

Chapter Topics
- Measures of Center and Location: mean, median, mode, geometric mean, midrange
- Other Measures of Location: weighted mean, percentiles, quartiles
- Measures of Variation: range, interquartile range, variance and standard deviation, coefficient of variation
- Skewness (shape)
- Linear correlation

Summary Measures
Describing Data Numerically:
- Center and Location: Mean, Median, Mode, Weighted Mean
- Other Measures of Location: Percentiles, Quartiles
- Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
- Skewness

Notation Conventions
Population parameters are denoted with a letter from the Greek alphabet:
- $\mu$ (mu) represents the population mean
- $\sigma$ (sigma) represents the population standard deviation
Sample statistics are commonly denoted with letters from the Roman alphabet:
- $\bar{X}$ (X-bar) represents the sample mean
- $S$ represents the sample standard deviation

Measures of Center and Location: Overview
- Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ (population)
- Median: midpoint of ranked values
- Mode: most frequently observed value
- Weighted Mean: $\bar{X}_W = \frac{\sum w_i X_i}{\sum w_i}$ (sample), $\mu_W = \frac{\sum w_i X_i}{\sum w_i}$ (population)

Mean (Arithmetic Average)
The mean is the arithmetic average of the data values.
- Sample mean, where $n$ is the sample size: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
- Population mean, where $N$ is the population size: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$

Mean (Arithmetic Average) (continued)
- The most common measure of central tendency
- Mean = sum of values divided by the number of values
- Affected by extreme values (outliers)
Example (number line 0 to 10), values 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Example (number line 0 to 10), values 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4

Median
- Not affected by extreme values: in both number-line examples (0 to 10), Median = 3
- In an ordered array, the median is the "middle" number (50% above, 50% below)
- If n or N is odd, the median is the middle number
- If n or N is even, the median is the average of the two middle numbers

Median (continued)
- To find the median, rank the values in order of magnitude
- Find the value in the (n + 1)/2 position
- If n is an even number, let the median be the mean of the two middle-most observations

Mode
- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes
Examples: number line 0 to 14, Mode = 5; number line 0 to 6, No Mode

Weighted Mean
Used when values are grouped by frequency or relative importance.
Example: Sample of 26 Repair Projects

Days to Complete   Frequency
       5               4
       6              12
       7               8
       8               2

Weighted Mean Days to Complete:
$\bar{X}_W = \frac{\sum w_i X_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31$ days
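The four measures of center above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the "no mode" convention used here (no value repeats) is an assumption. The sample inputs are the number-line values 1, 2, 3, 4, 5 and 1, 2, 3, 4, 10 and the repair-project frequencies 4, 12, 8, 2.

```python
from collections import Counter


def mean(values):
    """Arithmetic average: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ranked data; mean of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def modes(values):
    """All most-frequent values; an empty list when no value repeats (assumed 'no mode')."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# The outlier 10 pulls the mean up but leaves the median where it was.
print(mean([1, 2, 3, 4, 5]), mean([1, 2, 3, 4, 10]))
print(median([1, 2, 3, 4, 5]), median([1, 2, 3, 4, 10]))
# Repair-project example: days to complete weighted by frequency.
print(round(weighted_mean([5, 6, 7, 8], [4, 12, 8, 2]), 2))
```

Returning a list from `modes` is one way to cover the "no mode" and "several modes" cases with a single return type.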

Review Example
Five houses on a hill by the beach.
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000

Summary Statistics
- Sum: $3,000,000
- Mean: $3,000,000 / 5 = $600,000
- Median: middle value of ranked data = $300,000
- Mode: most frequent value = $100,000

Which measure of location is the "best"?
- The mean is generally used, unless extreme values (outliers) exist
- When outliers exist, the median is often used, since the median is not sensitive to extreme values
- Example: median home prices may be reported for a region, because they are less sensitive to outliers

Note
The mean and median values do not have to be values that are part of the data set.
Example (four observations): 1, 3, 4, 12
- Mean = (1 + 3 + 4 + 12) / 4 = 5
- Median = 3.5 (the median position is (n + 1)/2 = 2.5, so use the midpoint of the two middle-most values)

Shape of a Distribution
Describes how data is distributed: symmetric or skewed.
- Left-Skewed: Mean < Median < Mode (longer tail extends to left)
- Symmetric: Mean = Median = Mode
- Right-Skewed: Mode < Median < Mean (longer tail extends to right)

Measuring Skewness
A number called the coefficient of skewness (SK) is commonly used to measure skewness:
$SK = \frac{3(\bar{X} - \text{Median})}{S}$
where $S$ is the sample standard deviation, $\bar{X}$ is the sample mean, and Median is the sample median.
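As an illustration, the review example and the coefficient of skewness can be combined in a short Python sketch using the standard `statistics` module. Applying SK to the five house prices is an assumption made here for demonstration (the slides do not compute it), and `pearson_skewness` is a hypothetical helper name.

```python
import statistics

# House prices from the review example, in dollars.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]


def pearson_skewness(values):
    """SK = 3 * (sample mean - sample median) / sample standard deviation."""
    x_bar = statistics.mean(values)
    med = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return 3 * (x_bar - med) / s


print("mean  :", statistics.mean(prices))
print("median:", statistics.median(prices))
print("mode  :", statistics.mode(prices))
print("SK    :", pearson_skewness(prices))
```

For these prices the mean sits well above the median, so SK comes out positive, which matches the right-skewed ordering Mode < Median < Mean.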

Measuring Skewness (continued)
The magnitude of SK indicates the degree of skewness, where $-3 \le SK \le +3$:
- SK < 0: skewed left
- SK = 0: symmetric (not skewed)
- SK > 0: skewed right
The coefficient of skewness is calculated and reported by many statistical software packages.

Other Measures of Location
Percentiles. The p-th percentile in a data array:
- p% of the values are less than or equal to this value
- (100 - p)% of the values are greater than or equal to this value (where $0 \le p \le 100$)
Quartiles:
- 1st quartile = 25th percentile
- 2nd quartile = 50th percentile = median
- 3rd quartile = 75th percentile

Percentiles
The p-th percentile in an ordered array of n values is the value in the i-th position, where
$i = \frac{p}{100}(n + 1)$
Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
$i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12$

Quartiles
Quartiles split the ranked data into 4 equal groups (25% each), separated by Q1, Q2, and Q3.
Example: Find the first quartile.
Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position. Use the value halfway between the 2nd and 3rd values, so Q1 = 12.5.

Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
- First quartile position: Q1 = (n + 1)/4
- Second quartile position: Q2 = (n + 1)/2 (the median position)
- Third quartile position: Q3 = 3(n + 1)/4
and n is the number of observed values.

Box and Whisker Plot
A graphical display of data using the 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example: the plot marks the minimum, 1st quartile, median, 3rd quartile, and maximum, with 25% of the data in each of the four segments.
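The position rule i = (p/100)(n + 1) translates directly into code. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, fractional positions are resolved by linear interpolation (the slides only show the halfway case), positions outside 1..n are clamped to the nearest end, and the nine-value array is taken to be 11, 12, 13, 16, 16, 17, 18, 21, 22.

```python
def percentile_position(p, n):
    """Position (1-based) of the p-th percentile in an ordered array of n values."""
    return p * (n + 1) / 100


def percentile(values, p):
    """Value at the percentile position, interpolating between neighbours.

    Assumptions: linear interpolation for fractional positions, and
    positions below 1 or above n are clamped to the first or last value.
    """
    ranked = sorted(values)
    n = len(ranked)
    pos = min(max(percentile_position(p, n), 1), n)
    lower = int(pos)       # 1-based index of the value at or below the position
    frac = pos - lower     # how far to move toward the next value
    if frac == 0:
        return ranked[lower - 1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

print(percentile_position(60, 19))   # position of the 60th percentile among 19 values
print(percentile_position(25, 9))    # position of Q1 among 9 values
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

Other software may place percentiles differently (several interpolation conventions exist), so results from this rule need not match every package.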

Shape of Box and Whisker Plots
- The box and central line are centered between the endpoints if data is symmetric around the median
- A Box and Whisker plot can be shown in either vertical or horizontal format

Distribution Shape and Box and Whisker Plot
(Figure: three box plots marked Q1, Q2, Q3, one each for Left-Skewed, Symmetric, and Right-Skewed distributions.)

Box-and-Whisker Plot Example
Below is a Box-and-Whisker plot for the following data:
0 2 2 2 3 3 4 5 5 10 27
(Figure: box plot with Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27.)
This data is very right skewed, as the plot depicts.

Using PHStat to construct a Box-and-Whisker Plot
The PHStat add-in can be used to easily create a Box-and-Whisker Plot. If you have several data sets and wish to make comparisons, PHStat can create multiple plots in the same display window.

Measures of Variation
- Range
- Interquartile Range
- Variance: Population Variance, Sample Variance
- Standard Deviation: Population Standard Deviation, Sample Standard Deviation
- Coefficient of Variation

Variation
Measures of variation give information on the spread or variability of the data values.
(Figure: two distributions with the same center and different variation.)
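The five numbers behind a box-and-whisker plot can be computed with Python's standard library. This is a sketch under assumptions: `five_number_summary` is a hypothetical helper, the data set is taken to be 0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27, and `statistics.quantiles` with `method="exclusive"` is used because it places quartiles at the (n + 1)/4, (n + 1)/2, and 3(n + 1)/4 positions.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a box-and-whisker plot."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)


data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]
minimum, q1, med, q3, maximum = five_number_summary(data)

print(minimum, q1, med, q3, maximum)
# A long upper whisker relative to the lower one suggests right skew.
print("lower whisker:", q1 - minimum, "upper whisker:", maximum - q3)
```

Comparing the two whisker lengths is only a rough reading of shape, in the same spirit as judging skewness from the plot by eye.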

Range
- Simplest measure of variation
- Difference between the largest and the smallest observations:
Range = $x_{\text{maximum}} - x_{\text{minimum}}$
Example (number line 0 to 14): Range = 14 - 1 = 13

Disadvantages of the Range
- Ignores the way in which data are distributed
  Two differently distributed data sets on the number line 7 8 9 10 11 12 both have Range = 12 - 7 = 5
- Sensitive to outliers
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119

Interquartile Range
- Can eliminate some outlier problems by using the interquartile range
- Eliminate some high- and low-valued observations and calculate the range from the remaining values
- Interquartile range = 3rd quartile - 1st quartile
Example: X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70, with 25% of the data in each segment.
Interquartile range = 57 - 30 = 27

Variance
Average of squared deviations of values from the mean.
- Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Standard Deviation
- Most commonly used measure of variation
- Shows variation about the mean
- Has the same units as the original data
- Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
- Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
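A short Python sketch can show how the range reacts to a single outlier while the interquartile range does not, and how the two variance formulas differ only in their divisor. The helper names are hypothetical, the exact counts of 1s and 2s in the 25-value data set (eleven and eight) are an assumption, and quartiles are taken with `statistics.quantiles(..., method="exclusive")`, which uses the (n + 1)/4 position rule.

```python
import statistics


def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def interquartile_range(values):
    """Third quartile minus first quartile, using (n + 1)-based positions."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


base = [1] * 11 + [2] * 8 + [3] * 4 + [4]   # 24 values shared by both data sets
typical = base + [5]
with_outlier = base + [120]

print(data_range(typical), data_range(with_outlier))
print(interquartile_range(typical), interquartile_range(with_outlier))
print(sample_variance([1, 2, 3, 4, 5]), population_variance([1, 2, 3, 4, 5]))
```

Replacing the largest value changes only the maximum, so the quartile positions, and therefore the interquartile range, are untouched.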

Calculation Example: Sample Standard Deviation
Sample Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} = 4.3095$
(The squared deviations are 36, 16, 4, 1, 1, 4, 4, and 64, which sum to 130.)

Measuring variation
(Figure: a narrow distribution labeled "Small standard deviation" and a wide distribution labeled "Large standard deviation".)

Comparing Standard Deviations
(Figure: three data sets plotted on a number line from 11 to 21.)
- Data A: Mean = 15.5, s = 3.338
- Data B: Mean = 15.5, s = 0.9258
- Data C: Mean = 15.5, s = 4.57

Advantages of Variance and Standard Deviation
- Each value in the data set is used in the calculation
- Values far from the mean are given extra weight (because deviations from the mean are squared)

The Empirical Rule
If the data distribution is bell-shaped, then the interval:
- $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
- $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
- $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
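The sample standard deviation for the eight values 10, 12, 14, 15, 17, 18, 18, 24 can be checked step by step in Python; summing the squared deviations from the mean of 16 gives 130, so S = sqrt(130 / 7). The sketch also checks the empirical-rule percentages against an exact normal distribution using `statistics.NormalDist`. The helper name `normal_coverage` is hypothetical, and treating "bell-shaped" as exactly normal is an assumption.

```python
import math
from statistics import NormalDist, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n                                  # sample mean
squared_deviations = [(x - x_bar) ** 2 for x in data]  # (X_i - X-bar)^2 for each value
s = math.sqrt(sum(squared_deviations) / (n - 1))       # divide by n - 1 for a sample

print("mean:", x_bar)
print("sum of squared deviations:", sum(squared_deviations))
print("s:", round(s, 4), "library stdev:", round(stdev(data), 4))


def normal_coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sigma: {normal_coverage(k):.4f}")
```

Because the coverage depends only on k, the same percentages apply to any normal distribution regardless of its mean and standard deviation.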

Coefficient of Variation
- Measures relative variation
- Always in percentage (%)
- Shows variation relative to mean
- Is used to compare two or more sets of data measured in different units
- Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
- Sample: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$

Comparing Coefficient of Variation
Stock A:
- Average price last year = $50
- Standard deviation = $5
- $CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
Stock B:
- Average price last year = $100
- Standard deviation = $5
- $CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.

Using Microsoft Excel
Descriptive statistics are easy to obtain from Microsoft Excel.
- Use menu choice: Tools / Data Analysis / Descriptive Statistics
- Enter details in the dialog box

Using Excel (continued)
- Enter dialog box details
- Check box for summary statistics
- Click OK

Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
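The stock comparison can be reproduced with a one-line function. This is a minimal sketch: `coefficient_of_variation` is a hypothetical name, and rejecting a zero mean is a choice made here, since CV is undefined in that case.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A: average price $50, S = $5
cv_b = coefficient_of_variation(5, 100)   # Stock B: average price $100, S = $5

print(f"CV_A = {cv_a}%  CV_B = {cv_b}%")
print("less variable relative to price:", "B" if cv_b < cv_a else "A")
```

Because the dollar units cancel, the same function can compare series measured in different units.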

Scatter Plots and Correlation
- A scatter plot (or scatter diagram) is used to show the relationship between two variables
- Correlation analysis is used to measure the strength of the linear association (linear relationship) between two variables
  - Only concerned with the strength of the relationship
  - No causal effect is implied

Scatter Plot Examples
(Figure: scatter plots of Y against X showing strong relationships, weak relationships, and no relationship.)

Correlation Coefficient
- The population correlation coefficient $\rho$ (rho) measures the strength of the association between the variables
- The sample correlation coefficient $r$ is an estimate of $\rho$ and is used to measure the strength of the linear relationship in the sample observations

Features of $\rho$ and $r$
- Unit free
- Range between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship

Examples of Approximate r Values
(Figure: five scatter plots of Y against X with r = -1, r = -.6, r = 0, r = +.3, and r = +1.)

Calculating the Correlation Coefficient
Sample correlation coefficient:
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}}$
or the algebraic equivalent:
$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n(\sum X^2) - (\sum X)^2\right]\left[n(\sum Y^2) - (\sum Y)^2\right]}}$
where:
- r = Sample correlation coefficient
- n = Sample size
- X = Value of the independent variable
- Y = Value of the dependent variable

Calculation Example

Tree Height (Y)   Trunk Diameter (X)      XY      Y^2     X^2
      35                  8               280     1225      64
      49                  9               441     2401      81
      27                  7               189      729      49
      33                  6               198     1089      36
      60                 13               780     3600     169
      21                  7               147      441      49
      45                 11               495     2025     121
      51                 12               612     2601     144
   Sum = 321          Sum = 73         Sum = 3142  Sum = 14111  Sum = 713

Calculation Example (continued)
$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n(\sum X^2) - (\sum X)^2\right]\left[n(\sum Y^2) - (\sum Y)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$
(Figure: scatter plot of Tree Height, Y (0 to 70) against Trunk Diameter, X (0 to 14).)
r = 0.886: relatively strong positive linear association between X and Y.

Excel Output
Excel Correlation Output (Tools / Data Analysis / Correlation):

                  Tree Height   Trunk Diameter
Tree Height       1
Trunk Diameter    0.886231      1

The value 0.886231 is the correlation between Tree Height and Trunk Diameter.

Chapter Summary
- Described measures of center and location: mean, median, mode
- Discussed percentiles and quartiles
- Described measures of variation: range, interquartile range, variance, standard deviation, coefficient of variation
- Created Box-and-Whisker plots
- Illustrated distribution shapes (symmetric, skewed)
- Discussed linear correlation
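Both forms of the sample correlation coefficient can be checked against the tree data in Python. This is a sketch with hypothetical function names, using tree heights 35, 49, 27, 33, 60, 21, 45, 51 as Y and trunk diameters 8, 9, 7, 6, 13, 7, 11, 12 as X; the two functions should agree up to floating-point rounding.

```python
import math

heights = [35, 49, 27, 33, 60, 21, 45, 51]   # Y: tree height
diameters = [8, 9, 7, 6, 13, 7, 11, 12]      # X: trunk diameter


def correlation_from_deviations(x, y):
    """r from deviations about the means (the definitional form)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_from_sums(x, y):
    """r from raw sums (the algebraically equivalent computational form)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(round(correlation_from_deviations(diameters, heights), 6))
print(round(correlation_from_sums(diameters, heights), 6))
```

The computational form works on integer sums until the final square root and division, which makes it convenient for checking the column totals in the table by hand.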