Descriptive statistics parameters: Measures of centrality



Similar documents
Lesson 4 Measures of Central Tendency

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

Diagrams and Graphs of Statistical Data

Summarizing and Displaying Categorical Data

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Data Exploration Data Visualization

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

Exploratory Data Analysis. Psychology 3256

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Descriptive Statistics and Measurement Scales

Week 1. Exploratory Data Analysis

MBA 611 STATISTICS AND QUANTITATIVE METHODS

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Descriptive Statistics

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Measures of Central Tendency and Variability: Summarizing your Data for Others

II. DISTRIBUTIONS distribution normal distribution. standard scores

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Introduction to Quantitative Methods

Chapter 2 Statistical Foundations: Descriptive Statistics

Mathematical goals. Starting points. Materials required. Time needed

Means, standard deviations and. and standard errors

Statistics. Measurement. Scales of Measurement 7/18/2012

Exploratory data analysis (Chapter 2) Fall 2011

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

MEASURES OF VARIATION

Exploratory Data Analysis

How To Write A Data Analysis

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

Describing Data: Measures of Central Tendency and Dispersion

Practice#1(chapter1,2) Name

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Visualizations. Cyclical data. Comparison. What would you like to show? Composition. Simple share of total. Relative and absolute differences matter

Quantitative Methods for Finance

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Intro to Data Analysis, Economic Statistics and Econometrics

Introduction; Descriptive & Univariate Statistics

SKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.

5/31/ Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

Lecture 2. Summarizing the Sample

CALCULATIONS & STATISTICS

THE BINOMIAL DISTRIBUTION & PROBABILITY

Intro to Statistics 8 Curriculum

Foundation of Quantitative Data Analysis

MEASURES OF CENTER AND SPREAD MEASURES OF CENTER 11/20/2014. What is a measure of center? a value at the center or middle of a data set

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Evaluating the results of a car crash study using Statistical Analysis System. Kennesaw State University

Exercise 1.12 (Pg )

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous

Geostatistics Exploratory Analysis

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Statistics Chapter 2

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Module 3: Correlation and Covariance

430 Statistics and Financial Mathematics for Business

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Data exploration with Microsoft Excel: univariate analysis

Northumberland Knowledge

List of Examples. Examples 319

The correlation coefficient

Variables. Exploratory Data Analysis

3: Summary Statistics

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Sta 309 (Statistics And Probability for Engineers)

1 Descriptive statistics: mode, mean and median

Mind on Statistics. Chapter 2

Visualization and descriptive statistics. D.A. Forsyth

DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1

CHAPTER THREE. Key Concepts

Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement

CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13

TECHNIQUES OF DATA PRESENTATION, INTERPRETATION AND ANALYSIS

Frequency Distributions

Variable: characteristic that varies from one individual to another in the population

Lecture 5 : The Poisson Distribution

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

Descriptive Statistics

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

Chapter 1: Exploring Data

Shape of Data Distributions

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Interpreting Data in Normal Distributions

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Dongfeng Li. Autumn 2010

Transcription:

Descriptive statistics parameters: Measures of centrality

Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between central tendency parameters and distribution... 6 Evaluating descriptive statistics statements... 7 References... 8 2 P a g e

Definitions Descriptive statistics = methods or indicators that allow to represent data in a readable form. Some methods allows to prepare data for graphical representation (such as histogram, bar chart, pie chart, box plot, etc.) while others allow to obtain parameters that summarize the investigated data (such as mean, standard deviation, correlation coefficient, etc.) Estimation [1] = the procedure used to determine the value of a particular parameter associated to a population. Two main types of estimators are used in medical statistics: point estimation and interval estimation. Estimator = statistical function of a sample used to estimate an unknown parameter of the population. The value obtain is an estimate of the population parameter. Measures of centrality = simple values that give us information about the distribution of data such as arithmetic mean, median, mode. 3 P a g e

Classification of descriptive statistics parameters Measures of central tendency Measure of dispersion Measures of localization Measures of shape 4 P a g e

More about central tendency estimators Arithmetic mean [2] = a measure of central tendency which allows to characterize the center of the frequency distribution of a quantitative variable that follow a normal distribution. It is calculated by summing all numbers in the sample and then dividing by the number of observations. If the distribution of data is symmetric and unimodal the arithmetic mean is equal to both median (M d ) and modal (M o ) values. If the distribution is unimodal m M d M o for data skewed to the right and m M d M o for data skewed to the left. Median (M d ) = a measure of central tendency used when data are not normal distributed. Its value is not influence by outlier since just a maximum of two values are implied in the calculation; is easy to determine because only one classification is needed; is easy to understand. It is the estimator of central value of a distribution when it is asymmetric or has outliers. The median split the dataset in the way in which 50% of observations are on each side of its value. The steps needed to be follow for calculation the median are: Arrange the n observations in increasing or decreasing order. For n = odd: M d = X (n+1)/2 where M d = median, X i = the i th observation. The median equals the value of the middle observation. Form n = even: M d = X n/2 + X (n/2+1). The median equals the arithmetic mean of the values of these two observations. Mode (M o ) = a measure of central tendency defined as the value with the highest frequency (most represented value of a set). The mode has a value that is little influenced by outliers but is a rarely used measure. Its value is strongly influenced by the fluctuations of a sampling and can strongly vary from one sample to another. A distribution can have a unique mode (called the unimodal distribution) or many modes (called the bimodal, trimodal, multimodal distribution). For a discrete variable, the mode has the advantage of being easy to determine and interpreted and it is used especially when the distribution is not symmetric. For continuous variable, the mode is a good indicator of the center of the data only if there is one dominant value in the data set. If there are many dominant values, the distribution is called plurimodal, cases in which the modes are not the measure of central tendency. A bimodal distribution for example indicates that the considered distribution is in reality heterogenous and is composed of two sub-populations with different central values. 5 P a g e

Relationship between central tendency parameters and distribution The assessment of mean, median and mode could lead to identify the type of asymmetry in the data (see Figures 1-2). 40 median 35 Absolute Frequency 30 25 20 15 10 5 0 0 20 40 60 80 100 120 140 160 180 200 220 240 Income(*100 lei) mode mean Figure 1. Left (positive) asymmetry: Mode < Median < Mean 7 Absolute frequency 6 5 4 3 2 1 0 0 20 40 60 80 100 Test score Figure 2. Right (negative) asymmetry: Mode > Median > Mean 6 P age

Evaluating descriptive statistics statements The evaluation of any statistical parameters and statements must take into consideration both the relevance and the validity of data and analysis on which the statement is based on. The easiest way to evaluate a statistical statement is to follows the Huff s criteria (5 questions) [3]: 1. Who says so? 2. How does he/she know? The estimator reflects the facts? 3. What s missing? All information needed to proper interpretation was provided? 4. Did someone change the subject? The estimator offers the right answer to the wrong problem? 5. Does it make sense? 7 P a g e

References 1 Fisher RA. On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. Lond. Ser A 1922;222:309-368. 2 Simpson T. Aletter to the RightHonorable George Earl of Macclesfield, President of theroyalsociety,on the advantage of taking the mean of a number of observations inpractical astronomy. Philos.Trans.Roy.Soc. Lond. 1755;49:82-93. 3 Huff, D., 1954, How to Lie with Statistics, W. W. Norton & Company, New York, NY. 8 P a g e