Recitation, Week 3: Basic Descriptive Statistics and Measures of Central Tendency:




1. What does Healey mean by data reduction?
   a. Data reduction involves using a few numbers to summarize the distribution of a variable (an "array of data," as he calls it).

2. What is the problem with using only a few numbers to summarize the distribution of a variable?
   a. Summarizing a distribution means describing the variable with statistics such as the mean, denoted $\bar{X}$, or the standard deviation, denoted $\sigma$. This inevitably leads to a loss of information (precision and detail).

3. When reporting descriptive statistics, it is usually clearer to describe the data in percentages rather than raw frequency counts, because comparisons are hard to conceptualize as raw frequencies.
   a. EXAMPLE: Instead of saying "20 out of 100 students got a 4.0 on the exam," say "20% of students got a 4.0 on the exam."

4. What is the difference between a percentage and a proportion? A percentage is a proportion multiplied by 100. For example, a proportion of 0.20 is a percentage of 20%.

5. What is a measure of central tendency?
   a. It is a way to summarize a distribution that gives you an idea of the typical case of that distribution, in other words, its center.
   b. There are three measures of central tendency:
      i. The mean: describes the typical score.
      ii. The mode: describes the most frequently occurring score.
         1. It is the only measure of central tendency appropriate for nominal variables.
      iii. The median: the 50th percentile of the distribution.
         1. The median is a special case of a percentile. A percentile is the score below which a specific percentage of cases fall.
   c. How does the median differ from the mode and the mean? Unlike the mode or the mean, the median always represents the exact center of a distribution of scores, meaning that 50% of the cases fall above the median and 50% of the cases fall below it.
   d. Characteristics of the mean
      i. The mean is the center of a distribution in the sense of a balance point: it is the point around which all of the scores cancel out. Mathematically, if you subtract the mean from each value and sum the results, the sum equals 0: $\sum_i (X_i - \bar{X}) = 0$.
      ii. The mean can be misleading because it is sensitive to every observation, including extreme ones. The median is less sensitive to extreme observations, so it is often the better statistic to report.
         1. To illustrate this, consider the familiar normal or bell curve. This is a symmetric distribution because there are as many values on the left of the center as there are on the right. Many natural phenomena, such as weight and height, are approximately normally distributed.
         2. There are important distributions that are not symmetric. When a distribution is not symmetric, it is skewed. There are two types of skewed distributions: right skewed and left skewed.
         3. EXAMPLE of RIGHT SKEWED: Income. For right-skewed variables it is often better to report the median than the mean, because a few extreme values pull the mean upward.
            a. EXAMPLE. Consider the following summary of AGE. Notice that the arithmetic mean is somewhat greater than the median. The reason is that the distribution is right skewed. In general, a mean that is larger than the median suggests a right-skewed distribution.

Statistics: AGE OF RESPONDENT
N Valid      1385
N Missing       2
Mean        44.94
Median      41.00
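The properties of the mean, median, and mode described in item 5 can be checked with a short Python sketch. The ages below are hypothetical values invented for illustration (they are not taken from the GSS data), and only the standard-library `statistics` module is used.

```python
from statistics import mean, median, mode

# Hypothetical ages, invented for illustration (not from the GSS file).
ages = [22, 25, 25, 30, 31, 38, 46]

avg = mean(ages)          # 31
mid = median(ages)        # 30
most_common = mode(ages)  # 25

# Deviations from the mean cancel out, so they sum to zero.
deviation_sum = sum(x - avg for x in ages)

# Adding one extreme value moves the mean far more than the median.
ages_with_outlier = ages + [89]
avg_out = mean(ages_with_outlier)    # 38.25
mid_out = median(ages_with_outlier)  # 30.5

print("mean:", avg, "median:", mid, "mode:", most_common)
print("sum of deviations:", deviation_sum)
print("with outlier -> mean:", avg_out, "median:", mid_out)
```

In this made-up sample, a single respondent aged 89 raises the mean by more than 7 years but the median by only half a year, which is the sense in which the median is less sensitive to extreme observations.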

To see the skew, create a histogram of the age variable.

[Histogram of AGE OF RESPONDENT, with ages from 20 to 90 on the horizontal axis. Std. Dev = 17.08, Mean = 44.9, N = 1385]

6. What is a measure of dispersion?
   a. Measures of central tendency don't tell you anything about how much the data values differ from each other.
      i. EXAMPLE: What is the mean of the following two distributions of AGE?
         1. 50 50 50 50 50
         2. 10 20 50 80 90
      ii. Both have a mean of 50, yet the distributions are obviously very different.
   b. Measures of dispersion or variability attempt to quantify the spread of observations.
   c. Dispersion is usually defined in terms of variability around the mean.
   d. A deviation is the distance between an individual score and the mean; mathematically this is $(X_i - \bar{X})$.
   e. The farther a score lies from the mean, the larger its deviation will be.
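The two AGE distributions in item 6a make a convenient worked example. The sketch below is an illustration rather than part of the handout: the helper name `describe` is hypothetical, and `statistics.stdev` computes the sample standard deviation (dividing by n − 1).

```python
from statistics import mean, stdev

def describe(scores):
    """Return the mean, each score's deviation from it, the range, and the sample SD."""
    center = mean(scores)
    return {
        "mean": center,
        "deviations": [x - center for x in scores],  # X_i minus the mean
        "range": max(scores) - min(scores),
        "std_dev": stdev(scores),                    # sample formula (n - 1)
    }

group_1 = [50, 50, 50, 50, 50]
group_2 = [10, 20, 50, 80, 90]

for name, scores in (("group 1", group_1), ("group 2", group_2)):
    summary = describe(scores)
    print(name, summary)
```

Both groups have a mean of 50, but the first has a standard deviation of 0 and the second a standard deviation of about 35.4, so the measure of dispersion captures exactly the difference the mean hides.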

   f. The more tightly the scores cluster around the mean, the less variability there is.
      i. PRACTICAL EXAMPLE: Let's assume that the average income for people with PhDs is $55,000 and the average income for people with a high school education is $20,000. Because people with only a HS education have fewer opportunities than those with PhDs, most of them make somewhere around $20K, so there is not much variation. PhDs, however, can make anywhere from $20K to $800K per year, so there is much more variation around the average salary for PhDs than there is for HS graduates.

7. USING SPSS to Produce Measures of Dispersion
   a. Use Descriptives to find the range and standard deviation for age, educ and tvhours (the range is the maximum minus the minimum).
   b. To reproduce this output, first open gss98randsamp.sav.
   c. Go to the familiar Analyze > Descriptive Statistics > Frequencies.
   d. Put the variables corresponding to age, educ and tvhours in the box labeled Variable(s).
   e. Click the Statistics button and check mean, minimum, maximum and standard deviation.

   f. Click Continue, then OK.
   g. Notice that the chart in the book looks a little different, so let's transpose the rows and columns to make it look like Healey's.
   h. Double-click the table in the output window.
   i. Then from the menu select Pivot > Transpose Rows and Columns. You should get what is shown below.

Statistics
Variable                           N Valid  N Missing   Mean  Std. Deviation  Minimum  Maximum
AGE OF RESPONDENT                     1385          2  44.94          17.080       18       89
HIGHEST YEAR OF SCHOOL COMPLETED      1381          6  13.37           2.857        0       20
HOURS PER DAY WATCHING TV             1134        253   2.86           2.197        0       21

   j. How do we interpret these results? What is 1 standard deviation above the mean for the variable tvhours?

8. USING THE COMPUTE COMMAND to create an Attitude towards abortion scale
   a. There are two distinct measures of attitudes toward abortion in the 1998 GSS survey. One variable, abany, asks the respondent to state whether they believe that abortion should be allowed for any reason. The other, abhlth, measures whether they feel abortion should be allowed only to preserve the health of the woman.
   b. We want to create a summary measure that gives us an overall measure of attitude toward abortion.
   c. We must know something about the data. If the response on abany is 1, then the person was in favor of abortion for any reason. If the value in the dataset is 2, then the person was opposed. The coding for abhlth is similar: 1 = in favor of abortion if health is at stake, 2 = not in favor.
   d. We want an overall measure of anti-abortion position, so we will sum the variables. If the person was in favor of both, then our new variable will have a value of 2 (1+1). In favor of one and not the other gives a value of 3 (2+1 or 1+2). If a person is completely against abortion, the value will be 4 (2+2).
   e. We need to use the Compute command. To open the Compute Variable dialog box, from the menus choose: Transform > Compute
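The summing logic in item 8d can be sketched in Python as follows. This is an illustration, not SPSS syntax: the function name `abortion_scale` and the sample respondents are hypothetical, and representing a missing answer as `None` (with no scale score for such a case) is an assumption made here for the sketch.

```python
from typing import Optional

def abortion_scale(abany: Optional[int], abhlth: Optional[int]) -> Optional[int]:
    """Sum two items coded 1 = in favor, 2 = opposed; None marks a missing answer."""
    if abany is None or abhlth is None:
        # Assumption: a respondent missing either item gets no scale score.
        return None
    return abany + abhlth

# Hypothetical respondents as (abany, abhlth) pairs.
respondents = [(1, 1), (1, 2), (2, 1), (2, 2), (None, 1)]
scores = [abortion_scale(a, h) for a, h in respondents]
print(scores)  # [2, 3, 3, 4, None]
```

Note that the two mixed patterns, (1, 2) and (2, 1), both produce a score of 3, so the sum alone does not record which of the two items a respondent favored.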

   f. In the Compute Variable dialog box, type abscale, which is the name of the variable we are creating.
   g. Click the Type and Label button, type Abortion Scale as the label, then click Continue.

   h. Select (or type) abany in the variable list and move it into the Numeric Expression box. Then type +, then abhlth, and click OK.
   i. Get the frequency distribution of each variable.

Statistics
Variable                                N Valid  N Missing    Mean  Std. Deviation  Minimum  Maximum
ABORTION IF WOMAN WANTS FOR ANY REASON      887        500    1.58            .494        1        2
WOMANS HEALTH SERIOUSLY ENDANGERED          895        492    1.12            .321        1        2
Abortion Scale                              855        532  2.6865          .67404     2.00     4.00

ABORTION IF WOMAN WANTS FOR ANY REASON
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid    YES           372     26.8           41.9                41.9
         NO            515     37.1           58.1               100.0
         Total         887     64.0          100.0
Missing  NAP           449     32.4
         DK             49      3.5
         NA              2       .1
         Total         500     36.0
Total                 1387    100.0

WOMANS HEALTH SERIOUSLY ENDANGERED
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid    YES           791     57.0           88.4                88.4
         NO            104      7.5           11.6               100.0
         Total         895     64.5          100.0
Missing  NAP           449     32.4
         DK             42      3.0
         NA              1       .1
         Total         492     35.5
Total                 1387    100.0

Abortion Scale
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid    2.00          370     26.7           43.3                43.3
         3.00          383     27.6           44.8                88.1
         4.00          102      7.4           11.9               100.0
         Total         855     61.6          100.0
Missing  System        532     38.4
Total                 1387    100.0

Note: for category 3.00, we know these are the cases where the person approved in one situation but not in the other, but we do not know which situation they approved. It seems reasonable that they approved when the life of the mother was at stake but not for any reason, but we would have to use other procedures to find that out.

SPSS companion exercises:
   2.5 (but choose 1 variable from world.sav, recode it, get frequency distributions for the variable, and summarize the results)
   3.4
   4.4
   4.6
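As a check on how the columns of a frequency table relate to each other, the Abortion Scale table above can be re-derived from its frequencies alone. The sketch below copies the three valid frequencies and the total sample size (1387) from the output; the variable names are chosen here for illustration.

```python
# Frequencies copied from the Abortion Scale table; 1387 is the full sample size.
counts = {2: 370, 3: 383, 4: 102}
total_cases = 1387

valid_n = sum(counts.values())   # cases with a scale score
missing = total_cases - valid_n  # cases without one

rows = []
running = 0
for score, freq in counts.items():
    running += freq
    percent = round(100 * freq / total_cases, 1)            # share of all cases
    valid_percent = round(100 * freq / valid_n, 1)          # share of valid cases
    cumulative_percent = round(100 * running / valid_n, 1)  # running valid share
    rows.append((score, freq, percent, valid_percent, cumulative_percent))

# Mean of the scale, weighting each score by its frequency.
scale_mean = sum(score * freq for score, freq in counts.items()) / valid_n

for row in rows:
    print(row)
print("valid N:", valid_n, "missing:", missing, "mean:", round(scale_mean, 4))
```

The rounded results agree with the Percent, Valid Percent, and Cumulative Percent columns and with the reported mean of 2.6865. Percent divides by all 1387 cases, while Valid Percent divides only by the 855 cases with a scale score.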