The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)



Similar documents
Summarizing and Displaying Categorical Data

Exploratory data analysis (Chapter 2) Fall 2011

Diagrams and Graphs of Statistical Data

Using SPSS, Chapter 2: Descriptive Statistics

Exercise 1.12 (Pg )

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

MBA 611 STATISTICS AND QUANTITATIVE METHODS

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers

STAB22 section 1.1. total = 88(200/100) + 85(200/100) + 77(300/100) + 90(200/100) + 80(100/100) = = 837,

Lecture 1: Review and Exploratory Data Analysis (EDA)

Variables. Exploratory Data Analysis

Lesson 4 Measures of Central Tendency

Modifying Colors and Symbols in ArcMap

MTH 140 Statistics Videos

Describing, Exploring, and Comparing Data

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

+ Chapter 1 Exploring Data

Demographics of Atlanta, Georgia:

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Data Exploration Data Visualization

AP * Statistics Review. Descriptive Statistics

Exploratory Data Analysis

Mind on Statistics. Chapter 2

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

AMS 7L LAB #2 Spring, Exploratory Data Analysis

Chapter 2 Data Exploration

Week 1. Exploratory Data Analysis

Chapter 4 Displaying and Describing Categorical Data

Section 1.1 Exercises (Solutions)

Chapter 1: Exploring Data

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

PURPOSE OF GRAPHS YOU ARE ABOUT TO BUILD. To explore for a relationship between the categories of two discrete variables

LESSON 4 - ADVANCED GRAPHING

Data exploration with Microsoft Excel: univariate analysis

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Data exploration with Microsoft Excel: analysing more than one variable

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chapter 2: Frequency Distributions and Graphs

Sta 309 (Statistics And Probability for Engineers)

2 Describing, Exploring, and

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Descriptive Statistics and Measurement Scales

Module 4: Data Exploration

COMMON CORE STATE STANDARDS FOR

Lecture 19: Chapter 8, Section 1 Sampling Distributions: Proportions

Descriptive Statistics

Statistics 2014 Scoring Guidelines

Visualizations. Cyclical data. Comparison. What would you like to show? Composition. Simple share of total. Relative and absolute differences matter

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Enrollment Data Undergraduate Programs by Race/ethnicity and Gender (Fall 2008) Summary Data Undergraduate Programs by Race/ethnicity

Directions for Frequency Tables, Histograms, and Frequency Bar Charts

Module 2: Introduction to Quantitative Data Analysis

Exploratory Data Analysis. Psychology 3256

A Picture Really Is Worth a Thousand Words

Name: Date: Use the following to answer questions 2-3:

Descriptive Statistics and Exploratory Data Analysis

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

Descriptive statistics parameters: Measures of centrality

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Practice#1(chapter1,2) Name

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

IBM SPSS Statistics for Beginners for Windows

Cell Phone Impairment?

Characteristics of Binomial Distributions

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

a. mean b. interquartile range c. range d. median

Lecture 2. Summarizing the Sample

SPSS Manual for Introductory Applied Statistics: A Variable Approach

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

Northumberland Knowledge

Analyzing Research Data Using Excel

What Does the Normal Distribution Sound Like?

IBM SPSS Direct Marketing 23

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

What is the purpose of this document? What is in the document? How do I send Feedback?

Fairfield Public Schools

SECTION 2-1: OVERVIEW SECTION 2-2: FREQUENCY DISTRIBUTIONS

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

MetroBoston DataCommon Training

IBM SPSS Direct Marketing 22

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Visualization Handbook

consider the number of math classes taken by math 150 students. how can we represent the results in one number?

with functions, expressions and equations which follow in units 3 and 4.

Frequency Distributions

Chapter 2: Descriptive Statistics

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

The Normal Distribution

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data

STAT355 - Probability & Statistics

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

2: Frequency Distributions

Opgaven Onderzoeksmethoden, Onderdeel Statistiek

Transcription:

Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data, we need ways to summarize and visualize it. Community Coalitions (n = 175) Summarizing and visualizing variables and relationships between two variables is often known as exploratory data analysis (also known as descriptive statistics). The type of summary statistics and visualization methods to use depends on the type of variables being analyzed (i.e., categorical or quantitative). One Categorical Variable What is your race/ethnicity? White Black Asian Other Display the number or proportion of cases that fall into each category. Frequency Table A frequency table shows the number of cases that fall into each category: What is your race/ethnicity? White Black Asian Other Total 111 29 29 2 4 175 1

Proportion Proportion The sample proportion ( ) of directors in each category is number of cases in category pˆ total number of cases White Black Asian Other Total 111 29 29 2 4 175 The sample proportion of directors who are white is: 111.63 63% 175 Proportion and percent can be used interchangeably. Relative Frequency Table A relative frequency table shows the proportion of cases that fall in each category. Bar Chart In a bar chart, the height of the bar corresponds to the number of cases that fall into each category. White Black Asian Other.63.17.17.1.2 All the numbers in a relative frequency table sum to 1. 1 1 8 111 29 29 2 4 White Black Asian Other Pie Chart Two Categorical Variables In a pie chart, the relative area of each slice of the pie corresponds to the proportion/percentage in each category. Look at the relationship between two categorical variables 1. Race/Ethnicity 2. Gender Black 17% White 63% 17% Asian 1% Other 2% 2

It doesn t matter which variable is displayed in the rows and which in the columns. What proportion of female directors are? A. 12/29 B. 12/175 C. 12/81 D. 81/175 E. 29/175 What proportion of directors are female? A. 12/29 B. 12/175 C. 12/81 D. 81/175 E. 29/175 The proportion of female directors that are The proportion of directors that are female What proportion of directors are female and? A. 12/29 B. 12/175 C. 12/81 D. 81/175 E. 29/175 Side-by-Side Bar Chart In a side-by-side bar chart, the height of each bar corresponds to the number of cases that fall into each category of the table 7 5 3 1 52 59 16 17 13 12 Female Male White Black Other 3 1 3

Side-by-Side Bar Chart In a side-by-side bar chart, the height of each bar corresponds to the number of cases that fall into each category of the table Segmented Bar Chart A segmented bar chart is like a side-by-side bar chart, but the bars are stacked instead of side-by-side 7 5 3 1 White Black Other 1 8 Other Black White Female Male Female Male Difference in Proportions A difference in proportions is the difference in proportions for one categorical variable (e.g., the proportion who are ) calculated for the different levels of another categorical variable (e.g., gender) Difference in Proportions What is the difference in proportion of male directors who are and female directors who are? M H = sample proportion of male directors who are F H = sample proportion of female directors who are Difference in Proportions = MH - FH What is the difference in gender proportions among directors? M H - F H The proportion of male directors who are The proportion of female directors who are 17/94 12/81.33 One Quantitative Variable When describing quantitative variables we are interested in the distribution of the values it s shape, center, and spread. Shape: Form of the distribution of values Center: Main peak Spread: Relative deviation of the values To understand these concepts we ll look at quantitative variables from the student survey. 4

Dotplot Histogram In a dotplot, each case is represented by a dot and dots are stacked. Number of text messages sent yesterday Create bins (i.e., value intervals) and place each case in the appropriate bin based on its value for the variable of interest. The height of the each bar corresponds to the number of cases that have values falling within that particular interval. Bar Charts vs. Histograms Shape Although they look similar, a histogram is not the same as a bar chart. A bar chart is for categorical data, and the x-axis has no numeric scale. A histogram is for quantitative data, and the x-axis is numeric. Long right tail For a categorical variable, the number of bars equals the number of categories, and the number in each category is fixed. For a quantitative variable, the number of bars (or bins) in a histogram is up to you, and the appearance can differ with different number of bars. Symmetric Right-Skewed Left-Skewed Measures of Center Notation The sample size, the number of cases in the sample, is denoted by n. A variable is often denoted by x, and x 1, x 2,, x n represent the n values of the variable x. Example: x = The number of body piercings x 1 = 6 x 2 = x 3 = 1 x 4 = 5 5

Measures of Center Mean The sample mean ( ) is the average, and is computed by adding up all the numbers and dividing by the number of cases. Sample Mean: x x x x n i n n i 1 1 n Measures of Center Median The sample median (m) is the middle value when the data is ordered. If there are an even number of values, the median is the average of the two middle values. Outliers An outlier is a value that is notably different from the other values (e.g., much larger or smaller than the other values) Average number of times Number of text messages checking Facebook per day sent yesterday Resistance A statistic is resistant if it is not heavily affected by outliers. The median is resistant, the mean is not resistant. Number of text messages sent per day: Mean Median Outlier Included 32.6 8 Outlier Removed 9.2 8 Outliers When calculating statistics that are not resistant to outliers, look for outliers and decide whether the outlier is a mistake. If not, you have to decide whether the outlier is part of your population of interest or not. Measures of Center Distribution of NHL Player Salaries m = $1,25, = $2,21, Mean is pulled in the direction of skewness Usually, for outliers that are not a mistake, it s best to run the analysis twice, once with the outlier(s) and once without, to see how much the outlier(s) affect the results. 2 4 6 8 1 Salary (in millions) 6

Assignment Getting Variable Statistics from the GSS Part I Graded Problems 2.18 and 2. Additional Practice Problems (not to be turned in): 2.11 and 2.57 Part II Goto http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss1 Find 3 categorical variables and provide the proportion for each category for each variable. Find 3 quantitative variables and provide the mean & median and whether the distribution of values are symmetric, right skewed, or left skewed. Enter the variable name here. Make sure these boxes are checked. If applicable select the appropriate type of chart. Click on this button and the variable statistics will open up in a new window. Summary Statistics Proportion Frequency table Relative frequency table Visualizations Bar chart Pie chart Summary: One Categorical Variable Summary: Two Categorical Variables Summary Statistics Two-way table Difference in proportions Visualizations Side-by-side bar chart Segmented bar chart 7