Chapter 1: Exploring Data - Key

Name    Date    Block
READING GUIDE - Key

Key Vocabulary: individuals, variable, categorical variable, quantitative variable, two-way table, marginal distributions, conditional distribution, association, distribution, range, spread, frequency, outlier, center, shape, skewed left, skewed right, symmetric, dot plot, histogram, stemplot, split stems, back-to-back stemplot, time plot, mean, nonresistant, x̄, median, resistant, quartiles, Q1, Q3, IQR, five-number summary, minimum, maximum, boxplot, modified boxplot, standard deviation, variance

INTRO Analyzing Categorical Data (pp.2-6)
1. How is statistics defined? The science of data: organizing, displaying, summarizing, and asking questions about data (i.e., data analysis).
2. Define data analysis. Organizing, displaying, summarizing, and asking questions about data.
3. Define individual. Individuals are the objects described by a set of data.
4. Define variable. Any characteristic of an individual.
5. What is a categorical variable? A qualitative variable that simply records a category designation; in other words, it places an individual into one of several groups or CATEGORIES.
6. What is a quantitative variable? A measurement variable that records a numerical characteristic; in other words, it describes an individual using numerical values for which it is often sensible to find an average.
7. Define distribution. A distribution tells us what values a variable takes and how often it takes those values.
8. How should data be explored? Begin by examining each variable by itself. Then move on to study relationships among the variables. Also, start with a graphical display and then add numerical summaries.
9. Drawing conclusions that go beyond the given data is referred to as _inference_.

10. What are the two primary ways to produce data? Sampling and experiments.

1.1 Displaying Distributions with Graphs (pp.8-21)
1. What is the difference between a frequency table and a relative frequency table? A frequency table shows only the count for each category, whereas a relative frequency table shows the percent of observations in each category.
2. What type of data are pie charts and bar graphs used for? Categorical data. They show the distribution more vividly than a table does.
3. Pie charts can only be used when? A pie chart must include all the categories that make up a whole, so it can only be used when you want to emphasize each category's relation to the whole.
4. How is a two-way table set up? It is set up to describe two categorical variables.
5. Which is more informative when comparing groups: counts or percents? Percents.
6. Explain the four-step process for organizing a statistical problem.
   State: What's the question that you're trying to answer?
   Plan: How will you go about answering the question?
   Do: Make graphs and carry out needed calculations.
   Conclude: Give your practical conclusion in the context of the problem.
7. What do you need to be cautious of when variables seem to have a strong association? Hidden variables: be sure to examine the data carefully.

1.2 Describing Distributions with Numbers (pp.27-42)
8. How do you make a dot plot? Draw a number line (i.e., a horizontal axis) labeled with the name of the variable. Scale the axis to cover the range of the data. Place a dot above the axis at the location of each data value, stacking dots for repeated values.
9. When examining a distribution, you can describe the overall pattern by its Shape, Outliers, Center, and Spread.
10. If a distribution is symmetric, what does its dot plot look like? The left and right sides of the graph are approximately mirror images of each other.
11. If a distribution is skewed right, what does its dot plot look like? The right side of the graph is much longer than the left side; i.e., the long tail is to the right, where there are FEWER observations.
12. If a distribution is skewed left, what does its dot plot look like? The left side of the graph is much longer than the right side; i.e., the long tail is to the left, where there are FEWER observations.
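
As a rough illustration of two ideas from the questions above (the frequency vs. relative frequency table in Section 1.1, question 1, and the dot plot steps in question 8), here is a minimal Python sketch. The function names and the sample data are hypothetical and are not part of the reading guide. The dot plot is drawn sideways, with values running down the page, purely so it fits in plain text, and it assumes whole-number data.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: count}, in order of first appearance."""
    return dict(Counter(values))


def relative_frequency_table(values):
    """Return {category: percent of all observations}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


def text_dot_plot(values):
    """Sideways dot plot for whole-number data.

    One row per value from the minimum to the maximum (no values skipped),
    with one dot per observation at that value.
    """
    counts = Counter(values)
    rows = []
    for value in range(min(values), max(values) + 1):
        # A Counter returns 0 for values that never occur.
        rows.append((f"{value:>3} | " + "." * counts[value]).rstrip())
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical categorical data: favorite colors of 8 students.
    colors = ["red", "blue", "red", "green", "red", "blue", "red", "blue"]
    print(frequency_table(colors))
    print(relative_frequency_table(colors))

    # Hypothetical quantitative data: goals scored in 8 games.
    goals = [0, 1, 1, 2, 2, 2, 3, 5]
    print(text_dot_plot(goals))
```

In the hypothetical goals data, the single game with 5 goals sits off to the side of the other values, which is the kind of long tail described in question 11.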

13. What is the difference between unimodal, bimodal, and multimodal data? Unimodal data have a distribution that is single-peaked (one mode). Bimodal data have two peaks (two modes), and multimodal data have more than two clear peaks.
14. How do you make a stemplot? Separate each observation into a stem (all but the final digit) and a leaf (the final digit). Write the stems in a vertical column in ascending order. Do not skip stems. Draw a vertical line to the right of the column. Write each leaf in the row to the right of its stem, in ascending order. Provide a key that explains in context what the stems and leaves represent.
15. When is it advantageous to split stems on a stemplot? (See pp.33-34) It is difficult to determine the shape of a distribution when you have too few stems or when each stem has too many leaves. In this case, splitting the stems gives a better picture of the shape. (Note: If you split stems, be sure that each stem is assigned an equal number of possible leaf digits. For example, two stems with 5 possible leaf digits each.)
16. When is a back-to-back stemplot useful? It is useful when comparing the distributions of the same variable for two groups on one graph.
17. What is the purpose of the stemplot? A stemplot gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. It does not work well for large data sets.
18. How is the stemplot of a distribution related to its histogram? A histogram is like a shaded-in stemplot. On the histogram the individual data values of the stemplot are not recorded; however, the overall shape of the distribution remains.
19. What is a histogram? The most common graph that shows the distribution of one quantitative variable.
20. When is it better to use a histogram rather than a stemplot or dot plot? When you have many data values.
21. What is meant by frequency in a histogram? The frequency is the number of observations (the count) in each class.
22. What is the difference between a bar graph and a histogram? A histogram displays quantitative data and a bar graph displays categorical data. A histogram doesn't have space between bars because the classes cover a continuous range of values.
23. Define outlier. An outlier is an individual observation that falls outside the overall pattern of the graph.

1.3 Describing Quantitative Data with Numbers (pp.50-69)
1. In statistics, what are the most common measures of center? The mean (the arithmetic average) and the median.
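
Looking back at the stemplot steps in Section 1.2 (question 14), the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are non-negative whole numbers, the leaf is the final digit, and the function name and sample scores are hypothetical.

```python
from collections import defaultdict


def stemplot(values):
    """Return the rows of a stemplot for non-negative whole numbers.

    The stem is everything but the final digit; the leaf is the final digit.
    """
    leaves = defaultdict(list)
    # Sorting first guarantees each stem's leaves come out in ascending order.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)

    rows = []
    # Walk every stem from smallest to largest so that no stem is skipped,
    # even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append((f"{stem:>2} | " + leaf_text).rstrip())
    return rows


if __name__ == "__main__":
    # Hypothetical test scores.
    scores = [52, 75, 78, 71, 84]
    for row in stemplot(scores):
        print(row)
    print("Key: 7 | 1 means a score of 71")
```

The empty row for stem 6 is kept on purpose: skipping it would hide the gap between 52 and the scores in the 70s.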

2. Explain how to calculate the mean, x̄. To find the mean of a set of observations, add their values and divide by the number of observations.
3. Explain how to calculate the median, M. The median, M, is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median:
   1) Arrange all the observations in order of size, from smallest to largest.
   2) If the number of observations is odd, the median M is the center observation in the ordered list.
   3) If the number of observations is even, the median M is the mean of the two center observations in the ordered list.
4. Explain why the median is resistant to extreme observations, but the mean is nonresistant. The median is resistant because it is based only on the middle one or two observations of the ordered list. The mean is sensitive to the influence of a few extreme observations. Even if there are no outliers, a skewed distribution will pull the mean toward the long tail.
5. In a symmetric distribution where are the mean and median in relation to each other? What about in a distribution that is skewed? In a symmetric distribution the mean and median are close together (exactly equal if the distribution is perfectly symmetric). In a skewed distribution the mean is farther out in the long tail than the median.
6. What is the difference between average value and typical value?
7. Explain how to calculate Q1 and Q3 and IQR. To calculate the quartiles:
   1) Arrange the observations in increasing order and locate the median in the list.
   2) Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
   3) Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
   The IQR is the distance between the first and third quartiles, Q3 - Q1. It is also known as the range of the middle half of the data.
8. When does an observation become an outlier? An observation is an outlier if it is more than 1.5*IQR above the third quartile or below the first quartile.
9. What is the five-number summary? The five-number summary is: minimum, Q1, median, Q3, and maximum.
10. How much of the data falls between each quartile? About 25% of the data fall between each consecutive pair of values in the five-number summary (minimum to Q1, Q1 to the median, the median to Q3, and Q3 to the maximum).
11. How much of the data falls between Q1 and Q3? About 50% of the data fall between Q1 and Q3.
Describe a boxplot. A modified boxplot is a graph of the five-number summary, with outliers plotted individually. Description:
   - a central box spans the quartiles
   - a line in the box marks the median
   - observations more than 1.5*IQR outside the central box are plotted individually
   - lines extend from the box out to the smallest and largest observations that are not outliers
12. What does standard deviation measure? The standard deviation is a measure of spread. It measures spread around the mean and should only be used when the mean is chosen as the measure of center.
13. What is the relationship between variance and standard deviation? The standard deviation, s, is the square root of the variance, s².
14. When does standard deviation equal zero? The standard deviation is 0 only when there is no spread. This happens only when all observations have the same value. Otherwise s > 0. As the observations become more spread out about their mean, s gets larger.
15. What are the units for the standard deviation of a distribution? The standard deviation is expressed in the same units as the data.
16. Is standard deviation resistant or nonresistant to extreme observations? Explain. The standard deviation, s, like the mean, is not resistant. Strong skewness or a few outliers can make s very large.
17. Use the five-number summary when you want to provide a quick overall description of a distribution. Remember, numerical summaries do not fully describe the shape of a distribution. Always plot your data.
18. Use x̄ and s when the distribution is roughly symmetric and has no outliers.
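
The calculations described in Section 1.3 (mean, median, quartiles, IQR, the 1.5*IQR outlier rule, and standard deviation) can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. The quartiles follow the method in question 7, so the overall median is left out of both halves when the number of observations is odd. The guide does not give the variance formula, so dividing by n - 1 (the usual sample variance) is an assumption here. All function names and data are hypothetical.

```python
import math


def mean(values):
    """Add the observations and divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle observation (odd n) or mean of the two middle ones (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum). Needs at least 2 observations."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # positions left of the median
    upper_half = ordered[(n + 1) // 2:]   # positions right of the median
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


def outliers(values):
    """Observations more than 1.5*IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in sorted(values) if v < low_fence or v > high_fence]


def standard_deviation(values):
    """Square root of the variance; assumes the n - 1 (sample) divisor."""
    x_bar = mean(values)
    variance = sum((v - x_bar) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical data with one extreme observation.
    data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
    print("mean:", mean(data))
    print("five-number summary:", five_number_summary(data))
    print("outliers:", outliers(data))
    print("standard deviation:", standard_deviation(data))
```

With this hypothetical data the median is 5 while the mean is above 7, because the single value 30 pulls the mean toward the long tail. That is the nonresistance described in question 4.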