Chapter 1: Exploring Data - Key

Name:          Date:          Block:
READING GUIDE - Key

Key Vocabulary: individuals, variable, categorical variable, quantitative variable, two-way table, marginal distributions, conditional distribution, association, distribution, range, spread, frequency, outlier, center, shape, skewed left, skewed right, symmetric, dot plot, histogram, stemplot, split stems, back-to-back stemplot, time plot, mean, nonresistant, x̄, median, resistant, quartiles, Q1, Q3, IQR, five-number summary, minimum, maximum, boxplot, modified boxplot, standard deviation, variance

INTRO Analyzing Categorical Data (pp.2-6)

1. How is statistics defined?
   The science of data: organizing, displaying, summarizing, and asking questions about data (i.e., data analysis).
2. Define data analysis.
   Organizing, displaying, summarizing, and asking questions about data.
3. Define individual.
   Individuals are the objects described by a set of data.
4. Define variable.
   Any characteristic of an individual.
5. What is a categorical variable?
   A qualitative variable that simply records a category designation; in other words, it places an individual into one of several groups or CATEGORIES.
6. What is a quantitative variable?
   A measurement variable that records a numerical characteristic; in other words, it takes numerical values for which it is often sensible to find an average.
7. Define distribution.
   A distribution tells us what values a variable takes and how often it takes those values.
8. How should data be explored?
   Begin by examining each variable by itself. Then move on to study the relationships among the variables. Use a graphical display together with numerical summaries.
9. Drawing conclusions that go beyond the given data is referred to as inference.

10. What are the two primary ways to produce data?
   Sampling and experiments.

1.1 Displaying Distributions with Graphs (pp.8-21)

1. What is the difference between a frequency table and a relative frequency table?
   A frequency table shows the count in each category, whereas a relative frequency table shows the percent (or proportion) in each category.
2. What type of data are pie charts and bar graphs used for?
   Categorical data. They show the distribution more vividly.
3. Pie charts can only be used when?
   Since a pie chart must include all the categories that make up a whole, it can only be used when you want to emphasize each category's relation to the whole.
4. How is a two-way table set up?
   It is set up to describe two categorical variables.
5. Which is more informative when comparing groups: counts or percents?
   Percents.
6. Explain the four-step process for organizing a statistical problem.
   State: What's the question that you're trying to answer?
   Plan: How will you go about answering the question?
   Do: Make graphs and carry out needed calculations.
   Conclude: Give your practical conclusion in the context of the problem.
7. What do you need to be cautious of when variables seem to have a strong association?
   Hidden variables. Be sure to examine the data carefully.

1.2 Describing Distributions with Numbers (pp.27-42)

8. How do you make a dot plot?
   Draw a number line (i.e., a horizontal axis) labeled with the name of the variable. Scale the axis using an appropriate range. Mark a dot above the location on the axis that corresponds to each data value, stacking dots for repeated values.
9. When examining a distribution, you can describe the overall pattern by its
   Shape, Outliers, Center, Spread.
10. If a distribution is symmetric, what does its dot plot look like?
   The left and right sides of the graph are approximately mirror images of each other.
11. If a distribution is skewed right, what does its dot plot look like?
   The right side of the graph is much longer than the left side; that is, the long tail is to the right, where there are FEWER observations.
12. If a distribution is skewed left, what does its dot plot look like?
   The left side of the graph is much longer than the right side; that is, the long tail is to the left, where there are FEWER observations.
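The dot plot steps described in Section 1.2 can be sketched in a few lines of Python. This is only an illustration, not part of the original guide: the function name dot_plot, the use of "*" for dots, and the data set are all made up, and the sketch assumes whole-number data.

```python
from collections import Counter

def dot_plot(values, label):
    """Build a text dot plot for whole-number data (hypothetical helper)."""
    counts = Counter(values)                            # how often each value occurs
    axis = list(range(min(values), max(values) + 1))    # scale the axis over the full range
    width = max(len(str(v)) for v in axis) + 1          # column width for each axis value
    rows = []
    # Build rows from the tallest stack of dots down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("*" if counts[v] >= level else " ").rjust(width) for v in axis)
        rows.append(row.rstrip())
    rows.append("".join(str(v).rjust(width) for v in axis))  # the number line
    rows.append(label)                                       # name of the variable
    return "\n".join(rows)

# Made-up data: most values are small, with a long tail to the right.
goals = [0, 1, 1, 2, 2, 2, 3, 3, 5]
print(dot_plot(goals, "Goals scored (hypothetical data)"))
```

In the printed plot, the single dot above 5 sits apart from the stack around 2, which is the long right tail described in the answer on right-skewed distributions.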

13. What is the difference between unimodal, bimodal, and multimodal data?
   Unimodal data has a distribution that is single-peaked (one mode). Bimodal data has two peaks (two modes), and multimodal data has a distribution with more than two clear peaks.
14. How do you make a stemplot?
   Separate each observation into a stem and a leaf (the final digit of the value). Write the stems in a vertical column in ascending order. Do not skip stems. Draw a vertical line to the right of the column. Write each leaf in the row to the right of its stem, in ascending order. Provide a key that explains in context what the stems and leaves represent.
15. When is it advantageous to split stems on a stemplot? (See pp.33-34)
   It is difficult to determine the shape of a distribution when you have too few stems or when each stem has too many leaves. In this case, splitting the stems gives a better view of the shape. (Note: If you split stems, be sure that each stem is assigned an equal number of possible leaf digits. For example, split each stem into two stems with 5 possible leaf digits each, 0-4 and 5-9.)
16. When is a back-to-back stemplot useful?
   It is useful when comparing two related sets of data (the same variable measured for two groups) on one graph.
17. What is the purpose of the stemplot?
   A stemplot gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. It does not work well for large data sets.
18. How is the stemplot of a distribution related to its histogram?
   A histogram is like a shaded-in stemplot. On the histogram the individual data values of the stemplot are not recorded; however, the overall shape of the distribution remains.
19. What is a histogram?
   The most common graph that shows the distribution of one quantitative variable.
20. When is it better to use a histogram rather than a stemplot or dot plot?
   When you have many data values.
21. What is meant by frequency in a histogram?
   The frequency is the count of observations in each class.
22. What is the difference between a bar graph and a histogram?
   A histogram displays quantitative data and a bar graph displays categorical data. A histogram doesn't have space between bars because it represents continuous data.
23. Define outlier.
   An outlier is an individual observation that falls outside the overall pattern of the graph.

1.3 Describing Quantitative Data with Numbers (pp.50-69)

1. In statistics, what are the most common measures of center?
   The arithmetic average (mean) and the median.

2. Explain how to calculate the mean, x̄.
   To find the mean of a set of observations, add their values and divide by the number of observations.
3. Explain how to calculate the median, M.
   The median, M, is the midpoint of a distribution: the number such that half the observations are smaller and the other half are larger. To find the median:
   1) Arrange all the observations in order of size, from smallest to largest.
   2) If the number of observations is odd, the median M is the center observation in the ordered list.
   3) If the number of observations is even, the median M is the mean of the 2 center observations in the ordered list.
4. Explain why the median is resistant to extreme observations, but the mean is nonresistant.
   The median is resistant because it is based only on the middle one or two observations of the ordered list. The mean is sensitive to the influence of a few extreme observations. Even if there are no outliers, a skewed distribution will pull the mean toward the long tail.
5. In a symmetric distribution, where are the mean and median in relation to each other? What about in a distribution that is skewed?
   In a roughly symmetric distribution, the mean and median are close together. In a skewed distribution, the mean is pulled farther toward the long tail than the median.
6. What is the difference between average value and typical value?
7. Explain how to calculate Q1, Q3, and the IQR.
   To calculate the quartiles:
   1) Arrange the observations in increasing order and locate the median in the list.
   2) Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
   3) Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
   The IQR is the distance between the first and third quartiles, Q3 - Q1. It is also known as the range of the middle half of the data.
8. When does an observation become an outlier?
   An observation is an outlier if it is more than 1.5*IQR above the third quartile or below the first quartile.
9. What is the five-number summary?
   The five-number summary is: minimum, Q1, median, Q3, and maximum.
10. How much of the data falls between each quartile?
   About 25% of the data falls between each pair of adjacent values in the five-number summary (minimum to Q1, Q1 to median, median to Q3, Q3 to maximum).
11. How much of the data falls between Q1 and Q3?
   About 50% of the data falls between Q1 and Q3.
Describe a boxplot.
   A modified boxplot is a graph of the five-number summary, with outliers plotted individually. Description:
   - A central box spans the quartiles.
   - A line in the box marks the median.
   - Observations more than 1.5*IQR outside the central box are plotted individually.
   - Lines extend from the box out to the smallest and largest observations that are not outliers.
12. What does standard deviation measure?
   The standard deviation is a measure of spread. It measures spread around the mean and should only be used when the mean is chosen as the measure of center.
13. What is the relationship between variance and standard deviation?
   The standard deviation, s, is the square root of the variance, s^2.
14. When does standard deviation equal zero?
   The standard deviation = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise s > 0. As the observations become more spread out about their mean, s gets larger.
15. What are the units for the standard deviation of a distribution?
   The standard deviation is expressed in the same units as the data.
16. Is standard deviation resistant or nonresistant to extreme observations? Explain.
   The standard deviation, s, like the mean, is not resistant. Strong skewness or a few outliers can make s very large.
17. Use a five-number summary when you want to provide a quick overall description of a distribution. Remember, numerical summaries do not fully describe the shape of a distribution. Always plot your data.
18. Use x̄ and s when the distribution is roughly symmetric and has no outliers.
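The calculations in Section 1.3 can be sketched in Python as a study aid. This is a minimal sketch, not part of the original guide: the function names and the data set are made up, and the standard deviation divides by n - 1 (the usual convention for sample data), which is an assumption because the guide does not state the divisor.

```python
import math

def median(sorted_vals):
    """Median of a list that is already sorted."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]                          # odd n: the center observation
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2  # even n: mean of the 2 center observations

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # observations left of the median's location
    upper = data[(n + 1) // 2:]   # observations right of the median's location
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Observations more than 1.5*IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return [x for x in values if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

def sample_sd(values):
    """Standard deviation s = sqrt(variance); assumes the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)

# Made-up data with one unusually large value.
data = [5, 7, 8, 9, 10, 10, 12, 30]
print(five_number_summary(data))   # (5, 7.5, 9.5, 11.0, 30)
print(outliers(data))              # [30]
print(sum(data) / len(data))       # mean 11.375, pulled above the median 9.5 by the value 30
print(sample_sd([4, 4, 4]))        # 0.0, since there is no spread
```

With this data the IQR is 11.0 - 7.5 = 3.5, so any value above 11.0 + 5.25 = 16.25 is flagged. The value 30 is an outlier and pulls the mean above the median, which illustrates why the mean and s are nonresistant.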