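Variable - what we are measuring
Quantitative - numerical values for which mathematical operations (such as averaging) make sense. These have UNITS.
Categorical - puts individuals into categories.
Numbers don't always mean quantitative (a zip code or jersey number is a category label, not a measurement).
Frequency vs. relative frequency vs. cumulative frequency vs. relative cumulative frequency
Frequency - the count of individuals with each value (or in each class)
Relative frequency - the frequency divided by the total number of individuals (a proportion or percent)
Cumulative frequency - the running total of the frequencies up to and including each value
Relative cumulative frequency - the cumulative frequency divided by the total number of individuals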
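The four kinds of frequency can be compared side by side in a short Python sketch. The data set (number of siblings for 10 students) and the function name frequency_table are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Build one row per distinct value with all four kinds of frequency."""
    counts = Counter(values)
    n = len(values)
    levels = sorted(counts)                   # distinct values in increasing order
    freqs = [counts[v] for v in levels]       # frequency = count of each value
    cum_freqs = list(accumulate(freqs))       # running total of the counts
    return [
        {
            "value": v,
            "freq": f,
            "rel_freq": f / n,                # proportion of all individuals
            "cum_freq": c,
            "rel_cum_freq": c / n,            # proportion at or below this value
        }
        for v, f, c in zip(levels, freqs, cum_freqs)
    ]

# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 0, 1, 3, 2, 1, 1]
table = frequency_table(siblings)
for row in table:
    print(row)
```

The last relative cumulative frequency is always 1 (100%), which is a quick check that the table was built correctly.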
Two-Way Tables and Marginal Distributions
Distributions are of VARIABLES, not individual values!
To examine a marginal distribution:
1) Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals.
2) Make a graph to display the marginal distribution.
Note: Percents are often more informative than counts, especially when comparing groups of different sizes.
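Step 1 above can be sketched in Python as follows. The two-way table (gender by yes/no opinion) and the function name marginal_distribution are hypothetical; only the arithmetic follows the notes: each row or column total is divided by the grand total.

```python
# Hypothetical two-way table of counts: rows = gender, columns = opinion.
two_way = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}

def marginal_distribution(table, axis):
    """Marginal percents of the row variable (axis='row') or column variable (axis='col')."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    totals = {}
    for row_label, cells in table.items():
        for col_label, count in cells.items():
            key = row_label if axis == "row" else col_label
            totals[key] = totals.get(key, 0) + count   # accumulate the row or column total
    return {key: 100 * total / grand_total for key, total in totals.items()}

row_marginal = marginal_distribution(two_way, "row")
col_marginal = marginal_distribution(two_way, "col")
print(row_marginal)
print(col_marginal)
```

The percents in each marginal distribution should add up to 100, because every individual falls into exactly one row and one column.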
A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable.
To examine or compare conditional distributions:
1) Select the row(s) or column(s) of interest.
2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).
3) Make a graph to display the conditional distribution. Use a side-by-side bar graph or segmented bar graph to compare distributions.
There are three main ways to display quantitative data:
-Dotplots
-Stemplots
  -split
  -back-to-back
-Histograms
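Going back to conditional distributions, steps 1 and 2 can be sketched in Python. The table and the function name conditional_distribution are hypothetical. The key difference from a marginal distribution is the denominator: each count is divided by its own row total, not the grand total.

```python
# Hypothetical two-way table of counts: rows = gender, columns = opinion.
two_way = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}

def conditional_distribution(table, row_label):
    """Percents of the column variable among individuals in one selected row."""
    cells = table[row_label]                  # step 1: select the row of interest
    row_total = sum(cells.values())
    return {col: 100 * count / row_total for col, count in cells.items()}  # step 2

given_female = conditional_distribution(two_way, "female")
given_male = conditional_distribution(two_way, "male")
print("female:", given_female)
print("male:  ", given_male)
```

Comparing the two results (60% yes among females vs. 20% yes among males in this made-up table) is exactly what a side-by-side or segmented bar graph would show.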
How to create a dotplot:
1) Draw a horizontal axis (a number line) and label it with the variable name.
2) Scale the axis from the minimum to the maximum value.
3) Mark a dot above the location on the horizontal axis corresponding to each data value.
How to make a stemplot:
1) Separate each observation into a stem (all but the final digit) and a leaf (the final digit).
2) Write all possible stems from the smallest to the largest in a vertical column and draw a vertical line to the right of the column.
3) Write each leaf in the row to the right of its stem. Arrange the leaves in increasing order out from the stem.
4) Provide a key that explains in context what the stems and leaves represent.
Splitting Stems and Back-to-Back Stemplots
When data values are bunched up, we can get a better picture of the distribution by splitting stems.
Two distributions of the same quantitative variable can be compared using a back-to-back stemplot with common stems.
How to make a histogram:
1) Divide the range of data into classes of equal width.
2) Find the count (frequency) or percent (relative frequency) of individuals in each class.
3) Label and scale your axes and draw the histogram. The height of each bar equals the frequency (or relative frequency) of its class. Adjacent bars should touch, unless a class contains no individuals.
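The stemplot steps above can be sketched as a small text-based Python function. This is a minimal illustration that assumes non-negative whole-number data; the test scores and the function name stemplot are made up.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a text stemplot for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):                  # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)            # stem = all but final digit, leaf = final digit
        leaves[stem].append(leaf)
    rows = []
    # Write ALL possible stems from smallest to largest, even ones with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

# Hypothetical data: test scores for 10 students.
scores = [72, 85, 91, 68, 85, 77, 93, 60, 79, 88]
plot = stemplot(scores)
print("\n".join(plot))
print("Key: 7 | 2 means a score of 72")
```

Stems with no leaves are still printed, because skipping them would hide gaps in the distribution.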
(Using your calculator)
1. Enter the data into L1. (Press the STAT button, highlight EDIT, select choice #1, and press ENTER.)
2. Turn on the stat plot. (Press 2nd and the Y= button to select STAT PLOT, highlight choice #1 and press ENTER, select ON and press ENTER, then select the histogram under TYPE and press ENTER.)
3. Adjust your window. (Press the WINDOW button; for Xmin enter a minimum value smaller than the smallest observation, for Xmax enter a maximum value larger than the largest observation, and for Xscl enter the class width, i.e. what you are counting by to get from Xmin to Xmax; set Ymin = 0 and choose Ymax appropriately.) OR go to ZOOM and select #9 ZoomStat.
Using Histograms Wisely
Here are several cautions based on common mistakes students make when using histograms.
1) Don't confuse histograms and bar graphs.
2) Don't use counts (in a frequency table) or percents (in a relative frequency table) as data.
3) Use percents instead of counts on the vertical axis when comparing distributions with different numbers of observations.
4) Just because a graph looks nice, it's not necessarily a meaningful display of data.
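To see how the window settings relate to the bars, here is a rough Python sketch that counts observations in equal-width classes starting at a chosen minimum. The names xmin and width stand in for Xmin and Xscl. Two assumptions are made here: each class includes its left endpoint but not its right endpoint, and the scores are made-up data.

```python
def class_counts(values, xmin, width, n_classes):
    """Count observations in classes [xmin + k*width, xmin + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - xmin) // width)          # index of the class that contains v
        if 0 <= k < n_classes:                # values outside the window are not counted
            counts[k] += 1
    return counts

# Hypothetical data: test scores for 10 students.
scores = [72, 85, 91, 68, 85, 77, 93, 60, 79, 88]
counts = class_counts(scores, xmin=60, width=10, n_classes=4)
percents = [100 * c / len(scores) for c in counts]   # relative frequencies

for k, (c, p) in enumerate(zip(counts, percents)):
    low = 60 + 10 * k
    print(f"{low}-{low + 10}: count={c}, percent={p}")
```

The percents are what belong on the vertical axis when comparing groups of different sizes (caution 3 above).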
Relative Frequency Histogram
This type of histogram displays proportions or percents rather than counts.
Cumulative Frequency Histogram (Ogive)
Examine the Distribution
Look for the OVERALL pattern and any striking DEVIATIONS from that pattern.
Describe the shape, center, and spread, and determine whether there are any outliers (don't forget your SOCS: Shape, Outliers, Center, Spread!).
Shape
Skewed or symmetric?
Symmetric - the left and right sides of the histogram are approximately mirror images of each other
Skewed right - the right side of the histogram extends MUCH farther out than the left side ("tail" goes to the right)
Skewed left - the left side of the histogram extends MUCH farther out than the right side ("tail" goes to the left)
Uniform distribution - doesn't appear to have any modes; pretty much the same height across the whole distribution
Measures of Center
We have two ways of numerically measuring the center of a quantitative data set - the median and the mean. Both of these can be considered to give us the "average" of a data set.
Some issues with notation:
There are two ways to write the mean: μ (population mean) and x̄ (sample mean). The choice depends on whether you are talking about the entire POPULATION of interest or just a SAMPLE from the entire population. Unless you are 100% positive you have the data from the ENTIRE population, use x̄. If you see μ being used, then the data must be from the entire population.
Comparing the Mean and Median
In a symmetric distribution the mean and median are VERY close together.
In a skewed distribution the mean will be greater than or less than the median, depending upon the skew. The larger the difference between the two, the greater the skew.
If the mean is greater than the median, the distribution is likely skewed right.
If the mean is smaller than the median, the distribution is likely skewed left.
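A quick Python check of the mean vs. median comparison, using three small made-up data sets. The function name compare_center is hypothetical, and the labels it returns are only the rule of thumb from these notes, not a formal test for skewness.

```python
from statistics import mean, median

def compare_center(values):
    """Return (mean, median, rule-of-thumb label) for a data set."""
    m, med = mean(values), median(values)
    if m > med:
        label = "mean > median: suggests skewed right"
    elif m < med:
        label = "mean < median: suggests skewed left"
    else:
        label = "mean = median: consistent with symmetric"
    return m, med, label

# Hypothetical data sets that differ only in one extreme value.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 50]     # one very large value pulls the mean up
left_skewed = [-40, 6, 7, 8, 9]     # one very small value pulls the mean down

for name, data in [("symmetric", symmetric),
                   ("right", right_skewed),
                   ("left", left_skewed)]:
    print(name, compare_center(data))
```

Notice that changing 5 to 50 moves the mean from 3 to 12 but leaves the median at 3: the median is resistant to extreme values and the mean is not.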
Measures of Spread
As with measures of center, we have two different ways to measure the spread in quantitative data - the quartiles and IQR, and the standard deviation and variance.
Standard Deviation (written as σ for a population or s for a sample) and Variance (written as σ² for a population or s² for a sample)
The standard deviation gives a measure of the "average" distance that data points fall from the mean.
s = 0 ONLY when there is NO SPREAD - this happens only when every observation is the SAME; otherwise s > 0.
The more spread out the observations are, the greater s will be.
s has the same units of measurement as the observations.
Like the mean, s is not resistant.
Choosing Measures of Center and Spread
1. FIVE-NUMBER SUMMARY or Median and IQR
The Five-Number Summary gives a quick summary of both the center and spread of your data. It contains the minimum observation, Q1, the median, Q3, and the maximum observation. Some people also consider giving the IQR with the median to be a sufficient measure of center and spread.
Use when the distribution is skewed or has strong outliers.
Used to create another graphical display of quantitative data - the BOXPLOT.
2. The Mean and Standard Deviation
Use for reasonably symmetric distributions that are free of outliers.
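The notes above do not write out the formula for s, so the following Python sketch assumes the usual sample definition: the square root of the sum of squared deviations from the mean divided by n - 1. The data and the function name sample_sd are made up; stdev and pstdev come from Python's standard statistics module and are used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev, stdev

def sample_sd(values):
    """Sample standard deviation s (divides by n - 1); needs at least 2 values."""
    n = len(values)
    xbar = sum(values) / n                               # sample mean
    squared_devs = [(x - xbar) ** 2 for x in values]     # squared distances from the mean
    variance = sum(squared_devs) / (n - 1)               # sample variance s^2
    return sqrt(variance)

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("s     =", sample_sd(data))     # sample standard deviation
print("sigma =", pstdev(data))        # population version divides by n instead
print("no spread:", sample_sd([3, 3, 3]))
```

When every observation is the same, every deviation from the mean is 0, so s comes out as 0, which matches the property listed above.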
Boxplot
A graph of the five-number summary.
A central box spans the quartiles, Q1 and Q3, with a line marking the median, M.
Lines extend from the edges of the box (Q1 and Q3) out to the minimum and maximum values, respectively.
IF THERE ARE OUTLIERS: DO NOT extend the lines to outliers. Only extend to the minimum and maximum values that are NOT outliers. Mark outliers with an asterisk.
How to use the calculator for numerical summaries and boxplots:
(Using your calculator)
1. Enter the data into L1. (Press the STAT button, highlight EDIT, select choice #1, and press ENTER.)
For Numerical Summaries:
2. Press the STAT button and arrow over to CALC.
3. Select 1-Var Stats.
4. You will get a list of values on your main screen. Arrow through to find all necessary values:
-mean
-standard deviation
-minimum observation
-Q1
-median
-Q3
-maximum
For Boxplot:
2. Turn on the stat plot. (Press 2nd and the Y= button to select STAT PLOT, highlight choice #1 and press ENTER, select ON and press ENTER.)
3. Select the FIRST boxplot option under "TYPE" - this one graphs outliers.
4. Adjust your window. (ZOOM, select #9 ZoomStat)
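The pieces of a boxplot can also be computed by hand, as in this Python sketch. Two assumptions are made that these notes do not state: quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd), and an outlier is any value more than 1.5 × IQR beyond a quartile. The data and function names are made up.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least 2 observations."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                   # values below the middle
    upper = data[half + 1:] if n % 2 else data[half:]     # skip the middle value if n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

def boxplot_parts(values):
    """Find outliers (assumed 1.5 x IQR rule) and where the lines should end."""
    _, q1, med, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sorted(x for x in values if x < low_fence or x > high_fence)
    inside = [x for x in values if low_fence <= x <= high_fence]
    # Lines extend only to the most extreme values that are NOT outliers.
    return {"box": (q1, med, q3),
            "lines": (min(inside), max(inside)),
            "outliers": outliers}

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = five_number_summary(data)
parts = boxplot_parts(data)
print(summary)
print(parts)
```

Here the maximum (30) is flagged as an outlier, so the upper line stops at 8 and 30 would be marked separately with an asterisk.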