1 Graphing Data Presentation of Data in Visual Forms Purpose of Graphing Data Audience Appeal Provides a visually appealing and succinct representation of data and summary statistics Provides a visually appealing and succinct representation of relationships between variables Diagnostics Determine Characteristics of Data Are there patterns in the data? Could there be relationships between variables? Determine Quality of Data Distribution of data Outliers Data cleaning
2 Graphing Data for Diagnostic Purposes A More In-depth Explanation Determine Quality of Data Distribution of data Is it normally distributed? Outliers Are there anomalies? Data cleaning Does data correspond to possible answer categories? (e.g., was there a 3 recorded for gender even though the codebook lists only 1 = male, 2 = female?)
3 A Diagnostics Tool for Examining the Distribution of Continuous Data One of the most commonly used diagnostic tools for continuous data is the histogram. The following slides outline how to construct and use this valuable tool.
4 The Importance of Histograms What is a histogram? Why do we use histograms? What does a normal distribution look like? Why is a normal distribution important? How are histograms constructed?
5 What is a histogram? Histograms A histogram is a pictorial representation of the distribution of continuous data ranked from the lowest to the highest value. Below are two histograms representing the distribution of IQ scores for men and women in the U.S. Men Women
6 Histograms Why do we use histograms? Histograms are used for diagnostic purposes, to answer the following questions: Is the data normally distributed? Are there outliers? Histograms can also be used to predict the probability of individual values or scores, and the probability of values occurring within a designated interval. IQ Scores for Males What is the probability of an IQ between 100 and 120? What is the probability of a score of 140 or above?
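The two IQ questions above can be approximated numerically if the scores are modeled with a normal curve. The sketch below assumes a mean of 100 and a standard deviation of 15; these parameters are illustrative assumptions and are not given on the slide.

```python
from statistics import NormalDist

# Assumption: IQ scores modeled as normal with mean 100 and SD 15.
# These parameters are illustrative and are not stated on the slide.
iq = NormalDist(mu=100, sigma=15)

# Probability of a score falling between 100 and 120
p_100_to_120 = iq.cdf(120) - iq.cdf(100)

# Probability of a score of 140 or above
p_140_plus = 1 - iq.cdf(140)

print(f"P(100 <= IQ <= 120) = {p_100_to_120:.3f}")
print(f"P(IQ >= 140)        = {p_140_plus:.4f}")
```

Under these assumed parameters, roughly 41% of scores fall between 100 and 120, and well under 1% are 140 or above.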
7 Histograms Histograms, Tables and Cumulative Graphs Histograms and tables can be used to construct each other Understanding the relationship between tables and histograms can help you present and interpret your data more accurately and precisely Histograms and tables can be used to construct cumulative graphs Cumulative graphs can, in turn, be used to predict individual values Common examples where cumulative graphs are used include Standardized tests (e.g., SAT, ACT) Weight and height charts used in doctors' offices Mental health indices
8 Histograms What does a normal distribution look like? Many histograms have an approximately normal distribution. If you drew a smooth line connecting the midpoints of each interval, the line would outline a symmetrical figure in which the number of values decreases steadily as the distance from the mean increases.
9 Histograms What does a normal distribution look like? Normal distributions are symmetrical: the mean and median are the same value. IQ scores of US males and females are an example. Many variables have approximately normally distributed values. Positive skew: the mean is greater than the median. An example is income. Negative skew: the mean is less than the median. An example is student exam scores. Hamilton
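The mean-versus-median comparison described above can be checked directly. This is a minimal sketch: the function name `skew_direction` and all three data sets are hypothetical, invented only to illustrate the rule of thumb.

```python
from statistics import mean, median

def skew_direction(values):
    """Classify skew by comparing mean and median (a rough rule of thumb)."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "symmetric"

# Hypothetical data, invented for illustration only
incomes = [20, 25, 30, 35, 40, 45, 200]      # a few very high values
exam_scores = [35, 70, 80, 85, 88, 90, 92]   # a few very low values
symmetric = [1, 2, 3, 4, 5]

for name, data in [("incomes", incomes), ("exam scores", exam_scores),
                   ("symmetric", symmetric)]:
    print(name, "->", skew_direction(data))
```

Real data rarely has a mean exactly equal to its median, so in practice the size of the gap matters more than the strict comparison used here.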
10 Histograms Why is a normal distribution important? Most statistical formulas for analyzing continuous data assume a normal distribution. Therefore, the results of a statistical analysis may not be valid unless the distribution is approximately normal. How are histograms constructed? Statistical programs can be used to construct histograms, or you can construct them manually. Data is ranked from lowest to highest value. Equal intervals are constructed. The number of individual values within each interval is counted. The intervals are then laid out along a number line.
11 Stem and Leaf Plot An Easy Way to Construct Your Own Histogram Step 1: Line up scores from lowest to highest values Step 2: Decide on the width of your stems or categories With a single stem histogram, the width is 10 Step 3: Write the tens digits (the stems) down a column, and place the ones digits (the leaves) in the row beside their stem, equally spaced Step 4: Rotate the stem and leaf plot 90 degrees, and you have a histogram
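Steps 1 through 3 above can be sketched in a few lines of code. The function name `stem_and_leaf` and the ten scores are hypothetical, chosen only for illustration, and the sketch assumes two-digit scores.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return {tens digit: [ones digits]} for two-digit scores."""
    plot = defaultdict(list)
    for score in sorted(scores):          # Step 1: order the scores
        stem, leaf = divmod(score, 10)    # Steps 2-3: stem width of 10
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical scores, invented for illustration
scores = [52, 47, 61, 58, 44, 65, 53, 59, 41, 66]
plot = stem_and_leaf(scores)

# Print every stem in range so that empty stems still show as gaps
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(d) for d in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because each leaf takes one character, the length of each printed row is proportional to the number of scores in that stem, which is what makes the rotated plot read as a histogram.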
12 Variations of the Stem and Leaf Plot The interval widths can be of different sizes. This example uses the same raw data. The width of each stem is 2. Five Stem Histogram Using Same Data [Figure: stem and leaf plot with five stems per tens digit, marked with the suffixes *, t, f, s, and "." (for example 5t, 5f, 5s)] You would use this stem and leaf plot if your data were more tightly clustered.
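A five-stem plot can be sketched as follows. The sketch assumes the stem suffixes follow the common convention in which * holds leaves 0-1, t holds 2-3, f holds 4-5, s holds 6-7, and "." holds 8-9; the function name `five_line_stems` and the scores are hypothetical.

```python
# Assumed convention: * = leaves 0-1, t = 2-3, f = 4-5, s = 6-7, . = 8-9
LABELS = "*tfs."

def five_line_stems(scores):
    """Group two-digit scores into stems of width 2, e.g. '4t' holds 42-43."""
    rows = {}
    for score in sorted(scores):
        tens, ones = divmod(score, 10)
        key = f"{tens}{LABELS[ones // 2]}"
        rows.setdefault(key, []).append(ones)
    return rows

# Hypothetical scores, invented for illustration
scores = [52, 47, 61, 58, 44, 65, 53, 59, 41, 66]
rows = five_line_stems(scores)
for key, leaves in rows.items():
    print(key, "|", "".join(map(str, leaves)))
```

For brevity this version prints only the stems that contain data; a full plot would also list the empty stems so that gaps remain visible.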
13 Constructing Your Own Histogram Constructing your own histogram and cumulative graph is not recommended: (1) if your data set is large, and/or (2) if you plan to enter the data into a computer program anyway for other research purposes (e.g., running statistical tests or generating descriptive statistics). HOWEVER, statisticians recommend that you know how to construct a histogram manually, as it helps you understand and interpret histograms more easily. It can also give you a quick preliminary summary of your data before you enter it into the computer.
14 Understanding the Relationship Between Raw Data, Tables, and Histograms In this example we will construct the table, then the histogram, and finally a cumulative graph. The construction of these graphs/tables can be done in a different order. Constructing a table that can be used to construct a histogram and cumulative graphs: Step 1: Order values from lowest to highest as shown below: 33, 50, 52, 60, 63, 65, 65, 65, 66, 67, 68, 69, 69, 70, 70, 71, 71, 72, 73, 73, 74, 74, 74, 75, 75, 75, 75, 75, 76, 76, 77, 77, 77, 78, 78, 80, 81, 82, 83, 84, 84, 87, 88, 88, 90, 90, 92, 95, 95, 98
15 Use a Template Similar to the One Below [Table template with five columns: Class Interval (the class intervals are reported here), Frequency (the number of values in each interval), Cumulative Frequency (the cumulative number of values), Percent (the percent of the total number of values), Cumulative Percent (the cumulative percent)]
16 Step 2: Report the Intervals in the First Column In this Example, the Class Interval Width is 5 (i.e., 30, 31, 32, 33, and 34 are in the first interval) [Table columns: Class Interval, Frequency, Cumulative Frequency, Percent, Cumulative Percent]
17 Step 3: Count the Number of Values in Each Interval and Report Them Under Frequency [Table columns: Class Interval, Frequency, Cumulative Frequency, Percent, Cumulative Percent]
18 Step 4: Keep a Running Total of the Frequency Column and Report it Under Cumulative Frequency [Table columns: Class Interval, Frequency, Cumulative Frequency, Percent, Cumulative Percent]
19 Step 5: Compute and Report the Percent for Each Frequency For Instance, the Frequency in the First Interval (30-34) is 1, and 1 is 2% of the Total (50 cases) [Table columns: Class Interval, Frequency, Cumulative Frequency, Percent, Cumulative Percent]
20 Step 6: Compute and Report the Cumulative Percent (Similar to What was Done in Step 4 for Cumulative Frequency) [Table columns: Class Interval, Frequency, Cumulative Frequency, Percent, Cumulative Percent]
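Steps 2 through 6 can be reproduced programmatically from the 50 scores listed in Step 1. The following is a minimal sketch rather than the deck's own procedure; the variable names are illustrative, and the intervals are assumed to run from 30-34 through 95-99.

```python
from itertools import accumulate

scores = [33, 50, 52, 60, 63, 65, 65, 65, 66, 67, 68, 69, 69, 70, 70, 71, 71,
          72, 73, 73, 74, 74, 74, 75, 75, 75, 75, 75, 76, 76, 77, 77, 77, 78,
          78, 80, 81, 82, 83, 84, 84, 87, 88, 88, 90, 90, 92, 95, 95, 98]

WIDTH = 5                               # class interval width (Step 2)
starts = list(range(30, 100, WIDTH))    # lower limits: 30, 35, ..., 95

# Step 3: frequency of values in each interval
freq = [sum(lo <= s < lo + WIDTH for s in scores) for lo in starts]
# Step 4: running total of the frequencies
cum_freq = list(accumulate(freq))
# Steps 5-6: percent and cumulative percent of the total
n = len(scores)
percent = [100 * f / n for f in freq]
cum_percent = [100 * c / n for c in cum_freq]

print(f"{'Interval':<10}{'Freq':>6}{'CumFreq':>9}{'Pct':>7}{'CumPct':>8}")
for lo, f, cf, p, cp in zip(starts, freq, cum_freq, percent, cum_percent):
    print(f"{lo}-{lo + WIDTH - 1:<7}{f:>6}{cf:>9}{p:>7.0f}{cp:>8.0f}")
```

The first row comes out as interval 30-34 with a frequency of 1 and a percent of 2, matching the example given in Step 5, and the last cumulative percent is 100.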
21 This Histogram can Then be Constructed BINS are constructed. Each bin is the width of the corresponding interval from the table. The first bin corresponds to the lowest interval, and the last bin corresponds to the highest interval.
22 This Histogram can Then be Constructed [Figure: histogram of the example data; vertical axis: Frequency]
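As a rough stand-in for the histogram, the bins can be drawn as rows of text, one character per value. This sketch reuses the 50 example scores and an interval width of 5; the horizontal layout and the `#` symbol are presentation choices made here, not taken from the slide.

```python
scores = [33, 50, 52, 60, 63, 65, 65, 65, 66, 67, 68, 69, 69, 70, 70, 71, 71,
          72, 73, 73, 74, 74, 74, 75, 75, 75, 75, 75, 76, 76, 77, 77, 77, 78,
          78, 80, 81, 82, 83, 84, 84, 87, 88, 88, 90, 90, 92, 95, 95, 98]

WIDTH = 5
# One bin per class interval, keyed by its lower limit (30, 35, ..., 95)
bins = {lo: 0 for lo in range(30, 100, WIDTH)}
for s in scores:
    bins[(s // WIDTH) * WIDTH] += 1   # round down to the interval's lower limit

# Draw each bin as a bar whose length equals its frequency
for lo, count in bins.items():
    print(f"{lo}-{lo + WIDTH - 1} | {'#' * count}")
```

Empty bins such as 35-39 are kept on purpose: they show the gap between the lowest score and the rest of the data, which is exactly the kind of outlier a diagnostic histogram is meant to expose.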
24 Using the Same Data Set, You Can Construct a Cumulative Graph
25 The Table you have Constructed can be used to Construct a Cumulative Graph. [Table columns: Class Interval, Frequency, Cumulative Frequency, Percent, Cumulative Percent]
26 Use the Cumulative Percent to Construct the Graph Plot the cumulative percent at the midpoint of each interval. For instance, for the first interval (30-34), you would plot 2 (percent) at the midpoint of this interval. Do the same for each interval, and then connect the plotted points. [Figure: Cumulative Graph; vertical axis: Cumulative Percent]
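The plot points for the cumulative graph can be computed as (midpoint, cumulative percent) pairs. This sketch reuses the 50 example scores and takes the midpoint of an interval such as 30-34 to be 32; the variable names are illustrative.

```python
from itertools import accumulate

scores = [33, 50, 52, 60, 63, 65, 65, 65, 66, 67, 68, 69, 69, 70, 70, 71, 71,
          72, 73, 73, 74, 74, 74, 75, 75, 75, 75, 75, 76, 76, 77, 77, 77, 78,
          78, 80, 81, 82, 83, 84, 84, 87, 88, 88, 90, 90, 92, 95, 95, 98]

WIDTH = 5
starts = list(range(30, 100, WIDTH))

freq = [sum(lo <= s < lo + WIDTH for s in scores) for lo in starts]
cum_percent = [100 * c / len(scores) for c in accumulate(freq)]

# Midpoint of each interval, e.g. 30-34 -> 32
midpoints = [lo + (WIDTH - 1) / 2 for lo in starts]

# Connecting these points in order gives the cumulative graph
points = list(zip(midpoints, cum_percent))
for x, y in points:
    print(f"midpoint {x:.0f}: cumulative percent {y:.0f}")
```

The first point is (32, 2), as in the slide's example, and the list of points can be passed to any plotting tool to draw the connected line.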
27 Cumulative Graph OR you could Use the Cumulative Frequency [Figure: Cumulative Graph; vertical axis: Cumulative Frequency]
28 Cumulative Graph To predict a percent and/or value, use the cumulative graph line: draw a straight line from the value (or percent) to the graph line, and then a straight line to the corresponding percent (or value). (1) Determine the percent that corresponds to a value of 75. In this instance, a score of 75 is at approximately the 58th percentile. (2) Determine the score at a certain percent. The 75th percentile corresponds to an approximate score of 80. The Cumulative Graph can be used to predict the approximate percentile for any value, and the approximate value for any percentile.
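Reading the graph in either direction amounts to linear interpolation between neighboring plot points. The sketch below assumes the graph is drawn through the interval midpoints of the 50 example scores; the helper `interpolate` is hypothetical.

```python
from itertools import accumulate

scores = [33, 50, 52, 60, 63, 65, 65, 65, 66, 67, 68, 69, 69, 70, 70, 71, 71,
          72, 73, 73, 74, 74, 74, 75, 75, 75, 75, 75, 76, 76, 77, 77, 77, 78,
          78, 80, 81, 82, 83, 84, 84, 87, 88, 88, 90, 90, 92, 95, 95, 98]

WIDTH = 5
starts = list(range(30, 100, WIDTH))
freq = [sum(lo <= s < lo + WIDTH for s in scores) for lo in starts]
cum_percent = [100 * c / len(scores) for c in accumulate(freq)]
midpoints = [lo + 2 for lo in starts]   # 30-34 -> 32, and so on

def interpolate(x, xs, ys):
    """Linear interpolation of y at x along the line through (xs, ys)."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1 and x1 != x0:   # skip flat (zero-width) segments
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x is outside the range of the graph")

# (1) Percent that corresponds to a score of 75
percent_at_75 = interpolate(75, midpoints, cum_percent)
# (2) Score that corresponds to the 75th percent
score_at_75_percent = interpolate(75, cum_percent, midpoints)

print(f"score 75 -> about {percent_at_75:.1f} percent")
print(f"75 percent -> score of about {score_at_75_percent:.1f}")
```

Interpolation gives about 60 percent for a score of 75 and a score of about 79 for the 75th percent, close to the values of 58 and 80 read by eye from the graph; small differences of this kind are expected when reading a plotted line.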
29 You can also use a cumulative graph to determine if your data have an approximately normal distribution. The graph in the upper left hand corner corresponds to a bell-shaped, or normal distribution.
30 The cumulative graphs at the top are negatively and positively skewed. They correspond to the negatively and positively skewed histograms shown directly below them.
31 Questions or Comments, Contact: Dr. Carol Albrecht Assessment Specialist USU Ext (979)