# Exploratory Data Analysis

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Exploratory Data Analysis Learning Objectives: 1. After completion of this module, the student will be able to explore data graphically in Excel using histogram boxplot bar chart scatter plot 2. After completion of this module, the student will be able to employ web based tools for statistical hypothesis testing and determine significance levels under multiple testing using the Bonferroni correction. Knowledge and Skills Graphing in Excel Functions in Excel Pivot tables in Excel Concepts: mean, standard deviation, bar chart, histogram, boxplot, quantiles, error bar, line graph, scatter plot Prerequisites Statistical hypothesis testing Some basic familiarity with graphing in Excel Familiarity with o Cardiac events o Risk factors for cardiac events o Dobutamine stress echocardiography Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 1

2 Materials and Pedagogical Approaches We will use a data set that is analyzed in Garfinkel, Alan, et. al. "Prognostic Value of Dobutamine Stress Echocardiography in Predicting Cardiac Events in Patients With Known or Suspected Coronary Artery Disease." Journal of the American College of Cardiology 33(3) (1999) You can download the data set from the NUMB3R5 COUNT web site ( The data set is called Garfinkel Cardiac Data. The data set is a compilation of medical records of patients who underwent dobutamine stress echocardiography and were followed for twelve months afterwards for occurrences of cardiac events. The data set thus includes data on patient characteristics, physiological data under rest and dobutamine conditions, the history of medical conditions relevant to cardiac events, and outcome data during the subsequent twelve months. As preparation for the data set and the accompanying article, students should become familiar with coronary events (myocardial infarction, cardiac death, percutaneous transluminal coronary angioplasty, and coronary artery bypass surgery) and risk factors (high blood pressure, diabetes, and smoking), at the level of being able to read a newspaper article that uses these terms. Students should review patient information on dobutamine stress echocardiography prior to this module (see, for instance, ardiogram.html). The data set and the accompanying article can be deployed in numerous ways in the classroom, depending on the background of the students. (1) If students are already familiar with statistical methods, they can start with the data set and explore the data on their own or with some guidance. Both a hypothesis based or a discovery based approach are appropriate: based on prior knowledge, students can formulate hypotheses and test them subsequently; or students can explore the data and discover patterns. (2) Students can reanalyze the data based on results from the accompanying paper to deepen their grasp of statistical analysis. (3) Students can unpack the paper to learn how a scientific paper in this discipline is written and how results can be succinctly communicated. (4) The paper relies almost exclusively on tables to convey results. A student could focus on a specific aspect of the paper and develop a poster where the results are visually displayed. We will focus in the following on visual displays of the data and their statistical analysis. Exploratory Data Analysis (EDA) Graphical techniques are important for exploratory data analysis. The NIST Engineering Statistics Handbook describes many of these techniques: Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 2

3 Data There are three types of data: free text, categorical data, and numerical data. Free text is simply text. It can be visualized with Word Clouds ( and a new one). Below is a word cloud of some of the text in this module. Figure 1: Word Cloud of free text generated by tagxedo.com. Categorical data are either nominal (unranked) or ordinal (ranked). Categorical data are often visualized using bar charts. Numerical data are either discrete or continuous. Common ways to visualize numerical data are histograms, boxplots, or scatter plots. Let s look at the variables in the data set. Click on the Data tab in the spreadsheet. The first 21 columns (A U) contain measurements or calculations based on the measurements prior to and during the test. This is followed by specific outcomes, namely four types of cardiac events during the ensuing 12 months in columns V Y. The next six columns (Z AE) list the history of medical events of the patient. The final column, AF, is another outcome column, namely whether the patient suffered a new cardiac event during the ensuing 12 months, i.e., whether any of the events in Columns V Y occurred. The data set has two types of data: numerical and categorical. Columns A P, except for Column N, contain numerical data. Columns Q AF and Column N contain categorical data. All the categorical data in the data set are nominal: 0 codes for yes and 1 for no. For exploratory data analysis, we will introduce histograms and boxplots for univariate numerical data, scatterplots for bivariate numerical data, and pivot tables combined with bar charts for categorical data. Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 3

4 In addition, we will divide patients into groups based on the outcome of categorical data (yes vs. no), and then investigate differences among groups. The following table contains a list of the variables and their data type in the spreadsheet: Column Variable Data Type A Basal Heart Rate numerical B Basal Blood Pressure numerical C Basal Double Product BHR*BBP numerical D Peak Heart Rate numerical E Systolic Blood Pressure numerical F Double Product PKR*SBP numerical G Dobutamine Dose numerical H Maximum Heart Rate numerical I % Maximun Predicted Heart Rate Achieved by Patient numerical J Maximum Blood Pressure numerical K Double Product of Maximum on Dobutamine numerical L Double Product Maximum on this Dobutamine Dose numerical M Age numerical N Gender (male=0) categorical O Baseline Cardiac Ejection Fraction numerical P Baseline Cardiac Ejection Fraction on Dobutamine numerical Q Chest Pain (yes=0) categorical R Sign of Heart Attack on ECG (yes=0) categorical S Equivocal ECG (yes=0) categorical T Heart Wall Motion Anomaly Observed (yes=0) categorical U Stress ECG Positive (yes=0) categorical V New Heart Attack (yes=0) categorical W New Angioplasty (yes=0) categorical X New Bypass Surgery (yes=0) categorical Y Death (yes=0) categorical Z History of Hypertension (yes=0) categorical AA History of Diabetes (yes=0) categorical AB History of Smoking (yes=0) categorical AC History of Heart Attack (yes=0) categorical AD History of Angioplasty (yes=0) categorical AE History of Bypass Surgery (yes=0) categorical AF Any New Cardiac Event (yes=0) categorical Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 4

5 Guidelines for Exploratory Data Analysis After you have become familiar with the different variables in the data set, determine which variables are independent, i.e., input variables, and which ones are dependent, i.e., outcome variables. Next, get a sense for the range of the numerical variables by finding the minimum and maximum value, the mean, median, and standard deviation, and the first and third quartile (i.e., the 25 th and 75 th percentile). This information can be visualized with box plots. A histogram is helpful to visualize the range and the shape of the distribution. It is generated by first grouping the data and counting the number of occurrences in each group or category. The counts are then displayed as vertical bars in a histogram where the area of each bar is proportional to the frequency in that category. Pairs of numerical data can be displayed as scatter plots to reveal relations among the pairs. Pivot tables are helpful to find counts of categorical data, which can then be displayed as bar charts where the height of each bar is proportional to the frequency in the respective category. To determine statistical significance of any relationships, the chisquared test and the t test are frequently employed. Statistical hypothesis testing determines whether there are statistically significant relationships among variables. With a large data set like the one considered in this module, we will perform a large number of statistical tests. To set the significance level in multiple comparisons, we need to be careful. A significance value of 5% means that there is a 5% chance of rejecting the null hypothesis even if it is true. If we do multiple comparisons and stick with the same significance level of say 5%, then just by chance it becomes increasingly likely that we will reject a null hypothesis even if it is true as the number of tests increases. You can think of it in this way: every time you perform a statistical hypothesis test with significance level 5%, there is a 5% chance that you reject the null hypothesis even though it is true, in other words, you are wrong 5% of the time. If you have a coin that comes up tails 5% of the time, then if you toss the coin often enough you will eventually see tails. In fact, we expect to see tails about every twenty coin tosses since 5% is 1 in 20. One solution to the multiple comparison problem is provided by the Bonferroni correction, which says that you need to divide the desired significance level across all tests by the number of tests and apply this new level to each of the individual tests. For instance, if you perform ten tests and want an overall level of significance of 5%, each individual test should be conducted at the 0.5% level since 0.5=5/10. You can read about the Bonferroni correction on Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 5

6 Histograms Histograms are a common form to summarize univariate numerical data. You can find a description on the NIST pages: We will create a histogram for the frequency distribution of the basal heart rate. The data are listed in the spreadsheet under the Basal Heart Rate tab. The range of the date is A2:A559. We begin with arranging the data into bins of equal width and counting the number of patients in each bin. To determine the bin size, we first calculate the range of the date, that is, the difference between the largest and the smallest value. The minimum of the data can be found using the Excel function MIN(range) and the maximum of the data can be found using the Excel function MAX(range). Confirm that the minimum is 42 and the maximum is 210. The range of the date is the difference between the maximum and the minimum, that is, 168. Let s start with somewhere close to ten bins. The number of bins may have to be revised to obtain a meaningful histogram. If each bin is of width 15, we end up with 12 bins. To capture all the data, we start the first bin at 41 so that it ends at 55, and the last bin at 206, which ends at 220. The choice of values is somewhat arbitrary: we need to capture all the data and may want to choose nice numbers, such as 55, 70, for the right end points. The right end points are then 55, 70, 85,..., 205, and 220. Enter these numbers into the cells E2:E13. To count the number of patients whose basal heart rate is greater than 70 and less than or equal to 85, we count the number of patients whose heart rate is less than or equal to 85 and subtract from this the number of patients whose basal heart rate is less than or equal to 70. The COUNTIF function in Excel can accomplish this task. For instance, to count the number of patients whose basal heart rate is less than or equal to 70, we would enter into a cell =COUNTIF(range, <=70 ), where range is the data range. If 70 is in the cell E3, say, then the formula becomes =COUNTIF(range, <= &E3). To count the number of patients whose basal heart rate is less than or equal to the right boundary of each bin, enter the function =COUNTIF(\$A\$2:\$A\$559, <= &E2) into the cell F2. Drag this down to F13. These are the cumulative counts. To find the counts in each bin, we take differences, except for the first bin. The count of the first bin is in Cell F2. We thus enter =F2 into Cell G2. To calculate the remaining counts, we enter F3 F2 into cell G3 and drag this down to G13. We can now create a table with the counts. The table should list the ranges of each of the bins, the midpoints of each of the bins and the counts for each of the bins. To find the midpoint of a bin, average the left and the right boundary values of each bin. That is, to determine the midpoint of the bin that counts the number of patients whose basal heart rate is between 41 and 55, find Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 6

7 The midpoint of the first bin is thus 48. Enter the midpoints into the Cells H2 H13. The following table lists the ranges of each bin, their midpoints, and the counts. Range Midpoint Count Excel has bar charts that display data as bars where the height is proportional to the count in the respective category. Histograms display counts as bars where the area is proportional to the count in each bin. If all bins are of equal width, both the height and the area of each bar are proportional to the respective counts. Note that in our example, all bins are of equal width. Figure 2 shows the histogram and we will proceed with how to create this figure. Figure 2: Histogram of the basal heart rate. Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 7

8 The basal heart rate data of the 558 patients is divided into equal sized bins and the vertical axis counts the number of patients in each bin. For instance, the bin with midpoint 63 counts the number of patients whose basal heart rate is between 56 and 70 (both end points are included). To graph the histogram, click on the Insert tab. Under the Charts group, you find the Column charts. We will use the 2D Clustered Column to create the histogram. Highlight the counts in the range G2:G13 and click on the 2D Clustered Column chart. This creates the graph. To modify the figure, click on the Design tab in the Chart Tools group. Under the Data group, click on Select Data. A window appears. Click on Edit in the Horizontal (Category) Axis Label. A window for the Axis label range appears. Select the range and enter the range of the midpoints, H2:H13. Click OK. We can now add a title and axes titles, and increase the width of the bars so that they touch (a standard feature of histograms). We can also remove the Series 1 label by clicking on it and then hitting the Delete button on the key board. To add a title and axes title, click on the Layout tab in the Chart Tools group and click on Chart Title for the title and on Axis Title for the axes titles. To change the width of the bars, double click on any of the bars. A window appears where you can change the gap width: set it to No Gap and the gaps will disappear. To change the border color of the bars, click on Border Color and select a color that is different from the fill color. To change the color of the bars, click on the Fill option. Close the window. Your histogram should now look like the one in Figure 2. A histogram is a graphical way to show you the location of the data, how much they are spread out, whether the data are skewed, whether there are outliers, and what the shape of the frequency distribution is. We see that the shape is unimodal, that is, the distribution has only one hump. The distribution is not too skewed and the tails are not too heavy. In other words, the distribution looks near normal. We find that there is one outlier, namely the maximum value of 210. Without any further knowledge, it is difficult to determine whether this value is accurate or a measurement error. Boxplots Boxplots are very useful in summarizing univariate numerical data. The NIST Engineering Statistics Handbook introduces boxplots as an excellent tool for conveying location and variation information in data sets, particularly for detecting and illustrating location and variation changes between different groups of data. ( Excel does not have a built in tool to graph boxplots. However, we can use stacked columns and error bars to build boxplots. We consider the Baseline Cardiac Ejection Fraction as an example. The data are in the spreadsheet under the BCEF tab. Click on the tab. We divide the patients into two groups: in one Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 8

9 group, heart wall motion anomaly was observed (Group 0), in the other group, no heart wall motion anomaly was observed (Group 1). To group the data, use the Sort & Filter option in the Editing group on the Home tab. Highlight the range of values in both columns, click on Sort & Filter, select Custom Sort and sort according to heart wall motion anomaly. We then create boxplots for each group, following the NIST handbook: the vertical axis lists the response variable, which is the Baseline Cardiac Ejection Fraction, and the horizontal axis lists the factor of interest, which is Heart Wall Motion Anomaly Observed (yes=group 0; no=group1). Figure 3 shows the final result. Figure 3: Boxplot for Baseline Cardiac Ejection Fraction for two groups: Group 0 has heart wall motion anomaly observed, whereas Group 1 does not have heart wall motion anomaly observed. We will describe in the following how to create this figure. A boxplot identifies the median (the middle 50%), the lower quartile (the 25 th percentile), the upper quartile (the 75 th percentile), and the extreme points. It consists of a box with the bottom of the box indicating the location of the lower quartile, the top of the box indicating the upper quartile, and band in the box indicating the median. Two whiskers on either end of the box indicate the extremes of the data, for instance, the minimum and the maximum of the data. Alternatively 1, the ends of the whiskers may indicate the value that is within 1.5 times the interquartile range (upper minus lower quartile) of the lower and the higher quartile, respectively. Sometimes, the 9 th and the 91 st or the 2 nd and the 98 th percentiles are used. Since the whiskers are not plotted uniformly, we need to explicitly state their meaning. In the following, we will use the minimum and the maximum of the data, respectively, to indicate the ends of the whiskers. 1 Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 9

10 We begin with calculating the lower quartile, the median, and the upper quartile, together with the minimum and the maximum data values in each of the two groups. Excel has functions for these values: To calculate quartiles, use =PERCENTILE(range, p) where p denotes the percentile, such as 0.25 for the lower quartile. To calculate the minimum and the maximum, respectively, use =MIN(range) and =MAX(range), respectively. We find Group 0 Group 1 Q Median Q Min Max For the box, we will use stacked bar graphs. We therefore need to know the height of the boxes. For the whiskers, we will use error bars. We therefore need to know how far below the bottom of the box and above the top of the box the whiskers extend. We find Group 0 Group 1 Q Median-Q Q3-Median 5 5 Q1-Min Max-Q We can now draw the boxes for the two groups. Highlight the values for Q1, Median Q1, and Q3 Median for both groups simultaneously, and select the Stacked Columns option in the Charts group under Column. Click on one of the bars. This opens the Chart Tools group with three tabs. Click on the Design tab. In the Data group, you find the Switch Row/Column option. Click it to have the boxes stacked according to their groups. Double click on either of the bottom boxes of the stacked boxes. A window opens that allows you to format the data series. Go to Fill and select no color for the boxes and no color for the shape outline. Close the window and double click on one of the middle boxes. Again, select no fill but now choose a black shape outline for the two middle boxes. Repeat this for the top boxes. To add one sided error bars to the bottom boxes, click on either of the (now invisible) bottom boxes. In the Analysis group under the Layout tab, you will find the Error Bars option. Click on More Error Bar Options. Click on the Minus direction for the display and use the Custom option for the error amount. The value you specify is the Q1 Min value. Repeat this for the top whisker, except that now the direction of the whisker is Plus and its length is given by Max Q3. Add the names of the groups and a vertical axis title. If you double click on the horizontal lines in your figure, that is, the Vertical Axis Major Gridlines, Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 10

11 you can delete them by clicking on Line Color and then No Line. Your figure should now look like the one in Figure 3. To test whether the two groups are statistically significant, perform a t test. There are online calculators for t tests We need to calculate the mean, the standard deviation, and count the number of patients in each group. For the mean, use the Excel function =AVERAGE(range) and for the standard deviation, use the Excel function =STDEV(range). To count the number of patients, sum the 0s and 1s. The sum is equal to the number of patients in Group 1. If you subtract the value from the total number of patients, 558, you obtain the number of patients in Group 0. We find Group 0 Group 1 Mean S.D Sample Size Go to either web site for the t test calculator and enter the data. You will find that the two groups are significantly different from each other. Pivot Tables and Bar Charts The article by Garfinkel states that [p]atients with a positive SE had a 34% cardiac event rate within the ensuing 12 months (Table 3) versus an event rate of only 10% in patients with a negative SE (p<0.001). We will use pivot tables to confirm these figures and bar charts to illustrate the difference. Creating a pivot table: 1. Highlight all the data that is in the spreadsheet under the Data tab (but only the data, not the headers), go to Insert, and click on PivotTable. This opens a pivot table in a new tab. 2. There are four areas where we can drag fields from the Pivot Table Field List. We drag the Stress ECG Positive (yes=0) field to the Row Labels and the Any New Cardiac Event (yes=0) to the Column Labels. To count the values in each of the four categories in the pivot table, we can drag, for instance, Gender into the Σ Values area. Make sure that the Value Field setting is Count. Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 11

12 3. To create a bar graph, click on Options in the PivotTables Tools and click on PivotChart. Select the bar graph. 4. To delete a field from the areas, either unclick the box in the Pivot Table Field List or left click the label and delete the label. We find Count of Gender Any New Cardiac Event (yes=0) Stress ECG Positive (yes=0) 0 1 Grand Total Grand Total and the corresponding bar chart Figure 4: Bar chart of new cardiac events as a function of whether the stress ECG was positive. To confirm the statement in Garfinkel et al., we calculate the percentage of patients with a positive SE who experienced a cardiac event. 136 patients had a positive SE and 46 of them experienced a cardiac event. Therefore % 136 There were 422 patients who had a negative SE and 43 of them experienced a cardiac event. Therefore Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 12

13 % 422 The statistical significance of this difference can be ascertained with a chi square test, which tests whether the two variables (positive SE and cardiac event) are associated. Online chi square calculators are available: We use the first link. The online calculator asked you to enter the total number of patients in each of the four categories into cells. Clicking the button to calculate the chi square value, which is (Figure 5), confirms that p Figure 5: Screenshot of Chi square calculator ( The Report Filter can be used to understand the effect of for instance History of Diabetes. The next table and graph looks at the number of patients with Heart Wall Anomaly and Stress ECG Positive filtered according to diabetes. Set the filter so that the data set includes only those patients with diabetes. Below is the table that you should get: Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 13

14 History of Diabetes (yes=0) 0 Count of Gender Any New Cardiac Event (yes=0) Heart Wall Motion Anomaly Observed (y Stress ECG Positive (yes=0) 0 1 Grand Total Total Total Grand Total The bar charts looks then as follows: Figure 5: Cardiac event rate for patients with a history of diabetes as a function of WMA and positive stress ECG. Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 14

15 Scatterplots You Tube has videos on many of the features of Excel. Here is one on scatter plots: SeCPLC30_g Figure 6: Scatterplot A scatterplot can convey visually whether two numeric variables are related. We added a regression line and the R 2 value, both are readily available in Excel. Effective Graphing 1. Take the Graph Design IQ test: and record your score. 2. Go to a. A graph and a table: b. Pie chart versus bar graph : c. Using line markers: d. Multiple solutions: Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 15

16 For each of the four examples, explain in your own words (250 words or less) what the problem of the original visualization was and how the visualization was improved. 3. Retake the Graph Design IQ test: and record your score. 4. Read pages 1 4 in the White Paper Effectively Communicating Numbers by S. Few. Web 2.0 Visualization Tools ibm.com/software/data/cognos/manyeyes/ References Few, S Effectively Communicating Numbers. Principal Perceptual Edge. White Paper. Downloaded from Garfinkel, A., et. al Prognostic Value of Dobutamine Stress Echocardiography in Predicting Cardiac Events in Patients With Known or Suspected Coronary Artery Disease. Journal of the American College of Cardiology 33(3): Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 16

### Data exploration with Microsoft Excel: analysing more than one variable

Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical

### Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

### Appendix 2.1 Tabular and Graphical Methods Using Excel

Appendix 2.1 Tabular and Graphical Methods Using Excel 1 Appendix 2.1 Tabular and Graphical Methods Using Excel The instructions in this section begin by describing the entry of data into an Excel spreadsheet.

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### Data exploration with Microsoft Excel: univariate analysis

Data exploration with Microsoft Excel: univariate analysis Contents 1 Introduction... 1 2 Exploring a variable s frequency distribution... 2 3 Calculating measures of central tendency... 16 4 Calculating

### Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs

Using Excel Jeffrey L. Rummel Emory University Goizueta Business School BBA Seminar Jeffrey L. Rummel BBA Seminar 1 / 54 Excel Calculations of Descriptive Statistics Single Variable Graphs Relationships

### SPSS for Exploratory Data Analysis Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav)

Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav) Organize and Display One Quantitative Variable (Descriptive Statistics, Boxplot & Histogram) 1. Move the mouse pointer

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### CREATING EXCEL PIVOT TABLES AND PIVOT CHARTS FOR LIBRARY QUESTIONNAIRE RESULTS

CREATING EXCEL PIVOT TABLES AND PIVOT CHARTS FOR LIBRARY QUESTIONNAIRE RESULTS An Excel Pivot Table is an interactive table that summarizes large amounts of data. It allows the user to view and manipulate

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

### Data Exploration Data Visualization

Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

### Computer Training Centre University College Cork. Excel 2013 Pivot Tables

Computer Training Centre University College Cork Excel 2013 Pivot Tables Table of Contents Pivot Tables... 1 Changing the Value Field Settings... 2 Refreshing the Data... 3 Refresh Data when opening a

### Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

### A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All

### SECTION 2-1: OVERVIEW SECTION 2-2: FREQUENCY DISTRIBUTIONS

SECTION 2-1: OVERVIEW Chapter 2 Describing, Exploring and Comparing Data 19 In this chapter, we will use the capabilities of Excel to help us look more carefully at sets of data. We can do this by re-organizing

### Drawing a histogram using Excel

Drawing a histogram using Excel STEP 1: Examine the data to decide how many class intervals you need and what the class boundaries should be. (In an assignment you may be told what class boundaries to

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### When to use Excel. When NOT to use Excel 9/24/2014

Analyzing Quantitative Assessment Data with Excel October 2, 2014 Jeremy Penn, Ph.D. Director When to use Excel You want to quickly summarize or analyze your assessment data You want to create basic visual

### Introduction Course in SPSS - Evening 1

ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/

### Using Excel in Research. Hui Bian Office for Faculty Excellence

Using Excel in Research Hui Bian Office for Faculty Excellence Data entry in Excel Directly type information into the cells Enter data using Form Command: File > Options 2 Data entry in Excel Tool bar:

### Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the

### Basic Pivot Tables. To begin your pivot table, choose Data, Pivot Table and Pivot Chart Report. 1 of 18

Basic Pivot Tables Pivot tables summarize data in a quick and easy way. In your job, you could use pivot tables to summarize actual expenses by fund type by object or total amounts. Make sure you do not

### How to Use a Data Spreadsheet: Excel

How to Use a Data Spreadsheet: Excel One does not necessarily have special statistical software to perform statistical analyses. Microsoft Office Excel can be used to run statistical procedures. Although

### Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

### Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

### GeoGebra Statistics and Probability

GeoGebra Statistics and Probability Project Maths Development Team 2013 www.projectmaths.ie Page 1 of 24 Index Activity Topic Page 1 Introduction GeoGebra Statistics 3 2 To calculate the Sum, Mean, Count,

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Frequency distributions, central tendency & variability. Displaying data

Frequency distributions, central tendency & variability Displaying data Software SPSS Excel/Numbers/Google sheets Social Science Statistics website (socscistatistics.com) Creating and SPSS file Open the

### Excel Unit 4. Data files needed to complete these exercises will be found on the S: drive>410>student>computer Technology>Excel>Unit 4

Excel Unit 4 Data files needed to complete these exercises will be found on the S: drive>410>student>computer Technology>Excel>Unit 4 Step by Step 4.1 Creating and Positioning Charts GET READY. Before

### TIPS FOR DOING STATISTICS IN EXCEL

TIPS FOR DOING STATISTICS IN EXCEL Before you begin, make sure that you have the DATA ANALYSIS pack running on your machine. It comes with Excel. Here s how to check if you have it, and what to do if you

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

### A Guide for a Selection of SPSS Functions

A Guide for a Selection of SPSS Functions IBM SPSS Statistics 19 Compiled by Beth Gaedy, Math Specialist, Viterbo University - 2012 Using documents prepared by Drs. Sheldon Lee, Marcus Saegrove, Jennifer

### Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center

### Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize

### Using SPSS, Chapter 2: Descriptive Statistics

1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### Statistics Chapter 2

Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students

### IBM SPSS Statistics for Beginners for Windows

ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

### Exploratory Data Analysis. Psychology 3256

Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for you numbers Easier to find

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### Chapter 4 Displaying and Describing Categorical Data

Chapter 4 Displaying and Describing Categorical Data Chapter Goals Learning Objectives This chapter presents three basic techniques for summarizing categorical data. After completing this chapter you should

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Data Visualization Techniques

Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The

### Common Tools for Displaying and Communicating Data for Process Improvement

Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot

### Mathematics. Probability and Statistics Curriculum Guide. Revised 2010

Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

### The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

### How To: Analyse & Present Data

INTRODUCTION The aim of this How To guide is to provide advice on how to analyse your data and how to present it. If you require any help with your data analysis please discuss with your divisional Clinical

### Describing, Exploring, and Comparing Data

24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter

### Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

### Using Excel for descriptive statistics

FACT SHEET Using Excel for descriptive statistics Introduction Biologists no longer routinely plot graphs by hand or rely on calculators to carry out difficult and tedious statistical calculations. These

### Graphical and Tabular. Summarization of Data OPRE 6301

Graphical and Tabular Summarization of Data OPRE 6301 Introduction and Re-cap... Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information

### Statistical Analysis Using Gnumeric

Statistical Analysis Using Gnumeric There are many software packages that will analyse data. For casual analysis, a spreadsheet may be an appropriate tool. Popular spreadsheets include Microsoft Excel,

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

### Geostatistics Exploratory Analysis

Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

### Microsoft Excel 2010 Pivot Tables

Microsoft Excel 2010 Pivot Tables Email: training@health.ufl.edu Web Page: http://training.health.ufl.edu Microsoft Excel 2010: Pivot Tables 1.5 hours Topics include data groupings, pivot tables, pivot

### Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

### Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test

### Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.

Excel Tutorial Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information. Working with Data Entering and Formatting Data Before entering data

### Sampling, frequency distribution, graphs, measures of central tendency, measures of dispersion

Statistics Basics Sampling, frequency distribution, graphs, measures of central tendency, measures of dispersion Part 1: Sampling, Frequency Distributions, and Graphs The method of collecting, organizing,

### Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Coding & Data Skills for Communicators Dr. Cindy Royal Texas State University - San Marcos School of Journalism and Mass Communication

Coding & Data Skills for Communicators Dr. Cindy Royal Texas State University - San Marcos School of Journalism and Mass Communication Spreadsheet Basics Excel is a powerful productivity tool. It s a spreadsheet

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

Table of contents: Access Data for Analysis Data file types Format assumptions Data from Excel Information links Add multiple data tables Create & Interpret Visualizations Table Pie Chart Cross Table Treemap

### The Ordered Array. Chapter Chapter Goals. Organizing and Presenting Data Graphically. Before you continue... Stem and Leaf Diagram

Chapter - Chapter Goals After completing this chapter, you should be able to: Construct a frequency distribution both manually and with Excel Construct and interpret a histogram Chapter Presenting Data

### How to make a line graph using Excel 2007

How to make a line graph using Excel 2007 Format your data sheet Make sure you have a title and each column of data has a title. If you are entering data by hand, use time or the independent variable in

### Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

### Using Excel as a Management Reporting Tool with your Minotaur Data. Exercise 1 Customer Item Profitability Reporting Tool for Management

Using Excel as a Management Reporting Tool with your Minotaur Data with Judith Kirkness These instruction sheets will help you learn: 1. How to export reports from Minotaur to Excel (these instructions

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

### Chapter 3: Data Description Numerical Methods

Chapter 3: Data Description Numerical Methods Learning Objectives Upon successful completion of Chapter 3, you will be able to: Summarize data using measures of central tendency, such as the mean, median,

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Week 1. Exploratory Data Analysis

Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

### Using Excel for Analyzing Survey Questionnaires Jennifer Leahy

University of Wisconsin-Extension Cooperative Extension Madison, Wisconsin PD &E Program Development & Evaluation Using Excel for Analyzing Survey Questionnaires Jennifer Leahy G3658-14 Introduction You

### Data Visualization Techniques

Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The

### Assignment objectives:

Assignment objectives: Regression Pivot table Exercise #1- Simple Linear Regression Often the relationship between two variables, Y and X, can be adequately represented by a simple linear equation of the

### Module 2: Introduction to Quantitative Data Analysis

Module 2: Introduction to Quantitative Data Analysis Contents Antony Fielding 1 University of Birmingham & Centre for Multilevel Modelling Rebecca Pillinger Centre for Multilevel Modelling Introduction...

### An introduction to using Microsoft Excel for quantitative data analysis

Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing

### STC: Descriptive Statistics in Excel 2013. Running Descriptive and Correlational Analysis in Excel 2013

Running Descriptive and Correlational Analysis in Excel 2013 Tips for coding a survey Use short phrases for your data table headers to keep your worksheet neat, you can always edit the labels in tables

### AMS 7L LAB #2 Spring, 2009. Exploratory Data Analysis

AMS 7L LAB #2 Spring, 2009 Exploratory Data Analysis Name: Lab Section: Instructions: The TAs/lab assistants are available to help you if you have any questions about this lab exercise. If you have any

### There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel