# Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics

Curtis Seare

Copyright: Vault Analytics, July 2010

## Contents

Section I: Background Information
- Why Use Predictive Analytics?
- How to Use this Guide
- Analysis Fundamentals
- Method for Creating Predictive Models

Section II: Predictive Models
- How to Choose an Appropriate Model
- Regression
  - Correlation
  - Linear Regression
  - Multivariate Linear Regression
  - Exponential Regression
  - Logarithmic Regression
  - Polynomial Regression
  - Time Series
  - Logistic Regression
- ANOVA
  - t-test
  - 1 Way ANOVA
  - 2 Way ANOVA
- Chi-Square

## Why Use Predictive Analytics?

### Strategy Development

Predictive analytics is an indispensable tool for strategy development. Any good strategy has three elements: a set of assumptions, a set of actions that need to be taken, and a set of desired outcomes. The outcomes are achieved as a result of the actions being executed and the assumptions being true. In essence, we say: assuming A is true, if we do B we will achieve C.

Predictive analytics helps with all three of these elements. Statistical tests tell us whether our assumptions are correct, and predictive models fit to historical data help us quantify what the result will be for a given set of inputs. The value of having a statistically valid predictive model in strategy development is that you can know what you need to do to produce the best results.

Further, simple predictive models are within most people's ability to produce. They require some understanding of statistics, as well as of various predictive modeling techniques. This guide provides both the statistical basics you need to know and a process you can follow to choose and implement an appropriate predictive modeling technique for your situation. It deliberately stays away from many technicalities, so that the focus stays on the information needed to get a predictive model built and in working order.

### Example Business Questions Answered

The reason we do predictive analytics is to solve business problems. Accordingly, here are some example business questions that can be answered with each of the predictive models described in this guide.

Regression

Correlation
- From a large database of demographic data, which factors are associated with response rates?
- From all the metrics on our website, which ones are associated with purchases on the site?
- From all the medical data collected in a patient sample, which factors are associated with blood pressure?

Linear Regression
- If we send out x mail pieces to a target customer segment, how many sales will we get in return?
- If we sell x units of product A, how much can we expect to sell of product B?
- If x people sign up for this promotional program, how many extra sales can we expect?

Multivariate Linear Regression

- How many subscriptions can we expect to get by spending specific amounts of time in various social media marketing channels?
- What is the optimal word count, number of graphics, and number of topics covered in our weekly newsletter in order to get the most click-throughs?

Exponential Regression
- Model what happens in a word-of-mouth marketing campaign: it builds slowly at first, but after a certain point it rises exponentially.
- Model a learning curve: at first it takes a long time to perform a specific task on a new piece of software, but the more you do it, the more the time drops toward a constant level.

Logarithmic Regression
- Model the percentage of total expected calls that have come in after a mail campaign is executed. This percentage grows quickly but tapers off as time goes on.

Polynomial Regression
- What is the appropriate number of customer contacts per month to maximize sales?
- Where should we set the price in order to sell the maximum amount of a given product?

Time Series
- How many sales are we likely to have in the next year?
- How much do the specific months of the year affect our close rates?

Logistic Regression
- What is the probability that someone of a given age will respond to our marketing campaign?
- What is the probability that someone will purchase this product given a certain price?

ANOVA

t-test
- Does gender affect the sales rate of our products?
- Does this medical treatment affect the blood pressure of our patients?
- Does this training course increase the efficiency of our staff?

1 Way ANOVA
- Compare the response rates of various customer segments to a specific marketing campaign, and then allocate more resources to the segment with the higher response.
- Compare response rates of different variations of a mailer piece, and use the mailer that has the greatest response rate.
- Compare sales conversion rates according to traffic referral information from your web analytics, and then focus on getting more traffic from the highest-converting source.

2 Way ANOVA

- What is the best combination of marketing channel and product offering to get the highest sales rates?

Chi-Square
- Out of all of our customer segments, which ones are the most likely to buy our products?
- Do our former assumptions about our customer segments still hold true?

Next, click on the Add-Ins section on the left side, and click Go...

Lastly, check the Analysis ToolPak and Solver Add-in check boxes, and click OK.

## Analysis Fundamentals

Although an entire book could be written about the fundamentals of good analysis, here we cover just the two that are most critical: seeing the data in context, and segmentation.

### Seeing the Data in Context

Understanding what the data are telling you within the context of the business situation being analyzed is extremely important. It helps you avoid faulty conclusions and keeps your analysis appropriate for the business question being answered. The best way to learn this fundamental is to see it in action, so we will take an example.

We will look at a direct mail campaign analysis. We want to know how many calls to expect in our call center after we execute the campaign. First, we take historical data showing the percentage of total calls that have come in by the number of days after the start of a mail campaign. After creating a scatter plot of the data, we fit a logarithmic regression line as a model, shown below.

[Figure: percentage of total calls (axis 0% to 120%) by days after the mailing, with a logarithmic trendline and its R²]

Even though the R² tells us that the fit is good, the model may not be the best way to explain these data once the context and purpose of the analysis are considered. We want the model to predict what percentage of total calls will come in from a mailing campaign so we can staff the call center. If I were to use the line above as the model, I would be predicting values that are too low between about day 20 and day 100, and too high thereafter. Because of this error, we would not staff the call center correctly.

To create a better model, I would consider the fact that, in this context, it is not necessary to fit a trend model to the entire data set. Consider the following model, which can be used to predict the percentage of total calls coming in between days 4 and 35 after the mailing campaign:

[Figure: percentage of total calls (axis 0% to 100%) by days, restricted to days 4 through 35, with a logarithmic trendline and its R²]

You will notice that this trend model does not contain the same high and low errors as the previous model. Further, from some calculations on the data in the spreadsheet, we know that everything before day 4 accounts for just 8% of all calls, and everything after day 35 accounts for just 15% of all calls. I have modeled the time period with the biggest growth in the call percentage, while summarizing the remaining percentages on either side. This gives just the right amount of information needed to staff the call center, while minimizing the errors I would have made by trying to fit a single trend model to all the data.

The point here is to look at the data in the context of the purpose of the analysis. What are you going to use the predictive model for? Is it necessary to fit a model to the entire data set? How exact do you need to be with the prediction? What is the most important part of the data set to model? These and other questions are important to consider when performing analysis.

### Segmentation

The second fundamental of analysis is the practice of segmenting the data. As with seeing the data in context, this is best described with an example. Consider the analysis presented below, which shows a linear regression model to predict how much someone will likely donate to your cause according to their age.
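The windowed fit described above can be sketched outside Excel as well. The Python below is only an illustration under stated assumptions: the names `fit_line` and `fit_log_window` and the sample numbers are hypothetical, and the model is assumed to be an ordinary least-squares fit of the form y = a·ln(x) + b.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

def fit_log_window(days, pct_of_calls, first_day, last_day):
    """Fit pct = a*ln(day) + b using only days inside [first_day, last_day]."""
    window = [(d, p) for d, p in zip(days, pct_of_calls) if first_day <= d <= last_day]
    log_days = [math.log(d) for d, _ in window]
    pcts = [p for _, p in window]
    return fit_line(log_days, pcts)

# Hypothetical data: cumulative share of calls that follows a log curve exactly.
days = list(range(1, 61))
pct_of_calls = [0.25 * math.log(d) + 0.05 for d in days]
a, b, r2 = fit_log_window(days, pct_of_calls, first_day=4, last_day=35)
```

Restricting the window is a modeling choice, not a statistical requirement; the shares before and after the window still have to be summarized separately, as the text does with the 8% and 15% figures.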

[Figure: Donation vs. Age, donation axis from \$0 to \$35, all data combined, with a linear trendline and its R²]

The fit of the model is extremely weak, and there seems to be no relationship between donation and age. However, these data were aggregated from two different cities, Boston and New York. If we separate the data by city (otherwise known as segmenting by city), we get the following when we run a regression analysis:

[Figure: Donation vs. Age, donation axis from \$0 to \$35, with separate linear trendlines and R² values for Boston and New York]

By segmenting the data first, we notice that there is, in fact, a relationship between donation and age, but that the relationship differs depending on which city you are in.
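The effect of segmenting can be reproduced with a few lines of Python. This is a sketch with made-up numbers, not the guide's data: the donation amounts, the `r_squared` helper, and the opposite trends in the two cities are all assumptions chosen to make the point visible.

```python
def r_squared(xs, ys):
    """R-squared of a simple linear fit, computed as the squared Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

# Hypothetical records: (city, age, donation in dollars).
records = [
    ("Boston", 20, 13), ("Boston", 30, 18), ("Boston", 40, 20),
    ("Boston", 50, 26), ("Boston", 60, 29),
    ("New York", 20, 27), ("New York", 30, 22), ("New York", 40, 20),
    ("New York", 50, 14), ("New York", 60, 11),
]

# Pooled fit: every record in one model.
pooled_r2 = r_squared([age for _, age, _ in records],
                      [donation for _, _, donation in records])

# Segmented fit: one model per city.
segment_r2 = {}
for city in sorted({city for city, _, _ in records}):
    rows = [(age, donation) for c, age, donation in records if c == city]
    segment_r2[city] = r_squared([a for a, _ in rows], [d for _, d in rows])
```

In this constructed example the two city trends cancel each other in the pooled data, so the pooled R² is near zero while each segment fits well.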

## Method for Creating Predictive Models

### Process Outlined

1. Choose a predictive model according to the business question
2. Check to see if all the conditions for the model are met
3. Carry out the analysis
4. Check for statistical significance and fit
5. Validate the predictive model
6. Refine the predictive model

### 1. Choose a Predictive Model

Once you have a business question in mind, you must look at what kinds of variables you will be working with in order to answer it. The models in this guide will help you with the following four situations, which are fairly comprehensive:

- Predict a quantitative outcome with quantitative explanatory variables
- Predict a yes or no outcome with quantitative explanatory variables
- Predict a quantitative outcome according to categorical explanatory variables
- Predict a categorical outcome according to categorical explanatory variables

The difference between quantitative and categorical variables is described below.

Quantitative Variable

Anything that can be measured or counted is a quantitative variable. This includes things such as heights, distances, weights, and numbers of items. Because these variables are numerical, you can perform mathematical operations on them, such as addition, multiplication, and division. Below is an example of what quantitative variables look like in a spreadsheet:

[Table: example spreadsheet with the columns Sales, Costs, and Distance]

Above, the variables are Sales, Costs, and Distance, each followed by the data points that make them up.

Categorical (or Qualitative) Variable

- Break the data you are testing into smaller categories (a practice known as segmentation), and then build separate models for each of the categories.

If you are using a model containing more than one explanatory variable, use the adjusted R² to get the correct value for this statistic.

### 5. Validate the Predictive Model

After you have a statistically significant, well-fit model, it's important to do one last test to check its performance. You do this by applying the predictive model to data that you've collected but that was not used in the creation of the model. One method is to split the data into two sections before creating the predictive model. Use one section to build the model, and then test the model on the other section to see how accurate its predictions are.

### 6. Refine the Predictive Model

Now you are ready to apply the predictive model. As you continue to use it, it is important to check its performance and look for ways to improve it. This way your model will stay current with the changing environment.
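Two ideas from steps 4 and 5, the adjusted R² and the hold-out split, can be sketched in Python. The function names and the 70/30 split are assumptions for illustration; the adjusted R² uses the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1) for n observations and k explanatory variables.

```python
import random

def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for each extra explanatory variable."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def split_holdout(rows, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (build, holdout) sections."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - holdout_fraction)))
    return shuffled[:cut], shuffled[cut:]

def mean_absolute_error(model, rows):
    """Average absolute gap between model(x) and the observed y on (x, y) rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

# Hypothetical usage: build on one section, then score on the section held back.
rows = [(x, 2 * x + 1) for x in range(20)]
build_rows, holdout_rows = split_holdout(rows)
holdout_error = mean_absolute_error(lambda x: 2 * x + 1, holdout_rows)
```

Any error measure could replace the mean absolute error here; the point is only that it is computed on rows the model never saw.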

## How to Choose an Appropriate Model

Predict a quantitative outcome with quantitative explanatory variables: Regression
- Correlation
- Linear Regression
- Multivariate Linear Regression
- Exponential Regression
- Logarithmic Regression
- Polynomial Regression
- Time Series

Predict a yes or no outcome with quantitative explanatory variables:
- Logistic Regression

Predict a quantitative outcome according to categorical explanatory variables: ANOVA
- t-test
- 1 Way ANOVA
- 2 Way ANOVA

Predict a categorical outcome according to categorical explanatory variables:
- Chi-Square

## Regression Overview

### Predictive Model

Regression models always take the form of an equation, with x representing the input, or explanatory variable, and y representing the output, or response variable. Regression is the most common type of statistical test run to create predictive models. It allows the practitioner to predict the outcome of a quantitative variable according to one or more quantitative inputs.

### Choosing the Appropriate Model

The type of regression you choose depends on the type of relationship the variables exhibit. (You can see this relationship visually by making a scatter plot.)

Correlation, Linear Regression, and Multivariate Linear Regression all describe a linear relationship between variables:

[Figure: scatter plot titled "Linear Relationship"]

Exponential Regression describes an exponential relationship:

[Figure: scatter plot titled "Positive Exponential"]

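Logarithmic Regression describes a logarithmic relationship:

[Figure: scatter plot of a logarithmic relationship]

Polynomial Regression describes a polynomial relationship:

[Figure: scatter plot of a polynomial relationship]

Time Series describes a trend and/or seasonal relationship:

[Figure: plot titled "Time Series"]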

### General Regression Conditions

The following conditions must hold for all the statistical tests described in this section:

- x and y must be quantitative.
- y values must have a normal distribution.
  - On a standardized residual plot, this is true if you see more values close to zero and fewer farther away. (This is true in the plot below.)
  - If your sample size is over 50, it's less important that this condition is met.

[Figure: standardized residuals plot]

- y values must have the same variance around each x.
  - Look at the best fit line on a scatter plot. This condition is not met if the y values within specific ranges of x tend to be farther from, or closer to, the best fit line than all other y values. (Below you can see that the y values spread out farther from the best fit line as x gets larger, so the condition is not met.)

[Figure: scatter plot in which the spread of y around the best fit line grows as x increases]

- The data must be homogeneous. You can look at the scatter plot to make sure of this (you can't have large spans of x values with no data).

[Figure: scatter plot titled "Non Homogeneous Group"]

- The residuals must be independent. You can tell this from the standardized residual plot: if there is no pattern in the data (for instance, an upward trend), then they are independent. (The only pattern that is acceptable to see in the data is that more values are closer to zero than farther away. This does not negate independence.) The plot below shows independence, as there is no pattern. If this condition is not met, you may need to run a time series analysis (the only test in this section that does not require this condition to be met).

[Figure: standardized residuals plot with no visible pattern]

### How to Check Conditions in Excel 2007

- To make a scatter plot: highlight the two data sets of interest, then click Insert > Scatter and choose the upper-left option.

- To get a best fit line: right-click a data point on the scatter plot, click Add Trendline, and then choose the line that best represents the data.
- To get a standardized residual plot: click Data > Data Analysis > Regression, highlight the inputs, check the box for Standardized Residuals, and click OK. Then highlight Observation and Standard Residuals and make a scatter plot.

### Regression General Warnings

Watch for outliers. You can find them on the standardized residual plot. Generally, points higher than 3 or lower than -3 are considered outliers (these numbers represent standard deviations from the mean). You can see an example of one below:

[Figure: standardized residuals plot containing one outlier]

If you have outliers in the data, do the following:

- Check to make sure there wasn't a mistake in the collection of the data. If there was, throw the outlier out of the data set.
- If there was no mistake, it may not be ethical or accurate to throw it out, because it is a real data point. It's best to run the regression with and without the outliers, present both results, and give an interpretation of what the outliers may signify.
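The outlier rule above (standardized residuals beyond ±3) can be sketched in Python. This is an approximation under an assumption: each residual is divided by the sample standard deviation of all residuals, which is assumed to be close to what the spreadsheet's Standard Residuals column reports. The function names and data are hypothetical.

```python
import statistics

def standardized_residuals(xs, ys):
    """Residuals from a least-squares line, divided by their sample standard deviation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = statistics.stdev(residuals)
    return [r / spread for r in residuals]

def flag_outliers(xs, ys, threshold=3.0):
    """Indexes of points whose standardized residual is beyond +/- threshold."""
    return [i for i, z in enumerate(standardized_residuals(xs, ys)) if abs(z) > threshold]

# Hypothetical data: a noisy line with one value pushed far off the trend.
xs = list(range(1, 31))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
ys[14] += 40  # the suspect observation
suspects = flag_outliers(xs, ys)
```

Flagging a point is only the first step; whether to keep it is the judgment call described in the two bullets above, which is why the sketch returns indexes instead of deleting anything.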

## Correlation

### Uses

Correlation is used to find which quantitative variables are associated with each other. It is a preparatory analysis before creating predictive linear regression models. Correlation analysis is especially useful for crunching a lot of data quickly in order to find where relationships exist. Once these relationships are found, it is easier to know which variables to use for a linear predictive model.

### Example Questions Answered

- From a large database of demographic data, which factors are associated with response rates?
- From all the metrics on our website, which ones are associated with purchases on the site?
- From all the medical data collected in a patient sample, which factors are associated with blood pressure?

### Conditions

- x and y must be quantitative.
- x and y must have a linear relationship (on a scatter plot, it looks like you could draw a line through them, as seen below).

[Figure: scatter plot titled "Linear Relationship"]

- Check for normality (if normality is not met, you must use a nonparametric test).
  - y values must have a normal distribution. On a standardized residual plot, this is true if you see more values close to zero and fewer farther away (see General Regression Conditions for graphic). If your sample size is over 50, it's less important that this condition is met.
  - y values must have the same variance around each x. Look at the best fit line on the scatter plot. This condition is not met if higher or lower values of x tend to be farther from the best fit line than other values (see General Regression Conditions for graphic).
- The data must be a homogeneous group (see General Regression Conditions for graphics).
- The residuals must be independent. You can tell this from the standardized residual plot: if there is no pattern in the data (for instance, an upward trend), then they are independent (see General Regression Conditions for graphic). If this condition is not met, you may need to run a time series analysis.

### How to in Excel

1. First, run the correlation: click Data > Data Analysis > Correlation, check the box for Labels in First Row, highlight all the data you want to test for correlation (including column labels), and click OK. You will get a matrix of r values. Below, the values that have a high correlation are highlighted, showing which data sets are associated with each other.

   [Image: correlation matrix with the high r values highlighted]

2. Next, make scatter plots of the correlated data sets: highlight the two data sets of interest, then click Insert > Scatter and choose the upper-left option.
3. To fit a line on the scatter plot: right-click a data point on the plot, click Add Trendline, choose Linear, and click Close. This will give you a better idea of how well the data are correlated.

### Analysis

Correlation Coefficient, r (Pearson's coefficient)

This measure shows the strength and direction of the correlation between two quantitative variables.

- Positive values mean that as x increases, so does y.
- Negative values mean that as x increases, y decreases.
- Values are always between -1 and 1. Judging by the size of r regardless of sign:
  - 0.8 to 1 means a very strong correlation
  - 0.6 to 0.8 means a strong correlation
  - 0.4 to 0.6 means some correlation
  - Less than 0.4 means little or no correlation

Scatterplot

This shows graphically the strength, direction, and consistency of the association.

### Warnings

- You cannot predict anything, or say that one variable is dependent upon another, in this analysis; you can merely state how strongly two variables are or are not correlated with each other.
- Watch for outliers.
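As a cross-check on the spreadsheet output, Pearson's r and the strength bands listed above can be computed directly. The sketch below is illustrative: `pearson_r` and `describe_strength` are hypothetical names, and the handling of the shared band edges (0.8 counts as very strong, and so on) is an assumption, since the guide's bands overlap at their boundaries.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def describe_strength(r):
    """Map r to the guide's strength bands, using the size of r regardless of sign."""
    size = abs(r)
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "some"
    return "little or none"

# Hypothetical columns: mail pieces sent and the sales that followed.
mail_pieces = [100, 200, 300, 400, 500]
sales = [12, 19, 33, 38, 52]
r = pearson_r(mail_pieces, sales)
label = describe_strength(r)
```

The sign of r still carries the direction of the relationship; only the label ignores it.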

### Nonparametric Counterpart

If the normality condition is not met, you will have to use the nonparametric counterpart to this test, which is known as Spearman's rank correlation. Spearman's rank test can be used with ordinal data (explained below), does not need a normal distribution, and does not even need the data to have a linear relationship.

Ordinal Data

These are qualitative variables with a special feature: they can be ordered and given a numerical value. For example, if a survey asks you to rate your customer experience on a scale from 1 = very poor to 5 = excellent, it is collecting ordinal data. The numbers 1 through 5 are categories related to how good the experience was, and their order holds meaning. In simple categorical variables this is not true (you cannot order male and female in any way that gives added meaning).
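Spearman's rank correlation can be sketched as Pearson's r applied to the ranks of the data instead of the raw values. The helper names below are hypothetical, and giving tied values the average of their ranks is the usual convention, assumed here.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical survey: satisfaction rating (1 to 5) against repeat purchases.
ratings = [1, 2, 2, 3, 4, 5, 5]
repeat_purchases = [0, 1, 2, 2, 5, 9, 20]
rho = spearman_rho(ratings, repeat_purchases)
```

Because only the ordering matters, a curved but steadily increasing relationship still scores close to 1.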

## Linear Regression

### Predictive Model

The predictive model you will create in a linear regression analysis is an equation, shown below:

y = Ax + B

Above, y is the outcome variable and x is the explanatory variable. A is known as the coefficient of x, and B is a constant. The regression test allows you to find the coefficient and constant values that define the relationship between y and x, so that you can predict outcomes for specific x values.

### Uses

Linear regression predicts the value of one response variable, y, given the value of one explanatory variable, x. It is often done directly after a correlation analysis.

### Example Questions Answered

- If we send out x mail pieces to a target customer segment, how many sales will we get in return?
- If we sell x units of product A, how much can we expect to sell of product B?
- If x people sign up for this promotional program, how many extra donations can we expect?

### Conditions

- x and y must be quantitative.
- x and y must have a linear relationship.
- Check for normality (if normality is not met, you must use a nonparametric test).
  - y values must have a normal distribution. On a standardized residual plot, this is true if you see more values close to zero and fewer farther away (see General Regression Conditions for graphic). If your sample size is over 50, it's less important that this condition is met.
  - y values must have the same variance around each x. Look at the best fit line on the scatter plot. This condition is not met if higher or lower values of x tend to be farther from the best fit line than other values (see General Regression Conditions for graphic).
- The data must be a homogeneous group (see General Regression Conditions for graphics).
- The residuals must be independent. You can tell this from the standardized residual plot: if there is no pattern in the data (for instance, an upward trend), then they are independent (see General Regression Conditions for graphic). If this condition is not met, you may need to run a time series analysis.
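For readers who want to see where A and B come from, the standard least-squares solution is given below. It is stated as background and rests on the assumption that the spreadsheet's regression tool fits by ordinary least squares; the symbols $\bar{x}$ and $\bar{y}$ (the means of the observed x and y values) and $n$ (the number of observations) are introduced here and do not appear in the guide.

$$
A = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
B = \bar{y} - A\,\bar{x}
$$

The second formula says that the fitted line always passes through the point $(\bar{x}, \bar{y})$.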

### How to in Excel

1. To make a scatter plot: highlight the two data sets of interest, then click Insert > Scatter and choose the upper-left option. If you notice a linear relationship, continue with this method.
2. Run the regression: click Data > Data Analysis > Regression, highlight the explanatory variable for Input X Range and the response variable for Input Y Range, check the box for Standardized Residuals, and click OK.
3. The p-values, R², and coefficients of interest are highlighted below in the regression output.

   [Image: regression output with the p-values, R², and coefficients highlighted]

   (The predictive model for the above output would be y = 0.051x plus the intercept shown in the output.)

### Analysis

p-value

There will be a p-value for the coefficient and for the intercept, and both must be below 0.05 for the predictive model to be completely validated.

R²

This shows the percentage of variability in y that is explained by x in the predictive equation. If the value is equal to 1, all the variability in y is explained perfectly (don't ever expect this to happen).

Confidence Interval

Excel gives you Lower 95% and Upper 95% values for both the intercept and the coefficients. These values provide a range within which, statistically, you are 95% sure that the true value of the intercept or the coefficient lies. These values are highlighted below:

[Image: regression output with the Lower 95% and Upper 95% columns highlighted]
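The R² and the interval around the x coefficient can be approximated by hand. The sketch below rests on several assumptions: `slope_summary` is a hypothetical name, the data are invented, and the default multiplier of 1.96 is the large-sample normal approximation. A regression tool would normally use a t-distribution value that depends on the sample size, so for small samples pass that value as `t_crit`.

```python
import math

def slope_summary(xs, ys, t_crit=1.96):
    """Least-squares slope and intercept, R-squared, and an interval for the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # Standard error of the slope: residual variance (n - 2 degrees of freedom) over Sxx.
    se_slope = math.sqrt(max(ss_res, 0.0) / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1 - ss_res / ss_tot,
        "lower": slope - t_crit * se_slope,
        "upper": slope + t_crit * se_slope,
    }

# Hypothetical data: mail pieces sent (in thousands) and resulting sales.
summary = slope_summary([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
```

A narrow interval that excludes zero is the same evidence the p-value for the coefficient summarizes.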

With the above data, I can say that I am 95% sure that the true value of the intercept of my equation lies between 57 and 321. I can also say that I am 95% sure that the true value of the x coefficient lies between 0.04 and 0.06.

### Warnings

- Don't predict responses in y for an x that is outside the range of the data you built the model on; you can't be sure the relationship will hold outside this range.
- Watch for outliers.

### Nonparametric Counterpart

If the normality condition is not met, you will have to use the nonparametric counterpart to this test. One popular nonparametric test that can replace any parametric regression test is MARSplines (Multivariate Adaptive Regression Splines). It doesn't impose the condition that any specific type of relationship exists between the variables (such as linear or exponential), can be used with multiple explanatory variables, and can even predict multiple outcome variables.
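The first warning can be enforced mechanically when a model is used outside the spreadsheet. This is a small sketch with hypothetical names and numbers (`make_guarded_predictor`, the intercept of 100, and the x range are all invented): the predictor remembers the range of x it was built on and refuses to extrapolate.

```python
def make_guarded_predictor(slope, intercept, x_min, x_max):
    """Return a predict(x) function that only answers inside the fitted x range."""
    def predict(x):
        if not (x_min <= x <= x_max):
            raise ValueError(
                f"x={x} is outside the fitted range [{x_min}, {x_max}]; "
                "the model may not hold there"
            )
        return slope * x + intercept
    return predict

# Hypothetical model built on mailings of between 1,000 and 5,000 pieces.
predict_sales = make_guarded_predictor(slope=0.05, intercept=100.0,
                                       x_min=1000, x_max=5000)
expected_sales = predict_sales(2000)
```

Raising an error is a deliberately strict choice; a softer variant could return the prediction together with a warning flag.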

27 27 P age

### Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

### Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. Excel is powerful tool and can make your life easier if you are proficient in using it. You will need to use Excel to complete most of your

### Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

### The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

### When to use Excel. When NOT to use Excel 9/24/2014

Analyzing Quantitative Assessment Data with Excel October 2, 2014 Jeremy Penn, Ph.D. Director When to use Excel You want to quickly summarize or analyze your assessment data You want to create basic visual

### Tutorial Segmentation and Classification

MARKETING ENGINEERING FOR EXCEL TUTORIAL VERSION 1.0.8 Tutorial Segmentation and Classification Marketing Engineering for Excel is a Microsoft Excel add-in. The software runs from within Microsoft Excel

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### SPSS for Exploratory Data Analysis Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav)

Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav) Organize and Display One Quantitative Variable (Descriptive Statistics, Boxplot & Histogram) 1. Move the mouse pointer

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### SPSS Explore procedure

SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### STATISTICAL ANALYSIS WITH EXCEL COURSE OUTLINE

STATISTICAL ANALYSIS WITH EXCEL COURSE OUTLINE Perhaps Microsoft has taken pains to hide some of the most powerful tools in Excel. These add-ins tools work on top of Excel, extending its power and abilities

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### SPSS Tests for Versions 9 to 13

SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list

### How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### Figure 1. An embedded chart on a worksheet.

8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the

### A Guide to Using Excel in Physics Lab

A Guide to Using Excel in Physics Lab Excel has the potential to be a very useful program that will save you lots of time. Excel is especially useful for making repetitious calculations on large data sets.

### Data analysis process

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

### Spreadsheet software for linear regression analysis

Spreadsheet software for linear regression analysis Robert Nau Fuqua School of Business, Duke University Copies of these slides together with individual Excel files that demonstrate each program are available

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10 Key Ideas Correlation, Correlation Coefficient (r), Section 10-1: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seem to be many different things you can do (charts, graphs,

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### Using Excel for Statistical Analysis

Using Excel for Statistical Analysis You don't have to have a fancy pants statistics package to do many statistical functions. Excel can perform several statistical tests and analyses. First, make sure

### How to Get More Value from Your Survey Data

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

### DATA ANALYSIS. QEM Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. Howard University

DATA ANALYSIS QEM Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. Howard University Quantitative Research What is Statistics? Statistics (as a subject) is the science

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

### Simple Linear Regression, Scatterplots, and Bivariate Correlation

1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Data exploration with Microsoft Excel: analysing more than one variable

Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical

### Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011

SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011 Statistical techniques to be covered Explore relationships among variables Correlation Regression/Multiple regression Logistic regression Factor analysis

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Do

### The importance of graphing the data: Anscombe's regression examples

The importance of graphing the data: Anscombe's regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

### January 26, 2009 The Faculty Center for Teaching and Learning

THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

### Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Calibration and Linear Regression Analysis: A Self-Guided Tutorial Part 1 Instrumental Analysis with Excel: The Basics CHM314 Instrumental Analysis Department of Chemistry, University of Toronto Dr. D.

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Module 5: Multiple Regression Analysis

Using Statistical Data to Make Decisions: Multiple Regression Analysis Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Free Trial - BIRT Analytics - IAAs

Free Trial - BIRT Analytics - IAAs 11. Predict Customer Gender Once we log in to BIRT Analytics Free Trial we would see that we have some predefined advanced analysis ready to be used. Those saved analysis

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 Descriptive Statistics 2.1 Frequency Tables 2.2 Measures of Central Tendencies

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction Principal Components Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Linear Models in STATA and ANOVA

Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

### Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

### Adverse Impact and Test Validation Book Series: Multiple Regression. Introduction. Comparison of Compensation using

Adverse Impact and Test Validation Book Series: Multiple Regression Using Multiple Regression to Examine Compensation Practices Introduction Reasons for Investigating Pay Equity: The Equal Pay Act of 1963

### Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online

### R with Rcmdr: BASIC INSTRUCTIONS

R with Rcmdr: BASIC INSTRUCTIONS Contents 1 RUNNING & INSTALLATION R UNDER WINDOWS 1.1 Running R and Rcmdr from CD 1.2 Installing from CD

### Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.

Excel Tutorial Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information. Working with Data Entering and Formatting Data Before entering data

### Parametric and Nonparametric: Demystifying the Terms

Parametric and Nonparametric: Demystifying the Terms By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD

### Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

### Data Analysis in SPSS. February 21, 2004. If you wish to cite the contents of this document, the APA reference for them would be

Data Analysis in SPSS Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Heather Claypool Department of Psychology Miami University

CoolaData Predictive Analytics About CoolaData CoolaData empowers online companies to become proactive and predictive without having to develop, store, manage or monitor data themselves. It is an

### Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Step 1: Type the data into the spreadsheet The example used throughout this How to is a regression

### DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

### Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

### An introduction to IBM SPSS Statistics

An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive

### Excel 2010: Data Analysis Tools Student Manual

Excel 2010: Data Analysis Tools Student Manual Excel 2010: Data Analysis Tools Chief Executive Officer, Axzo Press: Series Designer and COO: Vice President, Operations: Director of Publishing Systems

### An introduction to using Microsoft Excel for quantitative data analysis

Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing

### Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business Statistics, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Call Centre Helper - Forecasting Excel Template

Call Centre Helper - Forecasting Excel Template This is a monthly forecaster, and to use it you need to have at least 24 months of data available to you. Using the Forecaster Open the spreadsheet and enable

### Mathematics within the Psychology Curriculum

Mathematics within the Psychology Curriculum Statistical Theory and Data Handling Statistical theory and data handling as studied on the GCSE Mathematics syllabus You may have learnt about statistics and

### Simple Linear Regression

STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

### How To Analyze Data In Excel 2003 With A Powerpoint 3.5

Microsoft Excel 2003 Data Analysis Larry F. Vint, Ph.D lvint@niu.edu 815-753-8053 Technical Advisory Group Customer Support Services Northern Illinois University 120 Swen Parson Hall DeKalb, IL 60115 Copyright

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

TOPIC 12 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### EXCEL Analysis ToolPak [Statistical Analysis] 1. First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it:

EXCEL Analysis ToolPak [Statistical Analysis] 1 First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it: a. From the Tools menu, choose Add-Ins b. Make sure Analysis

### Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

### Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variable is something that varies (eye color), a constant does

### Using Excel for Statistics Tips and Warnings

Using Excel for Statistics Tips and Warnings November 2000 University of Reading Statistical Services Centre Biometrics Advisory and Support Service to DFID Contents 1. Introduction 3 1.1 Data Entry and

### Using Excel for inferential statistics

FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

### Using Excel for Handling, Graphing, and Analyzing Scientific Data:

Using Excel for Handling, Graphing, and Analyzing Scientific Data: A Resource for Science and Mathematics Students Scott A. Sinex Barbara A. Gage Department of Physical Sciences and Engineering Prince

### SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All

### Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

### Introduction Course in SPSS - Evening 1

ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl Pearson's coefficient of correlation c) Spearman's Rank correlation coefficient d)

### Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative ways in which statistical data may be presented is through diagrams and graphs. There are several ways in

### Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; hflores@rice.edu, and Dr. J.A. Dobelman

Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; hflores@rice.edu, and Dr. J.A. Dobelman Statistics lab will be mainly focused on applying what you have learned in class with

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašić University of Delaware Sarajevo Graduate School of Business Often our interest in data analysis