ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
- Natalie Reynolds
- 9 years ago
DATABASE MARKETING
Fall 2015, max 24 credits
Deadline:

PART A Gains chart with Excel

Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls. That file includes sorted respondent scores from the analysis file (30 customers) and the validation file (30 customers) of a modeling exercise, and it gives you the upper and lower bounds for the 10 % buckets that you should use in your gains chart.

For both the analysis and the validation file, calculate the response rate in the 10 buckets and the gain over total (in the same manner as in the slides). Remembering that this is only a small sample of 30+30, describe how the model is working. As part of your answer please also give a table similar to slide number 9 in slide set 7 (incremental gains charts).

PART B Predicting and scoring using SAS EM

1 ASSIGNMENT OUTLINE AND DATA
2 QUESTIONS AND REPORT INSTRUCTIONS
3 COMPUTER INSTRUCTIONS
3.1 PROJECT-LIBRARY-DATA SOURCE-DIAGRAM
3.2 VARIABLE DEFINITIONS-SAMPLES-TRANSFORMATIONS
3.3 MODELING
3.4 ASSESSING THE MODELS
1 ASSIGNMENT OUTLINE AND DATA

The data used in this exercise is available in \\work\courses\e\27\e20100\cooking.sas

Please use your network drive or USB stick for storage. This applies to both data and project files, so first of all copy your data there. I use the directory aaa on the Desktop for data files throughout.

Last year Books-By-Mail test-promoted a new cook book called Quick & Easy to 9,592 names selected randomly from their primary book buyer segment. The response rate received was 3.79%. Names and all data were saved point-in-time of the promotion.

In preparation for immediate roll-out, the product manager requests that you build response models to assist her in identifying those names in her primary book buyer segment most likely to order Quick & Easy. For the predictions we use three different models: regression analysis, trees and neural networks. Identify the best model to use among them.

If a customer orders the promoted cook book, the profit margin is 16 euros before promotion costs. Promotion costs are 0.65 euros per promotion.

The data file contains 5 predictor variables and an order indicator denoting who in the sample ordered the cook book (your dependent/target variable). Details of these variables can be found below.
VARIABLE DESCRIPTIONS OF DATAFILE cooking.sas

Variable Name   Num/Char   Definition

ORDER           Numeric    Indicates if customer ordered or not

AGE50PL         Numeric    Indicates if customer is age 50+ based on purchased enhancement data.
                           1, if customer is 50 years of age or older

HISTORY         Numeric    Response to previous promotions
                           0, if customer did not respond to one or more among the four last promotions
                           1, if customer responded to one or more among the four last promotions

GENDER          Numeric    Indicates gender of customer
                           1, if no information is available
                           2, if male
                           3, if female

TPAID           Numeric    Indicates the total number of all paid books
                           1 = 1 product paid
                           2 = 2 products paid
                           3 = 3 products paid
                           4 = 4 products paid
                           5 = 5 products paid
                           6 = 6 products paid
                           7 = 7 products paid
                           8 = 8 products paid
                           9 = 9 products paid
                           10 = 10 products paid
                           11 = 11 products paid
                           12 = 12 products paid
                           13 = 13 products paid
                           14 = 14 products paid
                           15 = 15 products paid
                           16 = 16+ products paid

TSLBO           Numeric    Indicates the customer's elapsed time in months since their last book order across all genres
                           0 = no book orders placed
                           1 = 0-6 months ago
                           2 = 6-12 months ago
                           3 = months ago
                           4 = months ago
                           5 = months ago
                           6 = months ago
                           7 = months ago
                           8 = months ago
                           9 = months ago
                           10 = months ago
                           11 = months ago
                           12 = 66+ months ago

Note: although TSLBO and TPAID have open last classes (16+ and 66+), we treat them as if they were closed so that we can handle both as continuous variables.
2 QUESTIONS AND REPORT INSTRUCTIONS

This assignment involves estimating several response models with SAS EM. The purpose of this assignment is not to make you understand every model in detail. Instead, the purpose is to briefly introduce a few models and show the basics of comparing different models. For this reason the assignment proceeds by following computer instructions in detail. Your task is to follow the instructions and answer questions about what you have done. Despite the mechanical nature of this assignment, it is important to think about what is being done and why. Predictive response models play a big role in data-driven marketing, and this assignment gives you the opportunity to get acquainted with the relevant ingredients with relatively little effort.

NOTICE: When you are asked to estimate a model, please use 60 % of the data for analysis and 40 % for validation.

QUESTION 1 (pen and paper, no need for SAS EM)

Assume we decide not to send a promotion to anyone whose predicted expected profit is less than zero. For this reason we need to know how likely ordering needs to be for a particular customer in order for us to break even on that customer.

a) Given the profit margin and cost information we have, calculate the break-even probability (i.e. the lowest probability for which expected profit is non-negative).

b) Next assume that 5 per cent of the customers ordering the book do not pay for it. In such a case the cost for the firm is 7 euros (in addition to the postal cost). What is the predicted minimum probability for breaking even now?

QUESTION 2

First you need to estimate a logit regression model and answer questions related to it. Your logit regression model calculates coefficients for the explanatory variables (see slide set 6, slide 18). You will thus get the coefficients a, b1, b2, ... in the output of your regression model.

Report which variables were statistically significant in the regression model. What are the null and alternative hypotheses in the t-test that we carry out? What are the values of the regression coefficients? Comment on the signs. Are they as you would expect? Is there any indication of multicollinearity?

Next you use the regression model to make a few sample predictions. What does this model predict to be the ordering probability for the example customers A and B?

            A    B
AGE50PL     1    0
HISTORY     0    1
GENDER      2    3
TPAID       2    1
TSLBO       4    2

If our criterion for promoting is that the expected profit should be strictly positive, should A and B be promoted?

Tips: It is handy to use Excel to calculate the probabilities required. The exponential function is EXP in Excel. When estimating the model in SAS EM, TSLBO and TPAID need to be log-transformed to make them closer to normally distributed. You must then use this transformation also when applying the observed TSLBO and TPAID in the Excel formula. E.g. if TSLBO=2 and your transformation is LOG(TSLBO), you apply LN(2) in Excel. Note that in SAS, LOG means the same thing as LN in Excel (natural logarithm).

Example: Assume that your model has two significant terms, the intercept and transformed TSLBO, the transformation being log(tslbo). Then you calculate d = a + b1*log(tslbo) and apply exp(d)/(1+exp(d)) to get the probabilities.

QUESTION 3

Next we estimate a decision tree for the same data. Describe the decision tree splits (all the splits). What is the final definition of the subgroups (segments) of customers (so into which groups did the tree split the customers)? Describe the two best groups in terms of expected response rate that the tree found (validation data response rate is the criterion, but report also the analysis data response rate). Why are there differences between the two data sets? Also report the number of observations (both in the analysis and in the validation sample) in the groups.

If you wish to have a response rate of minimum 6 %, which segments do you promote? Take care to use all the possible information to exclude customers whose expected response rate is below 6 %. What percentage of the whole list of names do you expect to promote if you use this rule of 6 %? Calculate also the expected response rate (e.g. using a weighted average). Please write down the calculations clearly and not just the final figure.

QUESTION 4

Next we estimate a neural network model (again for the same data). After how many iterations did the neural network stop training? Why did it stop training? Include the diagrams describing the training process for average error rate and misclassification rate.

QUESTION 5

Next we compare the three models that we estimated. We use a lift chart to do the comparison. Based on the lift chart, which of the three models turned out to be the best? Why do we use the validation data as the criterion of model goodness? Print the graph in the report to complement your answer. Describe the gains in the response rate expected from using the model (in other words, interpret the lift chart).

QUESTION 6

Next look at the % Response chart. It tells you the response rates (in the course slides we were looking at incremental gains charts that included the same kind of information). Comment on the monotonicity of the chart. Monotonicity in the validation file, i.e. descending response rates, indicates a good model.

Now use the chart to answer this question: using the break-even probability you calculated in Question 1 a, what percentage of the list of target customers (used in roll-out) are you going to promote if you use the best model?

Look next at the Cumulative % Response chart: what response rate do you expect to get if you define the percentage to be promoted as described above (all those promoted need to have an expected profit that is non-negative)?

QUESTION 7

Use SEMMA and list which steps in our analysis belong under which SEMMA steps. What is the big picture in this exercise? You were doing a lot of modeling. Why? What happens after the modeling? Where does our assignment stop?
3 COMPUTER INSTRUCTIONS FOR PART B

3.1 PROJECT-LIBRARY-DATA SOURCE-DIAGRAM

This first part covers the preliminary preparations for constructing the flow needed to build and assess the predictive models.

Open Enterprise Miner: Programs > SAS > SAS Enterprise Miner

Define a new project for yourself. The project may include different data files as well as different diagrams.

PROJECT

File > New Project. Define your project path and name. DEFINE YOUR PROJECT ON THE DESKTOP. Otherwise you may not be able to access it later.

LIBRARY

EM wants to have all the data files you use in a library. A library is only a directory whose path you need to define.
You can define a new data source, project, diagram or library by clicking the arrow to the right of the sun icon on the toolbar. I refer to this as sun-click.

Define a new library here, called garden, on your desktop. Sun-click > Library > tick New Library > Next and define the path. Select Next > Finish.

DATA SOURCE

Define the data file we use. Sun-click > Data Source > Next > Browse > Garden > Cooking > Ok > Next > Next > Finish.
Note that this file is now visible in the upper left corner.

DIAGRAM

Next we define the diagram. A project can contain several diagrams. Sun-click > Diagram. Give the name > Ok.
When you click Diagrams under cooking, a new diagram opens. That is the space where you build the analysis flow.
3.2 VARIABLE DEFINITIONS-SAMPLES-TRANSFORMATIONS

NOW WE START BUILDING THE DATAFLOW

Building the flow means dragging different icons from the toolbar to serve as building blocks of the process flow. We add nodes and connect them with arcs. Note that the icons to be used as nodes have been arranged in groups: Sample, Explore, Modify, Model, Assess etc.

Our data source is cooking.sas7bdat. First we drag this file to the diagram space. The first node is thus the Cooking data source. The grey area on the left shows the menu for the node, where the default settings can be seen. Click the Cooking icon and then Variables on the left, and we see the variables that the data file includes.

VARIABLE DEFINITIONS
The first column defines the role of the variable. We are predicting the order, so the important Target variable for us is ORDER. Replace the role Input in that column with Target by clicking the cell and selecting Target under the arrow that appears. All the other variables have the role Input.

The next column, Level, refers to the measurement level. The measurement level of GENDER should be nominal (three different alternative values); AGE50PL and HISTORY are binary, and ORDER is binary as well. Make the change in the same way as when changing the Role. TSLBO and TPAID are interval (meaning they are dealt with as continuous).

Data manipulation should always start with getting acquainted with the data. As an example we view the distribution of TPAID (mark the row and click Explore). We see that it is far from being normally distributed, which is the desired feature in a regression model. To close, click Ok.
Next embed the information about expected profit margins and costs. On the left click Decisions. Click Build in the next window and Decision Weights in the one after that. You see a matrix. The column Decision 1 is to promote and Decision 2 is not to promote. The rows correspond to the response of the respondent. If we send the promotion and it is successful, then our profit is 15.35 euros (the 16 euro margin less the 0.65 euro promotion cost). If we do not get a response then we lose 0.65 euros. If we do not promote, it costs nothing. Note that we maximize.
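The decision weights described above can be written down as a small payoff table. The following Python sketch is only an illustration of how such a matrix turns a predicted ordering probability into an expected profit per decision; the names DECISION_WEIGHTS, expected_profit and best_decision are hypothetical and are not part of SAS EM.

```python
# Payoff per name, keyed by (customer response, our decision).
# The margin and the promotion cost are the figures given in the assignment.
MARGIN = 16.0          # profit margin per order, before promotion cost
PROMOTION_COST = 0.65  # cost of sending one promotion

DECISION_WEIGHTS = {
    ("order", "promote"): MARGIN - PROMOTION_COST,
    ("no order", "promote"): -PROMOTION_COST,
    ("order", "do not promote"): 0.0,
    ("no order", "do not promote"): 0.0,
}


def expected_profit(p_order, decision):
    """Expected profit of a decision for a customer who orders with probability p_order."""
    return (p_order * DECISION_WEIGHTS[("order", decision)]
            + (1.0 - p_order) * DECISION_WEIGHTS[("no order", decision)])


def best_decision(p_order):
    """Pick the decision with the larger expected profit (we maximize)."""
    return max(("promote", "do not promote"),
               key=lambda decision: expected_profit(p_order, decision))


if __name__ == "__main__":
    for p in (0.01, 0.10):
        print(p, round(expected_profit(p, "promote"), 4), best_decision(p))
```

Not promoting always has an expected profit of zero, so the comparison reduces to whether the expected profit of promoting is above zero.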
Click Ok to exit the window.

Next we wish to append new icons to the project flow. Drag the Data Partition node from under Sample and connect it with Cooking using your mouse. On the left, under Data Set Allocations, allocate 60 % for Analysis and 40 % for Validation.

At this stage we would normally deal with the missing values, but to make the assignment simpler all missing values have already been dealt with. We would also deal with outliers and explore the distributions of the variables.
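For readers who want to see what a 60/40 partition amounts to outside EM, here is a minimal Python sketch. It assumes a plain random split with a fixed seed; the Data Partition node may partition differently (for example by stratifying on the target), and the function name partition is hypothetical.

```python
import random


def partition(rows, analysis_share=0.6, seed=12345):
    """Randomly split rows into an analysis part and a validation part."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * analysis_share)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # 9,592 is the number of test-promoted names mentioned in the assignment;
    # the row ids themselves are placeholders.
    analysis, validation = partition(range(9592))
    print(len(analysis), len(validation))
```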
VARIABLE TRANSFORMATIONS

For regression analysis, transformation of the variables is needed: the explanatory variables should be close to normal. Below you see the distributions of our continuous variables. They are not close to normal.

Drag the Transform Variables node from under Modify and connect Data Partition with it. Click Transform Variables to see the menu below. For interval inputs select log. This is a common transformation that is used to make the distribution of a variable more normal.
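The effect of the log transformation can be illustrated with a small Python sketch that compares the skewness of a right-skewed variable before and after taking natural logarithms. The tpaid values below are made up for illustration and are not taken from cooking.sas7bdat; the skewness helper is a hypothetical name.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed standardized deviations."""
    m = mean(values)
    s = pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Hypothetical TPAID-like counts: many small values and a long right tail.
tpaid = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 16]
log_tpaid = [math.log(v) for v in tpaid]   # natural log, like LOG in SAS

if __name__ == "__main__":
    print("skewness before:", round(skewness(tpaid), 2))
    print("skewness after: ", round(skewness(log_tpaid), 2))
```

A skewness closer to zero after the transformation indicates a more symmetric distribution. Note that the logarithm is not defined for zero, so a variable with zero values (such as TSLBO = 0) needs separate handling.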
Note that there is also an option to maximize normality. Do NOT use it in this case (it will cause trouble with the interpretation). This is all we need here. Run the flow by right-clicking on Transform Variables and choosing Run. You will see the transformations that were made.

3.3 MODELING

REGRESSION ANALYSIS

Choose the Regression node under Model and connect it with Transform Variables. You can define the status of the variables, i.e. define the variables you use in your regression model (you will now use all the available ones), by clicking Variables in the menu below.
You see that EM suggests a model with main effects, which is all we are going to use. The regression type suggested is Logistic Regression, which is the special regression analysis for a 0/1 dependent variable. Our choice! When you scroll down this menu you see the following.
Here we are especially interested in specifying the Model Selection type in regression. You may choose Backward/Forward/Stepwise; we ask you to use Stepwise. When you run the flow (right-click the Regression node and Run) and choose Results in the pop-up window, the results window contains two parts of most interest to you.

The set of independent/explanatory (input) variables that I present here is only a subset of the variables that you use. The results are presented as an example to highlight the interface and the interpretation of the figures and graphs.

The Effects plot provides the information on the final regression model as a histogram. Only those variables that are significant explanatory variables (whose coefficient is different from zero after statistical testing) are present in the histogram. You see their names and estimated coefficient values when you move the cursor over the bars. Below there are two significant explanatory variables in the model, and moreover the intercept is significant. Blue color means that the corresponding coefficients are negative.

It is nice to see the coefficient values in the graph, which you can do by right-clicking, selecting Graph Properties and ticking Show Labels. You still need to keep track of which variable each coefficient belongs to.
Those who are interested in the t measures (statistical measures) may look at them under View > Model > Estimate Selection Plot. In the example below you see that there have been three steps in the selection process, and finally in the third step there are three variables in the model (and the intercept). We now know that all the variable coefficients in the logit model are significantly different from zero.
FOR THOSE WHO HAVE MORE BACKGROUND IN STATISTICS: You may select t value in the drop-down menu. A t-value always corresponds to a p-value, which is more familiar to us.

COEFFICIENT INTERPRETATION FOR CLASS VARIABLES

Regression analysis deals with class variables by producing a constant for all the possible values except one, which is chosen to have the value zero. In your data, if HISTORY0 is included in the set of significant variables with some coefficient, it means that if your HISTORY value is 1 it has a 0 effect on your score, but if it is 0 then the coefficient tells the effect. Moreover, if GENDER1 and GENDER2 appear among your significant variables, it tells you that GENDER being 3 (meaning female) has a zero effect on the score. Assume that for GENDER2 (male) the parameter estimate is positive. Then it means that, compared with the case of a female customer, the probability to order is bigger for a male.

Take an example. We have four significant variables: intercept (-2.8), GENDER (with coefficient 0.02 when gender is 2), HISTORY (coefficient -0.5 when history is 0) and TSLBO (-0.2). Then, according to the logit regression formula (check your slides), a customer who is male, with no previous responses to the promotions, and for whom TSLBO is 4 (and remember the transformation we made, the natural log: log(4) = 1.39) has the score (probability to order)

d = -2.8 + 0.02 - 0.5 - 0.2*1.39 = -3.56
exp(d)/(1+exp(d)) = 0.028

Now we could check how good the model is at predicting, i.e. how good the fit is. There are some measures displayed for that as well. This time the normal measure of fit is NOT going to be used. However, we will assess the goodness of the model later, when we have fitted all three models.

We note that the Score Rankings Overlay window with the option Lift tells us how many times we can multiply the expected response rate if we use the model for the best 10 per cent of customers. This time we may multiply the response rate by 3. Thus, given an assumed baseline response rate without the model, we may expect three times that rate, 0.11, among the best 10 % of the respondents.

NOTE: it may seem odd that the intercept has been negative in our examples. However, if all the other regression coefficients are 0 and only the intercept a is used, then the predicted ORDER value is exp(a)/(1+exp(a)), which in our example is 0.074 > 0.
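The worked example with the four significant terms can be reproduced with a few lines of Python. This is only a sketch of the logit formula using the example coefficients quoted above (intercept -2.8, GENDER2 0.02, HISTORY0 -0.5, log-transformed TSLBO -0.2); the function names are hypothetical, and math.log is the natural logarithm, matching LOG in SAS and LN in Excel.

```python
import math

# Example coefficients from the text, not estimates from your own model.
INTERCEPT = -2.8
COEF_GENDER2 = 0.02      # applies when GENDER is 2 (male)
COEF_HISTORY0 = -0.5     # applies when HISTORY is 0
COEF_LOG_TSLBO = -0.2    # multiplies the natural log of TSLBO


def logistic(d):
    """Turn the linear score d into a probability: exp(d) / (1 + exp(d))."""
    return math.exp(d) / (1.0 + math.exp(d))


def order_probability(gender, history, tslbo):
    """Predicted ordering probability; tslbo must be positive for the log."""
    d = INTERCEPT
    if gender == 2:
        d += COEF_GENDER2
    if history == 0:
        d += COEF_HISTORY0
    d += COEF_LOG_TSLBO * math.log(tslbo)
    return logistic(d)


if __name__ == "__main__":
    # Male customer, no previous responses, TSLBO class 4.
    print(round(order_probability(gender=2, history=0, tslbo=4), 3))
```

The same structure carries over to Excel: build d in one cell from the coefficients and the LN-transformed inputs, then apply EXP(d)/(1+EXP(d)).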
NEURAL NETWORK

Select the AutoNeural node under Model and connect Data Partition with it. Thus we are using the non-transformed variables in the neural network this time. We need not adjust anything in the AutoNeural node, because its purpose is to choose good settings, except that you should increase the Maximum iterations to 15. Note that the Termination criterion, defined as Overfitting, refers to the minimization of the average error or the misclassification rate (I assume that the misclassification rate is the criterion that is used, though this is not reported anywhere).

Right-click on the node and select Run. When the program stops, a pop-up asks whether you wish to view the results. Click Results.
The average error does not decrease in the validation sample after 5 iterations, and the same is true of the misclassification rate. Thus the model reached in 5 iterations is selected as the model to be used. Remember that the neural network is a black box: it does not produce easily interpretable information on how it got the results, what the essential variables were etc.
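The stopping logic described above, keeping the model from the iteration after which the validation error no longer improves, can be sketched in Python as follows. The error values are invented for illustration, and best_iteration is a hypothetical helper, not an AutoNeural option.

```python
def best_iteration(validation_errors):
    """Return the 1-based iteration with the lowest validation error.

    When several iterations tie, the earliest one is returned.
    """
    best_index = min(range(len(validation_errors)),
                     key=validation_errors.__getitem__)
    return best_index + 1


# Hypothetical average validation error per training iteration.
validation_avg_error = [0.170, 0.162, 0.158, 0.156, 0.155, 0.155, 0.156, 0.157]

if __name__ == "__main__":
    print("selected iteration:", best_iteration(validation_avg_error))
```

Judging the stopping point on the validation sample rather than the analysis sample is what guards against overfitting: the analysis error can keep falling while the validation error stalls or rises.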
TREE

Add a Decision Tree node from under Model into your diagram and link it with Data Partition. Note that we will NOT use transformed variables here, to make it easier for you to interpret the tree output. Looking at the options on the left, we will use the default values but change the significance level to 0.3 (to have a bigger tree for you to interpret). After that, just right-click on the node and Run.

Using sequential splits of the data, a tree finds the groups that differ most in their response behavior. Below we view a tree that is not from the same data as ours. It is in the Tree Map window.

Below you see a tree where the splits are always into two. We see that if the number of paid products exceeds 7, then this group will respond to the promotion with probability 49.4 % (note that you need to look at the validation file). However, only very few customers have more than 7 orders. CKBK082 indicates whether the respondent responded to another recent promotion. We see that if he/she did, then even if the total number of paid orders is 7 or smaller, the probability of responding to this offer in this group is 7 % (note, again, that you need to look at the validation column). Furthermore, we see that if the number of paid orders exceeds 3 but is below 8, and moreover the CKBK082 offer was taken favorably, then the predicted response rate is as high as 15.7 %.
If we consider the tree above, we see that in my file I had 3355 respondents in the analysis file and 1441 respondents in the validation file. In the rows, 0 refers to non-respondents and 1 to respondents. We see that the response rate is 5.0 % in both of the files. The splits are made to distinguish responders from non-responders.

CHECK ALWAYS ALSO THE NUMBER OF OBSERVATIONS IN EACH OF THE BOXES PRODUCED. WE MAY IDENTIFY A GROUP WITH A HIGH RESPONSE RATE, BUT IF ONLY VERY FEW OBSERVATIONS BELONG TO THAT GROUP IT IS OF NO USE.

NOTE: In case no tree appears for you, first check that you have provided the decision weights earlier. If that does not help, please send me an e-mail and you will receive a tree; you analyse it, and in the assessment node you assess only regression and neural networks.

3.4 ASSESSING THE MODELS

This time we will unfortunately pass over the tree without any further consideration and will next assess the three models.

Drag the Model Comparison node from under Assess and connect the three analysis nodes to it. Right-click on Model Comparison and Run. You will be shown the following types of windows.

The Score Rankings Overlay window below gives the cumulative lift of the models. You see the validation file if you click on the lower border of the window and drag down.

In a lift chart (gains chart) the customers are ranked from best to worst based on their probability of responding. The idea of a lift chart is to compare response models to a no-model scenario. For example, if the response rate in a dataset is 5%, then a random sample of 10% of this dataset would have a 5% response rate as well. Now let's assume that a response model has a lift of, for example, 3.5. This means that with the model it is possible to handpick the best 10% and within this sample achieve a response rate that is 3.5 times larger than the response rate of a no-model random sample. This also means that the lift of the no-model scenario equals 1.

Lift can also be expressed as gain. Gain is the percentage increase in response rate that the model can bring to a sample. Therefore our example lift of 3.5 would be the same thing as a gain of 250 %.

When looking at the lift charts and the other charts below, view the validation file!
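The relation between lift and gain described above can be made concrete with a short Python sketch. It assumes the customers are already sorted from best to worst score and that responses are coded 0/1; the sample data and the function names cumulative_lift and gain_percent are hypothetical.

```python
def cumulative_lift(responses_sorted, depth):
    """Lift of the top `depth` share of a list sorted by descending model score.

    responses_sorted holds 1 for a responder and 0 for a non-responder.
    """
    n_top = round(len(responses_sorted) * depth)
    top_rate = sum(responses_sorted[:n_top]) / n_top
    base_rate = sum(responses_sorted) / len(responses_sorted)
    return top_rate / base_rate


def gain_percent(lift):
    """Percentage increase in response rate relative to using no model."""
    return (lift - 1.0) * 100.0


# Hypothetical sample of 20 scored customers with 2 responders overall.
sample = [1, 0, 0, 1] + [0] * 16

if __name__ == "__main__":
    lift = cumulative_lift(sample, depth=0.10)
    print("lift in top 10 %:", lift, "gain:", gain_percent(lift), "%")
    print("gain for a lift of 3.5:", gain_percent(3.5), "%")
```

The same two quantities are what Part A asks for per bucket in Excel: the bucket response rate divided by the overall response rate, and the corresponding percentage gain.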
Using the drop-down menu you may choose alternative charts. Next we look at the % Response chart. It includes much the same information as the incremental gains charts in the slides (except the gains column), specifically the response % for the validation sample ranked from the top down. Now we only have much smaller buckets. Notice that the response rate is monotonically decreasing in the validation file in the graph below, except at the end (not so serious).

Specifically, we may see below (always read the validation file) that if we promote everyone exceeding a break-even point of, say, 5 %, we promote only 16 per cent using the neural model. However, we may also use the regression or tree model, in which case we may promote 30 per cent of the target population.
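Reading the promotion depth off the % Response chart amounts to counting the leading buckets whose response rate is still at or above the break-even point. The Python sketch below illustrates this together with a simple monotonicity check; the bucket rates are invented, the buckets are assumed to be of equal size, and both function names are hypothetical.

```python
def is_monotone_decreasing(rates):
    """True if every bucket's response rate is no higher than the previous one."""
    return all(a >= b for a, b in zip(rates, rates[1:]))


def promotion_depth(bucket_rates, break_even):
    """Share of the list in the leading buckets with a rate at or above break-even.

    Buckets are assumed to be equally sized and ordered from best to worst.
    """
    kept = 0
    for rate in bucket_rates:
        if rate < break_even:
            break
        kept += 1
    return kept / len(bucket_rates)


# Hypothetical validation response rates for ten 10 % buckets.
rates = [0.110, 0.080, 0.055, 0.040, 0.030, 0.025, 0.020, 0.015, 0.010, 0.012]

if __name__ == "__main__":
    print("monotone:", is_monotone_decreasing(rates))
    print("depth at 5 % break-even:", promotion_depth(rates, 0.05))
```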
Now, which is better? In order to answer, we must look at the Cumulative % Response chart, which you see below. If we promote 16 % of the target segment employing the AutoNeural model, we expect a response rate of 9.5 %. If instead we use the regression or tree model, we get a response rate of 6.5 % (reading the result off the graph). Now which model should we use?
If we use the neural network model and the size of the target population is X (not taking into account that part of the responders do not pay), we get the profit

(16 - 0.65)*0.095*0.16*X - 0.65*0.905*0.16*X = 0.1392*X

If we use the regression or the tree model, the expected profit is

(16 - 0.65)*0.065*0.3*X - 0.65*0.935*0.3*X = 0.117*X

Thus we use the neural network model, where the profit is greater.

Opening the saved project

Start Enterprise Miner. File > Open Project > your project on the Desktop or your USB stick. EM normally also offers you the most recent project. Open your diagram.
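Returning to the profit comparison of the two promotion depths earlier in this section, the arithmetic can be rechecked with a few lines of Python. This is a sketch under the same simplification as the text (non-paying responders are ignored); the function name expected_profit_per_name is hypothetical, and the result is expressed per name in the target population, i.e. as a multiple of X.

```python
MARGIN = 16.0          # profit margin per order, before promotion cost
PROMOTION_COST = 0.65  # cost per promotion sent


def expected_profit_per_name(response_rate, depth):
    """Expected profit per name in the target population.

    depth is the share of the population promoted and response_rate is the
    cumulative response rate expected among those promoted.
    """
    responders = response_rate * depth
    non_responders = (1.0 - response_rate) * depth
    return (MARGIN - PROMOTION_COST) * responders - PROMOTION_COST * non_responders


if __name__ == "__main__":
    neural = expected_profit_per_name(response_rate=0.095, depth=0.16)
    regression_or_tree = expected_profit_per_name(response_rate=0.065, depth=0.30)
    print("neural network:    ", round(neural, 4))
    print("regression or tree:", round(regression_or_tree, 4))
```

Multiplying either figure by the size of the target population X gives the total expected profit for that model and depth.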
WHAT HAPPENS NEXT?

Our assignment ends at assessing the three models that we used to predict whether a customer orders. In the assessment node we could see how useful the models were in finding the customers with the highest likelihood of responding to our promotion. In real life the results from the sample are applied to a segment in the database in order to score it and identify those who will be promoted and those who will not.
SAS Add-In 2.1 for Microsoft Office: Getting Started with Data Analysis
SAS Add-In 2.1 for Microsoft Office: Getting Started with Data Analysis The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2007. SAS Add-In 2.1 for Microsoft Office: Getting
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
Using SPSS, Chapter 2: Descriptive Statistics
1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,
Scientific Graphing in Excel 2010
Scientific Graphing in Excel 2010 When you start Excel, you will see the screen below. Various parts of the display are labelled in red, with arrows, to define the terms used in the remainder of this overview.
PURPOSE OF GRAPHS YOU ARE ABOUT TO BUILD. To explore for a relationship between the categories of two discrete variables
3 Stacked Bar Graph PURPOSE OF GRAPHS YOU ARE ABOUT TO BUILD To explore for a relationship between the categories of two discrete variables 3.1 Introduction to the Stacked Bar Graph «As with the simple
This activity will show you how to draw graphs of algebraic functions in Excel.
This activity will show you how to draw graphs of algebraic functions in Excel. Open a new Excel workbook. This is Excel in Office 2007. You may not have used this version before but it is very much the
einstruction CPS (Clicker) Instructions
Two major approaches to run Clickers a. Anonymous b. Tracked Student picks any pad as s/he enters classroom; Student responds to question, but pad is not linked to student; Good for controversial questions,
In This Issue: Excel Sorting with Text and Numbers
In This Issue: Sorting with Text and Numbers Microsoft allows you to manipulate the data you have in your spreadsheet by using the sort and filter feature. Sorting is performed on a list that contains
Netigate User Guide. Setup... 2. Introduction... 5. Questions... 6. Text box... 7. Text area... 9. Radio buttons...10. Radio buttons Weighted...
Netigate User Guide Setup... 2 Introduction... 5 Questions... 6 Text box... 7 Text area... 9 Radio buttons...10 Radio buttons Weighted...12 Check box...13 Drop-down...15 Matrix...17 Matrix Weighted...18
Microsoft Excel Basics
COMMUNITY TECHNICAL SUPPORT Microsoft Excel Basics Introduction to Excel Click on the program icon in Launcher or the Microsoft Office Shortcut Bar. A worksheet is a grid, made up of columns, which are
APPLICATION PROGRAMMING: DATA MINING AND DATA WAREHOUSING
Wrocław University of Technology Internet Engineering Henryk Maciejewski APPLICATION PROGRAMMING: DATA MINING AND DATA WAREHOUSING PRACTICAL GUIDE Wrocław (2011) 1 Copyright by Wrocław University of Technology
Excel 2003 Tutorial I
This tutorial was adapted from a tutorial by see its complete version at http://www.fgcu.edu/support/office2000/excel/index.html Excel 2003 Tutorial I Spreadsheet Basics Screen Layout Title bar Menu bar
Data Mining Using SAS Enterprise Miner : A Case Study Approach, Second Edition
Data Mining Using SAS Enterprise Miner : A Case Study Approach, Second Edition The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2003. Data Mining Using SAS Enterprise
Creating a Gradebook in Excel
Creating a Spreadsheet Gradebook 1 Creating a Gradebook in Excel Spreadsheets are a great tool for creating gradebooks. With a little bit of work, you can create a customized gradebook that will provide
Summary of important mathematical operations and formulas (from first tutorial):
EXCEL Intermediate Tutorial Summary of important mathematical operations and formulas (from first tutorial): Operation Key Addition + Subtraction - Multiplication * Division / Exponential ^ To enter a
To launch the Microsoft Excel program, locate the Microsoft Excel icon, and double click.
EDIT202 Spreadsheet Lab Assignment Guidelines Getting Started 1. For this lab you will modify a sample spreadsheet file named Starter- Spreadsheet.xls which is available for download from the Spreadsheet
Task Force on Technology / EXCEL
Task Force on Technology EXCEL Basic terminology Spreadsheet A spreadsheet is an electronic document that stores various types of data. There are vertical columns and horizontal rows. A cell is where the
Creating and Using Forms in SharePoint
Creating and Using Forms in SharePoint Getting started with custom lists... 1 Creating a custom list... 1 Creating a user-friendly list name... 1 Other options for creating custom lists... 2 Building a
How to make a line graph using Excel 2007
How to make a line graph using Excel 2007 Format your data sheet Make sure you have a title and each column of data has a title. If you are entering data by hand, use time or the independent variable in
How To Use Spss
1: Introduction to SPSS Objectives Learn about SPSS Open SPSS Review the layout of SPSS Become familiar with Menus and Icons Exit SPSS What is SPSS? SPSS is a Windows based program that can be used to
Increasing Productivity and Collaboration with Google Docs. Charina Ong Educational Technologist [email protected]
Increasing Productivity and Collaboration with Google Docs [email protected] Table of Contents About the Workshop... i Workshop Objectives... i Session Prerequisites... i Google Apps... 1 Creating
The Dummy s Guide to Data Analysis Using SPSS
The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests
Using Excel for Business Analysis: A Guide to Financial Modelling Fundamentals
Excel 2003 Instructions Using Excel for Business Analysis: A Guide to Financial Modelling Fundamentals contains extensive instructions for using Excel 2010 and Excel for Mac 2011. There are a few instances
EXCEL PIVOT TABLE David Geffen School of Medicine, UCLA Dean s Office Oct 2002
EXCEL PIVOT TABLE David Geffen School of Medicine, UCLA Dean s Office Oct 2002 Table of Contents Part I Creating a Pivot Table Excel Database......3 What is a Pivot Table...... 3 Creating Pivot Tables
Excel 2007 Basic knowledge
Ribbon menu The Ribbon menu system with tabs for various Excel commands. This Ribbon system replaces the traditional menus used with Excel 2003. Above the Ribbon in the upper-left corner is the Microsoft
Creating a PowerPoint Poster using Windows
Creating a PowerPoint Poster using Windows Copyright 2001 Michael Dougherty ([email protected]) Purpose The purpose of this tutorial is to illustrate how to create a 3 x 4 ft. poster using PowerPoint. This
Produced by Flinders University Centre for Educational ICT. PivotTables Excel 2010
Produced by Flinders University Centre for Educational ICT PivotTables Excel 2010 CONTENTS Layout... 1 The Ribbon Bar... 2 Minimising the Ribbon Bar... 2 The File Tab... 3 What the Commands and Buttons
Using the SAS Enterprise Guide (Version 4.2)
2011-2012 Using the SAS Enterprise Guide (Version 4.2) Table of Contents Overview of the User Interface... 1 Navigating the Initial Contents of the Workspace... 3 Useful Pull-Down Menus... 3 Working with
Working with Tables: How to use tables in OpenOffice.org Writer
Working with Tables: How to use tables in OpenOffice.org Writer Title: Working with Tables: How to use tables in OpenOffice.org Writer Version: 1.0 First edition: January 2005 First English edition: January
S P S S Statistical Package for the Social Sciences
S P S S Statistical Package for the Social Sciences Data Entry Data Management Basic Descriptive Statistics Jamie Lynn Marincic Leanne Hicks Survey, Statistics, and Psychometrics Core Facility (SSP) July
Directions for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
City of De Pere. Halogen How To Guide
City of De Pere Halogen How To Guide Page1 (revised 12/14/2015) Halogen Performance Management website address: https://global.hgncloud.com/cityofdepere/welcome.jsp The following steps take place to complete
Excel Tutorial. Bio 150B Excel Tutorial 1
Bio 15B Excel Tutorial 1 Excel Tutorial As part of your laboratory write-ups and reports during this semester you will be required to collect and present data in an appropriate format. To organize and
Copyright EPiServer AB
Table of Contents 3 Table of Contents ABOUT THIS DOCUMENTATION 4 HOW TO ACCESS EPISERVER HELP SYSTEM 4 EXPECTED KNOWLEDGE 4 ONLINE COMMUNITY ON EPISERVER WORLD 4 COPYRIGHT NOTICE 4 EPISERVER ONLINECENTER
Intellect Platform - Tables and Templates Basic Document Management System - A101
Intellect Platform - Tables and Templates Basic Document Management System - A101 Interneer, Inc. 4/12/2010 Created by Erika Keresztyen 2 Tables and Templates - A101 - Basic Document Management System
Creating a Poster in PowerPoint 2010. A. Set Up Your Poster
View the Best Practices in Poster Design located at http://www.emich.edu/training/poster before you begin creating a poster. Then in PowerPoint: (A) set up the poster size and orientation, (B) add and
ABSORBENCY OF PAPER TOWELS
ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?
Data Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.
EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. Excel is powerful tool and can make your life easier if you are proficient in using it. You will need to use Excel to complete most of your
Business Intelligence. Tutorial for Rapid Miner (Advanced Decision Tree and CRISP-DM Model with an example of Market Segmentation*)
Business Intelligence Professor Chen NAME: Due Date: Tutorial for Rapid Miner (Advanced Decision Tree and CRISP-DM Model with an example of Market Segmentation*) Tutorial Summary Objective: Richard would
SPSS Workbook 1 Data Entry : Questionnaire Data
TEESSIDE UNIVERSITY SCHOOL OF HEALTH & SOCIAL CARE SPSS Workbook 1 Data Entry : Questionnaire Data Prepared by: Sylvia Storey [email protected] SPSS data entry 1 This workbook is designed to introduce
Using Word 2007 For Mail Merge
Using Word 2007 For Mail Merge Introduction This document assumes that you are familiar with using Word for word processing, with the use of a computer keyboard and mouse and you have a working knowledge
SPSS Manual for Introductory Applied Statistics: A Variable Approach
SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
Introduction To Microsoft Office PowerPoint 2007. Bob Booth July 2008 AP-PPT5
Introduction To Microsoft Office PowerPoint 2007. Bob Booth July 2008 AP-PPT5 University of Sheffield Contents 1. INTRODUCTION... 3 2. GETTING STARTED... 4 2.1 STARTING POWERPOINT... 4 3. THE USER INTERFACE...
Microsoft Excel 2010 Part 3: Advanced Excel
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES Microsoft Excel 2010 Part 3: Advanced Excel Winter 2015, Version 1.0 Table of Contents Introduction...2 Sorting Data...2 Sorting
Manual. Sealer Monitor Software. Version 0.10.7
Manual Sealer Monitor Software Version 0.10.7 Contents 1 Introduction & symbols 1 2 Installation 2 2.1 Requirements 2 2.2 Installation process 2 3 Menu & Tooblar 5 3.1 File menu 5 3.2 Print menu 6 3.3
Migrating to Excel 2010 from Excel 2003 - Excel - Microsoft Office 1 of 1
Migrating to Excel 2010 - Excel - Microsoft Office 1 of 1 In This Guide Microsoft Excel 2010 looks very different, so we created this guide to help you minimize the learning curve. Read on to learn key
Excel Companion. (Profit Embedded PHD) User's Guide
Excel Companion (Profit Embedded PHD) User's Guide Excel Companion (Profit Embedded PHD) User's Guide Copyright, Notices, and Trademarks Copyright, Notices, and Trademarks Honeywell Inc. 1998 2001. All
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
EXCEL FINANCIAL USES
EXCEL FINANCIAL USES Table of Contents Page LESSON 1: FINANCIAL DOCUMENTS...1 Worksheet Design...1 Selecting a Template...2 Adding Data to a Template...3 Modifying Templates...3 Saving a New Workbook as
Introduction to Exploratory Data Analysis
Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
PowerPoint 2007 Basics Website: http://etc.usf.edu/te/
Website: http://etc.usf.edu/te/ PowerPoint is the presentation program included in the Microsoft Office suite. With PowerPoint, you can create engaging presentations that can be presented in person, online,
ECDL. European Computer Driving Licence. Spreadsheet Software BCS ITQ Level 2. Syllabus Version 5.0
European Computer Driving Licence Spreadsheet Software BCS ITQ Level 2 Using Microsoft Excel 2010 Syllabus Version 5.0 This training, which has been approved by BCS, The Chartered Institute for IT, includes
Introduction to SPSS 16.0
Introduction to SPSS 16.0 Edited by Emily Blumenthal Center for Social Science Computation and Research 110 Savery Hall University of Washington Seattle, WA 98195 USA (206) 543-8110 November 2010 http://julius.csscr.washington.edu/pdf/spss.pdf
Using Pivot Tables in Microsoft Excel 2003
Using Pivot Tables in Microsoft Excel 2003 Introduction A Pivot Table is the name Excel gives to what is more commonly known as a cross-tabulation table. Such tables can be one, two or three-dimensional
The following is an overview of lessons included in the tutorial.
Chapter 2 Tutorial Tutorial Introduction This tutorial is designed to introduce you to some of Surfer's basic features. After you have completed the tutorial, you should be able to begin creating your
Microsoft PowerPoint Tutorial
Microsoft PowerPoint Tutorial Contents Starting MS PowerPoint... 1 The MS PowerPoint Window... 2 Title Bar...2 Office Button...3 Saving Your Work... 3 For the first time... 3 While you work... 3 Backing
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
2: Entering Data. Open SPSS and follow along as your read this description.
2: Entering Data Objectives Understand the logic of data files Create data files and enter data Insert cases and variables Merge data files Read data into SPSS from other sources The Logic of Data Files
Maximizing Microsoft Office Communicator
Maximizing Microsoft Office Communicator Microsoft Office Communicator is an instant messaging tool on the standard image for CG workstations. This Tech Tip contains basic instructions on how to use the
ReceivablesVision SM Getting Started Guide
ReceivablesVision SM Getting Started Guide March 2013 Transaction Services ReceivablesVision Quick Start Guide Table of Contents Table of Contents Accessing ReceivablesVision SM...2 The Login Screen...
How to Make the Most of Excel Spreadsheets
How to Make the Most of Excel Spreadsheets Analyzing data is often easier when it s in an Excel spreadsheet rather than a PDF for example, you can filter to view just a particular grade, sort to view which
State of Illinois Web Content Management (WCM) Guide For SharePoint 2010 Content Editors. 11/6/2014 State of Illinois Bill Seagle
State of Illinois Web Content Management (WCM) Guide For SharePoint 2010 Content Editors 11/6/2014 State of Illinois Bill Seagle Table of Contents Logging into your site... 2 General Site Structure and
Q&As: Microsoft Excel 2013: Chapter 2
Q&As: Microsoft Excel 2013: Chapter 2 In Step 5, why did the date that was entered change from 4/5/10 to 4/5/2010? When Excel recognizes that you entered a date in mm/dd/yy format, it automatically formats
MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC
MyOra 3.0 SQL Tool for Oracle User Guide Jayam Systems, LLC Contents Features... 4 Connecting to the Database... 5 Login... 5 Login History... 6 Connection Indicator... 6 Closing the Connection... 7 SQL
Microsoft Excel Tips & Tricks
Microsoft Excel Tips & Tricks Collaborative Programs Research & Evaluation TABLE OF CONTENTS Introduction page 2 Useful Functions page 2 Getting Started with Formulas page 2 Nested Formulas page 3 Copying
Data Visualization. Prepared by Francisco Olivera, Ph.D., Srikanth Koka Department of Civil Engineering Texas A&M University February 2004
Data Visualization Prepared by Francisco Olivera, Ph.D., Srikanth Koka Department of Civil Engineering Texas A&M University February 2004 Contents Brief Overview of ArcMap Goals of the Exercise Computer
Linear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples
