USING SAS/STAT SOFTWARE'S REG PROCEDURE TO DEVELOP SALES TAX AUDIT SELECTION MODELS


Kirk L. Johnson, Tennessee Department of Revenue
Richard W. Kulp, David Lipscomb College

INTRODUCTION

The Tennessee Department of Revenue (TDR) uses the SAS/STAT REG procedure to develop statistical models that predict which sales and use tax field audits will yield the highest return per hour spent on the audit. To perform the analysis, the TDR uses the SAS System, which runs on both the state's mainframe computer and personal computers in the Department. The process involves running SAS programs against taxpayer files on the mainframe and downloading subsets of the data, based on taxpayers' business types, to a personal computer. The downloaded data are analyzed using PROC REG. This paper reports on our use of SAS diagnostics to compare competing models and to analyze potential problems in the data.

Since the formulas used to calculate the statistics discussed in this paper are readily available in SAS documentation, we have chosen, for the most part, not to include them here. We relied very heavily upon SAS/STAT Guide for Personal Computers, Version 6 Edition (Cary, NC: SAS Institute Inc., 1985) for our descriptions of the REG procedure and have tried to conform to SAS terminology in describing these procedures. In addition, some of the PROC REG options discussed below produce a large amount of printed output. The statistics reported in this paper were therefore extracted from SAS output. We will be glad to make the full output available upon request.

USING REGRESSION ANALYSIS TO PREDICT ASSESSMENTS

Regression analysis can be used to do the following:

- to explain how the independent variables account for variation in the dependent variable
- to estimate the magnitude and signs of the parameters
- to screen variables and rank them in order of importance
- to predict, forecast, or estimate the dependent variable.
As noted above, we are primarily interested in using regression analysis to predict the hourly return from sales tax field audits. It is important to state clearly the purpose for which a regression model is to be used, since a model that predicts well may not be the best model for estimating parameters or performing some other task.

Model Selection

We have chosen to develop a different model for each business type for which there is sufficient audit history to justify the analysis. By a different model, we mean that the independent variables used will differ from one business type to another. This choice is based upon our experience, as well as that of other states, which indicates that variables useful for predicting assessments for one business type may not be useful for another.

Several exploratory techniques are available to assist in identifying which variables to include in the models. These include forward selection, backward selection, and stepwise selection. In Version 6.03, these are invoked using the SELECTION option of the MODEL statement of PROC REG. The syntax of the option is as follows:

   PROC REG DATA=SASdataset;
   MODEL dependents=regressors / SELECTION=name P VIF COLLIN INFLUENCE PARTIAL;

where name can be FORWARD (or F), BACKWARD (or B), STEPWISE, MAXR, MINR, RSQUARE, ADJRSQ, CP, or NONE (the full model). The default is NONE. P, VIF, COLLIN, INFLUENCE, and PARTIAL invoke the diagnostic procedures discussed below.
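As a concrete instance of this syntax, a minimal sketch follows. The data set name AUDITS, the dependent variable ASMT_HR (assessment per audit hour), and the regressors X1-X5 are hypothetical names chosen for illustration, and the two-variable subset in the last MODEL statement is arbitrary. Selection and diagnostics are placed in separate MODEL statements here only to keep each step easy to read.

```sas
/* Sketch only: AUDITS, ASMT_HR and X1-X5 are hypothetical names. */
proc reg data=audits;
   /* Screen the candidate regressors by stepwise selection */
   stepwise: model asmt_hr = x1-x5 / selection=stepwise;

   /* Rank subsets by adjusted R-square and also report Mallows' Cp */
   subsets:  model asmt_hr = x1-x5 / selection=adjrsq cp;

   /* Fit one chosen subset (SELECTION=NONE by default) with diagnostics */
   final:    model asmt_hr = x1 x3 / p vif collin influence partial;
run;
quit;
```

The labels before each MODEL statement (stepwise, subsets, final) are optional and only make the printed output easier to tell apart.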

Because of the large number of variables being considered (43) and the large number of models produced (95), it is necessary to develop a set of procedures to reduce the number of models which must be considered for each business group. The following outlines these procedures:

- Use SELECTION=STEPWISE to reduce the number of variables. The default significance levels for entry into the model (0.15) and for staying in the model (0.15) were used.
- Use SELECTION=ADJRSQ and SELECTION=CP with the variables selected by STEPWISE to find the models with the best adjusted R² and Mallows' Cp.
- Use the P option to calculate the PRESS statistic for competing models.
- Use the VIF option to calculate variance inflation factors and the COLLIN option for collinearity diagnostics.
- Use the INFLUENCE and PARTIAL options to produce influence diagnostics and partial regression residual plots.

Example 1

We have a business type in the retail trade sector for which 67 sales tax audits have been performed, yielding an average per hour assessment of $621. The STEPWISE option produced the following model:

[Table: stepwise selection summary. Columns: Step, Variable Entered, Partial R², Model R², Mallows' Cp. Variables entered, in order: T_GROSS, T_BALDUE, T_EXEMPT, GROSS2, BALDUE2, STRUCF, EXEMPT2. The numeric entries were lost in transcription.]

where T_GROSS = total gross sales, T_BALDUE = total tax due, T_EXEMPT = total exempt sales, GROSS2 = total gross sales squared, BALDUE2 = total tax due squared, EXEMPT2 = total exempt sales squared, and STRUCF = a dummy variable indicating whether the taxpayer registered as a foreign corporation (i.e., corporate headquarters located outside Tennessee).

Mallows' Cp, reported in the table above, is a prediction-oriented statistic which indicates the presence of bias in a model. A Cp greater than p+1 (where p = the number of independent variables in the model, so that p+1 also counts the intercept) is an indicator of an incompletely specified model. A Cp less than p+1 indicates that the model is overspecified (i.e., the model contains too many variables). The recommended model is the one where Cp first approaches p+1, starting from the full model.
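Because the paper leaves the formulas to the SAS documentation, the three criteria named in the procedure above are restated here in their standard textbook form, as a reference sketch and not as a transcription of SAS output. The notation is an assumption of this sketch: n is the number of audits, p the number of independent variables (so p+1 parameters with the intercept), SSE_p the error sum of squares of the candidate model, \hat{\sigma}^2_{\text{full}} the mean square error of the full model, e_i the ordinary residual, h_{ii} the hat diagonal, and \hat{y}_{(i)} the prediction for observation i from a fit that omits it.

```latex
\bar{R}^2 = 1 - \frac{n-1}{n-p-1}\,\bigl(1 - R^2\bigr)
\qquad
C_p = \frac{SSE_p}{\hat{\sigma}^2_{\text{full}}} - n + 2(p+1)
\qquad
\mathrm{PRESS} = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2
               = \sum_{i=1}^{n} \left(\frac{e_i}{1 - h_{ii}}\right)^2
```

For a model with little bias, C_p is expected to be close to p+1, which is why the paper compares each C_p with p+1.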
As the above table indicates, since Mallows' Cp for the last variable entered in the model is less than p+1 (which would be eight in this case), it is possible that this model is overfitted. As the table below indicates, this model compares well with the models produced using other selection methods:

[Table: comparison of selection methods. Columns: Selection Option, R², Adjusted R², PRESS (scaled by a power of ten that is not recoverable), MSE (x 10³). Rows: NONE, FORWARD, BACKWARD, STEPWISE. The numeric entries were lost in transcription.]

As variables are added to a model, R² will always increase or, in the worst case, remain the same. Thus, the model with the highest R² is not necessarily the best model. Adjusted R² takes into consideration the number of independent variables in the model. An adjusted R² which is substantially less than R² indicates that the model is overfitted; that is to say, the increase in R² due to the additional variables does not make up for the loss of degrees of freedom. None of the adjusted R² values reported above is a cause for concern.

Since we are most interested in predicting assessments per hour, we have relied heavily upon the SAS prediction diagnostics. The PRESS statistic is the sum of squares of the predicted residuals, where the predicted residual for observation i is the residual obtained when observation i is dropped before the parameters are estimated. In evaluating competing models, a lower PRESS indicates better prediction capability. The model produced by STEPWISE has a much lower PRESS than the other models.

PROBLEMS IN REGRESSION ANALYSIS

Two well-known problems in the data used in regression analysis are particularly endemic to data dealing with tax assessments. These problems are multicollinearity among the independent variables and influence data points.

Multicollinearity

Multicollinearity is present when an independent variable is nearly a linear combination of other independent variables in the model. Multicollinearity affects regression analysis in the following ways:

A. It produces large variances of the coefficients.
B. It results in unstable coefficients.
C. It produces regression coefficients that are too large in magnitude.
D. It can result in poor prediction.

Given that prediction is our main goal, the potential presence of multicollinearity among the independent variables used in a model should be carefully investigated. An example of multicollinearity would be a business type where gross sales and exempt sales were highly correlated. In this case, the analyst may want to consider removing one of the variables from the model.

The VIF and COLLIN options are collinearity diagnostics provided by SAS. The VIF option reports the variance inflation factor, which can be interpreted as follows: for a given variable, the variance inflation factor measures how much larger the variance of the parameter estimate is than it would be if no multicollinearity were present. As a rule of thumb, a VIF greater than ten (10) can be used as an indicator of a potential collinearity problem. The COLLIN option produces a table of eigenvalues, condition indices, and variance proportions which can be used to examine which terms are causing the problem. The number of eigenvalues near zero indicates the number of near linear dependencies. Large values for the condition number also indicate collinearity. High loadings on the variance proportions indicate which terms are causing the problem.

Example 1 (Continued)

In the above example, we are concerned about possible collinearity between T_GROSS and T_BALDUE and between GROSS2 and BALDUE2.
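The two collinearity measures described above have simple standard definitions, restated here as a reference sketch. The symbols are assumptions of this sketch: R_j^2 is the R² obtained when the j-th independent variable is regressed on all the other independent variables, and \lambda_k are the eigenvalues of the scaled cross-products matrix of the regressors.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\eta_k = \sqrt{\frac{\lambda_{\max}}{\lambda_k}}
```

Under this definition the rule of thumb VIF > 10 corresponds to R_j^2 > 0.9, that is, more than 90 percent of the variation in that variable is explained by the other regressors. The condition index \eta_k grows as an eigenvalue \lambda_k approaches zero, which is why small eigenvalues and large condition numbers are read together.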
As the variance inflation factors reported below indicate, the seven variable model selected by STEPWISE in the above example would appear to have multicollinearity problems:

   Variable    VIF
   T_GROSS     936
   T_BALDUE    887
   T_EXEMPT     18
   GROSS2      (value lost in transcription)
   BALDUE2     271
   STRUCF        1
   EXEMPT2       8

The table below reports the eigenvalues and condition numbers associated with this model:

[Table: Number, Eigenvalue, Condition Number for the eight principal components. The numeric entries were lost in transcription.]

The small eigenvalue and large condition number associated with the eighth principal component are indications of a collinearity problem. The table below reports the variance proportions for the variables with the highest loadings on the eighth component:

[Table: variance proportions on component 8 for T_GROSS, T_BALDUE, GROSS2, and BALDUE2. The numeric entries were lost in transcription.]

Since the variable T_GROSS has the highest variance inflation factor and the highest variance proportion for the eighth component, the decision was made to drop it from the model. This resulted in only a slight drop in adjusted R², whereas PRESS and Mallows' Cp for the six variable model are slightly better. Moreover, the variance inflation factors, as the table below indicates, showed marked improvement, although they still indicate the presence of collinearity in the model:

   Variable    VIF
   T_BALDUE     18
   T_EXEMPT     10
   GROSS2       42
   BALDUE2      28
   STRUCF        1
   EXEMPT2     (value lost in transcription)
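The comparison just described, the seven variable model against the same model without T_GROSS, might be requested along the following lines. This is a sketch under assumptions: the data set name RETAIL and the dependent variable ASMT_HR are hypothetical, while the regressor names are those given in the paper.

```sas
/* Sketch only: RETAIL and ASMT_HR are hypothetical names. */
proc reg data=retail;
   /* Seven variable model selected by STEPWISE */
   seven: model asmt_hr = t_gross t_baldue t_exempt gross2 baldue2
                          strucf exempt2 / p vif collin;

   /* Same model with T_GROSS dropped */
   six:   model asmt_hr = t_baldue t_exempt gross2 baldue2
                          strucf exempt2 / p vif collin;
run;
quit;
```

Requesting P, VIF, and COLLIN on both MODEL statements puts the PRESS statistic, the variance inflation factors, and the eigenvalue table for the two fits side by side in one run.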

As the table below indicates, with the exception of dropping T_EXEMPT, which is discussed below, efforts to improve the model by dropping additional variables resulted in diminishing predictive capability based on PRESS and Mallows' Cp (Mallows' Cp was calculated using the full model MSE):

[Table: comparison of candidate models. Columns: Adjusted R², PRESS (scaled by a power of ten that is not recoverable), Cp, p+1, Model. The numeric entries were lost in transcription. Models listed, in order:]

   T_GROSS, T_BALDUE, T_EXEMPT, GROSS2, BALDUE2, STRUCF, EXEMPT2
   T_BALDUE, T_EXEMPT, GROSS2, BALDUE2, STRUCF, EXEMPT2
   T_BALDUE, GROSS2, BALDUE2, STRUCF, EXEMPT2
   T_BALDUE, T_EXEMPT, BALDUE2, STRUCF, EXEMPT2
   T_BALDUE, T_EXEMPT, GROSS2, STRUCF, EXEMPT2
   GROSS2, T_EXEMPT, BALDUE2, STRUCF, EXEMPT2
   GROSS2, BALDUE2, STRUCF, EXEMPT2
   GROSS2, STRUCF, BALDUE2
   GROSS2, BALDUE2

What we seem to have here is a situation where two variables, GROSS2 and BALDUE2, are collinear but both must be included for the model to have an acceptable adjusted R², PRESS, and Mallows' Cp. Reported below are the parameter estimates associated with the six variable model:

[Table: Parameter Estimate, Standard Error, and Prob>|T| for INTERCEP, T_BALDUE, T_EXEMPT, GROSS2, BALDUE2, STRUCF, and EXEMPT2. The numeric entries were lost in transcription.]

The presence in the model of a variable, T_EXEMPT, which is not significant at the 0.05 level is also of concern. As the comparison table above indicated, dropping this variable improves the model slightly in adjusted R², PRESS, and Mallows' Cp. As reported below, the variance inflation factors are either the same as or slightly better than those of the six variable model:

   Variable    VIF
   T_BALDUE     16
   GROSS2       43
   BALDUE2      28
   STRUCF        1
   EXEMPT2       1

Thus, the decision was made to use the five variable model. The parameter estimates are reported below:

[Table: Parameter Estimate, Standard Error, and Prob>|T| for INTERCEP, T_BALDUE, GROSS2, BALDUE2, STRUCF, and EXEMPT2. The numeric entries were lost in transcription.]

Influence Data Points

Influence data points are points which exert an undue influence on the regression equation. This may be the result, for example, of an outlying observation.
If a set of data for a given business type included one extremely large per hour field audit assessment, this data point could possibly exert an undue influence on the regression equation for that business type. It is important to note that the mere presence of such a data point does not necessarily mean that it does exert an undue influence, only that it may do so. If it does, the data point would be termed an outlier.

Because of the nature of our data, influence data points are a serious problem for both the dependent and independent variables. The presence of large per hour assessments may produce outliers among the values of the dependent variable for some business types. The presence of large values for some independent variables (particularly large gross sales, large exempt sales, large use taxable, and large tax balances due) may produce high leverage data points.

Influence data points are not always readily apparent. Moreover, the remedy is a source of some controversy. While some statisticians recommend removing outliers from the data, others do not. If the data point is valid, that is to say, the data for that observation are correctly measured, then we feel that there should be a compelling reason for removing it from the data set.

Example 2

We have a group of manufacturers for which 53 sales tax audits have been performed, with an average per hour assessment of $18,241. This extremely high average per hour assessment leads us to suspect that there might be one or more outliers in the data, that is to say, observations which exert an undue influence on the regression equation.

Following the methodology discussed above, the STEPWISE option was used to select an initial model for analysis. This model is presented below:

[Table: stepwise selection summary. Columns: Step, Variable Entered, Model R², Prob>|T|, Mallows' Cp. Variables entered, in order: USE2, T_USE, BALDUE2, STRUCD, DIRPAY, STRUCA. The numeric entries were lost in transcription.]

where USE2 = use taxable squared, T_USE = total use taxable, BALDUE2 = total tax due squared, STRUCD = a dummy variable indicating whether the taxpayer registered as a domestic corporation, DIRPAY = a dummy variable indicating whether the taxpayer has a direct pay permit, and STRUCA = a dummy variable indicating whether the taxpayer registered as a sole proprietor.

The dominance of USE2 further alerted us to the possibility of a problem with the data. Even though it had a high R², the large Mallows' Cp statistic indicated that the variable has considerable bias also. In addition, the PRESS statistic for this model was extremely large, indicating poor prediction capability.

The INFLUENCE option is used to produce statistics which measure the influence of each observation on the estimates. These statistics include the following:

- RSTUDENT (the studentized residuals)
- HAT DIAG H (the hat diagonals)
- COV RATIO (the covariance ratio)
- DFFITS (a scaled measure of the change in the predicted value for the ith observation)
- DFBETAS (scaled measures of the change in each parameter estimate for each variable included in the model).
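A minimal sketch of how these statistics might be requested and saved follows. The data set names MFG and DIAG, the dependent variable ASMT_HR, and the output variable names are hypothetical; the regressors are those of the initial Example 2 model. The OUTPUT statement is an addition of this sketch, used so that the observation-level statistics can be screened in a later step instead of being read off the listing.

```sas
/* Sketch only: MFG, DIAG, ASMT_HR and the output names are hypothetical. */
proc reg data=mfg;
   /* INFLUENCE prints RSTUDENT, HAT DIAG H, COV RATIO, DFFITS, DFBETAS */
   model asmt_hr = use2 t_use baldue2 strucd dirpay struca / influence;

   /* Save the observation-level diagnostics alongside the input data */
   output out=diag rstudent=rstud h=lev covratio=covr dffits=dff;
run;
quit;
```

The DFBETAS appear in the INFLUENCE listing; this sketch does not attempt to save them.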
For the data set and model under consideration, the table below presents the values which would be considered indicators of potential influence points:

   Statistic     Value
   RSTUDENT      If absolute value is greater than 2
   HAT DIAG H    If value is greater than .2642 (2p/n, where p = number of parameters and n = sample size)
   COV RATIO     If value is less than .6038 or greater than 1.3962 (1 plus or minus 3p/n)
   DFFITS        If value is greater than .7268 (2 times the square root of p/n)
   DFBETAS       If value is greater than .2747 (2 over the square root of n)

We found that a number of observations had values on one or more of the above statistics indicating that they may exert a large influence on the parameter estimates. One observation (Observation 11 in the data set) stood out from the others, however. The table below reports the influence diagnostics for this observation:

[Table: Statistic and Value for Observation 11. Rows: RSTUDENT, HAT DIAG H, COV RATIO, DFFITS, INTERCEP DFBETAS, DIRPAY DFBETAS, T_USE DFBETAS, STRUCA DFBETAS, STRUCD DFBETAS, BALDUE2 DFBETAS. The numeric entries were lost in transcription.]

The values of the above statistics led us to investigate this observation. We discovered that, although the data for the observation were correct, the assessment per hour was so large that it almost completely dominated the regression equation. We felt that we were justified in considering this data point an atypical value and therefore removing it from the data set.

We removed this observation from the data set and ran PROC REG with the STEPWISE option again. The R² for the data set without the observation was [value lost in transcription]. This model is presented below:

[Table: stepwise selection summary. Columns: Step, Variable Entered, Model R², Prob>|T|, Mallows' Cp. Variables entered, in order: BALDUE2, USECODEO, USE2, DIRPAY, PERBALGR. The numeric entries were lost in transcription.]

where BALDUE2 = total tax due squared, USECODEO = a dummy variable indicating whether the taxpayer registered as a peddler, USE2 = use taxable squared, DIRPAY = a dummy variable indicating whether the taxpayer has a direct pay permit, and PERBALGR = a derived variable measuring total tax due as a percentage of gross sales.

Even though the R² is considerably lower, the PRESS statistic for the model fitted to the data set with the atypical observation is much worse than for the model fitted to the data set without it. The PRESS statistic for the former model was 755,576,780,357, whereas for the latter model it was 5,636,307, [remaining digits truncated in the source]. Similarly, Mallows' Cp for the former model was 30; for the latter model it was 15. The mean square error for the former model was 841,432, while for the latter model it was 64,658. Thus, we feel justified in removing the data point from the data set.

We ran the INFLUENCE option against this new model to identify any additional influence data points. Using the same criteria discussed above, several data points still had values on the diagnostics which were of concern. Three data points particularly stood out. Two observations had studentized residuals well above two in absolute value. The third observation had a covariance ratio of 74. Two of these observations had large values for the dependent variable (that is, large assessments per hour), whereas the other observation was the result of a no change audit (i.e., assessment per hour = 0). We did not feel at this point that any of these observations was sufficiently atypical of audits performed by the TDR to justify removing it from the data set.

We were concerned with the presence in the model of a term which was not significant at the .05 level. Therefore, we chose to run the model again without the variable PERBALGR. This resulted in a model with a slightly worse adjusted R² and PRESS but, as the table below indicates, all terms in the model are now significant at the .05 level.

[Table: Parameter Estimate, Standard Error, and Prob>|T| for INTERCEP, BALDUE2, USECODEO, USE2, and DIRPAY. The numeric entries were lost in transcription.]

Finally, we ran the VIF option to get the variance inflation factors for the above model.
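The cutoff arithmetic and the refit described in this example can be sketched as follows. Everything named here is an assumption made for illustration: MFG and MFG_TRIM are hypothetical data set names, ASMT_HR is a hypothetical dependent variable, Observation 11 is assumed to be the eleventh record of MFG, and the candidate list contains only the variables named in this example (the real list is longer).

```sas
/* Sketch only: all data set and dependent variable names are hypothetical. */
data _null_;
   n = 53;                      /* audits in the manufacturers group       */
   p = 7;                       /* six regressors plus the intercept       */
   hat_cut = 2*p/n;             /* HAT DIAG H cutoff                       */
   cov_lo  = 1 - 3*p/n;         /* COV RATIO lower cutoff                  */
   cov_hi  = 1 + 3*p/n;         /* COV RATIO upper cutoff                  */
   dff_cut = 2*sqrt(p/n);       /* DFFITS cutoff                           */
   dfb_cut = 2/sqrt(n);         /* DFBETAS cutoff                          */
   put hat_cut= 6.4 cov_lo= 6.4 cov_hi= 6.4 dff_cut= 6.4 dfb_cut= 6.4;
run;

data mfg_trim;
   set mfg;
   if _n_ ne 11;                /* drop the atypical observation           */
run;

proc reg data=mfg_trim;
   /* Repeat the stepwise screening on the trimmed data */
   model asmt_hr = use2 t_use baldue2 strucd dirpay struca usecodeo perbalgr
         / selection=stepwise;
run;
quit;
```

With n = 53 and p = 7 the first step reproduces the cutoffs quoted in the example: 14/53 rounds to .2642, 1 - 21/53 to .6038, 2 times the square root of 7/53 to .7268, and 2 over the square root of 53 to .2747.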
The VIFs, reported below, indicated that the model did not have a collinearity problem:

[Table: variance inflation factors for BALDUE2, USECODEO, USE2, and DIRPAY. The numeric entries were lost in transcription.]

CONCLUDING REMARKS

In conclusion, we would like to make some remarks on the SAS diagnostic procedures. SAS offers an impressive array of diagnostics. For the novice, the biggest problem may be deciding which diagnostics to use. Moreover, it is extremely easy to invoke most of the diagnostics. All the diagnostics discussed in this paper are options of the MODEL statement.

We were also impressed with the enhancements to Version 6.03, such as the CP and ADJRSQ model selection options, which produce a printout of the models ranked according to the best Mallows' Cp and adjusted R² statistics respectively. An option like this for the PRESS statistic would also be useful. We have not had an opportunity, however, to fully evaluate the enhancements to this version.

We were disappointed with some shortcomings in the output, however. For example, the PARTIAL option, which is used to produce partial regression residual plots, does not offer a convenient way of identifying the points. Moreover, an option which would plot the regression line of the partial Y residual on the partial X residual would also be useful (the slope of this line is equal to the parameter estimate of the independent variable for that plot). Since we are running SAS/STAT on a system with 640K RAM, invoking some of these options on the full model caused an out-of-memory error message. We were not able, for example, to run the CP model selection option for the full model.

In conclusion, for the type of analysis we are interested in performing, we found SAS/STAT to be a very powerful and useful statistical package and would recommend its use in similar types of data analysis applications.


More information

DISCRIMINANT FUNCTION ANALYSIS (DA)

DISCRIMINANT FUNCTION ANALYSIS (DA) DISCRIMINANT FUNCTION ANALYSIS (DA) John Poulsen and Aaron French Key words: assumptions, further reading, computations, standardized coefficents, structure matrix, tests of signficance Introduction Discriminant

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,

More information

An Analysis of the Telecommunications Business in China by Linear Regression

An Analysis of the Telecommunications Business in China by Linear Regression An Analysis of the Telecommunications Business in China by Linear Regression Authors: Ajmal Khan h09ajmkh@du.se Yang Han v09yanha@du.se Graduate Thesis Supervisor: Dao Li dal@du.se C-level in Statistics,

More information
