Statistics & Analysis


NESUG

How to Increase Sales of Orthopedic Equipment in United States: Factor and Cluster Analysis using SAS and R

George Obsekov, American College of Radiology Research Center, Philadelphia, PA

INTRODUCTION

This paper analyzes the sales of orthopedic equipment to United States hospitals. The purpose of the analysis was to find a way to increase the company's sales to hospitals and to identify the hospitals where sales gains could be maximized. To construct such a list, I created a subset of hospitals based on geographical location (Southern USA). For each hospital I analyzed descriptive variables such as the number of beds, the number of outpatient visits, the number of certain types of operations, and administrative cost. My response variable was the sales of rehabilitation equipment.

After examining scatter plots of each explanatory variable against the response variable, I applied either a log or a square root transformation so that each scatter plot displayed an approximately linear trend. The next step was to use factor analysis to split the variables into main groups that describe different aspects of the hospitals. Based on the rotated factor pattern, the first factor captured the number of operations, the second captured size, and the third captured rehab.

After defining the factors, I applied cluster analysis to group hospitals with similar characteristics. Reviewing the Ward's minimum variance table, I examined the gaps between SPRSQ values to choose an appropriate cutoff for the number of clusters. I then chose a cluster with high average sales that contained a few hospitals with very low or no sales. A subsequent regression analysis helped me determine the hospitals with the highest potential sales gains.
Finally, I applied the methods for robust clustering (PAM) and for classification and regression trees (rpart) using R software. The clusters selected by my cluster analysis were well supported by the PAM method.

MARKET SEGMENT SELECTION

To increase the sales of orthopedic materials to US-based hospitals, I created a subset of the hospitals in the full data set. The hospitals were chosen geographically from the following Southern states: California, Texas, Louisiana, Alabama, Georgia, Florida, South Carolina, and North Carolina. This gave me the final subset for analyzing the market segment, with variables describing the main characteristics of each hospital in the subset. All major variables are presented in the table of variables.
Variable           Description of variable in data set                      Comments
Y (response)       Sales of Rehabilitation Equipment, Jan July;             Zero means missing.
                   Sales of Rehabilitation Equipment for previous months
ZIP                US Postal Code
HID                Hospital ID
CITY               City Name
STATE              State Name
BEDS               Number of Hospital Beds
RBEDS              Number of Rehab Beds
OUTV               Number of Outpatient Visits
ADM*               Administrative Cost                                      In $ s per year.
SIR                Revenue from Inpatient
HIP9               Number of HIP Operations for 99
KNEE9              Number of KNEE Operations for 99
TH (binary)*       Teaching hospital                                        Coded teaching / nonteaching.
TRAUMA (binary)*   Do They Have a Trauma Unit?                              Coded Yes / No.
REHAB (binary)*    Do They Have a Rehab Unit?                               Coded Yes / No.
HIP9               Number of HIP Operations for 99
KNEE9              Number of KNEE Operations for 99
FEMUR9             Number of FEMUR Operations for 99

Table: Demographic and operational variables used to predict and maximize sales

TRANSFORMATION

For the selected subset of hospitals, I reviewed the scatter plots of each explanatory variable against my response variable. On close inspection it was clear that my variables required transformation in order to appear close to linear. Figures A and B show that BEDS is better served by a square root transformation than by a log transformation.
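The comparison between a square root and a log transformation can also be checked numerically rather than only by eye. The following R sketch is an illustration only: the data are simulated, the names `beds` and `sales` are hypothetical stand-ins for BEDS and the sales response, and `log1p(x)` (that is, log(1 + x)) is an assumed form of the log transformation, since the scaling constant used in the paper is not recoverable here.

```r
# Simulated stand-in data (NOT the paper's hospital data).
set.seed(1)
n     <- 200
beds  <- round(runif(n, 20, 900))
# Hypothetical sales that happen to depend on sqrt(beds); floored at zero.
sales <- pmax(exp(0.15 * sqrt(beds) + rnorm(n, sd = 0.3)) - 1, 0)

# Response on the log(1 + y) scale, as described in the text.
y <- log1p(sales)

# Candidate transformations of the explanatory variable.
candidates <- list(raw = beds, sqrt = sqrt(beds), log = log1p(beds))

# Linear correlation of each candidate with the transformed response.
cors <- sapply(candidates, function(x) cor(x, y))
best <- names(which.max(abs(cors)))

print(round(cors, 3))
print(best)
```

A higher absolute correlation suggests a more nearly linear scatter plot, but the plots themselves should still be inspected, as the paper does.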
Figure A: Sales vs. BEDS, transformation selection: SQRT

Figure B: Sales vs. BEDS, transformation selection: LOG

The number of rehab beds (RBEDS) and the operational variables (HIP9, KNEE9, HIP9, KNEE9 and FEMUR9) were transformed to log (+.xi), while the OUTV, ADM and SIR variables appeared closer to linear when log (+.xi) was applied. The binary variables (TH, TRAUMA and REHAB) did not require any transformation. Finally, my response variable was also transformed from y to log (+y), where y is the combination of all sales of rehabilitation equipment. The final scatter plots after transformation appear in Figures A and B.

Figure A: Sales vs. BEDS, RBEDS, HIP9, KNEE9, HIP9, and KNEE9 after transformation
Figure B: Sales vs. FEMUR9, OUTV, ADM, and SIR after transformation

DIMENSION REDUCTION

Dimension reduction was performed with factor analysis to summarize the operational and demographic variables in the selected subset. Using the FACTOR procedure, three factors were constructed for the subsequent analysis: an operational factor (HIP9, KNEE9, HIP9, KNEE9, and FEMUR9), a size factor (BEDS, OUTV, ADM, SIR, TH, and TRAUMA) and a rehab factor (RBEDS and REHAB).

After an initial factor analysis of all variables in one stage, I decided to run the factor analysis in two stages to obtain a better interpretation of the factors. This meant splitting the variables into two subgroups: one with the operational variables only, and another with the size and rehab variables. As the eigenvalues in Table A show, a single factor accounts for most of the variance of the operational variables; the proportions for the stage two factors are given in Table C. The factor pattern for stage two divided the remaining variables into two groups: a SIZE group (BEDS, OUTV, ADM, SIR, TH and TRAUMA) and a REHAB group (RBEDS and REHAB).
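The second stage (principal-component extraction followed by a varimax rotation) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a reproduction of PROC FACTOR output: the data are simulated, the column names merely echo the paper's variables, REHAB is treated as continuous for simplicity, and the number of retained factors `k = 2` is set by hand.

```r
# Simulated stand-in data with a "size" block and a "rehab" block.
set.seed(2)
n     <- 300
size  <- rnorm(n)
rehab <- rnorm(n)
X <- cbind(
  BEDS  = size  + rnorm(n, sd = 0.4),
  ADM   = size  + rnorm(n, sd = 0.4),
  SIR   = size  + rnorm(n, sd = 0.4),
  RBEDS = rehab + rnorm(n, sd = 0.4),
  REHAB = rehab + rnorm(n, sd = 0.4)
)

# Eigen-decomposition of the correlation matrix (principal components).
e    <- eigen(cor(X))
prop <- e$values / sum(e$values)   # proportion of variance per component

# Unrotated loadings for the first k components.
k <- 2
L <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
rownames(L) <- colnames(X)

# Varimax rotation (stats::varimax) to sharpen the interpretation.
rot      <- varimax(L)
loadings <- unclass(rot$loadings)

print(round(cbind(eigenvalue = e$values, proportion = prop,
                  cumulative = cumsum(prop)), 3))
print(round(loadings, 2))
```

After rotation each variable should load mainly on one factor, which is what allows the factors to be labelled (here, size and rehab).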
Stage One: NFACT=

Eigenvalue   Difference   Proportion   Cumulative

factor will be retained by the NFACTOR criterion.

Table A: Eigenvalues of the Correlation Matrix in Stage One

Variable   Description                          Factor
HIP9       NUMBER OF HIP OPERATIONS FOR 99      .9
KNEE9      NUMBER OF KNEE OPERATIONS FOR 99     .9
HIP9       NUMBER HIP OPERATIONS FOR 99         .9
KNEE9      NUMBER KNEE OPERATIONS FOR 99        .9
FEMUR9     NUMBER FEMUR OPERATIONS FOR 99       .9

Table B: Factor Pattern in Stage One including Number of Operations

Stage Two: NFACT=

Eigenvalue   Difference   Proportion   Cumulative

factors will be retained by the NFACTOR criterion.

Table C: Eigenvalues of the Correlation Matrix in Stage Two

Variable   Description                      Factor   Factor
BEDS       NUMBER OF HOSPITAL BEDS          .9       .
RBEDS      NUMBER OF REHAB BEDS             .        .9
OUTV       NUMBER OF OUTPATIENT VISITS
ADM        ADMINISTRATIVE COST              .9       .
SIR        REVENUE FROM INPATIENT           .9       .
TH         TEACHING HOSPITAL?               .        .
TRAUMA     DO THEY HAVE A TRAUMA UNIT?      .        .
REHAB      DO THEY HAVE A REHAB UNIT?       .        .9

Table D: Rotated Factor Pattern for Two Factors (SIZE and REHAB)

The figure presents the eigenvalue distribution for stage one, using one factor, and stage two, using two factors.
Figure: Eigenvalues for the operational factor (left, stage one) and the size/rehab factors (right, stage two)

The final distribution of the analyzed variables is shown in the table.

Variable Description                 Variable Name   (Stage One: Factor; Stage Two: Factor, Factor)
Number of hospital beds              BEDS
Number of rehab beds                 RBEDS
Number of outpatient visits          OUTV
Administrative cost                  ADM
Revenue from inpatient               SIR
Number of HIP operations for 99      HIP9
Number of KNEE operations for 99     KNEE9
Teaching hospital                    TH
Do they have a trauma unit?          TRAUMA
Do they have a rehab unit?           REHAB
Number of HIP operations for 99      HIP9
Number of KNEE operations for 99     KNEE9
Number of FEMUR operations for 99    FEMUR9

Table: Final Distribution of Variables in Factors

CLUSTER ANALYSIS

I used cluster analysis to determine the best cluster to concentrate on for improving our sales. The Ward's analysis table shows the largest jump in SPRSQ between two consecutive cluster levels, and I chose the number of clusters at that jump for my subsequent analysis.
Table: Cluster selection based on the cluster history from the WARD variance table (columns: NCL, Clusters Joined, SPRSQ, Difference)

Next, I created a box plot of sales against the clusters. Based on this graph, the chosen cluster had the highest mean sales and included some hospitals that did not have any sales at all.

Figure: Box plot of sales per CLUSTER
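The cutoff logic described above (look for the largest jump in the cluster history, cut the tree there, then compare sales by cluster) can be sketched in R with `hclust`. This is an illustration on simulated factor scores, not the paper's data. The merge heights of `hclust(..., method = "ward.D2")` are used here as a rough analogue of the SPRSQ column in PROC CLUSTER output, which is an assumption; the column names and the `sales` vector are hypothetical.

```r
# Simulated factor scores for three well-separated groups of 20 hospitals.
set.seed(3)
centers <- rbind(c(4, 0, 0), c(0, 4, 0), c(0, 0, 4))
grp     <- rep(1:3, each = 20)
f <- centers[grp, ] + matrix(rnorm(60 * 3, sd = 0.3), ncol = 3)
colnames(f) <- c("factor_ops", "factor_size", "factor_rehab")

# Hypothetical log-scale sales, differing by group.
sales <- c(5, 2, 8)[grp] + rnorm(60, sd = 0.5)

# Ward's minimum variance clustering.
hc <- hclust(dist(f), method = "ward.D2")

# h[i] is the cost of the merge that leaves i clusters.
h    <- rev(hc$height)
jump <- h[-length(h)] - h[-1]      # drop in cost from i to i + 1 clusters

# The largest jump at position i suggests keeping i + 1 clusters.
k  <- which.max(jump[1:9]) + 1
cl <- cutree(hc, k = k)

print(table(cl))
print(round(tapply(sales, cl, mean), 2))   # mean sales per cluster
boxplot(sales ~ cl, xlab = "CLUSTER", ylab = "sales")
```

The cluster with the highest mean sales that still contains low-sales hospitals would then be the candidate for the regression step.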
Examination of the table of mean sales confirmed that the chosen cluster has the highest mean sales. Given the number of hospitals in this cluster, it cannot be assumed that they are homogeneous. Because the sample size is large, I used a regression estimate for the subsequent analysis.

Table: Sales and factors per cluster (columns: CLUSTER, FREQ, msales, mf, mf, mf)

REGRESSION ANALYSIS

In my regression analysis I used backward elimination to determine which factors are significant and must be retained in the model. The elimination removed two factors and retained the operational factor as significant for our model. Next, I identified the hospitals with no sales. I then analyzed the potential gain for each of them and found that, to increase the sales of orthopedic equipment, we should concentrate on six hospitals. They are located in Galveston, TX; Thomasville, GA; Los Angeles, CA; Valdosta, GA; Fort Myers, FL; and Downey, CA.

PAM ANALYSIS IN R

I used R to apply the robust clustering method PAM (partitioning around medoids) in order to identify the best number of clusters. As the silhouette figure shows, the chosen value of k generates the highest average silhouette width.
Figure: Average silhouette width comparison by number of clusters

The cross-tabulation shows that the robust clustering method used in R agrees well with the market segments selected with SAS.

Table: Cluster selection using SAS and R software (rows: SAS cluster selection; columns: PAM cluster selection, clpam)
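The silhouette comparison and the cross-tabulation against the Ward clusters can be sketched as follows. The sketch uses `pam` from the `cluster` package, which the paper's appendix also loads; the data are simulated factor scores, the candidate range `2:6` is an arbitrary assumption, and the Ward solution is computed in R with `hclust` as a stand-in for the SAS result.

```r
library(cluster)   # provides pam()

# Simulated factor scores for three well-separated groups (not the paper's data).
set.seed(3)
centers <- rbind(c(4, 0, 0), c(0, 4, 0), c(0, 0, 4))
grp     <- rep(1:3, each = 20)
f <- centers[grp, ] + matrix(rnorm(60 * 3, sd = 0.3), ncol = 3)

# Average silhouette width for each candidate number of clusters.
ks        <- 2:6
avg_width <- sapply(ks, function(k) pam(f, k = k)$silinfo$avg.width)
best_k    <- ks[which.max(avg_width)]

# PAM cluster labels at the best k.
clpam <- pam(f, k = best_k)$clustering

# Ward clusters at the same k, standing in for the SAS selection.
clward <- cutree(hclust(dist(f), method = "ward.D2"), k = best_k)

# Cross-tabulation: agreement shows up as one non-zero cell per row.
tab <- table(ward = clward, pam = clpam)

print(round(setNames(avg_width, ks), 3))
print(tab)
```

When the two methods agree, every Ward cluster maps onto a single PAM cluster, up to a relabelling of the cluster numbers.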
RPART ANALYSIS IN R

Using rpart, I found that the segment with the highest predicted potential gain contained several observations with missing sales.

Figure: Final regression tree, with the number of observations in each terminal node

I identified the hospitals in this segment to determine whether any of them had previously been chosen for increasing our sales gain. As shown in the table, the hospital in Fort Myers, FL had already been selected by the regression analysis.

CITY         STATE   CLUSTER
Hemet        CA      NA
Cape Coral   FL      NA
Hawaiian     CA      NA
Oakland      CA      NA
Greensboro   NC      NA
Melbourne    FL      NA
Sacramento   CA      NA
Fort Myers   FL      NA

Table: Cluster selection using the RPART method
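The idea of fitting a regression tree on the factors and then looking for hospitals with missing sales inside the highest-valued leaf can be sketched with `rpart`. This is an illustration on simulated data: the column names `FACTOR_OPS`, `FACTOR_SIZE`, `FACTOR_REHAB` and `SALES` are hypothetical, and the complexity parameter `cp = 0.01` is rpart's default rather than the value used in the paper.

```r
library(rpart)

# Simulated data: log-scale sales depend on the operational factor only.
set.seed(5)
n <- 200
d <- data.frame(
  FACTOR_OPS   = rnorm(n),
  FACTOR_SIZE  = rnorm(n),
  FACTOR_REHAB = rnorm(n)
)
d$SALES <- 2 + 1.5 * (d$FACTOR_OPS > 0) + rnorm(n, sd = 0.3)

# Mark 20 hospitals as having no recorded sales (missing response).
d$SALES[sample(n, 20)] <- NA

# Fit the tree; rows with a missing response are dropped from the fit.
fit <- rpart(SALES ~ FACTOR_OPS + FACTOR_SIZE + FACTOR_REHAB,
             data = d, control = rpart.control(cp = 0.01))

# Predict for ALL rows, including those with missing sales.
pred <- predict(fit, newdata = d)

# Each distinct predicted value identifies a leaf; higher code = higher value.
leaf <- as.numeric(factor(pred))

# Hospitals with no sales that fall in the highest-valued leaf.
top <- which(leaf == max(leaf) & is.na(d$SALES))

print(table(leaf))
print(d[top, c("FACTOR_OPS", "FACTOR_SIZE", "FACTOR_REHAB")])
print(exp(max(pred)))   # leaf prediction back on the original scale
```

The rows in `top` play the role of the hospitals listed in the table above: no recorded sales, but assigned to the segment with the highest predicted value.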
RANDOM FOREST METHOD

Using the random forest method, I obtained a more accurate prediction for the Fort Myers hospital on the log scale; exponentiating that prediction gives the estimated sales gain in dollars.

CONCLUSIONS

In this project I analyzed a subset of hospitals selected by geography and grouped them into market segments of closely similar hospitals using cluster analysis. By finding the best cluster, the one with the highest mean sales that also contained hospitals with no sales, I was able to estimate the potential sales gain for the hospitals in Galveston, TX; Thomasville, GA; Los Angeles, CA; Valdosta, GA; Fort Myers, FL; and Downey, CA. The results from the PAM, RPART and random forest methods support the selection of these hospitals as strong candidates for improving sales of orthopedic equipment as a short-term solution.

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Thanks also to Stan Legum, co-chair of the section, whose feedback has proved invaluable to the writing of this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

George Obsekov
American College of Radiology Research Center
Market Street
Philadelphia, PA
APPENDIX

The following code was used for the full analysis of the selected dataset:

DATA sasuser.hospital;
  INFILE 'hospital.txt' DELIMITER=',';
  INPUT ZIP $ HID $ CITY $ STATE $ BEDS RBEDS OUTV ADM SIR Y HIP9 KNEE9
        TH TRAUMA REHAB HIP9 KNEE9 FEMUR9;

DATA hospital;
  SET sasuser.hospital;
  label ZIP = US POSTAL CODE
        HID = HOSPITAL ID
        CITY = CITY NAME
        STATE = STATE NAME
        BEDS = NUMBER OF HOSPITAL BEDS
        RBEDS = NUMBER OF REHAB BEDS
        OUTV = NUMBER OF OUTPATIENT VISITS
        ADM = ADMINISTRATIVE COST
        SIR = REVENUE FROM INPATIENT
        Y = OF REHABILITATION EQUIPMENT SINCE "JAN JULY "
        = OF REHAB EQUIP FOR THE PREVIOUS "" MO
        HIP9 = NUMBER OF HIP OPERATIONS FOR "99"
        KNEE9 = NUMBER OF KNEE OPERATIONS FOR "99"
        TH = TEACHING HOSPITAL?
        TRAUMA = DO THEY HAVE A TRAUMA UNIT?
        REHAB = DO THEY HAVE A REHAB UNIT?
        HIP9 = NUMBER HIP OPERATIONS FOR "99"
        KNEE9 = NUMBER KNEE OPERATIONS FOR "99"
        FEMUR9 = NUMBER FEMUR OPERATIONS FOR "99";

  /* new response variable */
  = log(+ +Y);
  IF = THEN =.;

  /* subset by hospital location: southern states */
  /* state codes are upper case because character comparisons are case-sensitive */
  IF STATE EQ 'CA' OR STATE EQ 'FL' OR STATE EQ 'TX' OR STATE EQ 'SC'
     OR STATE EQ 'LA' OR STATE EQ 'GA' OR STATE EQ 'NC' OR STATE EQ 'AL';

  ARRAY X {} BEDS RBEDS HIP9 KNEE9 HIP9 KNEE9 FEMUR9 OUTV ADM SIR;

  /* STEP TRANSFORMATIONS */
  DO I= TO ; X{I} = SQRT(X{I}); END;
  DO I= TO ; X{I} = LOG(+.*X{I}); END;
  DO I= TO ; X{I} = LOG(+.*X{I}); END;

/* factor analysis in two stages, grouping the variables in subgroups */
PROC FACTOR data=hospital METHOD=PRIN NFACT= out=z;
  VAR HIP9 KNEE9 HIP9 KNEE9 FEMUR9;

PROC FACTOR data=hospital METHOD=PRIN NFACT= ROTATE=VARIMAX out=z;
  VAR BEDS RBEDS OUTV ADM SIR TH trauma rehab;

DATA z;
  set z;
  factor = factor;
  keep factor factor;

DATA hospout;
  merge z z;

/* cluster analysis using WARD */
PROC CLUSTER data=hospout METHOD=WARD;
  VAR factorfactor;
  COPY ZIP CITY STATE HID BEDS RBEDS OUTV ADM SIR HIP9 KNEE9 TH TRAUMA REHAB
       HIP9 KNEE9 FEMUR9 factorfactor;

PROC TREE NOPRINT NCL= OUT=TXCLUST;
  COPY ZIP CITY STATE HID BEDS RBEDS OUTV ADM SIR HIP9 KNEE9 TH TRAUMA REHAB
       HIP9 KNEE9 FEMUR9 factorfactor;

/* produce the cluster summary and pick the best cluster */
PROC sort data= TXCLUST;
  by cluster;

PROC means noprint;
  BY cluster;
  VAR factorfactor;
  OUTPUT out=c mean= msales mfmf;

PROC boxplot data= TXCLUST;
  plot *cluster;

SELECT TXCLUST;

DATA cl;
  set TXCLUST;
  if cluster=;

PROC REG DATA=cl;
  MODEL sales = Factorfactor / P R selection=b;
  OUTPUT OUT=C P=PRED R=RESID STDP=STDP;

/* finally undo the log transformation and calculate the potential gain */
DATA C;
  SET C;
  rowp = exp(pred+.*stdp*stdp);
  epred = exp(pred);
  sales = exp(sales) ;
  gain = rowp  sales;

PROC sort;
  by gain;

PROC print;

/* code for the special case when the cluster size is very small */
DATA cl;
  SET TXCLUST;
  IF cluster=;
  sales = exp(sales) ;

PROC print;

PROC means data=cl;
  VAR sales;

endsas;

/* suppose the mean of sales is. */
DATA cl;
  set cl;
  gain =.  sales;

PROC sort;
  by gain;

PROC print;
# ***** FACTOR ANALYSIS USING R SOFTWARE
library(foreign)   # provides read.xport() for SAS transport files
hh = read.xport("hosp.xpt")
dim(hh)
hh[,]

# silhouette plots for several candidate numbers of clusters
library(cluster)
plot(silhouette(pam(hh[,:], k=)), main = paste("k = ",), do.n.k=FALSE)
plot(silhouette(pam(hh[,:], k=)), main = paste("k = ",), do.n.k=FALSE)
plot(silhouette(pam(hh[,:], k=9)), main = paste("k = ",), do.n.k=FALSE)
plot(silhouette(pam(hh[,:], k=)), main = paste("k = ",), do.n.k=FALSE)

# cross-tabulate the PAM clusters against the SAS clusters
clpam = pam(hh[,:], k=)$cluster
table(clpam)
table(hh[,])
table(clpam,hh[,])

library(rpart)
rpart( ~FACTOR+FACTOR+FACTOR, data=hh)
predict(rpart( ~FACTOR+FACTOR+FACTOR, data=hh))
table(predict(rpart( ~FACTOR+FACTOR+FACTOR, data=hh)))
length(predict(rpart( ~FACTOR+FACTOR+FACTOR, data=hh)))
length(predict(rpart( ~FACTOR+FACTOR+FACTOR, data=hh),newdata=hh))
table(predict(rpart( ~FACTOR+FACTOR+FACTOR, data=hh),newdata=hh))
plot(rpart( ~FACTOR+FACTOR+FACTOR, data=hh))
text(rpart( ~FACTOR+FACTOR+FACTOR, data=hh))

# ***** RPART ANALYSIS USING R SOFTWARE
library(rpart)
hh[,]
hh.rp = rpart( ~FACTOR+FACTOR+FACTOR, data=hh)
plot(hh.rp)
text(hh.rp)

# refit with different complexity parameters (cp)
hh.rp = rpart( ~FACTOR+FACTOR+FACTOR, data=hh,control=rpart.control(cp=.))
hh.rp = rpart( ~FACTOR+FACTOR+FACTOR, data=hh,control=rpart.control(cp=.))
plot(hh.rp)
text(hh.rp)
plot(hh.rp, uniform=TRUE)
text(hh.rp,use.n=TRUE,cex=.)
hh.rp = rpart( ~FACTOR+FACTOR+FACTOR, data=hh,control=rpart.control(cp=.))
plot(hh.rp, uniform=TRUE)
text(hh.rp,use.n=TRUE,cex=.)
hh.rp

# use the predicted leaf values as segment labels
table(predict(rpart( ~FACTOR+FACTOR+FACTOR, data=hh)))
table(predict(rpart( ~FACTOR+FACTOR+FACTOR, data=hh),newdata=hh))
predv = (predict(rpart( ~FACTOR+FACTOR+FACTOR, data=hh),newdata=hh))
factor(predv)[:]
as.numeric(factor(predv))[:]
table(as.numeric(factor(predv)))
cluster = (as.numeric(factor(predv)))
hh[ cluster==,]
factor(predv)[:]
exp(.)
factor(predv)[:]