SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing

2 IN SPSS SESSION 2, WE HAVE LEARNT:
- Elementary Data Analysis
- Group Comparison & One-way ANOVA
- Non-parametric Tests
- Correlations
- General Linear Regression: GLM Univariate procedure
- Logistic Models
  - Binary Logistic Model
  - Multinomial Logistic Model
  - Ordinal Logistic Model

3 OUTLINE
- Curve Estimation
- Non-linear Regression
- Missing Value Analysis & Multiple Imputation
- Survival Analysis
- Mixed Linear Model
- Time Series Analysis & Forecasting (self-study)

4 CURVE ESTIMATION
Using Curve Estimation to Model the Law of Diminishing Returns, when the relationship between the dependent variable(s) and the independent variable is not necessarily linear.
Analyze > Regression > Curve Estimation
Example: advert.sav
A retailer wants to examine the relationship between money spent on advertising and the resulting sales. To this end, they have collected past sales figures and the associated advertising costs.
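Curve Estimation fits several candidate curves to the same data and compares how well each one fits. A minimal Python sketch of that idea follows, comparing a linear and a logarithmic model by R-squared. The data are made up for illustration (they are not the values in advert.sav), and the helper name `ols` is hypothetical.

```python
import math

def ols(x, y):
    """Simple least-squares fit y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot

# Hypothetical advertising spend and sales, NOT the advert.sav data.
# Sales are built from a logarithmic curve, so returns diminish by construction.
spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sales = [5.0 + 3.0 * math.log(s) for s in spend]

linear = ols(spend, sales)                              # sales = b0 + b1 * spend
logarithmic = ols([math.log(s) for s in spend], sales)  # sales = b0 + b1 * ln(spend)

print("linear      R^2 = %.4f" % linear[2])
print("logarithmic R^2 = %.4f" % logarithmic[2])
```

On data with diminishing returns, the logarithmic curve should show the higher R-squared, which is the kind of comparison the Curve Estimation output supports.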

5 CURVE ESTIMATION

6 CURVE ESTIMATION

7 CURVE ESTIMATION
Use the Chart Builder to plot diagnostic graphs.

8 NON-LINEAR REGRESSION
Using Nonlinear Regression to Model the Law of Diminishing Returns, when the relationship between the dependent and independent variables is not intrinsically linear.
Example: advert.sav
Analyze > Regression > Nonlinear

9 NON-LINEAR REGRESSION
The asymptotic regression model: $y = b_1 + b_2 \exp(b_3 x)$
When b1 > 0, b2 < 0, and b3 < 0, it gives Mitscherlich's model of the "law of diminishing returns". This model increases quickly at first with increasing values of x, but then the gains slow and finally taper off just below the value b1.
Starting values:
- b1 represents the upper asymptote for sales. Looking at the chart, even the largest sales values fall just short of 13, so that is a reasonable starting value.
- b2 is the difference between the value of y when x = 0 and the upper asymptote. A reasonable starting value is the minimum value of y minus b1. Looking at the chart, that is about 7 - 13 = -6.
- b3 can be roughly estimated by the negative of the slope between two "well separated" points on the plot. Looking at the chart, there are a few points at about x = 2, y = 8 and at about x = 5, y = 12. The slope between these points is (12 - 8)/(5 - 2) = 1.33, thus a rough initial estimate for b3 is -1.33.
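The starting-value arithmetic above can be written out as a short Python sketch. The chart readings (13, 7, and the two points) come from the slide; the function and variable names are hypothetical, and this only computes starting values, it does not run the nonlinear fit itself.

```python
import math

def asymptotic(x, b1, b2, b3):
    """Asymptotic regression model y = b1 + b2 * exp(b3 * x)."""
    return b1 + b2 * math.exp(b3 * x)

# Values read off the chart, as described in the slide.
upper_asymptote = 13.0              # largest sales fall just short of 13
min_y = 7.0                         # approximate minimum of y
p1, p2 = (2.0, 8.0), (5.0, 12.0)    # two well-separated points (x, y)

b1_start = upper_asymptote
b2_start = min_y - b1_start                  # 7 - 13 = -6
slope = (p2[1] - p1[1]) / (p2[0] - p1[0])    # (12 - 8) / (5 - 2)
b3_start = -slope                            # negative of the slope

print(b1_start, b2_start, round(b3_start, 2))

# Sanity check of the implied curve: it starts at b1 + b2 when x = 0
# and approaches b1 from below as x grows.
print(asymptotic(0.0, b1_start, b2_start, b3_start))
print(asymptotic(10.0, b1_start, b2_start, b3_start))
```

These three numbers are what would be typed into the Parameters dialog as starting values before the iterative estimation begins.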

10 NON-LINEAR REGRESSION

11 NON-LINEAR REGRESSION

12 MISSING VALUE ANALYSIS (MVA)
Data Imputation for Missing Values Using the MVA Procedure
Types of Missingness:
- Missing completely at random (MCAR): exists when missing values are randomly distributed across all cases. The SPSS MVA procedure supports Little's MCAR test.
- Missing at random (MAR): exists when missing values are not randomly distributed across all cases but are randomly distributed within one or more subsamples. SPSS MVA generates a table of Separate Variance t Tests. A significant result in any cell means that missingness in the row variable is significantly related to the column variable, so the data are not missing completely at random.
- Non-ignorable missingness: exists when missing values are not randomly distributed across cases, but the probability of missingness cannot be predicted from the variables in the model. It is the most problematic form.

13 MISSING VALUE ANALYSIS
Estimation methods:
- Pattern analysis: describes the pattern of missing data.
- Listwise / pairwise deletion: both deletion methods assume MCAR.
- Mean substitution: was once popular but is no longer preferred.
- Multiple regression: uses non-missing data to predict the values of missing data.
- Maximum likelihood estimation (MLE): makes few demands of the data in terms of statistical assumptions and is generally considered superior to imputation by multiple regression. This is now the most common method of imputation.
- Approximate Bayesian bootstrap (ABB): uses logistic regression to estimate the probability of response/non-response based on covariates. SPSS does not yet support ABB.
- Multiple imputation: generates multiple simulated values for each incomplete datum, then iteratively analyzes the datasets with each simulated value substituted in turn.
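One reason mean substitution fell out of favour is that it shrinks the variance of the imputed variable. The Python sketch below illustrates this on made-up values (the variable name `income` and the numbers are hypothetical, not taken from any of the example files).

```python
import statistics

# Hypothetical values; None marks a missing entry.
income = [20.0, 35.0, None, 50.0, 65.0, None, 80.0]

# Listwise deletion: drop the cases with a missing value.
observed = [v for v in income if v is not None]

# Mean substitution: replace each missing value with the observed mean.
mean_obs = statistics.mean(observed)
mean_filled = [mean_obs if v is None else v for v in income]

sd_deleted = statistics.stdev(observed)
sd_filled = statistics.stdev(mean_filled)

print("observed mean:", mean_obs)
print("std dev after listwise deletion:", round(sd_deleted, 2))
print("std dev after mean substitution:", round(sd_filled, 2))
```

The mean is unchanged, but the standard deviation drops because every imputed case sits exactly at the centre of the distribution. Multiple imputation avoids this by drawing several plausible values instead of one fixed value.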

14 MISSING VALUE ANALYSIS
When to use MVA:
- As a rule of thumb, if a variable has more than 5% missing values, cases with missing values should not simply be deleted; many researchers are much more stringent than this.
- Imputation is not recommended for multivariate analysis, as it can distort the coefficients of association and correlation relating the variables.
- If researchers are still not sure whether to apply MVA, it is recommended to run all analyses on both the original and the imputed datasets, and to discuss where imputation would make a difference for the substantive (not merely statistical) interpretations.
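The 5% rule of thumb can be checked per variable before deciding how to handle the missing values. A minimal Python sketch, with hypothetical records and a hypothetical helper `missing_share`:

```python
THRESHOLD = 0.05  # the 5% rule of thumb from the slide

# Hypothetical dataset of 20 cases; None marks a missing value.
records = [{"age": 30 + i, "income": 40.0 + i, "tenure": i} for i in range(20)]
records[5]["age"] = None       # 1 of 20 missing  -> 5%
records[3]["income"] = None    # 2 of 20 missing  -> 10%
records[11]["income"] = None

def missing_share(rows, variable):
    """Fraction of cases in which the variable is missing."""
    return sum(1 for row in rows if row[variable] is None) / len(rows)

shares = {name: missing_share(records, name) for name in ("age", "income", "tenure")}
flagged = [name for name, share in shares.items() if share > THRESHOLD]

for name, share in shares.items():
    print("%-7s %4.0f%% missing" % (name, 100 * share))
print("more than 5% missing:", flagged)
```

Here only `income` exceeds the threshold, so it would be the candidate for imputation rather than case deletion.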

15 MISSING VALUE ANALYSIS
Using SPSS Multiple Imputation of Missing Values:
Step 1: Describe the pattern of missing data
Analyze > Missing Value Analysis
Example: telco_missing.sav
A telecommunications provider wants to better understand service usage patterns in its customer database. The company wants to ensure that the data are missing completely at random before running further analyses.

16 MISSING VALUE ANALYSIS

17 MISSING VALUE ANALYSIS

18 MISSING VALUE ANALYSIS

19 MULTIPLE IMPUTATION
Step 2: Multiple imputation
Analyze > Multiple Imputation > Analyze Patterns

20 MULTIPLE IMPUTATION

21 MULTIPLE IMPUTATION
Analyze > Multiple Imputation > Impute Missing Data Values

22 MULTIPLE IMPUTATION

23 MULTIPLE IMPUTATION
Step 3: Run the analysis using the complete data
e.g. a multinomial logistic model with dependent variable custcat (customer categories).

24 SURVIVAL ANALYSIS
Survival Data
- Data: survival data is time-to-event data. It is quantitative data corresponding to the time from a well-defined time origin until the occurrence of some particular event of interest or endpoint.
- Reasons for using a survival model: the distribution of survival data tends to be positively skewed and is unlikely to be normal, and it may not be possible to find a suitable transformation. Time-varying covariates cannot be handled by conventional methods. In addition, some durations are censored.
- Censored observations: the event has not occurred by the endpoint; the subject is lost to follow-up or withdraws from the study; other interventions are offered; the event occurred but for an unrelated cause; etc.

25 SURVIVAL ANALYSIS
Survival Model
Survival function: $S(t) = P(T > t) = 1 - F(t)$
Hazard function: $h(t) = \frac{f(t)}{S(t)} = -\frac{d \log S(t)}{dt}$, so that $S(t) = \exp(-H(t))$
H(t) is the cumulative hazard function.
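As a worked check of how the survival, density and hazard functions fit together, the Python sketch below uses the simplest case, a constant hazard. The rate value and function names are hypothetical; the point is only that f(t)/S(t) and the negative derivative of log S(t) give the same hazard.

```python
import math

RATE = 0.1  # hypothetical constant hazard (events per month)

def survival(t):
    """S(t) = exp(-H(t)) with cumulative hazard H(t) = RATE * t."""
    return math.exp(-RATE * t)

def density(t):
    """f(t) = -dS/dt for the constant-hazard case."""
    return RATE * math.exp(-RATE * t)

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)

def hazard_from_log_survival(t, eps=1e-6):
    """h(t) = -d log S(t) / dt, approximated by a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)

for t in (1.0, 5.0, 20.0):
    print(t, round(survival(t), 4), round(hazard(t), 4),
          round(hazard_from_log_survival(t), 4))
```

Both hazard columns stay at 0.1 for every t while S(t) declines, which is what a constant hazard implies.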

26 SURVIVAL ANALYSIS
Kaplan-Meier Estimator:
$\hat{S}(t) = \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right)$, where $t_{(1)}, t_{(2)}, \dots, t_{(n)}$ are the ordered event times
- $d_j$: the number of individuals who experience the event at time $t_{(j)}$
- $n_j$: the number of individuals who have not yet experienced the event at time $t_{(j)}$
Cox Regression:
$h_i(t) = h_0(t) \exp(\beta^T x_i)$
$\log H_i(t) = \log H_0(t) + \beta^T x_i$
$S_i(t) = S_0(t)^{\exp(\beta^T x_i)}$
- $h_0(t)$ is the baseline hazard function.
- $\exp(\beta^T (x_i - x_j))$ is the hazard ratio (HR) or incidence rate ratio.
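The Kaplan-Meier product can be computed by hand on a small sample. Below is a minimal Python sketch, assuming the usual convention that a case censored at time t is still counted as at risk at t. The function name and the six made-up observations are hypothetical and are not drawn from telco.sav.

```python
def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct event time.

    times  : observed durations
    events : 1 if the event occurred, 0 if the observation is censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)   # n_j: cases that have not yet had the event or been censored
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0             # d_j: events at time t
        leaving = 0       # cases leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            leaving += 1
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Hypothetical tenures in months; 1 = churned, 0 = censored.
tenure = [3, 5, 5, 8, 10, 12]
churn = [1, 1, 0, 1, 0, 1]

for t, s in kaplan_meier(tenure, churn):
    print("t = %2d  S(t) = %.4f" % (t, s))
```

Censored cases never reduce the survival estimate directly; they only shrink the risk set for later event times.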

27 SURVIVAL ANALYSIS
Example: telco.sav
As part of its efforts to reduce customer churn, a telecommunications company is interested in examining the "time to churn".

Variable name   Variable information
age             Age in years
marital         Marital status: 0 = unmarried, 1 = married
address         Years at current address
income          Household income in thousands
ed              Level of education: 1 = didn't complete high school, 2 = high school degree, 3 = college degree, 4 = undergraduate, 5 = postgraduate
employ          Years with current employer
reside          Number of people in household
gender          Gender: 0 = male, 1 = female
tenure          Months with service
churn           Churn within last month: 0 = No, 1 = Yes
custcat         Customer categories: 1 = basic service, 2 = E-service, 3 = plus service, 4 = total service

28 SURVIVAL ANALYSIS
Life table
Analyze > Survival > Life Tables

29 SURVIVAL ANALYSIS

30 SURVIVAL ANALYSIS
Cox regression
Analyze > Survival > Cox Regression

31 SURVIVAL ANALYSIS

32 SURVIVAL ANALYSIS

33 MIXED LINEAR MODEL
The mixed linear model
- Factors. Categorical predictors should be selected as factors in the model. Each level of a factor can have a different linear effect on the value of the dependent variable.
  - Fixed-effects factors are generally thought of as variables whose values of interest are all represented in the data file.
  - Random-effects factors are variables whose values in the data file can be considered a random sample from a larger population of values. They are useful for explaining excess variability in the dependent variable.
- Covariates. Scale predictors should be selected as covariates in the model. Within combinations of factor levels (or cells), values of covariates are assumed to be linearly correlated with values of the dependent variable.
- Random effects covariance structure. The SPSS Mixed Linear Model procedure allows you to specify the relationship between the levels of random effects. By default, levels of random effects are uncorrelated and have the same variance (Univariate linear).

34 MIXED LINEAR MODEL
Repeated effects. These allow you to relax the assumption of independence of the error terms. In order to model the covariance structure of the error terms, you need to specify the following:
- Repeated effects variables are variables whose values in the data file can be considered as markers of multiple observations of a single subject.
- Subject variables define the individual subjects of the repeated measurements. The error terms for each individual are independent of those of other individuals.
- Covariance structure specifies the relationship between the levels of the repeated effects. The types of covariance structures available allow for residual terms with a wide variety of variances and covariances.
Hierarchical notation:
$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$
$\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + u_{1j}$
Mixed model notation:
$Y_{ij} = \gamma_{00} + \gamma_{01} Z_j + \gamma_{10} X_{ij} + \gamma_{11} Z_j X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}$
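The hierarchical and mixed notations describe the same model: substituting the group-level equations into the individual-level equation gives the single mixed equation. The Python sketch below checks this numerically for one observation. It assumes the usual two-level form, with fixed coefficients written as gamma (g00, g01, g10, g11) and random effects as u; all numeric values are made up.

```python
# Hypothetical fixed effects and one group's random effects.
g00, g01, g10, g11 = 2.0, 0.5, 1.5, -0.25
u0j, u1j = 0.3, -0.1

z_j = 4.0    # group-level predictor Z_j
x_ij = 2.0   # individual-level predictor X_ij
r_ij = 0.2   # individual-level residual

# Hierarchical notation: group-specific intercept and slope first.
beta0j = g00 + g01 * z_j + u0j
beta1j = g10 + g11 * z_j + u1j
y_hier = beta0j + beta1j * x_ij + r_ij

# Mixed notation: fixed part plus random part in one equation.
fixed_part = g00 + g01 * z_j + g10 * x_ij + g11 * z_j * x_ij
random_part = u0j + u1j * x_ij + r_ij
y_mixed = fixed_part + random_part

print(round(y_hier, 6), round(y_mixed, 6))
```

The cross-level term g11 * Z_j * X_ij appears only after the substitution, which is why it has to be entered as an interaction when the model is specified in mixed form.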

35 MIXED LINEAR MODEL
Using Mixed Linear Model to Model Random Effects and Repeated Measures
Example: testmarket.sav
A fast food chain plans to add a new item to its menu. However, they are still undecided between three possible campaigns for promoting the new product. In order to determine which promotion has the greatest effect on sales, the new item is introduced at locations in several randomly selected markets. A different promotion is used at each location, and the weekly sales of the new item are recorded for the first four weeks.

Variable name   Variable information
marketid        Market ID
mktsize         Market size: 1 = small, 2 = medium, 3 = large
locid           Location ID
ageloc          Age of store location
promo           Promotion types
week            Week: week 1, 2, 3, 4
sales           Units sold in thousands

36 MIXED LINEAR MODEL
Data structure for mixed models:
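The slide's screenshot is not transcribed, but the mixed procedure expects repeated measures in long format: one row per subject per occasion, with the repeated-effects variable (here week) as its own column. A minimal Python sketch of reshaping wide rows into that layout follows. The column names sales_w1 to sales_w4 and all values are hypothetical; marketid, locid, promo, week and sales follow the testmarket.sav variable list.

```python
# Hypothetical wide-format rows: one row per location, one sales column per week.
wide = [
    {"marketid": 1, "locid": 1, "promo": 1,
     "sales_w1": 33.7, "sales_w2": 35.7, "sales_w3": 29.0, "sales_w4": 39.3},
    {"marketid": 1, "locid": 2, "promo": 2,
     "sales_w1": 27.8, "sales_w2": 19.3, "sales_w3": 32.1, "sales_w4": 26.7},
]

# Long format: one row per location per week.
long_rows = []
for row in wide:
    for week in (1, 2, 3, 4):
        long_rows.append({
            "marketid": row["marketid"],
            "locid": row["locid"],
            "promo": row["promo"],
            "week": week,
            "sales": row["sales_w%d" % week],
        })

for r in long_rows:
    print(r)
```

In this layout locid identifies the subject and week marks the repeated observations, matching the roles of subject and repeated-effects variables in the procedure.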

37 MIXED LINEAR MODEL
Analyze > Mixed Models > Linear

38 MIXED LINEAR MODEL

40 TIME SERIES ANALYSIS
Definitions, Applications and Techniques
- Time series data: each case represents a point in time. Each cell gives a value for each variable for each time period.
- Stationarity: the data are stationary when the underlying process has the property that the mean, variance and autocorrelation structure do not change over time.
- Seasonality: by seasonality, we mean periodic fluctuations.
Time series models are used:
- to obtain an understanding of the underlying forces and structures that produce the observed data.
- to fit a model and proceed to forecasting and monitoring.
Techniques:
- Exponential Smoothing
- ARIMA Models

41 TIME SERIES ANALYSIS
Exponential Smoothing
Four available model types:
- Simple. The simple model assumes that the series has no trend and no seasonal variation.
- Holt. The Holt model assumes that the series has a linear trend and no seasonal variation.
- Winters. The Winters model assumes that the series has a linear trend and multiplicative seasonal variation (its magnitude increases or decreases with the overall level of the series).
- Custom. A custom model allows you to specify the trend and seasonality components.
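The difference between the Simple and Holt models can be seen in a few lines of Python. This is a textbook-style sketch, not the SPSS implementation: the initialisation (first value as the level, first difference as the trend) and the parameter names alpha and gamma are assumptions made for illustration, and the series is made up.

```python
def simple_smoothing(series, alpha):
    """Simple exponential smoothing; returns the smoothed level after each observation."""
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

def holt_forecast(series, alpha, gamma, horizon):
    """Holt's linear trend method; returns forecasts 1..horizon steps ahead."""
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = gamma * (level - previous_level) + (1 - gamma) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Hypothetical series with a steady upward trend.
series = [1.0, 2.0, 3.0, 4.0, 5.0]

print(simple_smoothing(series, alpha=0.5))
print(holt_forecast(series, alpha=0.5, gamma=0.3, horizon=2))
```

On a trending series the simple model lags behind the data, while the Holt model carries the trend forward into the forecasts.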

42 TIME SERIES ANALYSIS
ARIMA Model
ARIMA(p, d, q)(P, D, Q)
- Autoregression (AR): p is the order of autoregression
- Integration (I): d is the order of integration (differencing)
- Moving-Average (MA): q is the order of moving-average
- (P, D, Q) are their seasonal counterparts.
AR(p) model: $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + A_t$
MA(q) model: $X_t = A_t + \theta_1 A_{t-1} + \theta_2 A_{t-2} + \dots + \theta_q A_{t-q}$
ARIMA(p, d, q) model: $\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) A_t$
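Two building blocks of the ARIMA model are easy to show directly: the differencing operator of order d, and a one-step autoregressive forecast. The Python sketch below is an illustration with made-up numbers and hypothetical function names; it does not estimate any coefficients.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - L)^d to a series."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def ar_forecast(history, phi):
    """One-step-ahead AR(p) forecast from the last p values of history.

    phi[0] multiplies the most recent value, phi[1] the one before, and so on.
    The future shock is replaced by its assumed mean of zero.
    """
    p = len(phi)
    recent = history[-p:][::-1]   # most recent value first
    return sum(c * x for c, x in zip(phi, recent))

# A quadratic trend becomes constant after differencing twice (d = 2).
squares = [1, 4, 9, 16, 25]
print(difference(squares, d=1))
print(difference(squares, d=2))

# Hypothetical AR(2) coefficients applied to a short history.
print(ar_forecast([1.0, 2.0, 3.0], phi=[0.5, 0.25]))
```

Differencing is how the I part removes a trend before the AR and MA parts are fitted to the resulting stationary series.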

43 TIME SERIES ANALYSIS
Example: catalog_seasfac.sav
A catalog company, interested in developing a forecasting model, has collected data on monthly sales of men's clothing along with several series that might be used to explain some of the variation in sales. Possible predictors include the number of catalogs mailed, the number of pages in the catalog, the number of phone lines open for ordering, the amount spent on print advertising, and the number of customer service representatives.

Variable name   Variable information
date            Date
men             Sales of men's clothing
mail            Number of catalogs mailed
page            Number of pages in catalog
phone           Number of phone lines open for ordering
print           Amount spent on print advertising
service         Number of customer service representatives

44 TIME SERIES ANALYSIS
Step 1: Draw a sequence chart to identify potential seasonality
Analyze > Forecasting > Sequence Charts

45 TIME SERIES ANALYSIS
Step 2: Build the model with the Expert Modeler
Analyze > Forecasting > Create Models

46 TIME SERIES ANALYSIS

47 TIME SERIES ANALYSIS

48 TIME SERIES ANALYSIS

49 TIME SERIES ANALYSIS
Step 3: Make predictions by applying saved models
One way to make predictions is to save the predicted values and set the forecast period when constructing the model (recall Create Models).

50 TIME SERIES ANALYSIS
The other way to make predictions is to save the model structure and apply the saved model to the data for forecasting.
Analyze > Forecasting > Apply Models

52 THANKS!
CAC statistical WIKI page:
Statistical consultation service:

Sun Li Centre for Academic Computing lsun@smu.edu.sg


More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

A Multiplicative Seasonal Box-Jenkins Model to Nigerian Stock Prices

A Multiplicative Seasonal Box-Jenkins Model to Nigerian Stock Prices A Multiplicative Seasonal Box-Jenkins Model to Nigerian Stock Prices Ette Harrison Etuk Department of Mathematics/Computer Science, Rivers State University of Science and Technology, Nigeria Email: ettetuk@yahoo.com

More information

Dealing with Missing Data

Dealing with Missing Data Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January

More information

Easily Identify Your Best Customers

Easily Identify Your Best Customers IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

More information

Dealing with Missing Data

Dealing with Missing Data Res. Lett. Inf. Math. Sci. (2002) 3, 153-160 Available online at http://www.massey.ac.nz/~wwiims/research/letters/ Dealing with Missing Data Judi Scheffer I.I.M.S. Quad A, Massey University, P.O. Box 102904

More information

Time Series Analysis and Forecasting Methods for Temporal Mining of Interlinked Documents

Time Series Analysis and Forecasting Methods for Temporal Mining of Interlinked Documents Time Series Analysis and Forecasting Methods for Temporal Mining of Interlinked Documents Prasanna Desikan and Jaideep Srivastava Department of Computer Science University of Minnesota. @cs.umn.edu

More information

IBM SPSS Missing Values 20

IBM SPSS Missing Values 20 IBM SPSS Missing Values 20 Note: Before using this information and the product it supports, read the general information under Notices on p. 87. This edition applies to IBM SPSS Statistics 20 and to all

More information

SPSS TUTORIAL & EXERCISE BOOK

SPSS TUTORIAL & EXERCISE BOOK UNIVERSITY OF MISKOLC Faculty of Economics Institute of Business Information and Methods Department of Business Statistics and Economic Forecasting PETRA PETROVICS SPSS TUTORIAL & EXERCISE BOOK FOR BUSINESS

More information

IBM SPSS Direct Marketing 19

IBM SPSS Direct Marketing 19 IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS

More information

COMP6053 lecture: Time series analysis, autocorrelation. jn2@ecs.soton.ac.uk

COMP6053 lecture: Time series analysis, autocorrelation. jn2@ecs.soton.ac.uk COMP6053 lecture: Time series analysis, autocorrelation jn2@ecs.soton.ac.uk Time series analysis The basic idea of time series analysis is simple: given an observed sequence, how can we build a model that

More information

Moderation. Moderation

Moderation. Moderation Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

More information

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

More information

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear

More information

Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure

Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure Technical report Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure Table of contents Introduction................................................................ 1 Data preparation

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

Advanced Linear Modeling

Advanced Linear Modeling Ronald Christensen Advanced Linear Modeling Multivariate, Time Series, and Spatial Data; Nonparametric Regression and Response Surface Maximization Second Edition Springer Preface to the Second Edition

More information

Time Series Analysis of Aviation Data

Time Series Analysis of Aviation Data Time Series Analysis of Aviation Data Dr. Richard Xie February, 2012 What is a Time Series A time series is a sequence of observations in chorological order, such as Daily closing price of stock MSFT in

More information

Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center Missing Data & How to Deal: An overview of missing data Melissa Humphries Population Research Center Goals Discuss ways to evaluate and understand missing data Discuss common missing data methods Know

More information

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

More information

Imputing Missing Data using SAS

Imputing Missing Data using SAS ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

More information

Time Series Laboratory

Time Series Laboratory Time Series Laboratory Computing in Weber Classrooms 205-206: To log in, make sure that the DOMAIN NAME is set to MATHSTAT. Use the workshop username: primesw The password will be distributed during the

More information

Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS

Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS Paper 114-27 Predicting Customer in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS Junxiang Lu, Ph.D. Sprint Communications Company Overland Park, Kansas ABSTRACT

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared

More information

Promotional Forecast Demonstration

Promotional Forecast Demonstration Exhibit 2: Promotional Forecast Demonstration Consider the problem of forecasting for a proposed promotion that will start in December 1997 and continues beyond the forecast horizon. Assume that the promotion

More information

January 26, 2009 The Faculty Center for Teaching and Learning

January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

More information

Spreadsheet software for linear regression analysis

Spreadsheet software for linear regression analysis Spreadsheet software for linear regression analysis Robert Nau Fuqua School of Business, Duke University Copies of these slides together with individual Excel files that demonstrate each program are available

More information

Threshold Autoregressive Models in Finance: A Comparative Approach

Threshold Autoregressive Models in Finance: A Comparative Approach University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Informatics 2011 Threshold Autoregressive Models in Finance: A Comparative

More information

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

More information

Introduction to mixed model and missing data issues in longitudinal studies

Introduction to mixed model and missing data issues in longitudinal studies Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Binary Logistic Regression

Binary Logistic Regression Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Imputing Attendance Data in a Longitudinal Multilevel Panel Data Set

Imputing Attendance Data in a Longitudinal Multilevel Panel Data Set Imputing Attendance Data in a Longitudinal Multilevel Panel Data Set April 2015 SHORT REPORT Baby FACES 2009 This page is left blank for double-sided printing. Imputing Attendance Data in a Longitudinal

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Statistical Analysis with Missing Data

Statistical Analysis with Missing Data Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PARTI OVERVIEW AND BASIC APPROACHES

More information

Chapter 27 Using Predictor Variables. Chapter Table of Contents

Chapter 27 Using Predictor Variables. Chapter Table of Contents Chapter 27 Using Predictor Variables Chapter Table of Contents LINEAR TREND...1329 TIME TREND CURVES...1330 REGRESSORS...1332 ADJUSTMENTS...1334 DYNAMIC REGRESSOR...1335 INTERVENTIONS...1339 TheInterventionSpecificationWindow...1339

More information

BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

More information

PASW Direct Marketing 18

PASW Direct Marketing 18 i PASW Direct Marketing 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412

More information

Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry

Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Paper 12028 Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Junxiang Lu, Ph.D. Overland Park, Kansas ABSTRACT Increasingly, companies are viewing

More information

SPSS: Descriptive and Inferential Statistics. For Windows

SPSS: Descriptive and Inferential Statistics. For Windows For Windows August 2012 Table of Contents Section 1: Summarizing Data...3 1.1 Descriptive Statistics...3 Section 2: Inferential Statistics... 10 2.1 Chi-Square Test... 10 2.2 T tests... 11 2.3 Correlation...

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information