Package hier.part. February 20, 2015. Index 11. Goodness of Fit Measures for a Regression Hierarchy



Similar documents
Poisson Models for Count Data

Package empiricalfdr.deseq2

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Package EstCRM. July 13, 2015

Package changepoint. R topics documented: November 9, Type Package Title Methods for Changepoint Detection Version 2.

SAS Software to Fit the Generalized Linear Model

Package MDM. February 19, 2015

Package acrm. R topics documented: February 19, 2015

Package bigrf. February 19, 2015

Package neuralnet. February 20, 2015

GLM I An Introduction to Generalized Linear Models

Chapter 2 Exploratory Data Analysis

Package CoImp. February 19, 2015

Package MixGHD. June 26, 2015

Package trimtrees. February 20, 2015

Package dunn.test. January 6, 2016

Package retrosheet. April 13, 2015

Package dsmodellingclient

Package cpm. July 28, 2015

Package tagcloud. R topics documented: July 3, 2015

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Package pesticides. February 20, 2015

Package metafuse. November 7, 2015

Generalized Linear Models

Package MBA. February 19, Index 7. Canopy LIDAR data

R2MLwiN Using the multilevel modelling software package MLwiN from R

Package smoothhr. November 9, 2015

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Logistic Regression (1/24/13)

Package COSINE. February 19, 2015

Package ChannelAttribution

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Package HHG. July 14, 2015

Gamma Distribution Fitting

Package benford.analysis

Package uptimerobot. October 22, 2015

Analysis of Variance. MINITAB User s Guide 2 3-1

Package support.ces. R topics documented: April 27, Type Package

Package missforest. February 20, 2015

Developing Risk Adjustment Techniques Using the System for Assessing Health Care Quality in the

STA 4273H: Statistical Machine Learning

data visualization and regression

Joint models for classification and comparison of mortality in different countries.

Automated Biosurveillance Data from England and Wales,

The Probit Link Function in Generalized Linear Models for Data Mining Applications

Package CRM. R topics documented: February 19, 2015

Ordinal Regression. Chapter

Some Essential Statistics The Lure of Statistics

Multivariate Analysis of Ecological Data

Package TRADER. February 10, 2016

Package survpresmooth

Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistical Models in R

Package hoarder. June 30, 2015

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Package fastghquad. R topics documented: February 19, 2015

Package SHELF. February 5, 2016

The Data Mining Process

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING

11. Analysis of Case-control Studies Logistic Regression

Package support.ces. R topics documented: August 27, Type Package

Package MFDA. R topics documented: July 2, Version Date Title Model Based Functional Data Analysis

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Yew May Martin Maureen Maclachlan Tom Karmel Higher Education Division, Department of Education, Training and Youth Affairs.

ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node

Package depend.truncation

Pattern Analysis. Logistic Regression. 12. Mai Joachim Hornegger. Chair of Pattern Recognition Erlangen University

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Package erp.easy. September 26, 2015

Examining a Fitted Logistic Model

Package cgdsr. August 27, 2015

Factors affecting online sales

Package GSA. R topics documented: February 19, 2015

Package gambin. June 23, 2016

Multiple Linear Regression

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Package gazepath. April 1, 2015

Software for Distributions in R

How to set the main menu of STATA to default factory settings standards

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

extreme Datamining mit Oracle R Enterprise

Package sjdbc. R topics documented: February 20, 2015

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

SUMAN DUVVURU STAT 567 PROJECT REPORT

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

R: A Free Software Project in Statistical Computing

SUGI 29 Statistics and Data Analysis

Offset Techniques for Predictive Modeling for Insurance

Lecture 3: Linear methods for classification

How To Test Granger Causality Between Time Series

HLM software has been one of the leading statistical packages for hierarchical

Statistical Models in R

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

Introduction to General and Generalized Linear Models

Transcription:

Version 1.0-4 Date 2013-01-07 Title Hierarchical Partitioning Author Chris Walsh and Ralph Mac Nally Depends gtools Package hier.part February 20, 2015 Variance partition of a multivariate data set Maintainer Chris Walsh <cwalsh@unimelb.edu.au> License GPL Repository CRAN Date/Publication 2013-01-08 11:25:26 NeedsCompilation yes R topics documented: all.regs............................................ 1 combos........................................... 3 hier.part........................................... 4 hier.part.data........................................ 6 partition........................................... 7 rand.hp........................................... 9 Index 11 all.regs Goodness of Fit Measures for a Regression Hierarchy Calculates goodness of fit measures for regressions of a single dependent variable to all combinations of N independent variables 1

2 all.regs all.regs(y, xcan, family = "gaussian", gof = "RMSPE", print.vars = FALSE) Arguments y xcan family gof print.vars a vector containing the dependent variables a dataframe containing the n independent variables family argument of glm Goodness-of-fit measure. Currently "RMSPE", Root-mean-square prediction error, "loglik", Log-Likelihood or "Rsqu", R-squared if FALSE, the function returns a vector of goodness-of-fit measures. If TRUE, a data frame is returned with first column listing variable combinations and the second column listing goodness-of-fit measures. This function calculates goodness of fit measures for the entire hierarchy of models using all combinations of N dependent variables, and returns them as an ordered list ready for input into the function partition. This function requires the gtools package in the gregmisc bundle Value gfs If print.vars is FALSE, a vector of goodness of fit measures for all combinations of independent varaiables in the hierarchy or, if print.vars is TRUE, a data frame listing all combinations of independent variables in the first column in ascending order, and the corresponding goodness of fit measure for the model using those variables Chris Walsh <cwalsh@unimelb.monash.edu.au>. References Hatt, B. E., Fletcher, T. D., Walsh, C. J. and Taylor, S. L. 2004 The influence of urban density and drainage infrastructure on the concentrations and loads of pollutants in small streams. Environmental Management 34, 112 124. Walsh, C. J., Papas, P. J., Crowther, D., Sim, P. T., and Yoo, J. 2004 Stormwater drainage pipes as a threat to a stream-dwelling amphipod of conservation significance, Austrogammarus australis, in southeastern Australia. Biodiversity and Conservation 13, 781 793. See Also hier.part, partition, rand.hp

combos 3 Examples #linear regression of log(electrical conductivity) in streams #against seven independent variables describing catchment #characteristics (from Hatt et al. 2004). data(urbanwq) env <- urbanwq[,2:8] all.regs(urbanwq$lec, env, fam = "gaussian", gof = "Rsqu", print.vars = TRUE) #logistic regression of an amphipod species occurrence in #streams against four independent variables describing #catchment characteristics (from Walsh et al. 2004). data(amphipod) env1 <- amphipod[,2:5] all.regs(amphipod$australis, env1, fam = "binomial", gof = "loglik", print.vars = TRUE) combos All Combinations of a Hierarchy of Models of n Variables Lists a matrix of combinations of 1 to n variables in ascending order combos(n) Arguments n an integer: the number of variables Value Lists hierarchy of all possible combinations of n variables in ascending order, starting with 1 variable, then all combinations of 2 variables, and so on until the one combination with all n variables. This function is used by all.regs to structure the models required for hierarchical partioning. This function requires the gtools package in the gregmisc bundle. a list containing ragged binary a matrix with zeroes in empty elements and 1 in column 1, 2 in column 2... n in column n for full elements a matrix as for ragged, but with 1 in all full elements

4 hier.part Chris Walsh <cwalsh@unimelb.edu.au> See Also all.regs hier.part Goodness of Fit Calculation and Hierarchical Partitioning Partitions variance in a multivariate dataset hier.part(y, xcan, family = "gaussian", gof = "RMSPE", barplot = TRUE) Arguments y xcan family gof barplot a vector containing the dependent variables a dataframe containing the n independent variables family argument of glm Goodness-of-fit measure. Currently "RMSPE", Root-mean-square prediction error, "loglik", Log-Likelihood or "Rsqu", R-squared If TRUE, a barplot of I and J for each variable is plotted expressed as percentage of total explained variance. This function calculates goodness of fit measures for the entire hierarchy of models using all combinations of N independent variables using the function all.regs. It takes the list of goodness of fit measures and, using the partition function, applies the hierarchical partitioning algorithm of Chevan and Sutherland (1991) to return a simple table listing each variable, its independent contribution (I) and its conjoint contribution with all other variables (J). Note earlier versions of hier.part (<1.0) produced a matrix and barplot of percentage distribution of effects as a percentage of the sum of all Is and Js, as portrayed in Hatt et al. (2004) and Walsh et al. (2004). This version plots the percentage distribution of independent effects only. The sum of Is equals the goodness of fit measure of the full model minus the goodness of fit measure of the null model. The distribution of joint effects shows the relative contribution of each variable to shared variability in the full model. Negative joint effects are possible for variables that act as suppressors of other variables (Chevan and Sutherland 1991). At this stage, the partition routine will not run for more than 12 independent variables. This function requires the gtools package in the gregmisc bundle.

hier.part 5 Value a list containing gfs IJ I.perc a data frame or vector listing all combinations of independent variables in the first column in ascending order, and the corresponding goodness of fit measure for the model using those variables a data frame of I, the independent and J the joint contribution for each independent variable a data frame of I as a percentage of total explained variance Note The function produces a minor rounding error for analyses with more than than 9 independent variables. To check if this error affects the inference from an analysis, run the analysis several times with the variables entered in a different order. There are no known problems for analyses with 9 or fewer variables. Chris Walsh <cwalsh@unimelb.edu.au> using c and fortran code written by Ralph Mac Nally <Ralph.MacNally@sci.monash.edu.au>. References Chevan, A. and Sutherland, M. 1991 Hierarchical Partitioning. The American Statistician 45, 90 96. Hatt, B. E., Fletcher, T. D., Walsh, C. J. and Taylor, S. L. 2004 The influence of urban density and drainage infrastructure on the concentrations and loads of pollutants in small streams. Environmental Management 34, 112 124. Mac Nally, R. 2000 Regression and model building in conservation biology, biogeography and ecology: the distinction between and reconciliation of predictive and explanatory models. Biodiversity and Conservation 9, 655 671. Walsh, C. J., Papas, P. J., Crowther, D., Sim, P. T., and Yoo, J. 2004 Stormwater drainage pipes as a threat to a stream-dwelling amphipod of conservation significance, Austrogammarus australis, in south-eastern Australia. Biodiversity and Conservation 13, 781 793. See Also all.regs, partition, rand.hp Examples #linear regression of log(electrical conductivity) in #streams against seven independent variables #describing catchment characteristics (from #Hatt et al. 2004) data(urbanwq)

6 hier.part.data env <- urbanwq[,2:8] hier.part(urbanwq$lec, env, fam = "gaussian", gof = "Rsqu") #logistic regression of an amphipod species occurrence in #streams against four independent variables describing #catchment characteristics (from Walsh et al. 2004). data(amphipod) env1 <- amphipod[,2:5] hier.part(amphipod$australis, env1, fam = "binomial", gof = "loglik") hier.part.data Example Data for hier.part Example data sets for hier.part package. data(urbanwq) data(amphipod) data(chevan) urbanwq.txt Seven catchment variables (fimp, catchment imperviousness 0.25 ; sconn, drainage connection 0.5 ; sdensep, septic tank density 0.5 ; unsealden, unsealed road density; fcarea, catchment area 0.25 ; selev, elevation 0.5 ; amgeast, longitude) and median baseflow concentrations of three water quality variables in sixteen streams draining 16 independent subcatchments (ldoc, log(dissolved organic carbon); lec, log(electrical conductivity); lnox, log(nitrate/nitrite). Data from Hatt et al. (2004). amphipod.txt Presence-absence data for a stream-dwelling amphipod (australis) in 58 sites, with four catchment variables (fimp, imperviousness 0.25 ; fconn, drainage connection 0.25 ; densep, density of septic tanks; unseal, unsealed road density). Data from Walsh et al. (2004). chevan.txt Chi-squared (chisq) and R-squared (rsqu) for logistic and linear regression example from Chevan and Sutherland (1991). Chris Walsh <cwalsh@unimelb.edu.au>

partition 7 References Chevan, A. and Sutherland, M. 1991 Hierarchical Partitioning. The American Statistician 45, 90 96. Hatt, B. E., Fletcher, T. D., Walsh, C. J. and Taylor, S. L. 2004 The influence of urban density and drainage infrastructure on the concentrations and loads of pollutants in small streams. Environmental Management 34, 112 124. Walsh, C. J., Papas, P. J., Crowther, D., Sim, P. T., and Yoo, J. 2004 Stormwater drainage pipes as a threat to a stream-dwelling amphipod of conservation significance, Austrogammarus australis, in southeastern Australia. Biodiversity and Conservation 13, 781 793. partition Hierarchical Partitioning from a List of Goodness of Fit Measures Partitions variance in a multivariate dataset from a list of goodness of fit measures partition(gfs, pcan, var.names = NULL) Arguments gfs an array as outputted by the function all.regs or a vector of goodness of fit measures from a hierarchy of regressions based on pcan variables in ascending order (as produced by function combos, but also including the null model as the first element) pcan the number of variables from which the hierarchy was constructed (maximum = 12) var.names an array of pcan variable names, if required This function applies the hierarchical partitioning algorithm of Chevan and Sutherland (1991) to return a simple table listing of each variable, its independent contribution (I) and its conjoint contribution with all other variables (J). The output is identical to the function hier.part, which takes the dependent and independent variable data as its input. Note earlier versions of partition (hier.part<1.0) produced a matrix and barplot of percentage distribution of effects as a percentage of the sum of all Is and Js, as portrayed in Hatt et al. (2004) and Walsh et al. (2004). This version plots the percentage distribution of independent effects only. The sum of Is equals the goodness of fit measure of the full model minus the goodness of fit measure of the null model. The distribution of joint effects shows the relative contribution of each variable to shared variability in the full model. Negative joint effects are possible for variables that act as suppressors of other variables (Chevan and Sutherland 1991).

8 partition Value At this stage, the partition routine will not run for more than 12 independent variables. This function requires the gtools package in the gregmisc bundle. a list containing gfs IJ I.perc J.perc a data frame listing all combinations of independent variables in the first column in ascending order, and the corresponding goodness of fit measure for the model using those variables a data frame of I, the independent and J the joint contribution for each independent variable a data frame of I as a percentage of total explained variance a data frame of J as a percentage of sum of all Js Note The function produces a minor rounding error for hierarchies constructed from more than 9 variables. To check if this error affects the inference from an analysis, run the analysis several times with the variables entered in a different order. There are no known problems for hierarchies with 9 or fewer variables. Chris Walsh <cwalsh@unimelb.edu.au> using c and fortran code written by Ralph Mac Nally <Ralph.MacNally@sci.monash.edu.au>. References Chevan, A. and Sutherland, M. 1991 Hierarchical Partitioning. The American Statistician 45, 90 96. Hatt, B. E., Fletcher, T. D., Walsh, C. J. and Taylor, S. L. 2004 The influence of urban density and drainage infrastructure on the concentrations and loads of pollutants in small streams. Environmental Management 34, 112 124. See Also all.regs, partition, rand.hp Examples #linear regression of log(electrical conductivity) in streams #against seven independent variables describing catchment #characteristics (from Hatt et al. 2004). data(urbanwq) env <- urbanwq[,2:8] gofs <- all.regs(urbanwq$lec, env, fam = "gaussian", gof = "Rsqu", print.vars = TRUE) partition(gofs, pcan = 7, var.names = names(urbanwq[,2,8]))

rand.hp 9 #hierarchical partitioning of logistic and linear regression #goodness of fit measures from Chevan and Sutherland (1991). data(chevan) partition(chevan$chisq, pcan = 4) partition(chevan$rsqu, pcan = 4) rand.hp Randomization Test for Hierarchical Partitioning Randomizes elements in each column in xcan and recalculates hier.part num.reps times rand.hp(y, xcan, family = "gaussian", gof = "RMSPE", num.reps = 100) Arguments y xcan family gof num.reps a vector containing the dependent variables a dataframe containing the n independent variables family argument of glm Goodness-of-fit measure. Currently "RMSPE", Root-mean-square Prediction error, "NLL", Negative log Likelihood or "Rsqu", R-squared Number of repeated randomizations This function is a randomization routine for the hier.part function which returns a matrix of I values (the independent contribution towards explained variance in a multivariate dataset) for all values from num.reps randomizations. For each randomization, the values in each variable (i.e each column of xcan) are randomized independently, and hier.part is run on the randomized xcan. As well as the randomized I matrix, the function returns a summary table listing the observed I values, the 95th and 99th percentile values of I for the randomized dataset. Value a list containing Irands matrix of num.reps + 1 rows of I values for each dependent variable. The first row contains the observed I values and the remaining num.reps rows contains the I values returned for each randomization.

10 rand.hp Iprobs data.frame of observed I values for each variable, Z-scores for the generated distribution of randomized Is and an indication of statistical significance. Z- scores are cacluated as (observed - mean(randomizations))/sd(randomizations), and statistical significance (*) is based on upper 0.95 confidence limit (Z >= 1.65). Chris Walsh <cwalsh@unimelb.edu.au>. References Hatt, B. E., Fletcher, T. D., Walsh, C. J. and Taylor, S. L. 2004 The influence of urban density and drainage infrastructure on the concentrations and loads of pollutants in small streams. Environmental Management 34, 112 124. Mac Nally, R. 2000 Regression and model building in conservation biology, biogeography and ecology: the distinction between and reconciliation of predictive and explanatory models. Biodiversity and Conservation 9, 655 671. Mac Nally, R. 2002 Multiple regression and inference in conservation biology and ecology: further comments on identifying important predictor variables. Biodiversity and Conservation 11, 1397 1401. Walsh, C. J., Papas, P. J., Crowther, D., Sim, P. T., and Yoo, J. 2004 Stormwater drainage pipes as a threat to a stream-dwelling amphipod of conservation significance, Austrogammarus australis, in southeastern Australia. Biodiversity and Conservation 13, 781 793. See Also hier.part, partition Examples #linear regression of log(electrical conductivity) in streams #against four independent variables describing catchment #characteristics (from Hatt et al. 2004). data(urbanwq) env <- urbanwq[,2:5] rand.hp(urbanwq$lec, env, fam = "gaussian", gof = "Rsqu")$Iprobs #logistic regression of an amphipod species occurrence in #streams against four independent variables describing #catchment characteristics (from Walsh et al. 2004). data(amphipod) env1 <- amphipod[,2:5] rand.hp(amphipod$australis, env1, fam = "binomial", gof = "loglik")$iprobs

Index Topic datasets hier.part.data, 6 Topic regression all.regs, 1 hier.part, 4 all.regs, 1, 4, 5, 8 amphipod (hier.part.data), 6 chevan (hier.part.data), 6 combos, 3 hier.part, 2, 4, 10 hier.part.data, 6 partition, 2, 5, 7, 8, 10 rand.hp, 2, 5, 8, 9 urbanwq (hier.part.data), 6 11