Package dsmodellingclient


August 20, 2015

Title DataSHIELD client site functions for statistical modelling
Description DataSHIELD client site functions for statistical modelling
License GPL-3
Depends opal, dsbaseclient

R topics documented:
    ds.gee
    ds.glm
    ds.lexis
    geelogindata
    geelogin_remoteserver
    glmlogindata
    glmlogin_remoteserver
    survivallogindata


ds.gee    Fits a Generalized Estimating Equation (GEE) model

Description
A function that fits generalized estimating equations to deal with correlation structures arising from repeated measures on individuals, or from clustering as in family data.
Usage
    ds.gee(formula = NULL, family = NULL, data = NULL, corStructure = "ar1", clusterID = NULL, startCoeff = NULL, userMatrix = NULL, maxit = 20, checks = TRUE, display = FALSE, datasources = NULL)

Arguments
    formula       a character string, the formula which describes the model to be fitted.
    family        a character, the description of the error distribution: binomial, gaussian, Gamma or poisson.
    data          the name of the data frame that holds the variables in the regression formula.
    corStructure  a character, the correlation structure: ar1, exchangeable, independence, fixed or unstructure.
    clusterID     a character, the name of the column that holds the cluster IDs.
    startCoeff    a numeric vector, the starting values for the beta coefficients.
    userMatrix    a list of user-defined matrices (one for each study). These matrices are required if the correlation structure is set to fixed.
    maxit         an integer, the maximum number of iterations to use for convergence.
    checks        a boolean, if TRUE (default), checks that take 1-3 min are carried out to verify that the variables in the model are defined (exist) on the server site and that they have the correct characteristics required to fit a GEE. If FALSE (not recommended if you are not an experienced user), only some very basic checks are carried out and any error messages might not give clear indications about the cause(s) of the error.
    display       a boolean, whether or not to display the intermediate results. Default is FALSE.
    datasources   a list of opal object(s) obtained after login to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.

Details
It enables a parallelized analysis of individual-level data sitting on distinct servers by sending commands to each data computer to fit a GEE model. The estimates returned are then combined and the updated coefficient estimates are sent back for a new fit. This iterative process goes on until convergence is achieved. The input data should not contain missing values.
The data must be in a data.frame object and the variables must be referred to through the data.frame.

Value
a list which contains the final coefficient estimates (beta values), the pooled alpha value and the pooled phi value.

Author(s)
Gaye, A.; Jones EM.
References
Jones EM, Sheehan NA, Gaye A, Laflamme P, Burton P. Combined analysis of correlated data when data cannot be pooled. Stat 2013; 2:

See Also
ds.glm for generalized linear models
ds.lexis for survival analysis using piecewise exponential regression

Examples
    # load the login data file for the correlated data
    data(geelogindata)

    # login and assign all the stored variables to R
    opals <- datashield.login(logins=geelogindata, assign=TRUE)

    # set some parameters for the function (the rest are set to default values)
    myformula <- 'response~1+sex+age.60'
    myfamily <- 'binomial'
    startbetas <- c(1,1,0)
    clusters <- 'id'
    mycorr <- 'ar1'

    # run a GEE analysis with the above specified parameters
    ds.gee(data='D', formula=myformula, family=myfamily, corStructure=mycorr, clusterID=clusters, startCoeff=startbeta

    # clear the DataSHIELD R sessions and logout
    datashield.logout(opals)


ds.glm    Runs a combined GLM analysis of non-pooled data

Description
A function that fits generalized linear models.

Usage
    ds.glm(formula = NULL, data = NULL, family = NULL, offset = NULL, weights = NULL, checks = FALSE, maxit = 15, CI = 0.95, viewiter = FALSE, datasources = NULL)
Arguments
    formula      a character, a formula which describes the model to be fitted.
    data         a character, the name of an optional data frame containing the variables in the formula. The process stops if a non-existing data frame is indicated.
    family       a description of the error distribution function to use in the model.
    offset       a character, NULL or a numeric vector that can be used to specify an a priori known component to be included in the linear predictor during fitting.
    weights      a character, the name of an optional vector of prior weights to be used in the fitting process. Should be NULL or a numeric vector.
    checks       a boolean, if TRUE, checks that take 1-3 min are carried out to verify that the variables in the model are defined (exist) on the server site and that they have the correct characteristics required to fit a GLM. The default value is FALSE because checks lengthen the runtime and are mainly meant to be used as help to look for causes of eventual errors.
    maxit        the number of iterations of IWLS used.
    CI           a numeric, the level of the confidence interval.
    viewiter     a boolean, tells whether the results of the intermediate iterations should be printed on screen or not. Default is FALSE (i.e. only final results are shown).
    datasources  a list of opal object(s) obtained after login to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.
    startbetas   starting values for the parameters in the linear predictor.

Details
It enables a parallelized analysis of individual-level data sitting on distinct servers by sending instructions to each computer requesting non-disclosing summary statistics. The summaries are then combined to estimate the parameters of the model; these parameters are the same as those obtained if the data were physically pooled.

Value
    coefficients       a named vector of coefficients.
    residuals          the working residuals, that is the residuals in the final iteration of the IWLS fit.
    fitted.values      the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
    rank               the numeric rank of the fitted linear model.
    family             the family object used.
    linear.predictors  the linear fit on link scale.

Author(s)
Burton, P.; Gaye, A.; Laflamme, P.
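The combination step described in the Details of ds.glm can be illustrated outside DataSHIELD. The following is a minimal base R sketch, not the package's implementation: two simulated data frames stand in for the studies, and the helper names study_summaries and pooled_glm are hypothetical. It assumes a logistic model, for which each study only needs to return a score vector and an information matrix; summing these across studies and taking a Fisher scoring step gives the same update as IWLS on the pooled rows.

```r
# Simulate two "studies"; the client never looks at their rows directly.
set.seed(1)
make_study <- function(n) {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))
  data.frame(y = y, x = x)
}
studies <- list(make_study(200), make_study(300))

# Hypothetical server-side step: return only aggregate summaries
# (score vector and information matrix) for the current coefficients.
study_summaries <- function(d, beta) {
  X  <- cbind(1, d$x)
  mu <- plogis(drop(X %*% beta))
  list(score = crossprod(X, d$y - mu),
       info  = crossprod(X, X * (mu * (1 - mu))))
}

# Hypothetical client-side step: sum the summaries and update beta
# until the update is negligible or maxit is reached.
pooled_glm <- function(studies, maxit = 15, tol = 1e-10) {
  beta <- c(0, 0)
  for (i in seq_len(maxit)) {
    s     <- lapply(studies, study_summaries, beta = beta)
    score <- Reduce(`+`, lapply(s, `[[`, "score"))
    info  <- Reduce(`+`, lapply(s, `[[`, "info"))
    step  <- drop(solve(info, score))
    beta  <- beta + step
    if (max(abs(step)) < tol) break
  }
  beta
}

beta_split  <- pooled_glm(studies)

# Reference fit on the physically pooled rows, for comparison only.
beta_pooled <- unname(coef(glm(y ~ x, family = binomial,
                               data = do.call(rbind, studies))))
```

In this sketch beta_split and beta_pooled should agree to numerical tolerance, which is the property the Details section states for the combined analysis.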
See Also
ds.lexis for survival analysis using piecewise exponential regression
ds.gee for generalized estimating equation models

Examples
    # load the file that contains the login details
    data(glmlogindata)

    # login and assign all the variables to R
    opals <- datashield.login(logins=glmlogindata, assign=TRUE)

    # Example 1: run a GLM without interaction (e.g. diabetes prediction using BMI and HDL levels and GENDER)
    mod <- ds.glm(formula='D$DIS_DIAB~D$GENDER+D$PM_BMI_CONTINUOUS+D$LAB_HDL', family='binomial')
    mod

    # Example 2: run the above GLM model without an intercept
    # (produces separate baseline estimates for Male and Female)
    mod <- ds.glm(formula='D$DIS_DIAB~0+D$GENDER+D$PM_BMI_CONTINUOUS+D$LAB_HDL', family='binomial')
    mod

    # Example 3: run the above GLM with interaction between GENDER and PM_BMI_CONTINUOUS
    mod <- ds.glm(formula='D$DIS_DIAB~D$GENDER*D$PM_BMI_CONTINUOUS+D$LAB_HDL', family='binomial')
    mod

    # Example 4: fit a standard Gaussian linear model with an interaction
    mod <- ds.glm(formula='D$PM_BMI_CONTINUOUS~D$DIS_DIAB*D$GENDER+D$LAB_HDL', family='gaussian')
    mod

    # Example 5: now run a GLM where the error follows a poisson distribution
    # P.S: A poisson model requires a numeric vector as outcome so in this example we first convert
    # the categorical BMI, which is of type factor, into a numeric vector
    ds.asnumeric('D$PM_BMI_CATEGORICAL', 'BMI.123')
    mod <- ds.glm(formula='BMI.123~D$PM_BMI_CONTINUOUS+D$LAB_HDL+D$GENDER', family='poisson')
    mod

    # clear the DataSHIELD R sessions and logout
    datashield.logout(opals)


ds.lexis    Generates an expanded version of a dataset that contains survival data

Description
This function is meant to be used as part of a piecewise regression analysis.

Usage
    ds.lexis(data = NULL, intervalwidth = NULL, idcol = NULL, entrycol = NULL, exitcol = NULL, statuscol = NULL, variables = NULL, newobj = NULL, datasources = NULL)
Arguments
    data           a character, the name of the table that holds the original data; this is the data to be expanded.
    intervalwidth  a numeric vector which gives the chosen width of the intervals ('pieces'). This can be one value (in which case all the intervals have the same width) or several different values. If no value is provided, a single default value is used. That default value is set to 1/10th of the mean of the exit time values across all the studies.
    idcol          a character, the name of the column that holds the individual IDs of the subjects.
    entrycol       a character, the name of the column that holds the entry times (i.e. start of follow up). If no name is provided the default is to set all the entry times to 0 in a column named "STARTTIME". A message is then printed to alert the user, as this has serious consequences if the actual entry times are not 0 for all the subjects.
    exitcol        a character, the name of the column that holds the exit times (i.e. end of follow up).
    statuscol      a character, the name of the column that holds the failure status of each subject; it tells whether or not a subject has been censored.
    variables      a character vector, the column names of the variables (covariates) to include in the final expanded table. The input table might have a large number of covariates, and if only some of those variables are relevant for the intended analysis it makes sense to include only those. By default (i.e. if no variables are indicated) all the covariates in the input table are included, and this will lengthen the run time of the function.
    newobj         the name of the output expanded table. By default the name is the name of the input table with the suffix "_expanded".
    datasources    a list of opal object(s) obtained after login to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.

Details
It splits the survival interval time of subjects into subintervals and reports the failure status of the subjects at each subinterval. Each of those subintervals is given an id, e.g. if the overall interval of a subject is split into 4 subintervals, those subintervals have ids 1, 2, 3 and 4; so this is basically the count of periods for each subject. The interval ids are held in a column named "TIMEID". The entry and exit times in the input table are used to compute the total survival time. By default all the covariates in the input table are included in the expanded output table, but it is preferable to indicate the names of the covariates to be included via the argument variables.

Value
a data frame, an expanded version of the input table.

Author(s)
Gaye, A.
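The expansion described in the Details of ds.lexis can be pictured with a small local example. This is a minimal base R sketch, not the ds.lexis implementation: the helper expand_subject and the toy table are hypothetical, a single interval width is assumed, and a status of 1 is assumed to mean that the event occurred. The column names TIMEID and SURVIVALTIME follow the manual's examples.

```r
# Hypothetical helper: split one subject's follow-up into pieces of equal width.
expand_subject <- function(id, entry, exit, status, width) {
  total  <- exit - entry                    # total follow-up time
  n      <- ceiling(total / width)          # number of pieces needed
  starts <- (seq_len(n) - 1) * width
  ends   <- pmin(starts + width, total)     # last piece may be shorter
  data.frame(
    id           = id,
    TIMEID       = seq_len(n),              # running count of pieces
    SURVIVALTIME = ends - starts,           # time at risk within each piece
    status       = c(rep(0, n - 1), status) # event can only occur in the last piece
  )
}

# Toy input table (made-up values, one row per subject).
toy <- data.frame(id = 1:2, starttime = c(0, 0),
                  endtime = c(3.5, 2), cens = c(1, 0))

# One block of rows per subject, stacked into the expanded table.
expanded <- do.call(rbind, Map(expand_subject, toy$id, toy$starttime,
                               toy$endtime, toy$cens, width = 1))
expanded
```

Subject 1 contributes four rows (TIMEID 1 to 4) and subject 2 contributes two, and the SURVIVALTIME values of each subject add up to that subject's total follow-up time.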
See Also
ds.glm for generalized linear models
ds.gee for generalized estimating equation models

Examples
    # load the file that contains the login details
    data(survivallogindata)

    # login and assign all the variables to R
    opals <- datashield.login(logins=survivallogindata, assign=TRUE)

    # this example shows how to run survival analysis in H-DataSHIELD using the piecewise exponential regression m

    # let us display the names of the variables in the original table (the table we assigned above and which by defau
    ds.colnames('D')

    # specify some baseline hazard profile (i.e. the width of the intervals to be used)
    bh <- c(2,1,3,0.5,1.5,2)

    # expand the original table (e.g the survival time of each individual is split into pieces equal to the interval
    # we use the function ds.lexis which expands the original table and saves the expanded table on the server site
    # we set the parameter variables to NULL (default) which means include all the covariates in the expanded table
    # to indicate the variables to include if you have many variables and want to use only a subset of those.
    ds.lexis(data='D', intervalwidth=bh, idcol="id", entrycol="starttime", exitcol="endtime", statuscol="cens")

    # let us display the names of variables in the expanded table (by default it is the name of the original table fo
    ds.colnames('D_expanded')

    # Now fit a GLM with a poisson model
    # there is a direct relationship between the poisson model with a log-time offset and the exponential model so we
    # use glm to fit a poisson model and include a factor for the time intervals ('TIMEID') to have different rates.
    # The vector SURVIVALTIME (the time elapsed between start of follow up and failure/censoring) and the vector TIME
    # which allows for different rates are generated when the initial table got expanded via the function ds.lexis.
    # In the below model the log of the survival time is used as an offset (some known information to be included in t

    # generate a vector of log survival time values
    ds.assign(toassign='log(D_expanded$SURVIVALTIME)', newobj='logsurvival')

    # Fit the GLM; the outcome is failure status
    ds.glm(formula='CENS~1+TIMEID+AGE.60+GENDER+NOISE.56+PM10.16', data='D_expanded', family='poisson', offset= lo

    # clear the DataSHIELD R sessions and logout
    datashield.logout(opals)
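The relationship between the Poisson model with a log-time offset and the exponential model, which the example comments rely on, can be checked locally in the simplest case of a single interval with a constant hazard. The sketch below uses simulated data and plain glm from base R, not ds.glm; the variable names and the censoring time are assumptions made for illustration.

```r
# Simulated follow-up with a constant hazard of 0.2 and administrative
# censoring at time 8 (all values are made up).
set.seed(42)
n      <- 500
time   <- rexp(n, rate = 0.2)
cens_t <- 8
event  <- as.integer(time <= cens_t)  # 1 = failure observed, 0 = censored
follow <- pmin(time, cens_t)          # observed follow-up time

# Poisson GLM on the failure indicator with log follow-up time as offset.
fit <- glm(event ~ 1 + offset(log(follow)), family = poisson)
rate_poisson <- unname(exp(coef(fit)))

# Maximum likelihood estimate of the rate under the exponential model:
# number of events divided by total time at risk.
rate_exp_mle <- sum(event) / sum(follow)
```

The two estimates coincide up to the convergence tolerance of glm. Adding a factor for the time interval, as the manual's example does with TIMEID, lets the rate differ between pieces while keeping the same offset construction.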


geelogindata    Information required to login to opal servers for the GEE test data

Description
A table with 5 columns: study name, URL, username, password and opal datasource.

Usage
    data(geelogindata)

Format
A data frame where the number of servers corresponds to the number of rows
    server    a character, the formal name of the study
    url       URL of the opal server
    user      a character, a formal username or a path to a valid ssl certificate, if required
    password  a character, a formal password or a path to a valid ssl key, if required
    table     a character, the path to the opal datasource that holds the data to analyse

Examples
    data(geelogindata)


geelogin_remoteserver    Information required to login to opal servers for the GEE test data

Description
A table with 5 columns: study name, URL, username, password and opal datasource.

Usage
    data(geelogin_remoteserver)

Format
A data frame where the number of servers corresponds to the number of rows
    server    a character, the formal name of the study
    url       URL of the opal server
    user      a character, a formal username or a path to a valid ssl certificate, if required
    password  a character, a formal password or a path to a valid ssl key, if required
    table     a character, the path to the opal datasource that holds the data to analyse

Examples
    data(geelogin_remoteserver)


glmlogindata    Information required to login to opal servers for the GLM test data

Description
A table with 5 columns: study name, URL, username, password and opal datasource.

Usage
    data(glmlogindata)

Format
A data frame where the number of servers corresponds to the number of rows
    server    a character, the formal name of the study
    url       URL of the opal server
    user      a character, a formal username or a path to a valid ssl certificate, if required
    password  a character, a formal password or a path to a valid ssl key, if required
    table     a character, the path to the opal datasource that holds the data to analyse

Examples
    data(glmlogindata)


glmlogin_remoteserver    Information required to login to opal servers for the GLM test data

Description
A table with 5 columns: study name, URL, username, password and opal datasource.

Usage
    data(glmlogin_remoteserver)

Format
A data frame where the number of servers corresponds to the number of rows
    server    a character, the formal name of the study
    url       URL of the opal server
    user      a character, a formal username or a path to a valid ssl certificate, if required
    password  a character, a formal password or a path to a valid ssl key, if required
    table     a character, the path to the opal datasource that holds the data to analyse

Examples
    data(glmlogin_remoteserver)


survivallogindata    Information required to login to opal servers for the GLM test data

Description
A table with 5 columns: study name, URL, username, password and opal datasource.

Usage
    data(survivallogindata)

Format
A data frame where the number of servers corresponds to the number of rows
    server    a character, the formal name of the study
    url       URL of the opal server
    user      a character, a formal username or a path to a valid ssl certificate, if required
    password  a character, a formal password or a path to a valid ssl key, if required
    table     a character, the path to the opal datasource that holds the data to analyse

Examples
    data(survivallogindata)
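The login tables documented above all share the same five-column layout, so a table for other servers can presumably be built by hand in the same shape. The sketch below is only an illustration of that layout: every value in it is a placeholder, and none of the servers, credentials or datasource paths exist.

```r
# Hand-built login table with the five documented columns.
# All values are placeholders, not real servers or credentials.
mylogins <- data.frame(
  server   = c("study1", "study2"),
  url      = c("https://opal.example.org/study1",
               "https://opal.example.org/study2"),
  user     = c("user1", "user2"),
  password = c("secret1", "secret2"),
  table    = c("project.TABLE1", "project.TABLE2"),
  stringsAsFactors = FALSE
)

# One row per server, as stated in the Format sections.
str(mylogins)
```

A table of this shape would then take the place of the bundled data sets in a call such as datashield.login(logins=mylogins, assign=TRUE), as used in the manual's examples.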
More informationModel Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc.
Paper 26426 Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc. Abstract: There are several procedures in the SAS System for statistical modeling. Most statisticians who use the SAS
More information5. Ordinal regression: cumulative categories proportional odds. 6. Ordinal regression: comparison to single reference generalized logits
Lecture 23 1. Logistic regression with binary response 2. Proc Logistic and its surprises 3. quadratic model 4. HosmerLemeshow test for lack of fit 5. Ordinal regression: cumulative categories proportional
More informationA Short Guide to R with RStudio
Short Guides to Microeconometrics Fall 2013 Prof. Dr. Kurt Schmidheiny Universität Basel A Short Guide to R with RStudio 1 Introduction 2 2 Installing R and RStudio 2 3 The RStudio Environment 2 4 Additions
More informationPackage bigrf. February 19, 2015
Version 0.111 Date 20140516 Package bigrf February 19, 2015 Title Big Random Forests: Classification and Regression Forests for Large Data Sets Maintainer Aloysius Lim OS_type
More informationModel Selection and Claim Frequency for Workers Compensation Insurance
Model Selection and Claim Frequency for Workers Compensation Insurance Jisheng Cui, David Pitt and Guoqi Qian Abstract We consider a set of workers compensation insurance claim data where the aggregate
More informationPolynomial Neural Network Discovery Client User Guide
Polynomial Neural Network Discovery Client User Guide Version 1.3 Table of contents Table of contents...2 1. Introduction...3 1.1 Overview...3 1.2 PNN algorithm principles...3 1.3 Additional criteria...3
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationBRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS
BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS An Addendum to Negative Binomial Regression Cambridge University Press (2007) Joseph M. Hilbe 2008, All Rights Reserved This short monograph is intended
More informationPackage sjdbc. R topics documented: February 20, 2015
Package sjdbc February 20, 2015 Version 1.5.071 Title JDBC Driver Interface Author TIBCO Software Inc. Maintainer Stephen Kaluzny Provides a databaseindependent JDBC interface. License
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationSPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout
Analyzing Data SPSS Resources 1. See website (readings) for SPSS tutorial & Stats handout Don t have your own copy of SPSS? 1. Use the libraries to analyze your data 2. Download a trial version of SPSS
More informationBOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gearanalytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationCHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
More informationTests for Two Survival Curves Using Cox s Proportional Hazards Model
Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.
More informationThis chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING
ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING BY OMID ROUHANIKALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of
More informationUsing R for Windows and Macintosh
2010 Using R for Windows and Macintosh R is the most commonly used statistical package among researchers in Statistics. It is freely distributed open source software. For detailed information about downloading
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationSydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.
Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under
More informationIntroducing the Multilevel Model for Change
Department of Psychology and Human Development Vanderbilt University GCM, 2010 1 Multilevel Modeling  A Brief Introduction 2 3 4 5 Introduction In this lecture, we introduce the multilevel model for change.
More informationhp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines
The STAT menu Trend Lines Practice predicting the future using trend lines The STAT menu The Statistics menu is accessed from the ORANGE shifted function of the 5 key by pressing Ù. When pressed, a CHOOSE
More informationCoefficient of Determination
Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed
More informationWeek TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480
1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500
More informationJanuary 26, 2009 The Faculty Center for Teaching and Learning
THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics: Behavioural
More informationPackage neuralnet. February 20, 2015
Type Package Title Training of neural networks Version 1.32 Date 20120919 Package neuralnet February 20, 2015 Author Stefan Fritsch, Frauke Guenther , following earlier work
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationAuxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationSPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & Oneway
More informationSnapLogic Salesforce Snap Reference
SnapLogic Salesforce Snap Reference Document Release: October 2012 SnapLogic, Inc. 71 East Third Avenue San Mateo, California 94401 U.S.A. www.snaplogic.com Copyright Information 2012 SnapLogic, Inc. All
More informationOverview Classes. 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7)
Overview Classes 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7) 24 Loglinear models (8) 54 1517 hrs; 5B02 Building and
More informationIBM SPSS Neural Networks 22
IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,
More informationLinda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
More information