Package dsmodellingclient
|
|
- Bethany Simpson
- 8 years ago
- Views:
Transcription
1 Package dsmodellingclient Maintainer Author Version License GPL-3 August 20, 2015 Title DataSHIELD client site functions for statistical modelling DataSHIELD client site functions for statistical modelling Depends opal, dsbaseclient R topics documented: ds.gee ds.glm ds.lexis geelogindata geelogin_remoteserver glmlogindata glmlogin_remoteserver survivallogindata Index 11 ds.gee Fits a Generalized Estimating Equation (GEE) model A function that fits generalized estimated equations to deal with correlation structures arising from repeated measures on individuals, or from clustering as in family data. 1
2 2 ds.gee ds.gee(formula = NULL, family = NULL, data = NULL, corstructure = "ar1", clusterid = NULL, startcoeff = NULL, usermatrix = NULL, maxit = 20, checks = TRUE, display = FALSE, datasources = NULL) Arguments formula family data corstructure clusterid startcoeff usermatrix maxit checks display datasources a string character, the formula which describes the model to be fitted. a character, the description of the error distribution: binomial, gaussian, Gamma or poisson. the name of the data frame that hold the variables in the regression formula. a character, the correlation structure: ar1, exchangeable, independence, fixed or unstructure. a character, the name of the column that hold the cluster IDs a numeric vector, the starting values for the beta coefficients. a list of user defined matrix (one for each study). These matrices are required if the correlation structure is set to fixed. an integer, the maximum number of iteration to use for convergence. a boolean, if TRUE (default) checks that takes 1-3min are carried out to verify that the variables in the model are defined (exist) on the server site and that they have the correct characteristics required to fit a GEE. If FALSE (not recommended if you are not an experienced user) no checks are carried except some very basic ones and eventual error messages might not give clear indications about the cause(s) of the error. a boolean to display or not the intermediate results. Default is FALSE. a list of opal object(s) obtained after login to opal servers; these objects also hold the data assigned to R, as a dataframe, from opal datasources. Details It enables a parallelized analysis of individual-level data sitting on distinct servers by sending commands to each data computer to fit a GEE model model. The estimates returned are then combined and updated coefficients estimate sent back for a new fit. This iterative process goes on until convergence is achieved. The input data should not contain missing values. The data must be in a data.frame obejct and the variables must be refer to through the data.frame. Value a list which contains the final coefficient estimates (beta values), the pooled alpha value and the pooled phi value. Author(s) Gaye, A.; Jones EM.
3 ds.glm 3 References Jones EM, Sheehan NA, Gaye A, Laflamme P, Burton P. Combined analysis of correlated data when data cannot be pooled. Stat 2013; 2: See Also ds.glm for genralized linear models ds.lexis for survival analysis using piecewise exponential regression { } # load the login data file for the correlated data data(geelogindata) # login and assign all the stored variables to R opals <- datashield.login(logins=geelogindata,assign=true) # set some parameters for the function 9the rest are set to default values) myformula <- response~1+sex+age.60 myfamily <- binomial startbetas <- c(-1,1,0) clusters <- id mycorr <- ar1 # run a GEE analysis with the above specifed parameters ds.gee(data= D,formula=myformula,family=myfamily,corStructure=mycorr,clusterID=clusters,startCoeff=startbeta # clear the Datashield R sessions and logout datashield.logout(opals) ds.glm Runs a combined GLM analysis of non-pooled data A function fit generalized linear models ds.glm(formula = NULL, data = NULL, family = NULL, offset = NULL, weights = NULL, checks = FALSE, maxit = 15, CI = 0.95, viewiter = FALSE, datasources = NULL)
4 4 ds.glm Arguments formula data family offset weights checks maxit CI viewiter datasources startbetas a character, a formula which describes the model to be fitted a character, the name of an optional data frame containing the variables in in the formula. The process stops if a non existing data frame is indicated. a description of the error distribution function to use in the model a character, null or a numeric vector that can be used to specify an a priori known component to be included in the linear predictor during fitting. a character, the name of an optional vector of prior weights to be used in the fitting process. Should be NULL or a numeric vector. a boolean, if TRUE (default) checks that takes 1-3min are carried out to verify that the variables in the model are defined (exist) on the server site and that they have the correct characteristics required to fit a GLM. The default value is FALSE because checks lengthen the runtime and are mainly meant to be # used as help to look for causes of eventual errors. the number of iterations of IWLS used instructions to each computer requesting non-disclosing summary statistics. The summaries are then combined to estimate the parameters of the model; these parameters are the same as those obtained if the data were physically pooled. a numeric, the confidence interval. a boolean, tells whether the results of the intermediate iterations should be printed on screen or not. Default is FALSE (i.e. only final results are shown). a list of opal object(s) obtained after login to opal servers; these objects also hold the data assigned to R, as a dataframe, from opal datasources. starting values for the parameters in the linear predictor Details It enables a parallelized analysis of individual-level data sitting on distinct servers by sending Value coefficients a named vector of coefficients residuals the working residuals, that is the residuals in the final iteration of the IWLS fit. fitted.values the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. rank the numeric rank of the fitted linear model. family the family object used. linear.predictors the linear fit on link scale. Author(s) Burton,P;Gaye,A;Laflamme,P
5 ds.lexis 5 See Also ds.lexis for survival analysis using piecewise exponential regression ds.gee for generalized estimating equation models { # load the file that contains the login details data(glmlogindata) # login and assign all the variables to R opals <- datashield.login(logins=glmlogindata, assign=true) # Example 1: run a GLM without interaction (e.g. diabetes prediction using BMI and HDL levels and GENDER) mod <- ds.glm(formula= D$DIS_DIAB~D$GENDER+D$PM_BMI_CONTINUOUS+D$LAB_HDL, family= binomial ) mod # Example 2: run the above GLM model without an intercept # (produces separate baseline estimates for Male and Female) mod <- ds.glm(formula= D$DIS_DIAB~0+D$GENDER+D$PM_BMI_CONTINUOUS+D$LAB_HDL, family= binomial ) mod # Example 3: run the above GLM with interaction between GENDER and PM_BMI_CONTINUOUS mod <- ds.glm(formula= D$DIS_DIAB~D$GENDER*D$PM_BMI_CONTINUOUS+D$LAB_HDL, family= binomial ) mod # Example 4: Fit a standard Gaussian linear model with an interaction mod <- ds.glm(formula= D$PM_BMI_CONTINUOUS~D$DIS_DIAB*D$GENDER+D$LAB_HDL, family= gaussian ) mod # Example 5: now run a GLM where the error follows a poisson distribution # P.S: A poisson model requires a numeric vector as outcome so in this example we first convert # the categorical BMI, which is of type factor, into a numeric vector ds.asnumeric( D$PM_BMI_CATEGORICAL, BMI.123 ) mod <- ds.glm(formula= BMI.123~D$PM_BMI_CONTINUOUS+D$LAB_HDL+D$GENDER, family= poisson ) mod # clear the Datashield R sessions and logout datashield.logout(opals) } ds.lexis Generates an expanded version of a dataset that contains survival data This function is meant to be used as part of a piecewise regression analysis. ds.lexis(data = NULL, intervalwidth = NULL, idcol = NULL, entrycol = NULL, exitcol = NULL, statuscol = NULL, variables = NULL, newobj = NULL, datasources = NULL)
6 6 ds.lexis Arguments data Details Value a character, the name of the table that holds the original data, this is the data to be expanded. intervalwidth, a numeric vector which gives the chosen width of the intervals ( pieces ). This can be one value (in which case all the intervals have same width) or several different values. If no value(s) are provided a single default value is used. That default value is the set to be the 1/10th of the mean of the exit time values across all the studies. idcol entrycol exitcol statuscol variables newobj datasources a character the name of the column that holds the individual IDs of the subjects. a character, the name of the column that holds the entry times (i.e. start of follow up). If no name is provided the default is to set all the entry times to 0 in a column named "STARTTIME". A message is then printed to alert the user as this has serious consequences if the actual entry times are not 0 for all the subjects. a character, the name of the column that holds the exit times (i.e. end of follow up). a character, the name of the column that holds the failure status of each subject, tells whether or not a subject has been censored. a character vector, the column names of the variables (covariates) to include in the final expanded table. The input table might have a large number of covariates and if only some of those variables are relevant for the sought analysis it make sense to only include those. By default (i.e. if no variables are indicated) all the covariates in the inout table are included and this will lengthen the run time of the function. the name of the output expanded table. By default the name is the name of the input table with the suffixe "_expanded". a list of opal object(s) obtained after login to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources It splits the survial interval time of subjects into sub-intervals and reports the failure status of the subjects at each sub-interval. Each of those sub-interval is given an id e.g. if the overall interval of a subject is split into 4 sub-interval, those sub-intervals have ids 1, 2, 3 and 4; so this is basically the count of periods for each subject. The interval ids are held in a column named "TIMEID". The entry and exit times in the input table are used to compute the total survival time. By default all the covariates in the input table are included in the expanded output table but it is preferable to indicate the names of the covariates to be included via the argument variables. a dataframe, an expanded version of the input table. Author(s) Gaye, A.
7 ds.lexis 7 See Also ds.glm for genralized linear models ds.gee for generalized estimating equation models { # load the file that contains the login details data(survivallogindata) # login and assign all the variables to R opals <- datashield.login(logins=survivallogindata,assign=true) # this example shows how to run survival analysis in H-DataSHIELD using the piecewise exponential regression m # let us display the names of the variables in the original table (the table we assigned above and which by defau ds.colnames( D ) # specify some baseline hazard profile (i.e. the width of the intervals to be used) bh <- c(2,1,3,0.5,1.5,2) # expand the original table (e.g the survial time of each individual is split into pieces equal to the interval # we use the function ds.lexis which expands the original table and saves the expanded table on the server site # we set the parameter variables to NULL (default) which means include all the covariates in the expanded table # to indicate the variables to include if you have many variables and wants to use only a subset of those. ds.lexis(data= D, intervalwidth=bh, idcol="id", entrycol="starttime", exitcol="endtime", statuscol="cens") # let us display the names of variables in the expanded table (by default it is the name of the priginal table fo ds.colnames( D_expanded ) # Now fit a GLM with a poisson model # there is a direct relationship between the poisson model with a log-time offset and the exponential model so we # use glm to fit a poisson model and include a factor for the time intervals ( TIMEID ) to have different rates. # The vector SURVIVALTIME (the time elapsed between start of follow up failure/censoring) and the vector TIME # which allows for different rates are generated when the initial table got expanded via the function ds.lxus. # In the below model the log of the survival time is used as an offset (some known information to be included in t # generate a vector of log survival time values ds.assign(toassign= log(d_expanded$survivaltime), newobj= logsurvival ) # Fit the GLM - the outcome is failure status ds.glm(formula= CENS~1+TIMEID+AGE.60+GENDER+NOISE.56+PM10.16, data= D_expanded, family= poisson, offset= lo # clear the Datashield R sessions and logout datashield.logout(opals) }
8 8 geelogin_remoteserver geelogindata Information required to login to opal servers for the GEE test data A table of with 5 columns: study name, URL, username, password and opal datasource. data(geelogindata) Format A data frame where the number of servers corresponds to the number of rows server a character, the formal name of the study url URL of the opal server user a character, a formal username or a path to a valid ssl certificate, if required password a character, a formal password or a path to a valid ssl key if required table a character, the path to the opal datasource that holds the data to analyse data(geelogindata) geelogin_remoteserver Information required to login to opal servers for the GEE test data A table of with 5 columns: study name, URL, username, password and opal datasource. data(geelogin_remoteserver) Format A data frame where the number of servers corresponds to the number of rows server a character, the formal name of the study url URL of the opal server user a character, a formal username or a path to a valid ssl certificate, if required password a character, a formal password or a path to a valid ssl key if required table a character, the path to the opal datasource that holds the data to analyse
9 glmlogindata 9 data(geelogin_remoteserver) glmlogindata Information required to login to opal servers for the GLM test data A table of with 5 columns: study name, URL, username, password and opal datasource. data(glmlogindata) Format A data frame where the number of servers corresponds to the number of rows server a character, the formal name of the study url URL of the opal server user a character, a formal username or a path to a valid ssl certificate, if required password a character, a formal password or a path to a valid ssl key if required table a character, the path to the opal datasource that holds the data to analyse data(glmlogindata) glmlogin_remoteserver Information required to login to opal servers for the GLM test data A table of with 5 columns: study name, URL, username, password and opal datasource. data(glmlogin_remoteserver)
10 10 survivallogindata Format A data frame where the number of servers corresponds to the number of rows server a character, the formal name of the study url URL of the opal server user a character, a formal username or a path to a valid ssl certificate, if required password a character, a formal password or a path to a valid ssl key if required table a character, the path to the opal datasource that holds the data to analyse data(glmlogin_remoteserver) survivallogindata Information required to login to opal servers for the GLM test data A table of with 5 columns: study name, URL, username, password and opal datasource. data(survivallogindata) Format A data frame where the number of servers corresponds to the number of rows server a character, the formal name of the study url URL of the opal server user a character, a formal username or a path to a valid ssl certificate, if required password a character, a formal password or a path to a valid ssl key if required table a character, the path to the opal datasource that holds the data to analyse data(survivallogindata)
11 Index ds.gee, 1, 5 ds.glm, 3 ds.lexis, 5, 5 geelogin_remoteserver, 8 geelogindata, 8 glmlogin_remoteserver, 9 glmlogindata, 9 survivallogindata, 10 11
Package dsstatsclient
Maintainer Author Version 4.1.0 License GPL-3 Package dsstatsclient Title DataSHIELD client site stattistical functions August 20, 2015 DataSHIELD client site
More informationGeneralized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More informationPackage uptimerobot. October 22, 2015
Type Package Version 1.0.0 Title Access the UptimeRobot Ping API Package uptimerobot October 22, 2015 Provide a set of wrappers to call all the endpoints of UptimeRobot API which includes various kind
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationDirections for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
More informationPackage retrosheet. April 13, 2015
Type Package Package retrosheet April 13, 2015 Title Import Professional Baseball Data from 'Retrosheet' Version 1.0.2 Date 2015-03-17 Maintainer Richard Scriven A collection of tools
More informationPackage MDM. February 19, 2015
Type Package Title Multinomial Diversity Model Version 1.3 Date 2013-06-28 Package MDM February 19, 2015 Author Glenn De'ath ; Code for mdm was adapted from multinom in the nnet package
More informationScalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationMultivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More informationGamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationPackage metafuse. November 7, 2015
Type Package Package metafuse November 7, 2015 Title Fused Lasso Approach in Regression Coefficient Clustering Version 1.0-1 Date 2015-11-06 Author Lu Tang, Peter X.K. Song Maintainer Lu Tang
More informationPackage missforest. February 20, 2015
Type Package Package missforest February 20, 2015 Title Nonparametric Missing Value Imputation using Random Forest Version 1.4 Date 2013-12-31 Author Daniel J. Stekhoven Maintainer
More informationPsychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
More informationChapter 29 The GENMOD Procedure. Chapter Table of Contents
Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370
More informationStephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: SIMPLIS Syntax Files
Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng LISREL for Windows: SIMPLIS Files Table of contents SIMPLIS SYNTAX FILES... 1 The structure of the SIMPLIS syntax file... 1 $CLUSTER command... 4
More information7 Generalized Estimating Equations
Chapter 7 The procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data. Example. Public health of cials can
More informationSP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY
SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT The purpose of this paper is to investigate several SAS procedures that are used in
More informationMultiple Choice: 2 points each
MID TERM MSF 503 Modeling 1 Name: Answers go here! NEATNESS COUNTS!!! Multiple Choice: 2 points each 1. In Excel, the VLOOKUP function does what? Searches the first row of a range of cells, and then returns
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
More informationUnit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)
Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to
More informationMore details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing
More informationSurvey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups
Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)
More informationHLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationDEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationEstimation of σ 2, the variance of ɛ
Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated
More informationSAS Syntax and Output for Data Manipulation:
Psyc 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (in preparation) chapter 5. We will be examining
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationAdvanced Statistical Analysis of Mortality. Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc. 160 University Avenue. Westwood, MA 02090
Advanced Statistical Analysis of Mortality Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc 160 University Avenue Westwood, MA 02090 001-(781)-751-6356 fax 001-(781)-329-3379 trhodes@mib.com Abstract
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationJoint models for classification and comparison of mortality in different countries.
Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute
More informationCLC Server Command Line Tools USER MANUAL
CLC Server Command Line Tools USER MANUAL Manual for CLC Server Command Line Tools 2.5 Windows, Mac OS X and Linux September 4, 2015 This software is for research purposes only. QIAGEN Aarhus A/S Silkeborgvej
More informationPackage bigrf. February 19, 2015
Version 0.1-11 Date 2014-05-16 Package bigrf February 19, 2015 Title Big Random Forests: Classification and Regression Forests for Large Data Sets Maintainer Aloysius Lim OS_type
More informationCoefficient of Determination
Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
More informationIBM SPSS Missing Values 22
IBM SPSS Missing Values 22 Note Before using this information and the product it supports, read the information in Notices on page 23. Product Information This edition applies to version 22, release 0,
More informationextreme Datamining mit Oracle R Enterprise
extreme Datamining mit Oracle R Enterprise Oliver Bracht Managing Director eoda Matthias Fuchs Senior Consultant ISE Information Systems Engineering GmbH extreme Datamining with Oracle R Enterprise About
More informationPackage ATE. R topics documented: February 19, 2015. Type Package Title Inference for Average Treatment Effects using Covariate. balancing.
Package ATE February 19, 2015 Type Package Title Inference for Average Treatment Effects using Covariate Balancing Version 0.2.0 Date 2015-02-16 Author Asad Haris and Gary Chan
More informationANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING
ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING BY OMID ROUHANI-KALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of
More informationColor Screen Phones: SIP-T48G and SIP-T46G with firmware version 73
This document provides detailed information on how to use ACD (automatic call distribution) feature on Yealink IP phones integrated with Star2Star platform. ACD enables organizations to manage a large
More informationCHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS
Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships
More informationIBM SPSS Neural Networks 22
IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,
More informationModel Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc.
Paper 264-26 Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc. Abstract: There are several procedures in the SAS System for statistical modeling. Most statisticians who use the SAS
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationintertrax Suite intertrax exchange intertrax monitor intertrax connect intertrax PIV manager User Guide Version 3 2011
intertrax Suite intertrax exchange intertrax monitor intertrax connect intertrax PIV manager User Guide Version 3 2011 Copyright 2003-2011 by Salamander Technologies, Inc. Protected by US Patents 5,573,278;
More informationReview Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
More informationPackage sjdbc. R topics documented: February 20, 2015
Package sjdbc February 20, 2015 Version 1.5.0-71 Title JDBC Driver Interface Author TIBCO Software Inc. Maintainer Stephen Kaluzny Provides a database-independent JDBC interface. License
More informationOracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper October 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Generate a PL/SQL script for workflow deployment Denny Wong Oracle Data Mining Technologies 10 Van de Graff Drive Burlington,
More informationSPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout
Analyzing Data SPSS Resources 1. See website (readings) for SPSS tutorial & Stats handout Don t have your own copy of SPSS? 1. Use the libraries to analyze your data 2. Download a trial version of SPSS
More informationBOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gear-analytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationTests for Two Survival Curves Using Cox s Proportional Hazards Model
Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.
More informationCross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.
Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month
More informationSupplementary PROCESS Documentation
Supplementary PROCESS Documentation This document is an addendum to Appendix A of Introduction to Mediation, Moderation, and Conditional Process Analysis that describes options and output added to PROCESS
More informationIBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationWeek TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480
1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500
More informationIntroducing the Multilevel Model for Change
Department of Psychology and Human Development Vanderbilt University GCM, 2010 1 Multilevel Modeling - A Brief Introduction 2 3 4 5 Introduction In this lecture, we introduce the multilevel model for change.
More informationThis chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationTitle: Lending Club Interest Rates are closely linked with FICO scores and Loan Length
Title: Lending Club Interest Rates are closely linked with FICO scores and Loan Length Introduction: The Lending Club is a unique website that allows people to directly borrow money from other people [1].
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationIBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
More informationAras Corporation. 2005 Aras Corporation. All rights reserved. Notice of Rights. Notice of Liability
Aras Corporation 2005 Aras Corporation. All rights reserved Notice of Rights All rights reserved. Aras Corporation (Aras) owns this document. No part of this document may be reproduced or transmitted in
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationSPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
More informationPackage neuralnet. February 20, 2015
Type Package Title Training of neural networks Version 1.32 Date 2012-09-19 Package neuralnet February 20, 2015 Author Stefan Fritsch, Frauke Guenther , following earlier work
More informationAuxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationhp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines
The STAT menu Trend Lines Practice predicting the future using trend lines The STAT menu The Statistics menu is accessed from the ORANGE shifted function of the 5 key by pressing Ù. When pressed, a CHOOSE
More informationSnapLogic Salesforce Snap Reference
SnapLogic Salesforce Snap Reference Document Release: October 2012 SnapLogic, Inc. 71 East Third Avenue San Mateo, California 94401 U.S.A. www.snaplogic.com Copyright Information 2012 SnapLogic, Inc. All
More informationModel Selection and Claim Frequency for Workers Compensation Insurance
Model Selection and Claim Frequency for Workers Compensation Insurance Jisheng Cui, David Pitt and Guoqi Qian Abstract We consider a set of workers compensation insurance claim data where the aggregate
More informationQuick Start. Creating a Scoring Application. RStat. Based on a Decision Tree Model
Creating a Scoring Application Based on a Decision Tree Model This Quick Start guides you through creating a credit-scoring application in eight easy steps. Quick Start Century Corp., an electronics retailer,
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationIllustration (and the use of HLM)
Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will
More information7.1 The Hazard and Survival Functions
Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence
More informationCDD user guide. PsN 4.4.8. Revised 2015-02-23
CDD user guide PsN 4.4.8 Revised 2015-02-23 1 Introduction The Case Deletions Diagnostics (CDD) algorithm is a tool primarily used to identify influential components of the dataset, usually individuals.
More informationMAT 242 Test 3 SOLUTIONS, FORM A
MAT Test SOLUTIONS, FORM A. Let v =, v =, and v =. Note that B = { v, v, v } is an orthogonal set. Also, let W be the subspace spanned by { v, v, v }. A = 8 a. [5 points] Find the orthogonal projection
More informationIntroduction to Structural Equation Modeling (SEM) Day 4: November 29, 2012
Introduction to Structural Equation Modeling (SEM) Day 4: November 29, 202 ROB CRIBBIE QUANTITATIVE METHODS PROGRAM DEPARTMENT OF PSYCHOLOGY COORDINATOR - STATISTICAL CONSULTING SERVICE COURSE MATERIALS
More informationPEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Elizabeth Comino Centre fo Primary Health Care and Equity 12-Aug-2015
PEER REVIEW HISTORY BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf)
More informationPolynomial Neural Network Discovery Client User Guide
Polynomial Neural Network Discovery Client User Guide Version 1.3 Table of contents Table of contents...2 1. Introduction...3 1.1 Overview...3 1.2 PNN algorithm principles...3 1.3 Additional criteria...3
More informationStatistical Functions in Excel
Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.
More informationPackage TSfame. February 15, 2013
Package TSfame February 15, 2013 Version 2012.8-1 Title TSdbi extensions for fame Description TSfame provides a fame interface for TSdbi. Comprehensive examples of all the TS* packages is provided in the
More informationJournal of Statistical Software
JSS Journal of Statistical Software January 2011, Volume 38, Issue 5. http://www.jstatsoft.org/ Lexis: An R Class for Epidemiological Studies with Long-Term Follow-Up Martyn Plummer International Agency
More informationMULTIPLE REGRESSION EXAMPLE
MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if
More informationCHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
More informationUsing R for Windows and Macintosh
2010 Using R for Windows and Macintosh R is the most commonly used statistical package among researchers in Statistics. It is freely distributed open source software. For detailed information about downloading
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationWESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math
Textbook Correlation WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Following Directions Unit FIRST QUARTER AND SECOND QUARTER Logic Unit
More informationFrom the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationPackage EstCRM. July 13, 2015
Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu
More information