Package dsstatsclient




August 20, 2015

Title DataSHIELD client site statistical functions
Description DataSHIELD client site statistical functions
Version 4.1.0
Author <datashield@obiba.org>
Maintainer <datashield@obiba.org>
License GPL-3
Depends opal, dsbaseclient

R topics documented:

    ds.cor
    ds.cortest
    ds.cov
    ds.ttest
    ds.var
    logindata
    login_remoteserver

ds.cor        Computes correlation between two or more vectors

Description

This is similar to the R base function cor.

Usage

    ds.cor(x = NULL, y = NULL, naaction = "pairwise.complete.obs",
      datasources = NULL)

Arguments

x             a character, the name of a numerical vector, matrix or data frame.
y             NULL (default) or the name of a vector, matrix or data frame with
              compatible dimensions to x.
naaction      a character string giving a method for computing covariances in
              the presence of missing values. This must be one of the strings:
              "everything", "all.obs", "complete.obs", "na.or.complete", or
              "pairwise.complete.obs". The default value is
              "pairwise.complete.obs".
datasources   a list of opal object(s) obtained after logging in to opal
              servers; these objects also hold the data assigned to R, as a
              data frame, from opal datasources.

Details

In addition to computing correlations, this function, unlike the R base function cor, produces a table of the number of complete cases, so that the user can judge the relevance of each correlation from the number of complete cases included in its calculation.

Value

a list containing the results of the test

Author(s)

Gaye, A.

Examples

    {
    # load the login details
    data(logindata)

    # login and assign specific variable(s)
    # (by default the assigned dataset is a data frame named 'D')
    myvar <- list('LAB_HDL', 'LAB_TSC', 'GENDER')
    opals <- datashield.login(logins=logindata, assign=TRUE, variables=myvar)

    # Example 1: generate the correlation matrix for the assigned dataset 'D'
    # which contains 3 vectors (2 continuous and 1 categorical)
    ds.cor(x='D')

    # Example 2: calculate the correlation between two vectors
    # (first assign some vectors from the data frame 'D')
    ds.assign(newobj='labhdl', toassign='D$LAB_HDL')
    ds.assign(newobj='labtsc', toassign='D$LAB_TSC')
    ds.assign(newobj='gender', toassign='D$GENDER')
    ds.cor(x='labhdl', y='labtsc')
    ds.cor(x='labhdl', y='gender')

    # clear the Datashield R sessions and logout

    datashield.logout(opals)
    }

ds.cortest    Tests for correlation between paired samples

Description

This is similar to the R base function cor.test.

Usage

    ds.cortest(x = NULL, y = NULL, datasources = NULL)

Arguments

datasources   a list of opal object(s) obtained after logging in to opal
              servers; these objects also hold the data assigned to R, as a
              data frame, from opal datasources.
x             a character, the name of a numerical vector.
y             a character, the name of a numerical vector.

Details

Runs a two-sided Pearson test with a 0.95 confidence level.

Value

a list containing the results of the test

Author(s)

Gaye, A.; Burton, P.

Examples

    {
    # load the login details
    data(logindata)

    # login and assign specific variable(s)
    # (by default the assigned dataset is a data frame named 'D')
    myvar <- list('LAB_TSC', 'LAB_HDL')
    opals <- datashield.login(logins=logindata, assign=TRUE, variables=myvar)

    # test for correlation between the variables LAB_TSC and LAB_HDL
    ds.cortest(x='D$LAB_TSC', y='D$LAB_HDL')

    # clear the Datashield R sessions and logout
    datashield.logout(opals)
    }

ds.cov        Computes covariance between two or more vectors

Description

This is similar to the R base function cov.

Usage

    ds.cov(x = NULL, y = NULL, naaction = "pairwise.complete.obs",
      datasources = NULL)

Arguments

datasources   a list of opal object(s) obtained after logging in to opal
              servers; these objects also hold the data assigned to R, as a
              data frame, from opal datasources.
x             a character, the name of a numerical vector, matrix or data frame.
y             NULL (default) or the name of a vector, matrix or data frame with
              compatible dimensions to x.
naaction      a character string giving a method for computing covariances in
              the presence of missing values. This must be one of the strings:
              "everything", "all.obs", "complete.obs", "na.or.complete", or
              "pairwise.complete.obs". The default value is
              "pairwise.complete.obs".

Details

In addition to computing covariances, this function, unlike the R base function cov, produces a table of the number of complete cases, so that the user can judge the relevance of each covariance from the number of complete cases included in its calculation.

Value

a list containing the results of the test

Author(s)

Gaye, A.
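The complete-case table mentioned in Details can be pictured with plain base R. The following is only a local sketch on made-up data; the data frame `d` and its column names are hypothetical, and this is not necessarily how ds.cov builds its table on the servers.

```r
# Toy data with missing values (hypothetical, not DataSHIELD data)
d <- data.frame(
  hdl = c(1.2, NA, 1.5, 1.1, 1.8),
  tsc = c(5.0, 4.8, NA, 5.5, 6.1),
  bmi = c(22, 25, 27, NA, 30)
)

# Pairwise covariance: each entry uses only the rows complete for that pair
cov_mat <- cov(d, use = "pairwise.complete.obs")

# Number of complete pairs behind each entry:
# 1 marks an observed value, so t(ok) %*% ok counts rows where both
# columns are observed
ok <- (!is.na(as.matrix(d))) + 0
n_complete <- crossprod(ok)

print(cov_mat)
print(n_complete)
```

Reading the two matrices side by side shows how many observations support each covariance, which is the judgement the Details section asks the user to make.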

Examples

    {
    # load the login details
    data(logindata)

    # login and assign specific variable(s)
    # (by default the assigned dataset is a data frame named 'D')
    myvar <- list('LAB_HDL', 'LAB_TSC', 'GENDER')
    opals <- datashield.login(logins=logindata, assign=TRUE, variables=myvar)

    # Example 1: generate the covariance matrix for the assigned dataset 'D'
    # which contains 3 vectors (2 continuous and 1 categorical)
    ds.cov(x='D')

    # Example 2: calculate the covariance between two vectors
    # (first assign the vectors from 'D')
    ds.assign(newobj='labhdl', toassign='D$LAB_HDL')
    ds.assign(newobj='labtsc', toassign='D$LAB_TSC')
    ds.assign(newobj='gender', toassign='D$GENDER')
    ds.cov(x='labhdl', y='labtsc')
    ds.cov(x='labhdl', y='gender')

    # clear the Datashield R sessions and logout
    datashield.logout(opals)
    }

ds.ttest      Runs a Student's t-test

Description

Performs one and two sample t-tests on vectors of data.

Usage

    ds.ttest(x = NULL, y = NULL, type = "combine", alternative = "two.sided",
      mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95,
      datasources = NULL)

Arguments

x             a character, the name of a (non-empty) numeric vector of data
              values or a formula of the form a~b where a is the name of a
              continuous variable and b that of a factor variable.
y             a character, the name of an optional (non-empty) numeric vector
              of data values.

type          a character which tells whether the test is run for the pooled
              data or not. By default type is set to 'combine' and a t.test of
              the pooled data is carried out. If type is set to 'split', a
              t.test is run for each study separately.
alternative   a character specifying the alternative hypothesis, must be one
              of "two.sided" (default), "greater" or "less". You can specify
              just the initial letter.
mu            a number indicating the true value of the mean (or difference in
              means if you are performing a two sample test).
paired        a logical indicating whether you want a paired t-test.
var.equal     a logical variable indicating whether to treat the two variances
              as being equal. If TRUE then the pooled variance is used to
              estimate the variance, otherwise the Welch (or Satterthwaite)
              approximation to the degrees of freedom is used.
conf.level    confidence level of the interval.
datasources   a list of opal object(s) obtained after logging in to opal
              servers; these objects also hold the data assigned to R, as a
              data frame, from opal datasources.

Details

Summary statistics are obtained from each of the data sets located on the distinct computers/servers. Grand means and variances are then calculated from them and used to perform the t-test. The function allows a t-test between two continuous variables or between a continuous and a factor variable; the latter option requires a formula (see parameter x). If a formula is provided, all other arguments but conf.level=0.95 are ignored.

Value

a list containing the following elements:

statistic     the value of the t-statistic.
parameter     the degrees of freedom for the t-statistic.
p.value       the p-value for the test.
conf.int      a confidence interval for the mean appropriate to the specified
              alternative hypothesis.
estimate      the estimated mean or difference in means depending on whether
              it was a one-sample test or a two-sample test.
null.value    the specified hypothesized value of the mean or mean difference
              depending on whether it was a one-sample test or a two-sample
              test.
alternative   a character string describing the alternative hypothesis.
method        a character string indicating what type of t-test was performed.

The result is an object of type htest if both x and y are continuous, and a list otherwise.

Author(s)

Isaeva, J.; Gaye, A.

Examples

    {
    # load the login details
    data(logindata)

    # login and assign all the variables
    opals <- datashield.login(logins=logindata, assign=TRUE)

    # Example 1: Run a t.test of the pooled data for the variables
    # LAB_HDL and LAB_TSC - default
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC')

    # Example 2: Run a test to compare the mean of a continuous variable
    # across the two categories of a categorical variable
    s <- ds.ttest(x='D$PM_BMI_CONTINUOUS~D$GENDER')

    # Example 3: Run a t.test for each study separately for the same
    # variables as above
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC', type='split')

    # Example 4: Run a paired t.test of the pooled data
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC', paired=TRUE)

    # Example 5: Run a paired t.test for each study separately
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC', paired=TRUE, type='split')

    # Example 6: Run a t.test of the pooled data with different alternatives
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC', alternative='greater')
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC', alternative='less')

    # Example 7: Run a t.test of the pooled data with mu different from zero
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC', mu=-4)

    # Example 8: Run a t.test of the pooled data assuming that the variances
    # of the variables are equal
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC', var.equal=TRUE)

    # Example 9: Run a t.test of the pooled data with 90% confidence interval
    ds.ttest(x='D$LAB_HDL', y='D$LAB_TSC', conf.level=0.90)

    # Example 10: Run a one-sample t.test of the pooled data
    ds.ttest(x='D$LAB_HDL')

    # the example below should not work: a paired t.test is not possible
    # if the y variable is missing
    # ds.ttest(x='D$LAB_HDL', paired=TRUE)

    # clear the Datashield R sessions and logout
    datashield.logout(opals)
    }

ds.var        Computes the variance of a given vector

Description

This function is similar to the R function var.

Usage

    ds.var(x = NULL, type = "combine", datasources = NULL)
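The two values of type in the usage line correspond to a single variance over all studies versus one variance per study. A minimal base R sketch of that distinction follows, assuming each study shares only its count, sum and sum of squares; the vectors `study1` and `study2` are invented, and the package's server-side computation may differ.

```r
# Hypothetical data held by two separate studies
study1 <- c(4.8, 5.1, 5.6, 6.0)
study2 <- c(5.2, 5.9, 6.3, 6.8, 7.1)
studies <- list(study1 = study1, study2 = study2)

# Each study returns only summaries: count, sum, and sum of squares
summaries <- lapply(studies, function(v) {
  c(n = length(v), sum = sum(v), sumsq = sum(v^2))
})

# Sample variance from the three summaries
var_from_summary <- function(s) {
  (s[["sumsq"]] - s[["sum"]]^2 / s[["n"]]) / (s[["n"]] - 1)
}

# "split": one variance per study, from that study's summaries alone
var_split <- sapply(summaries, var_from_summary)

# "combine": add the summaries across studies, then compute one variance
total <- Reduce(`+`, summaries)
var_combined <- var_from_summary(total)

print(var_split)
print(var_combined)
```

The combined value equals the variance of the concatenated data, yet no individual observation has to leave its study. The ds.ttest Details describe the same idea of computing grand means and variances from per-study summary statistics.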

Arguments

x             a character, the name of a numerical vector.
type          a character which represents the type of analysis to carry out.
              If type is set to 'combine', a global variance is calculated; if
              type is set to 'split', the variance is calculated separately
              for each study.
datasources   a list of opal object(s) obtained after logging in to opal
              servers; these objects also hold the data assigned to R, as a
              data frame, from opal datasources.

Details

It is a wrapper for the server side function.

Value

a global variance or one variance for each study.

Author(s)

Gaye, A.

Examples

    {
    # load the login details
    data(logindata)

    # login and assign specific variable(s)
    myvar <- list('LAB_TSC')
    opals <- datashield.login(logins=logindata, assign=TRUE, variables=myvar)

    # Example 1: compute the pooled variance of the variable LAB_TSC
    # - default behaviour
    ds.var(x='D$LAB_TSC')

    # Example 2: compute the variance of each study separately
    ds.var(x='D$LAB_TSC', type='split')

    # clear the Datashield R sessions and logout
    datashield.logout(opals)
    }

logindata     Information required to login to opal servers

Description

A table with 5 columns: study name, URL, username, password and opal datasource.

Format

A data frame with one row per server:

server        a character, the formal name of the study
url           URL of the opal server
user          a character, a formal username or a path to a valid ssl
              certificate, if required
password      a character, a formal password or a path to a valid ssl key, if
              required
table         a character, the path to the opal datasource that holds the data
              to analyse

login_remoteserver    Information required to login to opal servers

Description

A table with 5 columns: study name, URL, username, password and opal datasource.

Usage

    data(login_remoteserver)

Format

A data frame with one row per server:

server        a character, the formal name of the study
url           URL of the opal server
user          a character, a formal username or a path to a valid ssl
              certificate, if required
password      a character, a formal password or a path to a valid ssl key, if
              required
table         a character, the path to the opal datasource that holds the data
              to analyse

Examples

    data(login_remoteserver)
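A table in the format above can also be put together by hand. The sketch below builds one with the five documented columns; every value in it (server names, URLs, credentials, table paths) is a placeholder invented for illustration and does not point to a real opal server.

```r
# Hypothetical login table; every value below is a placeholder
my_logindata <- data.frame(
  server   = c("study1", "study2"),
  url      = c("https://opal.example.org", "https://opal.example.net"),
  user     = c("analyst", "analyst"),
  password = c("change-me", "change-me"),
  table    = c("project1.table1", "project2.table1"),
  stringsAsFactors = FALSE
)

# One row per server, five columns as described in Format
str(my_logindata)
```

Such a data frame would presumably take the place of logindata in the logins argument shown in the examples, although that use has not been verified here.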

Index

    ds.cor
    ds.cortest
    ds.cov
    ds.ttest
    ds.var
    login_remoteserver
    logindata