Package dsstatsclient

August 20, 2015

Title        DataSHIELD client site statistical functions
Description  DataSHIELD client site statistical functions
Version
Author
Maintainer
License      GPL-3
Depends      opal, dsbaseclient

R topics documented:

    ds.cor
    ds.cortest
    ds.cov
    ds.ttest
    ds.var
    logindata
    login_remoteserver
    Index


ds.cor    Computes correlation between two or more vectors

Description

    This is similar to the R base function cor.

Usage

    ds.cor(x = NULL, y = NULL, naaction = "pairwise.complete.obs",
      datasources = NULL)

Arguments

    x            a character, the name of a numerical vector, matrix or data frame.
    y            NULL (default) or the name of a vector, matrix or data frame with compatible dimensions to x.
    naaction     a character string giving a method for computing covariances in the presence of missing values. This must be one of the strings "everything", "all.obs", "complete.obs", "na.or.complete" or "pairwise.complete.obs". The default is "pairwise.complete.obs".
    datasources  a list of opal object(s) obtained after logging in to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.

Details

    In addition to computing correlations, this function, unlike the R base function cor, produces a table of the number of complete cases. The user can then judge the relevance of each correlation from the number of complete cases that went into its calculation.

Value

    a list containing the results of the test

Author(s)

    Gaye, A.

Examples

    {
    # load the file that contains the login details
    data(logindata)

    # login and assign specific variable(s)
    # (by default the assigned dataset is a data frame named "D")
    myvar <- list("LAB_HDL", "LAB_TSC", "GENDER")
    opals <- datashield.login(logins=logindata, assign=TRUE, variables=myvar)

    # Example 1: generate the correlation matrix for the assigned dataset "D"
    # which contains 3 vectors (2 continuous and 1 categorical)
    ds.cor(x="D")

    # Example 2: calculate the correlation between two vectors
    # (first assign some vectors from the data frame "D")
    ds.assign(newobj="labhdl", toassign="D$LAB_HDL")
    ds.assign(newobj="labtsc", toassign="D$LAB_TSC")
    ds.assign(newobj="gender", toassign="D$GENDER")
    ds.cor(x="labhdl", y="labtsc")
    ds.cor(x="labhdl", y="gender")

    # clear the Datashield R sessions and logout

    datashield.logout(opals)
    }


ds.cortest    Tests for correlation between paired samples

Description

    This is similar to the R base function cor.test.

Usage

    ds.cortest(x = NULL, y = NULL, datasources = NULL)

Arguments

    datasources  a list of opal object(s) obtained after logging in to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.
    x            a character, the name of a numerical vector.
    y            a character, the name of a numerical vector.

Details

    Runs a two-sided Pearson test with a 0.95 confidence level.

Value

    a list containing the results of the test

Author(s)

    Gaye, A.; Burton, P.

Examples

    {
    # load the file that contains the login details
    data(logindata)

    # login and assign specific variable(s)
    # (by default the assigned dataset is a data frame named "D")
    myvar <- list("LAB_TSC", "LAB_HDL")
    opals <- datashield.login(logins=logindata, assign=TRUE, variables=myvar)

    # test for correlation between the variables LAB_TSC and LAB_HDL
    ds.cortest(x="D$LAB_TSC", y="D$LAB_HDL")

    # clear the Datashield R sessions and logout
    datashield.logout(opals)
    }


ds.cov    Computes covariance between two or more vectors

Description

    This is similar to the R base function cov.

Usage

    ds.cov(x = NULL, y = NULL, naaction = "pairwise.complete.obs",
      datasources = NULL)

Arguments

    datasources  a list of opal object(s) obtained after logging in to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.
    x            a character, the name of a numerical vector, matrix or data frame.
    y            NULL (default) or the name of a vector, matrix or data frame with compatible dimensions to x.
    naaction     a character string giving a method for computing covariances in the presence of missing values. This must be one of the strings "everything", "all.obs", "complete.obs", "na.or.complete" or "pairwise.complete.obs". The default is "pairwise.complete.obs".

Details

    In addition to computing covariances, this function, unlike the R base function cov, produces a table of the number of complete cases. The user can then judge the relevance of each covariance from the number of complete cases that went into its calculation.

Value

    a list containing the results of the test

Author(s)

    Gaye, A.
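The complete-case table described under Details can be illustrated locally with base R. The sketch below is not the ds.cov implementation and does not contact any opal server; it runs cov on a small made-up data frame (the names df, cov_mat, observed and n_complete are hypothetical) and counts, for every pair of columns, how many rows have both values present.

```r
# Toy data with missing values (made up for illustration only)
df <- data.frame(
  a = c(1.0, 2.0, NA, 4.0, 5.0),
  b = c(2.1, NA, 3.3, 4.2, 5.9),
  c = c(0.5, 1.5, 2.5, NA, 4.5)
)

# Pairwise covariances, the base R counterpart of
# naaction = "pairwise.complete.obs"
cov_mat <- cov(df, use = "pairwise.complete.obs")

# 0/1 indicator of observed values; its cross-product counts, for each
# pair of columns, the rows in which both values are present
observed <- !is.na(as.matrix(df)) * 1
n_complete <- crossprod(observed)

print(cov_mat)
print(n_complete)
```

A covariance that rests on only a handful of complete pairs deserves less weight than one computed from nearly all rows, which is the judgement the table is meant to support.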

Examples

    {
    # load the file that contains the login details
    data(logindata)

    # login and assign specific variable(s)
    # (by default the assigned dataset is a data frame named "D")
    myvar <- list("LAB_HDL", "LAB_TSC", "GENDER")
    opals <- datashield.login(logins=logindata, assign=TRUE, variables=myvar)

    # Example 1: generate the covariance matrix for the assigned dataset "D"
    # which contains 3 vectors (2 continuous and 1 categorical)
    ds.cov(x="D")

    # Example 2: calculate the covariance between two vectors
    # (first assign the vectors from "D")
    ds.assign(newobj="labhdl", toassign="D$LAB_HDL")
    ds.assign(newobj="labtsc", toassign="D$LAB_TSC")
    ds.assign(newobj="gender", toassign="D$GENDER")
    ds.cov(x="labhdl", y="labtsc")
    ds.cov(x="labhdl", y="gender")

    # clear the Datashield R sessions and logout
    datashield.logout(opals)
    }


ds.ttest    Runs a Student's t-test

Description

    Performs one and two sample t-tests on vectors of data.

Usage

    ds.ttest(x = NULL, y = NULL, type = "combine", alternative = "two.sided",
      mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95,
      datasources = NULL)

Arguments

    x            a character, the name of a (non-empty) numeric vector of data values, or a formula of the form a~b where a is the name of a continuous variable and b that of a factor variable.
    y            a character, the name of an optional (non-empty) numeric vector of data values.

    type         a character which tells whether the test is run on the pooled data or not. By default type is set to "combine" and a t-test of the pooled data is carried out. If type is set to "split", a t-test is run for each study separately.
    alternative  a character specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
    mu           a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
    paired       a logical indicating whether you want a paired t-test.
    var.equal    a logical indicating whether to treat the two variances as being equal. If TRUE, the pooled variance is used to estimate the variance; otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
    conf.level   confidence level of the interval.
    datasources  a list of opal object(s) obtained after logging in to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.

Details

    Summary statistics are obtained from each of the data sets located on the distinct computers/servers. Grand means and variances are then calculated from them and used to perform the t-test. The function allows a t-test between two continuous variables or between a continuous and a factor variable; the latter option requires a formula (see parameter x). If a formula is provided, all other arguments but conf.level = 0.95 are ignored.

Value

    a list containing the following elements:

    statistic    the value of the t-statistic.
    parameter    the degrees of freedom for the t-statistic.
    p.value      the p-value for the test.
    conf.int     a confidence interval for the mean appropriate to the specified alternative hypothesis.
    estimate     the estimated mean or difference in means, depending on whether it was a one-sample test or a two-sample test.
    null.value   the specified hypothesized value of the mean or mean difference, depending on whether it was a one-sample test or a two-sample test.
    alternative  a character string describing the alternative hypothesis.
    method       a character string indicating what type of t-test was performed.

    The result is an object of type htest if both x and y are continuous, and a list otherwise.

Author(s)

    Isaeva, J.; Gaye, A.

Examples

    {
    # load the file that contains the login details
    data(logindata)

    # login and assign all the variables
    opals <- datashield.login(logins=logindata, assign=TRUE)

    # Example 1: Run a t.test of the pooled data for the variables LAB_HDL and LAB_TSC - default
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC")

    # Example 2: Run a test to compare the mean of a continuous variable
    # across the two categories of a categorical variable
    s <- ds.ttest(x="D$PM_BMI_CONTINUOUS~D$GENDER")

    # Example 3: Run a t.test for each study separately for the same variables as above
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC", type="split")

    # Example 4: Run a paired t.test of the pooled data
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC", paired=TRUE)

    # Example 5: Run a paired t.test for each study separately
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC", paired=TRUE, type="split")

    # Example 6: Run a t.test of the pooled data with different alternatives
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC", alternative="greater")
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC", alternative="less")

    # Example 7: Run a t.test of the pooled data with mu different from zero
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC", mu=-4)

    # Example 8: Run a t.test of the pooled data assuming that the variances of the variables are equal
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC", var.equal=TRUE)

    # Example 9: Run a t.test of the pooled data with a 90% confidence interval
    ds.ttest(x="D$LAB_HDL", y="D$LAB_TSC", conf.level=0.90)

    # Example 10: Run a one-sample t.test of the pooled data
    ds.ttest(x="D$LAB_HDL")

    # the example below should not work: a paired t.test is not possible if the y variable is missing
    # ds.ttest(x="D$LAB_HDL", paired=TRUE)

    # clear the Datashield R sessions and logout
    datashield.logout(opals)
    }


ds.var    Computes the variance of a given vector

Description

    This function is similar to the R function var.

Usage

    ds.var(x = NULL, type = "combine", datasources = NULL)
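The default type = "combine" asks for a result over the pooled data, while the individual-level data stay on their servers. One way such a variance could be obtained is from per-study summaries alone. The base R sketch below illustrates that idea under stated assumptions: it is not the package's server-side code, and the vectors study1 and study2 and the helper summarise_study are hypothetical.

```r
# Made-up vectors standing in for data held on two separate servers
study1 <- c(4.2, 5.1, 6.3, 5.8)
study2 <- c(5.5, 6.1, 4.9, 7.2, 6.6)

# Hypothetical helper: what each "server" would return, namely the
# count, the sum and the sum of squares of the non-missing values
summarise_study <- function(v) {
  v <- v[!is.na(v)]
  c(n = length(v), sum = sum(v), sumsq = sum(v^2))
}
summaries <- rbind(summarise_study(study1), summarise_study(study2))

# Client side: add up the summaries and derive the grand mean and the
# sample variance of the pooled data
totals <- colSums(summaries)
n <- totals[["n"]]
grand_mean <- totals[["sum"]] / n
global_var <- (totals[["sumsq"]] - n * grand_mean^2) / (n - 1)

print(grand_mean)
print(global_var)
```

The result agrees with var(c(study1, study2)) computed on the concatenated data, although only three numbers per study cross the network.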

Arguments

    x            a character, the name of a numerical vector.
    type         a character which represents the type of analysis to carry out. If type is set to "combine", a global variance is calculated; if type is set to "split", the variance is calculated separately for each study.
    datasources  a list of opal object(s) obtained after logging in to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.

Details

    It is a wrapper for the server side function.

Value

    a global variance or one variance for each study.

Author(s)

    Gaye, A.

Examples

    {
    # load the file that contains the login details
    data(logindata)

    # login and assign specific variable(s)
    myvar <- list("LAB_TSC")
    opals <- datashield.login(logins=logindata, assign=TRUE, variables=myvar)

    # Example 1: compute the pooled variance of the variable LAB_TSC - default behaviour
    ds.var(x="D$LAB_TSC")

    # Example 2: compute the variance of each study separately
    ds.var(x="D$LAB_TSC", type="split")

    # clear the Datashield R sessions and logout
    datashield.logout(opals)
    }


logindata    Information required to login to opal servers

Description

    A table with 5 columns: study name, URL, username, password and opal datasource.

Format

    A data frame with one row per server:

    server    a character, the formal name of the study
    url       URL of the opal server
    user      a character, a formal username or a path to a valid ssl certificate, if required
    password  a character, a formal password or a path to a valid ssl key, if required
    table     a character, the path to the opal datasource that holds the data to analyse


login_remoteserver    Information required to login to opal servers

Description

    A table with 5 columns: study name, URL, username, password and opal datasource.

Usage

    data(login_remoteserver)

Format

    A data frame with one row per server:

    server    a character, the formal name of the study
    url       URL of the opal server
    user      a character, a formal username or a path to a valid ssl certificate, if required
    password  a character, a formal password or a path to a valid ssl key, if required
    table     a character, the path to the opal datasource that holds the data to analyse

Examples

    data(login_remoteserver)
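A login table with the five documented columns can also be built by hand. The sketch below shows the shape such a data frame might take; every value is a placeholder (the study names, URLs, credentials and table paths are invented), and the object name my_logindata is hypothetical.

```r
# Hypothetical login table: one row per server, five columns as documented.
# All values are placeholders and do not point at real opal servers.
my_logindata <- data.frame(
  server   = c("study1", "study2"),
  url      = c("https://opal.example.org", "https://opal.example.net"),
  user     = c("someuser", "someuser"),
  password = c("somepassword", "somepassword"),
  table    = c("someproject.SOMETABLE1", "someproject.SOMETABLE2"),
  stringsAsFactors = FALSE
)

# Quick structural checks before attempting a login
stopifnot(nrow(my_logindata) == 2)
stopifnot(identical(names(my_logindata),
                    c("server", "url", "user", "password", "table")))

# With the opal package loaded and real server details filled in, the
# table would be passed on as in the examples of this manual:
# opals <- datashield.login(logins = my_logindata, assign = TRUE)
```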

Index

    ds.cor
    ds.cortest
    ds.cov
    ds.ttest
    ds.var
    login_remoteserver
    logindata