Overview of Statistical and Path Modeling Analyses
Prof. Dr. Christian M. Ringle (Hamburg Univ. of Tech., TUHH)
Prof. Dr. Jörg Henseler (University of Twente)
Dr. Geoffrey Hubona (The Georgia R School)
Agenda
Executive Summary
Course Concept and Learning Objectives
Schedule
Detailed Program Outline
Extra Information on Granting a TUHH Certificate for 3 Graduate Credit Hours
Registration
Executive Summary (1/2)
In conjunction with Prof. Dr. Christian M. Ringle (Hamburg University of Technology, TUHH, http://www.tuhh.de/hrmo/team/prof-dr-c-m-ringle.html) and Prof. Dr. Jörg Henseler (University of Twente, http://www.henseler.com/), the Georgia R School and Dr. Geoffrey Hubona will conduct a live, online, accredited, accelerated 8-week 2015 Online Summer School through the Hamburg University of Technology (TUHH: http://www.tuhh.de).
The program gives a broad overview of linear and non-linear modeling, multivariate analyses, PLS path modeling (PLS-PM) and covariance-based structural equation modeling (CB-SEM), using R software as well as SmartPLS and PLS-GUI software. No experience with R or with these other applications is required.
The accredited program is intended for graduate students, especially PhD students, as well as for research faculty and other professionals in the analytics industry, and especially for people in these fields who currently use, or who wish to learn how to use, R software, including RStudio for application development and Rcmdr for comprehensive statistical analyses.
Executive Summary (2/2)
The eight-week summer program will be taught live in 15 sessions in July and August, with the take-home final project and exam due in late September. Participants should expect to devote 10-15 hours per week to attending class sessions and completing the recommended assignments.
The accredited program will be taught live and online from the United States, from 11:30 AM to 2:00 PM EDT (GMT-4) and from 7:30 PM to 10:00 PM EDT (GMT-4) on alternating days, and should be accessible to anyone in the world with an Internet connection. Recordings of the live sessions are available to all participants 24/7.
Successful participants will receive an official certificate from TUHH equivalent to a standard 3-credit graduate-level course. These 3 graduate credits should transfer into graduate-level PhD programs in the EU, and possibly into university programs in other countries as well.
Course Concept and Learning Objectives (1/2)
This TUHH 2015 online summer course will benefit TUHH students, faculty and other participants (e.g., practitioners) by:
(1) Disseminating advanced research and statistical analysis techniques to TUHH students;
(2) Making available state-of-the-art, no-cost path modeling and SEM software: SmartPLS, lavaan, and PLS-GUI;
(3) Teaching the use of R software, including RStudio; R is quickly becoming the de facto dominant statistical analysis software;
(4) Training in the basic concepts and practices of data analysis using the Partial Least Squares path modeling technique with SmartPLS and with R software applications (PLS-GUI), including training on the use of the R lavaan package for structural equation modeling (SEM);
(5) Training in the foundations of path modeling and SEM, which are rooted in linear modeling and in multivariate analysis techniques; and
(6) Providing additional, on-demand, 24/7 online background courses, accessible by each TUHH summer program participant.
Course Concept and Learning Objectives (2/2)
The teaching and learning objectives for this 2015 TUHH online summer course include:
(1) Providing a solid foundation in the basic techniques of statistical inference, regression, linear and non-linear modeling, and multivariate data analysis using R software;
(2) Teaching and training in the proper conduct of PLS path modeling analysis suitable for journal publication, with an introduction to structural equation modeling (SEM); and
(3) Providing a comprehensive overview of, and training on, the analytical capabilities of R software, which is quickly becoming the dominant research, statistical analysis, and data analysis software in use by both academics and practitioners.
Course Topics
Introduction to R and to statistical analysis using R, and to PLS path modeling.
Session 1: Overview of R; The R software environment; Introduction to RStudio; Introduction to SmartPLS.
Session 2: PLS Path Modeling. What is PLS path modeling? An introduction and overview of PLS path modeling.
Session 3: Data types, data structures, objects and functions in R. Exercise: Import data into R; create data in R; run an R function.
Course Topics
Intro to R data types/structures, to PLS reliability and validity assessment, and to applied multivariate data analysis.
Session 4: Manipulating Data Frames. Exercise: Extended exercises manipulating the Cars93 data frame.
Session 5: Reliability and Validity Assessment of Path Models. Exercise: Evaluating reliability and validity with fitted path models.
Session 6: PLS Path Modeling extensions with SmartPLS 3.0. Exercise: Create multiple linear and non-linear model objects, fit the models and evaluate the fits.
Session 7: Applied multivariate data analysis. What is multivariate data? What is a covariance matrix? A correlation matrix? What are cluster analysis and principal components analysis?
Course Topics
Introduction to SEM, Bootstrapping and to EFAs and CFAs.
Session 8: An introduction to Confirmatory Factor Analysis (CFA) and Structural Equation Modeling (SEM) using the R lavaan package.
Session 9: Bootstrapping and Resampling; Obtaining inferential statistics for model parameters. Exercise: Bootstrap model parameters and report on t-stats, p-values, and confidence intervals for model parameters.
Session 10: Exploratory Factor Analysis (EFA) and more CFAs.
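To make the Session 9 exercise concrete, the following is a minimal sketch of case-resampling bootstrap inference for a single model parameter, here the slope of a simple linear regression. It is not course material: the course itself works in R, the function names `ols_slope` and `bootstrap_slope` and the toy data are hypothetical, and the p-value uses a normal approximation chosen for simplicity.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def bootstrap_slope(x, y, n_boot=2000, seed=1):
    """Resample (x, y) pairs with replacement and summarize the slope."""
    rng = random.Random(seed)
    n = len(x)
    estimate = ols_slope(x, y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # skip degenerate resamples with no variation in x
        draws.append(ols_slope(xs, ys))
    se = statistics.stdev(draws)          # bootstrap standard error
    t_stat = estimate / se                # estimate divided by its bootstrap SE
    # Two-sided p-value under a normal approximation (an assumption).
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t_stat)))
    draws.sort()
    lo = draws[int(0.025 * len(draws))]       # 2.5th percentile
    hi = draws[int(0.975 * len(draws)) - 1]   # 97.5th percentile
    return {"estimate": estimate, "se": se, "t": t_stat, "p": p, "ci95": (lo, hi)}


# Hypothetical toy data: y depends on x with a true slope of 2 plus noise.
data_rng = random.Random(42)
x = [float(i) for i in range(30)]
y = [2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

result = bootstrap_slope(x, y)
print(result)
```

The same resampling loop applies to path coefficients, weights, or loadings: replace `ols_slope` with whatever function re-estimates the model on a resampled data set.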
Course Topics
Mediation and Moderation, more on Resampling Methods.
Session 11: Mediation and Moderation Analyses with an Introduction to PROCESS. What is mediation? Moderation? How to estimate multiple cascading direct, indirect, and interaction effects. Exercise: Estimate mediating effects in path models and bootstrap significance levels.
Session 12: Graphical Displays, Simple and Conditional Inference.
Session 13: Bootstrapping and Jackknifing: Obtain inference statistics for model parameters to determine significance levels. Exercise: Bootstrap the model parameters and report on the t-statistics, p-values, and confidence intervals for the estimated path coefficients, weights and loadings.
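As an illustration of the direct and indirect effects named in Session 11, here is a minimal sketch of a single-mediator model (X affects Y directly and through M) estimated by ordinary least squares. It is a sketch under stated assumptions rather than the course's method: the function names, the simulated data, and the use of Python instead of R are all hypothetical choices made for illustration.

```python
import random
import statistics


def centered_sum(u, v):
    """Sum of cross-products of u and v after centering each at its mean."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def mediation_effects(x, m, y):
    """OLS estimates for M ~ X and Y ~ X + M in a single-mediator model."""
    sxx, smm = centered_sum(x, x), centered_sum(m, m)
    sxm, sxy, smy = centered_sum(x, m), centered_sum(x, y), centered_sum(m, y)
    a = sxm / sxx                              # path X -> M
    det = sxx * smm - sxm ** 2
    b = (smy * sxx - sxm * sxy) / det          # path M -> Y, controlling for X
    direct = (sxy * smm - sxm * smy) / det     # path X -> Y, controlling for M
    total = sxy / sxx                          # slope of Y ~ X alone
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


# Hypothetical simulated data with known paths: a = 0.5, b = 0.6, direct = 0.3.
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [0.3 * xi + 0.6 * mi + rng.gauss(0.0, 1.0) for xi, mi in zip(x, m)]

effects = mediation_effects(x, m, y)
print(effects)
```

With OLS the total effect equals the direct effect plus the indirect effect `a * b`. To bootstrap a significance level for the indirect effect, one would resample the rows of (x, m, y) with replacement and recompute `a * b` on each resample.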
Course Topics
Advanced PLS Path Modeling Topics, GLMs and GAMs.
Session 14: Orthogonalization, Consistent and Nonlinear PLS.
Session 15: Generalized Linear and Additive Models.
Course Assessment
Capstone Project and Final Examination
Capstone Project Exercise (part of final exam): Specify, estimate, and evaluate a real-life PLS path model. Submit a written description of the overall fit, and of the indicated reliability and validity for both the measurement model and the structural model. The capstone exercise counts for 25% of the final grade.
Final Examination: Each participant must complete the final examination with a passing grade in order to receive the program completion certificate. The final exam counts for 75% of the final grade.
Detailed Program Outline and Contents (1/6)
Six Simultaneous Online On-Demand 24/7 Background Pre-Courses
The content for the fifteen class sessions above is derived from the content and exercises in these 24/7 online courses, which are available to each participant.
Fundamentals of Using R: Introduction (5 sessions in original course)
This shorter course, Fundamentals of Using R, is an overview of what R is and what you can do with it. The course explains and demonstrates how to use R and presents a collection of essential R concepts drawn from a number of sources, much of it from the optional textbook The R Book by Michael J. Crawley (Wiley, 2007). The course also demonstrates how to use RStudio, a productivity-enhancing front end to the R Console, which can be rather opaque and non-intuitive to use, especially for beginners.
This introductory course teaches the fundamental concepts needed to use R effectively. The course addresses these basic topics and more: What is R? Getting started using R; Basic capabilities; Reading and writing data and data frames; Input and output; Data objects; Functions in R; Graphics in R.
This course uses the R software and the freely available RStudio Integrated Development Environment (IDE). All software is available at no cost. The course also demonstrates and provides a large number of working scripts (i.e., sets of R commands) to use for your own purposes.
Detailed Program Outline and Contents (2/6)
Introduction to R as a Statistical Environment (7 sessions in original course)
Introduction to R as a Statistical Environment is intended to "break the statistical ice" by introducing someone new to R to its basic capabilities as a statistical computing and simulation environment. Those features and capabilities include: numerical scripting; basics of quantitative versus categorical data; data visualization and analysis; graphical displays; simple and conditional inference; basics of linear modeling through GLMs, with a solid foundation in linear regression; and randomizations, simulations, and permutations.
The purpose of this course is to illustrate a range of statistical and probability computations using R for people who are using, teaching, or learning statistics. The course is a foundation for all of the other statistical methods courses that follow. It is helpful, but not required, for course participants to have covered the equivalent of an undergraduate-level calculus-based course in statistics. Course content demonstrates and explains prototypical exploratory and inferential methods of statistical modeling for analyzing data using R software.
The course contains original content and also "samples" representative content from several of the other statistical and programming courses developed by the Georgia R School; those courses cover each topic in additional detail. This course differs from Fundamentals of Using R in that it focuses specifically on the capabilities and features of R as a statistical computing environment, rather than as a general-purpose data, programming, and graphics environment. It is a 7-session, example-based introduction that assumes no previous familiarity with R or other software packages.
Detailed Program Outline and Contents (3/6)
Comprehensive Statistical Research Analyses with R (8 sessions in original course)
Comprehensive Statistical Research Analyses with R teaches how to perform a wide variety of common, research-oriented statistical analysis techniques with R. Participants also learn how to use the R Commander, an open-source R package that serves as an effective statistical front end to the R Console. The course is designed to provide a wide overview of how to conduct and interpret many statistical analysis techniques using R. It is aimed at Ph.D. students, faculty, and practitioner researchers who are interested in learning R by "diving in" and using it for a variety of detailed statistical analyses. No previous experience with R is necessary.
Course topics on comprehensive statistical research analyses using R include:
1. Analysis using graphical displays
2. Simple inference
3. Conditional inference
4. Analysis of variance (ANOVA and MANOVA)
5. Simple and multiple linear regression
6. Logistic regression
7. Generalized linear models
8. Recursive partitioning
9. Survival analysis
10. Smoothers and generalized additive models
11. Analyzing normally-distributed longitudinal response data
12. Analyzing exponential, Poisson and binomial longitudinal response data
13. Other forms of mixed, nested, and hierarchical linear models
14. Simultaneous inference and multiple comparisons
15. Meta-analysis
16. Cluster analysis
Detailed Program Outline and Contents (4/6)
PLS-PM with R: sempls and PLSPM Packages (6 sessions in original course)
This course accomplishes three objectives. It instructs and demonstrates (1) how to import (non-R) SmartPLS model output directly into R for additional processing; (2) the major capabilities and functions of the R semPLS package; and (3) the major capabilities and functions of the R plspm package.
The semPLS and plspm packages use the same PLS algorithm as SmartPLS and consequently produce identical PLS model estimates in almost all cases, but each of the two R packages also contains additional, useful, complementary functions and capabilities. Specifically, semPLS offers plots and graphs of PLS path model estimates and also converts your model to run in covariance-based R functions. The plspm package, on the other hand, produces complete and well-formatted PLS output that is consistent with the tables and reports required for publication, and it also offers multigroup-moderation analysis capabilities.
If you are interested in learning a lot about PLS path modeling, it is a good use of your time to become familiar with both the semPLS and plspm packages in R.
Detailed Program Outline and Contents (5/6)
Applied Multivariate Analysis with R (7 sessions in original course)
In many disciplines, data sets are multivariate. In multivariate data, several measurements (variables) are recorded on each unit of observation in the data set. In multivariate analysis, there are several simultaneous outcome variables. All of the variables are assumed to be random variables, unlike regression, where only the predicted, dependent variable is random and the observed explanatory variables are treated as fixed.
Applied Multivariate Analysis with R is an introductory course on classical multivariate methodology and focuses on using R software to perform these analyses. It makes use of the R package MVA, which includes the R code (scripts) and data sets used to demonstrate these multivariate techniques. The course does not focus on the details of the theory behind each technique, but rather on the application of R to analyze multivariate data using each technique and to visualize graphical representations of the solutions. It is a hands-on course that makes use of many extended examples of R functions, applications, and data sets (all provided).
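The covariance and correlation matrices that underpin most of these multivariate techniques can be illustrated with a short sketch. The code below is not taken from the course or from the MVA package; it is a plain-Python illustration with hypothetical function names and made-up data, in which three variables are measured on five units.

```python
import math
import statistics


def covariance_matrix(columns):
    """Sample covariance matrix; each element of `columns` is one variable."""
    n = len(columns[0])
    p = len(columns)
    means = [statistics.fmean(c) for c in columns]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum(
                (columns[i][k] - means[i]) * (columns[j][k] - means[j])
                for k in range(n)
            )
            cov[i][j] = cov[j][i] = s / (n - 1)  # unbiased (n - 1) divisor
    return cov


def correlation_matrix(cov):
    """Rescale a covariance matrix by the standard deviations on its diagonal."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


# Hypothetical data: three variables observed on the same five units.
v1 = [1.0, 2.0, 3.0, 4.0, 5.0]
v2 = [2.0, 4.0, 6.0, 8.0, 10.0]   # exactly 2 * v1, so perfectly correlated with v1
v3 = [5.0, 3.0, 4.0, 1.0, 2.0]

cov = covariance_matrix([v1, v2, v3])
corr = correlation_matrix(cov)
for row in corr:
    print([round(r, 3) for r in row])
```

The diagonal of the covariance matrix holds each variable's variance, and the correlation matrix is the same information rescaled so that every diagonal entry is 1.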
Detailed Program Outline and Contents (6/6)
Structural Equation Modeling with Lavaan (8 sessions in original course)
Lavaan (latent variable analysis) is a free, open-source R package for latent variable analysis in the form of Structural Equation Modeling (SEM). You can use lavaan to estimate a large variety of multivariate statistical models, including covariance-based structural equation modeling (CB-SEM), path analysis, confirmatory factor analysis and growth curve models. The lavaan package, developed by Yves Rosseel, is intended to provide users, researchers and teachers with a free, open-source, but commercial-quality package for structural equation modeling with latent variables. The goal of the lavaan development project is to implement all the state-of-the-art capabilities that are currently available in commercial packages such as Mplus.
The course has a decidedly hands-on orientation, as the majority of class time is spent demonstrating live examples using lavaan with real data. By attending the course, practicing with the provided executable scripts, and completing the exercises, participants will be able to specify, estimate, interpret and evaluate their own CB-SEM latent variable models using lavaan. In addition, they will be able to estimate CFAs, group analyses and growth curve models, use categorical variables properly, estimate indirect effects and mediation analyses, and act on modification indices.
The course consists of a comprehensive tutorial using lavaan, interspersed with additional intermediate and advanced SEM examples. If you currently practice SEM analyses, or if you are considering analyzing data using SEM, you will benefit from this course. Lavaan has many of the features that are otherwise available only in more expensive commercial SEM software tools such as Mplus.
Extra Information on Granting a TUHH Certificate for 3 Graduate Credit Hours (1/2)
In addition to fifteen 150-minute, live, online classes, there will be daily exercises to complete, and a final exam with a path modeling project component. Each live online class session will consist of a lecture with discussion followed by a hands-on lab period. There will be weekly readings and other course materials and assignments.
The material covered in the live online sessions will be supplemented with the 24/7 availability of simultaneous, on-demand Georgia R School (http://georgia-r-school.org) background courses, which expand on the topics discussed in class. These courses are listed and briefly described as an appendix to this document.
Successful participants will receive an official certificate of completion from TUHH. The program content and course work are equivalent to a standard 3-credit graduate-level course.
The live portion of the summer program will consist of: 15 integrated 150-minute online class sessions (including the final exam and project), delivered on alternating days in both the AM and the PM, from the United States in late June, July and August; a minimum of ten daily assignments; and a graded final examination with a path modeling assessment portion.
Extra Information on Granting a TUHH Certificate for 3 Graduate Credit Hours (2/2)
The faculty conducting the TUHH summer program includes:
Prof. Dr. Christian M. Ringle (Hamburg Univ. of Tech., TUHH)
Prof. Dr. Jörg Henseler (University of Twente)
Dr. Geoffrey Hubona (The Georgia R School)
Each is a doctorally-qualified faculty member with decades of university teaching experience. Each is regarded as an expert in statistics, structural equation modeling (SEM) and/or Partial Least Squares path modeling (PLS-PM).
Dr. Christian Ringle is a co-developer of the very popular SmartPLS (http://smartpls.de) software. Dr. Ringle and Dr. Henseler are co-founders of the PLS School (http://plsschool.com) and of SEM n R (http://sem-n-r.com). Dr. Hubona is a co-developer of the PLS-GUI (http://tinyurl.com/plsgui) software, the founder of the Georgia R School (http://georgia-r-school.org), and a co-founder of SEM n R.
Here are brief online vitas for Dr. Ringle, Dr. Henseler and Dr. Hubona.
Certificate
Hamburg University of Technology
http://www.tuhh.de/hrmo
Graduiertenakademie für Technologie und Innovation TUHH, Schwarzenbergstr. 95 (E), 21073 Hamburg
http://www.tuhh.de/graduiertenakademie
Mrs. Martina MUSTERMANN successfully attended the 2014 TUHH Summer School course Overview of Statistical and Path Modeling Analyses with the following contents:
Introduction to R and to statistical analysis using R, including inference and linear and non-linear modeling (3 sessions)
Introduction to statistical analysis using R, including inference and linear and non-linear modeling (4 sessions)
Multivariate Data, SEM, and PLS Path Modeling (6 sessions)
PLS Path Modeling and SEM (2 sessions)
Capstone Project Exercise and Final Examination
The course was conducted by Prof. Dr. Joerg Henseler (University of Twente, The Netherlands), Dr. Geoffrey Hubona (The Georgia R School), and Prof. Dr. Christian M. Ringle (Hamburg University of Technology, TUHH). The total workload of this course was 90 hours, which is eligible for 3 credits. Passing a final exam was a requirement to successfully complete this course.
Hamburg, October 29, 2014
Prof. Dr. Christian M. Ringle, http://www.tuhh.de/hrmo
Dr. Krista Schölzig, Graduiertenakademie TUHH
Registration
This summer graduate program is offered by Hamburg University of Technology (TUHH) and the Georgia R School. Registration takes only minutes at: http://shop.georgia-r-school.org
Early registration cost (through April 30):
$895 USD non-student
$695 USD full-time student
Registration cost after April 30 increases by $100 USD:
$995 USD non-student
$795 USD full-time student
Thank you for your attention!
www.tuhh.de