Lavastorm Analytics Library Predictive and Statistical Analytics Node Pack FAQs

1.1 Introduction

For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack is referred to in this document as the Node Pack.

What is the Node Pack?

The LAL Predictive and Statistical Analytics Node Pack is a set of nodes that allows business users to incorporate powerful statistical and predictive functionality into a Lavastorm Analytics Engine (LAE) graph or logical workflow without any programming.

What is included in the Node Pack?

The Node Pack includes the following nodes:
- Linear Regression: Enables users to model relationships between a dependent variable and a set of independent variables using a linear regression model.
- Predict Linear Regression: Uses a linear regression model to predict the value of a dependent variable based on the specified values of the independent variables.
- Logistic Regression: Enables users to model relationships between a binomial dependent variable and a set of independent variables using a logistic regression model.
- Predict Logistic Regression: Uses a logistic regression model (a Generalized Linear Model) to predict the probability of an outcome based on the specified values of the independent variables.
- K-Means Clustering: Classifies data into a specified number of clusters.
- K-Means Advisor: Helps users select the most appropriate number of clusters to use when performing K-Means classification.
- Hierarchical Clustering: Classifies data by building a hierarchy of clusters.
- R Library Package Download: A utility node that simplifies the download of the CRAN packages that are prerequisites for the correct operation of other nodes in the Node Pack.
- Market Basket Analysis: Uses data mining to identify hidden relationships in data by identifying association rules.
- Market Basket Miner: Allows users to sort and filter rules that meet their specified criteria.
- Time Series Diagnostics: Provides sets of diagnostic plots that help the user select the correct model to use when creating a time series model, and determine whether the model is a good fit for the data.
- Time Series Forecast: Uses the Holt-Winters method to model a time series that can optionally contain trend and seasonal components, and to predict future values of the time series.
- Decision Forest: A machine learning technique that uses an ensemble of decision trees to generate a model that can classify data, perform regression modeling, or, in an unsupervised learning mode, assess the importance of variables in determining the proximity of observations.
- Predict Decision Forest: Uses a Random Forest model to predict the value or classification of a dependent variable based on the specified values of the independent variables.

Is that the complete list?

No, the above list reflects the set of nodes currently available in the LAL 6.0.35.0 and LAL 5.1.35.0 releases. As with the existing LAL program, additional nodes will be added to the Node Pack in future LAL releases. Examples of these additional nodes include:
- Linear Regression Diagnostics: Provides a set of diagnostic plots to aid the user in determining whether the model is a good fit for the data.
- Logistic Regression Diagnostics: Provides a set of diagnostic plots to aid the user in determining whether the model is a good fit for the data.
- Quantile Analysis: Uses regression to model the relationship between a dependent variable and a set of independent variables using a specified quantile function of the response.
- ARIMA: Uses an AutoRegressive Integrated Moving Average technique to model time series data and forecast future values of the time series.

How do I use the Node Pack?

The nodes available in the Node Pack can be connected in a graph like any other node.
The nodes can use prepared data originating from other nodes, and can also output data to other nodes for further analysis.
Figure 1: An example of a K-means clustering node using filtered data to prepare a clustered scatter plot and quick statistics.

How is the Node Pack licensed?

The nodes are part of the LAE Statistical and Predictive Node Pack. This is a premium product option that requires a separate Node Pack license in addition to the LAE license.

Note: the LAE Statistical and Predictive Node Pack license does not provide access to the Power R node. A separate license is required to use the Power R Node Pack. These two licenses are independent (i.e. either can be purchased without the other).

Can I purchase the nodes individually?

No, the nodes can only be purchased as part of the LAE Statistical and Predictive Node Pack.

What are the supported LAE editions?

The following LAE editions are supported:
- Professional
- Professional Plus
- Team Server
- Enterprise Server

What platforms are supported?

LAE 5.1.x Desktop Editions:
- Microsoft Windows (64-bit): Windows 7 for x64; Windows 8.1 for x64

LAE 5.1.x Server Editions:
- Microsoft Windows (64-bit): Windows Server 2008
- Oracle Linux (x86-64; 64-bit): Enterprise 5
- Red Hat Linux (x86-64; 64-bit): Enterprise Linux 5, 6
- SUSE Linux Enterprise Server 11 Service Pack 2

Note: the R Library Download node is only supported on Windows platforms.

LAE 6.0.x Desktop Editions:
- Microsoft Windows (64-bit): Windows 7 for x64; Windows 8.1 for x64

LAE 6.0.x Server Editions:
- Microsoft Windows (64-bit): Windows Server 2008 SP2; Windows Server 2012 R2
- Oracle Linux (x86-64; 64-bit): Enterprise 6
- Red Hat Linux (x86-64; 64-bit): Enterprise Linux 6, 7
- SUSE Linux Enterprise Server 11 Service Pack 2

Note: the R Library Download node is only supported on Windows platforms.

What versions of LAE are supported?

The nodes in the LAE Statistical and Predictive Node Pack are supported on the LAE 6.0.x and LAE 5.1.x releases. A subset of the nodes is also supported on LAE 5.0.x; see the Lavastorm website for further details.

What other related software is available?

The Power R node provides data scientists and other technical users with direct access to the power of TIBCO Enterprise Runtime for R, allowing them to use the R language to create custom statistical and predictive analytic functionality on an enterprise-grade, scalable platform.

Can I use the Power R node if I purchase the Predictive and Statistical Analytics Node Pack?

No. A separate Node Pack license is required to use the Power R node.

Is the Node Pack based on Open Source R?

No. The Node Pack is built on TIBCO Enterprise Runtime for R, which is optimized for speed, scale, and stability. Most of the Node Pack uses functionality that is natively supported by TIBCO Enterprise Runtime for R. Some functionality is based on open source packages from the Comprehensive R Archive Network (CRAN); see section 1.2.1 for details.

1.2 Deploying and using the Node Pack

How do customers get the software?
The Statistical and Predictive Node Pack will be included in the published LAL installer, which will be available on the Lavastorm Analytics website. If a customer's LAE desktop installation is configured to check for updates automatically, the customer will be notified that the new software is available the first time the installation checks for updates after the public release date.

1.2.1 CRAN Packages

Some of the nodes in the Node Pack use functionality provided by R packages. The prerequisite packages have open source licenses and are available from the Comprehensive R Archive Network (CRAN) or one of its mirror sites. As these packages are open source, they must not be distributed with LAE. However, for the nodes in the Node Pack to operate correctly, the user must install the prerequisite CRAN packages on the machine hosting the LAE Server, in the file system location that LAE uses as its package library.

The R Library Download node provides a convenient mechanism by which non-technical users can download and install the prerequisite CRAN packages onto a Windows machine that hosts the LAE Server. The node requires internet access to download the prerequisite R binary packages from the CRAN repository. The R Library Download node is not supported on Linux platforms.

For Linux installations, the packages must be installed by other means. Typically, CRAN packages for Linux systems are available from the CRAN repositories as source packages rather than in binary form. Open Source R (OS R) can be used to download and install the prerequisite packages. If OS R is installed on the machine hosting the LAE Server, set the installation directory to the LAE package library location (i.e. <LAE Installation Directory>/LAE6.0/TERR/library) when installing the packages.
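As an illustration of the OS R route described above, the following Python sketch builds an `Rscript` command that calls R's `install.packages()` with its `lib` argument pointed at the LAE package library. It is a sketch under stated assumptions, not a Lavastorm-supplied tool: the helper names (`lae_library_dir`, `r_install_expression`, `rscript_command`), the CRAN mirror URL, and the example installation directory `/opt/lavastorm` are hypothetical, and it assumes OS R's `Rscript` is on the `PATH` of the machine hosting the LAE Server. The package names are those listed in this section.

```python
import shlex
from pathlib import PurePosixPath

# Prerequisite CRAN packages for the Node Pack (CRAN names are case-sensitive).
REQUIRED_PACKAGES = ["arules", "boot", "cluster", "gbm", "lattice",
                     "Matrix", "NbClust", "randomForest", "TTR", "zoo"]

# Assumption: any CRAN mirror can be used; this one is only an example.
CRAN_MIRROR = "https://cloud.r-project.org"


def lae_library_dir(lae_install_dir: str) -> str:
    """LAE 6.0 package library: <LAE Installation Directory>/LAE6.0/TERR/library."""
    return str(PurePosixPath(lae_install_dir) / "LAE6.0" / "TERR" / "library")


def r_install_expression(packages, lib_dir, repo=CRAN_MIRROR):
    """Build an R call to install.packages() that installs into lib_dir.

    With the default dependencies = NA, install.packages() also fetches the
    packages' Depends/Imports/LinkingTo dependencies, so only the top-level
    packages are listed here.
    """
    pkgs = ", ".join(f'"{name}"' for name in packages)
    return f'install.packages(c({pkgs}), lib = "{lib_dir}", repos = "{repo}")'


def rscript_command(packages, lae_install_dir, repo=CRAN_MIRROR):
    """Return an argv list that evaluates the install expression with Rscript."""
    expr = r_install_expression(packages, lae_library_dir(lae_install_dir), repo)
    return ["Rscript", "-e", expr]


if __name__ == "__main__":
    # Hypothetical installation directory; replace with the real one.
    cmd = rscript_command(REQUIRED_PACKAGES, "/opt/lavastorm")
    # Print the command for review; execute it afterwards, for example with
    # subprocess.run(cmd, check=True) (requires internet access to CRAN).
    print(shlex.join(cmd))
```

Printing the command before running it lets an administrator confirm the target library path before anything is written to it.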
If OS R on a separate machine is used to download and compile the prerequisite packages, that machine must have the same Linux configuration as the machine hosting the LAE Server. Once the packages have been installed on the other machine, they can be copied to the directory that LAE uses as its package library location (see above). Internet access is required to download the source packages from the CRAN repository.

The following CRAN packages are required by the Node Pack:
- 'arules'
- 'boot'
- 'cluster'
- 'gbm'
- 'lattice'
- 'Matrix'
- 'NbClust'
- 'randomForest'
- 'TTR'
- 'zoo'

Packages that these depend on are also required; OS R should download them automatically during the installation process. For further information on the package installation process, see section 6.3 of the CRAN documentation:
- http://cran.r-project.org/doc/manuals/r-release/r-admin.html
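After installing the packages, or copying them from another machine, it can help to confirm that each prerequisite package is present in the LAE package library. The Python sketch below does this under one stated assumption: that the library uses the standard R layout, in which each installed package is a subdirectory named after the package and containing a `DESCRIPTION` file. The function name `missing_packages` and the script name in the usage comment are hypothetical, and the check only confirms presence; it does not verify that a package was built for the same Linux configuration.

```python
import sys
from pathlib import Path

# Prerequisite CRAN packages for the Node Pack (CRAN names are case-sensitive).
REQUIRED_PACKAGES = ["arules", "boot", "cluster", "gbm", "lattice",
                     "Matrix", "NbClust", "randomForest", "TTR", "zoo"]


def missing_packages(library_dir, packages=REQUIRED_PACKAGES):
    """Return the packages that have no installed copy in library_dir.

    Assumes the standard R library layout: <library_dir>/<package>/DESCRIPTION.
    """
    lib = Path(library_dir)
    return [name for name in packages
            if not (lib / name / "DESCRIPTION").is_file()]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage:
    #   python check_lae_packages.py <LAE Installation Directory>/LAE6.0/TERR/library
    missing = missing_packages(sys.argv[1])
    if missing:
        print("Missing packages:", ", ".join(missing))
        sys.exit(1)
    print("All prerequisite packages are present.")
```

A non-zero exit status when packages are missing makes the check easy to use from a deployment script.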
TIBCO Enterprise Runtime for R is either a registered trademark or a trademark of TIBCO Software Inc. and/or its subsidiaries in the United States and/or other countries.