Bonus Chapter
Ten Major Predictive Analytics Vendors

In This Chapter
Angoss
FICO
IBM
RapidMiner
Revolution Analytics
Salford Systems
SAP
SAS
StatSoft, Inc.
TIBCO

This chapter highlights ten of the major vendors in the predictive analytics market today. Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users. We highlight the strengths of each product so you can get a sense of what makes it unique, which may help you pick the best tool for your particular predictive analytics goals.

The vendors describe the features of their products through their white papers and websites. Any vendor should be willing to demo any of these products for your organization, and send you additional documentation to help you decide which solution will work best for the needs of your business.
Predictive Analytics For Dummies

Angoss Analytics Software Suite

Angoss provides a suite of desktop, client-server, and big-data analytics software products, as well as fully managed and hosted cloud solutions. The software suite consists of KnowledgeSEEKER, KnowledgeSTUDIO, and KnowledgeREADER.

KnowledgeSEEKER and KnowledgeSTUDIO provide data preparation, profiling, and visualization, as well as Angoss-patented Decision Trees, Strategy Trees, predictive modeling, and model performance and evaluation capabilities. Angoss provides an easy interface that offers advanced analytics. The products are built for both business and technical users.

KnowledgeSTUDIO provides advanced modeling and predictive analytics features for quantitative analysts. KnowledgeREADER comes with text-analysis capabilities, combining visual text discovery and sentiment analysis with predictive analytics.

Angoss products support multiple data sources including Hadoop, R, SAS, SPSS, SQL, Microsoft Excel, and databases. Angoss In-Database Analytics enables a complete analytics workflow within the database from data preparation to model development to model deployment.

All three packages offer advanced data-preparation capabilities, using the data-menu wizards to combine and aggregate datasets. You can create new derived fields and apply any data transformation of interest. The products also have an extensive array of tools for data exploration and visualization.

The Angoss software suite provides a guided workflow for business users who have no expertise in predictive analytics. In addition, visual data discovery allows you to edit, save, and organize charts. The software suite provides a visual environment within which data analysts can build scalable data-mining and predictive analytics solutions.

Angoss has a strong foundation in financial services, insurance, telecom, retail, and high-tech organizations.
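To make the data-preparation ideas in this section concrete (combining datasets, aggregating them, and creating derived fields), here is a minimal Python sketch. It is not Angoss code and does not reflect how the Angoss wizards work internally; the sample records, the field names, and the helper functions `aggregate_amounts` and `combine` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: customer records and their transactions.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "telecom"},
    {"customer_id": 3, "segment": "insurance"},  # has no transactions
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
]


def aggregate_amounts(rows):
    """Aggregate: sum and count the transaction amounts per customer_id."""
    totals = defaultdict(lambda: {"total": 0.0, "count": 0})
    for row in rows:
        agg = totals[row["customer_id"]]
        agg["total"] += row["amount"]
        agg["count"] += 1
    return totals


def combine(customer_rows, totals):
    """Combine: attach the aggregates to each customer and derive avg_spend."""
    combined = []
    for customer in customer_rows:
        # Customers without transactions get zero-valued aggregates.
        agg = totals.get(customer["customer_id"], {"total": 0.0, "count": 0})
        # Derived field: average spend per transaction (0.0 if none).
        avg = agg["total"] / agg["count"] if agg["count"] else 0.0
        combined.append({
            **customer,
            "total_spend": agg["total"],
            "n_transactions": agg["count"],
            "avg_spend": avg,
        })
    return combined


dataset = combine(customers, aggregate_amounts(transactions))
for row in dataset:
    print(row)
```

The result is one row per customer with the aggregated and derived fields attached, which is the general shape of dataset that a predictive model is usually trained on.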
FICO Analytic Modeler

FICO Analytic Modeler is an enterprise analytic solution for predictive analytics and machine learning. It runs on FICO's recently launched FICO Analytic Cloud. FICO Analytic Modeler comes with a user-friendly graphical interface. Business users and technical users (such as data scientists) can build predictive models in a short period of time.
FICO is used in a wide array of industries, but is best known for its application of risk management in the financial services industry. The FICO credit score (FICO Score for short) is probably the best-known use of a consumer scoring system to drive business decisions, specifically credit decisions for loan applications. Other areas where FICO is used are in fraud detection for insurance claims and in response modeling for consumer marketing.

FICO Analytic Modeler currently allows you to upload comma-separated values (CSV) files to its cloud-based service for predictive modeling. For other types of data sources, you must use its parent application, Model Builder, which supports data sources such as relational databases, SAS datasets, XML, and JSON. Model Builder can also extract text from Office documents and PDFs.

After the CSV data is loaded, most of the data preparation is done for you to get the training dataset ready for building a predictive model. FICO Analytic Modeler provides most of the common data-preparation functions such as handling outliers and dealing with missing values.

Most of the work in creating a model has been reduced to a few clicks of a button. You just need to select your target variable and exclude the meaningless variables. The tool comes with a variable selection routine that tries to choose the optimal combination of variables to produce the best predictive model.

FICO Analytic Modeler provides easy-to-read tables that represent the components of the predictive model. The tables show how each variable contributes to the model's overall strength. You can drill down further into each of the variables to see their relative weights (which depend on the value of each variable).

FICO Analytic Modeler also provides easy-to-read graphs to evaluate the performance of the model. One such graph is a histogram that shows the separation strength between the outcome classes.
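The two data-preparation functions named above, dealing with missing values and handling outliers, can be sketched in a few lines of Python. This is only an illustration of the general ideas and not FICO's implementation: the choice of median imputation, the 1.5 x IQR clipping rule, the function names, and the sample values are all assumptions made for this example.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    spread = q3 - q1                    # interquartile range (IQR)
    low, high = q1 - k * spread, q3 + k * spread
    return [min(max(v, low), high) for v in values]


# Hypothetical column with two missing values and one extreme value.
raw = [42, 38, None, 45, 40, 1000, 41, None, 39]

imputed = impute_missing(raw)
cleaned = clip_outliers(imputed)
print(cleaned)
```

Imputing before clipping keeps every row in the training dataset, and clipping (rather than deleting) the extreme value keeps that row while limiting its influence on the model. Other choices, such as dropping rows or imputing with the mean, are equally common.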
IBM SPSS Modeler

IBM SPSS Modeler is a predictive analytics platform that scales up from desktop installation to enterprise deployment. It's used by many industries, including (but not limited to) finance, government, manufacturing, telecommunications, and retail. IBM SPSS Modeler can be used by anyone from a business user to a data scientist. Through an interactive visualization platform, you can manipulate data, create predictive models, and explore them.
IBM SPSS Modeler helps guide you through the lifecycle of predictive models. It has three editions:

IBM SPSS Modeler Professional provides data preparation, predictive algorithms, and interactive visualization. It includes common and advanced predictive algorithms for classification, segmentation, and association and can be further extended with the open-source statistical language R.

IBM SPSS Modeler Premium builds on the Professional Edition and includes text analytics, entity analytics, and social network analysis. IBM SPSS Modeler Premium employs linguistic technologies and Natural Language Processing to process unstructured data. Entity analytics helps you identify entities (such as people) on the basis of underlying data characteristics such as behavior.

IBM SPSS Modeler Gold includes all the features of the Premium edition and adds the capability to deploy predictive models directly into the business process. To achieve this, IBM SPSS Modeler Gold uses Decision Management, which combines predictive analytics with rules, scoring, and optimization and integrates them into your organization's processes.

IBM SPSS Modeler can use data from multiple sources, including text files, operational databases, and Hadoop. It supports in-memory, in-database, and Hadoop-based processing, mining, and scoring.

IBM SPSS Modeler can be integrated with IBM technologies such as (but not limited to) IBM Cognos, IBM Content Analytics, and IBM Operational Decision Management.

RapidMiner

RapidMiner provides predictive analytics software solutions particularly suited to business managers and analysts of all levels. From its roots in open-source software, RapidMiner was a widely used machine-learning tool in academia, IT, and consulting services. It was used for teaching, training, research, and prototyping. It has since revamped itself to offer a commercial predictive analytics solution.
RapidMiner is equipped with application wizards and a drag-and-drop interface that can be easily used with no prior programming knowledge.
RapidMiner includes several methods for
Data access
Data transformation
Data visualization

It provides connectors to numerous data sources and third-party applications. Its data sources include (but are not limited to) MS Excel and Access files, relational databases, HDFS, SAP, SAS, and Hadoop.

RapidMiner assists in the data preparation process by allowing you to filter rows and outliers, and to easily identify and remove duplicate data. The software offers a variety of visualization capabilities that include the most common charts and graphs. In addition, it offers most of the common algorithms used for predictive and descriptive modeling.

RapidMiner Server provides remote analytics, web-based reporting, and enterprise collaboration. It can process data in-memory and in-database. It can handle big-data analysis through its extension to Hadoop clusters.

RapidMiner has an open-source heritage, and the core of the product remains open-source. The commercial product extends that core to address scalability and adds more data connectors to provide a full-featured product.

RapidMiner markets its recently launched RapidMiner V6.0 to the life sciences, financial services, manufacturing, and telecommunications industries.

Revolution R Enterprise

Revolution R Enterprise enables companies to create predictive analytics for big data using the R programming language. The software is designed to overcome the in-memory and single-threaded limitations of R, and significantly improve the performance of open-source R on big data.

Revolution Analytics aims to foster the R community as well as support the growing need of the R language for commercial users. It aims to bring the open-source R language to commercial applications by providing technical support and enhancing the execution and performance of R programs. The company also provides software and services that bring high performance, productivity, and ease of use to R.
R is a programming language developed for statistical analysis that enjoys the majority share among languages adopted by data scientists today. Revolution R Enterprise capitalizes on the popularity of the R language, and brings enterprise solutions to commercial entities using R.
Revolution R Enterprise enables statisticians and data scientists to use what they already know and makes it possible for them to deploy their applications in operationally complex environments.

Revolution R Enterprise is multithreaded and can make your existing R code run fast without limitations on data size through compilation techniques and linkages of specialized libraries. Through the implementation of parallel algorithms, Revolution R Enterprise provides fast and scalable performance when analyzing large datasets. The software supports multiple data sources, including (but not limited to) text files, SAS, SPSS, HDFS, and data residing in relational databases and Hadoop.

Revolution R Enterprise provides a variety of capabilities to create predictive models, including data preparation, data visualization, and statistical analysis. Revolution R Enterprise is portable and can run inside a database or inside Hadoop. It also provides web services for deployment in other environments.

As a provider of big-data analytics platforms, based on the open-source R programming language, Revolution R Enterprise is a potentially useful tool for data scientists who already know R. Some programming skills and knowledge of the R language are prerequisites for creating predictive analytics models using Revolution R Enterprise.

Revolution R Enterprise has a big presence in financial services, digital media, health, and life science.

Salford Systems

Salford Systems SPM is an analytics and data-mining platform for creating predictive models. Salford Systems SPM comes configured with a default setting that is geared to users with beginning to average levels of experience. You can do nearly all your modeling in a single panel, which is potentially attractive to beginners. The software provides several other options and visualization tools for advanced users to explore.
The software offers the most common and best-known algorithms used in predictive analytics. Some of its implementations of these algorithms are the actual original versions developed by the algorithms' originators. For example, the software includes the following algorithms as they were originally written by Jerome H. Friedman:
Classification and Regression
Spline Regression
Gradient Boosting
Lasso Regularized Regression (Generalized PathSeeker)
Random Forests (working from the original written by Leo Breiman)

Salford Systems SPM supports the importation of data from a wide range of data sources, including (but not limited to) plain text, Excel, SAS and IBM/SPSS binary files, and data residing in relational databases.

Salford Systems SPM is in-memory processing software that lets you build and analyze models in a very short time. SPM can be run on a laptop or on a server. The company is currently working on a new version that supports the building and running of models on a distributed system (such as a cluster of servers).

CART Decision Tree is one of the visualization tools that the software provides. Other visualization tools include representations of Random Forests and TreeNet gradient boosting.

Salford Systems SPM is currently used by many companies in the banking and insurance industries. The product also has a presence in other areas such as retail sales forecasting, biomedical, wildlife, and education research.

SAP Predictive Analytics

SAP provides various enterprise predictive analytics solutions that aim to help businesses extract more value from their data. SAP solutions can run what-if scenarios designed to find the best outcome in a relatively short time. They can help you deploy your predictive findings across applications and mobile devices.

SAP Predictive Analytics consists of
SAP Predictive Analysis and SAP InfiniteInsight
SAP Lumira
SAP HANA
SAP HANA Studio
SAP HANA and SAP HANA Studio can help with data preparation and model deployment. SAP Lumira can help with data preparation, data exploration, and data discovery. SAP Predictive Analysis and SAP InfiniteInsight can help you throughout all phases of the predictive analytics model's lifecycle.

SAP Predictive Analysis provides statisticians and data scientists with a productive environment for data modeling and data visualization. With drag-and-drop capabilities, analysts can do data selection, preparation, and processing. You can create models by using native predictive algorithms and by employing algorithms from the statistical language R.

SAP HANA supports in-memory analytics and provides native predictive algorithms for in-database processing. In addition, the tool is integrated with Hadoop for preprocessing and built-in text analysis.

With no prior programming knowledge required, you can use SAP InfiniteInsight Scorer to deploy optimized scoring equations directly in-database; you can schedule model refreshes as often as needed.

SAP Lumira provides transformation tools to facilitate data preparation, most notably the capability to merge datasets on the basis of common attributes. The transformations are subsequently applied to new data when the data is refreshed.

Common predictive analytics algorithms are supported in SAP HANA, including K-means, K-nearest neighbor, C4.5 decision tree, multiple linear regression, ABC classification, and weighted score tables.

SAP Predictive Analytics has a big presence in retail, consumer packaged goods, finance, and manufacturing.

SAS

SAS provides enterprise solutions for predictive analytics that address diverse business problems, including risk management, supply chain, customer behavior models, and fraud detection. SAS helps to implement data-driven decision processes in almost every industry, with more emphasis on the financial and communications industries.
For example, SAS is applied in fraud detection and prevention in the financial and public sectors, as well as for credit scoring in the financial industry. Energy and service companies apply SAS in the area of preventive maintenance: analyzing machine-generated sensor data to predict and avoid equipment downtime. SAS provides data analytics products for data mining, text analytics, forecasting, optimization, and simulation. SAS can be used for data clustering, classification, predictions, anomaly detection, link analysis, and data-correlation detection.
SAS encapsulates many different predictive analytics algorithms and capabilities into a workflow where you don't need to write a lot of code and can focus on solving the problem. Within the workflow, SAS allows you to run different analytic models in parallel, let them compete against each other, and then select the optimal one. The selected model can then be evaluated and deployed in operations for data-driven decision-making.

SAS provides quick visualizations of raw data as well as visualizations of the analytical results. The visualizations are dynamic, and the analytical results can be visualized at every step in the analytics workflow.

Data scientists can use both a visual programming interface and a programming environment to develop their predictive models, which should keep their learning curve manageable.

SAS is being used not only at very large enterprises, but also at small and mid-size businesses. The software can handle many different data sources (such as blogs, databases, and log files) and various data types (structured, unstructured, streamed, and so on).

SAS has been designed to confront the challenges of big data. It provides both distributed and in-memory processing and works with big data stored in a traditional RDBMS as well as in Hadoop.

StatSoft STATISTICA Data Miner

STATISTICA is a large-scale enterprise software application that offers a comprehensive set of capabilities for advanced data analytics, data mining, and predictive analytics.

STATISTICA provides Windows server-based solutions for real-time scoring, batch scoring, and process monitoring, and it can handle complex data mining problems. There is also a Windows desktop version that can run predictive analytics on a dataset and derive insights in just a few minutes. STATISTICA can be used by all types of users, and can support both small businesses and large enterprises.
STATISTICA was designed to address the emerging properties of big data, including the processing of unstructured data in Hadoop or other distributed file systems. It offers a simple graphical user interface that is compatible with Microsoft-based environments. It is a Microsoft Windows-compliant tool that allows you to easily perform a variety of Microsoft operations such as opening and viewing Microsoft Excel spreadsheets, accessing SQL Server, and developing queries, all without writing any SQL.
The tool can integrate data from other data analytics tools such as SAS, JMP, SPSS, and others. It encapsulates a wide array of methods and algorithms for data preparation including data cleaning, transformation, outlier detection, duplicate detection, and others.

STATISTICA provides a comprehensive set of common predictive analytics tools that can be easily incorporated into your model in just a few clicks. It also offers a range of different visualizations for both raw data and analytical results. The tool offers a platform where the results can be shared, ed, printed or sent to other applications.

Data processing can be done in-memory or in a hard-loop-based system, and there are no limits to data file size. The tool can operate in either virtual or cloud environments.

STATISTICA is very easy to use. New users don't have to be trained or specialized to build predictive models.

STATISTICA has been used extensively in regulated pharmaceutical manufacturing companies (to analyze such factors as stability shelf life), where it supports validation of all analytics, version control, audit logs, electronic signatures, approval processes, and other features that support governance of enterprise-wide data mining and data analytics. The product is also used at financial services (for example, for risk modeling) and insurance companies (for such purposes as fraud detection), and across many other industries.

STATISTICA guides the modeler through a predictive model's development lifecycle, including such stages as data acquisition, data modeling, deployment, and post-deployment phases and managing the model.

TIBCO Spotfire

TIBCO Spotfire is an enterprise platform that provides powerful analytic capabilities to discover data, build models, and easily share work with business managers and stakeholders. TIBCO Spotfire enables you to present important and relevant predictive analytics to the business users.
TIBCO Spotfire is fully integrated with R, S+, SAS, and MATLAB. It also provides new capability using TIBCO Enterprise Runtime for R (TERR). These capabilities allow organizations to take advantage of any pre-built predictive analytics models that use these technologies or to create new ones. TERR is an enterprise analytic engine that addresses enterprise needs in terms of efficiency and performance. It's fully compatible with the R language, and was built on the basis of TIBCO's expertise with the S+ engine.
Spotfire predictive modeling tools allow you to build predictive models with simple steps that don't require any statistical programming knowledge. Analysts can use a range of convenient tools to run enhanced ad hoc analysis. The results can easily be shared with a wider variety of team members within the organization.

The interface is intuitive and not overcrowded. It uses multiple tabs for different functionalities, encourages interactions, and offers advanced visualization capability.

The application has an interactive visual platform that provides great filtering capability. Using a few clicks to filter the available parameters, you can go through complex what-if scenarios and ad hoc analyses in a matter of seconds. The outcome of this filtering capability for a given scenario is immediately displayed to you visually.

TIBCO Spotfire excels at in-memory processing and is equipped with event-driven analytics (which makes it an attractive solution for streamed data), and real-time analytics. It's currently widely used in such industries as life sciences, energy, and financial services.

Through its partners' products, TIBCO Spotfire can offer in-database analytics and text analytics. These partnerships allow TIBCO Spotfire to simultaneously handle big data and unstructured data.

TIBCO Spotfire empowers you to create actionable decisions through
Easy data discovery and model building
Interactive and intuitive visualization
Ease of application sharing across the organization