IBM SPSS Modeler Server performance and optimization


IBM Software, Business Analytics: SPSS Modeler Server

Improve performance and scalability in high-volume environments

Contents

Introduction
Performance and scalability
Optimizing performance
Advanced performance optimization
Scoping and sizing SPSS Modeler Server
Conclusion

Introduction

Predictive analytics offers organizations the ability to add predictive intelligence to the process of making decisions. Predictive intelligence improves the decisions made by individuals, groups, systems and organizations in multiple business areas such as customer analytics, operational analytics and proactive risk and fraud mitigation. Data mining is at the core of predictive analytics because it helps organizations understand the patterns in their data. As a result, organizations can make the smart decisions that drive superior outcomes.

IBM SPSS Modeler is a data mining workbench that enables improved decision-making with quick development of predictive models and quick deployment of these models into business operations. SPSS Modeler:

- Works in a variety of operating environments
- Can scale from a single desktop to an enterprise-wide deployment
- Supports virtually any data source (including Hadoop when used with IBM SPSS Analytics Server)
- Provides the ability to incorporate structured and unstructured data

It is available in three editions:

- IBM SPSS Modeler Professional uncovers hidden patterns in structured data with advanced algorithms, data manipulation and automated modeling and preparation techniques.
- IBM SPSS Modeler Premium adds the ability to use natural language processing and sentiment analysis on text data as part of a predictive analytics project. Entity analytics disambiguates identities, and social network analysis identifies influencers in social networks.
- IBM SPSS Modeler Gold includes the full range of predictive capabilities for structured and unstructured data. Users can combine, optimize and deploy predictive models and business rules to an organization's processes and operational systems to provide recommended actions at the point of impact. As a result, people and systems can make the right decision every time.

All editions of SPSS Modeler use a client/server architecture. The client provides the visual workbench for predictive analytics. The server adds increased performance and efficiency, along with features that support additional scale.

IBM SPSS Modeler Server is designed to improve performance by minimizing the need to move data in the client environment and by pushing memory-intensive operations such as scoring and data preparation to the server. SPSS Modeler Server also provides support for SQL pushback and in-database modeling capabilities so users can take better advantage of their existing infrastructure and data warehouse and further improve overall performance.

This paper highlights the capabilities and possibilities of SPSS Modeler Server, and it serves as a guide to understanding and maximizing SPSS Modeler Server performance. Initial sections provide performance benchmarking results for IBM SPSS Modeler Professional and IBM SPSS Modeler Premium rather than the performance of models post-deployment. Subsequent sections describe performance optimization and sizing recommendations.

Many of the results provided in this document address SPSS Modeler Server performance as it relates to issues of scalability. By utilizing options only available with the server (such as SQL pushback/generation, in-database mining, scoring adapters and more), users are able to fully exploit the client/server architecture to improve performance and deliver a quicker return on IT investment.

Performance and scalability

SPSS Modeler Server has been designed and developed to provide high performance and scalability for all data mining tasks. For example, SQL generation and parallel processing are automatic. As a result, SPSS Modeler users do not need to make any changes to the way they work to get consistently high performance.
To benchmark performance, IBM measured the ability of SPSS Modeler Server to carry out the common tasks of data preparation, model building and model scoring. IBM used a variety of operating environments and altered the size of the data files. Data mining involves more than simply model building and model scoring. Data preparation is a major component of the process. So, IBM's tests also evaluated the performance of common steps such as reading, sorting, aggregating, merging and writing data.

Reading and writing data

Times were recorded for reading the data sets in SPSS Modeler with the stream shown in Figure 1. The Sample node in these test streams means that IBM was able to record only the time taken to read the data in (no data write time was added).

Figure 1: Test stream to read the data into SPSS Modeler.

To obtain the results for writing to the various formats, the stream in Figure 2 was used. To improve performance, SPSS Modeler executes the reading and writing operations at the same time; the data writing operation starts before all the data has been read in. Therefore, when measuring how fast SPSS Modeler writes data, the time includes both reading and writing. For the most frequently used data formats (CSV, database table, SPSS Statistics files [.sav]), a million records are read in less than 30 seconds and written back to the source in less than a minute.

Figure 2: Test stream to write from SPSS Modeler to the file formats tested during benchmarking.

Benchmarking results show that, as the number of records in a dataset increases, so does the processing time for reading and writing (Figure 3). Overall, performance is slower for XML and Excel than for the other formats tested (CSV, database table and SPSS .sav). For CSV, database table and SPSS .sav, read time doubles when the number of records is increased from 100,000 to 1 million. The processing time for those formats remains below 25 seconds at 1 million records.
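The overlap between reading and writing described for the write test can be illustrated with a minimal Python sketch. It is not SPSS Modeler code and says nothing about how the product is implemented; it only shows the general idea of a streaming pipeline in which each record is written as soon as it has been read. The function names and the sample data are assumptions made for illustration.

```python
import csv
import io

def read_records(source):
    """Yield rows one at a time instead of loading the whole file first."""
    for row in csv.reader(source):
        yield row

def write_records(rows, sink):
    """Write each row as soon as it arrives; return the number of rows written."""
    writer = csv.writer(sink, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count

# In-memory stand-ins for the input and output files (illustrative data).
source = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
sink = io.StringIO()

# The writer pulls rows from the reader lazily, so writing starts
# before the reader has reached the end of the input.
written = write_records(read_records(source), sink)
```

Because the two stages are interleaved, a timing taken around the write step necessarily includes the read step as well, which is why the write results above are reported as combined read and write times.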

Figure 3: Execution times for reading data, and for reading and writing data, are shown for various data formats. Results for CSV, SPSS .sav and DB2 tables are plotted in seconds on the left axis and those for XML and Excel in minutes on the right axis to better highlight the differences and similarities in performance between data sources.

Results also show that performance is consistent across the various environments for reading and writing datasets in all of the file formats tested.

Sorting data

The sorting test involved sorting the data sets by a single column with the stream shown in Figure 4. For a more realistic reflection of a customer scenario, the times include both the time taken to read the data and the time taken to sort it.

Figure 4: Test stream for sorting data with SPSS Modeler.

Figure 5 shows that the sorting time of SPSS Modeler Server scales linearly with the number of records sorted, across increasingly powerful operating environments. The test results also show that the use of SQL pushback functionality provides a significant increase in performance for the sorting operation. When SQL pushback is enabled in a stream, the SQL instructions are pushed back and executed in the database itself. This means that performance depends on the database's operating environment rather than on SPSS Modeler.

Figure 5: Sorting data with SPSS Modeler. SQL pushback improves the performance when a database is used. The CSV file was stored locally on the server system and the IBM DB2 database was running on a remote system.

Data aggregation

A 5 million record data set was used to test aggregating with SPSS Modeler. The number of unique values that appeared in the designated field was scaled with the stream in Figure 6. For the test results to reflect how it would be used in an operating environment, the times measured for the SQL pushback also include the time it takes to read in the data.
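The difference between sorting in the client and pushing the sort back to the database can be sketched in Python. This is a hedged illustration only: an in-memory SQLite database stands in for the DB2 system used in the tests, and the table name, column names and values are assumptions. It does not show the SQL that SPSS Modeler actually generates.

```python
import sqlite3

# SQLite stands in for the remote database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(3, 12.5), (1, 99.0), (2, 40.0)],
)

# Without pushback: every row is fetched, then sorted outside the database.
rows = conn.execute("SELECT id, spend FROM customers").fetchall()
client_sorted = sorted(rows, key=lambda r: r[1])

# With pushback: the sort is expressed in SQL and runs inside the database.
db_sorted = conn.execute(
    "SELECT id, spend FROM customers ORDER BY spend"
).fetchall()
```

Both routes return the same ordering; what changes is where the work is done, which is why the pushback timings depend on the database system rather than on the machine running the stream.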

The test results show that the aggregation operation scales well as the number of unique categories to be aggregated increases. Figure 7 highlights the performance times. Note the dramatic improvement that SQL pushback provides when a database table, rather than a CSV file, is used as the source data for aggregation. The process completed in almost half the time required for the CSV file. This result was consistent in all the operating environments used for testing.

Figure 6: Test stream to aggregate data with SPSS Modeler.

Figure 7: Aggregating data with SPSS Modeler. Using SQL pushback within DB2 was almost twice as fast as using a CSV file.

Merging data

To check the performance of the merge operation, the stream in Figure 8 was used. Times were recorded as the size of the data sets increased. An inner join used a unique ID column, which means that the merge was one-to-one. Every record in the first data set had only one match in the second data set.
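A small Python sketch can make the aggregation and one-to-one merge concrete. It uses an in-memory SQLite database as a stand-in for DB2, and all table names, column names and values are hypothetical. With pushback, the database returns one row per category (or one merged row per ID) instead of shipping every source row to the client first.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.execute("CREATE TABLE owners (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "north", 10.0), (2, "south", 5.0), (3, "north", 2.5), (4, "south", 1.0)],
)
conn.executemany(
    "INSERT INTO owners VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "cy"), (4, "dee")],
)

# Client-side aggregation: all rows leave the database and are summed outside it.
client_totals = defaultdict(float)
for _id, region, amount in conn.execute("SELECT id, region, amount FROM sales"):
    client_totals[region] += amount

# Pushed-back aggregation: the database returns one row per region.
db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# One-to-one inner join on the unique ID column, executed in the database.
merged = conn.execute(
    "SELECT s.id, s.region, o.owner "
    "FROM sales s INNER JOIN owners o ON s.id = o.id "
    "ORDER BY s.id"
).fetchall()
```

Because `id` is unique in both tables, the join yields exactly one output row per input row, which mirrors the one-to-one merge used in the benchmark.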

Figure 8: Test stream to merge data with SPSS Modeler.

The test results show that the merge operation scales relatively linearly in relation to the number of records being merged (Figure 9). Yet again, the improvement in time with the use of a database and SQL pushback is evident. With SQL pushback, the merge has already taken place before the data is read out of the database and then brought into SPSS Modeler. The increase in performance is most notable at scale. The time recorded in these results includes the time that was taken for the data to be read into SPSS Modeler.

Figure 9: Merging data with SPSS Modeler. With SQL pushback, the merge has already taken place before the data is read out of the database into SPSS Modeler.

Text analytics

The ability to structure text is an important capability of IBM SPSS Modeler Premium. Including concepts derived from text increases modeling accuracy. For example, when predicting customer purchase propensity for a product, customer attitudes and preferences are often derived from surveys, call center notes and social media to augment behavioral and demographic data. Building a text model provides a way to apply a structure to new text based on the analysis done on historical or existing text.

The text analytics capabilities of IBM SPSS Modeler can use a variety of data sources. For testing purposes, however, IBM used . The testing for text analytics performance used the following input data: Approximately 500,000 s Average number of words per Average number of characters per

Text analytics model building

Tests were run to measure the performance of building a non-interactive (automatically derived) Text Mining concept model from Basic Resources and Opinions with the stream in Figure 10.

Figure 10: Test stream for text analytics model building.

The tests showed that, after initial training (which uses more overhead), performance accelerates (Figure 11).

Figure 11: Initially, training time uses more overhead. After the training is complete, performance accelerates.

Text analytics model scoring

After concepts are extracted, SPSS Modeler creates a text model that can be used in predictive streams. Scoring against the text model means that new text is categorized with the patterns established during the model building process. IBM's tests assessed the speed of scoring new records against an existing model with the stream in Figure 12.

Figure 12: Test stream for model scoring.

The test results showed that scoring performance is linear in relation to the number of records (Figure 13).

Figure 13: Scoring performance is linear in relation to the number of records.

TM1 integration

SPSS Modeler supports data imports from and data exports to IBM Cognos TM1. These operations are controlled by Cognos TM1 process scripts in the Cognos TM1 server. When an SPSS Modeler TM1 import or export operation is executed, SPSS Modeler runs these process scripts first (alongside any native SPSS Modeler processing that is required).

TM1 tests: Cube complexity levels

The cube complexity levels defined in the tests are based on test cubes created by the Cognos TM1 team. The objective was to best represent the different levels of complexity that a Cognos TM1 user might have in a cube. The following table shows how the complexity levels are defined.

[Table: for each of five test cubes, the cube complexity level (1 = simple, 5 = complex), the number of fields when data is viewed in SPSS Modeler, the number of dimensions and the number of measures]

TM1 import

TM1 Import works by passing a view from Cognos TM1 to SPSS Modeler for additional analysis (Figure 14). To achieve the best performance, users are encouraged to define Cognos TM1 views that are as specific as possible to reduce the overhead of moving large data files between Cognos TM1 and SPSS Modeler Server systems.

Figure 14: Cognos TM1 import that indicates that no records are output.

To represent a real user scenario, the cubes for the Cognos TM1 import test contained more information (a factor of 10 in relation to number of records) in the view than would be required based on the settings of the Cognos TM1 Import node. For example, when you import the simple cube 1 and read 1,000 records into SPSS Modeler, the size of the dataset passed over the network from Cognos TM1 to SPSS Modeler Server is actually 10,000 records. SPSS Modeler Server processes the full set and then filters this 10,000 record data set to the 1,000 records required for display. The largest dataset tested has 1 million records, and the size of the dataset passed over the network from Cognos TM1 to SPSS Modeler (before filtering) is 10 million records.

Tests were run by importing data from TM1 with scaling by both cube complexity and cube size (number of records). The graphs in Figure 15 show how the SPSS Modeler and Cognos TM1 integration scales linearly in relation to both aspects.

Figure 15: Cognos TM1 import test results. The integration scales linearly in relation to both aspects.

TM1 export

Tests were run by exporting data to Cognos TM1 (Figure 16) and scaling by both cube size (number of records) and cube complexity. The graphs in Figure 17 and Figure 18 show that the SPSS Modeler and Cognos TM1 integration scales linearly for both scaling aspects.

Figure 16: Export to Cognos TM1 stream.

Figure 17: Cognos TM1 export test results for number of cubes.

Figure 18: Cognos TM1 export test results for cube complexity.

Model building

Figure 19 shows a stream that was used to test model-building execution in SPSS Modeler. The test used datasets with 100,000 records, 500,000 records and 1 million records. Results show that model-building time scales linearly with the size of the data. Results are shown by model type to ease analysis and are grouped as follows:

- Classification models
- Segmentation models
- Association models
- Automated models

Figure 19: Test stream used for model building. In this case, it was a C5.0 model.

Classification models use the values of one or more input fields to predict the value of one or more output or target fields (for example, logistic regression or a decision tree). Neural Net had the slowest performance because of the sophistication and learning that the technique requires. However, all of the techniques built models in less than 3 minutes for a set of 1 million records (Figure 20).

Figure 20: Model building times for classification models. Most of the models completed in less than 30 seconds for 250,000 records, within 1 minute for 500,000 records and all completed in less than 3 minutes for 1 million records.

Segmentation models divide the data into segments or clusters of records that have similar patterns or characteristics; KMeans clustering is one example. They can also identify unusual patterns (anomaly detection). The KNN technique is included with this set, although it is typically used for classification. KNN classifies cases based on similarity to other cases nearby, which mirrors the computations that are done by classical segmentation models. Because more computation is used, its performance lags behind that of the other techniques shown. Anomaly, KMeans and Two Step were the quickest. Two Step completed within 1 minute for 1 million records. Figure 21 shows the results.

Figure 21: Model building times for segmentation models in SPSS Modeler. Two Step, KMeans and Anomaly were the quickest of the five models.

Association models are used to find patterns in data where one or more entities (such as events, purchases or attributes) are associated with one or more other entities. The models construct rule sets that define these relationships. For example, these techniques are used for Market Basket Analysis, which models the next likely purchase for a customer based on their previous purchases and identifies products that are typically bought together or in a certain sequence. Figure 22 shows that both the Carma and Apriori models were built in less than 30 seconds for a dataset with 1 million records.

Figure 22: Model building times for association models in SPSS Modeler. At 1,000,000 records, Apriori completed in 24 seconds and Carma completed in less than 17 seconds.

The automated models (Auto Classifier, Auto Cluster and Auto Numeric) estimate, compare and combine multiple modeling techniques in a single run. Automated models eliminate the need for users to test multiple techniques one after another. They are designed to make modeling easier for users who are unfamiliar with all of the underlying algorithms that IBM SPSS Modeler supports. Although ALM (Automated Linear Modeling) does not use multiple algorithms to build a model, it does have an automated data preparation step that transforms the target and predictor variables automatically to maximize the predictive power of the model it creates.

Figure 23 shows that the build time of the automated techniques is directly proportional to the size of the dataset. All complete within 8 minutes for 500,000 records and within 15 minutes for 1 million records. ALM completes within 2 minutes for 1 million records, which reflects the speed of the automatic data preparation.

Figure 23: Model building times for the automated models in SPSS Modeler.

Model scoring

Scoring is defined as applying a created model to new data. This process generates new data, which is typically a prediction (score). Multiple fields are typically calculated and appended to the records. Scoring can occur in batch or in real time. Batch scoring is done as an event. For example, each month you can score customers whose contract is up for renewal against a model that calculates whether and how likely they are to cancel. An example of real-time scoring is calculating a likelihood-of-fraud score and providing it to an agent who is recording an insurance claim, as the agent gathers data. Real-time scoring is provided in SPSS Modeler Gold and is used by organizations that are integrating predictive intelligence into operational systems.

IBM's tests recorded results for batch scoring with a model built on a data set with 10,000 rows and 20 columns. The resulting model was then used in a stream (Figure 24) and files of various sizes were then scored.

Figure 24: Test stream used for model scoring. In this case, it was a C5.0 model.

The results showed that, as the number of records being scored increased, the scoring rate of many models increased to a point and then remained constant. This increase occurs because the scoring process has an initial fixed overhead. This overhead is not related to the number of rows scored; it is a one-off cost. Therefore, the one-off cost becomes less important as the number of rows to be scored increases.
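The effect of a fixed overhead on the scoring rate can be worked through with a simple cost model. The sketch below assumes that total scoring time is a one-off overhead plus a constant cost per row; the two cost figures are invented for illustration and are not measurements from the benchmark.

```python
def scores_per_second(n_rows, fixed_overhead_s, per_row_s):
    """Throughput when total time = fixed overhead + per-row cost * rows."""
    total_seconds = fixed_overhead_s + n_rows * per_row_s
    return n_rows / total_seconds

# Assumed values, chosen only to illustrate the shape of the curve.
OVERHEAD_S = 2.0     # one-off start-up cost in seconds
PER_ROW_S = 0.0001   # cost of scoring one row in seconds

sizes = [1_000, 100_000, 10_000_000]
rates = [scores_per_second(n, OVERHEAD_S, PER_ROW_S) for n in sizes]

# The rate can approach, but never exceed, one row per per-row cost.
ceiling = 1.0 / PER_ROW_S
```

Under these assumptions the rate climbs from roughly 476 scores per second at 1,000 rows to roughly 8,333 at 100,000 rows, and then flattens just below the ceiling of 10,000. That is the "increase to a point, then constant" pattern described above.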

Figure 25: Model scoring times for classification models.

Figure 26: Model scoring times for segmentation and association models.

Figure 27: Model scoring times for the automated models.

Figures 25, 26 and 27 illustrate scoring performance for classification models (Figure 25), segmentation and association models (Figure 26) and automated models (Figure 27). Note that the charts show scores per second rather than elapsed time to allow for better comparisons between the test cases and your actual data scoring requirement.

Optimizing performance

SPSS Modeler Server achieves most of its high performance with optimizations that run by default. However, at times analysts and data miners will need more control over the optimization of their SPSS Modeler streams. SPSS Modeler Server supports this by providing immediate feedback upon execution.

In Figure 28, the SPSS Modeler stream is executed with SQL generation, and the nodes turn purple rather than the usual white. Purple nodes indicate that the operations they represent have been translated into SQL and executed in the database. This feedback helps users ensure that as much of the stream as possible is executed in the database. Additional options enable the user to examine the SQL that is generated.

Figure 28: SQL generation and highlighting in an SPSS Modeler stream. The nodes have turned purple to indicate that they have been translated into SQL and executed in-database.

R integration

With SPSS Modeler, users can execute R syntax from SPSS Modeler R nodes. For performance testing, the focus was on the R model building and R model scoring operations. R model building can be run natively in SPSS Modeler Server. The R syntax is parsed by SPSS Modeler and sent to the R program to process. R model scoring can either be run natively in SPSS Modeler Server (with the same technique used for model building) or with the R in-database scoring functionality. For R in-database scoring, R is present in the database so that processing can be done in the same database system that stores the data. Reducing data movement improves performance. Using R in-database techniques for R model scoring is significantly faster than native R scoring in SPSS Modeler Server. For the performance tests, the focus was on running R in-database scoring with an IBM Netezza database.

R model building

IBM's tests measured model building performance for the R Linear model [lm()] algorithm, and they were run by scaling the number of records used for model building. Three data points were used: 250,000 records, 500,000 records and 1 million records. The test data was 20 fields wide (1 target field, 19 model input fields). Model building times were recorded with the stream setup in Figure 29. The R syntax used in the Modeler R model building node is:

    modelerModel <- lm(modelerData$offer_PROFIT_1~.,modelerData)

Figure 29: The stream setup used to test R model building.

R model building scales linearly in SPSS Modeler (Figure 30). An R model build operation is at the lower end of performance relative to other native SPSS Modeler model building operations. However, even with the additional overhead, a model is built within 5 minutes for 1 million records.

Figure 30: Processing time is within 5 minutes for 1 million records.

R model scoring

IBM ran tests to measure the performance for scoring the R Linear model [lm()] algorithm. The model used for model scoring was built against a dataset with 10,000 rows x 20 columns. The model scoring was batch scoring. Model scoring times were recorded with the stream setup in Figure 31. The syntax used in the R model scoring node is:

    result <- predict(modelerModel, newdata=modelerData)
    var1 <- c(fieldName="predicted", fieldLabel="", fieldStorage="real", fieldFormat="", fieldMeasure="", fieldRole="")
    modelerDataModel <- data.frame(modelerDataModel, var1)
    modelerData <- cbind(modelerData, result)

Figure 31: Stream used to test R model scoring.

The test scenario represents a customer who has their data stored in a database and wishes to score their data with an R model and write the results of the scoring operation back to the database.

Figure 32: R in-database scoring performance results.

The test results (Figure 32) show that R model scoring performance can be significantly increased by using the R in-database scoring techniques available in SPSS Modeler. The gain comes mainly from reduced data transfer between the database and SPSS Modeler, because the data does not need to leave the database to be scored.

SQL pushback to improve model scoring times

Certain models in SPSS Modeler Server have functions that enable SQL to be generated, pushing back the model scoring stage to the database itself. For modeling streams that use these models, the entire scoring procedure is pushed back to the database as SQL. The models with these functions are:

- C5.0
- C&R Tree (CART)
- CHAID
- Quest
- Decision List
- Logistic Regression
- Neural Net
- PCA
- Linear Regression

Figure 33 shows that the model scores per second metric increases dramatically when SQL pushback is enabled. Most models improve performance by about 10 times. Because the SQL generated by Logistic Regression and Neural Net is exceedingly complex, those models do not show the kind of improvement that others do.

In-database algorithm support to reduce data movement

Many organizations have invested heavily in a database infrastructure for predictive analytics and business intelligence systems, but these systems are often under-utilized. One of the key benefits of SPSS Modeler Server is that it enables organizations to fully utilize their investments in high-performance database systems. With SPSS Modeler Server, organizations can take advantage of algorithms that are native to the database environment along with the many additional data preparation and modeling procedures that are native to SPSS Modeler. Database-native algorithms are often tuned to perform better on the underlying database, and users often see performance improvements from those algorithms.
The following algorithms are available in SPSS Modeler for use with the respective database:

- IBM InfoSphere algorithms: Decision Trees, Association Rules, Demographic Clustering, Kohonen Clustering, Sequence Rules, Transform Regression, Linear Regression, Polynomial Regression, Naive Bayes, Logistic Regression, Time Series
- Netezza algorithms: Decision Trees, K-Means, Bayes Net, Naive Bayes, KNN, Divisive Clustering, PCA, Regression Tree, Linear Regression, Time Series, Generalized Linear
- Microsoft SQL Server algorithms: Decision Trees, Clustering, Association Rules, Naive Bayes, Linear Regression, Neural Network, Logistic Regression, Time Series, Sequence Clustering
- Oracle algorithms: Naive Bayes, Adaptive Bayes, Support Vector Machine (SVM), Generalized Linear Models (GLM), Decision Tree, O-Cluster, k-means, Nonnegative Matrix Factorization (NMF), Apriori, Minimum Description Length (MDL), Attribute Importance (AI)

Figure 33: Model scores per second with and without SQL pushback. SQL pushback is a feature only available with IBM SPSS Modeler Server.
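To make the idea of pushing model scoring back to the database concrete, the following Python sketch translates a tiny decision tree into a nested SQL CASE expression and runs it inside an in-memory SQLite database. Everything here is an assumption made for illustration: the tree, the field names, the table and the translation function are hypothetical, and the SQL that SPSS Modeler Server generates for its own models is not shown in this paper.

```python
import sqlite3

# Hypothetical two-level decision tree: internal nodes test "field <= threshold".
TREE = {
    "field": "age", "threshold": 30,
    "left": {"label": "low"},
    "right": {
        "field": "income", "threshold": 50000,
        "left": {"label": "medium"},
        "right": {"label": "high"},
    },
}

def score_in_python(node, record):
    """Score one record outside the database by walking the tree."""
    while "label" not in node:
        branch = "left" if record[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def tree_to_sql(node):
    """Translate the tree into a nested CASE expression (illustrative only)."""
    if "label" in node:
        return "'" + node["label"] + "'"
    return "CASE WHEN {f} <= {t} THEN {l} ELSE {r} END".format(
        f=node["field"], t=node["threshold"],
        l=tree_to_sql(node["left"]), r=tree_to_sql(node["right"]),
    )

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER, income INTEGER)")
records = [(1, 25, 80000), (2, 45, 30000), (3, 52, 90000)]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)

# Scoring pushed into the database: only IDs and scores come back.
sql = "SELECT id, " + tree_to_sql(TREE) + " AS score FROM customers ORDER BY id"
db_scores = conn.execute(sql).fetchall()

# Scoring outside the database, for comparison.
py_scores = [
    (i, score_in_python(TREE, {"age": a, "income": inc})) for i, a, inc in records
]
```

A tree maps naturally onto CASE expressions, which is consistent with tree models benefiting strongly from pushback. A model with many weighted terms would translate into a much larger expression, which fits the observation above that Logistic Regression and Neural Net gain less.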

SQL pushback In-database algorithm support Scoring Adapter provided Read-Write (no SQL pushback) Read only (no SQL pushback) IBM DB2 Enterprise Server Edition X X X IBM DB2 for i (formerly i5/OS) X IBM DB2 for z/OS X X IBM Informix IBM InfoSphere Classic Federation Server for z/OS IBM Netezza Data Warehouse X X X Greenplum Database X Microsoft SQL Server X X MySQL Oracle Database X X Salesforce.com SAP HANA X SAP Sybase IQ X Teradata X X X X X X

SPSS Modeler Scoring Adapters

SPSS Modeler Scoring Adapters can expand the scope of in-database scoring beyond those models supported by SQL pushback. The Scoring Adapters can be applied to a much wider set of models, which provides more deployment flexibility to those who work with large data sets stored in enterprise warehouses. The Scoring Adapters enable a user to install a set of user-defined functions (UDFs) that execute the model scoring operation in the native database and eliminate the need to move data from the database.

IBM provides Scoring Adapters for Netezza, Teradata, IBM DB2 for Linux, UNIX and Windows, DB2 for AIX and DB2 for z/OS. IBM SPSS Scoring Adapters for Netezza, Teradata and DB2 (IBM AIX and Linux only) also support scoring text analytics models in the database. The support of these models is an important feature because SQL pushback is not available for text models. Figure 34 demonstrates that Scoring Adapters can improve the performance of text models. This improvement is especially true for concept models.

Figure 34: Scoring performance for text analytics concept model with native SPSS Modeler Server and SPSS Modeler Scoring Adapters.

Intelligent SQL generation within stream execution to improve performance

SPSS Modeler Server intelligently reorders operations in the SPSS Modeler stream to maximize performance without altering results. Analysts or data miners can organize streams in a way that makes sense to them, while SPSS Modeler Server reorganizes those operations in a way that makes sense to the database. Figure 35 shows a Derive node that has an operation that cannot be carried out in the database, whereas the Select node can be pushed back to the database as indicated by its purple color.

Figure 35: SPSS Modeler Server optimizes the process so that the Select operation is performed before the Derive operation, which reduces data transfer and improves performance.

In-database caching

IBM SPSS Modeler Server lets a user enable caching on a given node. Caching prevents the reading of data that has not changed. When data passes through the node the first time, the cache is filled with data. On subsequent runs, data is read from the cache rather than from the data source. Applying caching selectively means that data that changes is read at run time, while data that is consistent between runs is not read multiple times. This caching can be a useful way to ensure that memory-intensive data processing is only executed once.

Normally, the cache is stored as a temporary file on the file system, but SPSS Modeler Server can also cache this data into a temporary table in the database. It can then be accessed through the many SQL optimization options available in SPSS Modeler Server and can result in even more significant performance gains. Automatically generating SQL for all nodes that are attached to the cache can improve performance even further. In Figure 36, the Merge operation is highlighted, indicating that the operation is being executed in the database from the filled database cache.
Figure 36: Setting a cache on a node that is likely to be re-executed will store the data in a temporary table on the database (where possible), enabling further in-database operations from that node on.
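The reordering and database caching ideas can be sketched together in Python, with an in-memory SQLite database standing in for the warehouse. The table, the `derive` function and the cache table name are all hypothetical, and the sketch does not represent how SPSS Modeler Server decides what to reorder. Note that moving the selection ahead of the derivation is only safe here because the selection does not depend on the derived field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE orders (id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "UK", 10.0), (2, "US", 20.0), (3, "UK", 30.0), (4, "DE", 40.0)],
)

def derive(row):
    """Stand-in for a Derive operation that cannot be expressed in SQL."""
    id_, country, amount = row
    return (id_, country, amount, "big" if amount >= 25 else "small")

# As drawn in the stream: Derive first, then Select. Every row leaves the database.
all_rows = conn.execute(
    "SELECT id, country, amount FROM orders ORDER BY id"
).fetchall()
as_drawn = [r for r in map(derive, all_rows) if r[1] == "UK"]

# Reordered: the Select runs in the database, so fewer rows are transferred.
uk_rows = conn.execute(
    "SELECT id, country, amount FROM orders WHERE country = 'UK' ORDER BY id"
).fetchall()
reordered = [derive(r) for r in uk_rows]

# Database cache: store the selected rows in a temporary table so that later
# operations can run in SQL against the cache instead of the original source.
conn.execute(
    "CREATE TEMP TABLE uk_cache AS "
    "SELECT id, country, amount FROM orders WHERE country = 'UK'"
)
cached_total = conn.execute("SELECT SUM(amount) FROM uk_cache").fetchone()[0]
```

Both orderings produce the same result, but the reordered version moves two rows out of the database instead of four. On real data volumes that difference in data transfer is where the performance gain comes from.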

Optimizing for very large data sets

SPSS Modeler Server provides options, associated with selected models, that let users specify that they are working with very large data sets (VLDs). These VLD options were referred to as PSM options during benchmark testing. They divide the data into smaller data sets and build one model on each data set. The most accurate models are then automatically selected and assembled to create a single final model nugget.

IBM tests focused on the scalability of the VLD options and compared them with a Neural Net model. These tests demonstrated considerable time savings when working with VLDs and using the SPSS Modeler Server VLD options on multi-processor machines. Tests were run on building the ALM, CART and Neural Net models and used three data sets: 1 million records, 5 million records and 10 million records. A high specification Windows Server system (16 CPUs, ~130GB RAM) was used. Test results show how the Modeler PSM functionality can take advantage of a system's additional CPUs to use parallelism to increase the performance of the model building functionality (Figure 37).

Figure 37: SPSS Modeler PSM functionality can take advantage of a system's additional CPUs to use parallelism to increase the performance of the model building functionality.
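The split, build and assemble pattern described for very large data sets can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the PSM implementation: the per-partition model is a simple least-squares line, the selection rule is lowest error on a small holdout set, the final prediction is an average, and a thread pool stands in for the multi-CPU parallelism. All names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def fit_line(points):
    """Least-squares fit of y = slope * x + intercept on one partition."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx

def mse(model, points):
    """Mean squared error of a (slope, intercept) model on some points."""
    slope, intercept = model
    return mean((slope * x + intercept - y) ** 2 for x, y in points)

def build_ensemble(points, n_parts, keep, holdout):
    """Split the data, fit one model per part, keep the most accurate ones."""
    parts = [points[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        models = list(pool.map(fit_line, parts))
    return sorted(models, key=lambda m: mse(m, holdout))[:keep]

def predict(ensemble, x):
    """Combine the kept models by averaging their predictions."""
    return mean(slope * x + intercept for slope, intercept in ensemble)

# Illustrative data that follows y = 2x + 1 exactly.
data = [(float(x), 2.0 * x + 1.0) for x in range(1, 41)]
holdout = [(50.0, 101.0), (60.0, 121.0)]

ensemble = build_ensemble(data, n_parts=4, keep=2, holdout=holdout)
prediction = predict(ensemble, 10.0)
```

Each partition is a quarter of the data, so each individual fit is cheaper than a fit on the full set, and the fits are independent of one another. That independence is what allows additional CPUs to shorten the overall build time in the tests above. In CPython a process pool, rather than threads, would be needed for real CPU parallelism; threads are used here only to keep the sketch short.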

Advanced performance optimization

SPSS Modeler and SPSS Modeler Server provide a number of additional advanced capabilities that enable data miners to optimize the performance of their streams.

Database bulk-loading to relieve bottlenecking

Data movement is often a performance bottleneck, especially when writing to a database. SPSS Modeler provides a number of features to optimize this process for large data volumes. By default, writing to a database occurs row by row. This prevents errors and provides data security but slows performance. Enabling SPSS Modeler to commit multiple rows at a time is a good way to gain more reasonable performance, and this option is available by default.

In addition to the batch committal of records, SPSS Modeler supports two types of bulk loading. One is provided through ODBC bulk loading facilities and the other uses an external bulk loading tool for a database-native solution (Figure 38).

Figure 38: The DB Export: Advanced Options dialog box easily enables bulk loading to the database with ODBC or an external loader.

External bulk loading scripts are provided for IBM DB2, IBM Intelligent Miner for Data, IBM Netezza Performance Server, IBM Redbrick Warehouse, Microsoft SQL Server, Oracle Data Miner and Teradata Warehouse databases. These scripts can be customized, and custom scripts may be written for other databases.
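The difference between committing every row and committing in batches can be shown with a small Python sketch. SQLite stands in for the target database here, and the table name scores and the function write_in_batches are hypothetical. The sketch does not reproduce SPSS Modeler's export logic or its bulk loaders. It only demonstrates batch committal.

```python
import sqlite3
from itertools import islice


def write_in_batches(conn, rows, batch_size=1000):
    """Insert rows with one commit per batch; return the number of commits."""
    commits = 0
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO scores (id, score) VALUES (?, ?)", batch)
        conn.commit()  # one commit covers the whole batch
        commits += 1
    return commits


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (id INTEGER, score REAL)")
    conn.commit()
    rows = ((i, i * 0.5) for i in range(2500))
    commits = write_in_batches(conn, rows, batch_size=1000)
    count = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
    return commits, count
```

With 2,500 rows and a batch size of 1,000, the rows are written with three commits instead of 2,500. The trade-off is that a failure affects a whole batch rather than a single row.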

Database indexing

Indexing database tables maintains the performance of in-database options. Correct indexing significantly affects many subsequent database operations. SPSS Modeler Server enables users to create indexes on tables that are exported from SPSS Modeler (Figure 39). Simple indexes can be created fairly easily. Users can also customize the SQL statement used to create the index (for instance, to create a BITMAP, UNIQUE, or FILLFACTOR index).

Figure 39: Create indexes on database tables from within IBM SPSS Modeler Server to improve database performance.

Optimized joins and sorts

By default, SPSS Modeler makes conservative assumptions about the state of data in the system. For example, it cannot assume that any data has already been sorted. Therefore, many operations sort data, even if such a sort is redundant. SPSS Modeler enables a user to optimize a sort or join operation by specifying any existing sorts on the data. This eliminates redundancy and improves performance.

Users can also optimize the performance of SPSS Modeler with special-case algorithms for joins. SPSS Modeler's default join algorithm is designed for optimized performance when joining data sets of similar size. In some very common operations, such as using a join to connect an ID in one table to a label or description from another (for example, joining a product code in a table of transactions to a product name in a look-up table), the default join is inefficient. SPSS Modeler offers an alternate join algorithm for these situations, which significantly improves performance.
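The source does not say which algorithm the alternate join uses. A common technique for the look-up case it describes is a hash join, in which the small table is loaded into memory once and the large table is streamed past it without any sorting. The Python sketch below illustrates that technique under this assumption. The function name lookup_join and the choice to keep unmatched transactions (with None as the product name) are hypothetical.

```python
def lookup_join(transactions, products):
    """Join each transaction to its product name through an in-memory dictionary.

    transactions: iterable of (txn_id, product_code, amount), possibly very large
    products:     iterable of (product_code, product_name), assumed to be small
    """
    name_by_code = dict(products)  # the small table is read once
    for txn_id, code, amount in transactions:
        # Constant-time lookup per record; neither input needs to be sorted.
        yield (txn_id, code, name_by_code.get(code), amount)


def demo():
    transactions = [(1, "A", 9.5), (2, "B", 3.0), (3, "Z", 1.0)]
    products = [("A", "Apple"), ("B", "Banana")]
    return list(lookup_join(transactions, products))
```

The cost is one pass over each table plus memory proportional to the look-up table, which is why this approach suits a large table joined to a small one better than an algorithm tuned for inputs of similar size.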

Parallel processing to improve performance

Symmetric Multi-Processor (SMP) machines are widely used and available for all platforms supported by SPSS Modeler Server. They consist of multiple CPUs that share access to the same memory, disk, network and other input and output resources. When a multi-threaded application runs on an SMP machine, threads can be distributed over the CPUs and execute truly in parallel. Application processes and individual threads can usually migrate dynamically between CPUs to balance processor load. This process is generally handled transparently by the operating system.

SPSS Modeler uses a parallel data sorting algorithm to improve the performance of a number of data processing operations. Sorting is used by many SPSS Modeler operations, including binning, model evaluation, merge and the sort operation itself. All of these operations benefit from the parallelization of the sort operation. The parallelized sort algorithm uses a technique called record parallelism. This technique assigns records in round-robin fashion to separate sorting processes. Each process sorts its own subset of records, and the sorted results are then combined. Sort times can be reduced by more than 30 percent when running on multi-processor hardware and at high data volumes.

Scoping and sizing SPSS Modeler Server

Many factors must be considered when scoping hardware requirements for an SPSS Modeler Server installation. The breadth of operations and differences in data volumes make it difficult to estimate performance for any specific hardware configuration.

Impact of CPUs on performance

The core speed of any individual CPU affects data mining performance. Almost all data mining operations, especially modeling, depend heavily on processors, so an increase in CPU speed will produce a proportional increase in performance for many SPSS Modeler processes. The main benefits of multiple CPUs (or multicore CPUs) occur when running multiple streams. Therefore, the number of users will often be the deciding factor in determining the optimum number of CPUs. Multiple CPUs will also benefit parallelized operations, but the main benefits will be from supporting multiple users, as shown in the following table.

Number of users    Number of CPUs

For a production server that is running scheduled data mining via IBM SPSS Collaboration and Deployment Services, the number of CPUs should be determined by the number of separate processes that must run simultaneously. Maximum performance can be achieved, for example, by splitting a model scoring process over multiple CPUs or building multiple models simultaneously.
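The record-parallel sort described under "Parallel processing to improve performance" can be sketched in Python as follows. This is an illustration of the technique, not SPSS Modeler's implementation: the function name record_parallel_sort is hypothetical, and combining the sorted partitions with a k-way merge is an assumption about how the results are joined. A thread pool keeps the sketch self-contained. In CPython a process pool would be needed for CPU-bound sorting to run on several cores at once.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def record_parallel_sort(records, workers=4):
    """Sort a list by dealing records round-robin to workers, then merging."""
    # Round-robin assignment: record i goes to partition i % workers.
    partitions = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sorts only its own subset of records.
        sorted_parts = list(pool.map(sorted, partitions))
    # A k-way merge combines the sorted subsets into one sorted sequence.
    return list(heapq.merge(*sorted_parts))


def demo():
    return record_parallel_sort([5, 3, 9, 1, 7, 2, 8], workers=3)
```

Round-robin assignment gives every worker nearly the same number of records regardless of how the input is ordered, so no single worker becomes the bottleneck.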

Impact of disk space on performance

Before addressing disk space requirements, you must understand the volume of data that is likely to be used for the actual data mining. Most organizations store many terabytes of data, especially transactional data, but the full volume is rarely used for analysis. Normally the data is aggregated, selected or sampled before it is used. While large data volumes are typically used in model scoring, the model scoring processes usually rely on operations that do not use a lot of system resources.

Disk usage for data processing steps can be relatively high when you are trying to maximize performance. The user often caches data to minimize execution times, and some operations will spill to disk when physical memory is unavailable. In addition, some operations can produce a data set larger than the raw input data, further increasing disk requirements. Given that the large data preparation steps are typically executed infrequently (it is best practice to store the results of such processing as intermediate files or tables), a conservative rule of thumb is to reserve three to five times the disk space required to store the original data.

Conclusion

The ever-growing amount of data (size and variety) created by organizations presents opportunities and challenges for data mining. SPSS Modeler enables users to use the full range of available data (structured and unstructured) to build and deploy powerful predictive models. SPSS Modeler combines high performance, scalability, performance optimization and flexible hardware requirements to handle large and complex data mining projects easily.

With the features of IBM SPSS Modeler Server, your organization can:

Make the most of high-performance data mining and database investments and minimize data transfer costs.
Optimize the use of multiple CPUs (or multi-core CPUs) in your operating environment by using parallel processing during a number of data preparation and model-building operations.
Use in-database caching, database write-back with indexing and optimized merging to join tables outside of the database.
Incorporate data mining algorithms from other database vendors.

The end result is that your organization can use SPSS Modeler and SPSS Modeler Server to analyze larger volumes of data more efficiently and better integrate predictive analytics into your business processes. As a result, you shorten the time needed to turn your data into better business decisions that boost ROI.

About IBM Business Analytics

IBM Business Analytics software delivers data-driven insights that help organizations work smarter and outperform their peers. This comprehensive portfolio includes solutions for business intelligence, predictive analytics and decision management, performance management and risk management.

Business Analytics solutions enable companies to identify and visualize trends and patterns in such areas as customer analytics that can have a profound effect on business performance. They can compare scenarios; anticipate potential threats and opportunities; better plan, budget and forecast resources; balance risks against expected returns and work to meet regulatory requirements. By making analytics widely available, organizations can align tactical and strategic decision making to achieve business goals. For more information, see ibm.com/business-analytics.

Request a call

To request a call or to ask a question, go to ibm.com/businessanalytics/contactus. An IBM representative will respond to your inquiry within two business days.

Copyright IBM Corporation 2014

IBM Corporation Software Group Route 100 Somers, NY

Produced in the United States of America April 2014

IBM, the IBM logo, ibm.com, AIX, Cognos, DB2, InfoSphere, Intelligent Miner, SPSS, TM1, and z/OS are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web under "Copyright and trademark information."

Netezza is a registered trademark of IBM International Group B.V., an IBM Company. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary. It is the user's responsibility to evaluate and verify the operation of any other products or programs with IBM products and programs.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Please Recycle

YTW03026-USEN-03

IBM SPSS Modeler Professional

IBM SPSS Modeler Professional IBM SPSS Modeler Professional Make better decisions through predictive intelligence Highlights Create more effective strategies by evaluating trends and likely outcomes. Easily access, prepare and model

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

Make Better Decisions Through Predictive Intelligence

Make Better Decisions Through Predictive Intelligence IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Expand

More information

IBM SPSS Modeler Professional

IBM SPSS Modeler Professional IBM SPSS Modeler Professional Make better decisions through predictive intelligence Highlights Create more effective strategies by evaluating trends and likely outcomes. Easily access, prepare and model

More information

IBM SPSS Modeler 15 In-Database Mining Guide

IBM SPSS Modeler 15 In-Database Mining Guide IBM SPSS Modeler 15 In-Database Mining Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 217. This edition applies to IBM SPSS Modeler

More information

Solve your toughest challenges with data mining

Solve your toughest challenges with data mining IBM Software Business Analytics IBM SPSS Modeler Solve your toughest challenges with data mining Use predictive intelligence to make good decisions faster 2 Solve your toughest challenges with data mining

More information

Better planning and forecasting with IBM Predictive Analytics

Better planning and forecasting with IBM Predictive Analytics IBM Software Business Analytics SPSS Predictive Analytics Better planning and forecasting with IBM Predictive Analytics Using IBM Cognos TM1 with IBM SPSS Predictive Analytics to build better plans and

More information

Make Better Decisions Through Predictive Intelligence

Make Better Decisions Through Predictive Intelligence IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly

More information

IBM SPSS Modeler Premium

IBM SPSS Modeler Premium IBM SPSS Modeler Premium Improve model accuracy with structured and unstructured data, entity analytics and social network analysis Highlights Solve business problems faster with analytical techniques

More information

IBM SPSS Direct Marketing

IBM SPSS Direct Marketing IBM Software IBM SPSS Statistics 19 IBM SPSS Direct Marketing Understand your customers and improve marketing campaigns Highlights With IBM SPSS Direct Marketing, you can: Understand your customers in

More information

The IBM Cognos Platform

The IBM Cognos Platform The IBM Cognos Platform Deliver complete, consistent, timely information to all your users, with cost-effective scale Highlights Reach all your information reliably and quickly Deliver a complete, consistent

More information

The power of IBM SPSS Statistics and R together

The power of IBM SPSS Statistics and R together IBM Software Business Analytics SPSS Statistics The power of IBM SPSS Statistics and R together 2 Business Analytics Contents 2 Executive summary 2 Why integrate SPSS Statistics and R? 4 Integrating R

More information

Solve your toughest challenges with data mining

Solve your toughest challenges with data mining IBM Software IBM SPSS Modeler Solve your toughest challenges with data mining Use predictive intelligence to make good decisions faster Solve your toughest challenges with data mining Imagine if you could

More information

Predictive analytics with System z

Predictive analytics with System z Predictive analytics with System z Faster, broader, more cost effective access to critical insights Highlights Optimizes high-velocity decisions that can consistently generate real business results Integrates

More information

IBM SPSS Modeler 14.2 In-Database Mining Guide

IBM SPSS Modeler 14.2 In-Database Mining Guide IBM SPSS Modeler 14.2 In-Database Mining Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 197. This edition applies to IBM SPSS Modeler

More information

IBM Cognos Analysis for Microsoft Excel

IBM Cognos Analysis for Microsoft Excel IBM Cognos Analysis for Microsoft Excel Explore and analyze data in a familiar spreadsheet format Highlights Explore and analyze data drawn from IBM Cognos TM1 models and IBM Cognos Business Intelligence

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

Making confident decisions with the full spectrum of analysis capabilities

Making confident decisions with the full spectrum of analysis capabilities IBM Software Business Analytics Analysis Making confident decisions with the full spectrum of analysis capabilities Making confident decisions with the full spectrum of analysis capabilities Contents 2

More information

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances IBM Software Business Analytics Cognos Business Intelligence IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances 2 IBM Cognos 10: Enhancing query processing performance for

More information

Improve Results with High- Performance Data Mining

Improve Results with High- Performance Data Mining Clementine 10.0 Specifications Improve Results with High- Performance Data Mining Data mining provides organizations with a clearer view of current conditions and deeper insight into future events. With

More information

IBM Analytical Decision Management

IBM Analytical Decision Management IBM Analytical Decision Management Deliver better outcomes in real time, every time Highlights Organizations of all types can maximize outcomes with IBM Analytical Decision Management, which enables you

More information

Working with telecommunications

Working with telecommunications Working with telecommunications Minimizing churn in the telecommunications industry Contents: 1 Churn analysis using data mining 2 Customer churn analysis with IBM SPSS Modeler 3 Types of analysis 3 Feature

More information

Scorecarding with IBM Cognos TM1

Scorecarding with IBM Cognos TM1 Scorecarding with IBM Elevating the role of metrics in high-participation planning Highlights Link high-par ticipation planning, budgeting and forecasting processes to actual performance results. Model

More information

IBM Cognos Enterprise: Powerful and scalable business intelligence and performance management

IBM Cognos Enterprise: Powerful and scalable business intelligence and performance management : Powerful and scalable business intelligence and performance management Highlights Arm every user with the analytics they need to act Support the way that users want to work with their analytics Meet

More information

IBM Cognos TM1 on Cloud Solution scalability with rapid time to value

IBM Cognos TM1 on Cloud Solution scalability with rapid time to value IBM Solution scalability with rapid time to value Cloud-based deployment for full performance management functionality Highlights Reduced IT overhead and increased utilization rates with less hardware.

More information

Better decision making under uncertain conditions using Monte Carlo Simulation

Better decision making under uncertain conditions using Monte Carlo Simulation IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

More information

Improve Model Accuracy with Unstructured Data

Improve Model Accuracy with Unstructured Data IBM SPSS Modeler Premium Improve Model Accuracy with Unstructured Data Highlights Easily access, prepare and integrate structured data and text, Web and survey data Support the entire data mining process

More information

IBM Cognos Insight. Independently explore, visualize, model and share insights without IT assistance. Highlights. IBM Software Business Analytics

IBM Cognos Insight. Independently explore, visualize, model and share insights without IT assistance. Highlights. IBM Software Business Analytics Independently explore, visualize, model and share insights without IT assistance Highlights Explore, analyze, visualize and share your insights independently, without relying on IT for assistance. Work

More information

Predictive Analytics for Donor Management

Predictive Analytics for Donor Management IBM Software Business Analytics IBM SPSS Predictive Analytics Predictive Analytics for Donor Management Predictive Analytics for Donor Management Contents 2 Overview 3 The challenges of donor management

More information

Making critical connections: predictive analytics in government

Making critical connections: predictive analytics in government Making critical connections: predictive analytics in government Improve strategic and tactical decision-making Highlights: Support data-driven decisions using IBM SPSS Modeler Reduce fraud, waste and abuse

More information

Business Analytics for Big Data

Business Analytics for Big Data IBM Software Business Analytics Big Data Business Analytics for Big Data Unlock value to fuel performance 2 Business Analytics for Big Data Contents 2 Introduction 3 Extracting insights from big data 4

More information

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve

More information

Jabil builds momentum for business analytics

Jabil builds momentum for business analytics Jabil builds momentum for business analytics Transforming financial analysis with help from IBM and AlignAlytics Overview Business challenge As a global electronics manufacturer and supply chain specialist,

More information

Achieve Better Insight and Prediction with Data Mining

Achieve Better Insight and Prediction with Data Mining Clementine 11.1 Specifications Achieve Better Insight and Prediction with Data Mining Data mining provides organizations with a clearer view of current conditions and deeper insight into future events.

More information

Easily Identify Your Best Customers

Easily Identify Your Best Customers IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

More information

Achieve Better Insight and Prediction with Data Mining

Achieve Better Insight and Prediction with Data Mining Clementine 12.0 Specifications Achieve Better Insight and Prediction with Data Mining Data mining provides organizations with a clearer view of current conditions and deeper insight into future events.

More information

The IBM Cognos family

The IBM Cognos family IBM Software Business Analytics Cognos Software The IBM Cognos family Analytics in the hands of everyone who needs it 2 The IBM Cognos Family Overview Business intelligence (BI) and business analytics

More information

How To Use Social Media To Improve Your Business

How To Use Social Media To Improve Your Business IBM Software Business Analytics Social Analytics Social Business Analytics Gaining business value from social media 2 Social Business Analytics Contents 2 Overview 3 Analytics as a competitive advantage

More information

Minimize customer churn with analytics

Minimize customer churn with analytics IBM Software Business Analytics Telecommunications Minimize customer churn with analytics Understand who s likely to churn and take action with IBM software 2 Minimize customer churn with analytics Contents

More information

Develop Predictive Models Using Your Business Expertise

Develop Predictive Models Using Your Business Expertise Clementine 8.5 Specifications Develop Predictive Models Using Your Business Expertise Clementine is an integrated data mining workbench, popular worldwide with data miners and business analysts alike.

More information

IBM Content Analytics adds value to Cognos BI

IBM Content Analytics adds value to Cognos BI IBM Software IBM Industry Solutions IBM Content Analytics adds value to Cognos BI 2 IBM Content Analytics adds value to Cognos BI Analyzing unstructured information It is generally accepted that about

More information

Solve Your Toughest Challenges with Data Mining

Solve Your Toughest Challenges with Data Mining IBM Software Business Analytics IBM SPSS Modeler Solve Your Toughest Challenges with Data Mining Use predictive intelligence to make good decisions faster Solve Your Toughest Challenges with Data Mining

More information

Get to Know the IBM SPSS Product Portfolio

Get to Know the IBM SPSS Product Portfolio IBM Software Business Analytics Product portfolio Get to Know the IBM SPSS Product Portfolio Offering integrated analytical capabilities that help organizations use data to drive improved outcomes 123

More information

IBM Software Information Management. Scaling strategies for mission-critical discovery and navigation applications

IBM Software Information Management. Scaling strategies for mission-critical discovery and navigation applications IBM Software Information Management Scaling strategies for mission-critical discovery and navigation applications Scaling strategies for mission-critical discovery and navigation applications Contents

More information

Harnessing the power of advanced analytics with IBM Netezza

Harnessing the power of advanced analytics with IBM Netezza IBM Software Information Management White Paper Harnessing the power of advanced analytics with IBM Netezza How an appliance approach simplifies the use of advanced analytics Harnessing the power of advanced

More information

Grow Revenues and Reduce Risk with Powerful Analytics Software

Grow Revenues and Reduce Risk with Powerful Analytics Software Grow Revenues and Reduce Risk with Powerful Analytics Software Overview Gaining knowledge through data selection, data exploration, model creation and predictive action is the key to increasing revenues,

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

IBM DB2 Near-Line Storage Solution for SAP NetWeaver BW

IBM DB2 Near-Line Storage Solution for SAP NetWeaver BW IBM DB2 Near-Line Storage Solution for SAP NetWeaver BW A high-performance solution based on IBM DB2 with BLU Acceleration Highlights Help reduce costs by moving infrequently used to cost-effective systems

More information

A financial software company

A financial software company A financial software company Projecting USD10 million revenue lift with the IBM Netezza data warehouse appliance Overview The need A financial software company sought to analyze customer engagements to

More information

Netezza and Business Analytics Synergy

Netezza and Business Analytics Synergy Netezza Business Partner Update: November 17, 2011 Netezza and Business Analytics Synergy Shimon Nir, IBM Agenda Business Analytics / Netezza Synergy Overview Netezza overview Enabling the Business with

More information

Driving business intelligence to new destinations

Driving business intelligence to new destinations IBM SPSS Modeler and IBM Cognos Business Intelligence Driving business intelligence to new destinations Integrating IBM SPSS Modeler and IBM Cognos Business Intelligence Contents: 2 Mining for intelligence

More information

SQL Server 2005 Features Comparison

SQL Server 2005 Features Comparison Page 1 of 10 Quick Links Home Worldwide Search Microsoft.com for: Go : Home Product Information How to Buy Editions Learning Downloads Support Partners Technologies Solutions Community Previous Versions

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within

More information

IBM InfoSphere Optim Test Data Management

IBM InfoSphere Optim Test Data Management IBM InfoSphere Optim Test Data Management Highlights Create referentially intact, right-sized test databases or data warehouses Automate test result comparisons to identify hidden errors and correct defects

More information

CUSTOMER Presentation of SAP Predictive Analytics

CUSTOMER Presentation of SAP Predictive Analytics SAP Predictive Analytics 2.0 2015-02-09 CUSTOMER Presentation of SAP Predictive Analytics Content 1 SAP Predictive Analytics Overview....3 2 Deployment Configurations....4 3 SAP Predictive Analytics Desktop

More information

Data Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

Data Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine Data Mining SPSS 12.0 1. Overview Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Types of Models Interface Projects References Outline Introduction Introduction Three of the common data mining

More information

The IBM Cognos family

The IBM Cognos family IBM Software Business Analytics Cognos software The IBM Cognos family Analytics in the hands of everyone who needs it The IBM Cognos family Overview Business intelligence (BI) and business analytics have

More information
