Data Mining for Everyone



Christoph Sieb, Senior Software Engineer, Data Mining Development
Dr. Andreas Zekl, Manager, Data Mining Development

Executive Summary

Contents
- Data mining in the context of business intelligence and dynamic warehousing
- Creating a data mining application with InfoSphere Warehouse
- Business scenarios

Business intelligence systems are key tools for providing organizations with important business insights, allowing for fast and reliable business decisions. Data warehousing is an integral part of the business intelligence systems used by many organizations. Data mining, by contrast, is not as widespread as data warehousing, even though it is a mature methodology and technology that can be used to help discover hidden patterns in data and to enable predictive capabilities. Data mining is not only an analytic tool for knowledge discovery; it is also an important part of analytical and operational business processes that operate in real time and thus contributes directly to a successful business. In this paper, we show data mining in the context of IBM's dynamic warehousing strategy and outline the functionality that comes with IBM InfoSphere Warehouse. We then use a set of real-world business scenarios to illustrate the business value of those data mining solutions.

Data mining in the context of business intelligence and dynamic warehousing

Business intelligence comprises software, hardware, methods, applications, and best practices that provide fast and reliable insights into the current business and thus allow for fast and reliable business decisions. Business intelligence technologies have been used for decades. Data mining is the most advanced business intelligence analysis component: it produces more business insights than other analytical components such as online analytical processing (OLAP) [6] and standard reporting tools. Data mining originated in the late 1980s and has grown in importance since then, but many people still believe that using data mining techniques is difficult and, therefore, do not exploit its inherent business potential.

Dynamic warehousing is a new approach that addresses the primary business challenges that organizations face today: they need to deliver the right information to the right people at the right time in order to leverage business data more effectively and make better business decisions. Dynamic warehousing is about providing information on demand to optimize real-time processes. It provides four key features:

- Support for real-time access to aggregated, cleansed information representing a single version of the truth that can be delivered in the context of the activities and processes being performed
- The ability to extract knowledge from unstructured information (1)
- Embedded analytics that can be leveraged as part of a business process
- A complete set of integrated capabilities that extends beyond the data warehouse to enable the use of information on demand

Data mining plays an integral role in the first two of these key features. In the following pages, we show how you can embed data mining into a business process, providing real-time data access, and how you can use data mining technologies to extract additional business insight from unstructured information, which has traditionally been difficult to do.

But what exactly is data mining, how can it be used, and who can use it? Data mining is defined as "the nontrivial extraction of implicit, previously unknown and potentially useful information from data" [4]. Data mining methods can be categorized into two groups:

- Discovery methods. These find patterns and associations in data that can be used to make business decisions. For example, for cross-selling purposes, a company can find out which products are bought together, or, using customer segmentation, it can decide how to advertise to different groups of customers.
- Prediction methods. These use historical data to create models that help to predict currently unknown values of new data: for example, sales forecasts, stock price predictions, medical prognoses, and potential churners (2).

(1) In contrast to structured data, which is atomic data such as a name or a person's age, unstructured data is most often continuous text, such as e-mails, comments written by call center agents, or textual reports.
(2) Churners are customers who move to a competitor.
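To make the distinction concrete, the short sketch below illustrates the simplest building block of a discovery method: counting how often pairs of products appear together in purchase transactions, as in market basket analysis. It is a minimal, self-contained Python illustration only; the product names and the support threshold are invented, and InfoSphere Warehouse performs this kind of analysis with its own association rule mining operators.

```python
from collections import Counter
from itertools import combinations

# Each transaction is the set of products bought together in one basket (invented data).
transactions = [
    {"tv", "hdmi_cable", "soundbar"},
    {"tv", "hdmi_cable"},
    {"laptop", "mouse"},
    {"tv", "soundbar"},
    {"laptop", "mouse", "usb_hub"},
]

# Count how often each unordered product pair co-occurs.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Report pairs bought together in at least 40% of all baskets ("support").
min_support = 0.4
for (a, b), count in pair_counts.most_common():
    support = count / len(transactions)
    if support >= min_support:
        print(f"{a} and {b} bought together in {support:.0%} of baskets")
```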

The creation of models from data is called modeling, and using these models to perform predictions or segmentations is called scoring. Creating a model often requires analyzing large amounts of data and discovering relationships within it, which is a computing-intensive task. By contrast, scoring matches other data against a model, and each record can be matched independently. In real-time applications, the data to match is often just a single record. Thus, you can easily embed scoring into business processes, for example by using SOA-based Web services [8].

Data mining is an iterative process that is described by the CRoss Industry Standard Process for Data Mining (CRISP-DM) [9]. CRISP-DM comprises six steps, as shown in Figure 1:

1. Business understanding. A business analyst defines what he or she wants to investigate and which information might help in making business decisions and business improvements: that is, which information might help to achieve a business goal.
2. Data understanding. After the business goal is defined, the data necessary to reach that goal must be identified. First, samples, statistics, and visualizations are used to get a feeling for the data and to identify quality problems within it. For this step, a business analyst, a data mining specialist, and a data warehouse administrator must work together.
3. Data preparation. This is often the most time-consuming step. Based on the outcome of step 2, the existing data is merged and transformed into the right format for the data mining methods. This task is usually performed by a data warehouse administrator or a data warehouse application developer.
4. Modeling. The data mining model is created. This task is performed by the data mining specialist and the data warehouse application developer.
5. Evaluation. The results are evaluated against the business goals; if the results are unsatisfactory, one or more steps of the iterative process are repeated.
6. Deployment. When the results are satisfactory, the data mining flow can be deployed as a data warehouse application and made available to users. Most of the complexity is hidden from these users; only a few people outside the user groups must understand the whole data mining process in detail.
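The split between a heavyweight modeling step and a lightweight, per-record scoring step can be sketched as follows. This is a generic illustration using the open-source scikit-learn library, not the InfoSphere Warehouse API; the features, labels, and data are invented.

```python
from sklearn.linear_model import LogisticRegression

# Modeling: learn from historical records (computing-intensive, done offline).
# Features: [age, account_balance], label: 1 = reliable customer, 0 = not.
historical_features = [[25, 1200], [40, 5300], [33, 150], [58, 9800], [22, 75]]
historical_labels = [1, 1, 0, 1, 0]

model = LogisticRegression()
model.fit(historical_features, historical_labels)

# Scoring: apply the finished model to a single new record (cheap, real time).
# This is the step that can be wrapped in a Web service and called from a
# business process for each individual customer.
new_record = [[37, 2400]]
probability_reliable = model.predict_proba(new_record)[0][1]
print(f"Probability that this customer is reliable: {probability_reliable:.2f}")
```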

Figure 1. CRISP-DM process model

All of these steps are supported by functions available in InfoSphere Warehouse, as described in the next section.

Creating a data mining application with InfoSphere Warehouse

InfoSphere Warehouse includes an Eclipse-based [10] tool called Design Studio (see Figure 2), which enables you to perform all data warehouse design tasks. You can define and deploy data models, compose SQL queries using wizards that simplify their creation, and create and deploy whole OLAP cubes. You can even design and deploy dashboards without writing a single line of code. You can also perform data mining within Design Studio. You can gather and transform data with minimal SQL knowledge, and you can easily create and visualize mining models. Furthermore, you can deploy mining flows that you intend to use in a business process directly from the Design Studio workbench, which greatly simplifies life cycle management.

Page 6 the Design Studio workbench, which greatly simplifies life cycle management. Statistical functions and integrated visualizers such as multivariate distribution views help during the data-understanding step. Figure 2. Design Studio All the pre-processing and mining functionality in Design Studio is encapsulated in operators that you can drag onto a flow editor. You can then connect the operators according to the specific business scenario. Figure 3 shows a simple data mining flow for customer segmentation in the banking industry. The flow consists of five operators connected from left to right.

Figure 3. Simple data mining flow

The upper table source reads customer bank transactions from the corporate transaction table, while the lower table source reads the customer master data. Both tables are joined into a single table that feeds the clustering operator. Finally, the cluster model is passed to the visualizer, which displays the learned customer segmentation model shown in Figure 4. The visualizer shows the discovered segments and the characteristic data distribution of each cluster.

Figure 4. Visualized clustering model
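The same logical flow, two table sources, a join, and a clustering step, can be expressed outside Design Studio as a short script. The sketch below uses pandas and scikit-learn purely to illustrate the structure of the flow; the table and column names are invented and nothing here reflects the actual InfoSphere Warehouse operators.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Table source 1: aggregated bank transactions per customer (invented data).
transactions = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "monthly_card_spend": [250.0, 1900.0, 75.0, 640.0],
    "num_transfers": [3, 18, 1, 7],
})

# Table source 2: customer master data (invented data).
master_data = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [23, 47, 68, 35],
    "account_balance": [800.0, 15200.0, 3100.0, 5400.0],
})

# Join operator: combine both sources into a single input table.
joined = transactions.merge(master_data, on="customer_id")

# Clustering operator: segment customers into groups with similar behavior.
features = joined.drop(columns=["customer_id"])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
joined["segment"] = kmeans.fit_predict(features)

# In Design Studio the resulting model would go to a visualizer; here we
# simply print the segment assignment per customer.
print(joined[["customer_id", "segment"]])
```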

InfoSphere Warehouse provides many more operators for preprocessing and data mining. Table 1 lists the data mining methods available in InfoSphere Warehouse (3); for each method, typical problem scenarios that it can solve are listed.

Table 1. Data mining methods of InfoSphere Warehouse and typical scenarios

Clustering
  Perform customer segmentation for selective marketing
  Group similar types of houses, and use the data in city planning
  Detect fraudulent users

Classification
  Understand which customers are valuable
  Predict which customers are valuable
  Perform churn analysis
  Predict heart attacks

Value prediction (regression)
  Predict claim amounts of insurance customers
  Predict cholesterol values

Association rule mining
  Perform market basket analysis
  Support cross-selling

Sequential pattern mining
  Perform market basket analysis
  Market selectively
  Plan purchases

(3) Each category provides one or more algorithms, depending on the specific analysis problem.

Note: Design Studio is not a full-blown statistics workbench. It is a tool for data warehouse application developers with data mining expertise, not for statisticians. In most cases, the Design Studio functionality is sufficient; however, if you need a full-blown statistics workbench, InfoSphere Warehouse offers integration points with workbenches from non-IBM vendors. For this purpose, InfoSphere Warehouse Data Mining supports the standardized Predictive Model Markup Language (PMML) [11] to exchange models. Thus, you can easily integrate models created by other tools, for example into a scoring flow that is itself used inside a business process.
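Because models are exchanged as PMML documents, a model created in one tool can, in principle, be loaded and scored by any PMML-aware tool. The fragment below is a rough sketch of that idea using the third-party pypmml package for Python; the package choice, the file name, and the input fields are assumptions for illustration and are not part of InfoSphere Warehouse.

```python
# Sketch only: assumes the third-party "pypmml" package is installed and that
# a classification model has been exported from another workbench as PMML.
from pypmml import Model

# Load a PMML model file produced by some other mining tool (file name and
# field names are invented for this example).
model = Model.fromFile("customer_churn_model.pmml")

# Score a single record, exactly as an embedded scoring step would do.
record = {"age": 42, "monthly_spend": 87.5, "contract_months": 14}
result = model.predict(record)
print(result)  # e.g. predicted class and per-class probabilities
```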

Although Design Studio is mainly a tool for data warehouse application developers, data mining experts, and IT-savvy business analysts, the data mining flows that you deploy from Design Studio can be made accessible to a much wider variety of users: managers, financial analysts, marketing staff, sales staff, consultants in a banking call center, or even cashiers using a cash desk computer (see the Retail scenario in the next section). After you deploy a data mining flow to an application server, you can use it to visualize data mining results, for example in a Web application built with Alphablox [7] and Miningblox. You can also use an SOA architecture with Web services to perform real-time scoring. Another option is to access deployed data mining applications from other front-end applications, such as Cognos, and incorporate them into those applications' dashboards.

The following section describes some business scenarios in more detail. However, the possibilities of data mining in the context of dynamic warehousing are much broader than those described in these scenarios.

Business scenarios

In this section, we describe scenarios for four types of businesses (retail, banking, insurance, and mobile network operator), representing parts of real-world business processes. Each scenario is introduced with the business requirement. Next, a data mining approach is described that can improve the business process in terms of customer satisfaction, revenue, and profit. Finally, a scenario that uses the data mining approach is described.

Retail: Support product selling by creating discount coupons on the fly

Business requirement

A consumer electronics retailer plans to support sales of specific products by providing coupons for special offers at the cash desk. The coupons must be created on the fly, taking into consideration the customer's buying behavior. If the customer can be identified with a credit or customer card, the system should consider historical shopping patterns. If the customer cannot be identified, only the just-bought products should be considered.

Data mining approach

The retailer records purchase transactions in its data warehouse. Using Design Studio, the retailer creates a data mining flow that extracts association rules and sequential patterns from those transactions; that is, the retailer performs market basket analysis (for details, refer to [5]). The retailer deploys the flow on an application server that runs the analysis regularly so that results stay up to date. Furthermore, the retailer installs a scoring Web service that predicts the products a customer is most likely to buy, based on the products currently in his or her market basket and on his or her previous shopping history.

Scenario using the data mining approach

The sales manager receives a monthly report on sales figures through e-mail from the corporate business intelligence system. He notices that a newly introduced product is not selling as well as expected, even though it was advertised the week before its introduction. He logs on to the business intelligence system using a Web browser, selects the product of concern, and takes a closer look at its sales figures using the integrated OLAP capabilities. He realizes that the product is selling especially poorly in the western region of the company's sales area. To support sales of this specific product, the system provides the ability to automatically create coupons for selected regions and products. The manager selects the affected region and product, and the system automatically takes all rules from the regular market basket analysis that contain the poorly selling product in the rule's consequent (for more details on rules, refer to [5]). Using those rules, the system issues a coupon to those customers who are most likely to be interested in the poorly selling product. For example, an association rule might state that customers who bought products X and Y also bought the poorly selling product in 40% of the cases. According to this rule, a coupon promoting the poorly selling product is created whenever a customer buys products X and Y. After the manager confirms his selection, the corresponding rules are deployed in real time. This deployment sets up a scoring service that can be accessed through an SOA-based Web service. Inputs to the Web service are the just-bought products and, if available, the customer card or credit card ID that is needed to refer to previously bought products. The service returns the products to cross-sell according to the previously deployed rules.
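At its core, such a coupon decision is a simple rule-matching step: if all products in a rule's antecedent are in the current basket, the rule fires and its consequent becomes a coupon candidate. The sketch below illustrates that step in plain Python; the rules, confidence values, and product names are invented, and a real deployment would obtain the rules from the deployed market basket analysis and expose the check as a Web service.

```python
# Invented association rules of the form: antecedent -> consequent (confidence).
rules = [
    ({"product_x", "product_y"}, "slow_selling_player", 0.40),
    ({"tv"}, "hdmi_cable", 0.65),
]

def coupon_candidates(basket, min_confidence=0.30):
    """Return consequents of all rules whose antecedent is fully contained
    in the current basket and whose confidence passes the threshold."""
    candidates = []
    for antecedent, consequent, confidence in rules:
        if antecedent <= basket and confidence >= min_confidence:
            candidates.append((consequent, confidence))
    return candidates

# A customer pays for products X and Y at the cash desk.
current_basket = {"product_x", "product_y"}
for product, confidence in coupon_candidates(current_basket):
    print(f"Print coupon for {product} (rule confidence {confidence:.0%})")
```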

Now, the rules can be accessed automatically by the cash desk computer through a call to the Web service in the central data warehouse. Whenever a customer with the promising buying behavior pays, the cash desk automatically creates a coupon with a price reduction on the poorly selling product. This process allows for a selective, cost-effective promotion in real time, based on an underlying Web service that was itself deployed in real time. Thus, the time to react is drastically reduced, which avoids large opportunity costs and increases both customer retention and sales.

Banking: Detect reliable loan customers

Business requirement

A bank decides to start offering a new consumer loan product. To avoid losses due to non-payment, the bank wants to learn about the reliability of its customers by using records of their other loans, thus improving decision-making. The bank also wants to record new consumer loan contracts. Information about loan decisions must be comprehensible both for the customer and for the customer consultant.

Data mining approach

The sales manager and the internal data analyst use Design Studio to create a data mining flow that produces a classification model from historical data. The historical data comprises not only demographic data about the customer (such as age, gender, and family status) but also transactional data (such as account balances and transfers). Additionally, after each loan process is closed, a customer consultant classifies the loan record according to whether the customer was reliable. This classification data is used to create a decision tree model (for details, refer to [5]) that classifies new customers as reliable or unreliable. The decision tree model also provides a confidence level for each decision (0-100%). If a loan request is refused because there is a high degree of confidence that the customer is unreliable, the decision tree provides a comprehensible explanation that can be given to the customer.
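A decision tree is attractive in this scenario precisely because it yields both a confidence value and a human-readable reason for each decision. The following sketch shows that combination with scikit-learn; it is a generic illustration with invented features and data, not the model the bank would actually build in Design Studio.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented historical loan records: [age, account_balance, missed_payments].
features = [[25, 1200, 2], [40, 5300, 0], [33, 150, 4], [58, 9800, 0], [29, 700, 1]]
labels = ["unreliable", "reliable", "unreliable", "reliable", "reliable"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(features, labels)

# Score a new loan applicant and read off the confidence of the decision.
applicant = [[31, 900, 3]]
decision = tree.predict(applicant)[0]
confidence = tree.predict_proba(applicant).max()
print(f"Decision: {decision} (confidence {confidence:.0%})")

# The tree itself serves as the comprehensible explanation for the customer.
print(export_text(tree, feature_names=["age", "account_balance", "missed_payments"]))
```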

The mining flow is deployed to an application server that creates a new, updated decision tree every month to incorporate newly available customer classification data, which improves the decision-making process. Additionally, the decision tree model is made available as a Web service to score new customers. A customer ID from a previously created customer record is used as input to the Web service to retrieve the customer-related data. The service then classifies the reliability of the customer with a certain degree of confidence.

Scenario using the data mining approach

A customer wants to take out a loan. He enters the bank and asks for the new consumer loan product. First, the consultant collects general information about the customer and enters it into the banking software system. The system automatically invokes the Web service running on the bank's application server, using the customer's data as input. The Web service immediately classifies the customer, and the consultant can review the classification together with the corresponding confidence value and the reason for the classification. This information helps the consultant make better decisions about which customers are reliable and thus reduce losses.

Insurance: Determine insurance rates in real time

Business requirement

An insurance company offers car insurance that it distributes over the Internet and through call centers. The company wants to give a customer using the company's Web site, or a call center agent, the ability to determine an individual insurance rate by providing data such as age, gender, and car type.

Data mining approach

The company's car insurance manager uses Design Studio to create a prediction model. The model incorporates customer demographic data, data about the customer's car(s), and data about former claims. Additionally, the cost and profit margin of a single insurance contract are incorporated into the model, so that an individual insurance rate can be predicted from the customer's characteristics. After the manager has created the model, he deploys it to an application server, using Web services to make the results available to other applications. Call center agents and customers using the Internet can access the prediction model in real time.
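Predicting an individual rate is a value prediction (regression) problem, the third method in Table 1. A minimal sketch of the idea, again with scikit-learn and entirely invented data and rates, might look like the following; the real model would of course be trained on the insurer's historical contracts and claims.

```python
from sklearn.linear_model import LinearRegression

# Invented historical contracts: [driver_age, car_power_kw, prior_claims].
contracts = [[19, 110, 1], [45, 85, 0], [33, 140, 2], [60, 70, 0], [27, 95, 1]]
# Monthly rates (EUR) that turned out to cover cost plus profit margin.
rates = [95.0, 42.0, 120.0, 38.0, 61.0]

model = LinearRegression()
model.fit(contracts, rates)

# Real-time quotation: a call center agent enters the caller's data and the
# deployed model (called directly here) returns an individual rate.
caller = [[38, 100, 0]]
print(f"Suggested monthly rate: {model.predict(caller)[0]:.2f} EUR")
```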

Scenario using the data mining approach

A customer calls the call center and asks for a car insurance quotation. The call center agent asks the customer for the necessary data and uses it as input for the previously deployed Web service to obtain a precise insurance rate, without needing detailed knowledge of the insurance industry.

Call center: Improve churn analysis

Business requirement

The call center of a mobile network operator is responsible for new customer contracts, customer questions, complaints, and general service support. An important focus is customers who are considering canceling their contracts and switching to another operator. The call center agents must recognize those customers and avoid the cancellations by making them special offers.

Data mining approach

Often, it is not easy to detect customers who are likely to switch to another company. Data mining provides ways to create models that explain why customers tend to cancel their contracts and predict which customers are likely to do so. In the data mining context, this type of analysis is called churn analysis. As in the banking scenario, customers can be classified, using a classification model, into "likely to switch" and "likely to stay" customers. The operator also uses an existing automated segmentation of customers into similar groups to create special offers tailored to the characteristics of those groups. In addition, the company has a large, untapped source of information about its customers: the unstructured textual comments recorded by the call center agents during calls. This information is stored in a database, but ordinary methods cannot exploit it, so the company had not been using it to improve the automatic identification of customers who might switch to a different operator. With InfoSphere Warehouse, the company can now exploit the textual information. InfoSphere Warehouse provides analysis methods for unstructured data in a call center application context, allowing the unstructured data to be automatically converted into structured data. To implement this solution, a data analyst creates domain-specific dictionaries that extract structured information from the unstructured text.

InfoSphere Warehouse provides support for creating the dictionaries and revising them over time. The additional information from the unstructured text can significantly improve the prediction accuracy of the classification model and thus increases the probability of preventing customers from switching operators. The created dictionaries, the classification model, and the customer group model are deployed as a Web service on the company's application server to make them accessible from other business applications in real time.

Scenario using the data mining approach

A call center agent talks to a customer who is asking for price information. During the call, the agent retrieves the customer's record and adds a comment that states "Customer asked for fixed-price offers." After the comment is committed, the system automatically submits the information to the deployed Web service. The Web service automatically extracts "fixed-price offers" from the comment and requests a prediction about whether the customer is likely to switch. To make the prediction, the system uses the extracted information together with structured data about the customer, such as city, age, current contract details, and call history. The system identifies the customer as likely to switch and retrieves the special offer for the corresponding customer group (here, people who often make international calls). In this case, the system immediately suggests a fixed-price contract with an additional initial credit for international calls, which the agent offers to the customer. The customer is attracted by this offer and requests a contract change.

The reason the customer asked about fixed-price options was that he was dissatisfied with his current contract and planned to compare offers from different competitors. By analyzing the classification model, it is possible to understand the underlying intention of the customer: in this case, the customer lives in the southern part of the company's territory, where a local competitor had lured customers away with a special offer for fixed-price contracts. This fact was detected while building the classification model. Thus, the special offer with an initial credit convinced the customer to stay.
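The dictionary-based conversion of free text into structured features, which is what the deployed text analysis step does before classification, can be illustrated in a few lines of Python. The dictionary entries and the comment are invented; InfoSphere Warehouse builds and applies such dictionaries with its own unstructured-data tooling.

```python
# Invented domain-specific dictionary: phrases mapped to structured features.
dictionary = {
    "fixed-price": "asked_about_fixed_price",
    "cancel": "mentioned_cancellation",
    "too expensive": "complained_about_price",
}

def extract_features(comment):
    """Turn a free-text agent comment into binary features for the model."""
    text = comment.lower()
    return {feature: int(phrase in text) for phrase, feature in dictionary.items()}

comment = "Customer asked for fixed-price offers."
features = extract_features(comment)
print(features)
# {'asked_about_fixed_price': 1, 'mentioned_cancellation': 0, 'complained_about_price': 0}

# These flags would then be joined with structured customer data (age, city,
# contract details) and passed to the churn classification model for scoring.
```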

Summary and conclusions

This paper provided a short introduction to data mining in the context of dynamic warehousing and business intelligence. It also showed how InfoSphere Warehouse supports the development of data mining applications. A selection of business scenarios gave an idea of the enormous potential of data mining with real-time access. Using the powerful data warehouse technology of InfoSphere Warehouse, including tooling (Design Studio) and in-line analytics (such as those provided by Alphablox and Miningblox), you can create many solutions with only a small amount of custom development effort. As a result, you can reduce both the time and the cost needed to implement business intelligence solutions such as those described in the business scenarios.

Further reading

1. IBM Data Warehousing and Business Intelligence, http://www.ibm.com/software/data/db2bi/
2. InfoSphere Warehouse documentation, http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/com.ibm.dwe.welcome.doc/dwev9welcome.html
3. DB2 9.5 for Linux, UNIX, and Windows manuals, http://www-1.ibm.com/support/docview.wss?rs=71&uid=swg27009727
4. W. Frawley, G. Piatetsky-Shapiro, and C. Matheus, "Knowledge Discovery in Databases: An Overview." AI Magazine, pp. 213-228, ISSN 0738-4602, 1992.
5. C. Ballard, J. Rollins, J. Ramos, A. Perkins, R. Hale, A. Dorneich, E. C. Milner, and J. Chodagam, Dynamic Warehousing: Data Mining Made Easy. IBM International Technical Support Organization Redbooks publication, ISBN 0738488860, September 2007. Available at http://www.redbooks.ibm.com/abstracts/sg247418.html; last accessed 03/17/2008.
6. M. Alcorn and M. Flasza, OLAP and Cubing Services, to be published 2008.

7. C. Ballard, A. Beaton, D. Chiou, J. Chodagam, M. Lowry, A. Perkins, R. Phillips, and J. Rollins, Leveraging DB2 Data Warehouse Edition for Business Intelligence. IBM International Technical Support Organization Redbooks publication, ISBN 0738488860, published November 2006, last updated September 2007. Available at http://www.redbooks.ibm.com/abstracts/sg247274.html; last accessed 03/17/2008.
8. M. Endrei, J. Ang, A. Arsanjani, S. Chua, P. Comte, P. Krogdahl, M. Luo, and T. Newling, Patterns: Service-Oriented Architecture and Web Services. IBM International Technical Support Organization Redbooks publication, ISBN 0738496685, published April 2004, last updated July 2004. Available at http://www.redbooks.ibm.com/abstracts/sg246303.html; last accessed 03/17/2008.
9. CRoss Industry Standard Process for Data Mining (CRISP-DM), http://www.crisp-dm.org/process/index.htm; last accessed 03/17/2008.
10. The Eclipse Foundation, The Eclipse Project, http://www.eclipse.org/; last accessed 03/17/2008.
11. Data Mining Group, PMML Standard, http://www.dmg.org/; last accessed 03/17/2008.

© Copyright IBM Corporation, 2008. All Rights Reserved.

IBM, the IBM logo, Alphablox, DB2, InfoSphere, and Redbooks are registered trademarks or trademarks of International Business Machines Corporation in the United States, other countries, or both. Windows is a trademark of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

References in this publication to IBM products and services do not imply that IBM intends to make them available in all countries in which IBM operates. Neither this documentation nor any part of it may be copied or reproduced in any form or by any means or translated into another language without the prior consent of all of the above-mentioned copyright owners.

IBM makes no warranties or representations with respect to the content hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. IBM assumes no responsibility for any errors that may appear in this document. The information contained in this document is subject to change without notice. IBM reserves the right to make any such changes without obligation to notify any person of such revision or changes. IBM makes no commitment to keep the information contained herein up to date.

The information in this document concerning non-IBM products was obtained from the supplier(s) of those products. IBM has not tested such products and cannot confirm the accuracy of the performance, compatibility, or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the supplier(s) of those products.