Pentaho Data Mining Last Modified on January 22, 2007




Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners.
For the latest information, please visit our web site at www.pentaho.org

Pentaho Data Mining Overview

Once you've got analysis, reporting, and dashboards deployed, it's time to take your Business Intelligence (BI) to the next level by adding data mining and workflow to the mix. This is a level of BI excellence that many organizations never reach. However, the importance of pushing ahead with advanced capabilities should not be underestimated: they can provide a truly sustainable competitive advantage and enable your organization to maximize both its efficiency and effectiveness.

Data mining is the process of running data through sophisticated algorithms to uncover meaningful patterns and correlations that may otherwise be hidden. These can be used to help you understand the business better, and they can also be exploited to improve future performance through predictive analytics. For example, data mining can warn you that there's a high probability a specific customer won't pay on time, based on an analysis of customers with similar characteristics.

To help you fully utilize data mining for organizational advantage, the Pentaho BI Project team has worked in conjunction with the development and business communities to integrate mainstream BI capabilities with advanced data mining.

Pentaho Data Mining is differentiated by its open, standards-compliant nature, its use of the Weka data mining project, and its tight integration with core business intelligence capabilities including data integration, reporting, analysis, and dashboards. Other data mining offerings lack this level of sophistication and integration.

Pentaho Data Mining Provides Sophisticated Analytical Insight Into Trends and Opportunities.

Uncover Hidden Patterns and Relationships

A classic example of data mining is a retailer who uncovers a relationship between sales of diapers and beer on Sunday afternoons, two items you wouldn't normally consider as linked. The explanation is that husbands who are sent out to pick up a fresh supply of diapers are also likely to pick up some beer while they happen to be in the store, something that hadn't been recognized as a significant sales driver before data mining uncovered it.

Exploit Insights to Improve Performance

Continuing the example above, retailers very often act on the relationships they discover by using tactics such as placing linked items together on end-of-aisle displays to spur additional purchases. All organizations can benefit from acting in a similar way, using newly discovered patterns and correlations as the basis for taking action to improve their efficiency and effectiveness.

Predict Future Performance

"Those who cannot remember the past are condemned to repeat it" is a famous line from philosopher George Santayana. In the case of data mining, being able to predict outcomes based on historic data can dramatically improve the quality and outcomes of decision making in the present. As a simple example, if the best indicator of whether a customer will pay on time turns out to be a combination of their market segment and whether or not they have paid previous bills on time, then you can use this information when making current credit decisions.

Embed Insights Into Your Applications

You can use the data mining results to display a simple summary statement and recommendations within operational applications. For example, on a credit screen you could add: "Based on this new account profile there is an 85% chance this customer will pay late. It is therefore recommended you require a 50% prepayment on this order."

Reporting on aggregate results such as Days Sales Outstanding (DSO) enables you to measure business improvements based on when recommendations were followed and when they weren't, so that you can fine-tune your model and recommendations over time for optimal effect.
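As a rough illustration of the credit-screen example above, the following Python sketch estimates a late-payment rate for each combination of market segment and payment history, then turns that rate into a summary statement. The records, field names, function names, and the 50% threshold are all hypothetical, and the simple frequency count stands in for whatever model a real deployment would train. It is not Pentaho or Weka code.

```python
from collections import defaultdict

# Hypothetical historical records:
# (market_segment, paid_previous_bills_on_time, paid_late)
HISTORY = [
    ("retail", True, False),
    ("retail", True, False),
    ("retail", False, True),
    ("retail", False, True),
    ("retail", False, False),
    ("wholesale", True, False),
    ("wholesale", False, True),
    ("wholesale", True, True),
]


def late_payment_rates(history):
    """Return {(segment, prior_on_time): fraction of customers who paid late}."""
    totals = defaultdict(int)
    late = defaultdict(int)
    for segment, prior_on_time, paid_late in history:
        key = (segment, prior_on_time)
        totals[key] += 1
        if paid_late:
            late[key] += 1
    return {key: late[key] / totals[key] for key in totals}


def credit_screen_message(rates, segment, prior_on_time, threshold=0.5):
    """Build the statement for the credit screen, or None if there is no history."""
    rate = rates.get((segment, prior_on_time))
    if rate is None:
        return None
    message = (
        f"Based on this account profile there is a {rate:.0%} chance "
        "this customer will pay late."
    )
    # The threshold is an assumed business rule, not something the source specifies.
    if rate >= threshold:
        message += " A prepayment on this order is recommended."
    return message


rates = late_payment_rates(HISTORY)
print(credit_screen_message(rates, "retail", False))
```

Keeping the rate calculation separate from the message text means the wording and the prepayment rule can be tuned over time without retraining anything, which matches the fine-tuning loop described above.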

Wide Range of Algorithms

No algorithm is likely to be optimal in all situations. For this reason it's important that you're able to try out a range of algorithms to find the one that best fits a particular set of data. If you find several data mining algorithms that fit well, you can use all of them. For example: "Based on analysis of 3 predictive models, the chances this customer will pay late are: Model A: 95% (96% correct), Model B: 89% (92% correct), Model C: 76% (97% correct)."

Knowledge Explorer lets you explore your data and prepare it for data mining.

Apply Insights to Business Processes

Integration with other components of the Pentaho BI Platform enables you to easily apply data mining to any workflow defined in the system (such as your cash cycle) and to BI processes (such as mining report generation, receipt, and actions for compliance irregularities). This is facilitated by the fact that every BI process (for example, report bursting) uses workflow to execute.

Incorporate Additional Data for Greater Insight

Extra data can be included, created, or derived to add greater insight to your analyses. This can occur when data is generated, or as part of the data preparation process. For example, when a sales report is burst you can choose to include the geographic region as an attribute so that you can use it in data mining later. Alternatively, you can add data while preparing it for data mining, for example by calculating variances, averages, or other metrics.
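The idea of deriving extra attributes during data preparation can be sketched in a few lines of Python. The rows, column names, and the `derive_attributes` function below are hypothetical stand-ins for a burst sales report. The sketch only shows the general pattern of computing per-customer averages and variances and attaching them as new columns, not how the Pentaho platform itself does it.

```python
from statistics import mean, pvariance

# Hypothetical rows from a burst sales report: (customer, region, invoice_amount)
ROWS = [
    ("acme", "east", 100.0),
    ("acme", "east", 140.0),
    ("acme", "east", 120.0),
    ("zenith", "west", 50.0),
    ("zenith", "west", 70.0),
]


def derive_attributes(rows):
    """Attach per-customer average, variance, and deviation from average to each row."""
    amounts = {}
    for customer, _region, amount in rows:
        amounts.setdefault(customer, []).append(amount)

    # Population variance is used here; a sample variance would work equally well.
    stats = {c: (mean(v), pvariance(v)) for c, v in amounts.items()}

    enriched = []
    for customer, region, amount in rows:
        avg, var = stats[customer]
        enriched.append({
            "customer": customer,
            "region": region,          # carried through so it can be mined later
            "amount": amount,
            "avg_amount": avg,         # derived column
            "variance": var,           # derived column
            "deviation": amount - avg, # derived column
        })
    return enriched


enriched = derive_attributes(ROWS)
for row in enriched:
    print(row)
```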

How Data Mining Works

Choosing a Model

Analysts can work with a range of models graphically. These include many advanced forms of data mining such as clustering, segmentation, decision trees, random forests, neural nets, and principal component analysis.

Adding Data

Value-added features can be added to the data. For example, you can specify thresholds and have the system automatically bucket or derive data to create new columns for analysis.

Adapting

Each model adapts its parameters to attempt a best fit to the sample data. Analysts can let this happen automatically, or manually adjust parameters (depending on the model).

KnowledgeFlow shows you the flow of data through the system and the processes that it goes through.

Evaluating

Results can be evaluated by applying the model to historical data to test its predictive power against actual results.

Perfecting

The cycle of adapting the model until it is optimized is known as training the model. Once properly trained, the model should reliably yield good results for the specific business purpose it is being applied to.
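The adapt-and-evaluate cycle described above can be sketched with a deliberately tiny model: a single threshold on one attribute. The data, the attribute (average days a customer took to pay), and the function names are hypothetical, and the exhaustive threshold search is a stand-in for the parameter fitting that a real algorithm would perform.

```python
# Hypothetical sample data: (average days a customer took to pay, paid_late_next_time)
TRAINING = [(10, False), (15, False), (18, True), (22, False),
            (30, False), (35, True), (40, True), (55, True)]

# Hypothetical historical data held back to test predictive power.
HISTORICAL = [(12, False), (45, True), (33, True), (20, False), (50, True)]


def accuracy(threshold, records):
    """Fraction of records where 'value >= threshold' matches the actual outcome."""
    correct = sum((value >= threshold) == late for value, late in records)
    return correct / len(records)


def fit_threshold(records):
    """Adapt the model's single parameter: try each observed value, keep the best fit."""
    candidates = sorted({value for value, _ in records})
    return max(candidates, key=lambda t: accuracy(t, records))


threshold = fit_threshold(TRAINING)
print("threshold:", threshold)
print("fit on sample data:", accuracy(threshold, TRAINING))
print("accuracy on historical data:", accuracy(threshold, HISTORICAL))
```

On this toy data the selected threshold is 35, which fits 7 of the 8 training records and predicts 4 of the 5 held-back records correctly. The gap between the two figures is the kind of signal an analyst would use to decide whether further adjustment is needed.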

Delivering

Output can be in a multitude of forms. For example, you might choose to include a simple statement within another application, or output a graphical decision tree that users can navigate.

Technology

Powerful Data Mining Engine

Provides a comprehensive set of machine learning algorithms from the Weka project, including clustering, segmentation, decision trees, random forests, neural networks, and principal component analysis. Pentaho has added integration with the Pentaho BI Platform and automated the process of transforming data into the format the data mining engine needs.

Algorithms can either be applied directly to a dataset or called from Java code. Output can be viewed graphically, interacted with programmatically, or used as a data source for reports, further analysis, and other processes.

Filters are provided for discretization, normalization, re-sampling, attribute selection, and transforming and combining attributes. Classifiers provide models for predicting nominal or numeric quantities. Learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, and other advanced techniques.

The data mining engine is also well suited to developing new machine learning schemes, enabling customers to incorporate their own models. Inputs and outputs can be controlled programmatically, enabling developers to create completely custom solutions using the components provided.

Graphical Design Tools

Graphical data mining design and administration tools are integrated as part of the Pentaho Workbench and delivered inside Eclipse. Graphical user interfaces are provided for data pre-processing, classification, regression, clustering, association rules, and visualization.

Security and Compliance

Provides role-based security and business rules. Supports Java Single Sign-On (JOSSO) and LDAP to integrate with existing enterprise security. Provides an audit trail for compliance purposes. Audit data can be readily reported on and integrated with the workflow features included in the Pentaho BI Platform.
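To make two of the filters mentioned under Powerful Data Mining Engine more concrete, here is a minimal Python sketch of normalization and discretization. It is written from scratch for illustration and does not use or mirror the Weka API. The function names, the min-max scaling, and the equal-width binning strategy are assumptions chosen for simplicity.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins equal-width buckets, numbered from 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bucket n_bins, so clamp it into the last one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


amounts = [1.0, 2.0, 9.0, 10.0]
print(min_max_normalize([10, 20, 30]))
print(equal_width_bins(amounts, 3))
```

Normalization puts attributes measured on different scales on a comparable footing, while discretization turns a numeric attribute into a nominal one for learning schemes that expect categories.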

Web Services, Repositories, XML-based Definitions

All components are provided with a Web Service interface so they can be used flexibly. Centralized repositories store the definitions of reports, templates, queries, and other content. All content definitions are stored as XML, so they can be created and modified by means other than the graphical user interfaces provided, for example by editing the XML manually or programmatically.

Scalability and Performance

Designed for enterprise-scale deployments. Applications scale by running on J2EE-compliant application servers such as JBoss Application Server (included) and can take advantage of server-specific scalability features such as clustering.