Tom Khabaza. Hard Hats for Data Miners: Myths and Pitfalls of Data Mining


The intrepid data miner runs many risks, including being buried under mountains of data. Some risks are just myths that need to be debunked. Others, however, are real. In this article, I will debunk several of these myths and misconceptions and then describe some problems and pitfalls commonly encountered when conducting data mining, along with steps that you can take to protect yourself from them.

A critical point to note is that data mining is a business process: a way of finding patterns in your data that provide insight you can use to conduct your business more effectively. Data mining also makes predictions to guide customer interactions and other business decisions. You'll see these points reinforced numerous times in the information that follows.

Myths and misconceptions about data mining

Myth #1: Data mining is all about algorithms

A businessperson attending a typical data mining conference or reading its proceedings might form the impression that data mining is all about advanced data analysis algorithms. This misconception might be summarized as follows: "All you need for data mining is good algorithms. The better your algorithms, the better your data mining; advancing the effectiveness of data mining means advancing our knowledge of algorithms."

To hold this view is to misunderstand the data mining process. Data mining is a process consisting of many elements: formulating business goals; mapping business goals to data mining goals; acquiring, understanding, and pre-processing the data; evaluating and presenting the results of analysis; and deploying these results to achieve business benefits.

This is not to minimize the importance of new or improved data mining algorithms. The problem occurs when data miners focus too much on the algorithms and ignore the rest of the data mining process. The consequences of this misconception can be disastrous for a data mining project, possibly resulting in a failure to produce any useful results. Experienced data miners recognize the need for a broader view of the data mining process.

Myth #2: Data mining is all about predictive accuracy

While data mining is not all about data analysis algorithms, there is a part of data mining that is about algorithms. This raises the question, "How can you judge the quality of an algorithm?" You might think that the main criterion would be the predictive accuracy of the models it generates. This view, however, misrepresents the role of algorithms in the data mining process.

It is true that a predictive model should have some degree of accuracy, because this demonstrates that it has truly discovered patterns in the data. However, the usefulness of an algorithm or model is also determined by a number of other properties, one of which is whether the resulting model can be understood by a typical analyst or requires deep technical knowledge. Data miners who believe that predictive accuracy is the primary criterion of algorithm evaluation might choose algorithms that can only be used by technology experts. These algorithms will then play only the most limited role, because data mining is a process that is driven by business expertise; it relies on the input and involvement of non-technical business professionals in order to be successful.

Myth #3: Data mining requires a data warehouse

Business people often think that a data warehouse is a prerequisite for data mining. This is a subtle misconception about the relationship between the two technologies. It is true that data mining can benefit from warehoused data that is well organized, relatively clean, and easy to access. This is particularly true if the warehouse has been constructed with data mining specifically in mind and with knowledge of the requirements of the data mining project. If this has not been the case, however, the warehoused data may be less useful for data mining than the source or operational data. In the worst case, warehoused data may be completely useless (for example, if only summary data are stored).

A more accurate depiction of the relationship between the two would be that data mining benefits from a properly designed data warehouse, and that constructing such a warehouse often benefits from first doing some exploratory data mining.

Myth #4: Data mining is all about vast quantities of data

Early explanations of data mining often began with statements like, "We now collect more data than ever, yet how are we to benefit from these vast data stores?" Focusing on the size of data stores provided a convenient introduction to the topic of data mining, but subtly misrepresented its nature. While there are many large datasets that organizations can benefit from mining, it would be a mistake to believe that these should be the sole focus of data mining. Many useful data mining projects are performed on small or medium-sized datasets; some, for example, contain only a few hundred or a few thousand records.

Subscribing to the erroneous belief that data mining is only appropriate for vast data stores would lead organizations to choose tools that sacrifice usability for scalability when, in fact, both attributes are essential. To quote a customer of a leading data mining tool: "Other data mining tools optimize machine time, but this tool optimizes my time." Whether the datasets are large or small, organizations should choose a data mining tool that optimizes the user's time.

Myth #5: Data mining should be done by a technology expert

Data mining uses advanced technology, and its workings, particularly those of modeling techniques, are unlikely to be understood by the wider IT community. Does this mean that data mining should be conducted only by those who understand every nuance of the technology that is involved?

Quite the opposite is true, due to the paramount importance of business knowledge in data mining. When performed without business knowledge, data mining can produce nonsensical or useless results (see Pitfall #4, below), so it is essential that data mining be performed by someone with extensive knowledge of the business problem. Very seldom does this same person also have extensive knowledge of the data mining technology. It is the responsibility of data mining tool providers to ensure that tools are accessible to business users.

Pitfalls of data mining and how to avoid them

Pitfall #1: Buried under mountains of data

Data mining should be an interactive, iterative process in which the analyst applies substantial business knowledge and is "engaged" with the data. However, those who subscribe to Myth #4 (that data mining is about vast quantities of data) often suppose that this process must be applied to all of the available data. This can lead to attempts to mine volumes of data for which the available hardware and software cannot provide an acceptable interactive response. In these situations, the data mining process becomes sluggish, and by the time a question is answered, the analyst cannot remember why it was asked.

The way to avoid this pitfall is to employ some form of sampling. For example, if we have a million customers and a 20 percent annual attrition (or "churn") rate, we need not plot our graphs or build our models using the full million examples, or even 500,000. Consider the following questions and answers:

Q: How many churn profiles do we expect to find?
A: Maybe ten.
Q: How many examples of each profile do we need?
A: Maybe a thousand.

Therefore, a sample of ten or twenty thousand churners and an equivalent number of non-churners is likely to be sufficient for this analysis. Note that this does not mean that data miners will never encounter the need to build models from millions of examples; only that they should not assume that they must do so just because the data are available.

Pitfall #2: The Mysterious Disappearing Terabyte

This is a common phenomenon, but not always a pitfall. It refers to the fact that, for a given data mining problem, the amount of available and relevant data may be much less than initially supposed. Consider the following scenario: You are a data mining consultant, and your client is a large bank that wishes to mine its customer data to determine credit risk. The bank holds terabytes of data on its customers and is concerned that the available computing resources may be inadequate to mine this volume of data.

Here's how the situation might unfold. Different types of credit (personal loans, business loans, overdrafts) present different patterns of credit risk, so each data mining project will concentrate on just one type of borrower. The bank's domain experts judge a number of factors to be relevant, and the bank, planning ahead, began collecting data on these factors about 18 months ago. Since then, almost a thousand cases of bad debt have occurred. Thus, the relevant data consist of fewer than a thousand cases of bad debt plus a sample from a plentiful supply of cases of good debt: let's say 3,000 records in all. Somehow, the need to mine terabytes of data has "mysteriously" disappeared.
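The sampling arithmetic in Pitfalls #1 and #2 can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the article: the function name `balanced_sample`, the use of integer customer ids, and the rule that every fifth customer churns (giving the 20 percent rate) are all assumptions made for the example.

```python
import random

def balanced_sample(records, is_positive, per_class, seed=0):
    """Return up to per_class positive cases plus an equal number of negatives.

    Hypothetical helper: 'positive' means churner (or bad debt) here.
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    positives = [r for r in records if is_positive(r)]
    negatives = [r for r in records if not is_positive(r)]
    # Never ask for more cases than either class can supply.
    n = min(per_class, len(positives), len(negatives))
    return rng.sample(positives, n) + rng.sample(negatives, n)

# Sizing from the questions above: about ten profiles, about a thousand examples each.
profiles = 10
examples_per_profile = 1_000
per_class = profiles * examples_per_profile  # 10,000 churners

# Synthetic stand-in for a million customers with a 20 percent churn rate:
# customer ids 0..999,999, where (by assumption) every fifth id is a churner.
customers = range(1_000_000)

def churned(customer_id):
    return customer_id % 5 == 0

sample = balanced_sample(customers, churned, per_class)
print(len(sample))  # 20000 records instead of 1,000,000
```

The `min(...)` guard also covers the situation in Pitfall #2: when the rare class has fewer cases than requested, the sample simply shrinks to what is available rather than failing.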

Pitfall #3: Disorganized data mining

Data mining can occasionally, despite the best of intentions, take place in an ad hoc manner, with no clear goals and no idea of how the results will be used. This leads to wasted time and unusable results. To produce useful results, it is critical to have clearly defined business and data mining goals, formulated early in the project, and clearly articulated deployment plans. A simple way of ensuring this is to use a standard process such as the CRoss-Industry Standard Process for Data Mining (CRISP-DM) [1]. Such a process ensures the correct preparation for data mining and provides a common language for communicating methods and results. Data mining tools should support standard process models.

Pitfall #4: Insufficient business knowledge

On a number of occasions this article has mentioned the crucial role that business knowledge plays in data mining. Without it, organizations can neither achieve useful results nor guide the data mining process towards them. It is sometimes supposed that the end user can reasonably tell the data miner: "Here are the data, please go away, do your data mining, and come back with the answers." If this were to happen, the project would, at best, take many long and costly iterations to produce useful results. At worst, the results would be gibberish, and the project would fail.

This pitfall can only be avoided by involving, at every stage of the data mining process, both the end user and someone with a detailed knowledge of the business. Ideally, the data miner or data mining consultant would have the business knowledge. Lacking it, the data miner should literally sit next to someone with the required business knowledge who understands the question under consideration. For this to work effectively, a highly interactive data mining environment with good response time is required.

Pitfall #5: Insufficient data knowledge

In order to perform data mining, we must be able to answer questions like "What do the codes in this field mean?" and "Can there be more than one record per customer in this table?". In some cases, this information is surprisingly hard to come by. It might be that the data expert has left the organization or moved to another department or, in the case of legacy systems, there may be no data expert at all. This problem is exacerbated when the database or data warehouse management is outsourced: the external supplier is even less motivated than the user organization to maintain this information "just in case it might be needed in future."

There is no simple resolution to this problem. IT departments should be made aware of the need to maintain information about their organization's databases. Also, when a data mining project is proposed, data miners should consider how much data knowledge is available and evaluate any risks caused by its absence or scarcity.

Pitfall #6: Erroneous assumptions, courtesy of the experts

Business and data experts are crucial resources, but this does not mean that the data miner should unquestioningly accept every statement they make. The data miner should seek to confirm the validity of experts' statements. Typical examples of erroneous or misleading statements might include:

- No customer can hold accounts of both these types
- No case will include more than one event of this type
- Only the following codes will be present in this field

Data miners should verify statements like these by examining the data. This is particularly important when processing of the data will depend on their accuracy. Ideally, mistakes in assumptions about data can be spotted before they lead to errors in the treatment of data. Data mining tools should make this easy to accomplish.

Pitfall #7: Incompatibility of data mining tools

The data mining process requires a wide range of capabilities, so it's not unusual for a wide variety of tools to be used during a single project. This can, however, lead to high overhead costs due to the time and resources required to switch contexts and convert data from one format to another. At its worst, this can lead to the omission of necessary steps in the data mining process and can seriously interfere with the exploratory character of data mining.
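As an aside on Pitfalls #5 and #6, the advice to verify experts' statements by examining the data can be sketched in Python. Everything here is hypothetical and chosen only for illustration: the field names (`customer_id`, `account_type`, `status`), the code values, the sample rows, and the three helper functions are assumptions, not part of any particular data mining tool.

```python
from collections import Counter, defaultdict

# Hypothetical records; field names and codes are invented for this sketch.
rows = [
    {"customer_id": 1, "account_type": "savings", "status": "A"},
    {"customer_id": 1, "account_type": "loan", "status": "A"},
    {"customer_id": 2, "account_type": "savings", "status": "C"},
    {"customer_id": 3, "account_type": "loan", "status": "X"},
]

def customers_with_both(rows, type_a, type_b):
    """Check 'no customer can hold accounts of both these types'."""
    types = defaultdict(set)
    for r in rows:
        types[r["customer_id"]].add(r["account_type"])
    return sorted(c for c, held in types.items() if {type_a, type_b} <= held)

def unexpected_codes(rows, field, allowed):
    """Check 'only the following codes will be present in this field'."""
    return Counter(r[field] for r in rows if r[field] not in allowed)

def duplicate_keys(rows, key):
    """Check 'there is at most one record per customer in this table'."""
    counts = Counter(r[key] for r in rows)
    return {k: n for k, n in counts.items() if n > 1}

both = customers_with_both(rows, "savings", "loan")
bad_codes = unexpected_codes(rows, "status", {"A", "C"})
dupes = duplicate_keys(rows, "customer_id")

print(both)             # [1]: customer 1 contradicts the first statement
print(dict(bad_codes))  # {'X': 1}: an undocumented status code
print(dupes)            # {1: 2}: customer 1 has two records
```

Each helper returns the counterexamples rather than a yes/no answer, so that the data miner can take specific cases back to the expert who made the statement.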

The best solution to tool incompatibility is to use a data mining toolkit that integrates all the required capabilities. However, no toolkit will provide every possible capability, especially when the individual preferences of analysts are taken into account, so the toolkit should also be "open": that is, able to interface easily with other available tools and third-party options.

Pitfall #8: Locked in the data jail-house

In addition to openness with regard to tools, data mining solutions should also be open with regard to data. Some data mining tools require the data to be held in a proprietary format that is not compatible with commonly used database systems. (This is sometimes referred to as the "data jail-house.") This can result in high overhead costs, due to the need to transfer data into the required format, and can make it difficult to deploy the results into an organization's operational systems. A good data mining tool will interface with your data via common standards.

Conclusion

Data mining is a business process, requiring extensive business knowledge. It is best practiced by business experts or by data mining experts in close collaboration with business experts. Data mining uses a variety of techniques and should not focus only on modeling algorithms and their predictive accuracy. Each technique can play a variety of roles. During the data mining process, data miners interact and engage with the data in an iterative fashion. A standard data mining process model, such as CRISP-DM [1], helps to ensure the correct preparation for and use of data mining.

Data mining tools should be evaluated based on their accessibility to business users, their scalability and usability, and their support for standard processes. Data miners should make intelligent decisions about the amount of data required, assuming neither that all of an organization's data will be relevant nor that all the available data will be required. Effective data mining requires flexible and interoperable techniques. This requirement is best met by integrated, open toolkits that can interface to data by means of open standards.

References

[1] Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., and Wirth, R. CRISP-DM 1.0: Step-by-step data mining guide. CRISP-DM Consortium, 2000.

Further information about SPSS is available from SPSS Schweiz AG, Schneckenmannstrasse 25, 8044 Zürich.

SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc. All other names are trademarks of their respective owners. SPSS Inc. All rights reserved. DamiD/0404


More information

Data Mining: An Introduction

Data Mining: An Introduction Data Mining: An Introduction Michael J. A. Berry and Gordon A. Linoff. Data Mining Techniques for Marketing, Sales and Customer Support, 2nd Edition, 2004 Data mining What promotions should be targeted

More information

The Analytics COE: the key to Monetizing Big Data via Predictive Analytics

The Analytics COE: the key to Monetizing Big Data via Predictive Analytics www.hcltech.com The Analytics COE: the key to Monetizing Big Data via Predictive Analytics big data & business analytics AuthOr: Doug Freud Director, Data Science WHITEPAPER AUGUST 2014 In early 2012 Ann

More information

analytics+insights for life science Descriptive to Prescriptive Accelerating Business Insights with Data Analytics a lifescale leadership brief

analytics+insights for life science Descriptive to Prescriptive Accelerating Business Insights with Data Analytics a lifescale leadership brief analytics+insights for life science Descriptive to Prescriptive Accelerating Business Insights with Data Analytics a lifescale leadership brief The potential of data analytics can be confusing for many

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Better decision making under uncertain conditions using Monte Carlo Simulation

Better decision making under uncertain conditions using Monte Carlo Simulation IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

More information

Prescriptive Analytics. A business guide

Prescriptive Analytics. A business guide Prescriptive Analytics A business guide May 2014 Contents 3 The Business Value of Prescriptive Analytics 4 What is Prescriptive Analytics? 6 Prescriptive Analytics Methods 7 Integration 8 Business Applications

More information

Analytics For Everyone - Even You

Analytics For Everyone - Even You White Paper Analytics For Everyone - Even You Abstract Analytics have matured considerably in recent years, to the point that business intelligence tools are now widely accessible outside the boardroom

More information

IBM Cognos TM1 Enterprise Planning, Budgeting and Analytics

IBM Cognos TM1 Enterprise Planning, Budgeting and Analytics Data Sheet IBM Cognos TM1 Enterprise Planning, Budgeting and Analytics Overview Highlights Reduces planning cycles by 75% and reporting from days to minutes Owned and managed by Finance and lines of business

More information

SPSS Data Mining Tips

SPSS Data Mining Tips SPSS Data Mining Tips A handy guide to help you save time and money as you plan and execute your data mining projects www.spss.com Table of contents Introduction...........................2 What is data

More information

WHITE PAPER. Unified Monitoring Drives High- Performance Business Results

WHITE PAPER. Unified Monitoring Drives High- Performance Business Results WHITE PAPER Unified Monitoring Drives High- Performance Business Results Table of Contents EXEC SUMMARY... 1 INTRODUCTION... 1 THINK BEFORE YOU BUY... 2 The Pitfalls of Silos...2 Monitoring Tools: Less

More information

Predictive Analytics for Retail: Understanding Customer Behaviour

Predictive Analytics for Retail: Understanding Customer Behaviour Predictive Analytics for Retail: Understanding Customer Behaviour Jarlath Quinn Analytics Consultant Rachel Clinton Business Development www.sv-europe.com FAQ s Is this session being recorded? No Can I

More information

How To Measure Quality

How To Measure Quality Introduction Metrics for Software Testing: Managing with Facts Part 4: Product Metrics In the previous article in this series, we moved from a discussion of process metrics to a discussion of how metrics

More information

The Unfortunate Little Secret About Current CRM Data Cleansing. (And how it destroys your bottom line.)

The Unfortunate Little Secret About Current CRM Data Cleansing. (And how it destroys your bottom line.) The Unfortunate Little Secret About Current CRM Data Cleansing. (And how it destroys your bottom line.) Until now clean data was more myth than fact. That s because there is a crucial difference between

More information

5 Ways To Avoid Cash Flow Problems In Your Business

5 Ways To Avoid Cash Flow Problems In Your Business 5 Ways To Avoid Cash Flow Problems In Your Business The old maxim revenue is vanity, cash is reality is easy to forget in the whirlwind life of a business owner especially when things are going well. Unfortunately

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

IBM Global Business Services Microsoft Dynamics CRM solutions from IBM

IBM Global Business Services Microsoft Dynamics CRM solutions from IBM IBM Global Business Services Microsoft Dynamics CRM solutions from IBM Power your productivity 2 Microsoft Dynamics CRM solutions from IBM Highlights Win more deals by spending more time on selling and

More information

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

More information

C A S E S T UDY The Path Toward Pervasive Business Intelligence at an Asian Telecommunication Services Provider

C A S E S T UDY The Path Toward Pervasive Business Intelligence at an Asian Telecommunication Services Provider C A S E S T UDY The Path Toward Pervasive Business Intelligence at an Asian Telecommunication Services Provider Sponsored by: Tata Consultancy Services November 2008 SUMMARY Global Headquarters: 5 Speen

More information

Better planning and forecasting with IBM Predictive Analytics

Better planning and forecasting with IBM Predictive Analytics IBM Software Business Analytics SPSS Predictive Analytics Better planning and forecasting with IBM Predictive Analytics Using IBM Cognos TM1 with IBM SPSS Predictive Analytics to build better plans and

More information

DATA VISUALIZATION: When Data Speaks Business PRODUCT ANALYSIS REPORT IBM COGNOS BUSINESS INTELLIGENCE. Technology Evaluation Centers

DATA VISUALIZATION: When Data Speaks Business PRODUCT ANALYSIS REPORT IBM COGNOS BUSINESS INTELLIGENCE. Technology Evaluation Centers PRODUCT ANALYSIS REPORT IBM COGNOS BUSINESS INTELLIGENCE DATA VISUALIZATION: When Data Speaks Business Jorge García, TEC Senior BI and Data Management Analyst Technology Evaluation Centers Contents About

More information

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi Judith Hurwitz President and CEO Sponsored by Hitachi Introduction Only a few years ago, the greatest concern for businesses was being able to link traditional IT with the requirements of business units.

More information

Frequency Matters. The keys to optimizing email send frequency

Frequency Matters. The keys to optimizing email send frequency The keys to optimizing email send frequency Email send frequency requires a delicate balance. Send too little and you miss out on sales opportunities and end up leaving money on the table. Send too much

More information

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep Neil Raden Hired Brains Research, LLC Traditionally, the job of gathering and integrating data for analytics fell on data warehouses.

More information

Four Things You Must Do Before Migrating Archive Data to the Cloud

Four Things You Must Do Before Migrating Archive Data to the Cloud Four Things You Must Do Before Migrating Archive Data to the Cloud The amount of archive data that organizations are retaining has expanded rapidly in the last ten years. Since the 2006 amended Federal

More information

Hexaware E-book on Predictive Analytics

Hexaware E-book on Predictive Analytics Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,

More information

www.moxonsolutions.com

www.moxonsolutions.com www.moxonsolutions.com Introduction Moxon Intelligence Systems is a specialist predictive analytics development company. We focus on delivering software, consulting and training solutions that enable the

More information

IS YOUR DATA WAREHOUSE SUCCESSFUL? Developing a Data Warehouse Process that responds to the needs of the Enterprise.

IS YOUR DATA WAREHOUSE SUCCESSFUL? Developing a Data Warehouse Process that responds to the needs of the Enterprise. IS YOUR DATA WAREHOUSE SUCCESSFUL? Developing a Data Warehouse Process that responds to the needs of the Enterprise. Peter R. Welbrock Smith-Hanley Consulting Group Philadelphia, PA ABSTRACT Developing

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Industry models for insurance. The IBM Insurance Application Architecture: A blueprint for success

Industry models for insurance. The IBM Insurance Application Architecture: A blueprint for success Industry models for insurance The IBM Insurance Application Architecture: A blueprint for success Executive summary An ongoing transfer of financial responsibility to end customers has created a whole

More information

Con-way Freight. Leveraging best-of-breed business intelligence for customer satisfaction. Overview. Before: a company with a vision

Con-way Freight. Leveraging best-of-breed business intelligence for customer satisfaction. Overview. Before: a company with a vision Con-way Freight Leveraging best-of-breed business intelligence for customer satisfaction Overview The need To analyze transaction-level details on an ad hoc basis to optimize efficiencies based on outlier

More information

Optimizing Enrollment Management with Predictive Modeling

Optimizing Enrollment Management with Predictive Modeling Optimizing Enrollment Management with Predictive Modeling Tips and Strategies for Getting Started with Predictive Analytics in Higher Education an ebook presented by Optimizing Enrollment with Predictive

More information

Increasing marketing campaign profitability with predictive analytics

Increasing marketing campaign profitability with predictive analytics Executive report Increasing marketing campaign profitability with predictive analytics Table of contents Introduction..............................................................2 Focusing on the customer

More information

Creating an Effective Mystery Shopping Program Best Practices

Creating an Effective Mystery Shopping Program Best Practices Creating an Effective Mystery Shopping Program Best Practices BEST PRACTICE GUIDE Congratulations! If you are reading this paper, it s likely that you are seriously considering implementing a mystery shop

More information

WHITEPAPER. Creating and Deploying Predictive Strategies that Drive Customer Value in Marketing, Sales and Risk

WHITEPAPER. Creating and Deploying Predictive Strategies that Drive Customer Value in Marketing, Sales and Risk WHITEPAPER Creating and Deploying Predictive Strategies that Drive Customer Value in Marketing, Sales and Risk Overview Angoss is helping its clients achieve significant revenue growth and measurable return

More information

Requirements Elicitation in Data Mining for Business Intelligence Projects

Requirements Elicitation in Data Mining for Business Intelligence Projects Requirements Elicitation in Data Mining for Business Intelligence Projects Paola Britos 1, Oscar Dieste 2 and Ramón García-Martínez 3 1 Software and Knowledge Engineering Center. Buenos Aires Institute

More information

Part II Management Accounting Decision-Making Tools

Part II Management Accounting Decision-Making Tools Part II Management Accounting Decision-Making Tools Chapter 7 Chapter 8 Chapter 9 Cost-Volume-Profit Analysis Comprehensive Business Budgeting Incremental Analysis and Decision-making Costs Chapter 10

More information

Predicting Churn. A SAS White Paper

Predicting Churn. A SAS White Paper A SAS White Paper Table of Contents Introduction......................................................................... 1 The Price of Churn...................................................................

More information

Deciding whether to purchase a tool or develop it in-house. by Elisabeth Hendrickson

Deciding whether to purchase a tool or develop it in-house. by Elisabeth Hendrickson Tools & Automation QUICK LOOK BuildIt Dispelling the myths surrounding both approaches Weighing your options or? BuyIt Deciding whether to purchase a tool or develop it in-house 32 You ve discovered that

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

Business Case for Smart Care Software Product Portfolio

Business Case for Smart Care Software Product Portfolio Business Case for Smart Care Software Product Portfolio Contents Company Overview... 3 Growing Challenges with Mobile Device Support... 3 Solution... 4 Privacy and Security... 6 Financial Benefits... 7

More information

Exploratory Testing Dynamics

Exploratory Testing Dynamics Exploratory Testing Dynamics Created by James Bach, Jonathan Bach, and Michael Bolton 1 v2.2 Copyright 2005-2009, Satisfice, Inc. Exploratory testing is the opposite of scripted testing. Both scripted

More information

Data Mining with Microsoft SQL Server 2005

Data Mining with Microsoft SQL Server 2005 International DSI / Asia and Pacific DSI 2007 Full Paper (July, 2007) Data Mining with Microsoft SQL Server 2005 Henning Stolz 1), Peter Lehmann 1),Waranya Poonnawat 3) 1) Institute for Business Intelligence,

More information

USING DATA MINING FOR BANK DIRECT MARKETING: AN APPLICATION OF THE CRISP-DM METHODOLOGY

USING DATA MINING FOR BANK DIRECT MARKETING: AN APPLICATION OF THE CRISP-DM METHODOLOGY USING DATA MINING FOR BANK DIRECT MARKETING: AN APPLICATION OF THE CRISP-DM METHODOLOGY Sérgio Moro and Raul M. S. Laureano Instituto Universitário de Lisboa (ISCTE IUL) Av.ª das Forças Armadas 1649-026

More information

Retail s Complexity: The Information Technology Solution

Retail s Complexity: The Information Technology Solution A P P L I C A T I O N S A WHITE PAPER SERIES COMPLEXITY OF PRODUCTS, SCALE AND PROCESSES, ALONG WITH SUPPLY CHAIN CHALLENGES, PLACE EVER GREATER DEMANDS ON RETAILERS. IT SYSTEMS ARE AT THE HEART OF RETAIL

More information

The Role of Knowledge Based Systems to Enhance User Participation in the System Development Process

The Role of Knowledge Based Systems to Enhance User Participation in the System Development Process The Role of Knowledge Based Systems to Enhance User Participation in the System Development Process Gian M Medri, PKBanken, Stockholm Summary: Computers are a fact of life today, even for the public in

More information

DATA MINING AND CRM IN TELECOMMUNICATIONS

DATA MINING AND CRM IN TELECOMMUNICATIONS www.sjm.tf.bor.ac.yu Serbian Journal of Management 3 (1) (2008) 61-72 Serbian Journal of Management Abstract DATA MINING AND CRM IN TELECOMMUNICATIONS D. Ćamilović* BK Faculty of Management, Palmira Toljatija

More information

An Enterprise Framework for Business Intelligence

An Enterprise Framework for Business Intelligence An Enterprise Framework for Business Intelligence Colin White BI Research May 2009 Sponsored by Oracle Corporation TABLE OF CONTENTS AN ENTERPRISE FRAMEWORK FOR BUSINESS INTELLIGENCE 1 THE BI PROCESSING

More information

Database Marketing simplified through Data Mining

Database Marketing simplified through Data Mining Database Marketing simplified through Data Mining Author*: Dr. Ing. Arnfried Ossen, Head of the Data Mining/Marketing Analysis Competence Center, Private Banking Division, Deutsche Bank, Frankfurt, Germany

More information

Data Quality Assessment. Approach

Data Quality Assessment. Approach Approach Prepared By: Sanjay Seth Data Quality Assessment Approach-Review.doc Page 1 of 15 Introduction Data quality is crucial to the success of Business Intelligence initiatives. Unless data in source

More information

Navigating Big Data business analytics

Navigating Big Data business analytics mwd a d v i s o r s Navigating Big Data business analytics Helena Schwenk A special report prepared for Actuate May 2013 This report is the third in a series and focuses principally on explaining what

More information