Database Marketing, Business Intelligence and Knowledge Discovery




Note: Using material from Tan / Steinbach / Kumar (2005), Introduction to Data Mining, Addison Wesley; and Cios / Pedrycz / Swiniarski / Kurgan (2007), Data Mining: A Knowledge Discovery Approach, Springer.

Database Marketing

Database marketing is a form of direct marketing that uses databases of customers or potential customers to generate personalized communications promoting a product or service. The distinction between direct and database marketing stems primarily from the attention paid to the analysis of data. Database marketing emphasizes the use of statistical techniques to develop models of customer behavior, which are then used to select customers for communications.
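To illustrate how a model of customer behavior can drive the selection of customers for a communication, here is a minimal sketch of a rank-based recency/frequency/monetary score. The scoring scheme, the customer records, and the names `rank_scores` and `select_customers` are assumptions made for this example; the slides do not prescribe a particular model.

```python
from datetime import date

# Hypothetical customer records:
# (customer_id, date of last purchase, number of orders, total spend)
CUSTOMERS = [
    ("c1", date(2024, 5, 20), 12, 840.0),
    ("c2", date(2023, 11, 2), 3, 95.0),
    ("c3", date(2024, 6, 1), 1, 40.0),
    ("c4", date(2024, 4, 15), 8, 1200.0),
]


def rank_scores(values, reverse=False):
    """Map each distinct value to a rank score 1..n (n is the best score)."""
    order = sorted(set(values), reverse=reverse)
    return {v: i + 1 for i, v in enumerate(order)}


def select_customers(customers, today, top_n):
    """Return the ids of the top_n customers by summed rank scores."""
    recency = [(today - c[1]).days for c in customers]
    r = rank_scores(recency, reverse=True)        # fewer days since purchase is better
    f = rank_scores([c[2] for c in customers])    # more orders is better
    m = rank_scores([c[3] for c in customers])    # higher spend is better
    scored = [
        (r[days] + f[c[2]] + m[c[3]], c[0])
        for days, c in zip(recency, customers)
    ]
    # Highest score first; ties broken by customer id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [cid for _, cid in scored[:top_n]]


selected = select_customers(CUSTOMERS, date(2024, 6, 30), top_n=2)
print(selected)
```

In practice the simple rank sum would likely be replaced by a fitted response model, but the selection step (score, sort, take the top of the list) stays the same.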

Database Marketing

Classic database marketing
- Customer list (in-house or bought)
- Simple model based on past data
- E-mails, coupons, offers

Database marketing 2.0
- Integrated data sources (internal, external) and warehouses
- Complex models (data mining, social network analysis)
- Communication channels include social media, direct web interactions (recommender systems), and many more

Business Intelligence

Business intelligence encompasses architectures, tools, applications, databases and methodologies for the collection, integration, analysis, and presentation of business information. The purpose of business intelligence is to support better business decision making.

BI Components and Architecture [figure]

Transactional vs. Analytical Data Processing

Transactional processing takes place in operational systems that give the organization the capability to perform business transactions and produce transaction reports. Its primary aim is fast and efficient processing of routine, repetitive data.

A supplementary activity to transaction processing is analytical processing, which involves the analysis of accumulated data. Analytical processing, sometimes referred to as business intelligence, includes data mining, decision support systems (DSS), querying, and other analysis activities. These analyses place strategic information in the hands of decision makers to enhance productivity and make better decisions, leading to greater competitive advantage.

Business Analytics

Business analytics is how organizations gather and interpret data in order to make better business decisions and to optimize business processes. In businesses, analytics (alongside data access and reporting) represents a subset of business intelligence (BI). Analytics is defined as the extensive use of data, statistical and quantitative analysis, explanatory and predictive modeling, and fact-based decision-making. Analytics may be used as input for human decisions, but there are also examples of fully automated decisions that require minimal human intervention.

Business Analytics [figure]

Knowledge Discovery

The process of automatically searching large volumes of data for patterns that can be considered knowledge about the data.

Evolutionary stages (business question / enabling technologies / characteristics):

Data collection (1960s)
  Business question: What was my total revenue in the last 5 years?
  Enabling technologies: Computers, tapes, disks
  Characteristics: Retrospective, static data delivery

Data access (1980s)
  Business question: What were unit sales in New England last March?
  Enabling technologies: Relational databases (RDBMS), structured query language (SQL)
  Characteristics: Retrospective, dynamic data delivery at record level

Data warehousing and decision support (early 1990s)
  Business question: What were the sales in region A by product, by salesperson?
  Enabling technologies: OLAP, multidimensional databases, data warehouses
  Characteristics: Retrospective, proactive data delivery at multiple levels

Intelligent data mining (late 1990s)
  Business question: What's likely to happen to the Boston unit's sales next month? Why?
  Enabling technologies: Advanced algorithms, multiprocessor computers, massive databases
  Characteristics: Prospective, proactive information delivery

Advanced intelligent systems; complete integration (2000-2004)
  Business question: What is the best plan to follow? How did we perform compared to metrics?
  Enabling technologies: Neural computing, advanced AI models, complex optimization, web services
  Characteristics: Proactive, integrative; multiple business partners

Data Mining

- Non-trivial extraction of implicit, previously unknown and potentially useful information from data
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

Prediction methods: use some variables to predict unknown or future values of other variables.
Description methods: find human-interpretable patterns that describe the data.
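The difference between prediction and description methods can be sketched with two small functions. The choice of techniques (a least-squares line as the prediction method, frequent item pairs as the description method), the function names, and the sample data are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def fit_line(xs, ys):
    """Prediction: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def frequent_pairs(baskets, min_count):
    """Description: item pairs that occur together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket.
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Prediction: use one variable (x) to predict a future value of another (y).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope * 5 + intercept)

# Description: a human-interpretable pattern found in the data.
baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["milk", "eggs"]]
print(frequent_pairs(baskets, min_count=2))
```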

Text Mining

The application of data mining to unstructured or less-structured text files. Text mining helps organizations to (1) find the hidden content of documents, including additional useful relationships, and (2) group documents by common themes (e.g., identify all the customers of an insurance firm who have similar complaints).
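Grouping documents by common themes can be sketched with a very simple similarity measure. The example below is a minimal sketch, assuming Jaccard similarity on word sets and a greedy single-pass grouping; the stop-word list, the threshold, and the sample complaints are hypothetical, and real text mining tools use considerably richer representations.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"the", "a", "my", "was", "is", "to", "for", "and", "of", "on", "i", "not"}


def tokens(text):
    """Lower-case word set of a document, without stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Share of words that two word sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_documents(docs, threshold=0.3):
    """Assign each document to the first group whose seed document is similar enough."""
    groups = []
    for index, doc in enumerate(docs):
        words = tokens(doc)
        for group in groups:
            if jaccard(words, group["seed"]) >= threshold:
                group["members"].append(index)
                break
        else:
            # No similar group found: this document starts a new theme.
            groups.append({"seed": words, "members": [index]})
    return [group["members"] for group in groups]


complaints = [
    "My claim payment was late",
    "Late payment on claim",
    "The agent was rude on the phone",
    "Rude agent",
]
print(group_documents(complaints))
```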

Web Mining

The application of data mining techniques to discover actionable and meaningful patterns, profiles, and trends from web resources. Web mining is used in the following areas: information filtering, mining of web-access logs for analyzing usage, assisted browsing, ...

Data Life Cycle Process [figure]

Knowledge Discovery Process

The knowledge discovery process (KDP) forms the overall process for extracting new knowledge from data:
- it is a sequence of steps (with feedback loops) that should be followed to discover new knowledge (e.g. patterns)
- a well-defined KDP model is a logical, cohesive, well-thought-out structure and approach that can be presented to decision-makers who may have difficulty understanding the need, value, and mechanics behind a KDP
- it helps ensure that the end product is useful for the user/owner of the data
- KD projects require a significant project management effort that needs to be grounded in a solid framework
- KD should follow other disciplines that have established models

Knowledge Discovery Process

KDP is defined as the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. It:
- consists of many steps (one of which is Data Mining), each aimed at completing a particular discovery task and accomplished by the application of a DM method
- concerns the entire KD process, including how the data is stored and accessed, how to use efficient and scalable algorithms to analyze large datasets, how to interpret and visualize the results, and how to model and support interaction between human and machine
- concerns support for learning and analyzing the application domain

Overview of the Knowledge Discovery Process

- it consists of multiple steps, which are executed in a sequence
- the next step is initiated upon successful completion of the previous step, and requires the result generated by the previous step as its input
- it stretches from the task of understanding the project domain and data, through data preparation and analysis, to evaluation, understanding and application of the generated results
- it is iterative, i.e. it includes feedback loops that are triggered by revisions

Input data (database, images, video, semi-structured data, etc.) -> STEP 1 -> STEP 2 -> ... -> STEP n-1 -> STEP n -> Knowledge (patterns, rules, clusters, classification, associations, etc.)
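The properties listed above (sequential steps, each step consuming the previous result, and feedback loops that send the process back to an earlier step) can be sketched as a small runner. This is an illustrative sketch only: `run_process`, the two toy steps, and the convention that a step returns `(output, go_back_to)` are assumptions made for the example, not part of any KDP model.

```python
def run_process(steps, data, max_runs=20):
    """Run steps in order; a step may request a return to an earlier step.

    Each step is (name, func), where func(input) returns (output, go_back_to).
    go_back_to is None on success, or the index of the step to resume from,
    in which case output becomes the revised input of that earlier step.
    """
    inputs = [data]   # inputs[i] is the input of step i
    trace = []        # names of the steps in the order they were executed
    i = 0
    while i < len(steps):
        if len(trace) >= max_runs:
            raise RuntimeError("process did not finish within max_runs")
        name, func = steps[i]
        trace.append(name)
        output, go_back_to = func(inputs[i])
        if go_back_to is None:
            inputs = inputs[: i + 1] + [output]   # result feeds the next step
            i += 1
        else:
            inputs = inputs[:go_back_to] + [output]   # feedback loop
            i = go_back_to
    return inputs[-1], trace


def prepare(values):
    """Toy preparation step: drop missing values."""
    return [v for v in values if v is not None], None


def mine(values):
    """Toy mining step: send data back to preparation if it is implausible."""
    if any(v < 0 for v in values):
        return [v for v in values if v >= 0], 0
    return {"mean": sum(values) / len(values)}, None


knowledge, trace = run_process([("prepare", prepare), ("mine", mine)], [4, None, -1, 8])
print(knowledge, trace)
```

The `max_runs` guard is a design choice for the sketch: an iterative process with feedback needs some stopping rule, otherwise two steps could hand work back and forth indefinitely.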

Knowledge Discovery Process Models

Popular KDP models include:
- Nine-step model by Fayyad and colleagues (academic)
- CRISP-DM (CRoss-Industry Standard Process for Data Mining) model (industrial)
- Six-step KDP model by Cios and colleagues (hybrid: academic/industrial)

Knowledge Discovery Process Models: Nine-step model by Fayyad and colleagues

1. Developing and Understanding of the Application Domain
   It includes learning the relevant prior knowledge, and the goals of the end-user of the discovered knowledge.
2. Creating a Target Data Set
   It selects a subset of variables (attributes) and data points (examples), which will be used to perform discovery tasks. It usually includes querying the existing data to select the desired subset.
3. Data Cleaning and Preprocessing
   It consists of removing outliers, dealing with noise and missing values in the data, and accounting for time sequence information and known changes.
4. Data Reduction and Projection
   It consists of finding useful attributes by applying dimension reduction and transformation methods, and finding invariant representation of the data.

Knowledge Discovery Process Models: Nine-step model by Fayyad and colleagues (continued)

5. Choosing the Data Mining Task
   It matches the goals defined in step 1 with a particular DM method, such as classification, regression, clustering, etc.
6. Choosing the Data Mining Algorithm
   It selects methods for searching patterns in the data, and decides which models and parameters of the used methods may be appropriate.
7. Data Mining
   It generates patterns in a particular representational form, such as classification rules, decision trees, regression models, trends, etc.
8. Interpreting Mined Patterns
   It usually involves visualization of the extracted patterns and models, and visualization of the data based on the extracted models.
9. Consolidating Discovered Knowledge
   It consists of incorporating the discovered knowledge into the performance system, and documenting and reporting it to the interested parties. It also may include checking and resolving potential conflicts with previously believed knowledge.
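As an illustration of the Data Cleaning and Preprocessing step of the nine-step model, the following minimal sketch handles missing values and outliers in a single numeric attribute. Median imputation and the 1.5 x IQR outlier rule are assumptions chosen for the example; the model itself does not mandate specific techniques, and the function name `clean` is hypothetical.

```python
import statistics


def clean(values, k=1.5):
    """Fill missing values with the median, then drop values outside the IQR fences."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]

    # Quartiles of the filled data; points further than k * IQR from the
    # quartiles are treated as outliers and removed.
    q1, _, q3 = statistics.quantiles(filled, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in filled if low <= v <= high]


raw = [10, 12, None, 11, 13, 12, 95, 11]
print(clean(raw))
```

Whether an extreme value is an error to remove or a genuine observation to keep is a domain question, which is one reason the process models place domain understanding before data cleaning.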

Knowledge Discovery Process Models: CRISP-DM (CRoss-Industry Standard Process for Data Mining) model

- designed in the late 1990s by four companies: Integral Solutions Ltd. (provider of commercial Data Mining solutions), NCR (database provider), DaimlerChrysler (automobile manufacturer), and OHRA (insurance company)
- the CRISP-DM Special Interest Group was created to support the developed process model; it includes over 300 users and tool/service providers
- the model consists of six steps

Knowledge Discovery Process Models: CRISP-DM model

Business Understanding
It focuses on understanding objectives and requirements from a business perspective. It also converts them into a DM problem definition, and designs a preliminary project plan to achieve the objectives. It is further broken into several sub-steps:
- determination of business objectives
- assessment of situation
- determination of DM goals, and
- generation of project plan.

Data Understanding
It starts with an initial data collection and familiarization with the data. Specific aims include identification of data quality problems, discovery of initial insights into the data, and detection of interesting data subsets. It is further broken down into:
- collection of initial data
- description of data
- exploration of data, and
- verification of data quality.

Knowledge Discovery Process Models: CRISP-DM model

Data Preparation
It covers all activities needed to construct the final dataset, which constitutes the data that will be fed into DM tool(s) in the next step. It includes table, record, and attribute selection, data cleaning, construction of new attributes, and data transformation. This step is divided into the following sub-steps:
- selection of data
- cleansing of data
- construction of data
- integration of data, and
- formatting of data.

Knowledge Discovery Process Models: CRISP-DM model

Modeling
It selects and applies various modeling techniques. It usually involves the use of several methods for the same DM problem type, and calibration of their parameters to optimal values. Since some methods may require a specific format for input data, returning to the previous step is often necessary. This step is subdivided into:
- selection of modeling technique(s)
- generation of test design
- creation of models, and
- assessment of generated models.
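The Modeling sub-steps (choose a technique, design a test, create models, assess them) can be sketched with a small parameter calibration. The example below is a sketch under stated assumptions: a one-dimensional k-nearest-neighbour classifier stands in for the modeling technique, a fixed hold-out set stands in for the test design, and all data and function names are hypothetical.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Classify x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, holdout, k):
    """Share of hold-out examples that are classified correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in holdout)
    return hits / len(holdout)


def calibrate(train, holdout, candidates):
    """Return the candidate k with the best hold-out accuracy (first one wins ties)."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        acc = accuracy(train, holdout, k)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


# Hypothetical training data; (3.5, "high") is a deliberately noisy example.
TRAIN = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (3.5, "high"),
         (4.0, "low"), (7.0, "high"), (8.0, "high"), (9.0, "high")]
HOLDOUT = [(3.4, "low"), (2.5, "low"), (7.5, "high"), (8.6, "high")]

print(calibrate(TRAIN, HOLDOUT, candidates=[1, 3, 5]))
```

With k = 1 the noisy training point misleads the classifier on one hold-out example, which is the kind of effect that parameter calibration against a test design is meant to expose.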

Knowledge Discovery Process Models: CRISP-DM model

Evaluation
After one or more models of high quality from a data analysis perspective have been built, the model is evaluated from a business-objective perspective. The model is thoroughly evaluated, and the steps executed to construct it are reviewed. A key objective is to determine whether there are important business issues that have not been sufficiently considered. At the end of this phase, a decision on the use of the DM results should be reached. The key sub-steps in this step include:
- evaluation of the results
- process review, and
- determination of the next step.

Knowledge Discovery Process Models: CRISP-DM model

Deployment
It involves organization and presentation of the discovered knowledge in a way that the customer can use. Depending on the requirements, this can be as simple as generating a report or as complex as implementing a repeatable KDP. This step is further divided into the following sub-steps:
- planning of the deployment
- planning of the monitoring and maintenance
- generation of final report, and
- review of the process.

Knowledge Discovery Process Models: CRISP-DM model

The CRISP-DM model:
- is characterized by an easy-to-understand vocabulary and good documentation
- acknowledges the strongly iterative nature of the process, with loops between several of the steps
- is a successful and extensively applied model, mainly because of its grounding in practical, industrial, real-world Knowledge Discovery experience

Knowledge Discovery Process Models: Six-step model by Cios and colleagues

- developed based on the CRISP-DM model by adapting it to academic research; the main differences and extensions include:
  - providing a more general, research-oriented description of the steps
  - introducing the Data Mining step instead of the Modeling step
  - introducing several new explicit feedback mechanisms. The CRISP-DM model has only three major feedback sources, while this model has more detailed feedback mechanisms
  - modification of the last step: the knowledge discovered for a particular domain may be applied in other domains
- includes six steps

Knowledge Discovery Process Models: Six-step model [diagram]

Input data (database, images, video, semi-structured data, etc.)
1. Understanding of the Problem Domain
2. Understanding of the Data
3. Preparation of the Data
4. Data Mining
5. Evaluation of the Discovered Knowledge
6. Use of the Discovered Knowledge
Knowledge (patterns, rules, clusters, classification, associations, etc.)
Extend knowledge to other domains

Knowledge Discovery Process Models: Six-step model by Cios and colleagues

1. Understanding of the Problem Domain
   It involves working closely with domain experts to define the problem and determine the project goals, identifying key people, and learning about current solutions to the problem. It also involves learning domain-specific terminology. A description of the problem, including its restrictions, is prepared. Finally, project goals are translated into DM goals, and an initial selection of the DM tools to be used later in the process is performed.
2. Understanding of the Data
   It includes collection of sample data and deciding which data, including its format and size, will be needed. Background knowledge can be used to guide these efforts. Data is checked for completeness, redundancy, missing values, plausibility of attribute values, etc. Finally, the step includes verification of the usefulness of the data with respect to the DM goals.

Knowledge Discovery Process Models: Six-step model by Cios and colleagues (continued)

3. Preparation of the Data
   It concerns deciding which data will be used as input for DM methods in the next step. It involves sampling, running correlation and significance tests, and data cleaning, which includes checking completeness of data records, removing or correcting for noise and missing values, etc. The cleaned data may be further processed by feature selection and extraction algorithms (to reduce dimensionality), by derivation of new attributes (say, by discretization), and by summarization of data (data granularization). The end results are data that meet the specific input requirements of the DM tools selected in step 1.
4. Data Mining
   It involves using various DM methods to derive knowledge from preprocessed data.

Knowledge Discovery Process Models: Six-step model by Cios and colleagues (continued)

5. Evaluation of the Discovered Knowledge
   It includes understanding the results, checking whether the discovered knowledge is novel and interesting, interpretation of the results by domain experts, and checking the impact of the discovered knowledge. Only the approved models are retained, and the entire process is revisited to identify which alternative actions could have been taken to improve the results. A list of errors made in the process is prepared.
6. Use of the Discovered Knowledge
   It consists of planning where and how the discovered knowledge will be used. The application area in the current domain may be extended to other domains. A plan to monitor the implementation of the discovered knowledge is created and the entire project documented. Finally, the discovered knowledge is deployed.
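The Preparation of the Data step mentions deriving new attributes by discretization. A minimal sketch of one common variant, equal-width binning, is shown below; the choice of equal-width bins, the function name `equal_width_bins`, and the sample ages are assumptions made for illustration.

```python
def equal_width_bins(values, n_bins):
    """Replace each numeric value by the index (0..n_bins-1) of its equal-width bin."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        # All values are identical: everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


ages = [18, 25, 40, 62, 78]
print(equal_width_bins(ages, n_bins=3))
```

The derived bin index can then be used as a new categorical attribute by DM methods that require discrete inputs.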

Knowledge Discovery Process Models: Six-step model by Cios and colleagues

This model identifies and describes explicit feedback loops:
- from Understanding of the Data to the Understanding of the Problem Domain step; the loop is caused by the need for additional domain knowledge to better understand the data
- from the Preparation of the Data to the Understanding of the Data step; the loop is caused by the need for additional or more specific information about the data to guide the choice of data preprocessing algorithms
- from the Data Mining to the Understanding of the Problem Domain step; the reason could be unsatisfactory results generated by the selected DM methods, requiring modification of the project's goals
- from the Data Mining to the Understanding of the Data step; the most common reason is poor understanding of the data, which results in incorrect selection of a DM method and its subsequent failure

Knowledge Discovery Process Models: Six-step model by Cios and colleagues (feedback loops, continued)

- from the Data Mining to the Preparation of the Data step; the loop is caused by the need to improve data preparation. This is often caused by the specific requirements of the DM method used, which may not have been known during the Preparation of the Data step
- from the Evaluation of the Discovered Knowledge to the Understanding of the Problem Domain step; the most common cause is invalidity of the discovered knowledge. Possible reasons include incorrect understanding or interpretation of the domain, and incorrect design or understanding of problem restrictions, requirements, or goals
- from the Evaluation of the Discovered Knowledge to the Data Mining step; this loop is executed when the discovered knowledge is not novel, interesting, or useful. The least expensive solution is to choose a different DM tool and repeat the DM step
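The feedback loops of the six-step model listed on these slides can be written down as a small directed graph, which makes it easy to see which steps are repeated when a loop is taken. The representation and the helper `steps_to_repeat` are assumptions made for this sketch; the step names and the loops themselves are taken from the slides.

```python
# The six steps, in execution order.
STEPS = [
    "Understanding of the Problem Domain",
    "Understanding of the Data",
    "Preparation of the Data",
    "Data Mining",
    "Evaluation of the Discovered Knowledge",
    "Use of the Discovered Knowledge",
]

# Feedback loops: step -> earlier steps it may return to.
FEEDBACK = {
    "Understanding of the Data": ["Understanding of the Problem Domain"],
    "Preparation of the Data": ["Understanding of the Data"],
    "Data Mining": [
        "Understanding of the Problem Domain",
        "Understanding of the Data",
        "Preparation of the Data",
    ],
    "Evaluation of the Discovered Knowledge": [
        "Understanding of the Problem Domain",
        "Data Mining",
    ],
}


def steps_to_repeat(current, target):
    """Steps executed again when `current` loops back to `target`, in order."""
    if target not in FEEDBACK.get(current, []):
        raise ValueError(f"no feedback loop from {current!r} to {target!r}")
    start, end = STEPS.index(target), STEPS.index(current)
    return STEPS[start : end + 1]


print(steps_to_repeat("Data Mining", "Preparation of the Data"))
```

The length of the returned list gives a rough sense of how expensive a loop is, which matches the remark that repeating only the Data Mining step is the least expensive remedy after an unsatisfactory evaluation.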

Comparison of Knowledge Discovery Process Models

Fayyad et al.
  Domain of origin: academic
  Number of steps: 9
  Steps: 1. Developing and Understanding of the Application Domain; 2. Creating a Target Data Set; 3. Data Cleaning and Preprocessing; 4. Data Reduction and Projection; 5. Choosing the Data Mining Task; 6. Choosing the Data Mining Algorithm; 7. Data Mining; 8. Interpreting Mined Patterns; 9. Consolidating Discovered Knowledge
  Notes: the most popular model; provides detailed technical description with respect to data analysis, but lacks business aspects
  Supporting software: commercial system MineSet(TM)
  Reported application domains: medicine, engineering, production, e-business, software

Cios et al.
  Domain of origin: hybrid (academic/industry)
  Number of steps: 6
  Steps: 1. Understanding of the Problem Domain; 2. Understanding of the Data; 3. Preparation of the Data; 4. Data Mining; 5. Evaluation of the Discovered Knowledge; 6. Use of the Discovered Knowledge
  Notes: draws from both academic and industrial models; emphasizes iterative aspects; identifies and describes explicit feedback loops
  Supporting software: N/A
  Reported application domains: medicine, software

CRISP-DM
  Domain of origin: industry
  Number of steps: 6
  Steps: 1. Business Understanding; 2. Data Understanding; 3. Data Preparation; 4. Modeling; 5. Evaluation; 6. Deployment
  Notes: uses easy-to-understand vocabulary; has good documentation
  Supporting software: commercial system Clementine
  Reported application domains: medicine, engineering, marketing, sales

Comparison of the Knowledge Discovery Process Models

A very important aspect of the KDP is the relative time spent to complete each of the steps:
- it enables precise scheduling
- estimates proposed by both researchers and practitioners are shown below
- specific estimated values depend on many factors, such as existing knowledge about the considered project domain, skill level of human resources, complexity of the problem, etc.
- the data preparation step is by far the most time-consuming step

[Figure: relative effort [%] per KDDM step (Understanding of Domain, Understanding of Data, Preparation of Data, Data Mining, Evaluation of Results, Deployment of Results), comparing the Cabena et al., Shearer, and Cios and Kurgan estimates]