Data Mining Fundamentals




Part I Data Mining Fundamentals

Data Mining: A First View Chapter 1

1.1 Data Mining: A Definition

Data Mining The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

Induction-based Learning The process of forming general concept definitions by observing specific examples of concepts to be learned.

Knowledge Discovery in Databases (KDD) The application of the scientific method to data mining. Data mining is one step of the KDD process.

1.2 What Can Computers Learn?

Four Levels of Learning Facts Concepts Procedures Principles

Facts A fact is a simple statement of truth.

Concepts A concept is a set of objects, symbols, or events grouped together because they share certain characteristics.

Procedures A procedure is a step-by-step course of action to achieve a goal.

Principles Principles are general truths or laws that are basic to other truths.

Computers & Learning Computers are good at learning concepts. Concepts are the output of a data mining session.

Three Concept Views Classical View Probabilistic View Exemplar View

Classical View All concepts have definite defining properties.

Probabilistic View People store and recall concepts as generalizations created by observations.

Exemplar View People store and recall likely concept exemplars that are used to classify unknown instances.

Supervised Learning Build a learner model using data instances of known origin. Use the model to determine the outcome for new instances of unknown origin.

Supervised Learning: A Decision Tree Example

Decision Tree A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.

Table 1.1 Hypothetical Training Data for Disease Diagnosis

Patient ID#  Sore Throat  Fever  Swollen Glands  Congestion  Headache  Diagnosis
1            Yes          Yes    Yes             Yes         Yes       Strep throat
2            No           No     No              Yes         Yes       Allergy
3            Yes          Yes    No              Yes         No        Cold
4            Yes          No     Yes             No          No        Strep throat
5            No           Yes    No              Yes         No        Cold
6            No           No     No              Yes         No        Allergy
7            No           No     Yes             No          No        Strep throat
8            Yes          No     No              Yes         Yes       Allergy
9            No           Yes    No              Yes         Yes       Cold
10           Yes          Yes    No              Yes         Yes       Cold

Figure 1.1 A decision tree for the data in Table 1.1

Swollen Glands
  Yes: Diagnosis = Strep Throat
  No: Fever
    Yes: Diagnosis = Cold
    No: Diagnosis = Allergy

Table 1.2 Data Instances with an Unknown Classification

Patient ID#  Sore Throat  Fever  Swollen Glands  Congestion  Headache  Diagnosis
11           No           No     Yes             Yes         Yes       ?
12           Yes          Yes    No              No          Yes       ?
13           No           No     No              No          Yes       ?

Production Rules
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
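The production rules can be sketched as a short Python function and applied to the unclassified instances of Table 1.2. This is only an illustration of the rules as stated on the slide; the dictionary keys (swollen_glands, fever) and the function name diagnose are assumptions introduced here, not names from the source.

```python
# Production rules from the slide, written as one function.
# The keys "swollen_glands" and "fever" are illustrative names.

def diagnose(patient):
    """Apply the three production rules to one patient record."""
    if patient["swollen_glands"] == "Yes":
        return "Strep throat"
    if patient["fever"] == "Yes":
        return "Cold"
    return "Allergy"

# Unclassified instances from Table 1.2.
# Only the two attributes the rules actually test are kept.
unknown = {
    11: {"swollen_glands": "Yes", "fever": "No"},
    12: {"swollen_glands": "No", "fever": "Yes"},
    13: {"swollen_glands": "No", "fever": "No"},
}

predictions = {pid: diagnose(record) for pid, record in unknown.items()}
for pid, label in predictions.items():
    print(pid, label)
```

Under these rules, patient 11 is classified as Strep throat, patient 12 as Cold, and patient 13 as Allergy. Sore throat, congestion, and headache never enter the decision.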

Unsupervised Clustering A data mining method that builds models from data without predefined classes.

The Acme Investors Dataset

Table 1.3 Acme Investors Incorporated

Customer ID  Account Type  Margin Account  Transaction Method  Trades/Month  Sex  Age    Favorite Recreation  Annual Income
1005         Joint         No              Online              12.5          F    30-39  Tennis               40-59K
1013         Custodial     No              Broker              0.5           F    50-59  Skiing               80-99K
1245         Joint         No              Online              3.6           M    20-29  Golf                 20-39K
2110         Individual    Yes             Broker              22.3          M    30-39  Fishing              40-59K
1001         Individual    Yes             Online              5.0           M    40-49  Golf                 60-79K

The Acme Investors Dataset & Supervised Learning
1. Can I develop a general profile of an online investor?
2. Can I determine if a new customer is likely to open a margin account?
3. Can I build a model to predict the average number of trades per month for a new investor?
4. What characteristics differentiate female and male investors?

The Acme Investors Dataset & Unsupervised Clustering
1. What attribute similarities group customers of Acme Investors together?
2. What differences in attribute values segment the customer database?
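As a rough illustration of what "attribute similarities" could mean, the sketch below counts how many categorical attribute values each pair of Acme customers shares. It is not a clustering algorithm, only a similarity count of the kind a clustering method might build on. The function name shared_attributes is hypothetical, the age and income ranges are written with hyphens on the assumption that the separators were lost in transcription, and Trades/Month is left out because it is numeric.

```python
from itertools import combinations

# Categorical attributes from Table 1.3, in this order:
# account type, margin account, transaction method, sex,
# age range, favorite recreation, annual income range.
customers = {
    1005: ("Joint", "No", "Online", "F", "30-39", "Tennis", "40-59K"),
    1013: ("Custodial", "No", "Broker", "F", "50-59", "Skiing", "80-99K"),
    1245: ("Joint", "No", "Online", "M", "20-29", "Golf", "20-39K"),
    2110: ("Individual", "Yes", "Broker", "M", "30-39", "Fishing", "40-59K"),
    1001: ("Individual", "Yes", "Online", "M", "40-49", "Golf", "60-79K"),
}

def shared_attributes(a, b):
    """Count the attributes on which customers a and b have the same value."""
    return sum(x == y for x, y in zip(customers[a], customers[b]))

# Score every unordered pair of customer IDs.
pair_scores = {
    pair: shared_attributes(*pair)
    for pair in combinations(sorted(customers), 2)
}

# Print pairs from most to least similar.
for pair, score in sorted(pair_scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

With only five customers, no pair shares more than three of the seven attributes, and three pairs tie at that level. A real clustering session would need far more instances and a principled similarity measure.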

1.3 Is Data Mining Appropriate for My Problem?

Data Mining or Data Query? Shallow Knowledge Multidimensional Knowledge Hidden Knowledge Deep Knowledge

Shallow Knowledge Shallow knowledge is factual. It can be easily stored and manipulated in a database.

Multidimensional Knowledge Multidimensional knowledge is also factual. Online analytical processing (OLAP) tools are used to manipulate multidimensional knowledge.

Hidden Knowledge Hidden knowledge represents patterns or regularities in data that cannot be easily found using a database query. However, data mining algorithms can find such patterns with ease.

Deep Knowledge Deep knowledge is knowledge stored in a database that can only be found if we are given some direction about what we are looking for.

Data Mining vs. Data Query: An Example Use data query if you already know, or almost know, what you are looking for. Use data mining to find regularities in data that are not obvious.
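To make the "data query" side concrete, here is a minimal sketch that loads part of Table 1.3 into an in-memory SQLite database and asks a question whose form is known in advance: which customers hold a margin account? The table name investors and the column names are assumptions made for this example.

```python
import sqlite3

# In-memory database holding a subset of the Table 1.3 columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE investors ("
    "customer_id INTEGER, account_type TEXT, margin_account TEXT, "
    "transaction_method TEXT, trades_per_month REAL)"
)
rows = [
    (1005, "Joint", "No", "Online", 12.5),
    (1013, "Custodial", "No", "Broker", 0.5),
    (1245, "Joint", "No", "Online", 3.6),
    (2110, "Individual", "Yes", "Broker", 22.3),
    (1001, "Individual", "Yes", "Online", 5.0),
]
conn.executemany("INSERT INTO investors VALUES (?, ?, ?, ?, ?)", rows)

# A data query: we already know exactly what we are looking for.
query = (
    "SELECT customer_id FROM investors "
    "WHERE margin_account = 'Yes' ORDER BY customer_id"
)
margin_customers = [row[0] for row in conn.execute(query)]
print(margin_customers)
conn.close()
```

The query returns stored facts (shallow knowledge). A question such as "which attributes predict that a customer will open a margin account?" cannot be phrased as a single lookup of this kind, and that is where data mining comes in.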

1.4 Expert Systems or Data Mining?

Expert System A computer program that emulates the problem-solving skills of one or more human experts.

Knowledge Engineer A person trained to interact with an expert in order to capture their knowledge.

Figure 1.2 Data mining vs. expert systems

Data -> Data Mining Tool -> If Swollen Glands = Yes Then Diagnosis = Strep Throat
Human Expert -> Knowledge Engineer -> Expert System Building Tool -> If Swollen Glands = Yes Then Diagnosis = Strep Throat

1.5 A Simple Data Mining Process Model

Figure 1.3 A simple data mining process model

Operational Database -> SQL Queries -> Data Warehouse -> Data Mining -> Interpretation & Evaluation -> Result Application

Assembling the Data The Data Warehouse Relational Databases and Flat Files

The Data Warehouse The data warehouse is a historical database designed for decision support.

Mining the Data

Interpreting the Results

Result Application

1.6 Why Not Simple Search? Nearest Neighbor Classifier K-nearest Neighbor Classifier

Nearest Neighbor Classifier Classification is performed by searching the training data for the instance closest in distance to the unknown instance.
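A minimal sketch of a nearest neighbor classifier over the training data of Table 1.1 follows. The slide does not say which distance measure is used. The sketch assumes a simple Hamming distance (the number of attributes on which two instances differ), and the function names are hypothetical.

```python
# Training instances from Table 1.1.
# Attribute order: sore throat, fever, swollen glands, congestion, headache.
training = [
    (("Yes", "Yes", "Yes", "Yes", "Yes"), "Strep throat"),
    (("No", "No", "No", "Yes", "Yes"), "Allergy"),
    (("Yes", "Yes", "No", "Yes", "No"), "Cold"),
    (("Yes", "No", "Yes", "No", "No"), "Strep throat"),
    (("No", "Yes", "No", "Yes", "No"), "Cold"),
    (("No", "No", "No", "Yes", "No"), "Allergy"),
    (("No", "No", "Yes", "No", "No"), "Strep throat"),
    (("Yes", "No", "No", "Yes", "Yes"), "Allergy"),
    (("No", "Yes", "No", "Yes", "Yes"), "Cold"),
    (("Yes", "Yes", "No", "Yes", "Yes"), "Cold"),
]

def distance(a, b):
    """Hamming distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor(instance):
    """Return the diagnosis of the closest training instance.

    If several training instances are equally close, the first one
    in the list wins (an arbitrary tie-breaking choice).
    """
    _, label = min(training, key=lambda pair: distance(pair[0], instance))
    return label

# Unclassified instances from Table 1.2.
patient_11 = ("No", "No", "Yes", "Yes", "Yes")
patient_12 = ("Yes", "Yes", "No", "No", "Yes")
patient_13 = ("No", "No", "No", "No", "Yes")

for name, patient in [("11", patient_11), ("12", patient_12), ("13", patient_13)]:
    print(name, nearest_neighbor(patient))
```

Under this distance, patient 11 is closest to training patient 2 and is labeled Allergy, whereas the production rules derived from the decision tree label the same patient Strep throat because of the swollen glands. Treating every attribute as equally important is one reason a simple search of this kind can disagree with a learned model.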

1.7 Data Mining Applications

Customer Intrinsic Value

Figure 1.4 Intrinsic vs. actual customer value (scatter plot; axes: Intrinsic (Predicted) Value and Actual Value)