Introduction. A. Bellaachia Page: 1



Similar documents
Introduction to Data Mining

Information Management course

Introduction to Data Mining

Data Mining: Concepts and Techniques

Data Warehousing and Data Mining

Data Mining. Introduction to Modern Information Retrieval from Databases and the Web. Administrivia

Knowledge Discovery Process and Data Mining - Final remarks

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

Introduction to Data Mining

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

Data Mining Introduction

Data Warehousing and Data Mining

What is Data Mining?

Database Marketing, Business Intelligence and Knowledge Discovery

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Data Mining: Concepts and Techniques Chapter 1 Introduction

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Data Mining System, Functionalities and Applications: A Radical Review

SPATIAL DATA CLASSIFICATION AND DATA MINING

Data Mining Solutions for the Business Environment

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining as Part of Knowledge Discovery in Databases (KDD)

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April ISSN

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Chapter ML:XI. XI. Cluster Analysis

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Hexaware E-book on Predictive Analytics

DATA MINING - SELECTED TOPICS

Data Mining for Fun and Profit

A Review of Data Mining Techniques

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Sunnie Chung. Cleveland State University

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Chapter 3: Cluster Analysis

not possible or was possible at a high cost for collecting the data.

DATA MINING AND WAREHOUSING CONCEPTS

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

DATA MINING: AN OVERVIEW

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Dynamic Data in terms of Data Mining Streams

Customer Classification And Prediction Based On Data Mining Technique

An Overview of Database management System, Data warehousing and Data Mining

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Sanjeev Kumar. contribute

Data Warehousing and Data Mining. A.A Datawarehousing & Datamining 1

MBA Data Mining & Knowledge Discovery

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain.

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

Data Warehousing and Data Mining in Business Applications

Data Mining: Concepts and Techniques

Class 10. Data Mining and Artificial Intelligence. Data Mining. We are in the 21 st century So where are the robots?

Concept and Applications of Data Mining. Week 1

Foundations of Business Intelligence: Databases and Information Management

Big Data Text Mining and Visualization. Anton Heijs

Importance or the Role of Data Warehousing and Data Mining in Business Applications

Foundations of Business Intelligence: Databases and Information Management

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

An Introduction to Data Mining

Data Mining: Overview. What is Data Mining?

Data Mining Analytics for Business Intelligence and Decision Support

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining Techniques

Machine Learning, Data Mining, and Knowledge Discovery: An Introduction

Hexaware Webinar Series Presents: The Presentation Will Begin Momentarily

Data Mining - Introduction

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Foundations of Business Intelligence: Databases and Information Management

Association rules for improving website effectiveness: case analysis

An Overview of Knowledge Discovery Database and Data mining Techniques

Web Data Mining: A Case Study. Abstract. Introduction

A SURVEY ON WEB MINING TOOLS

COURSE RECOMMENDER SYSTEM IN E-LEARNING

Introduction to Spatial Data Mining

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING

The Data Mining Process

Transforming the Telecoms Business using Big Data and Analytics

Course MIS. Foundations of Business Intelligence

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Data Mining: Concepts and Techniques

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

DATA MINING TECHNIQUES AND APPLICATIONS

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

DATA MINING - 1DL105, 1Dl111

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Databases and Information Management

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Knowledge Discovery in Databases (KDD) - Data Mining (DM)

Transcription:

Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7. Data Mining: On What kind of Data?... 10 8. Potential Applications... 11 8.1. Market analysis and management... 11 8.2. Web Mining... 13 8.3. Text Mining... 15 9. Data Mining Systems and Tools... 16 10. Data Mining Functionalities... 16 11. Data Mining: A multi-disciplinary area... 18 12. Are All of the Patterns Interesting?... 19 13. Major Issues in Data Mining... 21 14. Sample Datasets... 22 A. Bellaachia Page: 1

A. Bellaachia Page: 2

1. Objectives Large amount of data kept in data files, databases, and web servers: Structured data Unstructured data Users are expecting more information from these data Marketing managers are interested in customers purchase behavior Simple structured/query language queries are not adequate to extract hidden information: Traditional SQL statements only retrieve a subset of the database. Evolution of database technology: Hierarchical Network Relational Extended Relational Semantic DB (ORDBMS, OODBMS) Overall advancement of computing A. Bellaachia Page: 3

2. What is Data Mining? Mining Gold from Rocks Simple Definition: Extract or mine knowledge from large amount of data. Data mining (knowledge discovery from data) o Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data Alternative names o Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc. Watch out: Is everything data mining? o (Deductive) query processing. o Expert systems or small ML/statistical programs A. Bellaachia Page: 4

3. Knowledge Discovery Process Knowledg Pattern Evaluation Data Mining Task-relevant Data Data Warehouse Selection & Transformation Data Cleaning Data Integration Data Sources Figure 1.4 of the textbook (Modified) A. Bellaachia Page: 5

Learning the application domain o Relevant prior knowledge and goals of application Data Cleaning: Remove noise data and irrelevant data (stopwords in case of unstructured data) Data Integration: Combine multiple data sources Data Selection: Get data relevant to the task to be analyzed Data Reduction and Transformation: Prepare data in a form appropriate for mining: Represent a text file as a vector Find useful features Reduce your space (dimensionality/variable). Data Mining: a process to extract data patterns, e.g., summarization, classification, regression, association, clustering. Pattern Evaluation: Evaluate the output of the data mining process. Knowledge Representation: Techniques to visualize mined knowledge. A. Bellaachia Page: 6

4. KD Process Example Web Log Mining o Selection: Select log data (dates and location) to use o Preprocessing: Remove identifying URLs Remove error logs o Transformation: Sessionize (sort and group) o Data Mining: Construct data structure Create frequent sequences o Interpretation/Evaluation: Cache prediction Personalization A. Bellaachia Page: 7

5. Typical Data Mining Architecture Graphical user interface Pattern evaluation Data mining engine Database or data warehouse server Data cleaning, data integration, and filtering Data Sources A. Bellaachia Page: 8

6. Database vs. Data Mining Query: - Well defined SQL Data: - Operational data Output: - Precise Subset of database Query: - Poorly defined No precise query language Data: - Not operational data Output: - Fuzzy - Not a subset of database Query Example: o Database: Find all credit applicants with last name of Smith. Identify customers who have purchase more than $10,000 in last month. Find all customers who have purchased milk o Data Mining: Find all credit applicants who are poor credit risks. (Classification) Identify customers with similar buying habits. (Clustering) Find all items that are frequently purchased with milk. (Association rules) A. Bellaachia Page: 9

7. Data Mining: On What kind of Data? Database Data warehouse Transactional database Object-oriented database Object-relational database Spatial data Temporal data and Time-series data Multimedia database Text Collections WWW A. Bellaachia Page: 10

8. Potential Applications 8.1. Market analysis and management Where does the data come from? o Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies Target marketing o Find clusters of model customers who share the same characteristics: interest, income level, spending habits, etc. o Determine customer purchasing patterns over time Cross-market analysis o Associations/co-relations between product sales, & prediction based on such association Customer profiling o What types of customers buy what products (clustering or classification) Customer requirement analysis o Identifying the best products for different customers o Predict what factors will attract new customers Provision of summary information o Multidimensional summary reports o Statistical summary information (data central tendency and variation) Risk analysis and management A. Bellaachia Page: 11

Forecasting Customer retention Improved underwriting Competitive analysis Fraud detection and detection of unusual patterns (outliers) Detect unusual patterns Anti-Terrorism Intrusion detection in network security. Detection of credit card fraud. Money Laundering: Detect suspicious money transactions. Example: Terrorist Network [Ted Senator 2001] A. Bellaachia Page: 12

8.2. Web Mining Web content: Text + Links Help web architects understand users needs User profiling Site structure A. Bellaachia Page: 13

Taxonomy: Social Network Analysis: The web is an example of a social network Other Applications Text mining (news group, email, documents) Web mining Stream data mining DNA and bio-data analysis Web Usage Mining Analyze web log to mine web users behavior (search engine, e-commerce, etc.) Web personalization / collaborative filtering Detection of new emerging research areas Re-structure web sites based on users needs e-business intelligence, e-crm, etc. Web Content Mining Information filtering / knowledge extraction Web document categorization Detection of web categories and topics on the Web Web Structure Mining Finding "Quality" or "authoritative" sites based on linkage and citation IBM CLEVER project Google A. Bellaachia Page: 14

8.3. Text Mining Message filtering (e-mail, newsgroups, etc.) Newspaper articles analysis Text and document categorization A. Bellaachia Page: 15

9. Data Mining Systems and Tools See www.kdnuggets.com o Oracle: Darwin o IBM: Intelligence Miner o SAS: Enterprise Miner o Business Objects o SPSS: Clementine o Xchange: e-crm o Microsoft: SQL Server 2000 o Weka o Etc. 10. Data Mining Functionalities Concept description: Characterization and discrimination o Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions Association (correlation and causality) o Diaper Beer [0.5%, 75%] Classification and Prediction o Construct models (functions) that describe and distinguish classes or concepts for future prediction E.g., classify countries based on climate, or classify cars based on gas mileage o Presentation: decision-tree, classification rule, neural network o Predict some unknown or missing numerical values Cluster analysis A. Bellaachia Page: 16

o Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns o Maximizing intra-class similarity & minimizing interclass similarity Outlier analysis o Outlier: a data object that does not comply with the general behavior of the data o Noise or exception? No! Useful in fraud detection, rare events analysis Trend and evolution analysis o Trend and deviation: regression analysis o Sequential pattern mining, periodicity analysis o Similarity-based analysis Other pattern-directed or statistical analyses A. Bellaachia Page: 17

11. Data Mining: A multi-disciplinary area Database Systems Statistics Machine Learning Data Mining Visualization Algorithm Other Disciplines A. Bellaachia Page: 18

12. Are All of the Patterns Interesting? Typically, thousands of patterns might be generated. How to get interesting patterns? What is an interesting pattern? o If it is easily understood by humans o Valid on new or test data with some degree of certainty, o Potentially useful o Novel, or validates some hypothesis that a user seeks to confirm Objective vs. subjective interestingness measures o Objective: based on statistics and structures of patterns, o Example: support and confidence Association rules: Given an association rule: X Y o Rule support represents the percentage of transactions from a transaction database that the given rule satisfies. o Formally, it is the following probability: P(XUY) Where X U Y indicates a transaction that contains both X and Y. Formally, it is denoted: support(x Y) = P(X U Y) A. Bellaachia Page: 19

Confidence rules: Given an association rule: X Y It assesses the degree of certainty of the detected association. o Formally, it is the conditional probability: P(Y X) = The probability that a transaction containing X also contains Y. Formally, it is denoted: confidence(x Y) = P(Y X) o Subjective: based on user s belief in the data, e.g., unexpectedness, novelty, actionability, etc. A. Bellaachia Page: 20

13. Major Issues in Data Mining Mining methodology Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web Performance: efficiency, effectiveness, and scalability Pattern evaluation: the interestingness problem Incorporation of background knowledge Handling noise and incomplete data Parallel, distributed and incremental mining methods Integration of the discovered knowledge with existing one: knowledge fusion User interaction Data mining query languages and ad-hoc mining Expression and visualization of data mining results Interactive mining of knowledge at multiple levels of abstraction Applications and social impacts Domain-specific data mining & invisible data mining Protection of data security, integrity, and privacy A. Bellaachia Page: 21

14. Sample Datasets - Document Collection: duc_nist_sample_text.txt - 20 newsgroup - Web Log: weblog_sample.txt - Enron Data - Intrusion data - Medical data: Cancer Data A. Bellaachia Page: 22