Introduction
Jun Du, The University of Western Ontario

Outline
- Why Data Mining?
- What Is Data Mining?
- A Multi-Dimensional View of Data Mining
- What Kinds of Data Can Be Mined?
- What Kinds of Patterns Can Be Mined?
- What Technologies Are Used?
- What Kinds of Applications Are Targeted?
- Trends and Challenges in Data Mining
- Data Mining Resources
- Summary

Why Data Mining?
- The explosive growth of data: from terabytes to petabytes
- Hardware, data collection, and data availability: automated data collection tools, database systems, the Web
- Major sources of abundant data:
  - Business: Web, e-commerce, transactions, stocks, ...
  - Science: remote sensing, bioinformatics, ...
  - Society and everyone: news, digital cameras, YouTube, Facebook, ...
- We have everything ready for data, but data is useless unless it becomes knowledge.
- We are drowning in data, but starving for knowledge!

What Is Data Mining?
- Data mining (knowledge discovery from data): the extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
- Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, predictive modeling, data science, business intelligence, etc.
- Examples of data mining:
  - Search engines (Google, Bing, Yahoo, ...)
  - Online shopping (Amazon, eBay, ...)
  - Social networks (Facebook, LinkedIn, ...)
  - Email services (UWO, Gmail, Hotmail, ...)

Knowledge Discovery (KDD) Process
[Figure: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

Data Mining in Business Intelligence
[Figure: a pyramid of layers with increasing potential to support business decisions. From bottom to top:
- Data Sources: paper, files, Web documents, scientific experiments, database systems
- Data Preprocessing/Integration, Data Warehouses
- Data Exploration: statistical summary, querying, and reporting
- Data Mining: information discovery
- Data Presentation: visualization techniques
- Decision Making
Typical users, from top to bottom: End User, Business Analyst, Data Analyst, DBA.]

KDD Process: A Typical View from ML and Statistics
Input Data -> Data Pre-Processing -> Data Mining -> Post-Processing
- Data pre-processing: data integration, normalization, feature selection, dimension reduction
- Data mining: pattern discovery, association & correlation, classification, clustering, outlier analysis
- Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
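Normalization appears above as a pre-processing step without further detail. As one illustration, here is a minimal sketch of min-max normalization, which linearly rescales an attribute into a target range (here [0, 1]); z-score standardization is a common alternative, and the slide does not say which one is intended. The function name and sample values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the minimum maps to new_min and the maximum to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map everything to new_min
        # instead of dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical raw attribute values (e.g. incomes in thousands).
incomes = [125, 100, 70, 220, 60]
print(min_max_normalize(incomes))
```

Rescaling matters mostly for distance-based methods such as k-nearest neighbours or k-means, where an attribute with a large numeric range would otherwise dominate the distance.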

Multi-Dimensional View of Data Mining
- Data to be mined: What kinds of data can be mined?
- Knowledge to be mined: What kinds of patterns can be mined?
- Techniques utilized: What technologies are used?
- Applications adapted: What kinds of applications are targeted?

On What Kinds of Data?
- Most commonly used: table data (in raw format or in a relational database)
- Advanced data sets:
  - Transaction data
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data
  - Structured data, graphs, social networks, and multi-linked data
  - Spatial data and spatiotemporal data
  - Multimedia data
  - Text data
  - The World Wide Web
Poll (June 2011): What data types have you analyzed/mined in the past 12 months?

Association Rule
Given a set of transaction records, each containing some items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Story of Diaper and Beer

Rules discovered:
{Milk} --> {Coke}
{Diaper} --> {Beer}
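The slide lists the discovered rules without saying how they are scored. A common choice in association rule mining (not defined on the slide) is support, the fraction of all transactions containing both sides of the rule, and confidence, the fraction of transactions containing the left-hand side that also contain the right-hand side. The sketch below computes both for the two rules over the five transactions listed above; the function names are hypothetical.

```python
from fractions import Fraction

# The five market-basket transactions listed above.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(lhs, rhs, transactions):
    """Return (support, confidence) of the rule lhs --> rhs as exact fractions.

    Assumes lhs occurs in at least one transaction (otherwise confidence is undefined).
    """
    both = support_count(lhs | rhs, transactions)
    support = Fraction(both, len(transactions))
    confidence = Fraction(both, support_count(lhs, transactions))
    return support, confidence


for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper"}, {"Beer"})]:
    s, c = rule_metrics(lhs, rhs, TRANSACTIONS)
    print(f"{sorted(lhs)} --> {sorted(rhs)}: support={s}, confidence={c}")
```

For this data, {Milk} --> {Coke} has support 3/5 and confidence 3/4, and {Diaper} --> {Beer} has support 2/5 and confidence 2/3.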

Association Rule Application 1
Marketing and sales promotion: suppose the rule discovered is {Bagels} --> {Potato Chips}.
- If bagels are on sale, potato chips might sell quickly as well.
- If the store stops selling bagels, sales of potato chips might be affected.

Association Rule Application 2
Supermarket shelf management: suppose the rule discovered is {Diaper} --> {Beer}.
- Put beer beside the diapers, so that customers find it convenient; or
- Put beer far away from the diapers, so that customers might pick up other items on their way from the diapers to the beer.

Classification & Regression
Construct models (functions) from existing data and use them to make predictions on future, unseen data.

Training Set
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: Training Set -> Learning Algorithm -> Model; the model is then applied to the Test Set]
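A minimal sketch of the train-then-predict workflow on the training and test sets above, using k-nearest neighbours with k = 3 (K-Nearest Neighbor also appears in the top-10 algorithm list later in these slides). The distance function is a hypothetical choice made for illustration: it counts 1 for each mismatched Refund or Marital Status value and adds the income gap divided by 100K. Different weights could change the predictions, so treat the output as a consequence of these assumptions, not as the slide's answer.

```python
from collections import Counter

# Training set: (Refund, Marital Status, Taxable Income in K, Cheat)
TRAIN = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

# Test set: (Refund, Marital Status, Taxable Income in K); the class is unknown.
TEST = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]


def distance(a, b):
    """Hypothetical mixed-attribute distance: 1 per mismatched categorical
    value plus the income difference scaled by 100K."""
    refund = 0 if a[0] == b[0] else 1
    marital = 0 if a[1] == b[1] else 1
    return refund + marital + abs(a[2] - b[2]) / 100


def knn_predict(record, train, k=3):
    """Majority vote over the classes of the k training records closest to record."""
    neighbours = sorted(train, key=lambda row: distance(record, row))[:k]
    votes = Counter(row[3] for row in neighbours)
    return votes.most_common(1)[0][0]


predictions = [knn_predict(r, TRAIN) for r in TEST]
print(predictions)
```

Under these assumptions the sketch prints ['Yes', 'No', 'No', 'No', 'Yes', 'No']. k-NN is a lazy learner: its "model" is simply the stored training set, so the learning step in the figure reduces to keeping TRAIN around and the work happens at prediction time.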

Classification Application 1: Direct Marketing
- Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
- Approach:
  - Use the data for a similar product introduced before. We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
  - Collect customer data (demographic, lifestyle, etc.).
  - Use this information as input attributes to learn a classification model.
- New York Times article (Feb 2012): "How Companies Learn Your Secrets"

Classification Application 2: Fraud Detection
- Goal: predict fraudulent cases in credit card transactions.
- Approach:
  - Use credit card transactions and information about the account holder as attributes (when, what, and where a customer buys, etc.).
  - Label past transactions as fraudulent or fair (the class attribute).
  - Learn a model for the class of the transactions.
  - Use this model to detect fraudulent transactions.

Classification Application 3: Customer Attrition/Churn
- Goal: predict whether a cell-phone plan customer is likely to be lost to a competitor.
- Approach:
  - Use detailed records of transactions with each past and present customer to find attributes: how often the customer calls, where they call, what time of day they call most, their financial status, marital status, etc.
  - Label the customers as loyal or disloyal.
  - Find a model for loyalty.

Clustering
Given a set of data points, each having a set of attributes, group the data points into clusters such that:
- Data points in the same cluster are more similar to one another.
- Data points in separate clusters are less similar to one another.
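One widely used way to form such clusters is k-means (The K-Means Algorithm appears in the top-10 list later in these slides). The sketch below alternates an assignment step (each point joins its nearest centroid, by squared Euclidean distance) with an update step (each centroid moves to the mean of its members) until assignments stop changing. Seeding the centroids with the first k points is a simplification assumed here; practical implementations typically use random or k-means++ seeding and several restarts. The data points are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100):
    """Plain k-means; seeds centroids with the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    return labels, centroids


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

On this toy data the three points near (1, 1) end up in one cluster and the three near (8, 8) in the other, with centroids at about (1.33, 1.33) and (8.33, 8.33).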

Clustering Application: Market Segmentation
- Goal: subdivide a market into distinct subsets of customers, any of which may be selected as a market target.
- Approach:
  - Collect different attributes of customers based on their geographical and lifestyle-related information.
  - Find clusters of similar customers.
  - Measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those in different clusters.

Outlier Analysis
- Outlier: a data object that does not comply with the general behavior of the data
- Noise or exception? One person's garbage could be another person's treasure.
- Methods: classification, regression, clustering, ...
- Applications: credit card fraud detection, network intrusion detection
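As a concrete illustration of "not complying with the general behavior of the data", here is a minimal sketch of a simple statistical technique, z-score thresholding, which is not one of the methods listed on the slide. It flags values that lie more than a chosen number of standard deviations from the mean. The transaction amounts and the threshold of 2.0 are hypothetical; note that with only seven values a population z-score cannot exceed sqrt(6), about 2.45, so the frequently quoted cutoff of 3 would never fire on data this small.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return indices of values whose |z-score| exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts for one customer.
amounts = [12.5, 30.0, 22.0, 18.0, 25.0, 27.5, 950.0]
print(zscore_outliers(amounts))
```

Only the 950.0 transaction (index 6) is flagged here. The outlier itself inflates the mean and standard deviation, which is one reason robust alternatives (for example, median-based scores) are often preferred.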

Other Patterns
- Recommendation systems:
  - People you might know (Facebook)
  - Jobs you might be interested in (LinkedIn)
  - People who bought this product also bought (Amazon)
  - Movies (TV shows) that you might like to watch (Netflix)
- Social network analysis:
  - A new and very popular area
  - Can be applied to many applications: fraud detection, marketing, terrorism and crime prevention, ...
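The "people who bought this product also bought" idea can be sketched with simple co-occurrence counting: rank other items by how often they appear in baskets that contain the target item. This is a deliberately simplified stand-in, not how any of the named companies' systems work. The function name is hypothetical, and the baskets reuse the five transactions from the association rule slide.

```python
from collections import Counter


def also_bought(target, baskets, top_n=2):
    """Rank items by how often they appear in baskets that contain target."""
    counts = Counter()
    for basket in baskets:
        if target in basket:
            counts.update(item for item in basket if item != target)
    # Sort by descending count, breaking ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]
print(also_bought("Diaper", baskets))
```

Milk co-occurs with Diaper in three baskets and Beer and Coke in two each, so with the alphabetical tie-break the top two suggestions are Milk and Beer. Raw counts favour popular items; real recommenders usually normalize for item popularity.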

Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the center, surrounded by machine learning, pattern recognition, statistics, applications, visualization, algorithms, database technology, and high-performance computing]

Top 10 Algorithms in DM (IEEE International Conference on Data Mining)
1. Decision Trees
2. The K-Means Algorithm
3. Support Vector Machines
4. The Apriori Algorithm
5. The EM Algorithm
6. PageRank Algorithm
7. AdaBoost Algorithm
8. K-Nearest Neighbor Algorithm
9. Naive Bayes
10. CART Algorithm
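Of the algorithms listed, Apriori connects directly to the association rules discussed earlier: it finds all itemsets that occur in at least a minimum number of transactions, from which rules are then generated. The sketch below shows the level-wise search with its two characteristic steps, joining frequent (k-1)-itemsets into k-item candidates and pruning any candidate that has an infrequent (k-1)-subset. The function name is hypothetical, the minimum support is given as an absolute count, and the baskets reuse the five transactions from the association rule slide.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for every itemset that
    occurs in at least min_count transactions (level-wise Apriori search)."""
    transactions = [frozenset(t) for t in transactions]

    def frequent_among(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent single items.
    level = frequent_among({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        # Join: unite pairs of frequent (k-1)-itemsets that form a k-itemset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent_among(candidates)
        result.update(level)
        k += 1
    return result


baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]
for itemset, count in sorted(apriori(baskets, 3).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum count of 3 (60% support), all five single items are frequent, only {Coke, Milk} and {Diaper, Milk} survive among the pairs, and the lone three-item candidate {Coke, Diaper, Milk} is pruned because its subset {Coke, Diaper} is infrequent.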

Algorithms in DM
- KDnuggets Poll (Nov 2011): Algorithms for data analysis / data mining
- Rexer Analytics Survey (2012)

Applications of Data Mining
- KDnuggets Poll (December 2011): Industries / fields where you applied data mining in 2011
- Rexer Analytics Survey (2012)

10 Challenging Problems in DM (IEEE International Conference on Data Mining)
1. Developing a Unifying Theory of Data Mining
2. Scaling Up for High Dimensional Data and High Speed Data Streams
3. Mining Sequence Data and Time Series Data
4. Mining Complex Knowledge from Complex Data
5. Data Mining in a Network Setting
6. Distributed Data Mining and Mining Multi-agent Data
7. Data Mining for Biological and Environmental Problems
8. Data-Mining-Process Related Problems
9. Security, Privacy and Data Integrity
10. Dealing with Non-static, Unbalanced and Cost-sensitive Data

Hot Topics and Trends in DM
- KDnuggets Poll (Jan 2012): Hottest analytics / data mining topics in 2012
- Rexer Analytics Survey (2012)

Data Mining Conferences
- Conferences:
  - ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
  - IEEE Int. Conf. on Data Mining (ICDM)
  - SIAM Data Mining Conf. (SDM)
  - European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)
  - Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
- Other related conferences:
  - DB conferences: ACM SIGMOD, VLDB, ICDE
  - Web and IR conferences: WWW, SIGIR, CIKM
  - ML conferences: ICML, NIPS
  - AI conferences: IJCAI, AAAI

Journals and Online Resources
- Data mining journals:
  - Data Mining and Knowledge Discovery (DMKD)
  - IEEE Trans. on Knowledge and Data Eng. (TKDE)
  - KDD Explorations
  - ACM Trans. on KDD
- Online resources:
  - KDnuggets
  - Kaggle
  - UCI Machine Learning Repository

Software
- KDnuggets poll (May 2012): What analytics, data mining, or Big Data software have you used in the past 12 months for a real project?
- Rexer Analytics Survey (2012)

Programming Languages
- KDnuggets poll (August 2012): Programming languages for analytics / data mining

Summary
- Data mining: discovering interesting patterns and knowledge from massive amounts of data
- A natural evolution of science and information technology, in great demand, with wide applications
- A KDD process includes data pre-processing, data mining, data post-processing (pattern evaluation), and knowledge presentation
- Mining can be performed on a variety of data
- Data mining patterns: association, classification, clustering, outlier analysis, recommendation systems, social network analysis, etc.
- A variety of data mining technologies and applications
- Data mining resources

Knowledge Discovery and Data Mining. Course Outlines

Knowledge Discovery and Data Mining. Course Outlines Knowledge Discovery and Data Mining Unit # 1 1 Course Outlines Classification Techniques Classification/Decision Trees Naïve Bayes Neural Networks Clustering Partitioning Methods Hierarchical Methods Patterns

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques Data Mining: Concepts and Techniques Chapter 1 Introduction SURESH BABU M ASST PROF IT DEPT VJIT 1 Chapter 1. Introduction Motivation: Why data mining? What is data mining? Data Mining: On what kind of

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Data Mining: Introduction Lecture Notes for Chapter 1 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused - Web

More information

Introduction of Information Visualization and Visual Analytics. Chapter 4. Data Mining

Introduction of Information Visualization and Visual Analytics. Chapter 4. Data Mining Introduction of Information Visualization and Visual Analytics Chapter 4 Data Mining Books! P. N. Tan, M. Steinbach, V. Kumar: Introduction to Data Mining. First Edition, ISBN-13: 978-0321321367, 2005.

More information

Foundations of Artificial Intelligence. Introduction to Data Mining

Foundations of Artificial Intelligence. Introduction to Data Mining Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Chapter 1: Introduction Instructor: Yizhou Sun yzsun@ccs.neu.edu January 8, 2013 Course Information Class homepage: http://www.ccs.neu.edu/home/yzsun/classes/2013spring _CS6220/index.htm

More information

Introduction to Data Mining

Introduction to Data Mining Bioinformatics Ying Liu, Ph.D. Laboratory for Bioinformatics University of Texas at Dallas Spring 2008 Introduction to Data Mining 1 Motivation: Why data mining? What is data mining? Data Mining: On what

More information

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

More information

Data Mining: Introduction

Data Mining: Introduction Data Mining: Introduction Introducing the course How the course is organized How students are evaluated Deadlines Data Mining [Chapt. 1 of course book] What is it about? The KDD process Relations to other

More information

CSE4334/5334 Data Mining Lecturer 2: Introduction to Data Mining. Chengkai Li University of Texas at Arlington Spring 2016

CSE4334/5334 Data Mining Lecturer 2: Introduction to Data Mining. Chengkai Li University of Texas at Arlington Spring 2016 CSE4334/5334 Data Mining Lecturer 2: Introduction to Data Mining Chengkai Li University of Texas at Arlington Spring 2016 Big Data http://dilbert.com/strip/2012-07-29 Big Data http://www.ibmbigdatahub.com/infographic/four-vs-big-data

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 1

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 1 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 1 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights

More information

Data Mining Introduction

Data Mining Introduction Data Mining Introduction Organization Lectures Mondays and Thursdays from 10:30 to 12:30 Lecturer: Mouna Kacimi Office hours: appointment by email Labs Thursdays from 14:00 to 16:00 Teaching Assistant:

More information

CS 412 Intro. to Data Mining

CS 412 Intro. to Data Mining CS 412 Intro. to Data Mining Chapter 1. Introduction Jiawei Han, Computer Science, Univ. Illinois at Urbana -Champaign, 2106 1 August 27, 2016 Data Mining: Concepts and Techniques 2 August 27, 2016 Data

More information

Data Privacy and Data Security. Chapter 1. Introduction to Data Mining

Data Privacy and Data Security. Chapter 1. Introduction to Data Mining Data Privacy and Data Security Chapter 1 Introduction to Data Mining Jun Zhang January 13, 2011 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Why Mine Data? Commercial Viewpoint Lots of

More information

DATA MINING - 1DL105, 1Dl111

DATA MINING - 1DL105, 1Dl111 1 DATA MINING - 1DL105, 1Dl111 Fall 2006 An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/ht06 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Machine Learning Introduction

Machine Learning Introduction Machine Learning Introduction Jeff Howbert Introduction to Machine Learning Winter 2012 1 Course logistics (1) Course: CSS 490 / 590, Introduction to Machine Learning course website: http://courses.washington.edu/css490/2012.winter/

More information

Quick Introduction of Data Mining Techniques

Quick Introduction of Data Mining Techniques Quick Introduction of Data Mining Techniques *Sources partially from Introduction to Data Mining, by P.-N. Tan, M. Steinbach, V. Kumar, Addison-Wesley, 2005. Main Data Mining Techniques Link Analysis Associations

More information

WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT?

WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT? WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT? Data mining is mainly used for decision making in business. The abundance of data, coupled with the need for powerful data analysis tools, has been described

More information

Web Mining by means of Concept Lattice Theory. Dr. Joyee Yi Zhao FernUniversität in Hagen

Web Mining by means of Concept Lattice Theory. Dr. Joyee Yi Zhao FernUniversität in Hagen Web Mining by means of Concept Lattice Theory Dr. Joyee Yi Zhao FernUniversität in Hagen 1 Outline * Data mining Web mining Concept Lattice Theory Concept Lattices based web mining Conclusion 2 Why Is

More information

Data Mining. Introduction

Data Mining. Introduction Data Mining Introduction Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce purchases at department/ grocery stores Bank/Credit Card transactions Computers

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Databases: Visualization, Data Mining, New DB Paradigms. Thomas Weik FH Münster

Databases: Visualization, Data Mining, New DB Paradigms. Thomas Weik FH Münster Databases: Visualization, Data Mining, New DB Paradigms Thomas Weik FH Münster 9. Basic Mining Strategies 9.0 References 9.1 Motivation 9.2 Classification 9.3 Clustering 9.4 Association Rule Discovery

More information

Data Mining. Yeow Wei Choong Anne Laurent

Data Mining. Yeow Wei Choong Anne Laurent Data Mining Yeow Wei Choong Anne Laurent Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce purchases at department/ grocery stores Bank/Credit Card

More information

Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining

Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining Introduction to Artificial Intelligence G51IAI An Introduction to Data Mining Learning Objectives Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees

More information

Machine Learning Introduction

Machine Learning Introduction Machine Learning Introduction Jeff Howbert Introduction to Machine Learning Winter 2014 1 Course logistics (1) Course: CSS 581, Introduction to Machine Learning course website: http://courses.washington.edu/css581/

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Winter Semester 2012/2013 Free University of Bozen, Bolzano DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html Organization

More information

Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques Slides related to: Data Mining: Concepts and Techniques Chapter 1 and 2 Introduction and Data preprocessing Jiawei Han and Micheline Kamber Department of Computer Science University of Illinois at Urbana-Champaign

More information

Web Mining Seminar CSE 450. Spring 2008 MWF 11:10 12:00pm Maginnes 113

Web Mining Seminar CSE 450. Spring 2008 MWF 11:10 12:00pm Maginnes 113 CSE 450 Web Mining Seminar Spring 2008 MWF 11:10 12:00pm Maginnes 113 Instructor: Dr. Brian D. Davison Dept. of Computer Science & Engineering Lehigh University davison@cse.lehigh.edu http://www.cse.lehigh.edu/~brian/course/webmining/

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining MIT-652 Data Mining Applications Thimaporn Phetkaew School of Informatics, Walailak University MIT-652: DM 1: Introduction to Data Mining 1 Introduction Motivation: Why data

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Winter Semester 2010/2011 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html

More information

Introduksi Data Mining. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Introduksi Data Mining. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Introduksi Data Mining S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 DM-MA/S1IF/FTI/UKM/2010 Agenda Pendahuluan Definisi Data Mining Data Mining Steps Data Mining Tasks

More information

Algorithms for Big Data. Dr. Jianye HAO Associate Professor School of Software Tianjin University

Algorithms for Big Data. Dr. Jianye HAO Associate Professor School of Software Tianjin University Algorithms for Big Data Dr. Jianye HAO Associate Professor School of Software Tianjin University Lecturer Jianye HAO ( 郝建业 ) Associate Professor at School of Software, Tianjin University Office: 55-A319

More information

Visual Data Mining and Document Collections Visualization

Visual Data Mining and Document Collections Visualization Universidade de São Paulo, São Carlos/SP, Brasil Instituto de Ciências Matemáticas e de Computação (ICMC) Departamento de Ciências da Computação Introduction Visualization and Data Analysis Visual Data

More information

Machine Learning, Data Mining, and Knowledge Discovery: An Introduction

Machine Learning, Data Mining, and Knowledge Discovery: An Introduction Tecniche di Apprendimento Automatico per Applicazioni di Data Mining Machine Learning, Data Mining, and Knowledge Discovery: An Introduction Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico

More information

Non-trivial extraction of implicit, previously unknown and potentially useful information from data. Knowledge Discovery in Databases (KDD)

Non-trivial extraction of implicit, previously unknown and potentially useful information from data. Knowledge Discovery in Databases (KDD) Non-trivial extraction of implicit, previously unknown and potentially useful information from data Knowledge Discovery in Databases (KDD) 2 What is not Data Mining? Look up phone number in phone directory

More information

cse537 Artificial Intelligence

cse537 Artificial Intelligence cse537 Artificial Intelligence Part 2: Data Mining Professor Anita Wasilewska Computer Science Department Stony Brook University Textbook Course Part 2 Textbook: Jianwei Han, Micheline Kamber DATA MINING

More information

Data Mining: Concepts and Techniques. Chapter 1

Data Mining: Concepts and Techniques. Chapter 1 Data Mining: Concepts and Techniques Chapter 1 Richong Zhang Office: New Main Building, G521 Email:zhangrc@act.buaa.edu.cn This slide is made based on the slides provided by Jiawei Han, Micheline Kamber,

More information

Data Mining. Shahram Hassas Math 382 Professor: Shapiro

Data Mining. Shahram Hassas Math 382 Professor: Shapiro Data Mining Shahram Hassas Math 382 Professor: Shapiro Agenda Introduction Major Elements Steps/ Processes Examples Tools used for data mining Advantages and Disadvantages What is Data Mining? Described

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Data Mining Information Retrieval and Data Mining. Prof. Matteo Matteucci

Data Mining Information Retrieval and Data Mining. Prof. Matteo Matteucci Data Mining Prof. Matteo Matteucci Slide Credits and References 2 These slides have been heavily taken from: Resources for Instructors and Students for Introduction to Data Mining by Pang-Ning Tan, Michael

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa avellido@lsi.upc.edu skype, gtalk: avellido Tels.:

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Knowledge Discovery in Databases

Knowledge Discovery in Databases Knowledge Discovery in Databases Javier Béjar cbea CS - MIA AMLT - 2016/2017 Javier Béjar cbea (CS - MIA) Knowledge Discovery in Databases AMLT - 2016/2017 1 / 32 Outline 1 Knowledge Discovery in Databases

More information

Introduction to Data Mining. Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj

Introduction to Data Mining. Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Introduction to Data Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Overview Introduction The Data Mining Process The Basic Data Types The Major Building Blocks Scalability and Streaming

More information

Chapter 1: Introduction

Chapter 1: Introduction Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 1: Introduction Lecture: Prof. Dr. Thomas

More information
