Chapter 3: Cluster Analysis

1 Chapter 3: Cluster Analysis. Outline: 3.1 Basic Concepts of Clustering; 3.2 Partitioning Methods; 3.3 Hierarchical Methods; 3.4 Density-Based Methods; 3.5 Model-Based Methods; 3.6 Clustering High-Dimensional Data; 3.7 Outlier Analysis (Definition, Statistical-Based Methods, Distance-Based Methods, Density-Based Local Methods, Deviation-Based Methods).

2 3.7.1 Definition. Outliers are data objects that do not comply with the general behavior or model of the data. Outlier detection and analysis is referred to as outlier mining. Outlier mining has many applications: fraud detection, detecting unusual usage of telecommunication services, identifying the spending behavior of customers with extremely low or extremely high incomes, finding unusual responses to various medical treatments, etc.

3 Outlier Mining. Given a set of n data objects and k, the expected number of outliers, find the top k objects that are considerably dissimilar, exceptional, or inconsistent with respect to the remaining data. The outlier mining problem can be seen as two sub-problems: (1) define what data can be considered inconsistent in a given data set; (2) find an efficient method to mine the outliers so defined. Data visualization methods are weak at detecting outliers in data with many categorical attributes or in data of high dimensionality, which motivates computer-based techniques for detecting outliers.

4 3.7.2 Statistical Distribution-Based Methods. Assume a distribution model for the given data set (e.g., normal) and identify outliers with respect to the model using a discordancy test. How does it work? The test examines two hypotheses: a working hypothesis and an alternative hypothesis. A working hypothesis H is a statement that the entire data set of n objects comes from an initial distribution model F, that is, $H: o_i \in F$, where $i = 1, 2, \ldots, n$. The hypothesis H is retained if there is no statistically significant evidence supporting its rejection.

5 Discordancy Test. Verifies whether an object $o_i$ is significantly large (or small) in relation to the distribution F. Principle: choose a statistic T for discordancy testing and consider the value $v_i$ of that statistic for object $o_i$. If the significance probability $SP(v_i)$ is sufficiently small, $o_i$ is discordant and the working hypothesis is rejected. An alternative hypothesis $\bar{H}$, which states that $o_i$ comes from another distribution model G, is adopted. The result depends on which model F is chosen, because $o_i$ may be an outlier under one model and a perfectly valid value under another.

6 Discordancy Test: Example. Let $o_1, \ldots, o_n$ represent the data objects. Compute the sample mean $\mu$ and the standard deviation $\sigma$. If an object $o_i$ is suspected to be an outlier, compute the test statistic $T = \frac{|o_i - \mu|}{\sigma}$. If T exceeds some critical value, then $o_i$ is an outlier.

7 Discordancy Test: Example. Consider the following ordered data: 3.84, 4.26, 4.53, 4.60, 5.28, 5.29, 5.74, 5.86. Consider an additional sample P = 10, which is suspected to be an outlier. Computing $\mu$ and $\sigma$ over all nine values, including the suspected point, gives $\mu = 5.49$ and $\sigma = 1.82$, so $T = \frac{|10 - 5.49|}{1.82} = 2.48$. With n = 9 and significance level $\alpha = 0.05$, the critical value is 2.110. Since T > 2.110, there is evidence that P is an outlier.
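The arithmetic of this example can be checked with a short Python sketch. It is only an illustration: the function name `discordancy_statistic` is hypothetical, the mean and sample standard deviation are taken over all nine values (the choice that reproduces the quoted σ = 1.82 and T = 2.48), and the critical value 2.110 is simply copied from the slide rather than derived.

```python
import statistics

def discordancy_statistic(values, suspect):
    """Return T = |suspect - mean| / stdev, computed over the full sample."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return abs(suspect - mean) / stdev

# The eight ordered values from the slide plus the suspected point P = 10.
data = [3.84, 4.26, 4.53, 4.60, 5.28, 5.29, 5.74, 5.86, 10.0]
t = discordancy_statistic(data, 10.0)

CRITICAL_VALUE = 2.110  # quoted on the slide for n = 9, alpha = 0.05
is_outlier = t > CRITICAL_VALUE
print(round(t, 2), is_outlier)
```

A different choice of test statistic or critical-value table would change the threshold, so the constant above should be treated as specific to this example.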

8 Alternative Distributions: Inherent Alternative Distribution. The working hypothesis that all objects come from distribution F is rejected in favor of the alternative hypothesis that all objects come from another distribution G: $\bar{H}: o_i \in G$, where $i = 1, 2, \ldots, n$. F and G may be different distributions, or the same distribution with different parameters. Distribution G must have the potential to produce outliers (a different mean or dispersion, or a longer tail).

9 Alternative Distributions. Mixture alternative distribution: the discordant values are not outliers in the F population but contaminants from some other population G. The alternative hypothesis is $\bar{H}: o_i \in (1 - \lambda)F + \lambda G$, where $i = 1, 2, \ldots, n$. Slippage alternative distribution: all objects (except a small number) come from the initial model F with its given parameters; the remaining objects come from a modified version of F in which the parameters have been shifted.

10 Characteristics of Statistical-Based Methods. Most tests are for single attributes, whereas many problems require finding outliers in multidimensional space. Statistical approaches require knowledge about the parameters of the data set. Statistical methods do not guarantee that all outliers will be found when no specific test was developed or when the distribution cannot be adequately modeled with any standard distribution.

11 3.7.2 Distance-Based Methods. Generalize the discordancy-test-based techniques. Distance-based outliers are objects that do not have enough neighbors. Formally, an object o is a DB(pct, dmin)-outlier, that is, a distance-based outlier with parameters pct and dmin, if at least a fraction pct of the objects lie at a distance greater than dmin from o. This avoids the excessive computation related to fitting the observed data into some standard distribution and selecting discordancy tests.
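The DB(pct, dmin) definition translates almost directly into code. The following is a minimal brute-force sketch, not an efficient algorithm: the function name `db_outliers` is hypothetical, Euclidean distance is assumed, and the fraction is taken over the other n - 1 objects (excluding o itself), which is one possible reading of the definition.

```python
import math

def db_outliers(points, pct, dmin):
    """Return the points for which at least a fraction pct of the
    other points lie farther away than dmin (brute force, O(n^2))."""
    n = len(points)
    outliers = []
    for i, o in enumerate(points):
        # Count the objects that are farther than dmin from o.
        far = sum(
            1
            for j, x in enumerate(points)
            if j != i and math.dist(o, x) > dmin
        )
        if far >= pct * (n - 1):
            outliers.append(o)
    return outliers

# Four tightly grouped points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
found = db_outliers(points, pct=0.95, dmin=1.0)
print(found)
```

In this toy data set only the isolated point has at least 95% of the other points farther than dmin = 1.0 away.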

12 Distance-Based Algorithms: Index-based algorithms. Use multidimensional indexing structures, such as R-trees or k-d trees, to search for the neighbors of each object o.

13 Distance-Based Algorithms. Find the neighbors of object o within a radius dmin. Let M be the maximum number of objects within the dmin-neighborhood of an outlier. Once M + 1 neighbors of object o are found, o is not an outlier. The worst-case complexity is $O(n^2 k)$, where n is the number of objects and k is the dimensionality. This complexity accounts for search time only; building the index can itself be computationally very expensive.
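The early-termination rule (stop as soon as M + 1 neighbors are found) can be illustrated without an index. The sketch below swaps the R-tree or k-d tree for a plain linear scan, so it shows only the stopping rule, not the indexed search; the function name and sample data are hypothetical.

```python
import math

def is_outlier_early_stop(i, points, dmin, M):
    """Scan for neighbors of points[i] within dmin and stop as soon as
    M + 1 neighbors are found, since the point cannot be an outlier."""
    neighbors = 0
    for j, x in enumerate(points):
        if j == i:
            continue
        if math.dist(points[i], x) <= dmin:
            neighbors += 1
            if neighbors > M:  # M + 1 neighbors found: not an outlier
                return False
    return True  # at most M neighbors within dmin

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(is_outlier_early_stop(4, points, dmin=1.0, M=1))  # isolated point
print(is_outlier_early_stop(0, points, dmin=1.0, M=1))  # clustered point
```

With a real spatial index, the inner loop would be replaced by a range query around `points[i]` that can be abandoned after M + 1 hits.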

14 Distance-Based Algorithms: Cell-based algorithms. The data space is partitioned into cells with a side length equal to $\frac{dmin}{2\sqrt{k}}$, where dmin is the radius around objects and k is the dimensionality. Each cell has two layers surrounding it. The first layer is one cell thick. The second layer is $\lceil 2\sqrt{k} - 1 \rceil$ cells thick (rounded up to the closest integer).
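The cell geometry can be sketched in a few lines of Python. The sketch reads the side length as dmin/(2√k) and the second-layer thickness as ⌈2√k − 1⌉ cells, which is how the cell-based algorithm is usually stated; the function names `cell_geometry` and `cell_index` are hypothetical, and the grid is assumed to start at the origin.

```python
import math

def cell_geometry(dmin, k):
    """Return (cell side length, second-layer thickness in cells)."""
    side = dmin / (2 * math.sqrt(k))
    second_layer = math.ceil(2 * math.sqrt(k) - 1)
    return side, second_layer

def cell_index(point, side):
    """Map a point to the integer grid coordinates of its cell."""
    return tuple(math.floor(c / side) for c in point)

side, second_layer = cell_geometry(dmin=1.0, k=2)
print(round(side, 4), second_layer)
print(cell_index((0.8, 0.1), side))
```

With this side length, any two objects in the same cell, or in a cell and its first layer, are at most dmin apart, which is what makes cell-level counting possible.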

15 Distance-Based Algorithms: Cell-based algorithms. Outliers are counted on a cell-by-cell rather than an object-by-object basis. For a given cell, the algorithm accumulates three counts: the number of objects in the cell (C), the number of objects in the cell and the first layer (C+1), and the number of objects in the cell and both layers (C+2). How are outliers determined from these counts?

16 Distance-Based Algorithms: Cell-based algorithms. Let M be the threshold used to detect outliers, that is, the maximum number of objects allowed in the dmin-neighborhood of an outlier. An object o in the cell can be an outlier only if C+1 ≤ M; otherwise all the objects in the cell are non-outliers. If C+2 ≤ M, all the objects in the cell are outliers. If C+2 > M, some objects in the cell may be outliers, so object-by-object processing is used: only objects that have at most M objects in their dmin-neighborhood are outliers. The dmin-neighborhood of an object consists of the object's cell, all of its first layer, and some of its second layer.
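The three-way decision for a cell can be written as a small function. This is a sketch under stated assumptions: the name `classify_cell` and its return labels are hypothetical, the second count is taken to cover the cell plus both layers, and an outlier is assumed to have at most M objects in its dmin-neighborhood (consistent with the rule that finding M + 1 neighbors rules an object out).

```python
def classify_cell(count_cell_l1, count_cell_l1_l2, M):
    """Classify a cell from its accumulated counts.

    count_cell_l1    -- objects in the cell plus its first layer
    count_cell_l1_l2 -- objects in the cell plus both layers
    M                -- maximum neighbors an outlier may have
    """
    if count_cell_l1 > M:
        # Every object already has more than M close neighbors.
        return "no outliers"
    if count_cell_l1_l2 <= M:
        # Even the widest candidate neighborhood is too sparse.
        return "all outliers"
    # Undecided: objects must be examined one by one.
    return "check objects"

print(classify_cell(5, 9, M=3))
print(classify_cell(2, 3, M=3))
print(classify_cell(2, 7, M=3))
```

Only cells in the third category require distance computations against objects in the second layer.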

17 Characteristics of Distance-Based Methods. The cell-based algorithm avoids $O(n^2)$ computational complexity: its complexity is $O(c^k + n)$, where c is a constant depending on the number of cells, k is the dimensionality, and n is the number of objects. It was developed for memory-resident data sets. Distance-based methods require the user to set both dmin and pct, and finding suitable settings for these parameters can involve much trial and error.

18 3.7.3 Density-Based Methods. Statistical and distance-based methods depend on the overall, global distribution of the data. Data are usually not uniformly distributed and can have regions of different density. [Figure: two clusters, C1 and C2, with outlying objects o1 and o2.]

19 Density-Based Methods: Local Outliers. An object is a local outlier if it is outlying relative to its local neighborhood, with respect to the density of that neighborhood. This approach does not treat being an outlier as a binary property; it assesses the degree to which an object is an outlier. The degree of outlierness is computed as the local outlier factor (LOF) of an object and depends on how isolated the object is with respect to its surrounding neighborhood. The approach detects both global and local outliers.

20 Density-Based Methods. To define the local outlier factor of an object, the following concepts should be introduced: k-distance, k-distance neighborhood, reachability distance, and local reachability density.

21 K-distance & K-distance neighborhood. The k-distance of an object p, denoted k-distance(p), is the maximal distance between p and its k nearest neighbors. How is k determined? The LOF method sets k to the parameter MinPts used in density-based clustering (e.g., MinPts = 4), giving the MinPts-distance. The k-distance neighborhood of an object p contains the MinPts-nearest neighbors of p; it is denoted $N_{k\text{-distance}(p)}(p)$ or $N_k(p)$, and also $N_{MinPts}(p)$.

22 Reachability distance. The reachability distance of an object p with respect to an object o (where o is within the MinPts-nearest neighbors of p) is denoted $reach\_dist_{MinPts}(p, o)$ and defined as $reach\_dist_{MinPts}(p, o) = \max\{MinPts\text{-}distance(o),\ d(p, o)\}$. If p is far away from o, the reachability distance between the two is simply their actual distance. If they are close, the actual distance is replaced by the MinPts-distance of o.
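The two cases of the reachability distance (far apart versus close together) are easy to see numerically. The helper below is a hypothetical illustration that takes the two ingredient distances as plain numbers; the sample values are made up.

```python
def reachability_distance(dist_p_o, k_distance_o):
    """reach_dist(p, o) = max(k-distance(o), d(p, o))."""
    return max(k_distance_o, dist_p_o)

# p is far from o: the actual distance dominates.
far = reachability_distance(dist_p_o=5.0, k_distance_o=1.5)
# p is close to o: the distance is replaced by the k-distance of o.
near = reachability_distance(dist_p_o=0.4, k_distance_o=1.5)
print(far, near)
```

Replacing small distances by the k-distance of o smooths out statistical fluctuations among objects that are all close to o.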

23 Local Outlier Factor (LOF). The local reachability density of p is the inverse of the average reachability distance based on the MinPts-nearest neighbors of p: $lrd_{MinPts}(p) = \frac{|N_{MinPts}(p)|}{\sum_{o \in N_{MinPts}(p)} reach\_dist_{MinPts}(p, o)}$. The local outlier factor (LOF) of p captures the degree to which we call p an outlier: $LOF_{MinPts}(p) = \frac{\sum_{o \in N_{MinPts}(p)} \frac{lrd_{MinPts}(o)}{lrd_{MinPts}(p)}}{|N_{MinPts}(p)|}$.
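Putting the definitions together, a brute-force LOF computation might look like the sketch below. It is only illustrative: all function names and the sample points are hypothetical, Euclidean distance is assumed, the neighborhood includes every object within the k-distance (so ties can yield more than k neighbors), and the points are assumed distinct so that no reachability sum is zero.

```python
import math

def k_distance(p, points, k):
    """Distance from points[p] to its k-th nearest neighbor."""
    dists = sorted(
        math.dist(points[p], points[j])
        for j in range(len(points)) if j != p
    )
    return dists[k - 1]

def neighborhood(p, points, k):
    """Indices of all objects within the k-distance of points[p]."""
    kd = k_distance(p, points, k)
    return [
        j for j in range(len(points))
        if j != p and math.dist(points[p], points[j]) <= kd
    ]

def reach_dist(p, o, points, k):
    """Reachability distance of p with respect to o."""
    return max(k_distance(o, points, k), math.dist(points[p], points[o]))

def lrd(p, points, k):
    """Local reachability density: inverse of the mean reachability distance."""
    nbrs = neighborhood(p, points, k)
    return len(nbrs) / sum(reach_dist(p, o, points, k) for o in nbrs)

def lof(p, points, k):
    """Local outlier factor: mean ratio of neighbor density to own density."""
    nbrs = neighborhood(p, points, k)
    own = lrd(p, points, k)
    return sum(lrd(o, points, k) / own for o in nbrs) / len(nbrs)

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 6.0)]
scores = [lof(i, points, k=2) for i in range(len(points))]
print([round(s, 2) for s in scores])
```

Objects deep inside a uniform cluster get a score close to 1, while the isolated point gets a score well above 1. This recomputes distances repeatedly and is meant for reading, not for large data sets.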

24 3.7.4 Deviation-Based Methods. Identify outliers by examining the main characteristics of the objects in a group; objects that deviate from this description are considered outliers. Hence the term deviation is used to refer to outliers. There are two main methods: the sequential exception technique and the OLAP data cube technique.

25 Summary of Chapter 3. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering can be used as a main task to gain insight into the data or as a preprocessing step for other data mining algorithms. Applications include market segmentation, pattern recognition, biological studies, spatial data analysis, Web document classification, etc.

26 Summary of Chapter 3. The quality of a clustering can be assessed based on the dissimilarity of objects. Many techniques have been developed: partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, methods for clustering high-dimensional data, and constraint-based methods.

27 Applications and Tools in Data Mining Summary

28 1. Financial Data Analysis. Banks and financial institutions offer a wide variety of banking services: checking and savings accounts for business or individual customers; credit, mortgage, and automobile loans; investment services (mutual funds); and insurance and stock investment services. Financial data is relatively complete, reliable, and of high quality. What can be done with this data?

29 1. Financial Data Analysis. Design of data warehouses for multidimensional data analysis and data mining: construct data warehouses (the data come from different sources) and perform multidimensional analysis, e.g., view revenue changes by month, by region, by sector, etc., along with statistical information such as the average, maximum, and minimum values. Other tasks include characterization and class comparison, and outlier analysis.

30 1. Financial Data Analysis. Loan payment prediction and customer credit policy analysis. Attribute selection and attribute relevance ranking may help identify important factors and eliminate irrelevant ones. Examples of factors related to the risk of loan payment: term of the loan, debt ratio, payment-to-income ratio, customer income level, education level, and residence region. The bank can adjust its decisions according to the subset of factors selected (using classification).

31 2. Retail Industry. Retailers collect huge amounts of data on sales, customer shopping history, goods transportation, consumption and service, etc. Many stores have web sites where customers can buy online, and some exist only online (e.g., Amazon). Data mining helps to identify customer buying behaviors, discover customer shopping patterns and trends, improve the quality of customer service, achieve better customer satisfaction, design more effective goods transportation, and reduce the cost of business.

32 2. Retail Industry. Design data warehouses and perform multidimensional analysis. Analyze the effectiveness of sales campaigns (advertisements, coupons, discounts, bonuses, etc.) by comparing transactions that contain the sale items during and after the campaign. Customer retention: analyze changes in customer behavior. Product recommendation: mine association rules and display associative information to promote sales.

33 3. Telecommunication Industry. There are many different ways of communicating: fax, cellular phone, Internet messenger, images, e-mail, computer and Web data transmission, etc. There is great demand for data mining to help understand the business involved, identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve the quality of service.

34 3. Telecommunication Industry. Multidimensional analysis over several attributes, such as calling time, duration, location of caller, location of callee, and type of call: compare data traffic, system workload, resource usage, user group behavior, and profit. Fraudulent pattern analysis: identify potentially fraudulent users, detect attempts to gain fraudulent entry to customer accounts, and discover unusual patterns (outlier analysis).

35 4. Many Other Applications. Biological data analysis: e.g., identification and analysis of the genomes of humans and other species. Web mining: e.g., explore the linkage between web pages to compute authority scores (PageRank algorithm). Intrusion detection: detect any action that threatens the integrity, confidentiality, or availability of a network resource.

36 How to Choose a Data Mining System (Tool)? Do data mining systems share the same well-defined operations and a standard query language? No. Many commercial data mining systems have little in common: they differ in functionality, in methodology, and in the data sets they handle. You need to choose carefully the data mining system that is appropriate for your task.

37 How to Choose a Data Mining System (Tool)? Data types: available systems handle formatted, record-based, relational-like data with numerical and nominal attributes. The data could be in the form of ASCII text, relational databases, or data warehouse data, so it is important to check which kinds of data the system you are choosing can handle. Operating system: a data mining system may run on only one operating system. The most popular operating systems that host data mining tools are UNIX/Linux and Microsoft Windows. Large industry data mining systems adopt a client-server architecture.

38 How to Choose a Data Mining System (Tool)? Data sources and data formats: some systems work only with ASCII text files, whereas many others work with databases. It is important that the data mining system supports ODBC (Open Database Connectivity) connections. Data mining functions and methodologies: some systems provide only one data mining function (e.g., classification), while other systems support many functions. For a given data mining function (e.g., classification), some systems support only one method, while others support many methods (k-nearest neighbor, naive Bayesian, etc.). A data mining system should provide default settings for non-experts.

39 How to Choose a Data Mining System (Tool)? Coupling data mining (DM) with database (DB) or data warehouse (DW) systems. No coupling: the DM system does not use any function of a DB/DW system; it fetches data from a particular source (a file), processes the data, and stores the results in a file. Loose coupling: the DM system uses some facilities of a DB/DW system; it fetches data from data repositories managed by the DB/DW and stores results in a file or in the DB/DW. Semi-tight coupling: efficient implementations of a few essential data mining primitives (sorting, indexing, histogram analysis) are provided by the DB/DW. Tight coupling: the DM system is smoothly integrated into the DB/DW, and data mining queries are optimized. Tight coupling is highly desirable because it facilitates implementation and provides high system performance.
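As a rough illustration of loose coupling, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the DB/DW system. The table names, the sample amounts, and the "more than ten times the median" flagging rule are all hypothetical; the point is only that the database stores and serves the data while the mining logic runs outside it.

```python
import sqlite3
import statistics

# An in-memory database stands in for the DB/DW system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO payments (amount) VALUES (?)",
    [(a,) for a in [20.0, 22.0, 19.0, 21.0, 500.0]],
)

# Loose coupling: fetch the data from the repository ...
rows = conn.execute("SELECT id, amount FROM payments").fetchall()

# ... run the mining step outside the database (hypothetical rule) ...
median = statistics.median(amount for _, amount in rows)
flagged = [(pid,) for pid, amount in rows if amount > 10 * median]

# ... and store the results back in the DB/DW.
conn.execute("CREATE TABLE flagged (payment_id INTEGER)")
conn.executemany("INSERT INTO flagged VALUES (?)", flagged)
conn.commit()

result = [row[0] for row in conn.execute("SELECT payment_id FROM flagged")]
print(result)
```

Under tight coupling, by contrast, the flagging step would be expressed as a query that the DB/DW itself plans and optimizes.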

40 How to Choose a Data Mining System (Tool)? Scalability: query execution time should increase no more than linearly with the number of dimensions. Visualization: a picture is worth a thousand words; the quality and flexibility of the visualization tools may strongly influence the usability, interpretability, and attractiveness of the system. Data mining query language and graphical user interface: a high-quality user interface is important. It is not common to have a query language in a DM system.

41 Examples of Commercial Data Mining Tools. Database system and graphics vendors: Intelligent Miner (IBM), Microsoft SQL Server 2005, MineSet (Purple Insight), Oracle Data Mining (ODM).

42 Examples of Commercial Data Mining Tools. Vendors of statistical analysis or data mining software: Clementine (SPSS), Enterprise Miner (SAS Institute), Insightful Miner (Insightful Inc.).

43 Examples of Commercial Data Mining Tools. Machine learning community: CART (Salford Systems), See5 and C5.0 (RuleQuest), Weka (open source, developed at the University of Waikato).

44 End of the Data Mining Course. Questions? Suggestions?

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

CS590D: Data Mining Chris Clifton

CS590D: Data Mining Chris Clifton CS590D: Data Mining Chris Clifton March 10, 2004 Data Mining Process Reminder: Midterm tonight, 19:00-20:30, CS G066. Open book/notes. Thanks to Laura Squier, SPSS for some of the material used How to

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques Data Mining: Concepts and Techniques Chapter 11 Applications and Trends in Data Mining SURESH BABU M ASST PROFESSOR VJIT 1 Applications and Trends in Data Mining Data mining applications Data mining system

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

Outlier Detection in Clustering

Outlier Detection in Clustering Outlier Detection in Clustering Svetlana Cherednichenko 24.01.2005 University of Joensuu Department of Computer Science Master s Thesis TABLE OF CONTENTS 1. INTRODUCTION...1 1.1. BASIC DEFINITIONS... 1

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

from Larson Text By Susan Miertschin

from Larson Text By Susan Miertschin Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.

More information

Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support

Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support Mining Analytics for Business Intelligence and Decision Support Chid Apte, PhD Manager, Abstraction Research Group IBM TJ Watson Research Center apte@us.ibm.com http://www.research.ibm.com/dar Overview

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Data Mining System, Functionalities and Applications: A Radical Review

Data Mining System, Functionalities and Applications: A Radical Review Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

More information

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms Data Mining Techniques forcrm Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the

More information

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III www.cognitro.com/training Predicitve DATA EMPOWERING DECISIONS Data Mining & Predicitve Training (DMPA) is a set of multi-level intensive courses and workshops developed by Cognitro team. it is designed

More information

Hexaware E-book on Predictive Analytics

Hexaware E-book on Predictive Analytics Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,

More information

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Data Mining. Vera Goebel. Department of Informatics, University of Oslo Data Mining Vera Goebel Department of Informatics, University of Oslo 2011 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

More information

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc. Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

DATA MINING AND WAREHOUSING CONCEPTS

DATA MINING AND WAREHOUSING CONCEPTS CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation

More information

2.1. Data Mining for Biomedical and DNA data analysis

2.1. Data Mining for Biomedical and DNA data analysis Applications of Data Mining Simmi Bagga Assistant Professor Sant Hira Dass Kanya Maha Vidyalaya, Kala Sanghian, Distt Kpt, India (Email: simmibagga12@gmail.com) Dr. G.N. Singh Department of Physics and

More information

Data Mining Introduction

Data Mining Introduction Data Mining Introduction Organization Lectures Mondays and Thursdays from 10:30 to 12:30 Lecturer: Mouna Kacimi Office hours: appointment by email Labs Thursdays from 14:00 to 16:00 Teaching Assistant:

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data
