Chapter 3: Cluster Analysis
- Anne Montgomery
- 8 years ago
1 Chapter 3: Cluster Analysis
- 3.1 Basic Concepts of Clustering
- 3.2 Partitioning Methods
- 3.3 Hierarchical Methods
- 3.4 Density-Based Methods
- 3.5 Model-Based Methods
- 3.6 Clustering High-Dimensional Data
- 3.7 Outlier Analysis
  - Definition
  - Statistical-Based Methods
  - Distance-Based Methods
  - Density-Based Local Methods
  - Deviation-Based Methods
2 3.7.1 Definition
- Outliers: data objects that do not comply with the general behavior or model of the data
- Outlier detection or analysis is referred to as outlier mining
- Outlier mining has various applications:
  - Fraud detection
  - Detecting unusual usage of telecommunication services
  - Identifying the spending behavior of customers with extremely low or extremely high incomes
  - Finding unusual responses to various medical treatments
  - Etc.
3 Outlier Mining
- Given a set of n data objects and k, the expected number of outliers, find the top k objects that are considerably dissimilar, exceptional, or inconsistent with respect to the remaining data
- The outlier mining problem can be seen as two sub-problems:
  1) Define what data can be considered inconsistent in a given data set
  2) Find an efficient method to mine the outliers so defined
- Data visualization methods are weak at detecting outliers in data with many categorical attributes or in data of high dimensionality
- This motivates investigating computer-based techniques to detect outliers
4 3.7.2 Statistical Distribution-Based Methods
- Assume a distribution model for the given data set (e.g., Normal)
- Identify outliers with respect to the model using a discordancy test
- How does it work? Examine two hypotheses:
  - A working hypothesis
  - An alternative hypothesis
- A working hypothesis H is a statement that the entire data set of n objects comes from an initial distribution model F, that is: $H: o_i \in F$, where $i = 1, 2, \ldots, n$
- The hypothesis H is retained if there is no statistically significant evidence supporting its rejection
5 Discordancy Test
- Verifies whether an object $o_i$ is significantly large (or small) in relation to the distribution F
- Principle:
  - Choose a statistic T for discordancy testing
  - Consider the value $v_i$ of the statistic for an object $o_i$
  - If the significance probability $SP(v_i)$ is sufficiently small, $o_i$ is discordant and the working hypothesis is rejected
  - An alternative hypothesis $\bar{H}$, which states that $o_i$ comes from another distribution model G, is adopted
- The result depends on which model F is chosen, because $o_i$ may be an outlier under one model and a perfectly valid value under another
6 Discordancy Test: Example
- Let $o_1, \ldots, o_n$ represent the data objects
- Compute the sample mean $\mu$ and the standard deviation $\sigma$
- If an object $o_i$ is suspected to be an outlier, compute the test statistic
  $T = \frac{o_i - \mu}{\sigma}$
- If T exceeds some critical value, then $o_i$ is an outlier
7 Discordancy Test: Example
- Consider the following ordered data: 3.84, 4.26, 4.53, 4.60, 5.28, 5.29, 5.74, 5.86
- Consider an additional sample P: 10 (it is suspected that this point might be an outlier)
- Compute $\mu$ and $\sigma$ over all n = 9 values, including the suspected point (the quoted figures are reproduced only when P is included): $\mu = 5.49$, $\sigma = 1.82$
- $T = \frac{10 - 5.49}{1.82} = 2.48$
- With n = 9 and significance level $\alpha = 0.05$, the critical value is 2.110; since T > 2.110, there is evidence that P is an outlier
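The arithmetic of this example can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value 2.110 is simply taken from the slide rather than derived. Note that the quoted σ = 1.82 and T = 2.48 are reproduced only when the suspected point is included in the sample.

```python
import statistics

# The eight ordered values from the example plus the suspected point P = 10.
data = [3.84, 4.26, 4.53, 4.60, 5.28, 5.29, 5.74, 5.86, 10.0]
suspect = 10.0

mu = statistics.mean(data)      # sample mean over all nine values
sigma = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)

# Test statistic for the suspected point.
T = (suspect - mu) / sigma

# Critical value quoted in the example for n = 9 and alpha = 0.05.
CRITICAL_VALUE = 2.110
is_outlier = T > CRITICAL_VALUE

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, T = {T:.2f}, outlier: {is_outlier}")
```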
8 Alternative Distributions
- Inherent Alternative Distribution
  - The working hypothesis that all objects come from distribution F is rejected
  - The alternative hypothesis assumes that all objects come from another distribution G: $\bar{H}: o_i \in G$, where $i = 1, 2, \ldots, n$
  - F and G may be different distributions, or the same distribution with different parameters
  - Distribution G must have the potential to produce outliers (a different mean or dispersion, or a longer tail)
9 Alternative Distributions
- Mixture Alternative Distribution
  - The discordant values are not outliers in the F population but contaminants from some other population G
  - The alternative hypothesis is $\bar{H}: o_i \in (1 - \lambda) F + \lambda G$, where $i = 1, 2, \ldots, n$
- Slippage Alternative Distribution
  - All objects (except a small number) are from the initial model F, with its given parameters
  - The remaining objects are from a modified version of F in which the parameters have been shifted
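To make the mixture alternative concrete, the sketch below draws a sample in which each value comes from G with probability λ and from F otherwise. The two normal distributions, their parameters, and the function name `sample_mixture` are hypothetical choices for illustration only; the slides do not fix F or G.

```python
import random

def sample_mixture(n, lam, draw_f, draw_g, rng):
    """Draw n values from (1 - lam) * F + lam * G and label each value's origin."""
    sample = []
    for _ in range(n):
        if rng.random() < lam:
            sample.append((draw_g(rng), "G"))  # contaminant from population G
        else:
            sample.append((draw_f(rng), "F"))  # regular value from population F
    return sample

# Hypothetical choices of F and G: two normal distributions with different means.
draw_f = lambda r: r.gauss(5.0, 1.0)
draw_g = lambda r: r.gauss(12.0, 1.0)

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = sample_mixture(1000, 0.05, draw_f, draw_g, rng)
contaminants = [value for value, source in sample if source == "G"]
print(len(contaminants), "of", len(sample), "values came from G")
```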
10 Characteristics of Statistical-Based Methods
- Tests are for single attributes, whereas many tasks need to find outliers in multidimensional space
- Statistical approaches require knowledge about the parameters of the data set
- Statistical methods do not guarantee that all outliers will be found in cases where:
  - No specific test was developed
  - The distribution cannot be adequately modeled with any standard distribution
11 3.7.2 Distance-Based Methods
- Generalize the test-based techniques
- Distance-based outliers are those objects that do not have enough neighbors
- Formally, define a DB(pct, dmin)-outlier: a distance-based outlier with parameters pct and dmin
  - An object o is a DB(pct, dmin)-outlier if at least a fraction pct of the objects lie at a distance greater than dmin from o
- Avoids the excessive computation related to fitting the observed data to some standard distribution and selecting discordancy tests
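The DB(pct, dmin) definition can be applied directly with a nested loop. The following is a minimal sketch, not one of the optimized algorithms discussed next; the function name `db_outliers` and the sample points are hypothetical, and the fraction is taken over the other n - 1 objects, which is an assumption about how the object itself is counted.

```python
import math

def db_outliers(points, pct, dmin):
    """Return the points that are DB(pct, dmin)-outliers (naive O(n^2) scan)."""
    outliers = []
    n = len(points)
    for i, o in enumerate(points):
        # Count the other objects that lie farther than dmin from o.
        far = sum(1 for j, q in enumerate(points)
                  if j != i and math.dist(o, q) > dmin)
        # Assumption: the fraction is measured over the other n - 1 objects.
        if far / (n - 1) >= pct:
            outliers.append(o)
    return outliers

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (5.0, 5.0)]
print(db_outliers(points, pct=0.95, dmin=1.0))
```

With these sample points, only the isolated point at (5.0, 5.0) has at least 95% of the other objects farther than dmin = 1.0 away.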
12 Distance-Based Algorithms
- Index-based algorithms
  - Use multidimensional indexing structures such as R-trees or k-d trees to search for the neighbors of each object o
13 Distance-Based Algorithms
- Find the neighbors of object o within a radius dmin
- M is the maximum number of objects within the dmin-neighborhood of an outlier
- Once M+1 neighbors of object o are found, o is not an outlier
- Worst-case complexity of $O(n^2 k)$
  - n: number of objects
  - k: dimensionality
- This complexity covers search time only; building the index can itself be computationally very expensive
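The early-termination rule (stop as soon as M+1 neighbors are found) can be sketched as follows. A plain linear scan stands in for the R-tree or k-d tree lookup here, so this illustrates only the stopping rule, not the index itself; the function name `is_db_outlier` and the sample points are hypothetical.

```python
import math

def is_db_outlier(o_index, points, dmin, M):
    """Scan for neighbors within dmin and stop once M + 1 of them are found."""
    o = points[o_index]
    neighbors = 0
    for j, q in enumerate(points):
        if j == o_index:
            continue  # an object is not its own neighbor
        if math.dist(o, q) <= dmin:
            neighbors += 1
            if neighbors > M:  # M + 1 neighbors found: o cannot be an outlier
                return False
    return True

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (5.0, 5.0)]
for i, p in enumerate(points):
    print(p, is_db_outlier(i, points, dmin=1.0, M=1))
```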
14 Distance-Based Algorithms
- Cell-based algorithms
  - The data space is partitioned into cells with a side length equal to $\frac{dmin}{2\sqrt{k}}$
    - dmin: radius around objects
    - k: dimensionality
  - Each cell has two layers surrounding it
    - The first layer is 1 cell thick
    - The second layer is $\lceil 2\sqrt{k} - 1 \rceil$ cells thick, that is, $2\sqrt{k} - 1$ rounded up to the closest integer
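The cell geometry can be computed in a few lines. This sketch assumes the side length is dmin / (2√k) and the second-layer thickness is ⌈2√k − 1⌉ cells, which is how the cell-based algorithm summarized on this slide is usually stated; the function names `cell_geometry` and `cell_of` are hypothetical.

```python
import math

def cell_geometry(dmin, k):
    """Return (cell side length, first-layer thickness, second-layer thickness)."""
    side = dmin / (2 * math.sqrt(k))
    layer1 = 1                                # first layer is one cell thick
    layer2 = math.ceil(2 * math.sqrt(k) - 1)  # rounded up to the closest integer
    return side, layer1, layer2

def cell_of(point, side):
    """Map a point to the integer coordinates of the cell that contains it."""
    return tuple(math.floor(x / side) for x in point)

side, layer1, layer2 = cell_geometry(dmin=1.0, k=2)
print(f"side = {side:.4f}, layer 1 = {layer1} cell, layer 2 = {layer2} cells")
print(cell_of((0.9, 0.1), side))
```

In two dimensions the second layer comes out two cells thick, so the cell plus both layers spans a 7 x 7 block of cells.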
15 Distance-Based Algorithms
- Cell-based algorithms
  - Count outliers on a cell-by-cell rather than an object-by-object basis
  - For a given cell, the algorithm accumulates three counts:
    - The number of objects in the cell (C)
    - The number of objects in the cell and the first layer together (C+1)
    - The number of objects in the cell and both layers together (C+2)
  - How to determine outliers with these counts?
16 Distance-Based Algorithms
- Cell-based algorithms
  - Let M be the threshold used to detect outliers (the maximum number of objects within the dmin-neighborhood of an outlier)
  - An object in the cell can be an outlier only if C+1 ≤ M; otherwise, all the objects in the cell are non-outliers
  - If C+2 ≤ M, all the objects in the cell are outliers
  - If C+2 > M, some objects in the cell may be outliers; do object-by-object processing to detect them
    - Only objects that have no more than M objects in their dmin-neighborhood are outliers
    - The dmin-neighborhood of an object consists of the object's cell, all of its first layer, and some of its second layer
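The three-way decision made from the cell counts can be written as a small function. This is a sketch under one stated assumption: the boundary is taken as "at most M", which follows from the earlier rule that an object with M+1 neighbors is not an outlier; treat the exact boundary as an assumption. The function name and the returned labels are hypothetical.

```python
def classify_cell(count_c1, count_c2, M):
    """Decide the fate of all objects in a cell from its layer counts.

    count_c1: objects in the cell plus its first layer (C+1)
    count_c2: objects in the cell plus both layers (C+2)
    M: maximum number of objects allowed in the dmin-neighborhood of an outlier
    """
    if count_c1 > M:
        # Every object already has more than M close neighbors.
        return "no outliers"
    if count_c2 <= M:
        # Even the widest possible neighborhood holds at most M objects.
        return "all outliers"
    # Undecided at the cell level: fall back to object-by-object checks.
    return "check objects individually"

print(classify_cell(count_c1=12, count_c2=30, M=5))
print(classify_cell(count_c1=2, count_c2=4, M=5))
print(classify_cell(count_c1=3, count_c2=9, M=5))
```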
17 Characteristics of Distance-Based Methods
- The cell-based algorithm avoids $O(n^2)$ computational complexity
  - Its complexity is $O(c^k + n)$
    - c: a constant depending on the number of cells
    - k: the dimensionality
    - n: the number of objects
  - Developed for memory-resident data sets
- Requires the user to set both dmin and pct
- Finding suitable settings for these parameters can involve much trial and error
18 3.7.3 Density-Based Methods
- Statistical and distance-based methods depend on the overall, global distribution of the data
- Data are usually not uniformly distributed
- Data can have different density distributions
[Figure: clusters C1 and C2, with objects o1 and o2]
19 Density-Based Methods
- Define local outliers
  - An object is a local outlier if it is outlying relative to its local neighborhood (with respect to the density of the neighborhood)
- Being an outlier is not treated as a binary property
  - Assess the degree to which an object is an outlier
  - The degree of outlierness is computed as the Local Outlier Factor (LOF) of an object
  - The degree depends on how isolated the object is with respect to the surrounding neighborhood
- Detects both global and local outliers
20 Density-Based Methods
- To define the local outlier factor of an object, the following concepts should be introduced:
  - k-distance
  - k-distance neighborhood
  - Reachability distance
  - Local reachability density
21 k-distance & k-distance neighborhood
- The k-distance of an object p is the maximal distance between p and its k nearest neighbors
  - Denoted k-distance(p)
- How is k determined?
  - The LOF method sets k to the parameter MinPts used in density-based clustering (e.g., MinPts = 4), giving the MinPts-distance
- The k-distance neighborhood of an object p contains the MinPts-nearest neighbors of p
  - Denoted $N_{k\text{-distance}}(p)$ or $N_k(p)$, also $N_{MinPts}(p)$
22 Reachability distance
- The reachability distance of an object p with respect to an object o (where o is within the MinPts-nearest neighbors of p) is denoted $reach\_dist_{MinPts}(p, o)$
- $reach\_dist_{MinPts}(p, o) = \max\{MinPts\text{-}distance(o),\ d(p, o)\}$
- If p is far away from o, the reachability distance between the two is simply their actual distance
- If they are close, the actual distance is replaced by the MinPts-distance of o
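The two cases of the reachability distance (far apart versus close together) can be seen in a short sketch. It assumes Euclidean distance and distinct points, and the function names `k_distance` and `reach_dist` are hypothetical helpers written for this illustration.

```python
import math

def k_distance(o, points, k):
    """Distance from o to its k-th nearest neighbor (o itself is excluded)."""
    dists = sorted(math.dist(o, q) for q in points if q != o)
    return dists[k - 1]

def reach_dist(p, o, points, k):
    """Reachability distance of p with respect to o: max{k-distance(o), d(p, o)}."""
    return max(k_distance(o, points, k), math.dist(p, o))

points = [(0.0, 0.0), (0.1, 0.0), (2.0, 0.0), (3.0, 0.0)]
o = (0.0, 0.0)

# p close to o: the actual distance 0.1 is replaced by the 2-distance of o.
print(reach_dist((0.1, 0.0), o, points, k=2))
# p far from o: the reachability distance is simply the actual distance.
print(reach_dist((3.0, 0.0), o, points, k=2))
```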
23 Local Outlier Factor (LOF)
- The local reachability density of p is the inverse of the average reachability distance based on the MinPts-nearest neighbors of p:
  $lrd_{MinPts}(p) = \frac{|N_{MinPts}(p)|}{\sum_{o \in N_{MinPts}(p)} reach\_dist_{MinPts}(p, o)}$
- The local outlier factor (LOF) of p captures the degree to which we call p an outlier:
  $LOF_{MinPts}(p) = \frac{\sum_{o \in N_{MinPts}(p)} \frac{lrd_{MinPts}(o)}{lrd_{MinPts}(p)}}{|N_{MinPts}(p)|}$
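Putting the definitions together, a small brute-force LOF computation might look like the sketch below. It is a simplification made for illustration: the neighborhood is taken to be exactly the k nearest points (ties at the k-distance are not all included), distances are Euclidean, points are assumed distinct, and all function names are hypothetical.

```python
import math

def knn(p, points, k):
    """The k nearest neighbors of p (p itself excluded; ties broken by input order)."""
    others = [q for q in points if q != p]
    return sorted(others, key=lambda q: math.dist(p, q))[:k]

def k_distance(p, points, k):
    """Distance from p to its k-th nearest neighbor."""
    return math.dist(p, knn(p, points, k)[-1])

def lrd(p, points, k):
    """Local reachability density: inverse of the average reachability distance."""
    neighbors = knn(p, points, k)
    total = sum(max(k_distance(o, points, k), math.dist(p, o)) for o in neighbors)
    return len(neighbors) / total

def lof(p, points, k):
    """Local outlier factor: average ratio of the neighbors' lrd to the lrd of p."""
    neighbors = knn(p, points, k)
    return sum(lrd(o, points, k) for o in neighbors) / (len(neighbors) * lrd(p, points, k))

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
for p in points:
    print(p, round(lof(p, points, k=2), 2))
```

Points inside the tight square get an LOF close to 1, while the isolated point gets a much larger value, which matches the reading of LOF as a degree of outlierness rather than a yes/no label.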
24 3.7.4 Deviation-Based Methods
- Identify outliers by examining the main characteristics of objects in a group
- Objects that deviate from this description are considered outliers
- The term "deviation" is used to refer to outliers
- Two main methods:
  - Sequential Exception Technique
  - OLAP Data Cube Technique
25 Summary of Chapter 3
- A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters
- Clustering can be used as
  - a main task to gain insights about the data
  - a preprocessing step for other data mining algorithms
- Several applications: market segmentation, pattern recognition, biological studies, spatial data analysis, Web document classification, etc.
26 Summary of Chapter 3
- The quality of clustering can be assessed based on the dissimilarity of objects
- Many techniques have been developed:
  - Partitioning methods
  - Hierarchical methods
  - Density-based methods
  - Grid-based methods
  - Model-based methods
  - Clustering high-dimensional data
  - Constraint-based methods
27 Applications and Tools in Data Mining Summary
28 1. Financial Data Analysis
- Banks and financial institutions offer a wide variety of banking services
  - Checking and savings accounts for business or individual customers
  - Credit, such as business, mortgage, and automobile loans
  - Investment services (mutual funds)
  - Insurance services and stock investment services
- Financial data is relatively complete, reliable, and of high quality
- What to do with this data?
29 1. Financial Data Analysis
- Design of data warehouses for multidimensional data analysis and data mining
  - Construct data warehouses (the data come from different sources)
  - Multidimensional analysis: e.g., view the revenue changes by month, by region, by sector, etc., along with statistical information such as the average, the maximum, and the minimum values
  - Characterization and class comparison
  - Outlier analysis
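As a rough illustration of multidimensional analysis, the sketch below groups revenue by one dimension at a time and reports summary statistics per group. The fact rows, the dimension names, and the function `roll_up` are all hypothetical; a real data warehouse would do this with OLAP operations rather than in-memory Python.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical revenue facts: (month, region, sector, revenue).
facts = [
    ("2024-01", "North", "Retail", 120.0),
    ("2024-01", "South", "Retail", 80.0),
    ("2024-02", "North", "Mortgage", 200.0),
    ("2024-02", "North", "Retail", 100.0),
    ("2024-02", "South", "Mortgage", 150.0),
]
DIMENSIONS = {"month": 0, "region": 1, "sector": 2}

def roll_up(facts, dimension):
    """Group revenue by one dimension and summarize each group."""
    groups = defaultdict(list)
    for row in facts:
        groups[row[DIMENSIONS[dimension]]].append(row[3])
    return {key: {"total": sum(v), "mean": mean(v), "max": max(v), "min": min(v)}
            for key, v in groups.items()}

for dimension in DIMENSIONS:
    print(dimension, roll_up(facts, dimension))
```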
30 1. Financial Data Analysis
- Loan payment prediction and customer credit policy analysis
  - Attribute selection and attribute relevance ranking may help identify important factors and eliminate irrelevant ones
  - Examples of factors related to the risk of loan payment:
    - Term of the loan
    - Debt ratio
    - Payment-to-income ratio
    - Customer income level
    - Education level
    - Residence region
  - The bank can adjust its decisions according to the subset of factors selected (use classification)
31 2. Retail Industry
- Collects huge amounts of data on sales, customer shopping history, goods transportation, consumption and service, etc.
- Many stores have web sites where you can buy online; some exist only online (e.g., Amazon)
- Data mining helps to
  - Identify customer buying behaviors
  - Discover customer shopping patterns and trends
  - Improve the quality of customer service
  - Achieve better customer satisfaction
  - Design more effective goods transportation
  - Reduce the cost of business
32 2. Retail Industry
- Design data warehouses
- Multidimensional analysis
- Analysis of the effectiveness of sales campaigns
  - Advertisements, coupons, discounts, bonuses, etc.
  - Compare transactions that contain sales items during and after the campaign
- Customer retention
  - Analyze the change in customers' behaviors
- Product recommendation
  - Mining association rules
  - Display associative information to promote sales
33 3. Telecommunication Industry
- Many different ways of communicating
  - Fax, cellular phone, Internet messenger, images, e-mail, computer and Web data transmission, etc.
- Great demand for data mining to help with
  - Understanding the business involved
  - Identifying telecommunication patterns
  - Catching fraudulent activities
  - Making better use of resources
  - Improving the quality of service
34 3. Telecommunication Industry
- Multidimensional analysis (several attributes)
  - Several features: calling time, duration, location of caller, location of callee, type of call, etc.
  - Compare data traffic, system workload, resource usage, user group behavior, and profit
- Fraudulent pattern analysis
  - Identify potentially fraudulent users
  - Detect attempts to gain fraudulent entry to customer accounts
  - Discover unusual patterns (outlier analysis)
35 4. Many Other Applications
- Biological data analysis
  - E.g., identification and analysis of the genomes of humans and other species
- Web mining
  - E.g., explore the linkage between web pages to compute authority scores (PageRank algorithm)
- Intrusion detection
  - Detect any action that threatens the integrity, confidentiality, or availability of a network resource
36 How to Choose a Data Mining System (Tool)?
- Do data mining systems share the same well-defined operations and a standard query language? No
- Many commercial data mining systems have little in common
  - Different functionalities
  - Different methodologies
  - Different data sets
- You need to carefully choose the data mining system that is appropriate for your task
37 How to Choose a Data Mining System (Tool)?
- Data types
  - Available systems handle formatted, record-based, relational-like data with numerical and nominal attributes
  - The data could be in the form of ASCII text, relational databases, or data warehouse data
  - It is important to check which kinds of data the system you are choosing can handle
- Operating system
  - A data mining system may run on only one operating system
  - The most popular operating systems that host data mining tools are UNIX/Linux and Microsoft Windows
  - Large industry data mining systems adopt a client-server architecture
38 How to Choose a Data Mining System (Tool)?
- Data sources
  - Data formats: some systems work only with ASCII text files, whereas many others work with databases
  - It is important that the data mining system supports ODBC (Open Database Connectivity) connections
- Data mining functions and methodologies
  - Some systems provide only one data mining function (e.g., classification); other systems support many functions
  - For a given data mining function (e.g., classification), some systems support only one method; other systems may support many methods (k-nearest neighbor, naive Bayesian, etc.)
  - Data mining systems should provide default settings for non-experts
39 How to Choose a Data Mining System (Tool)?
- Coupling data mining with database (data warehouse) systems
  - No coupling
    - A DM system will not use any function of a DB/DW system
    - Fetch data from a particular source (file)
    - Process the data and then store the results in a file
  - Loose coupling
    - A DM system uses some facilities of a DB/DW system
    - Fetch data from data repositories managed by a DB/DW system
    - Store the results in a file or in the DB/DW
  - Semi-tight coupling
    - Efficient implementations of a few essential data mining primitives (sorting, indexing, histogram analysis) are provided by the DB/DW system
  - Tight coupling
    - A DM system is smoothly integrated into the DB/DW system
    - Data mining queries are optimized
  - Tight coupling is highly desirable because it facilitates implementation and provides high system performance
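Loose coupling can be pictured as a three-step round trip: fetch from the database, mine outside it, store the result back. The sketch below uses an in-memory SQLite database from the Python standard library as a stand-in for the DB/DW system; the table names, the sample rows, and the toy flagging rule (more than 1.5 standard deviations above the mean) are all hypothetical.

```python
import sqlite3
from statistics import mean, stdev

# An in-memory SQLite database stands in for the DB/DW system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("a", 10.0), ("b", 12.0), ("c", 11.0), ("d", 9.0), ("e", 95.0)],
)

# 1) Fetch data from the repository managed by the database.
rows = conn.execute("SELECT customer, amount FROM payments").fetchall()

# 2) Mine outside the database (toy rule: flag unusually large amounts).
amounts = [amount for _, amount in rows]
mu, sigma = mean(amounts), stdev(amounts)
flagged = [(c, a) for c, a in rows if (a - mu) / sigma > 1.5]

# 3) Store the results back in the database.
conn.execute("CREATE TABLE flagged_payments (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO flagged_payments VALUES (?, ?)", flagged)
conn.commit()

print(conn.execute("SELECT * FROM flagged_payments").fetchall())
```

The database is used only for storage and retrieval here; none of the mining work is pushed into it, which is what separates loose coupling from the semi-tight and tight variants.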
40 How to Choose a Data Mining System (Tool)?
- Scalability
  - Query execution time should increase linearly with the number of dimensions
- Visualization
  - "A picture is worth a thousand words"
  - The quality and flexibility of visualization tools may strongly influence the usability, interpretability, and attractiveness of the system
- Data mining query language and graphical user interface
  - High-quality user interface
  - It is not common to have a query language in a DM system
41 Examples of Commercial Data Mining Tools
- Database system and graphics vendors
  - Intelligent Miner (IBM)
  - Microsoft SQL Server 2005
  - MineSet (Purple Insight)
  - Oracle Data Mining (ODM)
42 Examples of Commercial Data Mining Tools
- Vendors of statistical analysis or data mining software
  - Clementine (SPSS)
  - Enterprise Miner (SAS Institute)
  - Insightful Miner (Insightful Inc.)
43 Examples of Commercial Data Mining Tools
- Machine learning community
  - CART (Salford Systems)
  - See5 and C5.0 (RuleQuest)
  - Weka, developed at the University of Waikato (open source)
44 End of The Data Mining Course Questions? Suggestions?
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationDecision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010
Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product
More informationA STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
More informationWelcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA
Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/
More informationSubject Description Form
Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives
More informationApplications and Trends in Data Mining
ORIENTAL JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY An International Open Free Access, Peer Reviewed Research Journal Published By: Oriental Scientific Publishing Co., India. www.computerscijournal.org ISSN:
More informationDATA MINING ALPHA MINER
DATA MINING ALPHA MINER AlphaMiner is developed by the E-Business Technology Institute (ETI) of the University of Hong Kong under the support from the Innovation and Technology Fund (ITF) of the Government
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationTwo-Phase Data Warehouse Optimized for Data Mining
Two-Phase Data Warehouse Optimized for Data Mining Balázs Rácz András Lukács Csaba István Sidló András A. Benczúr Data Mining and Web Search Research Group Computer and Automation Research Institute Hungarian
More informationData Mining Techniques
15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses
More informationHow Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK
How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information
More informationGrid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
More informationOUTLIER ANALYSIS. Data Mining 1
OUTLIER ANALYSIS Data Mining 1 What Are Outliers? Outlier: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism Ex.: Unusual credit card purchase,
More information1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining
1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining techniques are most likely to be successful, and Identify
More informationKnowledge Discovery in Data with FIT-Miner
Knowledge Discovery in Data with FIT-Miner Michal Šebek, Martin Hlosta and Jaroslav Zendulka Faculty of Information Technology, Brno University of Technology, Božetěchova 2, Brno {isebek,ihlosta,zendulka}@fit.vutbr.cz
More informationChapter ML:XI. XI. Cluster Analysis
Chapter ML:XI XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster
More informationData Mining and Marketing Intelligence
Data Mining and Marketing Intelligence Alberto Saccardi 1. Data Mining: a Simple Neologism or an Efficient Approach for the Marketing Intelligence? The streamlining of a marketing campaign, the creation
More informationSpecific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
More informationClustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
More informationANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
More informationOLAP Theory-English version
OLAP Theory-English version On-Line Analytical processing (Business Intelligence) [Ing.J.Skorkovský,CSc.] Department of corporate economy Agenda The Market Why OLAP (On-Line-Analytic-Processing Introduction
More informationData Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved.
Data Mining with SAS Mathias Lanner mathias.lanner@swe.sas.com Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Data mining Introduction Data mining applications Data mining techniques SEMMA
More informationDistance Learning and Examining Systems
Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationDATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011
DATA MINING CONCEPTS AND TECHNIQUES Marek Maurizio E-commerce, winter 2011 INTRODUCTION Overview of data mining Emphasis is placed on basic data mining concepts Techniques for uncovering interesting data
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationData quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
More informationSunnie Chung. Cleveland State University
Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:
More informationMS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
More informationChapter 6 - Enhancing Business Intelligence Using Information Systems
Chapter 6 - Enhancing Business Intelligence Using Information Systems Managers need high-quality and timely information to support decision making Copyright 2014 Pearson Education, Inc. 1 Chapter 6 Learning
More informationLVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico licinia.monteiro@tagus.ist.utl.pt I. Executive Summary In this Resume we describe a new functionality implemented
More informationSome vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.
Bonus Chapter Ten Major Predictive Analytics Vendors In This Chapter Angoss FICO IBM RapidMiner Revolution Analytics Salford Systems SAP SAS StatSoft, Inc. TIBCO This chapter highlights ten of the major
More informationHow To Improve Your Profit With Optimized Prediction
Higher Business ROI with Optimized Prediction Yottamine s Unique and Powerful Solution Forward thinking businesses are starting to use predictive analytics to predict which future business events will
More informationProduct recommendations and promotions (couponing and discounts) Cross-sell and Upsell strategies
WHITEPAPER Today, leading companies are looking to improve business performance via faster, better decision making by applying advanced predictive modeling to their vast and growing volumes of data. Business
More informationDATA MINING - SELECTED TOPICS
DATA MINING - SELECTED TOPICS Peter Brezany Institute for Software Science University of Vienna E-mail : brezany@par.univie.ac.at 1 MINING SPATIAL DATABASES 2 Spatial Database Systems SDBSs offer spatial
More informationCustomer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
More informationOUTLIER ANALYSIS. Authored by CHARU C. AGGARWAL IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
OUTLIER ANALYSIS OUTLIER ANALYSIS Authored by CHARU C. AGGARWAL IBM T. J. Watson Research Center, Yorktown Heights, NY, USA Kluwer Academic Publishers Boston/Dordrecht/London Contents Preface Acknowledgments
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More information