Data Clustering for Forecasting
1 Data Clustering for Forecasting. James B. Orlin (MIT Sloan School and OR Center); Mahesh Kumar (MIT OR Center); Nitin Patel (Visiting Professor); Jonathan Woo (ProfitLogic Inc.)
2 Overview of Talk: overview of clustering; error-based clustering; use of clustering in forecasting. But first, a few words from Scott Adams.
6 What is clustering? Clustering is the process of partitioning a set of data or objects into clusters with the following properties. Homogeneity within clusters: data that belong to the same cluster should be as similar as possible. Heterogeneity between clusters: data that belong to different clusters should be as different as possible.
7 Overview of this talk: (1) provide a somewhat personal view of the significance of clustering in life, and why it has not met its promise; (2) provide our technique for incorporating uncertainty about data into clustering, so as to reduce uncertainty in forecasting.
8 Iris Data (Fisher, 1936) [Figure: iris data plotted on axes can1 and can2, with species legend: Setosa, Versicolor, Virginica]
9 Cluster the iris data. This is a 2-dimensional projection of 4-dimensional data (sepal length and width, petal length and width). It is not clear if there are 2, 3 or 4 clusters. There are 3 clusters. Clusters are usually chosen to minimize some metric (e.g., the sum of squared distances from the center of the cluster).
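The metric mentioned on this slide, the sum of squared distances from each cluster's center, can be sketched in a few lines. This is only an illustration: the function names and the four 2-D points are hypothetical and are not the iris data.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def within_cluster_ss(clusters):
    """Sum, over all clusters, of squared Euclidean distances to the centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum((p[d] - c[d]) ** 2 for p in points for d in range(len(c)))
    return total


# Two ways of splitting the same four hypothetical points into two clusters.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(within_cluster_ss(tight))  # the grouping a clustering method would prefer
print(within_cluster_ss(mixed))  # a much larger value for the worse grouping
```

A method that minimizes this metric would choose the first grouping, since its total is far smaller.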
10 Iris Data [Figure: iris data on axes can1 and can2, with species legend: Setosa, Versicolor, Virginica]
11 Iris Data, using ellipses [Figure: iris data on axes can1 and can2, with species legend: Setosa, Versicolor, Virginica]
12 Why is clustering important: a personal perspective. Two very natural aspects of intelligence: grouping (clustering) and categorizing. It's an organizing principle of our minds and of our lives. Just a few examples: we cluster life into work life and family life; we cluster our lives by our roles (father, mother, sister, brother, teacher, manager, researcher, analyst, etc.); we cluster our work life in various ways, perhaps organized by projects, or by who we report to, or by who reports to us; we even cluster the talks we attend, perhaps organized by quality, or what we learned, or where it was.
13 More on Clustering in Life. More clustering examples. Going shopping: products are clustered in the store (useful for locating things). As a professor, I need to cluster students into letter grades: what really is the difference between a B+ and an A-? (useful in evaluations). When we figure out what to do, we often prioritize by clustering things (important vs. unimportant). We cluster people along multiple dimensions based on appearance, intelligence, character, religion, sexual orientation, place of origin, etc. Conclusion: humans cluster and categorize by nature. It is part of our intelligence.
14 Fields that have used clustering: marketing (market segmentation, catalogues); chemistry (the periodic table is a great example); finance (making sense of stock transactions); medicine (clustering patients); data mining (what can we do with transactional data, such as click stream data?); bioinformatics (how can we make sense of proteins?); data compression and aggregation (can we cluster massive data sets into smaller data sets for subsequent analysis?); plus much more.
15 Has clustering been successful in data mining? Initial hope: clustering would find many interesting patterns and surprising relationships. This hope has arguably not been met, at least not nearly enough; perhaps it requires too much intelligence; perhaps we can do better in the future. Nevertheless, clustering has been successful in using computers for things that humans are quite bad at: dealing with massive amounts of data, and effectively using knowledge of uncertainty.
16 An issue in clustering: the effect of scale. Background: an initial motivation for our work in clustering (as sponsored by the e-business Center) is to eliminate the effect of scale in clustering.
17 A Chart of 6 Points [Figure: Clustering 6 points]
18 Two Clusters of the 6 Points [Figure: Clustering 6 points]
19 We added two points and adjusted the scale [Figure: Clustering 8 points]
20 3 clusters of the 8 points [Figure: Clustering 8 points]. The 6 points on the left are clustered differently.
21 Scale Invariance. A clustering approach is called scale invariant if it produces the same solution regardless of the scales used. The approach developed next is scale invariant.
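To see why scale matters for ordinary distance-based clustering, here is a minimal sketch in which changing the units of one axis changes which two points are closest. The three points and the helper names are hypothetical; plain Euclidean distance is used only as a stand-in for a standard, non-invariant method.

```python
import math
from itertools import combinations


def closest_pair(points):
    """Return the pair of indices whose points are nearest in Euclidean distance."""
    return min(
        combinations(range(len(points)), 2),
        key=lambda ij: math.dist(points[ij[0]], points[ij[1]]),
    )


def rescale(points, factors):
    """Multiply each coordinate by the factor for its axis (a change of units)."""
    return [tuple(v * f for v, f in zip(p, factors)) for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]   # hypothetical data
stretched = rescale(points, (10.0, 1.0))         # first axis in units 10x smaller

print(closest_pair(points))     # points 0 and 1 would be merged first
print(closest_pair(stretched))  # after rescaling, points 0 and 2 are merged first
```

A scale-invariant method would return the same grouping for both versions of the data.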
22 Using clustering to reduce uncertainty. Try to find the average of the 3 populations. [Figure: iris data on axes can1 and can2, with species legend: Setosa, Versicolor, Virginica]
23 Using uncertainty to improve clustering: an example with 4 points in 1 dimension. The four points were obtained as sample means for four samples, two from one distribution and two from another. Objective: cluster into two groups of two each so as to maximize the probability that each cluster represents two samples from the same distribution.
24 Standard Approach. Consider the four data points, and cluster based on these values. [Figure: resulting cluster]
25 Incorporating Uncertainty. A common assumption in statistics: data comes from populations or distributions; from data, we can estimate the mean of the population and the standard deviation of the original population. Usual approach to clustering: keep track of the estimated mean; ignore the standard deviation (estimate of the error). Our approach: use both the estimated mean and the estimate of the error.
26 The two samples on the left were samples with 10,000 points each. The samples on the right were two samples with 100 points each. The radius corresponds to the standard deviation. Smaller circles ⇒ larger data sets ⇒ more certainty.
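The link between sample size and circle radius can be made concrete under one assumption: that the radius is the standard error of the sample mean, s/√n. The slide only says "standard deviation", and the value 0.3 below is hypothetical.

```python
import math


def standard_error(sample_std, n):
    """Standard error of a sample mean: sample standard deviation over sqrt(n)."""
    return sample_std / math.sqrt(n)


sample_std = 0.3  # hypothetical spread of the underlying data

se_large = standard_error(sample_std, 10_000)  # sample with 10,000 points
se_small = standard_error(sample_std, 100)     # sample with 100 points

print(se_large, se_small)
```

With 100 times as many points, the standard error shrinks by a factor of 10, which is why the large samples are drawn as much smaller circles.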
27 [Figure: the possible pairings of the four points, labeled probability = 4/19, probability = 8/19, probability = 7/19]
28 10,000 points with mean […]; 100 points with mean […]; 10,100 points with mean .501; true mean: .5. 10,000 points with mean […]; 100 points with mean […]; 10,100 points with mean .537; true mean: .53.
29 More on using uncertainty. We will use clustering to reduce uncertainty. We will use our knowledge of the uncertainty to improve the clustering. In the previous example, the correct cluster was the one with probability = 8/19. We had generated 20 sets of four points at random. The data was from the second set of four points.
30 Error-based clustering. 1. Start with n points in k-dimensional space (the next example has 15 points in 2 dimensions). Each point has an estimated mean as well as a standard deviation of the estimate. 2. Determine the likelihood of each pair of points coming from the same distribution. 3. Merge the two points with the greatest likelihood. 4. Return to Step 2.
31 Using Maximum Likelihood. Maximum likelihood method: suppose we have G clusters, $C_1, C_2, \ldots, C_G$. Out of the exponentially many possible clusterings, which clustering is most likely with respect to the observed data? Objective: $\max \sum_{k=1}^{G} \left( \sum_{i \in C_k} \frac{x_i}{\sigma_i} \right)^{t} \left( \sum_{i \in C_k} \frac{1}{\sigma_i} \right)^{-1} \left( \sum_{i \in C_k} \frac{x_i}{\sigma_i} \right)$. Computationally difficult!
32 Heuristic solution based on maximum likelihood. Greedy heuristic: start with n single-point clusters; combine the pair of clusters that leads to the maximum increase in the objective value (based on maximum likelihood); stop when we have G clusters. Similar to hierarchical clustering.
33 Error-based clustering. At each step, combine the pair of clusters $C_i$, $C_j$ with the smallest $(x_i - x_j)^{t} (\sigma_i + \sigma_j)^{-1} (x_i - x_j)$, where $x_i$, $x_j$ are the maximum likelihood estimates of the means of the clusters and $\sigma_i$, $\sigma_j$ are the standard errors in the $x$'s. We define this quantity as the distance between two clusters. Computationally much easier!
34 Error-based Clustering Algorithm. $\mathrm{distance}(C_i, C_j) = (x_i - x_j)^{t} (\sigma_i + \sigma_j)^{-1} (x_i - x_j)$. Start with n singleton clusters. At each step, combine the pair of clusters $C_i$, $C_j$ with the smallest distance. Stop when we have the desired number of clusters. It is a generalization of Ward's method.
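The greedy procedure on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the slides do not spell out: each error term is treated as a diagonal error variance (squared standard error per coordinate), and a merged cluster is represented by the inverse-variance weighted mean of its parts, which is the maximum likelihood estimate under Gaussian errors. All names and the four 1-D values are hypothetical.

```python
from itertools import combinations


def error_distance(a, b):
    """(x_i - x_j)^t (s_i + s_j)^-1 (x_i - x_j), assuming diagonal error variances."""
    (xa, va), (xb, vb) = a, b
    return sum((p - q) ** 2 / (u + w) for p, q, u, w in zip(xa, xb, va, vb))


def merge(a, b):
    """Assumed merge rule: inverse-variance weighted mean and its error variance."""
    (xa, va), (xb, vb) = a, b
    var = tuple(1.0 / (1.0 / u + 1.0 / w) for u, w in zip(va, vb))
    mean = tuple(s * (p / u + q / w) for s, p, q, u, w in zip(var, xa, xb, va, vb))
    return mean, var


def error_based_clustering(means, variances, n_clusters):
    """Greedily merge the closest pair until n_clusters remain.

    Returns a sorted list of sorted member-index lists.
    """
    clusters = [([i], (tuple(m), tuple(v)))
                for i, (m, v) in enumerate(zip(means, variances))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: error_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        members = sorted(clusters[i][0] + clusters[j][0])
        merged = (members, merge(clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(c[0] for c in clusters)


# Hypothetical 1-D sample means: points 0 and 1 are precise, points 2 and 3 are noisy.
means = [(0.500,), (0.530,), (0.520,), (0.590,)]
variances = [(9e-6,), (9e-6,), (9e-4,), (9e-4,)]

with_errors = error_based_clustering(means, variances, 2)
equal_errors = error_based_clustering(means, [(1.0,)] * 4, 2)
print(with_errors)   # the two precise points end up in different clusters
print(equal_errors)  # ignoring the errors groups them together
```

When the errors are ignored (all variances equal), the two precise estimates are grouped together and the noisy outlier is isolated. When the errors are used, the two precise estimates are kept apart because a gap of 0.03 is large relative to their small errors. Rescaling a coordinate multiplies both the squared difference and the variances by the same factor, so the distance, and hence the clustering, is unchanged.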
35 The mean is the dot. The error is given by the ellipse. A small ellipse means that the data is quite accurate.
36 Determine the two elements most likely to come from the same distribution. Merge them into a single element.
37 Merge them into a single element. Determine the two elements most likely to come from the same distribution.
38 Continue this process, reducing the number of clusters one at a time.
50 Here we went all the way to a single cluster. We could stop with 2 or 3 or more clusters. We can also evaluate different numbers of clusters at the end.
51 Rest of the Lecture. The use of clustering in forecasting was developed while Mahesh Kumar worked at ProfitLogic. Joint work: Mahesh Kumar, Nitin Patel, Jonathan Woo.
52 Motivation. Accurate sales forecasting is very important in the retail industry in order to make good decisions. [Diagram labels: Shipping, Allocation, Pricing; Manufacturer, Wholesaler, Retailer, Customer] Kumar et al. used clustering to help in accurate sales forecasting.
53 Forecasting Problem. Goal: forecast sales. Parameters that affect sales: price; when a product is introduced; promotions; inventory; base demand as a function of time of the year; random effects.
54 Seasonality. Definition: seasonality is the hypothesized underlying base demand of a group of similar merchandise as a function of time of the year. It is a vector of size 52, describing variations over the year. It is independent of external factors like changes in price, promotions, inventory, etc., and is modeled as a multiplicative factor. E.g., two portable CD players have essentially the same seasonality, but they may differ in price, promotions, inventory, etc.
55 Seasonality Examples (made-up data) [Figures: weekly sales for summer shoes; weekly sales for winter boots]
56 Objective: determine the seasonality of products. Difficulty: observation of a product's seasonality is complicated by other factors: when the product is introduced; sales and promotions; inventory. Solution methods: preprocess data to compensate for sales, promotions, and inventory effects; average over lots of very similar products to eliminate some of the uncertainty. Further clustering of products can eliminate more uncertainty.
57 Retail Merchandise Hierarchy. Chain: J-Mart. Department: Shoes. Class: Men's summer shoes. Item: Debok walkers. Sales data are available for items.
58 Modeling Seasonality. $\mathrm{Seas}_i = \{(x_{i1}, \sigma_{i1}), (x_{i2}, \sigma_{i2}), \ldots, (x_{i52}, \sigma_{i52})\} = (x_i, \sigma_i)$. Seasonality is modeled as a vector with 52 components. Assumptions: we assume the errors are Gaussian; we treat the estimates of the $\sigma$'s as if they are the correct values.
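One way such $(x_i, \sigma_i)$ pairs could be obtained for a class of items is sketched below. The talk does not give its estimation procedure, so this rests on assumptions: each item's weekly sales are divided by that item's average weekly sales, the per-week mean across items is taken as the estimate, and its standard error is taken as the error. The function name and the sales figures are hypothetical, and 4 periods are used instead of 52 to keep the example short.

```python
import math
import statistics


def seasonality_estimate(item_sales):
    """Estimate a class seasonality vector and its per-week standard errors.

    item_sales: one list of weekly sales per item, all of the same length,
    with at least two items. Returns (x, sigma).
    """
    normalized = []
    for sales in item_sales:
        avg = statistics.fmean(sales)
        normalized.append([s / avg for s in sales])  # multiplicative factor per week
    n = len(normalized)
    x, sigma = [], []
    for week in zip(*normalized):                    # same week across all items
        x.append(statistics.fmean(week))
        sigma.append(statistics.stdev(week) / math.sqrt(n))
    return x, sigma


# Three hypothetical items in one class, four periods each.
item_sales = [
    [10.0, 20.0, 30.0, 20.0],
    [22.0, 38.0, 60.0, 40.0],
    [30.0, 66.0, 84.0, 60.0],
]
x, sigma = seasonality_estimate(item_sales)
print(x)
print(sigma)
```

The sketch skips the preprocessing for price, promotions, and inventory effects described earlier; it only shows how averaging over similar items yields both an estimate and an error for each week.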
59 Illustration on simulated data. Kumar et al. generated data with 3 different seasonalities. They then combined similar products and produced estimates of seasonalities. Clustering produced much better final estimates.
60 Simulation Study. 3 different seasonalities were used to generate sales data for 300 items. All 300 items were divided into 12 classes, giving 12 estimates of seasonality coefficients along with associated errors. Clustering into three clusters was used to forecast the correct seasonalities.
61 Seasonalities [Figure]
62 Initial seasonality estimates [Figure]
63 Clustering. Cluster classes with similar seasonality to reduce errors. Example: men's winter shoes, men's winter coats. Standard clustering methods (hierarchical clustering, K-means clustering, Ward's method) do not incorporate information contained in the errors.
64 Further Clustering. They used K-means, hierarchical clustering, and Ward's technique. They also used error-based clustering.
65 K-means, hierarchical (avg), Ward's: Result [Figure]
66 Error-based Clustering Result [Figure]
67 Real Data Study. Data from the retail industry: 6 departments (books, sporting goods, greeting cards, videos, etc.) and 45 classes. Sales forecasts compared: without clustering, standard clustering, error-based clustering.
68 Forecast Result (an example) [Figure: sales by week; series: No Clustering, Standard Clustering, Error-based Clustering]
69 Result Statistics. $\text{Average Forecast Error} = \frac{|\text{ForecastSale} - \text{ActualSale}|}{\text{ActualSale}}$
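One plausible reading of this statistic is the absolute relative error averaged over periods, often called the mean absolute percentage error. The slide's formula is damaged, so the absolute value and the averaging over periods are assumptions, as are the function name and the numbers below.

```python
def average_forecast_error(forecast, actual):
    """Mean of |forecast - actual| / actual over all periods (actual must be nonzero)."""
    errors = [abs(f - a) / a for f, a in zip(forecast, actual)]
    return sum(errors) / len(errors)


# Hypothetical weekly forecasts and actual sales.
forecast = [110.0, 90.0, 50.0]
actual = [100.0, 100.0, 50.0]

print(average_forecast_error(forecast, actual))
```

A lower value means the forecast tracked actual sales more closely, which is how the three approaches on the previous slide could be compared.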
70 Summary and Conclusion. A new clustering method that incorporates information contained in errors. It has strong theoretical justification under appropriate assumptions. Computationally easy. Works well in practice.
71 Summary and Conclusion. Major point: if one is using clustering to reduce uncertainty, then it makes sense to use error-based clustering. Scale invariance. Error-based clustering has strong theoretical justification and works well in practice. The concept of using errors can be applied to many other applications where one has a reasonable estimate of the errors.
More informationCLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS
CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS Venkat Venkateswaran Department of Engineering and Science Rensselaer Polytechnic Institute 275 Windsor Street Hartford,
More informationA STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
More informationInventory Management & Optimization in Practice
Inventory Management & Optimization in Practice Lecture 16 ESD.260 Logistics Systems Fall 2006 Edgar E. Blanco, Ph.D. Research Associate MIT Center for Transportation & Logistics 1 Session goals The challenges
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationMarketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
More informationDistances between Clustering, Hierarchical Clustering
Distances between Clustering, Hierarchical Clustering 36-350, Data Mining 14 September 2009 Contents 1 Distances Between Partitions 1 2 Hierarchical clustering 2 2.1 Ward s method............................
More informationConnecting Segments for Visual Data Exploration and Interactive Mining of Decision Rules
Journal of Universal Computer Science, vol. 11, no. 11(2005), 1835-1848 submitted: 1/9/05, accepted: 1/10/05, appeared: 28/11/05 J.UCS Connecting Segments for Visual Data Exploration and Interactive Mining
More informationOperations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras
Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture - 36 Location Problems In this lecture, we continue the discussion
More informationBusiness Intelligence and Decision Support Systems
Chapter 12 Business Intelligence and Decision Support Systems Information Technology For Management 7 th Edition Turban & Volonino Based on lecture slides by L. Beaubien, Providence College John Wiley
More informationThe Economic Benefits of Multi-echelon Inventory Optimization
SOLUTION PERSPECTIVES: Leveraging Multi-echelon Replenishment to Maximize Return on Inventory Investment The Economic Benefits of Multi-echelon Inventory Optimization Lower working capital requirements,
More informationB490 Mining the Big Data. 2 Clustering
B490 Mining the Big Data 2 Clustering Qin Zhang 1-1 Motivations Group together similar documents/webpages/images/people/proteins/products One of the most important problems in machine learning, pattern
More informationBig Data and Scripting
Big Data and Scripting 1, 2, Big Data and Scripting - abstract/organization contents introduction to Big Data and involved techniques schedule 2 lectures (Mon 1:30 pm, M628 and Thu 10 am F420) 2 tutorials
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationSanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a
More informationSample of Best Practices
Sample of Best Practices For a Copy of the Complete Set Call Katral Consulting Group 954-349-1281 Section 1 Planning & Forecasting Retail Best Practice Katral Consulting Group 1 of 7 Last printed 2005-06-10
More informationMEASURES OF VARIATION
NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are
More informationThe Statistics of Income (SOI) Division of the
Brian G. Raub and William W. Chen, Internal Revenue Service The Statistics of Income (SOI) Division of the Internal Revenue Service (IRS) produces data using information reported on tax returns. These
More informationOverview. Background. Data Mining Analytics for Business Intelligence and Decision Support
Mining Analytics for Business Intelligence and Decision Support Chid Apte, PhD Manager, Abstraction Research Group IBM TJ Watson Research Center apte@us.ibm.com http://www.research.ibm.com/dar Overview
More informationVENDOR MANAGED INVENTORY
VENDOR MANAGED INVENTORY Martin Savelsbergh School of Industrial and Systems Engineering Georgia Institute of Technology Joint work with Ann Campbell, Anton Kleywegt, and Vijay Nori Distribution Systems:
More informationMonotonicity Hints. Abstract
Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology
More informationCOPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
More informationAspen Collaborative Demand Manager
A world-class enterprise solution for forecasting market demand Aspen Collaborative Demand Manager combines historical and real-time data to generate the most accurate forecasts and manage these forecasts
More informationSPSS Tutorial. AEB 37 / AE 802 Marketing Research Methods Week 7
SPSS Tutorial AEB 37 / AE 802 Marketing Research Methods Week 7 Cluster analysis Lecture / Tutorial outline Cluster analysis Example of cluster analysis Work on the assignment Cluster Analysis It is a
More informationOperations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology Madras
Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology Madras Lecture - 41 Value of Information In this lecture, we look at the Value
More informationMISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS)
MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) R.KAVITHA KUMAR Department of Computer Science and Engineering Pondicherry Engineering College, Pudhucherry, India DR. R.M.CHADRASEKAR Professor,
More information