Identification of noisy variables for nonmetric and symbolic data in cluster analysis

Marek Walesiak and Andrzej Dudek

Wroclaw University of Economics, Department of Econometrics and Computer Science, Nowowiejska 3, Jelenia Gora, Poland

Abstract. An extended version of the HINoV method for the identification of noisy variables (Carmone et al [1999]) for nonmetric, mixed, and symbolic interval data is proposed in this paper. The proposed modifications are evaluated on simulated data from a variety of models. The models contain a known structure of clusters, and different numbers of noisy (irrelevant) variables are added to obscure the underlying structure to be recovered.

1 Introduction

Choosing variables is one of the most important steps in a cluster analysis. Variables used in applied clustering should be selected and weighted carefully. In a cluster analysis we should include only those variables that are believed to help discriminate the data (see Milligan [1996], p. 348). Two classes of approaches to choosing the variables for cluster analysis can facilitate cluster recovery in the data (see e.g. Gnanadesikan et al [1995]; Milligan [1996]):

- variable selection (selecting a subset of relevant variables),
- variable weighting (introducing the relative importance of the variables according to their weights).

Carmone et al [1999] discussed the literature on variable selection and weighting (the characteristics of six methods and their limitations) and proposed the HINoV method for the identification of noisy variables, in the area of variable selection, to remedy problems with these methods. They demonstrated its robustness with metric data and the k-means algorithm. The authors suggest further studies of the HINoV method with different types of data and other clustering algorithms. In this paper we propose an extended version of the HINoV method for nonmetric, mixed, and symbolic interval data. The proposed modifications are evaluated for eight clustering algorithms on simulated data from a variety of models.

2 Characteristics of the HINoV method and its modifications

The algorithm of the Heuristic Identification of Noisy Variables (HINoV) method for metric data (see Carmone et al [1999]) is as follows:

1. A data matrix [x_ij] containing n objects and m normalized variables measured on a metric scale (i = 1, ..., n; j = 1, ..., m) is the starting point.
2. Cluster, via the k-means method, the observed data separately for each j-th variable for a given number of clusters u. It is possible to use clustering methods based on a distance matrix (pam or any hierarchical agglomerative method: single, complete, average, mcquitty, median, centroid, Ward).
3. Calculate adjusted Rand indices R_jl (j, l = 1, ..., m; j ≠ l) for the partitions formed from all distinct pairs of the m variables. Because the adjusted Rand index is symmetric, only m(m-1)/2 values need to be calculated.
4. Construct the m × m adjusted Rand matrix (parim). Sum the rows (or columns) for each j-th variable (topri): R_j = Σ_{l=1}^{m} R_jl:

   Variable   parim                    topri
   M_1        R_12 ... R_1m            R_1
   M_2        R_21 ... R_2m            R_2
   ...        ...                      ...
   M_m        R_m1 R_m2 ...            R_m

5. Rank the topri values R_1, R_2, ..., R_m in decreasing order (stopri) and plot the scree diagram. The size of a topri value indicates the contribution of that variable to the cluster structure. A scree diagram identifies sharp changes in the topri values. Variables with relatively low topri values (the noisy variables) are identified and eliminated from further analysis (say h variables); a minimal R sketch of steps 2-5 is given below.
6. Run a cluster analysis (based on the same classification method) with the selected m - h variables.

The modification of the HINoV method for nonmetric data (where the number of objects is much greater than the number of categories) differs in steps 1, 2, and 6 (see Walesiak [2005]):

1. A data matrix [x_ij] containing n objects and m ordinal and/or nominal variables is the starting point.
2. For each j-th variable we receive natural clusters, where the number of clusters equals the number of categories for that variable (for instance five for a Likert scale or seven for a semantic differential scale).
6. Run a cluster analysis with one of the clustering methods based on a distance appropriate to nonmetric data (GDM2 for ordinal data, see Jajuga et al [2003]; the Sokal and Michener distance for nominal data) with the selected m - h variables.
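The following is a minimal R sketch of steps 2-5 for metric data; it is an illustration only, not the clusterSim implementation. The helper names (adjusted_rand, hinov_topri) are ours, the adjusted Rand index is computed directly from the contingency table following Hubert and Arabie [1985], and kmeans() is used for the single-variable partitions as in the metric version of the algorithm.

```r
# Adjusted Rand index of two partitions, computed from the contingency table
# (Hubert and Arabie [1985]).
adjusted_rand <- function(p1, p2) {
  tab <- table(p1, p2)
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))
  sum_rows  <- sum(choose(rowSums(tab), 2))
  sum_cols  <- sum(choose(colSums(tab), 2))
  expected  <- sum_rows * sum_cols / choose(n, 2)
  (sum_cells - expected) / ((sum_rows + sum_cols) / 2 - expected)
}

# Steps 2-5 of HINoV for metric data: cluster each variable separately,
# build the m x m adjusted Rand matrix (parim) and its row sums (topri).
hinov_topri <- function(x, u) {
  x <- scale(as.matrix(x))                  # step 1: normalized data matrix
  m <- ncol(x)
  parts <- lapply(seq_len(m),               # step 2: k-means on each variable
                  function(j) kmeans(x[, j], centers = u)$cluster)
  parim <- matrix(0, m, m)
  for (j in seq_len(m - 1)) {               # step 3: pairwise adjusted Rand
    for (l in (j + 1):m) {
      parim[j, l] <- parim[l, j] <- adjusted_rand(parts[[j]], parts[[l]])
    }
  }
  topri <- rowSums(parim)                   # step 4: row (column) sums
  list(parim = parim, topri = topri,
       stopri = order(topri, decreasing = TRUE))  # step 5: ranking
}
```

Variables at the end of the stopri ranking, below the sharp drop on the scree diagram of the sorted topri values, are the candidate noisy variables to be removed before the final clustering in step 6.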

The modification of the HINoV method for symbolic interval data differs in steps 1 and 2:

1. A symbolic data array containing n objects and m symbolic interval variables is the starting point.
2. Cluster the observed data with one of the clustering methods (pam, single, complete, average, mcquitty, median, centroid, Ward), based on a distance appropriate to symbolic interval data (e.g. the Hausdorff distance, see Billard and Diday [2006], p. 246, and the sketch below), separately for each j-th variable for a given number of clusters u.

The functions HINoV.Mod and HINoV.Symbolic of the clusterSim package for R handle, respectively, mixed (metric, nonmetric) data and symbolic interval data.
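As a reading aid for step 2, here is a minimal sketch (our own naming, not the clusterSim implementation) of a Hausdorff-type distance matrix for symbolic interval data: for a single interval variable, the Hausdorff distance between two intervals is the maximum of the absolute differences of their lower and upper bounds; aggregating the per-variable distances by summation is our own simple choice.

```r
# Hausdorff-type distance matrix for interval data (a sketch; the aggregation
# over variables by summation is an assumption, not taken from the paper).
# 'lower' and 'upper' are n x m matrices holding the interval bounds.
hausdorff_dist <- function(lower, upper, vars = seq_len(ncol(lower))) {
  n <- nrow(lower)
  d <- matrix(0, n, n)
  for (j in vars) {
    d_j <- pmax(abs(outer(lower[, j], lower[, j], "-")),
                abs(outer(upper[, j], upper[, j], "-")))
    d <- d + d_j
  }
  as.dist(d)
}
```

The resulting dist object can be passed to pam() or hclust(); with vars = j it yields the single-variable distance matrix needed in step 2, and with the HINoV-selected subset of variables it can be used for the final clustering in step 6.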

The proposed modifications of the HINoV method are evaluated on simulated data from a variety of models.

3 Simulation models

We generate data sets in eleven different scenarios. The models contain a known structure of clusters. In models 2-11 the noisy variables are simulated independently from the uniform distribution.

Model 1. No cluster structure. 200 observations are simulated from the uniform distribution over the unit hypercube in 10 dimensions (see Tibshirani et al [2001], p. 418).

Model 2. Two elongated clusters in 5 dimensions (3 noisy variables). Each cluster contains 50 observations. The observations in each of the two clusters are independent bivariate normal random variables with means (0, 0), (1, 5) and covariance matrix with σ_jj = 1 and σ_jl = 0.9.

Model 3. Three elongated clusters in 7 dimensions (5 noisy variables). The clusters are randomly chosen to contain 60, 30, and 30 observations, and the observations are independently drawn from the bivariate normal distribution with means (0, 0), (1.5, 7), (3, 14) and covariance matrix with σ_jj = 1 and σ_jl = 0.9.

Model 4. Three elongated clusters in 10 dimensions (7 noisy variables). The clusters are randomly chosen to contain 70, 35, and 35 observations, and the observations are independently drawn from the multivariate normal distribution with means (1.5, 6, 3), (3, 12, 6), (4.5, 18, 9) and covariance matrix with σ_jj = 1 (1 ≤ j ≤ 3), σ_12 = σ_13 = 0.9, and σ_23 = 0.9.

Model 5. Five clusters in 3 dimensions that are not well separated (1 noisy variable). Each cluster contains 25 observations. The observations are independently drawn from the bivariate normal distribution with means (5, 5), (-3, 3), (3, -3), (0, 0), (-5, -5) and covariance matrix with σ_jj = 1 and σ_jl = 0.9.

Model 6. Five clusters in 5 dimensions that are not well separated (2 noisy variables). Each cluster contains 30 observations. The observations are independently drawn from the multivariate normal distribution with means (5, 5, 5), (-3, 3, -3), (3, -3, 3), (0, 0, 0), (-5, -5, -5) and covariance matrix with σ_jj = 1 (1 ≤ j ≤ 3) and σ_jl = 0.9 (1 ≤ j ≠ l ≤ 3).

Model 7. Five clusters in 10 dimensions (8 noisy variables). The clusters are randomly chosen to contain 50, 20, 20, 20, and 20 observations, and the observations are independently drawn from the bivariate normal distribution with means (0, 0), (0, 10), (5, 5), (10, 0), (10, 10) and identity covariance matrix (σ_jj = 1, σ_jl = 0).

Model 8. Five clusters in 9 dimensions (6 noisy variables). Each cluster contains 30 observations. The observations are independently drawn from the multivariate normal distribution with means (0, 0, 0), (10, 10, 10), (-10, -10, -10), (10, -10, 10), (-10, 10, -10) and covariance matrix with σ_jj = 3 (1 ≤ j ≤ 3) and σ_jl = 2 (1 ≤ j ≠ l ≤ 3).

Model 9. Four clusters in 6 dimensions (4 noisy variables). The clusters are randomly chosen to contain 50, 50, 25, and 25 observations, and the observations are independently drawn from the bivariate normal distribution with means (-4, 5), (5, 14), (14, 5), (5, -4) and identity covariance matrix (σ_jj = 1, σ_jl = 0).

Model 10. Four clusters in 12 dimensions (9 noisy variables). Each cluster contains 30 observations. The observations are independently drawn from the multivariate normal distribution with means (-4, 5, -4), (5, 14, 5), (14, 5, 14), (5, -4, 5) and identity covariance matrix with σ_jj = 1 (1 ≤ j ≤ 3) and σ_jl = 0 (1 ≤ j ≠ l ≤ 3).

Model 11. Four clusters in 10 dimensions (9 noisy variables). Each cluster contains 35 observations. The observations on the first variable are independently drawn from univariate normal distributions with means 2, 4, 10, and 16, respectively, and variance σ_j^2 = 0.5 (1 ≤ j ≤ 4).

Ordinal data. The clusters in models 1-11 contain continuous data, and a discretization process is performed on each variable to obtain ordinal data. The number of categories k determines the width of each class interval: [max_i{x_ij} - min_i{x_ij}] / k. Independently for each variable, the class intervals receive the categories 1, ..., k and the actual value of the variable x_ij is replaced by these categories. In the simulation study k = 5 (for k = 7 we obtained similar results).

Symbolic interval data. To obtain symbolic interval data, the data for each model were generated twice into sets A and B, and the minimal (maximal) value of {a_ij, b_ij} is treated as the beginning (end) of an interval. Fifty realizations were generated from each setting.
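To make the simulation setup concrete, the sketch below generates one realization of Model 2 (two elongated clusters plus three noisy variables), discretizes it into k ordinal categories, and builds symbolic intervals from two realizations A and B. The function names and the unit-interval range of the noisy uniform variables are our assumptions; this is not the authors' simulation code.

```r
library(MASS)  # for mvrnorm()

# Model 2 (a sketch): two elongated clusters in 2 dimensions, 50 observations
# each, plus 3 noisy variables; the unit-interval range of the noisy uniform
# variables is an assumption.
gen_model2 <- function() {
  sigma <- matrix(c(1, 0.9, 0.9, 1), 2, 2)      # sigma_jj = 1, sigma_jl = 0.9
  x <- rbind(mvrnorm(50, mu = c(0, 0), Sigma = sigma),
             mvrnorm(50, mu = c(1, 5), Sigma = sigma))
  noisy <- matrix(runif(100 * 3), ncol = 3)     # 3 noisy variables
  list(x = cbind(x, noisy), labels = rep(1:2, each = 50))
}

# Discretization: k equal-width class intervals per variable, values replaced
# by the category numbers 1, ..., k.
discretize <- function(x, k = 5) {
  apply(x, 2, function(v)
    cut(v, breaks = seq(min(v), max(v), length.out = k + 1),
        labels = FALSE, include.lowest = TRUE))
}

# Symbolic interval data: generate the model twice (sets A and B); the
# componentwise minimum (maximum) is the beginning (end) of each interval.
a <- gen_model2()$x
b <- gen_model2()$x
intervals <- list(lower = pmin(a, b), upper = pmax(a, b))
ordinal   <- discretize(gen_model2()$x, k = 5)
```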

4 Discussion on the simulation results

In testing the robustness of the modified HINoV algorithm using simulated ordinal or symbolic interval data, the major criterion was the identification of the noisy variables. The HINoV-selected variables are the variables with the highest topri values. In models 2-11 the number of nonnoisy variables is known, so in the simulation study the number of HINoV-selected variables equals the number of nonnoisy variables in each model.

When the noisy variables were identified, the next step was to run one of the clustering methods based on a distance matrix (pam, single, complete, average, mcquitty, median, centroid, Ward) with the nonnoisy subset of variables (the HINoV-selected variables) and with all variables. Each clustering result was then compared with the known cluster structure from models 2-11 using Hubert and Arabie's [1985] corrected Rand index (see Tables 1 and 2); a minimal sketch of this comparison is given below.
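The sketch below, again under our own naming and reusing the adjusted_rand helper from the sketch in Section 2, shows this comparison for a single data set: cluster once with all variables and once with the HINoV-selected subset (here with pam() from the cluster package, one of the eight methods listed above), and score both partitions against the known structure.

```r
library(cluster)  # for pam()

# a: adjusted Rand index with all variables; b: with the HINoV-selected ones.
cluster_recovery <- function(x, true_labels, selected_vars, u) {
  part_all <- pam(dist(scale(x)), k = u)$clustering
  part_sel <- pam(dist(scale(x[, selected_vars, drop = FALSE])),
                  k = u)$clustering
  c(a = adjusted_rand(part_all, true_labels),
    b = adjusted_rand(part_sel, true_labels))
}
```

Averaging a and b over the fifty realizations of a model gives the values reported in Tables 1 and 2; r = b - a then measures the improvement in cluster recovery due to the variable selection.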

Table 1. Cluster recovery for all variables and HINoV-selected subsets of variables for ordinal data (five categories) by experimental model and clustering method. The a (b) values are Hubert and Arabie's adjusted Rand indices averaged over fifty replications for each model with all variables (with the HINoV-selected variables); r = b - a; ccr denotes corrected cluster recovery. The per-model entries are not reproduced here; the ccr values by clustering method were: pam 98.22%, ward 98.00%, single 94.44%, complete 90.67%, average 97.11%, mcquitty 89.56%, median 98.89%, centroid 98.44%.

Table 2. Cluster recovery for all variables and HINoV-selected subsets of variables for symbolic interval data by experimental model and clustering method. The meaning of a, b, r, and ccr is the same as in Table 1. The per-model entries are not reproduced here; the ccr values by clustering method were: pam 94.67%, ward 91.78%, single 97.33%, complete 99.11%, average 96.22%, mcquitty 96.44%, median 99.56%, centroid 99.78%.

Some conclusions can be drawn from the simulation results:

1. The cluster recovery that used only the HINoV-selected variables for ordinal data (Table 1) and for symbolic interval data (Table 2) was better than the one that used all variables for all models 2-10 and each clustering method.
2. Among the 450 simulated data sets (nine models with 50 runs each), the HINoV method was better (see ccr in Tables 1 and 2) in from 89.56% (mcquitty) to 98.89% (median) of the runs for ordinal data, and in from 91.78% (ward) to 99.78% (centroid) of the runs for symbolic interval data.
3. Figure 1 shows the relationship between the values of the adjusted Rand indices averaged over fifty replications and models 2-10 with the HINoV-selected variables (b) and the values showing the improvement (r) of the average adjusted Rand indices (cluster recovery with the HINoV-selected variables against all variables), separately for the eight clustering methods and the two types of data (ordinal, symbolic interval). Based on the adjusted Rand indices averaged over fifty replications and models 2-10, the improvement in cluster recovery (HINoV-selected variables against all variables) varies: it is smallest for mcquitty and largest for centroid for ordinal data, and smallest for ward and largest for centroid for symbolic interval data.

Fig. 1. The relationship between the values of b and r for the eight clustering methods (pam, ward, single, complete, average, mcquitty, median, centroid), plotted separately for ordinal data and symbolic data. Source: own research.

5 Conclusions

For nonmetric and symbolic interval data the HINoV algorithm has almost the same limitations as the ones mentioned in the Carmone et al [1999] article for metric data.

First, HINoV is of little use with a nonmetric data set or a symbolic data array in which all variables are noisy (no cluster structure, see model 1). In this situation the topri values are similar and close to zero (see Table 3).

Table 3. Mean and standard deviation of the topri values for the 10 variables in model 1, for ordinal data with five categories and for the symbolic data array. The individual values are not reproduced here.

Second, the HINoV method depends on the relationship between pairs of variables. If only one variable has a cluster structure and the others are noisy, HINoV will not be able to isolate this nonnoisy variable (see Table 4).

Table 4. Mean and standard deviation of the topri values for the 10 variables in model 11, for ordinal data with five categories and for the symbolic data array. The individual values are not reproduced here.

Third, if all variables have the same cluster structure (no noisy variables), the topri values will be large and similar for all variables. The suggested selection process using a scree diagram will then be ineffective.

Fourth, an important problem is deciding on a proper number of clusters in stage two of the HINoV algorithm with symbolic interval data. To resolve this problem we should initiate the HINoV algorithm with different numbers of clusters.

References

BILLARD, L. and DIDAY, E. (2006): Symbolic data analysis. Conceptual statistics and data mining. Wiley, Chichester.
CARMONE, F.J., KARA, A. and MAXWELL, S. (1999): HINoV: a new method to improve market segment definition by identifying noisy variables. Journal of Marketing Research, vol. 36, November.
GNANADESIKAN, R., KETTENRING, J.R. and TSAO, S.L. (1995): Weighting and selection of variables for cluster analysis. Journal of Classification, vol. 12, no. 1.
HUBERT, L.J. and ARABIE, P. (1985): Comparing partitions. Journal of Classification, vol. 2, no. 1.
JAJUGA, K., WALESIAK, M. and BAK, A. (2003): On the General Distance Measure. In: M. Schwaiger and O. Opitz (Eds.), Exploratory data analysis in empirical research. Springer-Verlag, Berlin, Heidelberg.
MILLIGAN, G.W. (1996): Clustering validation: results and implications for applied analyses. In: P. Arabie, L.J. Hubert and G. de Soete (Eds.), Clustering and classification. World Scientific, Singapore.
TIBSHIRANI, R., WALTHER, G. and HASTIE, T. (2001): Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, ser. B, vol. 63, part 2.
WALESIAK, M. (2005): Variable selection for cluster analysis: approaches, problems, methods. Plenary Session of the Committee on Statistics and Econometrics of the Polish Academy of Sciences, 15 March, Wroclaw.
