Multidimensional scaling

From Wikipedia, the free encyclopedia

Multidimensional scaling (MDS) refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. Unlike principal component analysis, in which most of the variance in the data is captured by the first axis and each subsequent axis contains progressively less information, the axes in MDS are arbitrary, and distance units along each axis do not reflect equal quantitative distances at other sections of the same axis.[1] The number of dimensions N of an MDS plot can exceed 2 and is specified a priori. Choosing N = 2 optimizes the object locations for a two-dimensional scatterplot.[2]

Types

MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix:

Classical multidimensional scaling
Also known as Principal Coordinates Analysis, Torgerson Scaling or Torgerson–Gower scaling. Takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain.[2]

Metric multidimensional scaling
A superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization.

Non-metric multidimensional scaling
In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. The relationship is typically found using isotonic regression. Louis Guttman's smallest space analysis (SSA) is an example of a non-metric MDS procedure.

Generalized multidimensional scaling
An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another.[3]
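The isotonic-regression step mentioned for non-metric MDS can be illustrated with the pool-adjacent-violators algorithm. The following is a minimal sketch, not taken from the article: the function name `isotonic_fit` is hypothetical, and it assumes that the configuration distances have already been sorted by increasing input dissimilarity and that all pairs carry equal weight.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit to `values` (pool-adjacent-violators).

    `values` are assumed to be configuration distances listed in order of
    increasing input dissimilarity (an assumption of this sketch).
    """
    # Each block stores [sum of its values, number of values].
    blocks = []
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back into one fitted value per input.
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Distances ordered by dissimilarity rank; the middle pair violates monotonicity.
disparities = isotonic_fit([1.0, 3.0, 2.0, 4.0])
print(disparities)  # [1.0, 2.5, 2.5, 4.0]
```

The fitted values (often called disparities) replace the raw dissimilarities as targets when the stress of a non-metric solution is evaluated.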

Details

The data to be analyzed is a collection of $I$ objects (colors, faces, stocks, ...) on which a distance function is defined,

$$\delta_{i,j} := \text{distance between the } i\text{-th and } j\text{-th objects}.$$

These distances are the entries of the dissimilarity matrix

$$\Delta := \begin{pmatrix} \delta_{1,1} & \delta_{1,2} & \cdots & \delta_{1,I} \\ \delta_{2,1} & \delta_{2,2} & \cdots & \delta_{2,I} \\ \vdots & \vdots & & \vdots \\ \delta_{I,1} & \delta_{I,2} & \cdots & \delta_{I,I} \end{pmatrix}.$$

The goal of MDS is, given $\Delta$, to find $I$ vectors $x_1, \ldots, x_I \in \mathbb{R}^N$ such that

$$\|x_i - x_j\| \approx \delta_{i,j} \quad \text{for all } i, j \in \{1, \ldots, I\},$$

where $\|\cdot\|$ is a vector norm. In classical MDS, this norm is the Euclidean distance, but, in a broader sense, it may be a metric or arbitrary distance function.[4]

In other words, MDS attempts to find an embedding from the $I$ objects into $\mathbb{R}^N$ such that distances are preserved. If the dimension $N$ is chosen to be 2 or 3, we may plot the vectors $x_i$ to obtain a visualization of the similarities between the $I$ objects. Note that the vectors $x_i$ are not unique: with the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances $\|x_i - x_j\|$.

There are various approaches to determining the vectors $x_i$. Usually, MDS is formulated as an optimization problem, where $(x_1, \ldots, x_I)$ is found by minimizing some cost function, for example,

$$\min_{x_1, \ldots, x_I} \sum_{i<j} \left( \|x_i - x_j\| - \delta_{i,j} \right)^2.$$

A solution may then be found by numerical optimization techniques. For some particularly chosen cost functions, minimization can be stated analytically in terms of matrix eigendecompositions.
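For the eigendecomposition route, classical (Torgerson) MDS squares the dissimilarities, double-centers them, and reads coordinates off the leading eigenvectors. The sketch below is an illustration under stated assumptions rather than a reference implementation: the names `double_center`, `top_eigenpairs`, and `classical_mds` are hypothetical, the input is assumed to be a symmetric, exactly Euclidean distance matrix, and plain power iteration with deflation stands in for a library eigensolver to keep the example dependency-free.

```python
import math


def double_center(delta):
    """Return B = -1/2 * J * D2 * J, where D2 holds the squared
    dissimilarities and J = I - (1/n) * ones is the centering matrix."""
    n = len(delta)
    sq = [[delta[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # delta is assumed symmetric, so column means equal row means.
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
         for j in range(n)]
        for i in range(n)
    ]


def top_eigenpairs(matrix, k, iterations=500):
    """Approximate the k largest eigenpairs of a symmetric positive
    semidefinite matrix by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [float(i + 1) for i in range(n)]  # fixed, arbitrary start vector
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
        value = sum(
            vec[i] * sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)
        )
        pairs.append((value, vec))
        # Deflate so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def classical_mds(delta, dims=2):
    """Embed n objects in `dims` dimensions from an n x n distance matrix."""
    pairs = top_eigenpairs(double_center(delta), dims)
    # Coordinate of object i on axis a is sqrt(eigenvalue_a) * eigenvector_a[i].
    return [
        [math.sqrt(max(value, 0.0)) * vec[i] for value, vec in pairs]
        for i in range(len(delta))
    ]


# Distances among the corners of a 3-4-5 right triangle.
delta = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]
coords = classical_mds(delta, dims=2)
recovered = [[math.dist(p, q) for q in coords] for p in coords]
print([[round(d, 6) for d in row] for row in recovered])
```

Because the solution is defined only up to translation, rotation, and reflection, the check compares the recovered pairwise distances with the input matrix rather than the coordinates themselves.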

Procedure

There are several steps in conducting MDS research:

1. Formulating the problem
What variables do you want to compare? How many variables do you want to compare? Fewer than 8 (4 pairs) will not give valid results. What purpose is the study to be used for?

2. Obtaining input data
a. Perception data, direct approach: Respondents are asked to rate the similarity of two items, usually on a 5-point Likert scale, from most similar to most dissimilar (or least similar). The first comparison pair could be Coke/Pepsi, for example, the next Coke/Hires root beer, followed by Pepsi/Dr Pepper and Dr Pepper/Hires root beer. The number of comparisons Q is a function of the number of items N and can be calculated as Q = N(N − 1) / 2.
b. Perception data, derived approach: Here, items are decomposed into features that are rated on a semantic differential scale.
c. Preference data approach: Respondents are asked to select their preference of one item over another, rather than rate the degree of similarity between two items.

3. Running the MDS statistical program
The procedure is available in many statistical software packages. Often there is a choice between metric MDS (which deals with interval or ratio level data) and non-metric MDS (which deals with ordinal data).

4. Deciding the number of dimensions
The researcher must decide on the number of dimensions the procedure should use. The more dimensions, the better the statistical fit, but the more difficult it is to visualize and interpret the results.

5. Mapping results and interpreting the dimensions
The statistical procedure will map the results. The map will plot each item, usually in a low-dimensional space with two or three dimensions. The proximity of products to one another indicates either how similar or how preferred they are, depending on which response procedure was used. However, the relationship between the embedding dimensions and the dimensions of system behavior may not be intuitively obvious. Here, a subjective judgment about the correspondence can be made, as found for example in perceptual mapping.

6. Testing results for reliability and validity
Compute R-squared to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-squared of 0.6 is considered the minimum acceptable level. An R-squared of 0.8 is considered good for metric scaling and 0.9 is considered good for non-metric scaling. Other possible tests include Kruskal's Stress, split-half reliability, data stability tests (i.e., excluding one item), and test-retest reliability.

7. Reporting results comprehensively
Along with the mapping, the distance measure (e.g., a Sørensen or Jaccard index) and a measure of reliability (e.g., stress value) should be reported. It is also advisable to report the MDS algorithm used (e.g., Kruskal or Mather), which is often determined by the software and is sometimes reported in place of the algorithm; whether a specified starting configuration or random initialization was used; the number of runs; a substantive interpretation of what the dimensionality represents; any Monte Carlo method results; the number of iterations; an assessment of the stability of the solution; and the proportion of the overall R-squared variance explained by each axis.

Applications

Applications include scientific visualization and data mining in fields such as cognitive science, information science, psychophysics, psychometrics, ecology and marketing. New applications arise in the coverage of autonomous wireless nodes that populate a given space or area, where MDS may serve as an enhanced real-time approach to monitoring and managing such areas. MDS has also been used extensively in geostatistics to model the spatial variability of the patterns of an image (by representing them as points in a lower-dimensional space),[2] and in natural language processing to model the semantic and affective relatedness of natural language concepts (by representing them as points in a 100-dimensional vector space).[6]

In market research, MDS has been used to model the preferences and perceptions of respondents by representing them on visual grids known as perceptual maps.

Comparison and advantages

Hypothetical customers are asked to compare pairs of products and to make judgments about their degree of similarity. Whereas other ordination techniques, such as principal components analysis, factor analysis, discriminant analysis, and conjoint analysis, are often used to reveal the underlying dimensions based on item features specified by the researcher, MDS is used to reveal the underlying dimensions from respondents' judgments about the similarity of items. This does not require that a list of features be shown to respondents. The underlying dimensions come from respondents' judgments about, or comparisons made between, pairs of items. For these reasons, MDS is the most common technique used in perceptual mapping.
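Two quantities from the procedure above lend themselves to a quick calculation: the number of pairwise comparisons in step 2 and the fit statistics in step 6. The sketch below is illustrative only. The function names are hypothetical, the stress formula is Kruskal's stress-1 with the raw dissimilarities used as targets (a metric-scaling assumption), and R-squared is taken here as the squared Pearson correlation between dissimilarities and map distances, which is one common reading rather than something the text specifies.

```python
import math
from itertools import combinations


def comparison_count(n_items):
    """Number of distinct item pairs, Q = N(N - 1) / 2."""
    return n_items * (n_items - 1) // 2


def pair_distances(points):
    """Euclidean distances between all point pairs, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def stress_1(dissimilarities, distances):
    """Kruskal's stress-1: sqrt(sum (d - delta)^2 / sum d^2)."""
    num = sum((d - t) ** 2 for t, d in zip(dissimilarities, distances))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


# Four soft drinks, as in the Coke/Pepsi example, give six comparisons.
print(comparison_count(4))  # 6

# Hypothetical map positions and the dissimilarities they are meant to match.
layout = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
target = [3.0, 4.0, 5.0]  # pairs (0, 1), (0, 2), (1, 2)
fitted = pair_distances(layout)
print(stress_1(target, fitted))
print(r_squared(target, fitted))
```

A stress near zero together with an R-squared near one indicates that the map reproduces the input dissimilarities closely; the thresholds quoted in step 6 apply to the R-squared value.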
Although MDS and factor analysis both involve eigenanalysis, the data being analyzed are not the same. Component analysis uses singly centered data that adjust variable means to equality (0).[7] By contrast, MDS uses doubly centered data that also adjust for subject differences. Consequently:

1. MDS will provide a space of one less dimension than a factor analytic solution.
2. The origin of the space will be shifted to the centroid of the points in metric MDS.
3. The MDS solution will essentially be the same as the factor analytic solution, ignoring the first factor, if the subject means are independent of the MDS scalar products; or
4. The MDS and overall factor solutions will be essentially the same if the average correlation between each variable and all other variables is nearly zero, as when each has a mixture of both positive and negative correlations.

Davidson (1985) emphasized the importance of the context in which an ordination analysis is conducted.[8] The first factor in abilities testing is typically of great importance, since it reflects differences in the subjects' overall ability. However, the first factor obtained with preference data is often of trivial significance, since it generally reflects the subjects' overall willingness to employ high versus low ratings. Excluding this factor from such an analysis, which can be done with adlib factoring as well as MDS, often provides a useful simplification.

Bibliography

1. Holland, Steven. "NON-METRIC MULTIDIMENSIONAL SCALING (MDS)". Retrieved 27 June 2013.
2. Borg, I., Groenen, P. (2005). Modern Multidimensional Scaling: theory and applications (2nd ed.). New York: Springer-Verlag. pp. 207–212. ISBN 0-387-94845-7.
3. Bronstein AM, Bronstein MM, Kimmel R (January 2006). "Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching". Proc. Natl. Acad. Sci. U.S.A. 103 (5): 1168–72. doi:10.1073/pnas.0508601103. PMC 1360551. PMID 16432211.
4. Kruskal, J. B., and Wish, M. (1978). Multidimensional Scaling. Sage University Paper series on Quantitative Application in the Social Sciences, 07-011. Beverly Hills and London: Sage Publications.
5. Honarkhah, M. and Caers, J. (2010). "Stochastic Simulation of Patterns Using Distance-Based Pattern Modeling". Mathematical Geosciences 42: 487–517.
6. Cambria, E., Song, Y., Wang, H. and Howard, N. (2013). "Semantic multi-dimensional scaling for open-domain sentiment analysis". IEEE Intelligent Systems.
7. Nunnally, J. C. and Bernstein, I. H. (1994). Psychometric Theory (3rd ed.). New York: McGraw-Hill. p. 642. ISBN 0071070885, 9780071070881.
8. Davidson, M. L. (1985). "Multidimensional scaling vs. components analysis of test intercorrelations". Psychological Bulletin 97: 94–105. ISBN 0-89464-662-1.

Cox, T.F., Cox, M.A.A. (2001). Multidimensional Scaling. Chapman and Hall.
Coxon, Anthony P.M. (1982). The User's Guide to Multidimensional Scaling. With special reference to the MDS(X) library of Computer Programs. London: Heinemann Educational Books.
Green, P. (January 1975). "Marketing applications of MDS: Assessment and outlook". Journal of Marketing 39 (1): 24–31. doi:10.2307/1250799.
McCune, B. and Grace, J.B. (2002). Analysis of Ecological Communities. Oregon, Gleneden Beach: MjM Software Design. ISBN 0-9721290-0-6.
Torgerson, Warren S. (1958). Theory & Methods of Scaling. New York: Wiley. ISBN 0-89874-722-8.