Multidimensional scaling

From Wikipedia, the free encyclopedia

Multidimensional scaling (MDS) refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. Unlike principal component analysis, in which most of the variance in the data is captured by the first axis and each subsequent axis carries progressively less information, the axes in MDS are arbitrary, and distance units along an axis do not reflect equal quantitative distances at other sections of the same axis. [1] The number of dimensions N of an MDS plot can exceed 2 and is specified a priori. Choosing N = 2 optimizes the object locations for a two-dimensional scatterplot. [2]

Types

MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix:

Classical multidimensional scaling
Also known as Principal Coordinates Analysis, Torgerson scaling or Torgerson-Gower scaling. Takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain. [2]

Metric multidimensional scaling
A superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and to input matrices of known distances, optionally with weights. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization.

Non-metric multidimensional scaling
In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. The relationship is typically found using isotonic regression. Louis Guttman's smallest space analysis (SSA) is an example of a non-metric MDS procedure.

Generalized multidimensional scaling
An extension of metric multidimensional scaling in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another. [3]
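As an illustration of the classical variant listed above, the following is a minimal sketch, assuming NumPy is available and that the input is a symmetric matrix of pairwise distances. It uses the standard double-centering and eigendecomposition route; the function name classical_mds and the unit-square test data are hypothetical and chosen only for this example.

```python
import numpy as np

def classical_mds(delta, n_components=2):
    """Classical (Torgerson) MDS sketch for a symmetric distance matrix."""
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    # Double centering: B = -1/2 * J * D^2 * J, with J = I - (1/n) * ones.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (delta ** 2) @ centering
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues (rounding error or non-Euclidean input).
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical data: the four corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
delta = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(delta, n_components=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(recovered, delta))  # distances are reproduced up to rounding
```

The recovered coordinates generally differ from the original points by a rotation, reflection, or translation; only the pairwise distances are expected to match.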

Details

The data to be analyzed is a collection of $I$ objects (colors, faces, stocks, ...) on which a distance function is defined,

\[ \delta_{i,j} := \text{distance between the } i\text{-th and } j\text{-th objects}. \]

These distances are the entries of the dissimilarity matrix

\[ \Delta := \begin{pmatrix} \delta_{1,1} & \delta_{1,2} & \cdots & \delta_{1,I} \\ \delta_{2,1} & \delta_{2,2} & \cdots & \delta_{2,I} \\ \vdots & \vdots & & \vdots \\ \delta_{I,1} & \delta_{I,2} & \cdots & \delta_{I,I} \end{pmatrix}. \]

The goal of MDS is, given $\Delta$, to find $I$ vectors $x_1, \ldots, x_I \in \mathbb{R}^N$ such that

\[ \|x_i - x_j\| \approx \delta_{i,j} \quad \text{for all } i, j \in \{1, \ldots, I\}, \]

where $\|\cdot\|$ is a vector norm. In classical MDS, this norm is the Euclidean distance, but, in a broader sense, it may be a metric or arbitrary distance function. [4]

In other words, MDS attempts to find an embedding from the $I$ objects into $\mathbb{R}^N$ such that distances are preserved. If the dimension $N$ is chosen to be 2 or 3, we may plot the vectors $x_i$ to obtain a visualization of the similarities between the $I$ objects. Note that the vectors $x_i$ are not unique: with the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances.

There are various approaches to determining the vectors $x_i$. Usually, MDS is formulated as an optimization problem, where $(x_1, \ldots, x_I)$ is found by minimizing some cost function, for example,

\[ \min_{x_1, \ldots, x_I} \sum_{i < j} \left( \|x_i - x_j\| - \delta_{i,j} \right)^2. \]

A solution may then be found by numerical optimization techniques. For some particularly chosen cost functions, minimization can be stated analytically in terms of matrix eigendecompositions.
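To make the optimization view concrete, here is a minimal sketch, assuming NumPy and taking as cost the sum over pairs of squared differences between embedded Euclidean distances and the given dissimilarities. Plain gradient descent with a conservative step size is an assumption made for brevity, not a method prescribed by the text, and the function names and the unit-square input are hypothetical.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distances between all rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def raw_stress(x, delta):
    """Sum over pairs i < j of (||x_i - x_j|| - delta_ij)^2."""
    d = pairwise_distances(x)
    upper = np.triu_indices(len(x), k=1)
    return float(((d[upper] - delta[upper]) ** 2).sum())

def stress_gradient(x, delta, eps=1e-12):
    """Gradient of raw_stress with respect to every coordinate of x."""
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # d(stress)/dx_i = 2 * sum_j (d_ij - delta_ij) * (x_i - x_j) / d_ij
    coef = (d - delta) / (d + eps)
    np.fill_diagonal(coef, 0.0)
    return 2.0 * (coef[:, :, None] * diff).sum(axis=1)

def mds_gradient_descent(delta, n_components=2, steps=500, seed=0):
    """Minimize raw stress from a random start; returns (coordinates, stress history)."""
    n = delta.shape[0]
    lr = 1.0 / (2 * n)  # conservative step size (an assumption of this sketch)
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, n_components))
    history = [raw_stress(x, delta)]
    for _ in range(steps):
        x = x - lr * stress_gradient(x, delta)
        history.append(raw_stress(x, delta))
    return x, history

# Hypothetical input: distances between the corners of a unit square.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
delta = pairwise_distances(corners)
x, history = mds_gradient_descent(delta)
print(f"stress: {history[0]:.4f} -> {history[-1]:.6f}")

# A rotated and translated copy of the solution has the same stress,
# which illustrates why the embedding is not unique.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
moved = x @ rotation.T + np.array([3.0, -1.0])
print(np.isclose(raw_stress(moved, delta), raw_stress(x, delta)))
```

Gradient descent can stop in a local minimum, so in practice several random starts are usually compared.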

Procedure

There are several steps in conducting MDS research:

1. Formulating the problem
What variables do you want to compare? How many variables do you want to compare? Fewer than 8 (4 pairs) will not give valid results. What purpose is the study to be used for?

2. Obtaining input data
a. Perception data: direct approach. Respondents are asked to rate the similarity of two items, usually on a 5-point Likert scale, from most similar to most dissimilar (or least similar). The first comparison pair could be Coke/Pepsi, for example, the next Coke/Hires root beer, followed by Pepsi/Dr Pepper and Dr Pepper/Hires root beer. The number of comparisons Q is a function of the number of items N and can be calculated as Q = N(N − 1) / 2.
b. Perception data: derived approach. Here, items are decomposed into features that are rated on a semantic differential scale.
c. Preference data approach. Respondents are asked to select their preference of one item over another, rather than rate the degree of similarity between two items.

3. Running the MDS statistical program
Software for running the procedure is available in many statistical software packages. Often there is a choice between metric MDS (which deals with interval or ratio level data) and non-metric MDS (which deals with ordinal data).

4. Deciding the number of dimensions
The researcher must decide on the number of dimensions they want the procedure to use. The more dimensions, the better the statistical fit, but the more difficult it is to visualize and interpret the results.

5. Mapping results and interpreting the dimensions
The statistical procedure will map the results. The map will plot each item, usually in a low-dimensional space with two or three dimensions. The proximity of products to one another indicates either how similar or how preferred they are, depending on which response procedure was used.
However, the relationship between the embedding dimensions and the dimensions of system behavior may not be intuitively obvious. Here, a subjective judgment about the correspondence can be made, as found for example in perceptual mapping.

6. Testing results for reliability and validity
Compute R-squared to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-squared of 0.6 is considered the minimum acceptable level. An R-squared of 0.8 is considered good for metric scaling and 0.9 is considered good for non-metric scaling. Other possible tests include Kruskal's stress, split-half reliability, data stability tests (i.e., excluding one item), and test-retest reliability.

7. Reporting results comprehensively
Along with the mapping, the distance measure used (such as a Sorenson or Jaccard index) and a reliability measure (such as the stress value) should be reported. It is also advisable to report the MDS algorithm used (e.g., Kruskal or Mather scaling), which is often determined by the software procedure and is sometimes named in lieu of the algorithm; whether a specified starting configuration or random initialization was used; the number of runs obtained with the MDS procedure; a substantive interpretation of what the dimensionality represents; any Monte Carlo method results obtained; the number of iterations; an assessment of the stability of the solution; and the proportion of the overall R-squared variance explained by each axis.

Applications

Applications include scientific visualization and data mining in fields such as cognitive science, information science, psychophysics, psychometrics, ecology and marketing. New applications arise in the coverage of autonomous wireless nodes populating a given space or area, where MDS may serve as an enhanced real-time approach to monitoring and managing such areas. MDS has also been used extensively in geostatistics, to model the spatial variability of the patterns of an image (by representing them as points in a lower-dimensional space), [5] and in natural language processing, to model the semantic and affective relatedness of natural language concepts (by representing them as points in a 100-dimensional vector space). [6]

In market research, MDS has been used to model the preferences and perceptions of respondents by representing them on visual grids known as perceptual maps.

Comparison and advantages

Hypothetical customers are asked to compare pairs of products and to make judgments about their degree of similarity. Although other ordination techniques, such as principal components analysis, factor analysis, discriminant analysis, and conjoint analysis, are often used to reveal the underlying dimensions based on item features specified by the researcher, MDS is used to reveal the underlying dimensions from respondents' judgments about the similarity of items. This does not require that a list of features be shown to respondents. The underlying dimensions come from respondents' judgments about, or comparisons made between, pairs of items. For these reasons, MDS is the most common technique used in perceptual mapping.
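The pairwise similarity judgments described above have to be arranged into a symmetric dissimilarity matrix before an MDS routine can use them. The following is a minimal sketch using only the Python standard library. The item names come from the example in the Procedure section; the ratings themselves are hypothetical, and the coding (1 = most similar, 5 = most dissimilar) is an assumption made for illustration.

```python
from itertools import combinations

items = ["Coke", "Pepsi", "Dr Pepper", "Hires root beer"]
n_items = len(items)

# Every unordered pair is judged once: Q = N(N - 1) / 2 comparisons.
pairs = list(combinations(items, 2))
n_comparisons = n_items * (n_items - 1) // 2

# Hypothetical ratings on a 5-point scale (1 = most similar, 5 = most dissimilar).
ratings = {
    ("Coke", "Pepsi"): 1,
    ("Coke", "Dr Pepper"): 3,
    ("Coke", "Hires root beer"): 4,
    ("Pepsi", "Dr Pepper"): 3,
    ("Pepsi", "Hires root beer"): 4,
    ("Dr Pepper", "Hires root beer"): 2,
}

# Fill a symmetric matrix with zeros on the diagonal.
index = {name: k for k, name in enumerate(items)}
dissimilarity = [[0.0] * n_items for _ in range(n_items)]
for first, second in pairs:
    value = float(ratings[(first, second)])
    dissimilarity[index[first]][index[second]] = value
    dissimilarity[index[second]][index[first]] = value

print(len(pairs), n_comparisons)
for name, row in zip(items, dissimilarity):
    print(f"{name:16s}", row)
```

With several respondents, one common choice is to average the ratings per pair before filling the matrix; whether that is appropriate depends on the study design.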
Although MDS and factor analysis both involve eigenanalysis, the data being analyzed are not the same. Component analysis uses singly centered data that adjust variable means to equality (0). [7] By contrast, MDS uses doubly centered data that also adjust for subject differences. Consequently, (1) MDS will provide a space of one less dimension than a factor analytic solution; (2) the origin of the space will be shifted to the centroid of the points in metric MDS; (3) the MDS solution will essentially be the same as the factor analytic solution, ignoring the first factor, if the subject means are independent of the MDS scalar products; or (4) the MDS and overall factor solutions will be essentially the same if the average correlation between each variable and all other variables is nearly zero, as when each has a mixture of both positive and negative correlations.

Davidson (1985) emphasized the importance of the context in which an ordination analysis is conducted. [8] The first factor in abilities testing is typically of great importance, since it reflects differences in the subjects' overall ability. However, the first factor obtained with preference data is often of trivial significance, since it generally reflects the subjects' overall willingness to employ high versus low ratings. Excluding this factor from such an analysis, which can be done with ad lib factoring as well as MDS, often provides a useful simplification.

Bibliography

1. Holland, Steven. "Non-metric Multidimensional Scaling (MDS)". Retrieved 27 June 2013.
2. Borg, I., Groenen, P. (2005). Modern Multidimensional Scaling: Theory and Applications (2nd ed.). New York: Springer-Verlag. pp. 207–212. ISBN 0-387-94845-7.
3. Bronstein, A. M., Bronstein, M. M., Kimmel, R. (January 2006). "Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching". Proc. Natl. Acad. Sci. U.S.A. 103 (5): 1168–72. doi:10.1073/pnas.0508601103. PMC 1360551. PMID 16432211.
4. Kruskal, J. B., Wish, M. (1978). Multidimensional Scaling. Sage University Paper series on Quantitative Application in the Social Sciences, 07-011. Beverly Hills and London: Sage Publications.
5. Honarkhah, M., Caers, J. (2010). "Stochastic Simulation of Patterns Using Distance-Based Pattern Modeling". Mathematical Geosciences 42: 487–517.
6. Cambria, E., Song, Y., Wang, H., Howard, N. (2013). "Semantic multi-dimensional scaling for open-domain sentiment analysis". IEEE Intelligent Systems.
7. Nunnally, J. C., Bernstein, I. H. (1994). Psychometric Theory (3rd ed.). New York: McGraw-Hill. p. 642. ISBN 0071070885, 9780071070881.
8. Davidson, M. L. (1985). "Multidimensional scaling vs. components analysis of test intercorrelations". Psychological Bulletin 97: pp. 94–105. ISBN 0-89464-662-1.

Cox, T. F., Cox, M. A. A. (2001). Multidimensional Scaling. Chapman and Hall.
Coxon, Anthony P. M. (1982). The User's Guide to Multidimensional Scaling. With special reference to the MDS(X) library of Computer Programs. London: Heinemann Educational Books.
Green, P. (January 1975). "Marketing applications of MDS: Assessment and outlook". Journal of Marketing 39 (1): 24–31. doi:10.2307/1250799.
McCune, B., Grace, J. B. (2002). Analysis of Ecological Communities. Oregon, Gleneden Beach: MjM Software Design. ISBN 0-9721290-0-6.
Torgerson, Warren S. (1958). Theory & Methods of Scaling. New York: Wiley. ISBN 0-89874-722-8.