Generalized association plots (GAP): Dimension free information visualization environment for multivariate data structure
|
|
|
- Baldwin Fowler
- 9 years ago
- Views:
Transcription
1 Generalized association plots (GAP): Dimension free information visualization environment for multivariate data structure Chun-houh Chen, hun-chuan Chang, Yueh-Yun Chi, and Chih-Wen Ou-Young Academia inica, Taiwan Abstract: GAP is a dimension free visualization environment for multivariate data structure. Given a multivariate data set, GAP first compute the proximity matrices for variables as well as for subjects. Proper seriations (permutations) are searched for rearrange these two matrices to satisfy certain properties. Double sorted raw data matrix together with two sorted proximity matrices are then projected through appropriate color spectrums to create matrix maps. These three maps should be cross-examined to identify important information structure embedded in the raw data matrix. GAP has unique seriation algorithm for identifying permutations of matrices with better global sense and clustering algorithm for finding multiple clustering patterns co-exist in proximity matrices. Extensions have been added to GAP for analyzing more complicated formats such as categorical, longitudinal, and multiple-set structure. CONVERGING EQUENCE OF CORRELATION MATRICE The computation kernel of GAP is a converging sequence of iteratively generated correlation matrices, Figure. This sequence of correlation matrices is used to dynamically identify seriation and clustering for a given proximity matrix. D R () R () R (2) R (3) R (4) R (5) R (6) R (7) R (8 ) R (9) R ( ) DL DL ρ ( D) = 5 ρ ( ) = 49 ρ ( ) = 49 ρ (2 ) = 4 DL DL AH2 AH3 DL AH DL6 TH NE NA BE NC NB4 NA7 ND NA5 NB2 NB ρ (3 ) = 2 ρ (4 ) = 7 ρ (5 ) = 3 ρ (6 ) = 2 DL6 AH3 AH2AHDL TH NE BE NC NA NA7 NB NB2 NB4 NA5 ND DL DL DL DL6 AH2 AH3 DL AH DL DL6 AH2 AH3 AHDL BE NE TH NC NB2 NB NB4 NA7 NA NA5 ND TH NE BE NC NA NA5 NA7 NB NB2 NB4 ND DL DL DL6 AH3 AH2 DL DL TH NE BE AH DL DL6 AH2 AH3 AH DL NC NB2 NA NB NA5 NA7 NB4 ND TH NE BE NA NA5 NA7 NB NB2 NC ND NB4 DL DL DL DL DL6 AH3 AH2 AH DL DL6 AH3 AH2AH DL TH NE BE NC NB2 NA NB NA5 NA7 NB4 ND TH NE BE NA NA5 NA7 NB NB2 NB4 NC ND ρ (7 ) = 2 ρ (8) = 2 ρ (9 ) = 2 ρ ( ) = DL DL AH2 AH3 DL6 AH DL TH NA NA5 NA7 NB NB2 BE NB4 ND NC NE DL6 DL DL AH AH2 AH3 DL BE TH NA NA5 NA7 NB NB2 NB4 NC ND NE AH AH2 AH3 DL6 DL DL AH AH2 AH3 DL6 DL DL NA NA TH NA5 NA7 NB NB2 NB4 NC ND NE TH NA5 NA7 NB NB2 NB4 NC ND NE BE BE AH AH2 AH3 DL6 DL DL BE TH NA NA5 NA7 NB NB2 NB4 NC ND NE (a) (b) Figure. Converging equence of Iteratively Generated Correlation Matrices. (a). orted Color Maps for the Correlation Matrices; (b). First Two Eigen-Vectors for the Correlation Matrices. 2 BAIC PRINCIPLE OF GAP There are three major pieces of information contained in any multivariate data set with n subjects and p variables:. the linkage amongst n subject points in the p-dimensional space; 2. the
2 linkage between p variable vectors in the n-dimensional space; and 3. the interaction linkage between the sets of subjects and variables. We use a psychosis disorder data with 95 patients on 5 symptoms to illustrate the framework of a standard GAP analysis, in Figure 2. GAP integrates the following four major steps to extract and summarize information embedded in a multivariate data set. 2. Raw Data and Proximity Matrix Maps with uitable Color Projection The raw data matrix is denoted as D a in Figure 2a. A gray spectrum is applied to project numerical numbers into gray dots with different intensities. The correlation matrix is calculated as the proximity matrix V a for the 5 symptoms. For the 95 patients, also the correlation matrix is used as the proximity matrix. The diverging blue-red color scheme is used to represent the bi-directional property of the correlation coefficients. (a) (b) V a V b D a a D b b (c) (d) V c V d v v2 v3 v4 v5 D c c v v2 v3 v4 v5 D d d Figure 2. tandard GAP Procedure.
3 (a). Original Raw Data and Proximity Matrices Maps; (b). orted Data Matrix Map and Proximity Matrix Maps; (c). Clustering for Variables and ubjects; (d). ufficient Graph with Three Multivariate Linkages. 2.2 The orted Matrix Maps with the Principle of Geometry The next step is to find proper seriations for V a and a respectively. eriation is a data analytic tool for finding a permutation or ordering of a set of objects using a data matrix. The seriations are then applied to arrange the two correlation matrices V a and a into V b and b in Figure 2b. The same seriations are also used to reshape the raw data matrix D a into D b. The difference between V a and V b is not much since V a is already grouped by the symptom tables. However, there is a dramatic change from a to b since the patients are admitted in a random order. There is a clear latent structure in D b. A band of dark gray dots moves from the upper right corner to the lower left corner. 2.3 Partitioned Matrix Maps with near tationary Iterations After the seriations have been applied to the matrix maps, the potential groups of variable and clusters of subject are identified using dynamic clustering procedures (to be discussed in ection 2) as displayed in Figure 2c. The sorted proximity matrix maps for variables and subjects are then partitioned into squares on the diagonal for within-group mean correlation structure and rectangles off diagonal for between-group mean correlation relationship, Figure 2c. The double sorted raw data matrix map is also partitioned into rectangles to represent the mean interaction effect between each subject-cluster on every variable-group. 2.4 The ufficient Graph with Three Multivariate Linkages In order to extract and summarize the visualized information in Figure 2b, we can further convert these matrix maps into a simplified version. Illustrated in Figure 2d are the mean-structure maps of the three matrices for raw data and proximities. These three mosaic-displays in Figure 2d contain the principal structural information embedded in the original data set. The mean function in Figure 2d can be replaced with any statistic for displaying desired information structure. We shall name these three mosaic displays the sufficient graph for a multivariate data set. The sufficient graph is then used to answer the three multivariate problems raised by the psychiatrist. Fifty symptoms are divided into five symptom-groups with different within and between group structure.
4 Ninety-five patients are also grouped into three clusters. The general behavior of these three patient-clusters on each of the five symptom-groups can now be easily comprehended. 3 EXTENION OF GAP ection 2 introduced a typical GAP procedure for a data set with continuous variables measured at a single time point. We have developed several modules of GAP for handling many possible variants of data structure. 3. Categorical GAP Figure 3 displays four matrix maps with different data type. Categorical GAP with dual scaling can be used for information visualization for binary and nominal data matrices. Continuous Ordinal Binary Categorical ACGT Figure 3. Information Visualization for Data Matrices with Different Data Types. 3.2 Longitudinal GAP It is possible that data profile may be measured at multiple time points as illustrated in Figure 4. When the data profile is observed at multiple time points, we have an extra piece of linkage for subjects and variables over time. This extra linkage makes the visualization of longitudinal multivariate data profiles a much harder statistical challenge than
5 DL6 DL DL AH3 AH AH2 DL ND NA7 NA5 NA NB4 NB NB2 NC NE BE TH BPR7 BPR3 BPR4 BPR4 BPR3 BPR6 BPR9 BPR6 BPR5 BPR BPR5 BPR BPR BPR8 DL6 DL DL AH3 AH AH2 DL ND NA7 NA5 NA NB4 NB NB2 NC NE BE TH BPR7 BPR3 BPR4 BPR4 BPR3 BPR6 BPR9 BPR6 BPR5 BPR2 BPR BPR2 BPR5 BPR BPR BPR8 the single profile GAP procedure. Longitudinal GAP is created for visualizing data profiles with longitudinal structure. t t2 t3 Figure 4. Composed Color Profile with eriation for Data Matrices at Multiple Time Points 3.3 Canonical GAP Given two data sets with different sets of variables for the same subjects, we would like to explore the relationship with Canonical GAP, Figure 5. Corr(X) Corr(X,Y) Corr((XY)') Corr(Y,X) BPR2 BPR2 Corr(Y) Corr((XY)') X Y Figure 5. Information Visualization for Comparing Two ets of Variables with Canonical GAP.
6 3.4 Cartograph GAP When each subject in a multivariate categorical data set is affiliated to a district in a map, Cartography GAP extents the power of Categorical GAP to systematically identify suitable color scheme for coloring districts in the map, Figure 6. x _ ŽØ ÚŽ s À x š- q x n x _ø yƒðø ÆÁŽÈø s Àø ]ƽø x ø Û ß s \ßÐ Û ß s \ ßÐ š ²ø nßîø Ž Lø š- qø x nø ŽØø à Fø x Fø ž ø ºÍ Úø ø s øø Figure 6. Cartography GAP for Taiwan Presidential Election 2.
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Rank one SVD: un algorithm pour la visualisation d une matrice non négative
Rank one SVD: un algorithm pour la visualisation d une matrice non négative L. Labiod and M. Nadif LIPADE - Universite ParisDescartes, France ECAIS 2013 November 7, 2013 Outline Outline 1 Data visualization
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
DATA SCIENCE Workshop November 12-13, 2015
DATA SCIENCE Workshop November 12-13, 2015 Paris-Dauphine University Place du Maréchal de Lattre de Tassigny, 75016 Paris DATA SCIENCE Workshop will be held at Paris-Dauphine university, on November 12
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
Exploratory Data Analysis with MATLAB
Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Singular Spectrum Analysis with Rssa
Singular Spectrum Analysis with Rssa Maurizio Sanarico Chief Data Scientist SDG Consulting Milano R June 4, 2014 Motivation Discover structural components in complex time series Working hypothesis: a signal
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
The Value of Visualization 2
The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate
MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued).
MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors Jordan canonical form (continued) Jordan canonical form A Jordan block is a square matrix of the form λ 1 0 0 0 0 λ 1 0 0 0 0 λ 0 0 J = 0
There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:
Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are
Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics
Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)
Principal Component Analysis
Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded
5: Magnitude 6: Convert to Polar 7: Convert to Rectangular
TI-NSPIRE CALCULATOR MENUS 1: Tools > 1: Define 2: Recall Definition --------------- 3: Delete Variable 4: Clear a-z 5: Clear History --------------- 6: Insert Comment 2: Number > 1: Convert to Decimal
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
Learning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
How to report the percentage of explained common variance in exploratory factor analysis
UNIVERSITAT ROVIRA I VIRGILI How to report the percentage of explained common variance in exploratory factor analysis Tarragona 2013 Please reference this document as: Lorenzo-Seva, U. (2013). How to report
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
CHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS
Examples: Exploratory Factor Analysis CHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS Exploratory factor analysis (EFA) is used to determine the number of continuous latent variables that are needed to
Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico
Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
Performance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
Math 1050 Khan Academy Extra Credit Algebra Assignment
Math 1050 Khan Academy Extra Credit Algebra Assignment KhanAcademy.org offers over 2,700 instructional videos, including hundreds of videos teaching algebra concepts, and corresponding problem sets. In
CLUSTER ANALYSIS FOR SEGMENTATION
CLUSTER ANALYSIS FOR SEGMENTATION Introduction We all understand that consumers are not all alike. This provides a challenge for the development and marketing of profitable products and services. Not every
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
Partial Least Squares (PLS) Regression.
Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
Lecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary
Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Solving Systems of Linear Equations
LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how
MULTIPLE-OBJECTIVE DECISION MAKING TECHNIQUE Analytical Hierarchy Process
MULTIPLE-OBJECTIVE DECISION MAKING TECHNIQUE Analytical Hierarchy Process Business Intelligence and Decision Making Professor Jason Chen The analytical hierarchy process (AHP) is a systematic procedure
A MULTIVARIATE OUTLIER DETECTION METHOD
A MULTIVARIATE OUTLIER DETECTION METHOD P. Filzmoser Department of Statistics and Probability Theory Vienna, AUSTRIA e-mail: [email protected] Abstract A method for the detection of multivariate
Nonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
1 Example of Time Series Analysis by SSA 1
1 Example of Time Series Analysis by SSA 1 Let us illustrate the 'Caterpillar'-SSA technique [1] by the example of time series analysis. Consider the time series FORT (monthly volumes of fortied wine sales
Introduction to Principal Components and FactorAnalysis
Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
3.1 State Space Models
31 State Space Models In this section we study state space models of continuous-time linear systems The corresponding results for discrete-time systems, obtained via duality with the continuous-time models,
Introduction to Data Analysis in Hierarchical Linear Models
Introduction to Data Analysis in Hierarchical Linear Models April 20, 2007 Noah Shamosh & Frank Farach Social Sciences StatLab Yale University Scope & Prerequisites Strong applied emphasis Focus on HLM
Introduction to Clustering
Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, 2010 1 / 36 Microarray Example N=65 P=1756 Yumi
Factor Analysis. Chapter 420. Introduction
Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.
Overview. Essential Questions. Precalculus, Quarter 4, Unit 4.5 Build Arithmetic and Geometric Sequences and Series
Sequences and Series Overview Number of instruction days: 4 6 (1 day = 53 minutes) Content to Be Learned Write arithmetic and geometric sequences both recursively and with an explicit formula, use them
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
Estimation of Fractal Dimension: Numerical Experiments and Software
Institute of Biomathematics and Biometry Helmholtz Center Münhen (IBB HMGU) Institute of Computational Mathematics and Mathematical Geophysics, Siberian Branch of Russian Academy of Sciences, Novosibirsk
Chapter 7 Factor Analysis SPSS
Chapter 7 Factor Analysis SPSS Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often
Regression Clustering
Chapter 449 Introduction This algorithm provides for clustering in the multiple regression setting in which you have a dependent variable Y and one or more independent variables, the X s. The algorithm
Performing a data mining tool evaluation
Performing a data mining tool evaluation Start with a framework for your evaluation Data mining helps you make better decisions that lead to significant and concrete results, such as increased revenue
Environmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
General Framework for an Iterative Solution of Ax b. Jacobi s Method
2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,
QUALITY ENGINEERING PROGRAM
QUALITY ENGINEERING PROGRAM Production engineering deals with the practical engineering problems that occur in manufacturing planning, manufacturing processes and in the integration of the facilities and
Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems
Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,
Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
Elementary Matrices and The LU Factorization
lementary Matrices and The LU Factorization Definition: ny matrix obtained by performing a single elementary row operation (RO) on the identity (unit) matrix is called an elementary matrix. There are three
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
Introduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
Manifold Learning Examples PCA, LLE and ISOMAP
Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
Vectors 2. The METRIC Project, Imperial College. Imperial College of Science Technology and Medicine, 1996.
Vectors 2 The METRIC Project, Imperial College. Imperial College of Science Technology and Medicine, 1996. Launch Mathematica. Type
AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS
AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS Cativa Tolosa, S. and Marajofsky, A. Comisión Nacional de Energía Atómica Abstract In the manufacturing control of Fuel
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
Grade 6 Mathematics Performance Level Descriptors
Limited Grade 6 Mathematics Performance Level Descriptors A student performing at the Limited Level demonstrates a minimal command of Ohio s Learning Standards for Grade 6 Mathematics. A student at this
Goodness of fit assessment of item response theory models
Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing
An example. Visualization? An example. Scientific Visualization. This talk. Information Visualization & Visual Analytics. 30 items, 30 x 3 values
Information Visualization & Visual Analytics Jack van Wijk Technische Universiteit Eindhoven An example y 30 items, 30 x 3 values I-science for Astronomy, October 13-17, 2008 Lorentz center, Leiden x An
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Multivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
Multivariate Analysis of Variance (MANOVA): I. Theory
Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the
Data Mining 5. Cluster Analysis
Data Mining 5. Cluster Analysis 5.2 Fall 2009 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables
Master of Arts in Mathematics
Master of Arts in Mathematics Administrative Unit The program is administered by the Office of Graduate Studies and Research through the Faculty of Mathematics and Mathematics Education, Department of
CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS
Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships
Bernice E. Rogowitz and Holly E. Rushmeier IBM TJ Watson Research Center, P.O. Box 704, Yorktown Heights, NY USA
Are Image Quality Metrics Adequate to Evaluate the Quality of Geometric Objects? Bernice E. Rogowitz and Holly E. Rushmeier IBM TJ Watson Research Center, P.O. Box 704, Yorktown Heights, NY USA ABSTRACT
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
Data exploration with Microsoft Excel: analysing more than one variable
Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical
Principles of Dat Da a t Mining Pham Tho Hoan [email protected] [email protected]. n
Principles of Data Mining Pham Tho Hoan [email protected] References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,
Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening
, pp.169-178 http://dx.doi.org/10.14257/ijbsbt.2014.6.2.17 Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening Ki-Seok Cheong 2,3, Hye-Jeong Song 1,3, Chan-Young Park 1,3, Jong-Dae
Common factor analysis
Common factor analysis This is what people generally mean when they say "factor analysis" This family of techniques uses an estimate of common variance among the original variables to generate the factor
Computer program review
Journal of Vegetation Science 16: 355-359, 2005 IAVS; Opulus Press Uppsala. - Ginkgo, a multivariate analysis package - 355 Computer program review Ginkgo, a multivariate analysis package Bouxin, Guy Haute
Factorization Theorems
Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization
Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis
Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis M. Kreuseler, T. Nocke, H. Schumann, Institute of Computer Graphics University of Rostock, D-18059 Rostock, Germany
Big Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs [email protected] Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
Similar matrices and Jordan form
Similar matrices and Jordan form We ve nearly covered the entire heart of linear algebra once we ve finished singular value decompositions we ll have seen all the most central topics. A T A is positive
The PageRank Citation Ranking: Bring Order to the Web
The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
Subspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China [email protected] Stan Z. Li Microsoft
Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers
Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation
