Unsupervised and supervised dimension reduction: Algorithms and connections




Unsupervised and supervised dimension reduction: Algorithms and connections
Jieping Ye
Department of Computer Science and Engineering, Evolutionary Functional Genomics Center, The Biodesign Institute, Arizona State University

Outline of talk
- Overview of my research
- Challenges in gene expression pattern analysis
- What is dimension reduction? Unsupervised versus supervised
- Principal Component Analysis (PCA): issues and extensions
- Linear Discriminant Analysis (LDA): issues and extensions
- Summary

Overview of my research
- Protein structure analysis: pairwise and multiple structure alignment
- Gene expression pattern analysis (joint work with Sudhir Kumar's group)
- Dimension reduction
- Clustering and biclustering
- Classification
- Semi-supervised learning

Gene expression pattern analysis
- In situ staining of a target mRNA at several time points during the development of a D. melanogaster embryo gives a detailed spatial-temporal view of the expression pattern of a given gene.
- Goal: capture spatial gene interactions through computational analysis of the images.
- Microarray gene expression data reveals only average expression levels, and so fails to capture pivotal spatial patterns.

Challenges in gene expression pattern analysis
- High dimensionality (312 × 120 pixels per image)
- Noise (images taken under different conditions)
- Large, rapidly growing database (about 50,000 images)
Natural solution: apply dimension reduction as a preprocessing step.

What is dimension reduction?
- Embed the original high-dimensional data in a lower-dimensional space while preserving the critical information.
Motivation:
- Curse of dimensionality
- Intrinsic dimensionality
- Visualization
- Noise removal

What is dimension reduction?
Algorithms:
- Unsupervised versus supervised
- Linear versus nonlinear
- Local versus global
Many other applications:
- Microarray data analysis
- Protein classification
- Face recognition
- Text mining
- Image retrieval

Unsupervised versus supervised dimension reduction
Unsupervised:
- Principal Component Analysis
- Independent Component Analysis
- Canonical Correlation Analysis
- Partial Least Squares
Supervised:
- Linear Discriminant Analysis

PCA and LDA
- Both are well known for dimension reduction, e.g. in face recognition (Eigenfaces versus Fisherfaces).
- Most unsupervised dimension reduction algorithms are closely related to PCA.
- The criterion used in LDA is shared with many other clustering and classification algorithms.

Principal Component Analysis (PCA)
- Keeps the directions of largest variance, capturing the global structure of the data.
- Computed via the Singular Value Decomposition (SVD).
- Achieves the minimum reconstruction error.
- Applied as a preprocessing step for many other algorithms in machine learning and data mining.
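
The slide's computational claims (largest variance, computed via SVD, minimum reconstruction error) fit in a few lines of NumPy; the following is an illustrative sketch, not code from the talk:

```python
import numpy as np

def pca_svd(X, k):
    """Project n samples (rows of X) onto the top-k principal components.

    Computing PCA through the SVD of the centered data matrix keeps the
    directions of largest variance, and the resulting rank-k projection
    attains the minimum reconstruction error (Eckart-Young theorem).
    """
    Xc = X - X.mean(axis=0)                         # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                             # top-k right singular vectors
    return Xc @ components.T, components            # scores, loadings

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # correlated features
Z, W = pca_svd(X, 2)                                # 100 samples reduced to 2-D
```

The projected coordinates (columns of `Z`) are uncorrelated, which is one reason PCA is a convenient preprocessing step for downstream learners.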

Issues in PCA
- PCA does not scale to large databases. Solution: GLRAM.
- PCA does not capture local structure. Solution: local PCA.

GLRAM for large databases
- GLRAM (generalized low rank approximations of matrices) extends PCA to 2D data.
- GLRAM scales to large databases: time is linear in both the sample size and the data dimension, and space is independent of the sample size.
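
As a rough illustration of the alternating scheme behind GLRAM (the exact algorithm is in the Machine Learning 2005 paper), the sketch below updates left and right factors by eigendecomposition; the identity initialization and fixed iteration count are simplifying assumptions:

```python
import numpy as np

def glram(As, l1, l2, n_iter=10):
    """Minimal GLRAM sketch: given 2-D matrices A_i (r x c), find
    orthonormal L (r x l1) and R (c x l2) maximizing
    sum_i ||L^T A_i R||_F^2 by alternating eigendecompositions.

    Each pass over the data is linear in the number of matrices, and no
    storage beyond the data and the two small factors is needed, which is
    the source of the scalability claims on the slide.
    """
    r, c = As[0].shape
    R = np.eye(c, l2)                            # simple initialization
    for _ in range(n_iter):
        ML = sum(A @ R @ R.T @ A.T for A in As)  # update left factor
        _, L = np.linalg.eigh(ML)                # eigenvalues ascending
        L = L[:, -l1:]                           # keep top-l1 eigenvectors
        MR = sum(A.T @ L @ L.T @ A for A in As)  # update right factor
        _, R = np.linalg.eigh(MR)
        R = R[:, -l2:]
    return L, R

rng = np.random.default_rng(1)
As = [rng.normal(size=(6, 5)) for _ in range(4)]
L, R = glram(As, 3, 2)
core = [L.T @ A @ R for A in As]   # compressed 3 x 2 representations
```

Each `A_i` is then approximated as `L @ (L.T @ A_i @ R) @ R.T`, i.e. the images are compressed without ever being vectorized.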

Local PCA
- Find clusters in the data, then compute the transformation matrix taking the cluster structure (local information) into account.
- Computation borrows techniques from GLRAM: low-rank approximations of the covariance matrices of all clusters.
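
A minimal sketch of the cluster-then-project idea, with a toy k-means standing in for whichever clustering the talk assumes, and a separate SVD-based PCA fitted per cluster:

```python
import numpy as np

def local_pca(X, n_clusters, k, n_iter=20, seed=0):
    """Local PCA sketch: cluster first, then fit one PCA per cluster so
    each local transformation reflects structure that a single global
    PCA would average away.  The k-means here is a toy implementation;
    any clustering algorithm could be substituted.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):                          # plain k-means
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.array([X[labels == j].mean(0) for j in range(n_clusters)])
    models = []
    for j in range(n_clusters):                      # per-cluster PCA
        Xj = X[labels == j]
        mu = Xj.mean(0)
        _, _, Vt = np.linalg.svd(Xj - mu, full_matrices=False)
        models.append((mu, Vt[:k]))                  # local mean + local basis
    return labels, models

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(8, 1, (40, 3))])
labels, models = local_pca(X, n_clusters=2, k=2)
```

A new point is then projected with the basis of whichever cluster it falls in, rather than with one global projection.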

Linear Discriminant Analysis (LDA)
- Finds the projection with maximum class discrimination; effective for classification.
- Computed by solving a generalized eigenvalue problem.
- Optimal when each class is Gaussian and all classes share the same covariance matrix.
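
The generalized eigenvalue problem here is S_b w = λ S_w w, built from between-class and within-class scatter matrices; a sketch in NumPy follows. The small ridge on S_w is an illustrative guard against the singularity issue discussed later, not part of classical LDA:

```python
import numpy as np

def lda(X, y, k):
    """LDA sketch: discriminant directions are the leading eigenvectors
    of S_w^{-1} S_b, i.e. solutions of S_b w = lambda S_w w, where S_w is
    the within-class scatter and S_b the between-class scatter."""
    classes = np.unique(y)
    mu = X.mean(0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(0)
        Sw += (Xc - mc).T @ (Xc - mc)                # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)   # between-class scatter
    Sw += 1e-6 * np.eye(d)                           # tiny ridge: keep Sw invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]             # largest eigenvalues first
    return evecs.real[:, order[:k]]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal([5, 0, 0, 0], 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
W = lda(X, y, 1)              # at most (number of classes - 1) directions
```

For c classes S_b has rank at most c-1, so LDA yields at most c-1 discriminant directions, one here.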

LDA versus PCA
- PCA considers only the global structure of the data, while LDA utilizes the class information to achieve maximum separation.

Issues in LDA
- Not effective when the model assumptions are violated: the class centroids coincide, or the class covariances vary.
- Singularity (undersampled) problem: the data dimension is larger than the sample size.

Covariance-preserving Projection Method (CPM)
- CPM takes the class covariance information into account: it preserves the class covariances while maximizing class discrimination.
- Applicable when the class centroids coincide or the class covariances vary.

Other related work
- Sliced Average Variance Estimator (SAVE), Cook et al.
- Heteroscedastic Discriminant Analysis (HDA), Kumar et al.
- Another method by Zhu and Hastie.
Theoretical results:
- CPM is an approximation of HDA, with gains in efficiency and robustness.
- SAVE is a special case of CPM.

Singularity problem
- PCA+LDA: apply PCA before the LDA stage [Swets and Weng, TPAMI 1996; Belhumeur et al., TPAMI 1997].
- RLDA: modify the scatter matrix to make it nonsingular [Friedman, JASA 1989; Ye et al., 2005].
- ULDA: the features in the transformed space are uncorrelated [Ye et al., TCBB 2004].
- OLDA: orthogonal transformation, more robust to noise than ULDA [Ye, JMLR 2005].
- 2DLDA: extends LDA to 2D data [Ye, NIPS 2005].

Efficient model selection for RLDA
- Key idea in RLDA: perturb the scatter matrix to make it nonsingular.
- Limitation: how to choose the best regularization parameter (the amount of perturbation) efficiently?
- Proposed approach: compute RLDA for a large number of candidate parameters in approximately the same time as running RLDA a small number of times.
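
For contrast with the proposed approach, here is a sketch of the naive baseline it improves on: refit RLDA once per candidate parameter and keep the best. The two-class closed form and nearest-centroid training accuracy used for scoring are illustrative assumptions; the efficient algorithm in the technical report avoids this per-parameter refitting cost:

```python
import numpy as np

def rlda_direction(X, y, lam):
    """Two-class regularized LDA: perturb the within-class scatter by
    lam * I so it is nonsingular, then w = (S_w + lam I)^{-1} (m1 - m0)."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for c, m in ((0, m0), (1, m1)):
        Xc = X[y == c] - m
        Sw += Xc.T @ Xc
    return np.linalg.solve(Sw + lam * np.eye(d), m1 - m0)

def naive_sweep(X, y, lams):
    """Naive model selection: one full RLDA fit per candidate lambda,
    scored by a midpoint-threshold classifier on the training data."""
    best = None
    for lam in lams:
        w = rlda_direction(X, y, lam)
        z = X @ w                                   # 1-D projection
        t = 0.5 * (z[y == 0].mean() + z[y == 1].mean())
        acc = ((z > t).astype(int) == y).mean()     # class 1 lies above t
        if best is None or acc > best[0]:
            best = (acc, lam)
    return best

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (30, 10)), rng.normal(2, 1, (30, 10))])
y = np.array([0] * 30 + [1] * 30)
acc, lam = naive_sweep(X, y, [1e-3, 1e-1, 1.0, 10.0])
```

Each candidate costs a full solve here; the talk's point is that one shared decomposition can amortize that across the whole grid of parameters.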

Summary
- Challenges in gene expression pattern analysis
- PCA and LDA are important techniques for dimension reduction
- Limitations of PCA and LDA
- Recent extensions of PCA and LDA

Related publications
- J. Ye. Generalized low rank approximations of matrices. Machine Learning, 2005.
- J. Ye. Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research, 2005.
- J. Ye et al. Two-dimensional Linear Discriminant Analysis. NIPS, 2005.
- J. Ye et al. Using Uncorrelated Discriminant Analysis for Tissue Classification with Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.
- J. Ye et al. Efficient Model Selection for Regularized Linear Discriminant Analysis. TR-05-006, Dept. of CSE, ASU.
- J. Ye et al. CPM: Covariance-preserving Projection Method. TR-05-007, Dept. of CSE, ASU.