Learning Tools for Big Data Analytics




Learning Tools for Big Data Analytics. Georgios B. Giannakis. Acknowledgments: Profs. G. Mateos and K. Slavakis; NSF 1343860, 1442686, and MURI-FA9550-10-1-0567. Center for Advanced Signal and Image Sciences (CASIS), Livermore, May 13, 2015.

Growing data torrent. Source: McKinsey Global Institute, "Big Data: The next frontier for innovation, competition, and productivity," May 2011.

Big Data: Capturing its value. Source: McKinsey Global Institute, "Big Data: The next frontier for innovation, competition, and productivity," May 2011.

Challenges
- Big: sheer volume of data; decentralized and parallel processing; security and privacy measures
- Modern massive datasets involve many attributes: parsimonious models are needed to ease interpretability and enhance learning performance
- Fast: real-time streaming data; online processing; quick-rough answer vs. slow-accurate answer?
- Messy: outliers and misses; robust imputation approaches

Opportunities
Theoretical and statistical foundations of Big Data analytics; algorithms and implementation platforms to learn from massive datasets:
- Big tensor data models and factorizations
- High-dimensional statistical SP
- Network data visualization
- Pursuit of low-dimensional structure
- Analysis of multi-relational data
- Common principles across networks
- Resource tradeoffs
- Randomized algorithms
- Scalable online, decentralized optimization
- Convergence and performance guarantees
- Information processing over graphs
- Novel architectures for large-scale data analytics
- Robustness to outliers and missing data
- Graph SP

Roadmap
- Context and motivation
- Critical Big Data tasks: encompassing and parsimonious data modeling; dimensionality reduction; data cleansing, anomaly detection, and inference
- Randomized learning via data sketching
- Conclusions and future research directions

Encompassing model
Observed data Y = P_Omega(L + D*S + V): L is a low-rank background; S is a sparse matrix capturing patterns, innovations, (co-)clusters, and outliers; D is a dictionary; V is noise. The subset of observations Omega and the projection operator P_Omega allow for misses. The model accommodates large-scale data, with any of L, D, and S possibly unknown.
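As a concrete illustration, here is a minimal NumPy sketch that generates synthetic data from this model; all dimensions, sparsity levels, and noise scales are illustrative assumptions, not values from the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    D, T, r, q = 100, 500, 5, 20   # ambient dim., samples, rank, no. of atoms

    L = rng.standard_normal((D, r)) @ rng.standard_normal((r, T))  # low-rank background
    Dic = rng.standard_normal((D, q))                              # dictionary
    S = rng.standard_normal((q, T)) * (rng.random((q, T)) < 0.05)  # sparse patterns/outliers
    V = 0.01 * rng.standard_normal((D, T))                         # noise

    Omega = rng.random((D, T)) < 0.7              # sampling pattern (30% misses)
    Y = np.where(Omega, L + Dic @ S + V, np.nan)  # observed data with misses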

Subsumed paradigms
Structure-leveraging criteria: the nuclear norm ||L||_* = sum_i sigma_i(L) (the sum of singular values of L) promotes low rank, and the l1-norm ||S||_1 promotes sparsity. With or without misses, and depending on which components are present or known, the model subsumes: principal component analysis (PCA) [Pearson 1901]; dictionary learning (DL) [Olshausen-Field 97]; non-negative matrix factorization (NMF) [Lee-Seung 99]; compressive sampling (CS) [Candes-Tao 05]; and principal component pursuit (PCP) [Candes et al 11].

PCA formulations
Given training data {y_t}_{t=1}^T, PCA seeks the q-dimensional subspace minimizing the reconstruction error: compression x_t = U^T y_t and reconstruction y_hat_t = U x_t with orthonormal U. Equivalently, the component analysis model y_t = U x_t + e_t. Solution: U collects the q principal eigenvectors of the sample covariance, i.e., the top left singular vectors of the (centered) data matrix.
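A minimal sketch of this solution via the SVD (variable names are illustrative):

    import numpy as np

    def pca(Y, q):
        """PCA of D x T data Y: compression to q components and reconstruction."""
        m = Y.mean(axis=1, keepdims=True)
        Yc = Y - m                                # center the training data
        U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
        Uq = U[:, :q]                             # q principal directions
        X = Uq.T @ Yc                             # compression: x_t = Uq^T y_t
        Y_hat = Uq @ X + m                        # reconstruction
        return Uq, X, Y_hat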

Dual and kernel PCA
With the SVD of the data matrix, the principal subspace can equivalently be obtained from the Gram matrix of inner products between data vectors (dual PCA). Q: What if the approximating low-dimensional space is not a hyperplane? A1. Stretch it to become linear: kernel PCA, e.g., [Scholkopf-Smola 01], maps the data to a high-dimensional feature space and leverages dual PCA there. A2. General (non)linear models, e.g., a union of hyperplanes, or locally linear tangential hyperplanes. B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, Cambridge, 2001.
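A minimal kernel-PCA sketch via the centered Gram matrix; the Gaussian kernel and its bandwidth are illustrative choices.

    import numpy as np

    def kernel_pca(Y, q, gamma=1.0):
        """Kernel PCA of D x T data Y via the T x T Gram matrix (dual PCA)."""
        T = Y.shape[1]
        sq = np.sum((Y[:, :, None] - Y[:, None, :]) ** 2, axis=0)
        K = np.exp(-gamma * sq)                  # Gaussian-kernel Gram matrix
        J = np.eye(T) - np.ones((T, T)) / T
        Kc = J @ K @ J                           # center in feature space
        w, V = np.linalg.eigh(Kc)                # ascending eigenvalues
        idx = np.argsort(w)[::-1][:q]            # top-q components
        alpha = V[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))
        return Kc @ alpha                        # T x q embedded coordinates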

Identification of network communities
Kernel PCA is instrumental for partitioning large graphs (spectral clustering); it relies on the graph Laplacian to capture nodal correlations. Examples: a Facebook egonet with 744 nodes and 30,023 edges, and the arXiv General Relativity collaboration network with 4,158 nodes and 13,422 edges. For large graphs, random sketching and validation reduces the complexity of spectral clustering. P. A. Traganitis, K. Slavakis, and G. B. Giannakis, "Spectral clustering of large-scale communities via random sketching and validation," Proc. Conf. on Info. Science and Systems, Baltimore, Maryland, March 18-20, 2015.
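A minimal spectral-clustering sketch: embed nodes via the normalized graph Laplacian, then run K-means. The use of scikit-learn's KMeans is an implementation choice, not part of the cited method.

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_clustering(A, K):
        """Partition a graph with symmetric adjacency matrix A into K communities."""
        deg = A.sum(axis=1)
        Dm = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
        L = np.eye(A.shape[0]) - Dm @ A @ Dm     # normalized graph Laplacian
        w, V = np.linalg.eigh(L)
        U = V[:, :K]                             # K smallest eigenvectors
        U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
        return KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(U)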

Local linear embedding
For each data point y_t, find a neighborhood, e.g., its k-nearest neighbors. A weight matrix W captures local affine relations: each point is approximated by an affine combination of its neighbors, with W sparse and rows summing to one [Elhamifar-Vidal 11]. Then identify low-dimensional vectors x_t preserving this local geometry [Saul-Roweis 03]. Solution: the embedding coordinates are given by the minor eigenvectors of (I - W)^T (I - W), excluding the eigenvector associated with the zero eigenvalue. L. K. Saul and S. T. Roweis, "Think globally, fit locally: Unsupervised learning of low dimensional manifolds," J. Machine Learning Research, vol. 4, pp. 119-155, 2003.
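A compact LLE sketch along these lines; the regularization of the local Gram matrix is a standard numerical safeguard, and k, d are user choices.

    import numpy as np

    def lle(Y, k, d):
        """Embed the T columns of D x T data Y into d dims, preserving local geometry."""
        D, T = Y.shape
        W = np.zeros((T, T))
        for t in range(T):
            dist = np.linalg.norm(Y - Y[:, [t]], axis=0)
            nbrs = np.argsort(dist)[1:k + 1]          # k nearest neighbors of y_t
            Z = Y[:, nbrs] - Y[:, [t]]
            G = Z.T @ Z + 1e-3 * np.trace(Z.T @ Z) * np.eye(k)  # regularized Gram
            w = np.linalg.solve(G, np.ones(k))
            W[t, nbrs] = w / w.sum()                  # affine (sum-to-one) weights
        M = (np.eye(T) - W).T @ (np.eye(T) - W)
        vals, vecs = np.linalg.eigh(M)
        return vecs[:, 1:d + 1]                       # skip the constant eigenvector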

Dictionary learning
Solve min over (D, S) of ||Y - D*S||_F^2 + lambda*||S||_1 for the dictionary D and sparse coefficients S: updating S with D fixed is a Lasso task (sparse coding), while updating D with S fixed is a constrained LS task. Alternating minimization: both subproblems are convex, and under conditions the iterates converge to a stationary point [Tseng 01]. B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by V1?" Vis. Res., vol. 37, no. 23, pp. 3311-3325, 1997.
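A minimal alternating-minimization sketch, with ISTA for the sparse-coding step and a normalized LS dictionary update; step sizes, iteration counts, and the unit-norm atom constraint are illustrative assumptions.

    import numpy as np

    def soft(x, t):
        """Soft-thresholding operator for the Lasso."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def dict_learn(Y, q, lam=0.1, n_outer=20, n_ista=50):
        """Alternate sparse coding (Lasso via ISTA) and a LS dictionary update."""
        rng = np.random.default_rng(0)
        D = rng.standard_normal((Y.shape[0], q))
        D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
        S = np.zeros((q, Y.shape[1]))
        for _ in range(n_outer):
            step = 1.0 / np.linalg.norm(D.T @ D, 2)    # ISTA step size
            for _ in range(n_ista):                    # sparse coding of S
                S = soft(S - step * D.T @ (D @ S - Y), step * lam)
            D = Y @ np.linalg.pinv(S)                  # LS dictionary update
            D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
        return D, S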

Joint DL-LLE paradigm
The criterion combines a DL fit, an LLE fit, and sparsity regularization. The dictionary morphs the data to a smooth basis, reducing noise and complexity. DL-LLE thus offers a data-driven nonlinear embedding that preserves local affine geometry, is robust to misses, and supports (de-)compression and inpainting. [Figure: inpainting example; reconstruction quality in PSNR, dB.] K. Slavakis, G. B. Giannakis, and G. Leus, "Robust sparse embedding and reconstruction via dictionary learning," Proc. of Conf. on Info. Science and Systems, JHU, Baltimore, March 2013.

From low-rank matrices to tensors
Data cube Y, e.g., sub-sampled MRI frames. PARAFAC decomposition [Harshman 70]: Y = sum_{r=1}^R a_r o b_r o c_r, with factor matrices A = [a_1, ..., a_R], B = [b_1, ..., b_R], and C = [c_1, ..., c_R]; per slab t, Y_t = A diag(c_t) B^T. The tensor subspace comprises R rank-one matrices. Goal: given streaming slabs, learn the subspace matrices (A, B) recursively, and impute possible misses of Y_t. J. A. Bazerque, G. Mateos, and G. B. Giannakis, "Rank regularization and Bayesian inference for tensor completion and extrapolation," IEEE Trans. on Signal Processing, vol. 61, no. 22, pp. 5689-5703, November 2013.
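For concreteness, a textbook batch CP-ALS sketch follows; it computes a rank-R PARAFAC of a dense 3-way array and is only a baseline, not the talk's streaming/imputation algorithm.

    import numpy as np

    def khatri_rao(A, B):
        """Column-wise Kronecker product; rows indexed by (i, j) -> i*J + j."""
        I, R = A.shape
        J = B.shape[0]
        return np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)

    def cp_als(Y, R, n_iter=50, seed=0):
        """Rank-R PARAFAC of a 3-way array Y via batch alternating least squares."""
        I, J, K = Y.shape
        rng = np.random.default_rng(seed)
        B = rng.standard_normal((J, R))
        C = rng.standard_normal((K, R))
        Y1 = Y.reshape(I, J * K)                      # mode-1 unfolding
        Y2 = Y.transpose(1, 0, 2).reshape(J, I * K)   # mode-2 unfolding
        Y3 = Y.transpose(2, 0, 1).reshape(K, I * J)   # mode-3 unfolding
        for _ in range(n_iter):
            A = Y1 @ np.linalg.pinv(khatri_rao(B, C)).T
            B = Y2 @ np.linalg.pinv(khatri_rao(A, C)).T
            C = Y3 @ np.linalg.pinv(khatri_rao(A, B)).T
        return A, B, C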

Online tensor subspace learning
Low tensor rank in the image domain; Tikhonov regularization of the factors promotes low rank (Proposition [Bazerque-GG 13]). Stochastic alternating minimization is parallelizable across the bases and enables real-time reconstruction (one FFT per iteration). M. Mardani, G. Mateos, and G. B. Giannakis, "Subspace learning and imputation for streaming big data matrices and tensors," IEEE Trans. on Signal Processing, vol. 63, pp. 2663-2677, May 2015.

Dynamic cardiac MRI test
In vivo dataset: 256 k-space frames of size 200x256. [Figure: ground-truth frame, sampling trajectory, and reconstructions for R=100 with 90% misses and R=150 with 75% misses.] Potential for accelerating MRI at high spatio-temporal resolution; a low-rank-plus-sparse model can also capture motion effects. M. Mardani and G. B. Giannakis, "Accelerating dynamic MRI via tensor subspace learning," Proc. of ISMRM 23rd Annual Meeting and Exhibition, Toronto, Canada, May 30 - June 5, 2015.

Roadmap
- Context and motivation
- Critical Big Data tasks
- Randomized learning via data sketching: Johnson-Lindenstrauss lemma; randomized linear regression; randomized clustering
- Conclusions and future research directions

Randomized linear algebra
Basic tools: random sampling and random projections. Attractive features: reduced dimensionality to lower complexity with Big Data, and rigorous error analysis at the reduced dimension. Ordinary least-squares (LS): given X of size D x p and y of size D x 1 with D >> p, the LS estimate is beta_hat = arg min over beta of ||y - X*beta||_2; solving via the SVD incurs O(D p^2) complexity. Q: What if D is huge? M. W. Mahoney, Randomized Algorithms for Matrices and Data, Foundations and Trends in Machine Learning, vol. 3, no. 2, pp. 123-224, Nov. 2011.

Randomized LS for linear regression
LS estimate using a (pre-conditioned) random projection matrix R of size d x D: minimize ||R(y - X*beta)||_2^2. A typical construction is R = S*H*D_s, where D_s is a random diagonal matrix with +/-1 entries, H is the D x D Hadamard matrix, and S uniformly samples and rescales d of the preconditioned rows; such subsets of the data yield LS estimates of quality comparable to the full solution. Selecting a reduced dimension d << D (on the order of p up to logarithmic factors) suffices, and the complexity drops from O(D p^2) to O(D p log d). N. Ailon and B. Chazelle, "The fast Johnson-Lindenstrauss transform and approximate nearest neighbors," SIAM Journal on Computing, 39(1):302-322, 2009.
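A minimal sketched-LS example; a plain Gaussian projection is used here as a simple stand-in for the Hadamard-preconditioned sampling above, and all dimensions are illustrative.

    import numpy as np

    def sketched_ls(X, y, d, seed=0):
        """Solve LS on a d x D random sketch of (X, y) instead of the full data."""
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((d, X.shape[0])) / np.sqrt(d)  # d x D projection
        beta, *_ = np.linalg.lstsq(R @ X, R @ y, rcond=None)
        return beta

    # Usage: D = 100,000 rows compressed to d = 2,000 before solving.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100_000, 50))
    beta_true = rng.standard_normal(50)
    y = X @ beta_true + 0.1 * rng.standard_normal(100_000)
    print(np.linalg.norm(sketched_ls(X, y, 2_000) - beta_true))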

Johnson-Lindenstrauss lemma
The workhorse for proofs involving random projections. JL lemma: if eps is in (0,1), T is an integer, and the reduced dimension satisfies d = O(eps^-2 log T), then for any set of T points {y_t} in R^D there exists a mapping f: R^D -> R^d such that (1-eps)||y_t - y_s||^2 <= ||f(y_t) - f(y_s)||^2 <= (1+eps)||y_t - y_s||^2 for all t, s. (*) It almost preserves pairwise distances! If f(y) = R*y with i.i.d. Gaussian entries of R and reduced dimension d = O(eps^-2 log T), then (*) holds w.h.p. [Indyk-Motwani'98]. If R has (scaled) i.i.d. entries uniform over {+1,-1} and reduced dimension as in the JL lemma, then (*) holds w.h.p. [Achlioptas 01]. W. B. Johnson and J. Lindenstrauss, "Extensions of Lipschitz maps into a Hilbert space," Contemp. Math, vol. 26, pp. 189-206, 1984.
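A quick numerical check of the lemma with a scaled Gaussian projection (the +/-1 construction of [Achlioptas 01] works the same way); eps and the point set are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    T, Dim, eps = 200, 5_000, 0.2
    d = int(np.ceil(4 * np.log(T) / (eps**2 / 2 - eps**3 / 3)))  # JL dimension

    Y = rng.standard_normal((T, Dim))               # T points in R^D
    R = rng.standard_normal((Dim, d)) / np.sqrt(d)  # scaled Gaussian projection
    Z = Y @ R                                       # mapped points in R^d

    i, j = np.triu_indices(T, k=1)
    ratio = (np.linalg.norm(Z[i] - Z[j], axis=1) /
             np.linalg.norm(Y[i] - Y[j], axis=1)) ** 2
    print(ratio.min(), ratio.max())   # w.h.p. within [1 - eps, 1 + eps]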

Performance of randomized LS
Theorem: for any eps > 0, with the reduced dimension d chosen as on the previous slide, w.h.p. the sketched solution beta_hat_d satisfies ||y - X*beta_hat_d||_2 <= (1+eps)||y - X*beta_hat_LS||_2, and the estimation error ||beta_hat_d - beta_hat_LS||_2 is bounded in terms of eps and the condition number of X. [Figure: uniform sampling vs. Hadamard preconditioning, D = 10,000 and p = 50.] Performance depends on X. M. W. Mahoney, Randomized Algorithms for Matrices and Data, Foundations and Trends in Machine Learning, vol. 3, no. 2, pp. 123-224, Nov. 2011.

Online censoring for large-scale regression
Key idea: sequentially test incoming data, and update RLS estimates only for informative data. Adaptive censoring (AC) rule: censor datum (x_t, y_t) if its normalized prediction residual |y_t - x_t^T beta_hat_{t-1}| falls below a threshold tau. The criterion reveals causal support vectors (SVs); the threshold tau controls the average data reduction. D. K. Berberidis, G. Wang, G. B. Giannakis, and V. Kekatos, "Adaptive Estimation from Big Data via Censored Stochastic Approximation," Proc. of Asilomar Conf., Pacific Grove, CA, Nov. 2014.

Censoring algorithms and performance
AC least mean-squares (LMS) and AC recursive least-squares (RLS), the latter at O(p^2) complexity per update. Proposition: performance guarantees relate AC-RLS and AC-LMS to their uncensored counterparts. AC Kalman filtering and smoothing enable tracking with a budget. D. K. Berberidis and G. B. Giannakis, "Online Censoring for Large-Scale Regressions," IEEE Trans. on SP, 2015 (submitted); also in Proc. of ICASSP, Brisbane, Australia, April 2015.
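A minimal AC-LMS sketch of the censoring idea: update only when the prediction residual is informative. Step size, threshold, and noise scale are illustrative assumptions.

    import numpy as np

    def ac_lms(stream, p, mu=0.01, tau=2.0, sigma=1.0):
        """Adaptive-censoring LMS over a stream of (x, y) pairs in R^p x R."""
        beta = np.zeros(p)
        n_updates = 0
        for x, y in stream:                # data arriving online
            r = y - x @ beta               # prediction residual
            if abs(r) > tau * sigma:       # censor uninformative (small) residuals
                beta = beta + mu * r * x   # LMS update on informative data only
                n_updates += 1
        return beta, n_updates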

Censoring vis-a-vis random projections
- Random projections for linear regression [Mahoney 11]: data-agnostic reduction, decoupled from the LS solution
- Adaptive censoring (AC-RLS): data-driven measurement selection, suitable also for streaming data, with minimal memory requirements

Performance comparison
Synthetic data: D = 10,000, p = 300 (50 Monte Carlo runs); real data: ground truth estimated from the full set. On highly non-uniform data, AC-RLS outperforms the alternatives at comparable complexity, and it remains robust on uniform data (where all rows of X are equally important).

Big data clustering
Given T data vectors {y_t} in R^D with D large, assign them to K clusters. Key idea: reduce dimensionality via random projections. Desiderata: preserve the pairwise data distances in the lower-dimensional space. Feature extraction: construct d combined features (e.g., via a d x D random projection matrix R) and apply K-means in the reduced d-space. Feature selection: select d of the input features (rows of the data matrix) and apply K-means in the reduced d-space.
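A minimal feature-extraction example: random projection followed by K-means. The Gaussian projection and all dimensions are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    def rp_kmeans(Y, K, d, seed=0):
        """Cluster the T rows of T x D data Y by K-means on a d-dim random sketch."""
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((Y.shape[1], d)) / np.sqrt(d)  # D x d projection
        return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(Y @ R)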

Random sketching and validation (SkeVa)
Randomly select informative dimensions. Algorithm: for a number of trials, sketch d dimensions; run K-means on the sketched data; re-sketch d dimensions; augment the centroids; and validate using a consensus set. Similar approaches are possible for other learning tasks; sequential and kernel variants are available. P. A. Traganitis, K. Slavakis, and G. B. Giannakis, "Clustering High-Dimensional Data via Random Sampling and Consensus," Proc. of GlobalSIP, Atlanta, GA, December 2014.
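A simplified sketch of the SkeVa idea: cluster on randomly sketched dimensions and keep the trial that scores best on a re-sketched validation set. The scoring rule below (within-cluster spread on held-out dimensions) is a simplification of the paper's consensus-based validation.

    import numpy as np
    from sklearn.cluster import KMeans

    def skeva_kmeans(Y, K, d, n_trials=10, seed=0):
        """Y: T x D data. Return labels from the best-validated sketched clustering."""
        rng = np.random.default_rng(seed)
        T, D = Y.shape
        best_labels, best_score = None, np.inf
        for _ in range(n_trials):
            dims = rng.choice(D, size=d, replace=False)    # sketch d dimensions
            labels = KMeans(n_clusters=K, n_init=5,
                            random_state=seed).fit_predict(Y[:, dims])
            val = rng.choice(D, size=d, replace=False)     # re-sketch to validate
            score = sum(np.var(Y[labels == k][:, val]) * (labels == k).sum()
                        for k in range(K))                 # within-cluster spread
            if score < best_score:
                best_labels, best_score = labels, score
        return best_labels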

RP versus SkeVa comparisons
KDDb dataset (subset): D = 2,990,384, T = 10,000, K = 2. RP: random projections [Boutsidis et al 13]; SkeVa: sketch and validate. P. A. Traganitis, K. Slavakis, and G. B. Giannakis, "Sketch and Validate for Big Data Clustering," IEEE Journal of Selected Topics in Signal Processing, June 2015.

Closing comments
- Big Data modeling and tasks: dimensionality reduction; succinct representations; vectors, matrices, and tensors
- Learning algorithms: data sketching via random projections; streaming, parallel, decentralized
- Implementation platforms: scalable computing platforms; analytics in the cloud
K. Slavakis, G. B. Giannakis, and G. Mateos, "Modeling and optimization for Big Data analytics," IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 18-31, Sep. 2014.