Robust and Scalable Algorithms for Big Data Analytics
Georgios B. Giannakis
Acknowledgment: Drs. G. Mateos, K. Slavakis, G. Leus, and M. Mardani
Arlington, VA, USA, March 22, 2013
Roadmap
n Robust principal component analysis
Ø Linear low-rank models and sparse outliers
n Scalable algorithms for big network data analytics
Ø (De-)centralized and online rank minimization
n Robust sparse embedding via dictionary learning
Ø Nonlinear low-rank models
Ø Data-adaptive compressed sensing
n Concluding remarks
Principal component analysis
n Motivation: (statistical) learning from high-dimensional data, e.g., DNA microarrays, traffic surveillance video
n Principal component analysis (PCA) [Pearson 1901]
Ø Extraction of low(est)-dimensional structure
Ø Applications: source (de)coding, anomaly identification, recommender systems
Ø PCA is non-robust to outliers [Huber 81], [Jolliffe 86], [Wright et al 09-12]
Objective: robustify PCA by controlling outlier sparsity
PCA formulations
n Training data: $\{\mathbf{y}_n\}_{n=1}^N \subset \mathbb{R}^p$ (assumed centered)
n Minimum reconstruction error: $\min_{\mathbf{U},\{\mathbf{s}_n\}} \sum_{n=1}^N \|\mathbf{y}_n - \mathbf{U}\mathbf{s}_n\|_2^2$ s.t. $\mathbf{U}^\top\mathbf{U} = \mathbf{I}_\rho$
Ø Compression operator: $\mathbf{s}_n = \mathbf{U}^\top \mathbf{y}_n \in \mathbb{R}^\rho$, $\rho \le p$
Ø Reconstruction operator: $\hat{\mathbf{y}}_n = \mathbf{U}\mathbf{s}_n$
n Component analysis model: $\mathbf{y}_n = \mathbf{U}\mathbf{s}_n + \mathbf{e}_n$
Solution: columns of $\mathbf{U}$ are the $\rho$ dominant eigenvectors of the sample covariance (equivalently, dominant left singular vectors of the data matrix); see the sketch below
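A minimal numpy sketch of this solution, assuming centered data stored column-wise; the truncated SVD supplies both the compression and reconstruction operators.

```python
import numpy as np

def pca(Y, rho):
    """PCA via truncated SVD: Y is p x N with centered data as columns.
    Returns the basis U (p x rho), components S (rho x N), and fit Y_hat."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    U = U[:, :rho]        # rho dominant left singular vectors
    S = U.T @ Y           # compression: s_n = U' y_n
    Y_hat = U @ S         # reconstruction: y_hat_n = U s_n
    return U, S, Y_hat
```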
Robustifying PCA
n Outlier variables: $\mathbf{o}_n \neq \mathbf{0}$ if $\mathbf{y}_n$ is an outlier, $\mathbf{o}_n = \mathbf{0}$ otherwise
Ø Nominal data obey $\mathbf{y}_n = \mathbf{U}\mathbf{s}_n + \mathbf{e}_n$; outliers something else
Ø Linear regression counterpart [Fuchs 99], [Giannakis et al 11]
Ø Both $\{\mathbf{s}_n\}$ and $\{\mathbf{o}_n\}$ unknown; $\{\mathbf{o}_n\}$ typically sparse!
n Natural (but intractable) estimator
(P0) $\min_{\mathbf{U},\{\mathbf{s}_n,\mathbf{o}_n\}} \sum_{n=1}^N \|\mathbf{y}_n - \mathbf{U}\mathbf{s}_n - \mathbf{o}_n\|_2^2 + \lambda_0 \|\mathbf{O}\|_0$
G. Mateos and G. B. Giannakis, "Robust PCA as bilinear decomposition with outlier sparsity regularization," IEEE Transactions on Signal Processing, pp. 5176-5190, Oct. 2012.
Universal robustness
n (P0) is NP-hard; relax the $\ell_0$-norm to its convex surrogate, e.g., [Tropp 06]
(P1) $\min_{\mathbf{U},\{\mathbf{s}_n,\mathbf{o}_n\}} \sum_{n=1}^N \|\mathbf{y}_n - \mathbf{U}\mathbf{s}_n - \mathbf{o}_n\|_2^2 + \lambda \sum_{n=1}^N \|\mathbf{o}_n\|_2$
Ø Role of the sparsity-controlling parameter $\lambda$ is central
Q: Does (P1) yield robust estimates? A: Yes! The Huber estimator is a special case
Alternating minimization for (P1)
Ø $\{\mathbf{U}, \mathbf{s}_n\}$ update: SVD of the outlier-compensated data $\mathbf{Y} - \mathbf{O}$
Ø $\{\mathbf{o}_n\}$ update: row-wise soft-thresholding of the residuals $\mathbf{Y} - \mathbf{U}\mathbf{S}$
Proposition: Algorithm 1's iterates converge to a stationary point of (P1)
(A numpy sketch of these two updates follows.)
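A minimal sketch of the alternating scheme, assuming the (P1) cost as written above with a column-wise $\ell_2$ outlier penalty; the shrinkage constant follows from that cost, not necessarily from the paper's exact algorithm.

```python
import numpy as np

def robust_pca(Y, rho, lam, n_iter=100):
    """Alternating minimization sketch for (P1):
    min_{U,S,O} ||Y - U S - O||_F^2 + lam * sum_n ||o_n||_2,
    with Y (p x N) holding the data vectors as columns."""
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        # (U, S) update: rank-rho truncated SVD of outlier-compensated data
        U, s, Vt = np.linalg.svd(Y - O, full_matrices=False)
        U, S = U[:, :rho], np.diag(s[:rho]) @ Vt[:rho]
        # O update: column-wise (vector) soft-thresholding of the residuals
        R = Y - U @ S
        norms = np.maximum(np.linalg.norm(R, axis=0, keepdims=True), 1e-12)
        O = R * np.maximum(1.0 - lam / (2.0 * norms), 0.0)
    return U, S, O
```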
Video surveillance
n Background modeling from video feeds [De la Torre-Black 01]
[Figure: sample frames shown in columns Data, PCA, Robust PCA, Outliers]
Data: http://www.cs.cmu.edu/~ftorre/
Robust unveiling of communities
n Robust kernel PCA for identification of cohesive subgroups
n Network: NCAA football teams (vertices), Fall 2000 games (edges); ARI = 0.8967
Ø Identified exactly: Big 10, Big 12, ACC, SEC, ...; Outliers: independent teams
Data: http://www-personal.umich.edu/~mejn/netdata/
Online robust PCA
n Motivation: real-time big data and memory limitations
Ø Scalability via exponentially-weighted subspace tracking
Ø At time $t$, do not re-estimate past outliers $\{\mathbf{o}_\tau\}_{\tau < t}$
n Nominal data: $\mathbf{y}_t = \mathbf{U}\mathbf{s}_t + \mathbf{e}_t$
n Outliers: $\mathbf{y}_t = \mathbf{U}\mathbf{s}_t + \mathbf{o}_t + \mathbf{e}_t$
(A streaming sketch follows.)
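A streaming sketch under stated assumptions: the exponentially-weighted LS recursion of the talk is replaced by a plain stochastic-gradient update on the subspace for brevity, and lam, mu are illustrative tuning knobs, not values from the talk.

```python
import numpy as np

def online_robust_pca(stream, p, rho, lam=0.5, mu=0.05, seed=0):
    """Per-datum sketch: estimate (s_t, o_t) with the current subspace U
    fixed, then take a stochastic-gradient step on U."""
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.normal(size=(p, rho)))[0]   # random orthonormal init
    for y in stream:
        s = U.T @ y                                  # projection (U orthonormal)
        r = y - U @ s
        nrm = max(np.linalg.norm(r), 1e-12)
        o = r * max(1.0 - lam / (2.0 * nrm), 0.0)    # outlier via soft-thresholding
        s = U.T @ (y - o)                            # refit on the cleansed datum
        U += mu * np.outer(y - U @ s - o, s)         # gradient step on U
        U = np.linalg.qr(U)[0]                       # re-orthonormalize
        yield U, s, o
```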
Roadmap
n Robust principal component analysis
Ø Linear low-rank models and sparse outliers
n Scalable algorithms for big network data analytics
Ø (De-)centralized and online rank minimization
n Robust sparse embedding via dictionary learning
Ø Nonlinear low-rank models
Ø Data-adaptive compressed sensing
n Concluding remarks
Modeling traffic anomalies
n Anomalies: changes in origin-destination (OD) flows [Lakhina et al 04]
Ø Failures, congestions, DoS attacks, intrusions, flooding
n Graph G(N, L) with N nodes, L links, and F flows (F >> L); OD flow $z_{f,t}$
n Packet counts per link $l$ and time slot $t$: $y_{l,t} = \sum_{f} r_{l,f}\,(z_{f,t} + a_{f,t}) + v_{l,t}$, with routing entries $r_{l,f} \in \{0,1\}$
[Figure: example network with flows $f_1$, $f_2$ traversing link $l$; an anomaly rides on one flow]
n Matrix model across T time slots: $\mathbf{Y} = \mathbf{R}(\mathbf{Z} + \mathbf{A}) + \mathbf{V}$, with $\mathbf{Y} \in \mathbb{R}^{L \times T}$ and $\mathbf{R} \in \{0,1\}^{L \times F}$ (a synthetic instance follows)
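A hedged synthetic instance of this matrix model; the dimensions and distributions are taken from the numerical-validation slide at the end of the deck, while the rank r and anomaly density pi here are illustrative picks.

```python
import numpy as np

rng = np.random.default_rng(0)
L, F, T, r, pi = 105, 210, 420, 10, 0.01          # r, pi illustrative

R = rng.binomial(1, 0.5, size=(L, F)).astype(float)   # Bernoulli(1/2) routing
P = rng.normal(0, 1/np.sqrt(F*T), size=(F, r))        # factors ~ N(0, 1/FT)
Q = rng.normal(0, 1/np.sqrt(F*T), size=(T, r))
Z = P @ Q.T                                           # rank-r nominal OD flows
A = rng.choice([-1.0, 0.0, 1.0], size=(F, T),
               p=[pi/2, 1 - pi, pi/2])                # sparse ternary anomalies
Y = R @ (Z + A)                                       # L x T link-count matrix
print(Y.shape, np.linalg.matrix_rank(R @ Z), (A != 0).mean())
```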
Low-rank plus sparse matrices
n Z has low rank, e.g., [Zhang et al 05]; A is sparse across time and flows
[Figure: sample anomaly amplitudes $a_{f,t}$ (scale $\times 10^8$) versus time index t]
Data: http://math.bu.edu/people/kolaczyk/datasets.html
General decomposition problem
n Given $\mathbf{Y}$ and routing matrix $\mathbf{R}$, identify sparse $\mathbf{A}$ when $\mathbf{X} := \mathbf{R}\mathbf{Z}$ is low rank
Ø $\mathbf{R}$ fat, but $\mathbf{X}$ still low rank
(P1) $\min_{\mathbf{X},\mathbf{A}} \|\mathbf{Y} - \mathbf{X} - \mathbf{R}\mathbf{A}\|_F^2 + \lambda_* \|\mathbf{X}\|_* + \lambda_1 \|\mathbf{A}\|_1$
n Rank minimization surrogated by the nuclear norm $\|\mathbf{X}\|_* := \sum_k \sigma_k(\mathbf{X})$, e.g., [Recht-Fazel-Parrilo 10]
Ø Principal Components Pursuit (PCP) [Candes et al 10], [Chandrasekaran et al 11]
(A convex-programming sketch of (P1) follows.)
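A minimal cvxpy sketch of this convex program, assuming the regularization weights are tuned externally; this is the prototype estimator, not the scalable solvers developed on the next slides.

```python
import cvxpy as cp

def decompose(Y, R, lam_star=1.0, lam_1=0.1):
    """(P1) sketch: LS fit + nuclear norm on X + l1 norm on A.
    lam_star and lam_1 are illustrative; in practice they are tuned."""
    L, T = Y.shape
    F = R.shape[1]
    X = cp.Variable((L, T))
    A = cp.Variable((F, T))
    cost = (cp.sum_squares(Y - X - R @ A)
            + lam_star * cp.normNuc(X)      # convex surrogate for rank
            + lam_1 * cp.norm1(A))          # convex surrogate for sparsity
    cp.Problem(cp.Minimize(cost)).solve()
    return X.value, A.value
```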
Challenges and importance
n $\mathbf{R}\mathbf{A}$ not necessarily sparse and $\mathbf{R}$ fat, so PCP is not applicable
n Unknowns exceed data: LT + FT >> LT
n Important special cases
Ø R = I: matrix decomposition with PCP [Candes et al 10]
Ø X = 0: compressive sampling with basis pursuit [Chen et al 01]
Ø X = C_{L×ρ} W_{ρ×T} and A = 0: PCA [Pearson 1901]
Ø X = 0, R = D unknown: dictionary learning [Olshausen 97]
Exact recovery
n Noise-free case
(P0) $\min_{\mathbf{X},\mathbf{A}} \lambda_*\,\mathrm{rank}(\mathbf{X}) + \lambda_1 \|\mathbf{A}\|_0$ s.t. $\mathbf{Y} = \mathbf{X} + \mathbf{R}\mathbf{A}$
Q: Can one recover sparse $\mathbf{A}$ and low-rank $\mathbf{X}$ exactly? A: Yes! Under certain conditions on $\mathbf{X}$, $\mathbf{A}$, and $\mathbf{R}$
Theorem: Given $\mathbf{Y}$ and $\mathbf{R}$, assume every row and column of $\mathbf{A}$ has at most k non-zero entries, and $\mathbf{R}$ has full row rank. If the incoherence conditions C1)-C2) hold (stated in the paper), then, with appropriately chosen $\lambda_*$ and $\lambda_1$, (P0) exactly recovers $\{\mathbf{X}, \mathbf{A}\}$
M. Mardani, G. Mateos, and G. B. Giannakis, "Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies," IEEE Trans. Information Theory, 2013.
In-network processing
n Robust imputation of a network data matrix (e.g., smart metering, network health cartography)
Goal: Given few rows per agent, perform distributed cleansing and imputation by leveraging the low rank of nominal data and the sparsity of the outliers
n Challenge: the nuclear norm is not separable across rows (links/agents)
G. Mateos and K. Rajawat, "Dynamic network cartography," IEEE Signal Processing Magazine, May 2013.
Separable regularization
n Key property: $\|\mathbf{X}\|_* = \min_{\{\mathbf{P},\mathbf{Q}\}:\,\mathbf{X} = \mathbf{P}\mathbf{Q}^\top} \tfrac{1}{2}\big(\|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2\big)$ (numerical check below)
n Separable formulation equivalent to (P1), with $\mathbf{P} \in \mathbb{R}^{L\times\rho}$ and $\rho \ge \mathrm{rank}[\mathbf{X}]$
(P2) $\min_{\mathbf{P},\mathbf{Q},\mathbf{A}} \|\mathbf{Y} - \mathbf{P}\mathbf{Q}^\top - \mathbf{R}\mathbf{A}\|_F^2 + \tfrac{\lambda_*}{2}\big(\|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2\big) + \lambda_1 \|\mathbf{A}\|_1$
Ø Nonconvex, but fewer variables
Proposition: If $\{\bar{\mathbf{P}}, \bar{\mathbf{Q}}, \bar{\mathbf{A}}\}$ is a stationary point of (P2) and a qualification condition on the residual holds (see the paper), then $\{\bar{\mathbf{X}} = \bar{\mathbf{P}}\bar{\mathbf{Q}}^\top, \bar{\mathbf{A}}\}$ is a global optimum of (P1).
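A short numerical check of the key property, assuming balanced SVD factors; the 0.5-scaled squared Frobenius norms of the factors match the nuclear norm exactly at the minimizer.

```python
import numpy as np

# For X = P Q^T, 0.5*(||P||_F^2 + ||Q||_F^2) >= ||X||_*, with equality at
# the balanced factors P = U sqrt(S), Q = V sqrt(S) from the SVD X = U S V^T.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 20)) @ rng.normal(size=(20, 40))   # rank <= 20
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U @ np.diag(np.sqrt(s))          # balanced factors achieving the minimum
Q = Vt.T @ np.diag(np.sqrt(s))
nuc = s.sum()                        # nuclear norm = sum of singular values
bound = 0.5 * (np.linalg.norm(P, 'fro')**2 + np.linalg.norm(Q, 'fro')**2)
print(f"nuclear norm = {nuc:.4f}, balanced-factor bound = {bound:.4f}")
```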
Decentralized rank minimization
n Alternating-direction method of multipliers (ADMM) solver for (P2)
Ø Method: [Glowinski-Marrocco 75], [Gabay-Mercier 76]
Ø Learning over networks [Schizas-Ribeiro-Giannakis 07]
Ø Consensus-based optimization attains centralized performance
M. Mardani, G. Mateos, and G. B. Giannakis, "In-network sparsity regularized rank minimization: Algorithms and applications," IEEE Transactions on Signal Processing, 2013.
Internet2 data
n Real network data
Ø Dec. 8-28, 2008
Ø N=11, L=41, F=121, T=504
[Figure: ROC curves (detection vs. false-alarm probability) for the proposed method against [Lakhina04] and [Zhang05] at ranks 1-3]
[Figure: true vs. estimated anomaly volume across flows and time; P_fa = 0.03, P_d = 0.92]
Data: http://www.cs.bu.edu/~crovella/links.html
Online rank minimization
n Construct an estimated map of anomalies in real time
Ø Streaming data model: $\mathbf{y}_t = \mathbf{R}(\mathbf{z}_t + \mathbf{a}_t) + \mathbf{v}_t$
n Approach: regularized exponentially-weighted LS formulation (per-slot sketch below)
[Figure: tracking of cleansed link traffic (ATLA--HSTN, CHIN--ATLA, DNVR--KSCY, HSTN--ATLA) and real-time unveiling of anomalies (WASH--STTL, WASH--WASH), estimated vs. true]
M. Mardani, G. Mateos, and G. B. Giannakis, "Dynamic anomalography: Tracking network anomalies via sparsity and low rank," IEEE Journal of Selected Topics in Signal Processing, pp. 50-66, Feb. 2013.
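A per-slot sketch under stated assumptions: with the subspace factor P fixed, (q_t, a_t) are fit by a few proximal-gradient (ISTA) passes, then P takes a stochastic-gradient step; step sizes and lam1 are illustrative, not the paper's update rules.

```python
import numpy as np

def soft(x, thr):
    """Entrywise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def online_anomalography(stream, R, rank, lam1=0.1, mu=0.01, n_ista=20):
    """Fit y_t ~ P q_t + R a_t with an l1 penalty on a_t, slot by slot."""
    L, F = R.shape
    P = np.random.randn(L, rank) * 0.1
    for y in stream:
        a = np.zeros(F)
        for _ in range(n_ista):                       # ISTA with P fixed
            q = np.linalg.lstsq(P, y - R @ a, rcond=None)[0]
            r = y - P @ q - R @ a
            a = soft(a + mu * (R.T @ r), mu * lam1)   # proximal-gradient step
        P += mu * np.outer(y - P @ q - R @ a, q)      # gradient step on P
        yield P, q, a
```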
Roadmap
n Robust principal component analysis
Ø Linear low-rank models and sparse outliers
n Scalable algorithms for big network data analytics
Ø (De-)centralized and online rank minimization
n Robust sparse embedding via dictionary learning
Ø Nonlinear low-rank models; data-adaptive compressed sensing
n Concluding remarks
Nonlinear low-dimensional models?
q Compressive sampling (CS) [Donoho/Candes 06]: linear measurement operator
Ø CS vs. data-adaptive principal component analysis (PCA) [Pearson 1901]
Ø Data-adaptive nonlinear CS? Quadratic CS [Ohlsson et al 13]
q Nonlinear dimensionality reduction for data on manifolds
Ø Kernel PCA [Scholkopf et al 98]; SDE [Weinberger 04]; reconstruction?
Ø Locally linear embedding (LLE) [Roweis-Saul 00]; LEM; MDS; Isomap
Ø Sparsity-aware embeddings [Huang et al 10], [Vidal 11], [Kong et al 12]
Ø Dictionary learning (DL) [Olshausen 97]; online DL [Mairal et al 10], [Carin et al 11]
Learning sparse manifold models
q Training data $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_N]$ on a smooth but unknown manifold $\mathcal{M}$
Ø Use $\mathbf{Y}$ to learn a dictionary $\mathbf{D}$, jointly enforcing a sparse training-data fit and a smooth affine manifold fit (a generic DL sketch follows)
Ø $\mathbf{D}$ reduces and morphs the training data to yield a smoother basis for $\mathcal{M}$
Ø Robust sparse embedding via dictionary learning (RSE-DL)
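A generic dictionary-learning sketch via scikit-learn; it captures only the sparse-fit term Y ~ D S, not RSE-DL's added manifold-smoothness regularizer, and the dataset and hyperparameters here are placeholders.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 20))             # 200 training vectors in R^20

dl = DictionaryLearning(n_components=40,   # overcomplete: 40 atoms for R^20
                        transform_algorithm='lasso_lars',
                        alpha=0.5, max_iter=20, random_state=0)
S = dl.fit_transform(Y)                    # sparse codes, one row per datum
D = dl.components_                         # learned dictionary atoms
print(D.shape, (np.abs(S) > 1e-8).mean())  # atom count and code density
```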
Parsimonious nonlinear embedding
q Embedding preserves the learned manifold structure
Ø Reduced-complexity embedding step
q RSE-DL appropriate for (de-)compression and reconstruction
q Robust sparse coding also works for clustering/classification
RSE-DL compression and reconstruction
q Operational phase @ Tx: per data vector $\mathbf{y}$
q Compress: map $\mathbf{y}$ to a low-dimensional sparse code over the learned dictionary
q Operational phase @ Rx: given the (possibly noisy) code
q Reconstruct: synthesize $\hat{\mathbf{y}}$ from the code using the learned dictionary $\mathbf{D}$
Ø Less computationally demanding modules (a generic pipeline sketch follows)
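A generic stand-in for this Tx/Rx pipeline using lasso sparse coding against a learned dictionary; the exact RSE-DL compression and reconstruction maps are in the corresponding paper, so treat this as illustrative only.

```python
import numpy as np
from sklearn.linear_model import Lasso

def compress(y, D, lam=0.1):
    """Tx side: sparse-code y against dictionary D (columns are atoms);
    the few nonzero entries of s act as the compressed representation."""
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    lasso.fit(D, y)
    return lasso.coef_

def reconstruct(s_noisy, D):
    """Rx side: synthesize the estimate from the (possibly noisy) code."""
    return D @ s_noisy

# Usage with a random dictionary standing in for the learned D
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128)); D /= np.linalg.norm(D, axis=0)
y = D[:, :5] @ rng.normal(size=5)          # datum with a 5-sparse code
y_hat = reconstruct(compress(y, D), D)
print(np.linalg.norm(y - y_hat) / np.linalg.norm(y))
```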
Test case: Swiss roll
Ø Noise on the manifold and channel noise (levels lost in extraction)
[Figure: Swiss-roll embeddings and reconstructions; a baseline setup sketch follows]
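A baseline sketch reproducing only the test setup: a noisy Swiss roll embedded by LLE via scikit-learn. RSE-DL itself is not a library routine, and the noise level and neighbor count here are illustrative.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1500, noise=0.5, random_state=0)  # noisy manifold
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Z = lle.fit_transform(X)     # 2-D embedding of the 3-D roll
print(Z.shape)               # (1500, 2)
```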
Comparisons with LLE, RSE, RSGE
[Figure: performance comparisons, averaged over 100 realizations]
Missing data
q USC girl image (predates Lena!) with 50% of pixels missing
q RSE-DL: reduced complexity relative to, e.g., Bayesian-type methods [Chen et al 10]
Concluding summary
n Robust PCA; online via robust subspace tracking
Ø Leveraging linear low-rank models and outlier sparsity
n Unveiling anomalies in large-scale network data
Ø Scalable decentralized and online algorithms
n Data-adaptive, nonlinear, low-dimensional models
n The road ahead
Ø Performance bounds? Dynamical network data?
Ø Learning via quantized big data (few bits)?
Ø RSE-DL for nonlinear compressive sampling?
Thank you!
Numerical validation
n Setup: L=105, F=210, T=420; R ~ Bernoulli(1/2); X_0 = R P Q^T with P, Q ~ N(0, 1/FT); a_{ij} ∈ {-1, 0, 1} w.p. {π/2, 1-π, π/2}
n Relative recovery error
[Figure: phase-transition map of recovery error over rank(X_0) (10 to 50) and % of non-zero entries (s/FT, 0.1% to 12.5%)]