Robust and Scalable Algorithms for Big Data Analytics

1 Robust and Scalable Algorithms for Big Data Analytics
Georgios B. Giannakis
Acknowledgment: Drs. G. Mateos, K. Slavakis, G. Leus, and M. Mardani
Arlington, VA, USA, March 22

2 Roadmap
- Robust principal component analysis
  - Linear low-rank models and sparse outliers
- Scalable algorithms for big network data analytics
  - (De-)centralized and online rank minimization
- Robust sparse embedding via dictionary learning
  - Nonlinear low-rank models
  - Data-adaptive compressed sensing
- Concluding remarks
[Graphic labels: Big, Fast, Messy]

3 Principal component analysis
- Motivation: (statistical) learning from high-dimensional data
  [Figures: DNA microarray; traffic surveillance]
- Principal component analysis (PCA) [Pearson 1901]
  - Extraction of low(est)-dimensional structure
  - Applications: source (de)coding, anomaly identification, recommender systems
  - PCA is non-robust to outliers [Huber 81], [Jolliffe 86], [Wright et al 09-12]
Objective: robustify PCA by controlling outlier sparsity

4 PCA formulations
- Training data
- Minimum reconstruction error
  - Compression operator
  - Reconstruction operator
- Component analysis model
Solution:
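
The equations for this slide are not preserved above. As a rough illustration only, the textbook minimum-reconstruction-error view of PCA can be sketched as below. The names Y, m, U, and q, the synthetic data, and the helper functions are assumptions made for this sketch, not the slide's notation.

```python
import numpy as np

def pca_fit(Y, q):
    """Return the sample mean m (p,) and an orthonormal basis U (p, q)
    spanning the q-dimensional subspace with minimum reconstruction error."""
    m = Y.mean(axis=0)
    # Rows of Vt are the right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(Y - m, full_matrices=False)
    return m, Vt[:q].T

def compress(Y, m, U):
    """Compression operator: project centered data onto the subspace."""
    return (Y - m) @ U

def reconstruct(S, m, U):
    """Reconstruction operator: map low-dimensional scores back to data space."""
    return m + S @ U.T

# Synthetic training data near a 2-dimensional subspace of R^5 (illustrative).
rng = np.random.default_rng(0)
S_true = rng.standard_normal((200, 2))
U_true = np.linalg.qr(rng.standard_normal((5, 2)))[0]
Y = S_true @ U_true.T + 0.01 * rng.standard_normal((200, 5))

m, U = pca_fit(Y, q=2)
Y_hat = reconstruct(compress(Y, m, U), m, U)
err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
```

With clean data of this kind the relative reconstruction error stays near the noise level; a few gross outliers in Y can change that, which is the motivation for the robust formulation that follows.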

5 Robustifying PCA
- Outlier variables: nonzero if the datum is an outlier, zero otherwise
  - Nominal data obey the component analysis model; outliers obey something else
  - Linear regression [Fuchs 99], [Giannakis et al 11]
  - Both the model and the outlier variables are unknown; the outliers are typically sparse!
- Natural (but intractable) estimator (P0)
G. Mateos and G. B. Giannakis, "Robust PCA as bilinear decomposition with outlier sparsity regularization," IEEE Transactions on Signal Processing, Oct.

6 Universal robustness
- (P0) is NP-hard; relax, e.g., [Tropp 06], to obtain (P1)
  - Role of the sparsity-controlling parameter is central
Q: Does (P1) yield robust estimates?
A: Yes! The Huber estimator is a special case
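
The formula for (P1) is not preserved here. Assuming it pairs a least-squares fit with an l1-type penalty on the outlier variables, as the relaxation suggests, the Huber connection can be checked numerically in the scalar case: minimizing 0.5*(r - o)^2 + lam*|o| over the outlier variable o gives the Huber loss of the residual r. The names r, o, and lam are assumptions for this sketch.

```python
import numpy as np

def soft_threshold(r, lam):
    """Closed-form minimizer of 0.5*(r - o)**2 + lam*|o| over o."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def huber(r, lam):
    """Huber loss: quadratic for |r| <= lam, linear beyond."""
    a = np.abs(r)
    return np.where(a <= lam, 0.5 * r**2, lam * a - 0.5 * lam**2)

lam = 1.0
r = np.linspace(-4.0, 4.0, 81)

# Plug the optimal outlier variable back into the penalized cost.
o = soft_threshold(r, lam)
profiled = 0.5 * (r - o) ** 2 + lam * np.abs(o)
```

The array `profiled` agrees with `huber(r, lam)` entry by entry, which is one way to read the claim that a robust estimator arises as a special case of the outlier-sparsity formulation.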

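7 Alternating minimization
- Solve (P1) by alternating between two updates
  - Subspace update: SVD of outlier-compensated data
  - Outlier update: row-wise soft-thresholding of residuals
    [Figure: soft-thresholding operator with thresholds -γ and γ]
Proposition: Algorithm 1's iterates converge to a stationary point of (P1)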
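
The two updates named on this slide can be sketched as follows. This is a minimal illustration, not the authors' Algorithm 1: it assumes data vectors are stored as rows of Y and that the cost is 0.5*||Y - L - O||_F^2 + gamma * (sum of row norms of O), with L an affine rank-q fit. The names q, gamma, and n_iter and the synthetic data are hypothetical.

```python
import numpy as np

def row_soft_threshold(R, gamma):
    """Shrink each row of R toward zero by gamma in Euclidean norm."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(1.0 - gamma / np.maximum(norms, 1e-12), 0.0)
    return scale * R

def low_rank_fit(X, q):
    """Best affine rank-q fit of the rows of X (mean plus truncated SVD)."""
    m = X.mean(axis=0)
    Uc, s, Vt = np.linalg.svd(X - m, full_matrices=False)
    return m + (Uc[:, :q] * s[:q]) @ Vt[:q]

def robust_pca(Y, q, gamma, n_iter=50):
    O = np.zeros_like(Y)
    history = []
    for _ in range(n_iter):
        L = low_rank_fit(Y - O, q)            # SVD of outlier-compensated data
        O = row_soft_threshold(Y - L, gamma)  # row-wise soft-thresholding of residuals
        cost = (0.5 * np.linalg.norm(Y - L - O) ** 2
                + gamma * np.linalg.norm(O, axis=1).sum())
        history.append(cost)
    return L, O, history

# Synthetic example: rank-2 data with a few grossly corrupted rows.
rng = np.random.default_rng(1)
N, p, q = 100, 10, 2
Y = rng.standard_normal((N, q)) @ rng.standard_normal((q, p))
Y += 0.05 * rng.standard_normal((N, p))
Y[:5] += 10.0 * rng.standard_normal((5, p))

L, O, history = robust_pca(Y, q=q, gamma=1.0)
flagged = np.flatnonzero(np.linalg.norm(O, axis=1) > 0)  # rows declared outliers
```

Each update exactly minimizes the assumed cost over its own block of variables, so `history` is non-increasing. Which rows end up in `flagged` depends on gamma, which is why the sparsity-controlling parameter matters.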

8 Video surveillance
- Background modeling from video feeds [De la Torre-Black 01]
[Figure panels: Data, PCA, Robust PCA, Outliers]

9 Robust unveiling of communities
- Robust kernel PCA for identification of cohesive subgroups
- Network: NCAA football teams (vertices), Fall 00 games (edges)
  - ARI=
  - Identified exactly: Big 10, Big 12, ACC, SEC, ...; outliers: independent teams

10 Online robust PCA
- Motivation: real-time big data and memory limitations
  - Scalability via exponentially weighted subspace tracking
  - At each time instant, earlier estimates are not re-estimated
[Figure panels: Nominal; Outliers]

11 Roadmap
- Robust principal component analysis
  - Linear low-rank models and sparse outliers
- Scalable algorithms for big network data
  - (De-)centralized and online rank minimization
- Robust embedding via dictionary learning
  - Nonlinear low-rank models
  - Data-adaptive compressed sensing
- Concluding remarks

12 Modeling traffic anomalies
- Anomalies: changes in origin-destination (OD) flows [Lakhina et al 04]
  - Failures, congestion, DoS attacks, intrusions, flooding
- Graph G(N, L) with N nodes, L links, and F flows (F >> L); OD flow z_{f,t}
- Packet counts per link l and time slot t, with routing indicators r_{l,f} ∈ {0,1}
  [Figure: network graph with an anomalous flow marked]
- Matrix model across T time slots: link-count matrix of size L x T, routing matrix of size L x F
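
The per-link and matrix equations are not preserved on this slide. A plausible reading, consistent with the symbols Y, R, Z, and A used on later slides, is that link counts superimpose the nominal and anomalous OD flows routed over each link, i.e. Y = R(Z + A) up to noise. The sketch below builds such data; the sizes, the random routing matrix, and the anomaly amplitude are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
L_links, F_flows, T_slots, r = 8, 20, 50, 2

# Hypothetical binary routing matrix: r_{l,f} = 1 if flow f traverses link l.
R = (rng.random((L_links, F_flows)) < 0.3).astype(float)

# Nominal OD flows Z (F x T), low rank by construction.
Z = rng.random((F_flows, r)) @ rng.random((r, T_slots))

# Sparse anomalies A (F x T): a few (flow, time) pairs carry extra volume.
A = np.zeros((F_flows, T_slots))
idx = rng.choice(F_flows * T_slots, size=10, replace=False)
A[np.unravel_index(idx, A.shape)] = 5.0

# Assumed noise-free matrix model across T time slots: Y is L x T.
Y = R @ (Z + A)

# Entry-wise view: y_{l,t} = sum_f r_{l,f} * (z_{f,t} + a_{f,t}).
y_check = sum(R[3, f] * (Z[f, 7] + A[f, 7]) for f in range(F_flows))
```

Since F >> L in practice, R is fat: many flows share each link, so an anomaly in one flow is only seen through the links it traverses.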

13 Low-rank plus sparse matrices
- Z has low rank, e.g., [Zhang et al 05]; A is sparse across time and flows
[Figure: anomaly amplitude a_{f,t} versus time index t]

14 General decomposition problem
- Given Y and routing matrix R, identify sparse A when X is low rank
  - R is fat, but X is still low rank
(P1)
- Rank minimization with the nuclear norm, e.g., [Recht-Fazel-Parrilo 10]
  - Principal Component Pursuit (PCP) [Candes et al 10], [Chandrasekaran et al 11]
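
The statement of (P1) is not preserved on this slide. One form consistent with the nuclear-norm and sparsity discussion, and with the special cases R = I and X = 0 listed on slide 15, is the following; the weights \lambda_* and \lambda_1 are hypothetical names, and the exact scaling used in the talk may differ.

```latex
\text{(P1)}\qquad
\min_{\{X,\,A\}} \;\; \tfrac{1}{2}\,\bigl\| Y - X - RA \bigr\|_F^2
\;+\; \lambda_* \,\| X \|_*
\;+\; \lambda_1 \,\| A \|_1
```

Here \|X\|_* is the sum of singular values of X (a convex surrogate of rank) and \|A\|_1 is the sum of absolute entries of A (a convex surrogate of the number of nonzeros).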

15 Challenges and importance
- RA is not necessarily sparse and R is fat, so PCP is not applicable
- Unknowns far exceed observations: LT + FT >> LT (X and A unknown, Y observed)
- Important special cases
  - R = I: matrix decomposition with PCP [Candes et al 10]
  - X = 0: compressive sampling with basis pursuit [Chen et al 01]
  - X = CW, with C of size L x ρ and W of size ρ x T, and A = 0: PCA [Pearson 1901]
  - X = 0, R = D unknown: dictionary learning [Olshausen 97]

16 Exact recovery
- Noise-free case (P0)
Q: Can one recover sparse A and low-rank X exactly?
A: Yes! Under certain conditions on the problem data
Theorem: Given Y and R, assume every row and column of A has at most k < s non-zero entries, and R has full row rank. If C1)-C2) hold, then (P0), with an appropriately chosen parameter, exactly recovers the low-rank and sparse components.
M. Mardani, G. Mateos, and G. B. Giannakis, "Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies," IEEE Trans. Information Theory.

17 In-network processing
[Figure labels: Smart metering; Network health cartography; data matrix with missing entries]
- Robust imputation of network data matrix
Goal: Given few rows per agent, perform distributed cleansing and imputation by leveraging the low rank of the nominal data and the sparsity of the outliers.
- Challenge: the nuclear norm is not separable across rows (links/agents)
G. Mateos and K. Rajawat, "Dynamic network cartography," IEEE Signal Processing Magazine, May

18 Separable regularization
- Key property: factorization of X into factors C and W, with C of size L x ρ and ρ >= rank[X]
- Separable formulation equivalent to (P1)
(P2)
  - Nonconvex; fewer variables
Proposition: If a stationary point of (P2) satisfies an additional condition, then it yields a global optimum of (P1).
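
The key-property equation is not preserved above. A commonly used characterization that fits the slide, assumed here, is that the nuclear norm of X equals the minimum of 0.5*(||C||_F^2 + ||W||_F^2) over all factorizations X = C W^T. The Frobenius terms are sums over rows, which makes the regularizer separable. The check below is a numerical sketch with hypothetical sizes.

```python
import numpy as np

rng = np.random.default_rng(3)
L_dim, T_dim, rho = 6, 9, 3
X = rng.standard_normal((L_dim, rho)) @ rng.standard_normal((rho, T_dim))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
nuclear_norm = s.sum()

# Balanced factors built from the SVD attain the minimum.
C = U[:, :rho] * np.sqrt(s[:rho])
W = Vt[:rho].T * np.sqrt(s[:rho])
balanced_cost = 0.5 * (np.linalg.norm(C) ** 2 + np.linalg.norm(W) ** 2)

# Another factorization of the same X: C2 @ W2.T == C @ W.T.
G = np.diag([2.0, 0.5, 3.0])
C2, W2 = C @ G, W @ np.linalg.inv(G).T
other_cost = 0.5 * (np.linalg.norm(C2) ** 2 + np.linalg.norm(W2) ** 2)
```

Here `balanced_cost` matches `nuclear_norm`, while `other_cost` is at least as large. Replacing the nuclear norm with the factor penalty gives a nonconvex problem in the smaller variables C and W, which is the trade-off the slide notes.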

19 Decentralized rank minimization
- Alternating-direction method of multipliers (ADMM) solver for (P2)
  - Method [Glowinski-Marrocco 75], [Gabay-Mercier 76]
  - Learning over networks [Schizas-Ribeiro-Giannakis 07]
Consensus-based optimization attains centralized performance
M. Mardani, G. Mateos, and G. B. Giannakis, "In-network sparsity regularized rank minimization: Algorithms and applications," IEEE Transactions on Signal Processing.

20 Internet2 data
- Real network data
  - Dec. 8-28, 2008
  - N=11, L=41, F=121, T=504
[Figure: detection probability versus false alarm probability for the proposed method and for [Lakhina04] and [Zhang05] with rank = 1, 2, 3]
[Figure: true and estimated anomaly volume across flows and time, annotated with P_fa = 0.03 and P_d]

21 Online rank minimization
- Construct an estimated map of anomalies in real time
  - Streaming data model:
- Approach: regularized exponentially-weighted LS formulation
[Figure: two panels, "Tracking cleansed link traffic" (link traffic level versus time index t) and "Real time unveiling of anomalies" (anomaly amplitude versus time index t), each showing estimated and true curves; trace labels: ATLA--HSTN, DNVR--KSCY, HSTN--ATLA, CHIN--ATLA, WASH--STTL, WASH--WASH]
M. Mardani, G. Mateos, and G. B. Giannakis, "Dynamic anomalography: Tracking network anomalies via sparsity and low rank," IEEE Journal of Selected Topics in Signal Processing, Feb.

22 Roadmap
- Robust principal component analysis
  - Linear low-rank models and sparse outliers
- Scalable algorithms for big network data analytics
  - (De-)centralized and online rank minimization
- Robust sparse embedding via dictionary learning
  - Nonlinear low-rank models; data-adaptive compressed sensing
- Concluding remarks

23 Nonlinear low-dimensional models?
- Compressive sampling (CS) [Donoho/Candes 06]: linear operator
  - CS vs data-adaptive principal component analysis (PCA) [Pearson 1901]
  - Data-adaptive nonlinear CS?; quad-CS [Ohlsson et al 13]
- Nonlinear dimensionality reduction for data on manifolds
  - Kernel PCA [Scholkopf et al 98]; SDE [Weinberger 04]; reconstruction?
  - Local linear embedding (LLE) [Roweis-Saul 00]; LEM; MDS; Isomap
  - Sparsity-aware embeddings [Huang et al 10], [Vidal 11], [Kong et al 12]
  - Dictionary learning (DL) [Olshausen 97]; online DL [Mairal et al 10], [Carin et al 11]

24 Learning sparse manifold models
- Training data on a smooth but unknown manifold
  - Use the training data matrix to learn a dictionary
    [Cost terms: sparse training data fit; smooth affine manifold fit]
  - The learned dictionary reduces and morphs the training data to yield a smoother basis for the manifold
  - Robust sparse embedding via dictionary learning (RSE-DL)

25 Parsimonious nonlinear embedding
- Embedding preserves
  - Reduced complexity embedding step
- RSE-DL appropriate for (de-)compression and reconstruction
- Robust sparse coding: works for clustering/classification

26 RSE-DL compression and reconstruction
- Operational Tx: per data vector
- Compress:
- Operational Rx: given (possibly noisy)
- Reconstruct:
  - Less computationally demanding modules

27 Test case: Swiss roll
  - Noise on manifold; channel noise

28 Comparisons with LLE, RSE, RSGE
(Average over 100 realizations)

29 Missing data
- USC girl (predates Lena!) with 50% missing entries
- RSE-DL: reduced complexity relative to, e.g., Bayesian-type [Chen et al 10]

30 Concluding summary
- Robust PCA; online via robust subspace tracking
  - Leveraging linear low-rank models and outlier sparsity
- Unveiling anomalies in large-scale network data
  - Scalable decentralized and online algorithms
- Data-adaptive, nonlinear, low-dimensional models
- The road ahead
  - Performance bounds? Dynamical network data?
  - Learning via quantized big data (few bits)?
  - RSE-DL for nonlinear compressive sampling?
Thank you!

31 Numerical validation
- Setup
  - L = 105, F = 210, T = 420
  - R ~ Bernoulli(1/2)
  - X_0 = RPQ, with P, Q ~ N(0, 1/FT)
  - a_ij ∈ {-1, 0, 1} w.p. {π/2, 1-π, π/2}
- Relative recovery error
[Figure: relative recovery error versus rank(X_0) (r) and percentage of non-zero entries (s/FT)%]
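
The setup above can be reproduced roughly as follows. This is a sketch under stated assumptions: P is taken as F x r and Q as T x r, so that X_0 = R P Q^T is L x T; N(0, 1/FT) is read as zero mean with variance 1/(FT); and the rank r and anomaly probability pi_anom are hypothetical values, since the slide sweeps them in the figure. The noise-free observation Y = X_0 + R A_0 follows the decomposition discussed on the earlier slides.

```python
import numpy as np

rng = np.random.default_rng(4)
L, F, T = 105, 210, 420
r, pi_anom = 5, 0.01  # hypothetical rank and anomaly probability

# Routing-like matrix with i.i.d. Bernoulli(1/2) entries.
R = rng.binomial(1, 0.5, size=(L, F)).astype(float)

# Low-rank component X0 = R P Q^T with Gaussian factors.
sigma = np.sqrt(1.0 / (F * T))
P = rng.normal(0.0, sigma, size=(F, r))
Q = rng.normal(0.0, sigma, size=(T, r))
X0 = R @ P @ Q.T

# Sparse component with entries in {-1, 0, 1}.
A0 = rng.choice([-1.0, 0.0, 1.0], size=(F, T),
                p=[pi_anom / 2, 1 - pi_anom, pi_anom / 2])

# Noise-free observations (assumed model).
Y = X0 + R @ A0

def relative_error(est, truth):
    """Relative recovery error in Frobenius norm."""
    return np.linalg.norm(est - truth) / np.linalg.norm(truth)
```

A recovery algorithm would take Y and R as input and return estimates of X0 and A0, which `relative_error` can then score; which of the two components the slide's plot reports is not recoverable from the text.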


More information

Design of LDPC codes

Design of LDPC codes Design of LDPC codes Codes from finite geometries Random codes: Determine the connections of the bipartite Tanner graph by using a (pseudo)random algorithm observing the degree distribution of the code

More information

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

More information

Computational Optical Imaging - Optique Numerique. -- Deconvolution --

Computational Optical Imaging - Optique Numerique. -- Deconvolution -- Computational Optical Imaging - Optique Numerique -- Deconvolution -- Winter 2014 Ivo Ihrke Deconvolution Ivo Ihrke Outline Deconvolution Theory example 1D deconvolution Fourier method Algebraic method

More information

Sparse modeling: some unifying theory and word-imaging

Sparse modeling: some unifying theory and word-imaging Sparse modeling: some unifying theory and word-imaging Bin Yu UC Berkeley Departments of Statistics, and EECS Based on joint work with: Sahand Negahban (UC Berkeley) Pradeep Ravikumar (UT Austin) Martin

More information

Managing Incompleteness, Complexity and Scale in Big Data

Managing Incompleteness, Complexity and Scale in Big Data Managing Incompleteness, Complexity and Scale in Big Data Nick Duffield Electrical and Computer Engineering Texas A&M University http://nickduffield.net/work Three Challenges for Big Data Complexity Problem:

More information

Analysis of an Artificial Hormone System (Extended abstract)

Analysis of an Artificial Hormone System (Extended abstract) c 2013. This is the author s version of the work. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purpose or for creating

More information

Lecture Notes 2: Matrices as Systems of Linear Equations

Lecture Notes 2: Matrices as Systems of Linear Equations 2: Matrices as Systems of Linear Equations 33A Linear Algebra, Puck Rombach Last updated: April 13, 2016 Systems of Linear Equations Systems of linear equations can represent many things You have probably

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS Linear systems Ax = b occur widely in applied mathematics They occur as direct formulations of real world problems; but more often, they occur as a part of the numerical analysis

More information

Modélisation et résolutions numérique et symbolique

Modélisation et résolutions numérique et symbolique Modélisation et résolutions numérique et symbolique via les logiciels Maple et Matlab Jeremy Berthomieu Mohab Safey El Din Stef Graillat Mohab.Safey@lip6.fr Outline Previous course: partial review of what

More information

Weakly Secure Network Coding

Weakly Secure Network Coding Weakly Secure Network Coding Kapil Bhattad, Student Member, IEEE and Krishna R. Narayanan, Member, IEEE Department of Electrical Engineering, Texas A&M University, College Station, USA Abstract In this

More information

THE problem of computing sparse solutions (i.e., solutions

THE problem of computing sparse solutions (i.e., solutions IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 7, JULY 2005 2477 Sparse Solutions to Linear Inverse Problems With Multiple Measurement Vectors Shane F. Cotter, Member, IEEE, Bhaskar D. Rao, Fellow,

More information

An Adaptive Decoding Algorithm of LDPC Codes over the Binary Erasure Channel. Gou HOSOYA, Hideki YAGI, Toshiyasu MATSUSHIMA, and Shigeichi HIRASAWA

An Adaptive Decoding Algorithm of LDPC Codes over the Binary Erasure Channel. Gou HOSOYA, Hideki YAGI, Toshiyasu MATSUSHIMA, and Shigeichi HIRASAWA 2007 Hawaii and SITA Joint Conference on Information Theory, HISC2007 Hawaii, USA, May 29 31, 2007 An Adaptive Decoding Algorithm of LDPC Codes over the Binary Erasure Channel Gou HOSOYA, Hideki YAGI,

More information

Virtual Landmarks for the Internet

Virtual Landmarks for the Internet Virtual Landmarks for the Internet Liying Tang Mark Crovella Boston University Computer Science Internet Distance Matters! Useful for configuring Content delivery networks Peer to peer applications Multiuser

More information

Integer Factorization using the Quadratic Sieve

Integer Factorization using the Quadratic Sieve Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give

More information

A Direct Numerical Method for Observability Analysis

A Direct Numerical Method for Observability Analysis IEEE TRANSACTIONS ON POWER SYSTEMS, VOL 15, NO 2, MAY 2000 625 A Direct Numerical Method for Observability Analysis Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper presents an algebraic method

More information

Streamdrill: Analyzing Big Data Streams in Realtime

Streamdrill: Analyzing Big Data Streams in Realtime Streamdrill: Analyzing Big Data Streams in Realtime Mikio L. Braun mikio@streamdrill.com @mikiobraun th 6 Realtime Big Data: Sources Finance Gaming Monitoring Advertisment Sensor Networks Social Media

More information

Capacity Limits of MIMO Channels

Capacity Limits of MIMO Channels Tutorial and 4G Systems Capacity Limits of MIMO Channels Markku Juntti Contents 1. Introduction. Review of information theory 3. Fixed MIMO channels 4. Fading MIMO channels 5. Summary and Conclusions References

More information

MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix.

MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix. MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix. Nullspace Let A = (a ij ) be an m n matrix. Definition. The nullspace of the matrix A, denoted N(A), is the set of all n-dimensional column

More information

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition K. Osypov* (WesternGeco), D. Nichols (WesternGeco), M. Woodward (WesternGeco) & C.E. Yarman (WesternGeco) SUMMARY Tomographic

More information

Image Super-Resolution via Sparse Representation

Image Super-Resolution via Sparse Representation 1 Image Super-Resolution via Sparse Representation Jianchao Yang, Student Member, IEEE, John Wright, Student Member, IEEE Thomas Huang, Life Fellow, IEEE and Yi Ma, Senior Member, IEEE Abstract This paper

More information

Tracking Moving Objects In Video Sequences Yiwei Wang, Robert E. Van Dyck, and John F. Doherty Department of Electrical Engineering The Pennsylvania State University University Park, PA16802 Abstract{Object

More information

Mining Big Data. Pang-Ning Tan. Associate Professor Dept of Computer Science & Engineering Michigan State University

Mining Big Data. Pang-Ning Tan. Associate Professor Dept of Computer Science & Engineering Michigan State University Mining Big Data Pang-Ning Tan Associate Professor Dept of Computer Science & Engineering Michigan State University Website: http://www.cse.msu.edu/~ptan Google Trends Big Data Smart Cities Big Data and

More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

More information

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING

QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING I. Nikiforov Université de Technologie de Troyes, UTT/ICD/LM2S, UMR 6281, CNRS 12, rue Marie Curie, CS 42060

More information

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II

More information

Visualization of General Defined Space Data

Visualization of General Defined Space Data International Journal of Computer Graphics & Animation (IJCGA) Vol.3, No.4, October 013 Visualization of General Defined Space Data John R Rankin La Trobe University, Australia Abstract A new algorithm

More information

Associative Memory via a Sparse Recovery Model

Associative Memory via a Sparse Recovery Model Associative Memory via a Sparse Recovery Model Arya Mazumdar Department of ECE University of Minnesota Twin Cities arya@umn.edu Ankit Singh Rawat Computer Science Department Carnegie Mellon University

More information