Sparse Signal Recovery: Theory, Applications and Algorithms
Bhaskar Rao, Department of Electrical and Computer Engineering, University of California, San Diego
Collaborators: I. Gorodnitsky, S. Cotter, D. Wipf, J. Palmer, K. Kreutz-Delgado
Acknowledgement: Travel support provided by Office of Naval Research Global under Grant Number N62909-09-1-1024. The research was supported by several grants from the National Science Foundation, most recently CCF-0830612.
1 / 42
Outline 1 Talk Objective 2 Sparse Signal Recovery Problem 3 Applications 4 Computational Algorithms 5 Performance Evaluation 6 Conclusion 2 / 42
Outline 1 Talk Objective 2 Sparse Signal Recovery Problem 3 Applications 4 Computational Algorithms 5 Performance Evaluation 6 Conclusion 3 / 42
Importance of Problem
Organized, with Prof. Bresler, a special session at the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing.
[Scanned contents page from the ICASSP 1998 proceedings, Vol. III, showing the session SPEC-DSP: Signal Processing with Sparseness Constraint:]
Signal Processing with the Sparseness Constraint. B. Rao (University of California, San Diego, USA). III-1861
Application of Basis Pursuit in Spectrum Estimation. S. Chen (IBM, USA); D. Donoho (Stanford University, USA). III-1865
Parsimony and Wavelet Method for Denoising. H. Krim (MIT, USA); J. Pesquet (University Paris Sud, France); I. Schick (GTE Internetworking and Harvard Univ., USA). III-1869
Parsimonious Side Propagation. P. Bradley, O. Mangasarian (University of Wisconsin-Madison, USA). III-1873
Fast Optimal and Suboptimal Algorithms for Sparse Solutions to Linear Inverse Problems. G. Harikumar (Tellabs Research, USA); C. Couvreur, Y. Bresler (University of Illinois, Urbana-Champaign, USA). III-1877
Measures and Algorithms for Best Basis Selection. K. Kreutz-Delgado, B. Rao (University of California, San Diego, USA). III-1881
Sparse Inverse Solution Methods for Signal and Image Processing Applications. B. Jeffs (Brigham Young University, USA). III-1885
Image Denoising Using Multiple Compaction Domains. P. Ishwar, K. Ratakonda, P. Moulin, N. Ahuja (University of Illinois, Urbana-Champaign, USA). III-1889
4 / 42
Talk Goals
Sparse signal recovery is an interesting area with many potential applications.
The tools developed for solving the sparse signal recovery problem are useful for signal processing practitioners to know.
5 / 42
Outline 1 Talk Objective 2 Sparse Signal Recovery Problem 3 Applications 4 Computational Algorithms 5 Performance Evaluation 6 Conclusion 6 / 42
Problem Description
t = Φw + ε
t is the N × 1 measurement vector.
Φ is the N × M dictionary matrix, with M ≫ N.
w is the M × 1 desired vector, which is sparse with K nonzero entries.
ε is the additive noise, modeled as white Gaussian.
7 / 42
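To make the setup concrete, here is a minimal NumPy sketch of the measurement model t = Φw + ε. The dimensions N, M, K and the noise level are illustrative choices, not values taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, K = 20, 40, 7                         # illustrative sizes: N measurements, M atoms, K nonzeros
Phi = rng.standard_normal((N, M))           # dictionary (N x M, with M >> N in general)
Phi /= np.linalg.norm(Phi, axis=0)          # unit-norm columns

w = np.zeros(M)                             # sparse vector with K nonzero entries
support = rng.choice(M, size=K, replace=False)
w[support] = rng.standard_normal(K)

noise = 0.01 * rng.standard_normal(N)       # additive white Gaussian noise
t = Phi @ w + noise                         # N x 1 measurement vector
```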
Problem Statement
Noise-free case: Given a target signal t and a dictionary Φ, find the weights w that solve
    min_w Σ_{i=1}^{M} I(w_i ≠ 0)  such that  t = Φw,
where I(·) is the indicator function.
Noisy case: Given a target signal t and a dictionary Φ, find the weights w that solve
    min_w Σ_{i=1}^{M} I(w_i ≠ 0)  such that  ‖t − Φw‖²₂ ≤ β.
8 / 42
Complexity
Search over all possible subsets, which would mean a search over a total of C(M, K) subsets. The problem is NP-hard. With M = 30, N = 20, and K = 10 there are about 3 × 10⁷ subsets (very complex).
A branch-and-bound algorithm can be used to find the optimal solution. The space of subsets searched is pruned, but the search may still be very complex.
The indicator function is not continuous and so is not amenable to standard optimization tools.
Challenge: Find low-complexity methods with acceptable performance.
9 / 42
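As a quick sanity check on the combinatorics, this small snippet counts the candidate supports for the M = 30, K = 10 case quoted above.

```python
from math import comb

# Exhaustive search must consider every K-sized support out of M columns.
print(comb(30, 10))  # 30045015, i.e. roughly 3 x 10^7 subsets for M = 30, K = 10
```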
Outline 1 Talk Objective 2 Sparse Signal Recovery Problem 3 Applications 4 Computational Algorithms 5 Performance Evaluation 6 Conclusion 10 / 42
Applications
Signal Representation (Mallat, Coifman, Wickerhauser, Donoho, ...)
EEG/MEG (Leahy, Gorodnitsky, Ioannides, ...)
Functional Approximation and Neural Networks (Chen, Natarajan, Cun, Hassibi, ...)
Bandlimited extrapolation and spectral estimation (Papoulis, Lee, Cabrera, Parks, ...)
Speech Coding (Ozawa, Ono, Kroon, Atal, ...)
Sparse channel equalization (Fevrier, Greenstein, Proakis, ...)
Compressive Sampling (Donoho, Candès, Tao, ...)
11 / 42
DFT Example
N is chosen to be 64 in this example.
Measurement: t[n] = cos(ω₀ n), n = 0, 1, 2, ..., N − 1, with ω₀ = (2π/64) · (33/2).
Dictionary elements: φ_k = [1, e^{jω_k}, e^{j2ω_k}, ..., e^{j(N−1)ω_k}]^T, with ω_k = 2πk/M.
Consider M = 64, 128, 256 and 512.
NOTE: The frequency components of t are included in the dictionary Φ for M = 128, 256, and 512, but not for M = 64.
12 / 42
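The sketch below, assuming the dictionary columns are the complex exponentials φ_k defined above, builds the overcomplete DFT dictionaries and checks for which M the tone frequency ω₀ lands exactly on the dictionary grid; the helper name dft_dictionary is illustrative.

```python
import numpy as np

N = 64
n = np.arange(N)
w0 = (2 * np.pi / 64) * (33 / 2)   # tone frequency from the example
t = np.cos(w0 * n)                 # measurement vector

def dft_dictionary(N, M):
    # Overcomplete DFT dictionary: column k is [1, e^{j w_k}, ..., e^{j(N-1) w_k}]^T, w_k = 2*pi*k/M
    rows = np.arange(N)[:, None]
    cols = np.arange(M)[None, :]
    return np.exp(1j * 2 * np.pi * rows * cols / M)

for M in (64, 128, 256, 512):
    Phi = dft_dictionary(N, M)
    # The tone sits exactly on the dictionary grid only when w0 is a multiple of 2*pi/M.
    on_grid = np.isclose((w0 * M / (2 * np.pi)) % 1, 0)
    print(M, Phi.shape, bool(on_grid))
```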
FFT Results
[Figure: FFT results for M = 64, 128, 256 and 512]
13 / 42
Magnetoencephalography (MEG)
Given measured magnetic fields outside the head, the goal is to locate the responsible current sources inside the head.
[Figure: sensor layout; legend: MEG measurement points and candidate source locations]
14 / 42
MEG Example
At any given time, typically only a few sources are active (SPARSE).
[Figure: MEG measurement points, candidate source locations, and active source locations]
15 / 42
MEG Formulation
Forming the overcomplete dictionary Φ:
The number of rows equals the number of sensors; the number of columns equals the number of possible source locations.
Φ_ij is the magnetic field measured at sensor i produced by a unit current at location j.
We can compute Φ using a boundary-element brain model and Maxwell's equations.
Many different combinations of current sources can produce the same observed magnetic field t. By finding the sparsest signal representation/basis, we find the smallest number of sources capable of producing the observed field. Such a representation is of neurophysiological significance.
16 / 42
Compressive Sampling
[Figure: block diagrams contrasting transform coding and compressed sensing. The signal b has a sparse representation w; compressed sensing acquires a small number of linear measurements t of b directly.]
Computation: recover the sparse w from t, then reconstruct b.
17 / 42
Outline 1 Talk Objective 2 Sparse Signal Recovery Problem 3 Applications 4 Computational Algorithms 5 Performance Evaluation 6 Conclusion 18 / 42
Potential Approaches
The problem is NP-hard, so alternative strategies are needed:
Greedy search techniques: Matching Pursuit, Orthogonal Matching Pursuit.
Minimizing diversity measures: the indicator function is not continuous, so define surrogate cost functions that are more tractable and whose minimization leads to sparse solutions, e.g. ℓ₁ minimization.
Bayesian methods: make appropriate statistical assumptions on the solution and apply estimation techniques to identify the desired sparse solution.
19 / 42
Greedy Search Methods: Matching Pursuit
Select the column that is most aligned with the current residual.
Remove its contribution from the residual.
Stop when the residual becomes small enough or when a sparsity threshold is exceeded.
Some variations:
Matching Pursuit [Mallat & Zhang]
Orthogonal Matching Pursuit [Pati et al.]
Order Recursive Matching Pursuit (ORMP)
20 / 42
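A minimal sketch of Orthogonal Matching Pursuit along the lines of the steps above: greedy column selection followed by a least-squares refit on the selected columns. The function name and stopping parameters are illustrative, not from the talk.

```python
import numpy as np

def orthogonal_matching_pursuit(Phi, t, K, tol=1e-6):
    """Greedy sketch: pick the column most aligned with the residual, then
    re-fit all selected columns by least squares (the 'orthogonal' step)."""
    M = Phi.shape[1]
    residual = t.copy()
    support, coef = [], np.zeros(0)
    for _ in range(K):
        if np.linalg.norm(residual) <= tol:
            break
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # most correlated column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], t, rcond=None)
        residual = t - Phi[:, support] @ coef
    w = np.zeros(M)
    w[support] = coef
    return w, support
```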
Inverse Techniques
For the system of equations Φw = t, the solution set is characterized by {w_s : w_s = Φ⁺t + v, v ∈ N(Φ)}, where N(Φ) denotes the null space of Φ and Φ⁺ = Φ^T(ΦΦ^T)⁻¹.
Minimum-norm solution: the minimum ℓ₂-norm solution w_mn = Φ⁺t is a popular choice.
Noisy case: a regularized ℓ₂-norm solution is often employed, given by w_reg = Φ^T(ΦΦ^T + λI)⁻¹t.
Problem: the solution is not sparse.
21 / 42
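For reference, a short NumPy sketch of the two closed-form solutions on this slide; the function names are hypothetical.

```python
import numpy as np

def minimum_norm_solution(Phi, t):
    # w_mn = Phi^T (Phi Phi^T)^{-1} t, the minimum l2-norm solution of Phi w = t
    return Phi.T @ np.linalg.solve(Phi @ Phi.T, t)

def regularized_solution(Phi, t, lam=1e-2):
    # w_reg = Phi^T (Phi Phi^T + lam I)^{-1} t, used in the noisy case
    N = Phi.shape[0]
    return Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(N), t)

# Neither of these estimates is sparse in general: the energy of w is
# typically spread over all M coefficients.
```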
Diversity Measures
Functionals whose minimization leads to sparse solutions.
Many examples are found in the fields of economics, social science and information theory.
These functionals are concave, which leads to difficult optimization problems.
22 / 42
Examples of Diversity Measures
ℓ_p Diversity Measure (p ≤ 1):
    E^(p)(w) = Σ_{l=1}^{M} |w_l|^p,  p ≤ 1
Gaussian Entropy:
    H_G(w) = Σ_{l=1}^{M} ln |w_l|²
Shannon Entropy:
    H_S(w) = −Σ_{l=1}^{M} w̃_l ln w̃_l,  where w̃_l = w_l² / ‖w‖²
23 / 42
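A small sketch of the three measures above as they might be evaluated numerically; the eps floors are an added safeguard against log(0) and are not part of the definitions.

```python
import numpy as np

def lp_diversity(w, p=0.5):
    # E^(p)(w) = sum_l |w_l|^p, p <= 1
    return np.sum(np.abs(w) ** p)

def gaussian_entropy(w, eps=1e-12):
    # H_G(w) = sum_l ln |w_l|^2
    return np.sum(np.log(np.abs(w) ** 2 + eps))

def shannon_entropy(w, eps=1e-12):
    # H_S(w) = -sum_l wt_l ln wt_l with wt_l = w_l^2 / ||w||^2
    wt = w ** 2 / (np.sum(w ** 2) + eps)
    return -np.sum(wt * np.log(wt + eps))
```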
Diversity Minimization
Noiseless case:
    min_w E^(p)(w) = Σ_{l=1}^{M} |w_l|^p  subject to  Φw = t
Noisy case:
    min_w ‖t − Φw‖² + λ Σ_{l=1}^{M} |w_l|^p
p = 1 is a very popular choice because of the convex nature of the optimization problem (Basis Pursuit and Lasso).
24 / 42
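One common way to solve the noisy p = 1 problem is iterative soft thresholding (ISTA). The sketch below is one such solver under illustrative parameter choices; it is not the specific Basis Pursuit or Lasso implementation referenced in the talk.

```python
import numpy as np

def ista_l1(Phi, t, lam=0.05, n_iter=500):
    """ISTA sketch for min_w ||t - Phi w||^2 + lam * sum_l |w_l|
    (the p = 1 diversity minimization, i.e. the Lasso / Basis Pursuit denoising form)."""
    L = np.linalg.norm(Phi, 2) ** 2          # squared spectral norm of Phi
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ w - t)         # (half the) gradient of the quadratic term
        z = w - grad / L                     # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)  # soft threshold
    return w
```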
Bayesian Methods
Maximum a posteriori (MAP) approach:
Assume a sparsity-inducing prior on the latent variable w.
Develop an appropriate MAP estimation algorithm.
Empirical Bayes:
Assume a parameterized prior on the latent variable w (hyperparameters).
Marginalize over the latent variable w and estimate the hyperparameters.
Determine the posterior distribution of w and obtain a point estimate as the mean, mode or median of this density.
25 / 42
Generalized Gaussian Distribution
Density function:
    f(x) = p / (2σ Γ(1/p)) · exp(−|x/σ|^p)
p < 2: supergaussian; p > 2: subgaussian.
[Figure: densities for p = 0.5, 1, 2, 10]
26 / 42
Student t Distribution
Density function:
    f(x) = Γ((ν+1)/2) / (√(πν) Γ(ν/2)) · (1 + x²/ν)^{−(ν+1)/2}
[Figure: densities for ν = 1, 2, 5 and the Gaussian]
27 / 42
MAP using a Supergaussian Prior
Assuming a Gaussian likelihood model for f(t|w), we can find the MAP weight estimates:
    w_MAP = arg max_w log f(w|t)
          = arg max_w (log f(t|w) + log f(w))
          = arg min_w ‖Φw − t‖² + λ Σ_{l=1}^{M} |w_l|^p
This is essentially a regularized LS framework. The interesting range for p is p ≤ 1.
28 / 42
MAP Estimate: FOCal Underdetermined System Solver (FOCUSS)
The approach involves solving a sequence of regularized weighted minimum-norm problems:
    q_{k+1} = arg min_q ‖Φ_{k+1} q − t‖² + λ‖q‖²
where Φ_{k+1} = Φ M_{k+1}, M_{k+1} = diag(|w_{k,l}|^{1−p/2}), and w_{k+1} = M_{k+1} q_{k+1}.
p = 0 gives ℓ₀ minimization and p = 1 gives ℓ₁ minimization.
29 / 42
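A sketch of the regularized FOCUSS iteration written out above; the minimum-norm initialization, λ, p, the iteration count and the small eps floor are illustrative choices rather than values from the talk.

```python
import numpy as np

def focuss(Phi, t, p=0.8, lam=1e-4, n_iter=30, eps=1e-8):
    """Regularized FOCUSS sketch: repeatedly solve a reweighted minimum-norm
    problem with M_{k+1} = diag(|w_k|^(1 - p/2)) and map back via w = M q."""
    N = Phi.shape[0]
    # minimum-norm style initialization (a common, but not the only, choice)
    w = Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(N), t)
    for _ in range(n_iter):
        m = np.abs(w) ** (1.0 - p / 2.0) + eps          # diagonal of M_{k+1}
        Phi_k = Phi * m                                  # Phi M_{k+1}: scale the columns
        q = Phi_k.T @ np.linalg.solve(Phi_k @ Phi_k.T + lam * np.eye(N), t)
        w = m * q                                        # w_{k+1} = M_{k+1} q_{k+1}
    return w
```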
FOCUSS Summary
For p < 1, the solution is initial-condition dependent:
  Prior knowledge can be incorporated; the minimum-norm solution is a suitable choice.
  One can retry with several initial conditions.
Computationally more complex than Matching Pursuit algorithms.
The sparsity-versus-tolerance tradeoff is more involved.
The factor p allows a trade-off between the speed of convergence and the sparsity obtained.
30 / 42
Convergence Errors vs. Structural Errors
[Figure: two plots of −log p(w|t) illustrating a convergence error and a structural error]
w* = solution we have converged to
w₀ = maximally sparse solution
31 / 42
Shortcomings of these Methods
p = 1: Basis Pursuit / Lasso often suffers from structural errors. Therefore, regardless of initialization, we may never find the best solution.
p < 1: The FOCUSS class of algorithms suffers from numerous suboptimal local minima and therefore convergence errors. In the low-noise limit, the number of local minima K satisfies
    C(M−1, N) + 1 ≤ K ≤ C(M, N).
At most local minima, the number of nonzero coefficients equals N, the number of rows in the dictionary.
32 / 42
Empirical Bayesian Method
Main steps:
Assume a parameterized prior f(w|γ).
Marginalize: f(t|γ) = ∫ f(t, w|γ) dw = ∫ f(t|w) f(w|γ) dw.
Estimate the hyperparameters γ̂.
Determine the posterior density of the latent variable, f(w|t, γ̂).
Obtain a point estimate of w.
Example: Sparse Bayesian Learning (SBL, Tipping).
33 / 42
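A minimal EM-style sketch of SBL under a Gaussian prior w ~ N(0, diag(γ)) with a fixed noise variance. This is an assumption-laden toy version of the steps above, not Tipping's full implementation, which also updates the noise variance and prunes basis functions.

```python
import numpy as np

def sparse_bayesian_learning(Phi, t, sigma2=1e-3, n_iter=100, floor=1e-8):
    """EM sketch of SBL: hyperparameters gamma are re-estimated from the posterior
    moments of w; gamma_i shrinking toward zero drives w_i toward zero."""
    N, M = Phi.shape
    gamma = np.ones(M)
    mu = np.zeros(M)
    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        # marginal covariance of t: sigma2 * I + Phi Gamma Phi^T
        Sigma_t = sigma2 * np.eye(N) + Phi @ Gamma @ Phi.T
        # posterior moments of w given t and the current hyperparameters
        PhiT_Sigmat_inv = np.linalg.solve(Sigma_t, Phi).T          # Phi^T Sigma_t^{-1}
        mu = Gamma @ PhiT_Sigmat_inv @ t
        Sigma_w = Gamma - Gamma @ PhiT_Sigmat_inv @ Phi @ Gamma
        # EM update of the hyperparameters
        gamma = mu ** 2 + np.diag(Sigma_w)
        gamma = np.maximum(gamma, floor)                            # numerical floor
    return mu, gamma
```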
Outline 1 Talk Objective 2 Sparse Signal Recovery Problem 3 Applications 4 Computational Algorithms 5 Performance Evaluation 6 Conclusion 34 / 42
DFT Example: Results Using FOCUSS
[Figure: FOCUSS results for the DFT example]
35 / 42
Empirical Tests
Random overcomplete bases Φ and sparse weight vectors w₀ were generated and used to create target signals t, i.e., t = Φw₀ + ε.
SBL (Empirical Bayes) was compared with Basis Pursuit and FOCUSS (with various p values) in the task of recovering w₀.
36 / 42
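A hypothetical harness for this kind of test: it generates a random trial (Φ, t, w₀) at a given SNR and scores support recovery by comparing the K largest-magnitude entries of an estimate against the true support. The exact error criterion and trial setup used in the talk's experiments may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_trial(N=20, M=40, K=7, snr_db=None):
    """Random unit-norm dictionary, K-sparse w0, and target t (noiseless if snr_db is None)."""
    Phi = rng.standard_normal((N, M))
    Phi /= np.linalg.norm(Phi, axis=0)
    w0 = np.zeros(M)
    w0[rng.choice(M, size=K, replace=False)] = rng.standard_normal(K)
    t = Phi @ w0
    if snr_db is not None:
        noise = rng.standard_normal(N)
        noise *= np.linalg.norm(t) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
        t = t + noise
    return Phi, t, w0

def support_error(w_hat, w0, K=7):
    """Flag an error when the K largest-magnitude entries of the estimate
    do not coincide with the true support of w0."""
    est = set(np.argsort(np.abs(w_hat))[-K:])
    return est != set(np.flatnonzero(w0))
```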
Experiment I: Comparison with Noiseless Data
Randomized Φ (20 rows by 40 columns).
Diversity of the true w₀ is 7.
Results are from 1000 independent trials.
NOTE: An error occurs whenever an algorithm converges to a solution w not equal to w₀.
37 / 42
Experiment II: Comparison with Noisy Data
Randomized Φ (20 rows by 40 columns).
20 dB AWGN.
Diversity of the true w₀ is 7.
Results are from 1000 independent trials.
NOTE: We no longer distinguish convergence errors from structural errors.
38 / 42
MEG Example
Data based on the CTF MEG system at UCSF with 275 scalp sensors.
Forward model based on 40,000 points (vertices of a triangular grid) and 3 different scale factors.
Dimensions of the lead-field matrix (dictionary): 275 by 120,000. Overcompleteness ratio approximately 436.
Up to 40 unknown dipole components were randomly placed throughout the sample space.
SBL was able to resolve roughly twice as many dipoles as the next best method (Ramirez 2005).
40 / 42
Outline 1 Talk Objective 2 Sparse Signal Recovery Problem 3 Applications 4 Computational Algorithms 5 Performance Evaluation 6 Conclusion 41 / 42
Summary
Discussed the role of sparseness in linear inverse problems.
Discussed applications of sparsity.
Discussed methods for computing sparse solutions:
  Matching Pursuit algorithms
  MAP methods (FOCUSS algorithm and ℓ₁ minimization)
  Empirical Bayes (Sparse Bayesian Learning, SBL)
The expectation is that there will be continued growth in the application domain as well as in algorithm development.
42 / 42