QUANTIZED PRINCIPAL COMPONENT ANALYSIS WITH APPLICATIONS TO LOW-BANDWIDTH IMAGE COMPRESSION AND COMMUNICATION. D. Wooden. M. Egerstedt. B.K. Ghosh.



International Journal of Innovative Computing, Information and Control, ICIC International © 2005, ISSN 1349-4198, Volume x, Number x, 2005

QUANTIZED PRINCIPAL COMPONENT ANALYSIS WITH APPLICATIONS TO LOW-BANDWIDTH IMAGE COMPRESSION AND COMMUNICATION

D. Wooden, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA, wooden@ece.gatech.edu
M. Egerstedt, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA, magnus@ece.gatech.edu
B.K. Ghosh, Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA, ghosh@netra.wustl.edu

Abstract. In this paper we show how Principal Component Analysis can be mapped to a quantized domain in an optimal manner. In particular, given a low-bandwidth communication channel over which a given set of data is to be transmitted, we show how to best compress the data. Applications to image compression are described and examples are provided that support the practical soundness of the proposed method.

Keywords: principal component analysis, quantization, image compression

D. WOODEN, M. EGERSTEDT, AND B.K. GHOSH

1. Introduction. Principal Component Analysis (PCA) is an algebraic tool for compressing large sets of statistical data in a structured manner. However, the reduction results in real-valued descriptions of the data. In this paper, we take the compression one step further by insisting on the use of only a finite number of bits for its representation. This is necessary in a number of applications where the data is transmitted over low-bandwidth communication channels. In particular, the inspiration for this work came from the need for multiple mobile robots to share visual information about their environment.

Assuming that the data x_1, ..., x_N take on values in a d-dimensional space, one can identify the d principal directions, coinciding with the eigenvectors of the covariance matrix, given by

    C = (1/N) Σ_{i=1}^{N} (x_i − m)(x_i − m)^T,    (1)

where m is the mean of the data. If our goal is to compress the data set to a set of dimension n < d, we would pick the n dominant directions, i.e. the directions of maximum variation of the data. This results in an optimal (in the sense of least squared error) reduction of the dimension from d to n. For example, if n = 0 then only the mean is used, while n = 1 corresponds to a 1-dimensional representation of the data. The fact that the reduction can be done in a systematic and optimal manner has led to the widespread use of PCA in a number of areas, ranging from process control [7], to weather prediction models [3], to image compression [2]. In this paper, we focus on and draw inspiration from the image processing problem in particular, even though the results are of a general nature.

2. Principal Component Analysis. Suppose we have a stochastic process with samples x_k ∈ R^d, k = 1, ..., N, where N is the number of samples taken. Let

1. m = (1/N) Σ_{k=1}^{N} x_k be the mean of the data,
2. e_i ∈ R^d be the ith principal direction of the system, where i ∈ {1, ..., d},
3. a_i ∈ R be the ith principal component, i.e. a_i = e_i^T (x − m), associated with the sample point x ∈ R^d.

We can then reconstruct x perfectly from its principal components and the system's principal directions as

    x = m + Σ_{i=1}^{d} a_i e_i.    (2)

If we wish to reduce the system complexity from a d-dimensional data set to n dimensions, only the n principal directions (corresponding to the n largest eigenvalues of the covariance matrix) should be chosen.

The main contribution in this paper is not the problem of reducing the dimension of the data set, but rather the problem of communicating the data. Given n ≤ d transmittable real numbers, the optimal choice for the reconstruction of x from these numbers is simply given by the n largest (in magnitude) principal components. But because the a_i's are all real-valued, we are required to quantize them prior to communication, and we wish then to transmit only the most significant quanta. In this paper we derive a mapping of the PCA algorithm to a quantized counterpart.

3. Quantized Components. Let r ∈ N be the resolution of our system. For example, if r = 10, then we are communicating decimal integers. If r = 16, then we are communicating nibbles (i.e. half-bytes). Now, let K ∈ Z be the largest integer exponent of r such that

    max_{i ∈ {1,...,d}} |a_i| · r^{−K} ∈ [1, r − 1].    (3)

With this definition of r and K, we are equipped to define the quantities by which we will decompose the principal components. We name the first quantity the quantized component,

    z_i = arg min_{ζ ∈ Z} |a_i − ζ|,    (4)

where Z = {r^K(−r + 1), r^K(−r + 2), ..., r^K(r − 1)}. As a result, we have that

    |z_i| ≤ (r − 1) r^K.    (5)

In other words, z_i is obtained from the integer in the range [−r + 1, r − 1] which, when scaled by r^K, minimizes the distance to the principal component a_i. The second quantity, called the remainder component, is simply defined as

    y_i = a_i − z_i,    (6)

and therefore

    |y_i| < (1/2) r^K.    (7)

The remainder component is equivalent to the round-off error between z_i and a_i. With these definitions, we define the quantized version of the original principal components to be a_i^Q = z_i, i = 1, ..., d. And, in a manner similar to the reconstruction of x from its principal components, we can reconstruct a quantized version of x from its quantized principal components:

    x^Q = m + Σ_{i=1}^{d} a_i^Q e_i.    (8)

We presume that sufficient resources may be allocated during a start-up period in which the transmitter and receiver can agree upon the real-valued mean and principal directions. Thereafter, regular transmission of quantized components commences. Now, the question remains: if we may only transmit one quantized component, which one should we pick?
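Eqs. (3)-(7) can be realized in code. The following is a minimal sketch, not the authors' implementation: the rule for K is an assumption (the smallest K for which the quantizer range, padded by half a quantum, covers max|a_i|), since the exact form of Eq. (3) is ambiguous in this transcription; Eq. (4) is realized by rounding a_i · r^{-K} to the nearest integer and clipping it to [-(r-1), r-1]:

```python
def choose_K(a, r):
    """Smallest integer K such that every a_i lies within half a quantum of
    the representable range [-(r-1)r^K, (r-1)r^K] (assumed reading of Eq. (3))."""
    peak = max(abs(v) for v in a)
    K = 0
    while peak > (r - 0.5) * r ** K:
        K += 1
    while K > -12 and peak <= (r - 0.5) * r ** (K - 1):
        K -= 1          # allow negative exponents for small-magnitude data
    return K

def quantize(a, r):
    """Quantized components z_i (Eq. (4)) and remainder components y_i (Eq. (6))."""
    K = choose_K(a, r)
    step = r ** K
    z = []
    for v in a:
        q = round(v / step)                    # nearest quantum
        q = max(-(r - 1), min(r - 1, q))       # clip to Z's range, cf. Eq. (5)
        z.append(q * step)
    y = [v - zv for v, zv in zip(a, z)]
    return K, z, y
```

With r = 10 this gives, for example, K = 2, z_i = 300 and y_i = 45.6 for a_i = 345.6, in line with the examples of Table 1.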

    a_i      r    K    z_i     y_i
    345.6    10   2    300     45.6
    345.6    24   1    336     9.6
    −984.1   10   3    −1000   15.9
    −94.1    10   1    −90     −4.1

Table 1. Example Set of Principal, Quantized, and Remainder Components.

Problem 1. Identify the quantized component which minimizes the error between x^Q and x. In other words, solve

    arg min_{k ∈ {1,...,d}} ‖(m + Σ_{i=1}^{d} δ_ik a_i^Q e_i) − x‖,    (9)

where δ_ik = 1 if i = k, and 0 otherwise.

For the sake of clarity, we present Table 1 as an example of principal components and their corresponding values of r and K, as well as their quantized and remainder components.

4. Main Result. Let S = {1, ..., d} and S_z = {s ∈ S : |z_s| ≥ |z_m|, ∀m ∈ S}.

Theorem 4.1. If |S_z| = 1, i.e. ∃n ∈ S such that |z_n| > |z_m| ∀m ∈ S, m ≠ n, then n is the solution to Problem 1, i.e. z_n is the optimal component to transmit. (|S_z| denotes the cardinality of S_z.)

Proof. Define the cost function

    J(a^Q) = ‖x^Q − x‖²
           = ‖(m + Σ_{i=1}^{d} a_i^Q e_i) − (m + Σ_{i=1}^{d} a_i e_i)‖²
           = ‖Σ_{i=1}^{d} e_i (a_i^Q − a_i)‖²
           = Σ_{i=1}^{d} ((a_i^Q)² − 2 a_i^Q a_i + a_i²)
           = Σ_{i=1}^{d} (z_i² − 2 z_i (z_i + y_i) + a_i²)
           = Σ_{i=1}^{d} (−(z_i² + 2 z_i y_i) + a_i²).    (10)

Now, define a similar cost function

    J_k(a^Q) = ‖m + Σ_{i=1}^{d} δ_ik a_i^Q e_i − x‖².

Hence,

    J_k(a^Q) = Σ_{i=1}^{d} δ_ik ((a_i^Q)² − 2 a_i^Q a_i) + Σ_{i=1}^{d} a_i²
             = (a_k^Q)² − 2 a_k^Q a_k + Σ_{i=1}^{d} a_i²
             = −(z_k² + 2 z_k y_k) + Σ_{i=1}^{d} a_i².    (11)

Taking n, m ∈ S, we may extend Eq. (11) to write

    J_n − J_m = −(z_n² + 2 z_n y_n) + (z_m² + 2 z_m y_m)    (12)
              = −z_n² − 2|z_n||y_n| sgn(z_n) sgn(y_n) + z_m² + 2|z_m||y_m| sgn(z_m) sgn(y_m),

where sgn(z_i) indicates the sign (+ or −) of z_i. Since sgn(z_i) sgn(y_i) ∈ {−1, +1}, we may write

    J_n − J_m ≤ −z_n² + 2|z_n||y_n| + z_m² + 2|z_m||y_m|.    (13)

Now, assume that |z_n| > |z_m|, which gives us

    |z_n| = |z_m| + α r^K,    (14)

where α ∈ Z⁺. Hence, we wish to show that J_n − J_m < 0. Substituting Eq. (14) into Eq. (13),

    J_n − J_m ≤ −(|z_m| + α r^K)² + 2(|z_m| + α r^K)|y_n| + z_m² + 2|z_m||y_m|.

Using Eq. (7), we conclude that

    J_n − J_m < −(|z_m| + α r^K)² + (|z_m| + α r^K) r^K + z_m² + |z_m| r^K
              = 2|z_m| r^K − 2|z_m| α r^K + r^{2K}(α − α²)
              = (1 − α)(2|z_m| r^K + α r^{2K}).

And finally, recalling that α ≥ 1,

    J_n − J_m < 0,    (15)

and the theorem follows.

Now, define S_= = {s ∈ S_z : sgn(z_s) = sgn(y_s)} and S_≠ = S_z \ S_=. Moreover, define

    S_y^+ = {s ∈ S_= : |y_s| ≥ |y_l|, ∀l ∈ S_=}.

In other words, S_y^+ refers to the principal component(s) with the largest quantized and remainder components which also have equal signs.

Theorem 4.2. If |S_z| > 1 and |S_y^+| > 0, i.e. ∃n ∈ S_y^+ such that |y_n| ≥ |y_m|, |z_n| = |z_m| ∀m ∈ S_z, and sgn(z_n) = sgn(y_n), then n is the solution to Problem 1, i.e. z_n is the optimal component to transmit.

Proof. By assumption, |S_z| > 1, and it is a direct consequence of the proof of Theorem 4.1 that we should choose between the elements of S_z for a quantized component to transmit. Let n ∈ S_y^+ and m ∈ S_z. Recalling Eq. (12),

    J_n − J_m = −z_n² − 2 z_n y_n + z_m² + 2 z_m y_m
              = −2 z_n y_n + 2 z_m y_m
              = −2|z_n| (|y_n| − |y_m| sgn(z_m) sgn(y_m)).

It is true that either m ∈ S_= or m ∈ S_≠. When m ∈ S_≠, sgn(z_m) sgn(y_m) = −1 and therefore

    J_n − J_m = −2|z_n| (|y_n| + |y_m|) < 0.    (16)

On the other hand, when m ∈ S_=, sgn(z_m) sgn(y_m) = +1 and

    J_n − J_m = −2|z_n| (|y_n| − |y_m|).    (17)

Note again that n ∈ S_y^+ (i.e. |y_n| ≥ |y_m|). Hence, Eq. (17) becomes

    J_n − J_m ≤ 0.    (18)

Furthermore, J_n − J_m = 0 only when |y_n| = |y_m| and |z_n| = |z_m|. In other words, |a_n| = |a_m|, and clearly the two costs are equal.

Finally, define

    S_y^− = {s ∈ S_≠ : |y_s| ≤ |y_l|, ∀l ∈ S_≠}.

Theorem 4.3. If |S_z| > 1 and |S_y^+| = 0, i.e. sgn(z_s) ≠ sgn(y_s) ∀s ∈ S_z, and ∃n ∈ S_y^− such that |z_n| = |z_m| and |y_n| ≤ |y_m| ∀m ∈ S_z, then n is the solution to Problem 1, i.e. z_n is the optimal component to transmit.

Proof. As was the case in Theorem 4.2, |S_z| > 1, but here S_= and S_y^+ are empty (indicating that the signs of the quantized and remainder components differ for all those in S_z). We prove then that the optimal quantized component to transmit is the one with the largest |z_i| and the smallest |y_i|. Let n ∈ S_y^− and m ∈ S_z. Recalling again Eq. (12),

    J_n − J_m = −z_n² − 2 z_n y_n + z_m² + 2 z_m y_m
              = −2 z_n y_n + 2 z_m y_m
              = 2|z_n| (|y_n| − |y_m|) ≤ 0,

and the theorem follows.

Theorem 4.1 tells us that we should transmit the quantized component largest in magnitude. If this is not unique, Theorem 4.2 tells us to send the z_i which also has the largest remainder component pointing in the same direction as its quantized component (i.e. sgn(z_i) = sgn(y_i)). According to Theorem 4.3, if no such remainder component exists (i.e. all y_i point opposite of their quantized counterparts), then we send the z_i with the smallest remainder component.

When a unique largest (in magnitude) quantized component exists, it is a direct result of Eqs. (6) and (7) that it corresponds to the largest (in magnitude) principal component. In other words, Theorem 4.1 tells us to send the quantized component of the largest principal component. As a consequence of Eq. (6), when the remainder component has the same sign as the quantized component, the corresponding principal component is larger (again, in magnitude) than the quantized component. Moreover, given |z_i| and that z_i and y_i have matching signs, |a_i| is maximized by the largest value possible for |y_i|. Theorem 4.2 tells us that, given the z_i which are largest in magnitude, transmit the z_i which has matching remainder and quantized component signs and has the largest |y_i|. In other words, Theorem 4.2 tells us to transmit the z_i corresponding to the largest |a_i|.
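The three theorems combine into a single selection rule, which can be sketched as follows (an illustrative helper, not the authors' code; `select_component` and `sign` are hypothetical names):

```python
def sign(v):
    return (v > 0) - (v < 0)

def select_component(z, y):
    """Index of the optimal quantized component to transmit (Theorems 4.1-4.3)."""
    zmax = max(abs(v) for v in z)
    S_z = [i for i, v in enumerate(z) if abs(v) == zmax]
    if len(S_z) == 1:                         # Theorem 4.1: unique largest |z_i|
        return S_z[0]
    S_eq = [i for i in S_z if sign(z[i]) == sign(y[i])]
    if S_eq:                                  # Theorem 4.2: matching signs, largest |y_i|
        return max(S_eq, key=lambda i: abs(y[i]))
    return min(S_z, key=lambda i: abs(y[i]))  # Theorem 4.3: smallest |y_i|
```

Since, by Eq. (11), minimizing J_k is the same as maximizing z_k(z_k + 2y_k), the rule can be checked by brute force against that score on small examples.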

Similarly, given |z_i| and mismatching signs of z_i and y_i, |a_i| is maximized by the smallest value possible for |y_i|. Theorem 4.3 tells us that, given the z_i which are largest in magnitude, transmit the z_i which has mismatching remainder and quantized component signs and has the smallest |y_i|. In other words, Theorem 4.3 tells us to transmit the z_i corresponding to the largest |a_i|.

To summarize: Theorems 4.1, 4.2, and 4.3 tell us, given a set of principal, quantized, and remainder components, which quantized component should be transmitted. This optimal quantized component is always the one corresponding to the principal component largest in magnitude.

5. Iterative Algorithm. Naturally, in a practical application, we will want to send a succession of quantized components rather than just one, and so an iterative algorithm is necessary. Fortunately, the nature of our approach lends itself very easily to this, and the algorithm is presented below:

    Given a sample set of training data:
        Compute the sample mean and principal directions.
    Given a new sample x and the system resolution r:
        Compute x's principal components a_i, i = 1, ..., d.
        Set b_i = a_i.
        Compute K via Eq. (3).
        Compute the quantized components z_i via Eq. (4).
        Compute the remainder components y_i = b_i − z_i.
        For the number of desired transmissions:
            Determine the optimal quantized component z_opt via Theorems 4.1-4.3.
            Transmit z_opt.
            Set b_opt = y_opt.
            Recompute z_opt from b_opt via Eq. (4).
            Recompute K via Eq. (3).
        End

The simplicity of this algorithm is appealing. Moreover, at each iteration there is little to compute, as little changes from one loop to the next. Indeed, the real burden of this approach comes at the first step, in the computation of the principal directions.

In the next section, we will discuss some of the complexity issues associated with this problem, as well as some methods for reducing it, through a particular image compression example.

6. Image Compression Examples. We apply the proposed quantization scheme to a problem in which images are to be compressed and transmitted over low-bandwidth channels. Two separate data sets are used. The first set is comprised of 12 56x56 pixel grayscale images of very similar scenes (Figure 4). The second is comprised of 118 64x96 pixel grayscale images of very different scenes (Figure 5).

The images themselves are represented in two different ways. In the first method, as is common practice in image processing [4][5], the images are broken into 8x8 distinct blocks. Principal components and directions are computed over these 64-pixel pieces, and the quantization algorithm is applied to each block. At each additional iteration, one more quantized component is transmitted per block of the image. In the second method, principal directions and components are computed over the entire image as a whole. Under this method, only a single quantized component is communicated per iteration.

There are computational advantages and disadvantages associated with employing either method. In the first case, a greater number of transmissions is required for the same drop in error, but computing the principal directions is easy, and the memory required to hold them is small. In the second case, the mean-squared error drops off dramatically fast, but computing and maintaining the principal directions can be practically intractable. In other words, the computational burden associated with computing larger and larger principal directions can be justified by the lower number of z_i's needed to reconstruct the image.

Figure 1 shows successive iterations of the algorithm on selected images from our two data sets. Each pairing in the figure ((a) with (b), (c) with (d), etc.) shows two progressions of an image, where the first progression is based on block representations of the image, and the second is based on full-image representations. The value of r was set to 16, meaning that at each iteration, 4 additional bits of information per block were transmitted.

Figure 1. Progression of Quantized Images, labeled { Version (Image Set, Image Number) }: (a) Block (1,4); (b) Whole (1,4); (c) Block (2,3); (d) Whole (2,3); (e) Block (2,8); (f) Whole (2,8); (g) Block (2,1); (h) Whole (2,1).
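Progressions such as those in Figure 1 are generated by the transmit-update loop of Section 5. A self-contained sketch follows (hypothetical helper names; the quantizer and the rule for K are assumptions, since the exact forms of Eqs. (3)-(4) are garbled in this transcription):

```python
def sign(v):
    return (v > 0) - (v < 0)

def quantize(b, r):
    """K, quantized components z_i, and remainders y_i (assumed Eqs. (3)-(6))."""
    peak = max(abs(v) for v in b)
    K = 0
    while peak > (r - 0.5) * r ** K:   # smallest K whose range covers the peak
        K += 1
    step = r ** K
    z = [max(-(r - 1), min(r - 1, round(v / step))) * step for v in b]
    return K, z, [v - zv for v, zv in zip(b, z)]

def select(z, y):
    """Optimal component to transmit, per Theorems 4.1-4.3."""
    zmax = max(abs(v) for v in z)
    tied = [i for i, v in enumerate(z) if abs(v) == zmax]
    if len(tied) == 1:
        return tied[0]
    matched = [i for i in tied if sign(z[i]) == sign(y[i])]
    if matched:
        return max(matched, key=lambda i: abs(y[i]))
    return min(tied, key=lambda i: abs(y[i]))

def transmit(a, r, count):
    """Return (index, quantum) pairs for the first `count` transmissions."""
    b = list(a)
    sent = []
    for _ in range(count):
        K, z, y = quantize(b, r)
        i = select(z, y)
        sent.append((i, z[i]))
        b[i] = y[i]          # keep only the untransmitted remainder
    return sent
```

For a hypothetical three-component sample a = [543, 1251, 931] with r = 10, the first three transmissions are z_2 = 1000 (a Theorem 4.2 tie-break), then z_3 = 900, then z_1 = 500.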

Note that the PCA algorithm and our Quantized PCA algorithm rely on the agreed knowledge, between the sender and receiver, of what the data mean is and what the principal directions are. Though these values may indeed not be stationary, it is possible to update them in an online fashion. This can be accomplished, for example, by using the Generalized Hebbian Algorithm [1][6]. Hence, even in a changing environment, as for mobile robots, a realistic model of the surroundings can be maintained.

Note moreover that whether the image sets are broken into blocks or not, the mean-squared error of any image drops off at approximately a log-linear rate with respect to the number of quantized components transmitted. In the following section, we formalize this observation and show how to predict the linear slope.

7. Error Regression. With a small amount of a priori information, we show how it is possible to predict the rate at which the log of the mean-squared error falls off. This is useful because, given this rate, we may determine ahead of time how many transmissions are needed.

At each iteration of our algorithm, we transmit one quantized component. As a result, the error between the true principal components a ∈ R^d and the receiver's reconstruction â is reduced along one of its d dimensions. Per iteration, we can compute the error of our quantization and reconstruction as

    e_j = x − x̂ = Σ_{i=1}^{d} (a_i − â_{i,j}) e_i,    (19)

where x = Σ_{i=1}^{d} a_i e_i + m, x̂ = Σ_{i=1}^{d} â_{i,j} e_i + m, and â_{i,j} is the receiver's ith reconstructed principal component after the jth iteration. The mean-squared error at the jth iteration then is

    mse_j = (1/d) ‖Σ_{i=1}^{d} (a_i − â_{i,j}) e_i‖²    (20)
          = (1/d) Σ_{i=1}^{d} (a_i − â_{i,j})².    (21)
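Because the e_i are orthonormal, the norm in Eq. (20) collapses to the component-wise sum in Eq. (21). A quick numerical check in d = 2, with a rotation basis and hypothetical component values:

```python
import math

theta = 0.7                                   # any angle gives an orthonormal basis
e1 = (math.cos(theta), math.sin(theta))
e2 = (-math.sin(theta), math.cos(theta))
m = (3.0, -1.0)

a = (543.0, 251.0)        # true principal components (hypothetical)
a_hat = (500.0, 300.0)    # receiver's reconstruction after some iteration

x = tuple(mi + a[0] * v1 + a[1] * v2 for mi, v1, v2 in zip(m, e1, e2))
x_hat = tuple(mi + a_hat[0] * v1 + a_hat[1] * v2 for mi, v1, v2 in zip(m, e1, e2))

mse_direct = sum((u - v) ** 2 for u, v in zip(x, x_hat)) / 2      # Eq. (20)
mse_components = sum((u - v) ** 2 for u, v in zip(a, a_hat)) / 2  # Eq. (21)
```

Both quantities agree, which is what makes the remainder-based expression of Eq. (22) below usable at the transmitter without reconstructing the image.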

By our previous definitions, we note that a_i − â_{i,j} = y_{i,j}, i.e. the ith remainder component after the jth iteration, and

    mse_j = (1/d) Σ_{i=1}^{d} y_{i,j}².    (22)

More useful to us is the logarithm of the mean square error, which we will denote lmse_j. Taking the first difference of lmse_j gives

    ∆lmse_j = lmse_j − lmse_{j−1}    (23)
            = (log_r(1/d) + log_r(Σ_i y_{i,j}²)) − (log_r(1/d) + log_r(Σ_i y_{i,j−1}²))
            = log_r(Σ_i y_{i,j}² / Σ_i y_{i,j−1}²).    (24)

Now, define the total set of indexes L = {1, ..., d}, and let L_0 be given by L_0 = {i ∈ L : a_i = 0}. Clearly, these components will never have any effect on the error, as y_{i,j} = 0 for all iterations j and i ∈ L_0.

It is intuitively clear, and a direct result of Theorem 4.1, that the value of K never increases over iterations (for example, see Figure 2). In fact, K is constant over blocks of iterations.

Figure 2. Example of the Value of K over Iterations (Image Set 2, Image 8, r = 16).
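That K never increases can be replayed numerically. The sketch below is illustrative only: the quantizer is an assumed rounding scheme (nearest in-range multiple of r^K, with the smallest covering K), and the transmission order is simplified to the largest |z_i|. Transmitting a component never raises the peak magnitude of the residual vector, so the recorded trace of K can only stay level or step down, as in Figure 2:

```python
def quantize(b, r):
    """K, quantized components, and remainders (assumed reading of Eqs. (3)-(6))."""
    peak = max(abs(v) for v in b)
    K = 0
    while peak > (r - 0.5) * r ** K:   # smallest K whose range covers the peak
        K += 1
    step = r ** K
    z = [max(-(r - 1), min(r - 1, round(v / step))) * step for v in b]
    return K, z, [v - zv for v, zv in zip(b, z)]

def k_trace(a, r, iters):
    """Record K at each iteration of a simplified transmission loop."""
    b, trace = list(a), []
    for _ in range(iters):
        K, z, y = quantize(b, r)
        trace.append(K)
        # transmit the component largest in magnitude (tie-breaks of
        # Theorems 4.2-4.3 are ignored in this sketch)
        i = max(range(len(b)), key=lambda j: abs(z[j]))
        b[i] = y[i]
    return trace
```

Running this on a hypothetical five-component sample with r = 10 yields a non-increasing staircase of K values.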

After an initial start-up period (e.g. after 41 iterations in Figure 2), the lengths of these blocks will be the same on average. For image sets that are very similar, these block lengths can be very short. For dissimilar image sets, they will be longer, bounded above by d. Over these blocks of iterations, y_{i,j} will change value at most once. It is thus possible to compute the total change in error over an entire block, rather than the change at each iteration. The question then is: how can we characterize the lengths of these blocks, and how much should we expect the lmse to change?

If we let K_j denote the value of K at the jth iteration, and define M_κ as the set of iterations for which K_j is some constant κ, we can define

    L_1^κ = {i ∈ L \ L_0 : y_{i,j} = y_{i,k}, ∀j, k ∈ M_κ}    (25)

and

    L_2^κ = {i ∈ L \ L_0 : y_{i,j} ≠ y_{i,k}, ∃j, k ∈ M_κ, j ≠ k}.    (26)

L_0 ∪ L_1^κ ∪ L_2^κ = L, and each set is disjoint. Naturally, L_0 does not need to be indexed by κ, as it is invariant over iterations. We now make the assumption that, ∀j and i ∉ L_0, the z_{i,j}'s are independently and identically distributed (with respect to both i and j), and that z_{i,j} r^{−K_j} is uniformly distributed over the range {0, ..., r − 1}.

With these definitions, we use Eq. (23) to construct an average change in log-mean-squared error. We compare the difference in MSE between n, the last iteration of M_κ, and m, the last iteration of M_{κ+1}. This is the average change in lmse over the M_κ interval:

    lmse_ave = log_r( (Σ_{i∈L_0} y_{i,n}² + Σ_{i∈L_1^κ} y_{i,n}² + Σ_{i∈L_2^κ} y_{i,n}²) / (Σ_{i∈L_0} y_{i,m}² + Σ_{i∈L_1^{κ+1}} y_{i,m}² + Σ_{i∈L_2^{κ+1}} y_{i,m}²) ).    (27)

Recalling that y_i = 0 ∀i ∈ L_0, we hence have

    lmse_ave = log_r( (Σ_{i∈L_1^κ} y_{i,n}² + Σ_{i∈L_2^κ} y_{i,n}²) / (Σ_{i∈L_1^{κ+1}} y_{i,m}² + Σ_{i∈L_2^{κ+1}} y_{i,m}²) ).    (28)

We moreover have that y_{i,j} = γ_{i,j} r^{K_j}, where γ_{i,j} ∈ [0, 1/2). By definition, y_{i,j} is constant over M_κ, and hence y_{i,m} = y_{i,n} ∀i ∈ L_1^{κ+1}. In other words,

    y_{i,m} = γ_{i,n} r^κ,      if i ∈ L_1^{κ+1}
            = γ_{i,m} r^{κ+1},  if i ∈ L_2^{κ+1}.

Similarly,

    y_{i,n} = γ_{i,p} r^{κ−1},  if i ∈ L_1^κ
            = γ_{i,n} r^κ,      if i ∈ L_2^κ,

where p is the last iteration of M_{κ−1}. Hence,

    lmse_ave = log_r( (Σ_{i∈L_1^κ} γ_{i,p}² r^{2(κ−1)} + Σ_{i∈L_2^κ} γ_{i,n}² r^{2κ}) / (Σ_{i∈L_1^{κ+1}} γ_{i,n}² r^{2κ} + Σ_{i∈L_2^{κ+1}} γ_{i,m}² r^{2(κ+1)}) ).    (29)

The γ_{i,j}'s cancel out (on average, under the uniformity assumption above), which gives

    lmse_ave = log_r( (Σ_{i∈L_1^κ} r^{2κ} r^{−2} + Σ_{i∈L_2^κ} r^{2κ}) / (Σ_{i∈L_1^{κ+1}} r^{2κ} + Σ_{i∈L_2^{κ+1}} r^{2κ} r^{2}) ).    (30)

To continue, we first draw some conclusions about the relative lengths of L_1^κ and L_2^κ. Obviously, there are r elements in the set {0, ..., r − 1}. For all j ∈ M_κ and for all i ∉ L_0, the probability that z_{i,j} = 0 is 1/r. The existence of z_{i,j}'s that are zero over the interval M_κ implies that the corresponding remainder components at the end of the last interval block (i.e. y_{i,j} where j ∈ M_{κ+1}) are equal to the remainder components at the end of the current interval block (i.e. y_{i,j} where j ∈ M_κ). In other words, these remainder components do not change for a certain K_j, hence belong to L_1^κ, and the lmse over this interval is not increased or decreased by QPCA as a result of these quantized components.

Another set of indexes also belongs to L_1^κ. These correspond to the quantized components which under the interval M_κ would have a z_i value of ±1·r^κ, but actually will not be transmitted until the subsequent M_{κ−1} interval block. For example, consider the following simple three-dimensional system:

Suppose r = 10 and y = [543, 1251, 931]. At this iteration, we calculate that K = 3. Thus, z = [1000, 1000, 1000]. Now, according to Theorem 4.2, we transmit z_2 = 1000 (which in turn makes y_2 = 251). But we do not transmit z_3 = 1000 yet. Instead, K should be re-evaluated to K = 2, and we transition from M_3 to M_2. Now, z = [500, 300, 900] and we transmit z_3 = 900.

From this example, we see that some of the z_{i,j}'s in an interval M_κ that are equal to r^κ should not be transmitted until we enter M_{κ−1}. Those z_i's which are transmitted are those which in the previous interval M_{κ+1} had a corresponding remainder component |y_{i,j}| ≥ r^κ. Given the assumption that y_{i,j} and z_{i,j} are uniformly distributed, this constitutes half of the z_{i,j}'s in M_κ with z_{i,j} = 1·r^κ. Those that are not transmitted in the M_κ interval had a remainder component |y_{i,j}| < r^κ. Consequently, half of the probable (1/r) z_{i,j}'s in M_κ that are equal to r^κ are not transmitted in the M_κ interval.

In other words, the set of indexes L_1^κ is a fraction of the total number of available indexes from L \ L_0, namely 1/r + 0.5/r = 1.5/r. And L_2 makes up the remaining fraction, 1 − 1.5/r. By definition, |L_1^κ| + |L_2^κ| is invariant over κ, and so we define the length ratio of L_i as

    ρ_i = |L_i| / (|L_1| + |L_2|).    (31)

Specifically,

    ρ_1 = 1.5/r,    (32)
    ρ_2 = 1 − 1.5/r.    (33)

We plug this back into Eq. (30):

    lmse_ave = log_r( (ρ_1 r^{−2} + ρ_2) / (ρ_1 + ρ_2 r^{2}) ) = log_r(r^{−2}) = −2.    (34)

In other words, the lmse drops by 2 after every |M_κ| iterations, and |M_κ| = |L_2^κ|.

For our examples, we used images with d = 3136 and d = 6144 for image sets 1 and 2, respectively. The number of sample images used to compute the mean and principal directions was 12 and 118. Consequently, the space spanned by our two image sets was 11 and 117 dimensional, which is equal to |L_1| + |L_2|. For example, with r = 10, we can say then that the average length of L_2 for the first image set was

    11 ρ_2 = 11(1 − 1.5/r) = 9.35,

and for image set 2 it was

    117 ρ_2 = 117(1 − 1.5/r) = 99.45.

We computed a slope for the error from our experimental results for image sets 1 and 2, with r equal to 10, 24, and 128, as shown in Figure 3. Note that the experimental results coincide with the computed results to a very high degree. In fact, the predicted slope of the error has a percentage error of less than 0.2.

Several important facts should be noted. First, the length of L_2^κ deviates from the average in the early start-up region of the algorithm, and hence so does the slope. In the start-up region, the error will drop off much faster, as the lengths of M_κ are much shorter there. Second, in general, image samples will not span such a small subspace of the original d-dimensional space. Consequently, the average length of L_2 grows and the error drops off at a lower rate. Third, as can be seen in Figure 3, the lmse does not drop off at a constant rate, but has humps. These humps have a frequency equal to |L_2^κ|.

8. Conclusions. We have proposed a method for transmitting quantized versions of the principal components associated with a given data set in an optimal manner. This method was shown to be applicable to the problem of image compression and transmission over low-bandwidth communication channels. We also show how the progression of the error

per iteration of our algorithm can be accurately predicted, based on the parameters of the algorithm.

Figure 3. Measured and Predicted Error for QPCA on (a) Image Set 1 (IS1), r = 10; (b) IS1, r = 24; (c) IS1, r = 128; (d) Image Set 2 (IS2), r = 10; (e) IS2, r = 24; (f) IS2, r = 128. Each plot shows log_r(MSE) per iteration for every image in the set, together with the predicted slope of the error.

REFERENCES

[1] L. Chen, S. Chang: An adaptive learning algorithm for principal component analysis, IEEE Transactions on Neural Networks, v6, i5, pp. 1255-1263, 1995.
[2] X. Du, B.K. Ghosh, P. Ulinski: Decoding the Position of a Visual Stimulus from the Cortical Waves of Turtles, Proceedings of the 2003 American Control Conference, v1, pp. 477-482, 2003.
[3] R. Duda, P. Hart, D. Stork: Pattern Classification, John Wiley and Sons, Inc., N.Y., 2001.
[4] M. Kunt: Block Coding of Graphics: A Tutorial Review, Proc. of IEEE, v68, i7, pp. 770-786, 1980.
[5] M. Marcellin, M. Gormish, A. Bilgin, M. Boliek: An Overview of JPEG-2000, Proc. of IEEE Data Compression Conference, pp. 523-541, 2000.

[6] T. D. Sanger: Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks, v2, i6, pp. 459-473, 1989.
[7] C. Undey, A. Cinar: Statistical Monitoring of Multistage, Multiphase Batch Processes, IEEE Control Systems Magazine, v22, i5, pp. 40-52, 2002.

Figure 4. Image Set 1.

Figure 5. Image Set 2.