Limiting spectral distribution of $XX'$ matrices


Limiting spectral distribution of $XX'$ matrices

Arup Bose, Sreela Gangopadhyay, Arnab Sen

July 23, 2009

Abstract

The methods used to establish the limiting spectral distribution (LSD) of large dimensional random matrices include the well-known moment method, which invokes the trace formula. Its success has been demonstrated for several types of matrices, such as the Wigner matrix and the sample covariance matrix. In a recent article, Bryc, Dembo and Jiang (2006) [7] establish the LSD for random Toeplitz and Hankel matrices using the moment method. They perform the necessary counting of terms in the trace by splitting the relevant sets into equivalence classes and relating the limits of the counts to certain volume calculations. Bose and Sen (2008) [6] have developed this method further and have provided a general framework which deals with symmetric matrices with entries coming from an independent sequence.

In this article we enlarge the scope of the above approach to consider matrices of the form $A_p = \frac{1}{n}XX'$ where $X$ is a $p \times n$ matrix with real entries. We establish some general results on the existence of the spectral distribution of such matrices, appropriately centered and scaled, when $p \to \infty$, $n = n(p) \to \infty$ and $p/n \to y$ with $0 \le y < \infty$. As examples we show the existence of the spectral distribution when $X$ is taken to be the appropriate asymmetric Hankel, Toeplitz, circulant or reverse circulant matrix. In particular, when $y = 0$, the limits for all these matrices coincide and equal the limit for the symmetric Toeplitz matrix derived in Bryc, Dembo and Jiang (2006) [7]. In the other cases, we obtain new limiting spectral distributions for which no closed form expressions are known. We demonstrate the nature of these limits through some simulation results.

Keywords: Large dimensional random matrix, eigenvalues, sample covariance matrix, Toeplitz matrix, Hankel matrix, circulant matrix, reverse circulant matrix, spectral distribution, bounded Lipschitz metric, limiting spectral distribution, moment method, volume method, almost sure convergence, convergence in distribution.

AMS Subject Classification: 60F05, 60F15, 62E20, 60G57.

Arup Bose and Sreela Gangopadhyay: Stat Math Unit, Indian Statistical Institute, Kolkata. Arnab Sen: Dept. of Statistics, U.C. Berkeley.

1 Introduction: random matrices and spectral distribution

Random matrices were introduced in mathematical statistics by Wishart (1928) [20]. In 1950 Wigner successfully used random matrices to model the nuclear energy levels of a disordered particle system. The seminal work, Wigner (1958) [19], laid the foundation of this subject. This theory has now grown into a separate field, with applications in many branches of science, in areas as diverse as multivariate statistics, operator algebra, number theory, signal processing and wireless communication. The spectral properties of random matrices form, in general, a rich and attractive area for mathematicians. In particular, when the dimension of the random matrix is large, the problem of studying the behavior of the eigenvalues in the bulk of the spectrum arises naturally in many areas of science and has received considerable attention.

Suppose $A_p$ is a $p \times p$ random matrix. Let $\lambda_1, \ldots, \lambda_p \in \mathbb{R}$ (or $\mathbb{C}$) denote its eigenvalues. When the eigenvalues are real, our convention will be to always write them in ascending order. The empirical spectral measure $\mu_p$ of $A_p$ is the random measure on $\mathbb{R}$ (or $\mathbb{C}$) given by

$$\mu_p = \frac{1}{p}\sum_{i=1}^{p} \delta_{\lambda_i}, \qquad (1)$$

where $\delta_x$ is the Dirac delta measure at $x$. The random probability distribution function on $\mathbb{R}$ (or $\mathbb{C}$) corresponding to $\mu_p$ is known as the Empirical Spectral Distribution (ESD) of $A_p$. We will denote it by $F^{A_p}$. If $\{F^{A_p}\}$ converges weakly (as $p$ tends to infinity), either almost surely or in probability, to some (nonrandom) probability distribution, then that distribution is called the Limiting Spectral Distribution (LSD) of $\{A_p\}$.

Proving the existence of the LSD and deriving its properties for general patterned matrices has drawn significant attention in the literature. We refer to Bai (1999) [1] and Bose and Sen (2008) [6] for information on several interesting situations where the LSD exists and can be explicitly specified. Possibly the two most significant matrices whose LSDs have been extensively studied are the Wigner and the sample covariance matrices. For simplicity, let us assume for the time being that all random sequences under consideration are independent and uniformly bounded.

1. Wigner matrix. In its simplest form, the Wigner matrix $W_n^{(s)}$ (Wigner, 1955, 1958) [18, 19], of order $n$, is an $n \times n$ symmetric matrix whose entries on and above the diagonal are i.i.d. random variables with zero mean and variance one. Denoting those i.i.d. random variables by $\{x_{ij} : 1 \le i \le j\}$, we can visualize the Wigner matrix as

$$W_n^{(s)} = \begin{pmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1(n-1)} & x_{1n} \\ x_{12} & x_{22} & x_{23} & \cdots & x_{2(n-1)} & x_{2n} \\ \vdots & & & & & \vdots \\ x_{1n} & x_{2n} & x_{3n} & \cdots & x_{n(n-1)} & x_{nn} \end{pmatrix}. \qquad (2)$$

It is well known that almost surely,

the LSD of $n^{-1/2} W_n^{(s)}$ is the semicircle law $W$, (3)

with the density function

$$p_W(s) = \begin{cases} \frac{1}{2\pi}\sqrt{4 - s^2} & \text{if } |s| \le 2, \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$
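The semicircle limit (3)-(4) is easy to visualize numerically. Below is a minimal simulation sketch (ours, not from the paper; it assumes standard normal inputs) comparing the ESD of $n^{-1/2}W_n^{(s)}$ with the density in (4).

```python
import numpy as np
import matplotlib.pyplot as plt

def wigner_esd(n, rng):
    """Eigenvalues of n^{-1/2} W_n^(s), i.i.d. N(0,1) entries on and above the diagonal."""
    g = rng.standard_normal((n, n))
    w = np.triu(g) + np.triu(g, 1).T          # symmetrize the upper triangle
    return np.linalg.eigvalsh(w) / np.sqrt(n)

rng = np.random.default_rng(0)
lam = wigner_esd(2000, rng)
s = np.linspace(-2, 2, 400)
plt.hist(lam, bins=60, density=True, alpha=0.5, label="ESD, n = 2000")
plt.plot(s, np.sqrt(4 - s**2) / (2 * np.pi), label="semicircle density (4)")
plt.legend()
plt.show()
```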

2. Sample covariance matrix ($S$ matrix). Suppose $\{x_{jk} : j, k = 1, 2, \ldots\}$ is a double array of i.i.d. real random variables with mean zero and variance $1$. The matrix

$$A_p(W) = \frac{1}{n} W_p W_p' \quad \text{where } W_p = ((x_{ij}))_{1 \le i \le p,\ 1 \le j \le n} \qquad (5)$$

is called a sample covariance matrix (in short, an $S$ matrix). Let $I_p$ denote the identity matrix of order $p$. The following results are well known. See Bai (1999) [1] or Bose and Sen (2008) [6] for moment method and Stieltjes transform method proofs. For a historical account of the successive development of these results, see Bai and Yin (1988) [3] for (a), and Marčenko and Pastur (1967) [13], Grenander and Silverstein (1977) [9], Wachter (1978) [17], Jonsson (1982) [11], Yin and Krishnaiah (1985) [22] and Yin (1986) [21] for (b).

(a) If $p \to \infty$ and $p/n \to 0$ then almost surely,

the LSD of $\sqrt{\frac{n}{p}}\,(A_p(W) - I_p)$ is the semicircle law given in (4) above. (6)

(b) If $p \to \infty$ and $p/n \to y \in (0, \infty)$ then almost surely,

the LSD of $A_p(W)$ is the Marčenko-Pastur law $MP_y$ given in (8) below. (7)

The Marčenko-Pastur law $MP_y$ has a positive mass $1 - \frac{1}{y}$ at the origin if $y > 1$. Elsewhere it has the density

$$MP_y(x) = \begin{cases} \frac{1}{2\pi x y}\sqrt{(b - x)(x - a)} & \text{if } a \le x \le b, \\ 0 & \text{otherwise,} \end{cases} \qquad (8)$$

where $a = a(y) = (1 - \sqrt{y})^2$ and $b = b(y) = (1 + \sqrt{y})^2$.

Suppose $p = n$. Then $W_n$ above is a square matrix with i.i.d. elements, and the Wigner matrix $W_n^{(s)}$ is obtained from $W_n$ by retaining the elements above and on the diagonal of $W_n$ and letting symmetry dictate the choice of the rest of the elements of $W_n^{(s)}$. We loosely say that $W_n$ is the asymmetric version of $W_n^{(s)}$. It is interesting to note the following:

(i) The LSD of $n^{-1/2} W_n^{(s)}$ in (3) and that of $\sqrt{\frac{n}{p}}\big(\frac{1}{n} W_p W_p' - I_p\big)$ in (6) are identical.

(ii) If $W$ and $M$ are random variables obeying, respectively, the semicircle law and the Marčenko-Pastur law with $y = 1$, then $M \stackrel{D}{=} W^2$ in law. That is, the LSD of $n^{-1/2} W_n^{(s)}$ and that of $\frac{1}{n} W_p W_p'$ bear this squaring relationship when $p/n \to 1$.
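The limit results (6)-(7) can likewise be checked by simulation. The sketch below (ours; normal inputs assumed) draws the ESD of $A_p(W)$ for $y = 1/2$ against the density (8).

```python
import numpy as np
import matplotlib.pyplot as plt

def mp_density(x, y):
    """Marcenko-Pastur density (8) on [a, b]; valid for 0 < y <= 1 (no mass at 0)."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    out = np.zeros_like(x)
    inside = (x >= a) & (x <= b)
    out[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2 * np.pi * x[inside] * y)
    return out

rng = np.random.default_rng(0)
p, n = 1000, 2000                             # aspect ratio y = p/n = 1/2
w = rng.standard_normal((p, n))
lam = np.linalg.eigvalsh(w @ w.T / n)         # eigenvalues of A_p(W) = n^{-1} W_p W_p'
x = np.linspace(0.01, 3.5, 500)
plt.hist(lam, bins=60, density=True, alpha=0.5, label="ESD of A_p(W)")
plt.plot(x, mp_density(x, p / n), label="MP_y density (8)")
plt.legend()
plt.show()
```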

1.2 The problem and its motivation

In view of the above discussion, it is thus natural to study the LSD of matrices of the form $A_p(X) = \frac{1}{n} X_p X_p'$ where $X_p$ is a $p \times n$ suitably patterned (asymmetric) random matrix. Asymmetry is used loosely here: it just means that $X_p$ is not necessarily symmetric. In particular, one may ask the following questions.

(i) Suppose $p/n \to y$, $0 < y < \infty$. When does the LSD of $A_p(X) = \frac{1}{n} X_p X_p'$ exist?

(ii) Suppose $p/n \to 0$. When does the LSD of $\sqrt{\frac{n}{p}}\,(A_p(X) - I_p)$ exist? Note that, unlike for the $S$ matrix, the mean of $A_p(X)$ is not in general equal to $I_p$, and hence other centerings/normalisations may also need to be allowed.

(iii) Suppose $X_p$ is a $p \times n$ (asymmetric) patterned matrix and $A_p(X) = \frac{1}{n} X_p X_p'$ for which the limit in (ii) holds. Call it $A$. Now consider the square symmetric matrix $X_n^{(s)}$ obtained from $X_p$ by letting $p = n$ and symmetrising it by retaining the elements of $X_p$ above and on the diagonal and allowing symmetry to dictate the rest of the elements. Suppose, as $n \to \infty$, the LSD of $n^{-1/2} X_n^{(s)}$ exists. Call it $B$. In general, is there any relation between $B$ and $A$?

A further motivation to study the LSD of $\frac{1}{n} X_p X_p'$ comes from wireless communications theory. Many results on the information-theoretic limits of various wireless communication channels make substantial use of asymptotic random matrix theory, as the size of the matrix increases. An extended survey of results and works in this area may be found in Tulino and Verdu (2004) [16]. Also see Silverstein and Tulino (2006) [15] for a review of some of the existing mathematical results that are relevant to the analysis of the properties of random matrices arising in wireless communications.

The study of the asymptotic distribution of the singular values is seen as an important aspect in the analysis and design of wireless communication channels. A typical wireless communication channel may be described by the linear vector memoryless channel $y = X_p \theta + \epsilon$, where $\theta$ is the $n$-dimensional vector of the signal input, $y$ is the $p$-dimensional vector of the signal output, and the $p$-dimensional vector $\epsilon$ is the additive noise. $X_p$, in turn, is a $p \times n$ random matrix, generally with complex entries. Silverstein and Tulino (2006) [15] emphasize the asymptotic distribution of the squared singular values of $X_p$ under various assumptions on the joint distribution of the random matrix coefficients, where $n$ and $p$ tend to infinity while the aspect ratio $p/n \to y$, $0 < y < \infty$. In their model, generally speaking, the channel matrix $X_p$ can be viewed as $X_p = f(A_1, A_2, \ldots, A_k)$ where the $\{A_i\}$ are some independent random (rectangular) matrices with complex entries, each having its own meaning in terms of the channel. In most of the cases they studied, some $A_i$ have all i.i.d. entries, while the LSDs of the other matrices $A_j A_j'$, $j \ne i$, are assumed to exist. Then the LSD of $X_p X_p'$ is computed in terms of the LSDs of the $A_j A_j'$, $j \ne i$.

Specifically, certain CDMA channels can be modeled (see, e.g., Tulino and Verdu (2004) [16, Chapter 3] and Silverstein and Tulino (2006) [15]) as $X_p = CSA$ where $C$ is a $p \times p$ random symmetric Toeplitz matrix, $S$ is a matrix with i.i.d. complex entries independent of $C$, and $A$ is an $n \times n$ deterministic diagonal matrix. One of the main theorems in Silverstein and Tulino (2006) [15] establishes the LSD of $X_p = CSA$ for a random $p \times p$ matrix $C$, not necessarily Toeplitz, under the added assumption that the LSD of the matrices $CC'$ exists. When $C$ is a random symmetric Toeplitz matrix, the existence of the LSD of $CC'$ is immediate from the recent work of Bryc, Dembo and Jiang (2006) [7] and Hammond and Miller (2005) [10]. This also motivates us to study the LSD of $X_p X_p'$ matrices in some generality, where $X_p$ is not necessarily square and symmetric.

1.3 The moment method

We shall use the method of moments to study this convergence. To briefly describe this method, suppose $\{Y_p\}$ is a sequence of random variables with distribution functions $\{F_p\}$ such that $E(Y_p^h) \to \beta_h$ for every positive integer $h$, and $\{\beta_h\}$ satisfies Carleman's condition (see Feller, 1966, page 224) [8]:

$$\sum_{h=1}^{\infty} \beta_{2h}^{-1/2h} = \infty. \qquad (9)$$

Then there exists a distribution function $F$ such that for all $h$,

$$\beta_h = \beta_h(F) = \int x^h \, dF(x) \qquad (10)$$

and $\{Y_p\}$ (or equivalently $\{F_p\}$) converges to $F$ in distribution. We will often write $\beta_h$ in short when the underlying distribution $F$ is clear from the context.

Now suppose $\{A_p\}$ is a sequence of $p \times p$ symmetric matrices (with possibly random entries), and let, by a slight abuse of notation, $\beta_h(A_p)$ denote the $h$-th moment of the ESD of $A_p$. Suppose there is a sequence of nonrandom $\{\beta_h\}_{h=1}^{\infty}$ satisfying Carleman's condition such that

(M1) First moment condition: for every $h$, $E[\beta_h(A_p)] \to \beta_h$, and

(M2) Second moment condition: for every $h$, $\mathrm{Var}[\beta_h(A_p)] \to 0$.

Then the LSD is identified by $\{\beta_h\}_{h=1}^{\infty}$ and the convergence to the LSD holds in probability. This convergence may be strengthened to almost sure convergence by strengthening (M2), for example by replacing the variance by the fourth moment and showing that

(M4) Fourth moment condition: for every $h$, $E[\beta_h(A_p) - E(\beta_h(A_p))]^4 = O(p^{-2})$.

For any symmetric $p \times p$ matrix $A$ with eigenvalues $\lambda_1, \ldots, \lambda_p$, the trace formula

$$\beta_h(A) = \frac{1}{p}\sum_{i=1}^{p} \lambda_i^h = \frac{1}{p}\,\mathrm{Tr}(A^h)$$

is invoked to compute moments. Moment method proofs are known for the results on the Wigner matrix and the sample covariance matrix given earlier. See, for example, Bose and Sen (2008) [6].
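In simulations, (M1) and (M2) can be monitored directly through the trace formula. A small sketch (ours; it uses the $S$ matrix of the previous section, whose limit moments are those of $MP_y$, e.g. $\beta_2 \to 1 + y$ and $\beta_3 \to 1 + 3y + y^2$):

```python
import numpy as np

def beta_h(a, h):
    """h-th moment of the ESD of a symmetric matrix a: beta_h(a) = p^{-1} Tr(a^h)."""
    return np.trace(np.linalg.matrix_power(a, h)) / a.shape[0]

rng = np.random.default_rng(1)
p, n = 400, 800                      # y = p/n = 1/2
reps = []
for _ in range(20):                  # replications, to watch Var[beta_h(A_p)] shrink
    x = rng.standard_normal((p, n))
    reps.append([beta_h(x @ x.T / n, h) for h in (1, 2, 3)])
reps = np.array(reps)
print("E beta_h approx:", reps.mean(axis=0))   # (M1): near 1, 1.5, 2.75 for y = 1/2
print("Var beta_h approx:", reps.var(axis=0))  # (M2): small, shrinking with p
```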

1.4 Brief overview of our results

Let $\mathbb{Z}$ be the set of all integers. Let $\{x_\alpha\}_{\alpha \in \mathcal{Z}}$ be an input sequence, where $\mathcal{Z} = \mathbb{Z}$ or $\mathbb{Z}^2$ and the $\{x_\alpha\}$ are independent with $E x_\alpha = 0$ and $E x_\alpha^2 = 1$. Let $L_p : \{1, 2, \ldots, p\} \times \{1, 2, \ldots, n = n(p)\} \to \mathcal{Z}$ be a sequence of functions, which we call link functions. For simplicity of notation, we will write $L$ for $L_p$. We shall later impose suitable restrictions on the link functions $L_p$ and sufficient probabilistic assumptions on the variables. Consider the matrices

$$X_p = ((x_{L_p(i,j)}))_{1 \le i \le p,\ 1 \le j \le n} \quad \text{and} \quad A_p = A_p(X) = \frac{1}{n} X_p X_p'.$$

The function $L_p$ defines an appropriate pattern, and we may call these matrices patterned matrices. The assumptions on the link function restrict the number of times, and the manner in which, any element of the input sequence may occur in the matrix. We shall consider two different regimes:

Regime I. $p \to \infty$, $p/n \to y \in (0, \infty)$. In this case we consider the spectral distribution of $A_p = A_p(X) = \frac{1}{n} X_p X_p'$.

Regime II. $p \to \infty$, $p/n \to 0$. In this case we consider the spectral distribution of $(np)^{-1/2}(X_p X_p' - n I_p) = \sqrt{\frac{n}{p}}\,(A_p - I_p)$, where $I_p$ is the identity matrix of order $p$.

We will use the following assumptions on the input sequences in the two regimes.

Assumption R1. In Regime I, $\{x_\alpha\}$ are independent with mean zero and variance one, and are either uniformly bounded or identically distributed.

Assumption R2. In Regime II, $\{x_\alpha\}$ are independent with mean zero and variance $1$. Further, $\lambda$ is such that $p = O(n^{1/\lambda})$ and $\sup_\alpha E\big(|x_\alpha|^{4(1+1/\lambda)+\delta}\big) < \infty$ for some $\delta > 0$.

The moment method requires all moments to be finite. However, in Regime I, our basic assumption will be that the input sequence has finite second moment. To deal with this, we first prove a truncation result which shows that in Regime I, under an appropriate assumption on the link function, we may without loss of generality assume the input sequence to be uniformly bounded. The situation in Regime II is significantly more involved, and there we assume the existence of higher moments. We then establish a negligibility result which implies that for a nonzero contribution to the limiting moments, only the summands in the trace formula which are pair matched matter. See the next section for details on matching. The existence of the limit of the sum of pair matched terms, and then the computation of the limit, establishes the LSD. Quite interestingly, under reasonable assumptions on the allowed pattern, if the limits of the empirical moments exist, they automatically satisfy Carleman's condition and this ensures the existence of the LSD. However, we are unaware of any general existing method of imposing suitable restrictions on the link function to guarantee the existence of the limits of the moments.

As examples of our method, we let $X_p$ be the nonsymmetric Toeplitz, Hankel, reverse circulant and circulant matrices. We show that the LSD exists in both Regimes I and II, thereby answering questions (i) and (ii) of the previous section affirmatively for these matrices. The LSDs in Regime I are all new. However, in Regime II, the LSDs of all four matrices above are identical to the LSD obtained by Bryc, Dembo and Jiang (2006) [7] for the symmetric Toeplitz matrix $T_n^{(s)}$. This implies that the answer to question (iii) is not straightforward and needs further investigation. Closed form expressions for the LSD or for its moments do not seem to be easily obtainable. We provide a few simulations to demonstrate the nature of the LSD, and it is an interesting problem to derive detailed properties of the LSD in any of these cases.

1.5 Examples

In this section we consider some specific nonsymmetric matrices of interest. In particular, let $X_p$ be the asymmetric version of any of the four matrices Toeplitz, Hankel, circulant and reverse circulant. Then it turns out that in Regime II, the LSD of $\sqrt{\frac{n}{p}}\,(A_p(X_p) - I_p)$ exists and is identical to the LSD of $n^{-1/2} T_n^{(s)}$ obtained by Bryc, Dembo and Jiang (2006) [7] for the symmetric $n \times n$ Toeplitz matrix $T_n^{(s)}$. In Regime I, the LSDs of $A_p(X_p)$ exist, but they are all different. The proofs of the results given below are presented in Section 5.

1.5.1 The asymmetric Toeplitz matrix

The $n \times n$ symmetric Toeplitz matrix is defined as $T_n^{(s)} = ((x_{|i-j|}))$. It is known that if the input sequence $\{x_i\}$ has mean zero and variance one and is either independent and uniformly bounded, or is i.i.d., then the ESD of $n^{-1/2} T_n^{(s)}$ converges weakly almost surely to a nonrandom limit, $L_T$ (say).

See, for example, Bose and Sen (2008) [6], Bryc, Dembo and Jiang (2006) [7] and Hammond and Miller (2005) [10]. It may be noted that the moments of $L_T$ may be represented as volumes of certain subsets of the unit hypercubes, but they are not known in any explicit form.

Let $T_p = T = ((x_{i-j}))_{p \times n}$ and $A_p(T) = \frac{1}{n} T_p T_p'$. Note that $T_p$ is the nonsymmetric Toeplitz matrix.

Theorem 1. (i) [Regime I] Assume R1 holds. If $p/n \to y \in (0, \infty)$ then the empirical spectral distribution $F^{A_p(T)}$ converges in distribution almost surely to a nonrandom distribution which does not depend on the distribution of $\{x_i\}$.

(ii) [Regime II] Assume R2 holds. Then $F^{\sqrt{n/p}\,(A_p(T) - I_p)}$ converges in distribution to $L_T$ almost surely.
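Theorem 1 gives no closed form for the Regime I limit, but the limit is easy to observe by simulation. A minimal sketch (ours; `scipy.linalg.toeplitz` builds $T_p$ from its first column $x_0, x_1, \ldots$ and first row $x_0, x_{-1}, \ldots$):

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(2)
p, n = 800, 1600                                   # Regime I with y = 1/2
col = rng.standard_normal(p)                       # x_0, x_1, ..., x_{p-1}
row = np.concatenate(([col[0]], rng.standard_normal(n - 1)))  # x_0, x_{-1}, ...
T = toeplitz(col, row)                             # T_p = ((x_{i-j})), p x n
lam = np.linalg.eigvalsh(T @ T.T / n)              # spectrum of A_p(T) = n^{-1} T_p T_p'
# a histogram of lam approximates the (nonrandom) LSD of Theorem 1(i)
```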

1.5.2 The asymmetric Hankel matrix

Suppose $\{x_i : i = 0, \pm 1, \pm 2, \ldots\}$ is an independent input sequence. Let $H_n^{(s)} = ((x_{i+j}))$ be the $n \times n$ symmetric Hankel matrix. It is known that if $\{x_i\}$ has mean zero and variance one and is either uniformly bounded or i.i.d., then $F^{n^{-1/2} H_n^{(s)}}$ converges weakly almost surely. See, for example, Bryc, Dembo and Jiang (2006) [7] and Bose and Sen (2008) [6].

Now let $H = H_{p \times n}$ be the asymmetric Hankel matrix whose $(i,j)$-th entry is $x_{i+j}$ if $i > j$ and $x_{-(i+j)}$ if $i \le j$. Let $H_p^{(s)} = ((x_{i+j}))_{p \times n}$ be the rectangular Hankel matrix with the symmetric link function. Let

$$A_p(H) = \frac{1}{n} H_p H_p' \quad \text{and} \quad A_p(H^{(s)}) = \frac{1}{n} H_p^{(s)} H_p^{(s)\prime}.$$

We then have the following theorem.

Theorem 2. (i) [Regime I] Assume R1 holds. If $p/n \to y \in (0, \infty)$ then $F^{A_p(H)}$ converges almost surely to a nonrandom distribution which does not depend on the distribution of $\{x_i\}$.

(ii) [Regime II] Assume R2 holds. Then $F^{\sqrt{n/p}\,(A_p(H) - I_p)}$ converges almost surely to $L_T$. The same limit continues to hold if $A_p(H)$ is replaced by $A_p(H^{(s)})$.

1.5.3 The asymmetric reverse circulant matrix

The symmetric reverse circulant $R_n^{(s)}$ has the link function $L(i,j) = (i+j) \bmod n$. The LSD of $n^{-1/2} R_n^{(s)}$ has been discussed in Bose and Mitra (2003) [5] and Bose and Sen (2008) [6]. In particular, it is known that if $\{x_i\}$ are independent with mean zero and variance $1$ and are either (i) uniformly bounded or (ii) identically distributed, then the LSD $R$ of $n^{-1/2} R_n^{(s)}$ exists almost surely and has the density

$$f_R(x) = |x| \exp(-x^2), \quad -\infty < x < \infty,$$

with moments $\beta_{2h+1}(R) = 0$ and $\beta_{2h}(R) = h!$ for all $h \ge 0$.

Let $R_p^{(s)}$ be the $p \times n$ matrix with link function $L(i,j) = (i+j) \bmod n$, and let $R_p = R_{p \times n}$ be the asymmetric version of $R_p^{(s)}$ with the link function

$$L(i,j) = \begin{cases} (i+j) \bmod n & \text{for } i \le j, \\ -[(i+j) \bmod n] & \text{for } i > j. \end{cases}$$

So, in effect, the rows of $R_p$ are the first $p$ rows of an (asymmetric) reverse circulant matrix. Let

$$A_p(R) = \frac{1}{n} R_p R_p' \quad \text{and} \quad A_p(R^{(s)}) = \frac{1}{n} R_p^{(s)} R_p^{(s)\prime}. \qquad (11)$$

We then have the following theorem.

Theorem 3. (i) [Regime I] Assume R1 holds. If $p/n \to y \in (0, \infty)$ then $F^{A_p(R)}$ converges almost surely to a nonrandom distribution which does not depend on the distribution of $\{x_i\}$.

(ii) [Regime II] Assume R2 holds. Then $F^{\sqrt{n/p}\,(A_p(R) - I_p)}$ converges almost surely to $L_T$. The same limit continues to hold if $A_p(R)$ is replaced by $A_p(R^{(s)})$.

1.5.4 The asymmetric circulant matrix

The square circulant matrix is well known in the literature. Its eigenvalues can be explicitly computed and are closely related to the periodogram. Its LSD is the bivariate normal distribution. See, for example, Bose and Mitra [5]. Its symmetric version is treated in Bose and Sen (2008) [6]. Let

$$C_p = C_{p \times n} = ((x_{L(i,j)})) \quad \text{where } L(i,j) = (n - i + j) \bmod n, \quad \text{and} \quad A_p(C) = \frac{1}{n} C_p C_p'.$$

Theorem 4. (i) [Regime I] Assume R1 holds. If $p/n \to y \in (0, \infty)$ then $F^{A_p(C)}$ converges almost surely to a nonrandom distribution which does not depend on the distribution of $\{x_i\}$.

(ii) [Regime II] Assume R2 holds. Then $F^{\sqrt{n/p}\,(A_p(C) - I_p)}$ converges almost surely to $L_T$.

2 Basic notation, definitions and assumptions

Examples of link functions that correspond to matrices of interest are as follows:

0. Covariance matrix: $L_p(i,j) = (i,j)$.

1. Asymmetric Toeplitz matrix: $L_p(i,j) = i - j$.

2. Asymmetric Hankel matrix: $L_p(i,j) = \mathrm{sgn}(i-j)\,(i+j)$, where

$$\mathrm{sgn}(l) = \begin{cases} 1 & \text{if } l \ge 0, \\ -1 & \text{if } l < 0. \end{cases} \qquad (12)$$

3. Asymmetric reverse circulant: $L_p(i,j) = \mathrm{sgn}(i-j)\,[(i+j) \bmod n]$.

4. Asymmetric circulant: $L_p(i,j) = (n + j - i) \bmod n$.
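These link functions translate directly into code. The helper below (ours, purely illustrative) builds the patterned matrix $X_p = ((x_{L_p(i,j)}))$ for any of the scalar-valued links, drawing one independent standard normal per distinct $L$-value; it is slow, but it makes the "one input per $L$-value" structure explicit.

```python
import numpy as np

# Scalar link functions (1-based i, j as in the paper); sgn(0) = 1 as in (12).
def L_toeplitz(i, j, n):  return i - j
def L_hankel(i, j, n):    return (i + j) if i >= j else -(i + j)
def L_revcirc(i, j, n):   return ((i + j) % n) if i >= j else -((i + j) % n)
def L_circulant(i, j, n): return (n + j - i) % n

def patterned_matrix(p, n, link, rng):
    """X_p = ((x_{L(i,j)})): one independent N(0,1) input per distinct L-value."""
    inputs = {}
    x = np.empty((p, n))
    for i in range(1, p + 1):
        for j in range(1, n + 1):
            a = link(i, j, n)
            if a not in inputs:
                inputs[a] = rng.standard_normal()
            x[i - 1, j - 1] = inputs[a]
    return x

rng = np.random.default_rng(3)
X = patterned_matrix(200, 400, L_hankel, rng)
lam = np.linalg.eigvalsh(X @ X.T / 400)   # ESD of A_p(H), as in Theorem 2(i)
```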

For any set $G$, $\#G$ and $|G|$ will both stand for the number of elements of $G$. We shall use the following general assumptions on the link function $L$.

Assumptions on the link function $L$:

A. There exists a positive integer $\Phi$ such that for any $\alpha \in \mathcal{Z}$ and all $p$: (i) $\#\{i : L_p(i,j) = \alpha\} \le \Phi$ for all $j \in \{1, 2, \ldots, n\}$, and (ii) $\#\{j : L_p(i,j) = \alpha\} \le \Phi$ for all $i \in \{1, 2, \ldots, p\}$.

A$'$. For any $\alpha \in \mathcal{Z}$ and all $p$, $\#\{i : L_p(i,j) = \alpha\} \le 1$ for all $j \in \{1, 2, \ldots, n\}$.

B. $k_p \alpha_p = O(np)$, where $k_p = \#\{\alpha : L_p(i,j) = \alpha \text{ for some } 1 \le i \le p,\ 1 \le j \le n\}$ and $\alpha_p = \max_{\alpha \in \mathcal{Z}} \#\{(i,j) : L_p(i,j) = \alpha\}$.

Assumption A stipulates that, with increasing dimensions, the number of times any fixed input variable appears in any given row or column remains bounded. Assumption A$'$ stipulates that no input appears more than once in any column. Assumption B makes sure that no particular input appears too many times in the matrix. Clearly A$'$ implies A(i), but B is not related to either A or A$'$. Consider the link function $L_p(i,j) = (1,1)$ if $i = j$ and $L_p(i,j) = (i,j)$ if $i \ne j$. This link function satisfies A(ii) and A$'$ but not B. On the other hand, if $L_p(i,j) = 1$ for all $(i,j)$ then it satisfies B but not A. It is easy to verify that the link functions listed above satisfy these assumptions. It may also be noted that the symmetric Toeplitz link function $L(i,j) = |i-j|$ does not satisfy Assumption A$'$.

Towards applying the moment method, we need a few notions, most of which are given in detail in Bose and Sen (2008) [6] in the context of symmetric matrices. These concepts and definitions remain valid in Regime II as well, with appropriate changes.

The trace formula. Let $A_p = \frac{1}{n} X_p X_p'$. Then the $h$-th moment of the ESD of $A_p$ is given by

$$\frac{1}{p}\,\mathrm{Tr}\,A_p^h = \frac{1}{p n^h} \sum_{\substack{1 \le i_1, i_3, \ldots, i_{2h-1} \le p \\ 1 \le i_2, i_4, \ldots, i_{2h} \le n}} x_{L_p(i_1,i_2)}\,x_{L_p(i_3,i_2)}\,x_{L_p(i_3,i_4)} \cdots x_{L_p(i_{2h-1},i_{2h})}\,x_{L_p(i_1,i_{2h})}. \qquad (13)$$

Circuits. Any function $\pi : \{0, 1, 2, \ldots, 2h\} \to \mathbb{Z}_+$ is said to be a circuit if (i) $\pi(0) = \pi(2h)$, (ii) $1 \le \pi(2i) \le p$ for $0 \le i \le h$, and (iii) $1 \le \pi(2i-1) \le n$ for $1 \le i \le h$. The length $l(\pi)$ of $\pi$ is taken to be $2h$. A circuit depends on $h$ and $p$, but we will suppress this dependence.

Matched circuits. Let

$$\xi_\pi(2i-1) = L(\pi(2i-2), \pi(2i-1)) \quad \text{and} \quad \xi_\pi(2i) = L(\pi(2i), \pi(2i-1)), \quad 1 \le i \le h.$$

These will be called $L$-values. A circuit $\pi$ is said to have an edge of order $e$ ($1 \le e \le 2h$) if it has an $L$-value repeated exactly $e$ times. Any $\pi$ with all $e \ge 2$ will be called $L$-matched (in short, matched). For any such $\pi$, given any $i$, there is at least one $j \ne i$ such that $\xi_\pi(i) = \xi_\pi(j)$. Define, for any circuit $\pi$,

$$X_\pi = \prod_{i=1}^{h} x_{\xi_\pi(2i-1)}\,x_{\xi_\pi(2i)}. \qquad (14)$$

Due to the mean zero and independence assumptions, if $\pi$ has at least one edge of order one then $E(X_\pi) = 0$.

Equivalence relation on circuits. Two circuits $\pi_1$ and $\pi_2$ of the same length are said to be equivalent if their $L$-values agree at exactly the same pairs $(i,j)$; that is, iff $\big\{\xi_{\pi_1}(i) = \xi_{\pi_1}(j) \Leftrightarrow \xi_{\pi_2}(i) = \xi_{\pi_2}(j)\big\}$. This defines an equivalence relation between the circuits.

Words. Equivalence classes arising from the above equivalence relation may be identified with partitions of $\{1, 2, \ldots, 2h\}$: to any partition we associate a word $w$ of length $l(w) = 2h$, whose letters appear in alphabetical order of first occurrence. For example, if $h = 3$, then the partition $\{\{1,3,6\}, \{2,4\}, \{5\}\}$ is represented by the word $ababca$. Let $|w|$ denote the number of different letters that appear in $w$.

The notions of order-$e$ edges, matching and nonmatching for $\pi$ carry over to words in a natural manner. For instance, the word $ababa$ is matched. The word $abcadbaa$ is nonmatched, has edges of order 1, 2 and 4, and the corresponding partition is $\{\{1,4,7,8\}, \{2,6\}, \{3\}, \{5\}\}$. As pointed out, it will be enough to consider only matched words, since $E(X_\pi) = 0$ for nonmatched words. The following fact is obvious.

Word count.

$$\#\{w : w \text{ is of length } 2h \text{ and has only order 2 edges, with } |w| = h\} = \frac{(2h)!}{2^h\,h!}. \qquad (15)$$
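The count in (15) is just the number of pair-partitions of $\{1, \ldots, 2h\}$. For small $h$ it can be confirmed by brute-force enumeration; a throwaway check (ours):

```python
from math import factorial

def pair_partitions(s):
    """Yield all partitions of the tuple s into unordered blocks of size 2."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for k in range(len(rest)):
        for tail in pair_partitions(rest[:k] + rest[k + 1:]):
            yield [(first, rest[k])] + tail

for h in range(1, 6):
    count = sum(1 for _ in pair_partitions(tuple(range(2 * h))))
    assert count == factorial(2 * h) // (2 ** h * factorial(h))
print("(15) verified for h = 1, ..., 5")
```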

The class $\Pi(w)$. Let $w[i]$ denote the $i$-th entry of $w$. The equivalence class corresponding to $w$ will be denoted by

$$\Pi(w) = \{\pi : w[i] = w[j] \Leftrightarrow \xi_\pi(i) = \xi_\pi(j)\}.$$

The number of partition blocks corresponding to $w$ will be denoted by $|w|$. If $\pi \in \Pi(w)$, then clearly the number of distinct $L$-values equals $|w|$; notationally, $\#\{\xi_\pi(i) : 1 \le i \le 2h\} = |w|$.

Note that, with the above definition, we may rewrite the trace formula as

$$\frac{1}{p}\,\mathrm{Tr}\,A_p^h = \frac{1}{p n^h} \sum_{\pi:\ \pi \text{ circuit}} X_\pi = \frac{1}{p n^h} \sum_{w} \sum_{\pi \in \Pi(w)} X_\pi,$$

where the first sum is taken over all words of length $2h$. After taking expectation, only the matched words survive and

$$E\Big[\frac{1}{p}\,\mathrm{Tr}\,A_p^h\Big] = \frac{1}{p n^h} \sum_{w \text{ matched}}\ \sum_{\pi \in \Pi(w)} E X_\pi.$$

To deal with (M4), we need multiple circuits. The following notions will be useful.

Jointly matched and cross-matched circuits. $k$ circuits $\pi_1, \pi_2, \ldots, \pi_k$ are said to be jointly matched if each $L$-value occurs at least twice across all circuits. They are said to be cross-matched if each circuit has at least one $L$-value which occurs in at least one of the other circuits.

The class $\Pi^*(w)$. Define, for any (matched) word $w$,

$$\Pi^*(w) = \{\pi : w[i] = w[j] \Rightarrow \xi_\pi(i) = \xi_\pi(j)\}. \qquad (16)$$

Note that $\Pi(w) \subseteq \Pi^*(w)$. However, as we will see, $\Pi^*(w)$ is equivalent to $\Pi(w)$ for asymptotic considerations, but is easier to work with.

Vertex and generating vertex. Each $\pi(i)$ of a circuit $\pi$ will be called a vertex. Further, $\pi(2i)$, $0 \le i \le h$, will be called even vertices or $p$-vertices, and $\pi(2i-1)$, $1 \le i \le h$, will be called odd vertices or $n$-vertices. A vertex $\pi(i)$ is said to be generating if either $i = 0$ or $w[i]$ is the position of the first occurrence of a letter. For example, if $w = abbcab$ then $\pi(0)$, $\pi(1)$, $\pi(2)$, $\pi(4)$ are generating vertices. We will call an odd generating vertex $\pi(2i-1)$ Type I if $\pi(2i)$ is also an even generating vertex; otherwise we will call $\pi(2i-1)$ a Type II odd generating vertex.

Obviously, the number of generating vertices in $\pi$ is $|w| + 1$. By Assumption A on the link function given earlier, a circuit of a fixed length is completely determined, up to finitely many choices, by its generating vertices. Hence, in Regime I, under Assumption A we obtain the simple but crucial estimate $|\Pi^*(w)| = O(n^{|w|+1})$.

Let $[a]$ denote the integer part of $a$. In Regime II we will further assume A$'$. In that case, it shall be shown that

$$|\Pi^*(w)| = O\big(p^{[\frac{|w|+1}{2}]+1}\, n^{[\frac{|w|}{2}]}\big).$$

3 Regime I

In Regime I, $p \to \infty$ such that $p/n \to y \in (0, \infty)$, and we consider the spectral distribution of $A_p = A_p(X) = \frac{1}{n} X_p X_p'$. There are two main theorems in this section. Theorem 5 is proved under Assumption B. We show that the input sequence may be taken to be bounded without loss of generality. This allows us to deal with bounded sequences in all the examples later. This result is proved sketchily in Bose and Sen (2008) [6], with special reference to the sample covariance matrix; we provide a detailed proof for clarity and completeness. Theorem 6 is proved under Assumptions A and B. We show that the ESD of $\{A_p(X)\}$ is almost surely tight and any subsequential limit is sub-Gaussian. Invoking the trace formula, we show that the LSD exists iff the moments converge. Further, in the limit, only pair matched terms potentially contribute.

To prove Theorem 5, we will need the following notion and result. The bounded Lipschitz metric $d_{BL}$ is defined on the space of probability measures as

$$d_{BL}(\mu, \nu) = \sup\Big\{\Big|\int f \, d\mu - \int f \, d\nu\Big| : \|f\|_\infty + \|f\|_L \le 1\Big\} \qquad (17)$$

where $\|f\|_\infty = \sup_x |f(x)|$ and $\|f\|_L = \sup_{x \ne y} |f(x) - f(y)|/|x - y|$. Recall that convergence in $d_{BL}$ implies weak convergence of measures.

The following inequalities estimate the metric distance $d_{BL}$ in terms of traces. Proofs may be found in Bai and Silverstein (2006) [2] or Bai (1999) [1], and use Lidskii's theorem (see Bhatia, 1997, page 69) [4].

Lemma 1. (i) Suppose $A$, $B$ are $n \times n$ symmetric real matrices. Then

$$d_{BL}^2(F^A, F^B) \le \Big(\frac{1}{n}\sum_{i=1}^{n} |\lambda_i(A) - \lambda_i(B)|\Big)^2 \le \frac{1}{n}\sum_{i=1}^{n} \big(\lambda_i(A) - \lambda_i(B)\big)^2 \le \frac{1}{n}\,\mathrm{Tr}(A - B)^2. \qquad (18)$$

(ii) For any square matrix $S$, let $F^S$ denote its empirical spectral distribution. Suppose $X$ and $Y$ are $p \times n$ real matrices. Let $A = XX'$ and $B = YY'$. Then

$$d_{BL}^2(F^A, F^B) \le \Big(\frac{1}{p}\sum_{i=1}^{p} |\lambda_i(A) - \lambda_i(B)|\Big)^2 \le \frac{2}{p^2}\,\mathrm{Tr}(A + B)\,\mathrm{Tr}\big[(X - Y)(X - Y)'\big]. \qquad (19)$$
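Both bounds are deterministic matrix inequalities and are easy to sanity-check numerically. A quick check of the second inequality in (19) on random matrices (ours):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 50, 80
X, Y = rng.standard_normal((p, n)), rng.standard_normal((p, n))
A, B = X @ X.T, Y @ Y.T
lamA, lamB = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)   # ascending order
middle = (np.abs(lamA - lamB).sum() / p) ** 2
right = 2 / p**2 * np.trace(A + B) * np.trace((X - Y) @ (X - Y).T)
assert middle <= right        # second inequality in (19)
print(middle, "<=", right)
```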

Theorem 5. Let $p \to \infty$, $p/n \to y \in (0, \infty)$, and let Assumption B hold, so that $\alpha_p k_p = O(np)$. Suppose that for every bounded, mean zero and variance one i.i.d. input sequence, $F^{\frac{1}{n}X_p X_p'}$ converges weakly to some fixed nonrandom distribution $G$ almost surely. Then the same limit continues to hold if the input sequence is i.i.d. with mean zero and variance one.

Proof of Theorem 5. Without loss of generality we shall assume that $\mathcal{Z} = \mathbb{Z}$, and we also write $X$ for $X_p$. For $t > 0$, denote $\mu(t) = E[x_0 I(|x_0| > t)] = -E[x_0 I(|x_0| \le t)]$ and let $\sigma^2(t) = \mathrm{Var}(x_0 I(|x_0| \le t)) = E[x_0^2 I(|x_0| \le t)] - \mu(t)^2$. Since $E(x_0) = 0$ and $E(x_0^2) = 1$, we have $\mu(t) \to 0$ and $\sigma(t), \sigma^2(t) \to 1$ as $t \to \infty$. Define the bounded random variables

$$x_i^* = \frac{x_i I(|x_i| \le t) + \mu(t)}{\sigma(t)} = \frac{x_i - \bar{x}_i}{\sigma(t)}, \quad \text{where } \bar{x}_i = x_i I(|x_i| > t) - \mu(t) = x_i - \sigma(t)\,x_i^*. \qquad (20)$$

It is easy to see that $E(\bar{x}_0^2) = 1 - \sigma^2(t) - 2\mu(t)^2 \to 0$ as $t$ tends to infinity. Further, $\{x_i^*\}$ are i.i.d. bounded, mean zero and variance one random variables. Let us replace the entries $x_{L_p(i,j)}$ of the matrix $X_p$ by their truncated versions $x^*_{L_p(i,j)}$ (respectively $\bar{x}_{L_p(i,j)}$) and denote the resulting matrix by $Y$ (respectively $\bar{X}_p$). By the triangle inequality and (19),

$$d_{BL}^2\big(F^{\frac{1}{n}X_pX_p'}, F^{\frac{1}{n}YY'}\big) \le 2\,d_{BL}^2\big(F^{\frac{1}{n}X_pX_p'}, F^{\frac{1}{n}\sigma(t)^2 YY'}\big) + 2\,d_{BL}^2\big(F^{\frac{1}{n}\sigma(t)^2 YY'}, F^{\frac{1}{n}YY'}\big)$$
$$\le \frac{4}{p^2 n^2}\,\mathrm{Tr}\big(X_pX_p' + \sigma(t)^2 YY'\big)\,\mathrm{Tr}\big[(X_p - \sigma(t)Y)(X_p - \sigma(t)Y)'\big] + \frac{4}{p^2 n^2}\,\mathrm{Tr}\big(YY' + \sigma(t)^2 YY'\big)\,\mathrm{Tr}\big[(\sigma(t)Y - Y)(\sigma(t)Y - Y)'\big].$$

To tackle the first term on the right side above, note that by Assumption B each distinct input appears at most $\alpha_p$ times among the at most $k_p$ distinct inputs, so

$$\mathrm{Tr}\big(X_pX_p' + \sigma(t)^2 YY'\big) = \sum_{i=1}^{p}\sum_{k=1}^{n} x_{L_p(i,k)}^2 + \sigma(t)^2\sum_{i=1}^{p}\sum_{k=1}^{n} x_{L_p(i,k)}^{*2} \le \alpha_p k_p\Big(\frac{\sum_{i=1}^{k_p} x_i^2}{k_p}\Big) + \alpha_p k_p\Big(\frac{\sum_{i=1}^{k_p} x_i^{*2}}{k_p}\Big).$$

Therefore, using $\alpha_p k_p = O(np)$ and the SLLN, $(np)^{-1}\,\mathrm{Tr}(X_pX_p' + \sigma(t)^2 YY')$ is bounded almost surely. Now,

$$\frac{1}{np}\,\mathrm{Tr}\big[(X_p - \sigma(t)Y)(X_p - \sigma(t)Y)'\big] = \frac{1}{np}\,\mathrm{Tr}\big(\bar{X}_p\bar{X}_p'\big) = \frac{1}{np}\sum_{i=1}^{p}\sum_{k=1}^{n} \bar{x}_{L_p(i,k)}^2 \le \frac{\alpha_p k_p}{np}\Big(\frac{\sum_{i=1}^{k_p} \bar{x}_i^2}{k_p}\Big),$$

which is bounded by $C\,E(\bar{x}_1^2)$ almost surely, for some constant $C$. Here we use the condition $\alpha_p k_p = O(np)$ and the SLLN for $\{\bar{x}_i^2\}$. Since $E(\bar{x}_1^2) \to 0$ as $t \to \infty$, we can make the right side tend to $0$ almost surely by first letting $p$ tend to infinity, and then letting $t$ tend to infinity. This takes care of the first term.

To tackle the second term,

$$T_2 := \frac{2}{n^2 p^2}\,\mathrm{Tr}\big[(\sigma(t)^2 + 1)YY'\big]\,\mathrm{Tr}\big[(\sigma(t)Y - Y)(\sigma(t)Y - Y)'\big] = \frac{2}{n^2 p^2}\,(\sigma(t)^2 + 1)(\sigma(t) - 1)^2\,\big(\mathrm{Tr}(YY')\big)^2,$$

and, as above, $\mathrm{Tr}(YY') = \sum_{i=1}^{p}\sum_{k=1}^{n} x_{L_p(i,k)}^{*2} \le \alpha_p k_p \big(\sum_{i=1}^{k_p} x_i^{*2}/k_p\big)$. Again using the condition $\alpha_p k_p = O(np)$, the SLLN, and the fact that $\sigma(t) \to 1$ as $t \to \infty$, we get $T_2 \to 0$ almost surely. This completes the proof of the theorem. $\square$

To investigate the existence of the LSD of $A_p = A_p(X) = \frac{1}{n} X_p X_p'$, in view of Theorem 5 we shall assume that the input sequence is i.i.d. bounded, with mean zero and variance $1$. Recall the two formulae given earlier:

$$\frac{1}{p}\,\mathrm{Tr}\,A_p^h = \frac{1}{pn^h}\sum_{\pi:\ \pi \text{ circuit}} X_\pi = \frac{1}{pn^h}\sum_{w}\sum_{\pi \in \Pi(w)} X_\pi,$$

where the first sum is taken over all words of length $2h$, and

$$E\Big[\frac{1}{p}\,\mathrm{Tr}\,A_p^h\Big] = \frac{1}{pn^h}\sum_{w \text{ matched}}\ \sum_{\pi \in \Pi(w)} E X_\pi.$$

Theorem 6. Let $A_p = \frac{1}{n} X_p X_p'$ where the entries of $X_p$ are bounded, independent, with mean zero and variance $1$, and $L_p$ satisfies Assumptions A and B. Let $p/n \to y$, $0 < y < \infty$. Then:

(i) If $w$ is a matched word of length $2h$ with an edge of order at least $3$, then $\frac{1}{pn^h}\sum_{\pi \in \Pi(w)} E X_\pi \to 0$.

(ii) For each $h$,

$$E\Big[\frac{1}{p}\,\mathrm{Tr}\,A_p^h\Big] - \frac{1}{pn^h}\sum_{\substack{w \text{ matched} \\ |w| = h}} |\Pi^*(w)| \to 0.$$

(iii) For each $h$, $\frac{1}{p}\,\mathrm{Tr}\,A_p^h - E\big[\frac{1}{p}\,\mathrm{Tr}\,A_p^h\big] \to 0$ almost surely.

(iv) If we denote $\beta_h = \limsup_p \frac{1}{pn^h}\sum_{\substack{w \text{ matched},\ |w| = h}} |\Pi^*(w)|$, then $\{\beta_h\}_h$ satisfies Carleman's condition.

Moreover, the sequence of ESDs $\{F^{A_p(X)}\}$ is almost surely tight. Any subsequential limit is sub-Gaussian. The LSD exists iff $\lim_p \frac{1}{pn^h}\sum_{w \text{ matched},\ |w| = h} |\Pi^*(w)|$ exists for each $h$. The same LSD continues to hold if the input sequence is i.i.d. (not necessarily bounded) with mean zero and variance one.

Remark 1. Without assuming anything further on the link function $L_p$, the above limits need not exist. However, in Section 1.5 we have seen several examples where the limits do indeed exist.

Proof of Theorem 6. For the special case of the covariance matrix, a proof can be found in Bose and Sen (2008) [6]. The same arguments may be adapted to general link functions. For the sake of completeness, here are the essential steps.

Recall that circuits which have at least one edge of order one contribute zero. Thus, consider all circuits which have at least one edge of order at least $3$ and all other edges of order at least $2$. Let $N_{h,3+}$ be the number of such circuits of length $2h$. Suppose first that $p = n$, so that $y = 1$. In this case $\pi$ has uniform range, $1 \le \pi(i) \le n$ for all $i \le 2h$. Then, from the arguments of Bryc, Dembo and Jiang (2006) [7], $n^{-(1+h)} N_{h,3+} \to 0$.

Now for general $y > 0$, the range of $\pi(i)$ is not the same for every $i$: for odd vertices it is from $1$ to $n$, and for even vertices it is from $1$ to $p$. However, this case is easily reduced to the previous case ($p = n$) as follows. Let $\widetilde{\Pi}(w)$ be the possibly larger class of the same circuits but with range $1 \le \pi(i) \le \max(p,n)$ for all $i \le 2h$. Then there is a constant $C$ such that

$$\frac{1}{pn^h}\sum_{w \text{ has an edge of order at least } 3} |\Pi(w)| \le C\,[\max(p,n)]^{-(h+1)} \sum_{w \text{ has an edge of order at least } 3} |\widetilde{\Pi}(w)| \to 0$$

from the previous case. This proves (i). Statement (ii) is then a consequence.

For (iii), it is enough to show that

$$E\Big[\frac{1}{p}\,\mathrm{Tr}\,A_p^h - E\frac{1}{p}\,\mathrm{Tr}\,A_p^h\Big]^4 = O(p^{-2}).$$

The proof is essentially the same as the proof of Lemma 2(b) in Bose and Sen (2008) [6]. For the sake of completeness, we give the full proof here. We write the fourth moment as

$$\frac{1}{p^4}\,E\big[\mathrm{Tr}\,A_p^h - E(\mathrm{Tr}\,A_p^h)\big]^4 = \frac{1}{p^4 n^{4h}} \sum_{\pi_1, \pi_2, \pi_3, \pi_4} E\Big[\prod_{i=1}^{4}\big(X_{\pi_i} - E X_{\pi_i}\big)\Big].$$

If $(\pi_1, \pi_2, \pi_3, \pi_4)$ are not jointly matched, then one of the circuits, say $\pi_j$, has an $L$-value which does not occur anywhere else. Also note that $E X_{\pi_j} = 0$. Hence, using independence,

$$E\Big[\prod_{i=1}^{4}\big(X_{\pi_i} - E X_{\pi_i}\big)\Big] = E\Big[X_{\pi_j}\prod_{i=1,\,i \ne j}^{4}\big(X_{\pi_i} - E X_{\pi_i}\big)\Big] = 0.$$

Further, if $(\pi_1, \pi_2, \pi_3, \pi_4)$ is jointly matched but not cross-matched, then one of the circuits, say $\pi_j$, is only self-matched; that is, none of its $L$-values is shared with those of the other circuits. Then again, by independence,

$$E\Big[\prod_{i=1}^{4}\big(X_{\pi_i} - E X_{\pi_i}\big)\Big] = E\big[X_{\pi_j} - E X_{\pi_j}\big]\; E\Big[\prod_{i=1,\,i \ne j}^{4}\big(X_{\pi_i} - E X_{\pi_i}\big)\Big] = 0.$$

So it is clear that for a non-zero contribution, the quadruple of circuits must be jointly matched and cross-matched. Observe that the total number of edges in each circuit of the quadruple is $2h$, so the total number of edges over the 4 circuits is $8h$.

Since they are at least pair matched, there can be at most $4h$ distinct edges, i.e., distinct $L$-values. In Bryc, Dembo and Jiang (2006) [7], the authors estimated the number of quadruples of circuits which are jointly matched and cross-matched. They used a method of counting which they applied only to the Toeplitz and Hankel link functions, but that method works equally well for any general link function satisfying Assumption A. Using this method, we can say that, apart from $\pi_1(0)$, $\pi_2(0)$, $\pi_3(0)$ and $\pi_4(0)$, which are always generating vertices, there can be at most $4h - 2$ generating vertices determining the $4h$ distinct $L$-values. So the total number of generating vertices obtained using the method of Bryc, Dembo and Jiang (2006) [7] is at most $4h + 2$. Since $p/n \to y$, $0 < y < \infty$, it is easy to see that there is a constant $K$, depending on $y$ and $h$, such that

$$\frac{1}{p^4}\,E\big[\mathrm{Tr}\,A_p^h - E(\mathrm{Tr}\,A_p^h)\big]^4 \le K\,\frac{p^{4h+2}}{p^{4h+4}} = O(p^{-2}). \qquad (21)$$

This guarantees that if there is convergence, it is almost sure. So (iii) is proved.

By Assumption A, for any matched word $w$ of length $2h$ with $|w| = h$, we have $|\Pi^*(w)| \le n^{h+1}\Phi^h$, which combined with the fact (15) yields the validity of Carleman's condition, proving (iv). The other claims of the theorem are easy consequences of the discussion so far. $\square$

4 Regime II

In this case, $p \to \infty$ and $p/n \to 0$. From the experience of the existing results for the sample covariance matrix (see, for example, Bai and Silverstein (2006) [2]), the following scaled and centered version is required. Define

$$N_p = N_p(X) = \Big(\frac{n}{p}\Big)^{1/2}\Big(\frac{1}{n} X_p X_p' - I_p\Big) = \sqrt{\frac{n}{p}}\,\big(A_p(X) - I_p\big). \qquad (22)$$

A straightforward extension of Theorem 5 to this situation is not possible. In Theorem 7 we show how the matrix $N_p$ defined above may be approximated, under an additional moment assumption, by the following matrix $B_p = B_p(X)$ with bounded entries:

$$(B_p)_{ij} = \frac{1}{\sqrt{np}}\sum_{l=1}^{n} \hat{x}_{L(i,l)}\,\hat{x}_{L(j,l)} \ \text{ if } i \ne j, \quad \text{and} \quad (B_p)_{ii} = 0, \qquad (23)$$

where $\bar{x}_\alpha = x_\alpha I(|x_\alpha| \le \epsilon_p n^{1/4})$, $\hat{x}_\alpha = \bar{x}_\alpha - E\bar{x}_\alpha$, and $\epsilon_p$ will be chosen later.
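The construction of $B_p$ from the raw input is mechanical. The sketch below (ours) performs the truncation at level $\epsilon_p n^{1/4}$; the exact centering $E\bar{x}_\alpha$ depends on the input law, and it is replaced here by the empirical mean purely for illustration.

```python
import numpy as np

def truncate_and_center(x, eps_p, n):
    """x_bar = x * 1(|x| <= eps_p n^{1/4}); x_hat = x_bar - E x_bar.
    E x_bar is approximated by the empirical mean (illustration only)."""
    xbar = np.where(np.abs(x) <= eps_p * n ** 0.25, x, 0.0)
    return xbar - xbar.mean()

def B_matrix(xhat):
    """(B_p)_{ij} = (np)^{-1/2} sum_l xhat_{L(i,l)} xhat_{L(j,l)}, zero diagonal; see (23)."""
    p, n = xhat.shape
    b = (xhat @ xhat.T) / np.sqrt(n * p)
    np.fill_diagonal(b, 0.0)
    return b
```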

Theorem 8 establishes the relationship between the existence of the LSD and the limits of the empirical moments. In Theorem 9 we prove an interesting result in Regime II: suppose $L^{(1)}$ and $L^{(2)}$ are two link functions satisfying Assumptions A(ii), A$'$ and B which agree on the set $\{(i,j) : 1 \le i \le p,\ 1 \le j \le n,\ i < j\}$; then the corresponding matrices $N_p(X)$ have identical LSDs. In Theorem 10 we work with Assumption A and show the closeness of the LSDs in probability when two link functions agree on the above set.

Theorem 7. Let $p, n \to \infty$ so that $p/n \to 0$, and suppose Assumption R2 holds. Suppose $L_p$ satisfies Assumptions A and B. Then there exists a nonrandom sequence $\{\epsilon_p\}$ with the property that $\epsilon_p \to 0$ but $\epsilon_p p^{1/4} \to \infty$ as $p \to \infty$, such that $D(F^{B_p}, F^{N_p}) \to 0$ a.s., where $D$ is the metric for convergence in distribution on the space of all probability distribution functions.

Proof of Theorem 7. Let $\bar{X}_p = ((\bar{x}_{L(i,j)}))_{1 \le i \le p,\ 1 \le j \le n}$ and $\tilde{N}_p = \frac{1}{\sqrt{np}}\big(\bar{X}_p\bar{X}_p' - nI_p\big)$. Then

$$\sup_x \big|F^{N_p}(x) - F^{\tilde{N}_p}(x)\big| \le \frac{1}{p}\,\mathrm{rank}\big(X_p - \bar{X}_p\big) \le \frac{1}{p}\sum_{i=1}^{p}\sum_{j=1}^{n} \eta_{L(i,j)}, \quad \text{where } \eta_\alpha = I\big(|x_\alpha| > \epsilon_p n^{1/4}\big).$$

The first inequality above follows from Bai (1999) [1, Lemma 2.6]. Write $q_p = \sup_\alpha P(|x_\alpha| > \epsilon_p n^{1/4})$. We claim that there exists a sequence $\epsilon_p \to 0$, going to zero arbitrarily slowly, such that

$$q_p \le \epsilon_p\, n^{-(1+1/\lambda)-\delta/8}. \qquad (24)$$

To establish the claim, for simplicity assume that $n = n(p)$ is an increasing function of $p$. Fix any $\epsilon > 0$. We have

$$n^{(1+1/\lambda)+\delta/8}\,\sup_\alpha P\big(|x_\alpha| > \epsilon n^{1/4}\big) \le \epsilon^{-4(1+1/\lambda)-\delta/2}\,\sup_\alpha E\big[|x_\alpha|^{4(1+1/\lambda)+\delta/2}\, I(|x_\alpha| > \epsilon n^{1/4})\big] \to 0 \qquad (25)$$

since the random variables $\{|x_\alpha|^{4(1+1/\lambda)+\delta/2}\}$ are uniformly integrable. Given $m$, by (25) find an integer $p_m$ such that $n \ge n_m := n(p_m)$ implies

$$n^{(1+1/\lambda)+\delta/8}\,\sup_\alpha P\big(|x_\alpha| > m^{-1} n^{1/4}\big) \le m^{-1}.$$

Define $\epsilon_p = 1/m$ if $p_m \le p < p_{m+1}$, and $\epsilon_p = n(p_1)^{(1+1/\lambda)+\delta/8}$ for $p < p_1$. Note that by choosing the integers in the sequence $p_1 < p_2 < \cdots$ as large as we want, we can make $\epsilon_p$ go to zero as slowly as we like. Clearly, $\epsilon_p$ satisfies inequality (24).

For any $\beta > 0$, and with $Y_i$ independent Bernoulli variables with $E(Y_i) \le q_p$,

$$P\Big(\sup_x |F^{N_p}(x) - F^{\tilde{N}_p}(x)| > \beta\Big) \le P\Big(\frac{1}{p}\sum_{i=1}^{p}\sum_{j=1}^{n}\eta_{L(i,j)} > \beta\Big) \le P\Big(\alpha_p \sum_{i=1}^{k_p} Y_i > \beta p\Big) \le \frac{\alpha_p}{\beta p}\sum_{i=1}^{k_p} E Y_i \quad \text{(by Markov's inequality)}$$
$$\le \frac{\alpha_p k_p q_p}{\beta p} \le C\, n q_p = o\big(n^{-(1/\lambda + \delta/8)}\big) = o\big(p^{-(1+\lambda\delta/8)}\big),$$

where the constant $C$ is such that $\alpha_p k_p \le C\beta np$. Hence, by the Borel-Cantelli lemma,

$$\sup_x \big|F^{N_p}(x) - F^{\tilde{N}_p}(x)\big| \to 0 \quad \text{a.s.}$$

Let $\hat{X}_p = ((\hat{x}_{L(i,j)}))_{1 \le i \le p,\ 1 \le j \le n}$ and $\hat{N}_p = \frac{1}{\sqrt{np}}\big(\hat{X}_p\hat{X}_p' - nI_p\big)$.

Using Lemma 1(i) and (ii),

$$d_{BL}^2\big(F^{\hat{N}_p}, F^{\tilde{N}_p}\big) \le \Big(\frac{1}{p}\sum_{i=1}^{p}\big|\lambda_i(\hat{N}_p) - \lambda_i(\tilde{N}_p)\big|\Big)^2 \le \frac{2}{np^3}\,\mathrm{Tr}\big(\bar{X}_p\bar{X}_p' + \hat{X}_p\hat{X}_p'\big)\,\mathrm{Tr}\big[E(\bar{X}_p)\,E(\bar{X}_p)'\big].$$

Using the moment condition, the condition on the truncation level, the condition $\alpha_p k_p = O(np)$ and an appropriate strong law of large numbers for independent random variables, it is easy to show that the above expression tends to $0$ almost surely. We omit the tedious details.

On the other hand,

$$d_{BL}^2\big(F^{\hat{N}_p}, F^{B_p}\big) \le \frac{1}{p}\,\mathrm{Tr}\big(\hat{N}_p - B_p\big)^2 \le \frac{2}{np^2}\sum_{i=1}^{p}\Big(\sum_{l=1}^{n}\big(\hat{x}_{L(i,l)}^2 - E\hat{x}_{L(i,l)}^2\big)\Big)^2 + \frac{2}{np^2}\sum_{i=1}^{p}\Big(\sum_{l=1}^{n}\big(1 - E\hat{x}_{L(i,l)}^2\big)\Big)^2 = M + N, \ \text{say}.$$

Note that for every $i$, $j$ there exists $\alpha$ such that

$$0 \le 1 - E\hat{x}_{L(i,j)}^2 = E\big[x_\alpha^2 I(|x_\alpha| > \epsilon_p n^{1/4})\big] + \big(E\big[x_\alpha I(|x_\alpha| > \epsilon_p n^{1/4})\big]\big)^2 \le \frac{1}{\epsilon_p^2 n^{1/2}}\big(E x_\alpha^4 + E^2 x_\alpha^2\big). \qquad (26)$$

From (26) it is immediate that

$$N \le \frac{2}{p\,\epsilon_p^4}\Big(\sup_\alpha \big(E x_\alpha^4 + E^2 x_\alpha^2\big)\Big)^2 \to 0 \quad \text{since } \epsilon_p\, p^{1/4} \to \infty.$$

Now we deal with the first term, $M$. Write

$$\sum_{i=1}^{p}\Big(\sum_{l=1}^{n}\big(\hat{x}_{L(i,l)}^2 - E\hat{x}_{L(i,l)}^2\big)\Big)^2 = \sum_{\alpha} a_\alpha\big(\hat{x}_\alpha^2 - E\hat{x}_\alpha^2\big)^2 + \sum_{\alpha \ne \alpha'} b_{\alpha,\alpha'}\big(\hat{x}_\alpha^2 - E\hat{x}_\alpha^2\big)\big(\hat{x}_{\alpha'}^2 - E\hat{x}_{\alpha'}^2\big) = T_{1p} + T_{2p} \ \text{(say)},$$

where $a_\alpha, b_{\alpha,\alpha'} \ge 0$. Obviously,

$$\#\{\alpha \in \mathcal{Z} : a_\alpha \ne 0\} \le k_p \quad \text{and} \quad \#\{(\alpha,\alpha') \in \mathcal{Z}^2 : \alpha \ne \alpha',\ b_{\alpha,\alpha'} \ne 0\} \le k_p^2.$$

Also, $a_\alpha \le \alpha_p$ for all $\alpha$, and $b_{\alpha,\alpha'} \le \Phi^2 \alpha_p$ for all $\alpha \ne \alpha'$. Hence,

$$\sum_p \frac{1}{4n^2p^4}\,E T_{1p}^2 \le \sum_p \frac{\alpha_p^2 k_p}{4n^2p^4}\,\sup_\alpha E\big(\hat{x}_\alpha^2 - E\hat{x}_\alpha^2\big)^4 + \sum_p \frac{\alpha_p^2 k_p^2}{4n^2p^4}\,\sup_\alpha E^2\big(\hat{x}_\alpha^2 - E\hat{x}_\alpha^2\big)^2$$
$$\le \sum_p \frac{\alpha_p^2 k_p\,(n^{1/4}\epsilon_p)^4}{4n^2p^4}\,\sup_\alpha E x_\alpha^4 + \sum_p \frac{\alpha_p^2 k_p^2}{4n^2p^4}\,\sup_\alpha E^2 x_\alpha^4 < \infty,$$

where we use the facts that $\alpha_p k_p = O(np)$ and $\alpha_p \le \Phi p$. Thus $T_{1p}/(2np^2) \to 0$ a.s.

It now remains to tackle $T_{2p}$. Let $y_\alpha = \hat{x}_\alpha^2 - E\hat{x}_\alpha^2$, $\alpha \in \mathcal{Z}$. Then $\{y_\alpha\}$ are mean zero independent random variables, and

$$\sum_p \frac{1}{4n^2p^4}\,E T_{2p}^2 = \sum_p \frac{1}{4n^2p^4}\,E\Big(\sum_{\alpha \ne \alpha'} b_{\alpha,\alpha'}\, y_\alpha y_{\alpha'}\Big)^2 \le \sum_p \frac{1}{4n^2p^4}\sum_{\alpha \ne \alpha'} b_{\alpha,\alpha'}^2\, E y_\alpha^2 y_{\alpha'}^2 \le \sum_p \frac{\Phi^4 \alpha_p^2 k_p^2}{4n^2p^4}\,\sup_\alpha E^2 \hat{x}_\alpha^4 < \infty.$$

Thus, by the Borel-Cantelli lemma, $T_{2p}/(np^2) \to 0$ almost surely, and this completes the proof. $\square$

Remark 2. (a) From the results of Bai and Yin (1988) [3], it is known that for the special case of the sample covariance matrix, a finite fourth moment is needed for the above approximation to work and, inter alia, for the existence of the LSD almost surely. Here the link function is more general, necessitating slightly higher moments.

(b) If we carefully follow the above proof, the finiteness of the $\big(4(1+1/\lambda)+\delta\big)$-th moment was needed only in proving that $\sup_x |F^{N_p}(x) - F^{\tilde{N}_p}(x)| \to 0$ a.s. If we impose the weaker assumption $\sup_\alpha E x_\alpha^4 < \infty$, then $q_p \le \epsilon_p/n$ for a suitably chosen sequence $\{\epsilon_p\} \to 0$, and $\sup_x |F^{N_p}(x) - F^{\tilde{N}_p}(x)| \to 0$ in probability holds, and hence $D(F^{B_p}, F^{N_p}) \to 0$ in probability.

Having approximated $N_p$ by $B_p$, we now need to establish the behaviour of the moments of $B_p$. This is done through a series of lemmas, finally leading to Theorem 8. In the subsequent discussion we will use the following notation:

$$\Pi'(w) := \{\pi \in \Pi(w) : \pi(2i-2) \ne \pi(2i)\ \forall\, 1 \le i \le h\}, \qquad \Pi^{*\prime}(w) := \{\pi \in \Pi^*(w) : \pi(2i-2) \ne \pi(2i)\ \forall\, 1 \le i \le h\}.$$

Analogously to $\Pi(w)$ and $\Pi^*(w)$, we define, for several words $w_1, w_2, \ldots, w_k$,

$$\Pi(w_1, w_2, \ldots, w_k) := \{(\pi_1, \pi_2, \ldots, \pi_k) : w_i[s] = w_j[l] \Leftrightarrow \xi_{\pi_i}(s) = \xi_{\pi_j}(l),\ \forall\, 1 \le i, j \le k\},$$
$$\Pi^*(w_1, w_2, \ldots, w_k) := \{(\pi_1, \pi_2, \ldots, \pi_k) : w_i[s] = w_j[l] \Rightarrow \xi_{\pi_i}(s) = \xi_{\pi_j}(l),\ \forall\, 1 \le i, j \le k\}.$$

(The definition of $\xi_\pi$ has been given earlier.) Moreover,

$$\Pi'(w_1, w_2, \ldots, w_k) := \big\{(\pi_1, \pi_2, \ldots, \pi_k) \in \Pi(w_1, w_2, \ldots, w_k) : \pi_i(0) \ne \pi_i(2), \ldots, \pi_i(2h-2) \ne \pi_i(2h),\ \forall\, 1 \le i \le k\big\}.$$

For Lemmas 2-5, we will always assume that the link function $L$ satisfies Assumptions A(ii) and A$'$.

Lemma 2. Fix $h$ and a matched word $w$ of length $2h$. Then

$$\big|\Pi^{*\prime}(w)\big| \le K_h\, p^{[\frac{|w|+1}{2}]+1}\, n^{[\frac{|w|}{2}]}, \qquad (27)$$

where $K_h$ is some constant depending on $h$.

Proof of Lemma 2. Let $k$ be the number of odd generating vertices, denoted $\pi(2i_1-1), \pi(2i_2-1), \ldots, \pi(2i_k-1)$, where $i_1 < i_2 < \cdots < i_k < h$. We wish to emphasize that $i_k < h$, since $\{\xi_\pi(2h-1), \xi_\pi(2h)\}$ cannot be a matched edge, as $\xi_\pi(2h-1) \ne \xi_\pi(2h)$ by A$'$. Recall the definitions of Type I and Type II odd generating vertices given in Section 2. Let $t$ be the number of Type I odd generating vertices. Since the total number of generating vertices apart from $\pi(0)$ is $|w|$, we have $t \le [|w|/2]$.

Now fix a Type II odd generating vertex $\pi(2i-1)$, and suppose we have already made our choices for the vertices $\pi(j)$, $j < 2i-1$, which come before $\pi(2i-1)$. Since $\pi(2i)$ is not a generating vertex,

$$\xi_\pi(2i) = \xi_\pi(j) \ \text{ for some } j < 2i. \qquad (28)$$

(Note that since $\pi(2i-2) \ne \pi(2i)$, we cannot have $\xi_\pi(2i) = \xi_\pi(2i-1)$, by Assumption A$'$.) Now that the value of $\xi_\pi(j)$ has been fixed, for each value of $\pi(2i)$ there can be at most $\Phi$ choices of $\pi(2i-1)$ such that (28) is satisfied. Thus we can have at most $p\Phi$ choices for the generating vertex $\pi(2i-1)$. In short, there are only $O(p)$ possibilities for a Type II odd generating vertex to choose from.

With the above crucial observation before us, it is now easy to conclude that

$$\big|\Pi^{*\prime}(w)\big| = O\big(p^{\#\{\text{even generating vertices}\} + \#\{\text{Type II odd vertices}\}}\; n^{\#\{\text{Type I odd vertices}\}}\big) = O\big(p^{[\frac{|w|+1}{2}]+1}\, n^{[\frac{|w|}{2}]}\big). \ \square$$

Lemma 3. (i) For every even $h$,

$$\frac{1}{p}\,E\,\mathrm{Tr}\,B_p^h - \sum_{\substack{w \text{ matched} \\ |w| = h}} \frac{1}{p^{1+h/2}\,n^{h/2}}\,\big|\Pi^{*\prime}(w)\big| \to 0.$$

(ii) For every odd $h$, $\lim_p \frac{1}{p}\,E\,\mathrm{Tr}\,B_p^h = 0$.

Proof of Lemma 3. Let $\hat{X}_\pi$ be as defined in (14), with $x_\alpha$ replaced by $\hat{x}_\alpha$. From the fact that $E\hat{x}_\alpha = 0$ and $(B_p)_{ii} = 0$ for all $i$, we have

$$\frac{1}{p}\,E\,\mathrm{Tr}\,B_p^h = \frac{1}{p^{1+h/2}\,n^{h/2}} \sum_{w \text{ matched}}\ \sum_{\pi \in \Pi'(w)} E\hat{X}_\pi.$$

Fix a matched word $w$ of length $2h$. It induces a partition on the $2h$ $L$-values $\xi_\pi(1), \xi_\pi(2), \ldots, \xi_\pi(2h)$, resulting in $|w|$ groups (partition blocks), where the $\xi_\pi$-values within a group are equal and across groups are distinct. Let $C_k$ be the number of groups of size $k$. Clearly,

$$C_2 + C_3 + \cdots + C_{2h} = |w| \quad \text{and} \quad 2C_2 + 3C_3 + \cdots + 2h\,C_{2h} = 2h.$$

Note that

$$\sup_\alpha \big|1 - E\hat{x}_\alpha^2\big| = o(1) \quad \text{and} \quad \sup_\alpha E|\hat{x}_\alpha|^k \le (\epsilon_p n^{1/4})^{k-2} \ \text{ for } k \ge 2.$$

Thus, if $\pi \in \Pi(w)$,

$$E\big|\hat{x}_{\xi_\pi(1)}\hat{x}_{\xi_\pi(2)} \cdots \hat{x}_{\xi_\pi(2h)}\big| \le (\epsilon_p n^{1/4})^{0 \cdot C_2 + 1 \cdot C_3 + \cdots + (2h-2)\,C_{2h}} = (\epsilon_p n^{1/4})^{2h - 2|w|}.$$

Using Lemma 2,

$$\sum_{\pi \in \Pi'(w)} E\big|\hat{x}_{\xi_\pi(1)}\hat{x}_{\xi_\pi(2)} \cdots \hat{x}_{\xi_\pi(2h)}\big| \le (\epsilon_p n^{1/4})^{2h-2|w|}\, K_h\, p^{[\frac{|w|+1}{2}]+1}\, n^{[\frac{|w|}{2}]}.$$

Case I. Either $|w| < h$, or $|w| = h$ with $h$ odd. Then

$$\frac{1}{p^{1+h/2}\,n^{h/2}} \sum_{\pi \in \Pi'(w)} E\big|\hat{x}_{\xi_\pi(1)} \cdots \hat{x}_{\xi_\pi(2h)}\big| \le K_h\;\epsilon_p^{2h-2|w|}\; n^{[\frac{|w|}{2}]-\frac{|w|}{2}}\; p^{[\frac{|w|+1}{2}]-\frac{h}{2}} \to 0. \qquad (29)$$

(Indeed, if $|w| = h$ with $h$ odd the bound is of order $(p/n)^{1/2}$, while otherwise either the power of $p$ is nonpositive and $\epsilon_p^{2h-2|w|} \to 0$, or the power of $n$ is $-1/2$.)

Case II. If $|w| = h$ with $h$ even, then each $L$-value appears exactly twice, so

$$\lim_p \frac{1}{p^{1+h/2}\,n^{h/2}} \sum_{\pi \in \Pi'(w)} E\big[\hat{x}_{\xi_\pi(1)}\hat{x}_{\xi_\pi(2)} \cdots \hat{x}_{\xi_\pi(2h)}\big] = \lim_p \frac{1}{p^{1+h/2}\,n^{h/2}}\,\big|\Pi'(w)\big|\,\big(1+o(1)\big)^h = \lim_p \frac{1}{p^{1+h/2}\,n^{h/2}}\,\big|\Pi'(w)\big| = \lim_p \frac{1}{p^{1+h/2}\,n^{h/2}}\,\big|\Pi^{*\prime}(w)\big|.$$

This completes the proof. $\square$

Fix jointly matched and cross-matched words $(w_1, w_2, w_3, w_4)$, each of length $2h$. Let

$$\kappa = \text{total number of distinct letters in } w_1, w_2, w_3 \text{ and } w_4. \qquad (30)$$

Lemma 4. For some constant $C_h$ depending on $h$,

$$\#\big\{(\pi_1,\pi_2,\pi_3,\pi_4) \in \Pi(w_1,w_2,w_3,w_4) : \pi_i(0) \ne \pi_i(2), \ldots, \pi_i(2h-2) \ne \pi_i(2h),\ i = 1,2,3,4\big\} \le \begin{cases} C_h\, p^{2+2h}\, n^{2h} & \text{if } \kappa = 4h \text{ or } 4h-1, \\ C_h\, p^{4+[\frac{\kappa+1}{2}]}\, n^{[\frac{\kappa}{2}]} & \text{if } \kappa \le 4h-2. \end{cases} \qquad (31)$$

Proof of Lemma 4. Case I: $\kappa = 4h$ or $4h-1$. Since the circuits are cross-matched, upon reordering the circuits, making a circular shift, and counting anti-clockwise if necessary, we may assume without loss of generality that $\xi_{\pi_1}(2h)$ does not match with any $L$-value in $\pi_1$; and when $\kappa = 4h$, we may further assume that $\xi_{\pi_2}(2h)$ does not match with any $L$-value in $\pi_1$ or $\pi_2$.

We first fix the values of the $p$-vertices $\pi_i(0)$, $1 \le i \le 4$, all of which are even generating vertices. Then we scan all the vertices from left to right, one circuit after another. We will then, as argued in Bryc, Dembo and Jiang (2006) [7], obtain a total of $4h+2$ generating vertices instead of the $4h+4$ generating vertices which we would have obtained by the usual counting. In our dynamic counting, we scan the $L$-values in the following order:

$$\xi_{\pi_1}(1), \xi_{\pi_1}(2), \xi_{\pi_1}(3), \ldots, \xi_{\pi_1}(2h), \xi_{\pi_2}(1), \xi_{\pi_2}(2), \ldots, \xi_{\pi_2}(2h), \xi_{\pi_3}(1), \ldots, \xi_{\pi_4}(2h).$$

From the arguments given in Lemma 2, it is clear that for an odd vertex $\pi_i(2j-1)$ to be Type I, both $\xi_{\pi_i}(2j-1)$ and $\xi_{\pi_i}(2j)$ have to be the first appearances of two distinct $L$-values. So the total number of Type I $n$-generating vertices is at most $\kappa/2$.

Case II: $\kappa \le 4h-2$. We again apply the crucial fact that for an odd vertex $\pi_i(2j-1)$ to be Type I, we need both $\xi_{\pi_i}(2j-1)$ and $\xi_{\pi_i}(2j)$ to be the first appearances of two distinct $L$-values, while the circuits are being scanned from left to right, one after another. Since there are exactly $\kappa$ distinct $L$-values, the number of Type I odd vertices is not more than $\kappa/2$. Combining this with the fact that the total number of generating vertices equals $\kappa + 4$, we get the required bound. $\square$

Lemma 5. For each fixed $h$,

$$\sum_{p=1}^{\infty} E\Big[\frac{1}{p}\big(\mathrm{Tr}\,B_p^h - E(\mathrm{Tr}\,B_p^h)\big)\Big]^4 < \infty.$$

Proof of Lemma 5. As argued earlier, and from the fact that the diagonal elements of $B_p$ are all zero, we have

$$E\Big[\frac{1}{p}\big(\mathrm{Tr}\,B_p^h - E(\mathrm{Tr}\,B_p^h)\big)\Big]^4 = \frac{1}{p^{4+2h}\,n^{2h}} \sum\ \sum\ E\Big[\prod_{i=1}^{4}\big(\hat{X}_{\pi_i} - E\hat{X}_{\pi_i}\big)\Big],$$

where the outer sum is over all quadruples of words $(w_1, w_2, w_3, w_4)$, each of length $2h$, which are jointly matched and cross-matched, and the inner sum is over all quadruples of circuits

$$\big\{(\pi_1,\pi_2,\pi_3,\pi_4) \in \Pi(w_1,w_2,w_3,w_4) : \pi_i(0) \ne \pi_i(2), \ldots, \pi_i(2h-2) \ne \pi_i(2h),\ i = 1,2,3,4\big\}.$$

Note that, by the definition of $\kappa$, we have $\kappa \le 4h$ for any jointly matched quadruple of words $(w_1,w_2,w_3,w_4)$ of total length $8h$. Fix $w_1, w_2, w_3, w_4$, jointly matched and cross-matched.

Case I: $\kappa = 4h$ or $4h-1$. Then the maximum power with which any $\hat{x}_\alpha$ can occur in $\prod_{i=1}^{4}\hat{X}_{\pi_i}$ is bounded by 4. But since $\sup_\alpha E(\hat{x}_\alpha)^4 < \infty$, we immediately have $\sup E\big|\prod_{i=1}^{4}(\hat{X}_{\pi_i} - E\hat{X}_{\pi_i})\big| < \infty$. Thus, by Lemma 4,

$$\frac{1}{p^{4+2h}\,n^{2h}} \sum_{\kappa \in \{4h,\,4h-1\}} \Big|E\prod_{i=1}^{4}\big(\hat{X}_{\pi_i} - E\hat{X}_{\pi_i}\big)\Big| = \frac{1}{p^{4+2h}\,n^{2h}}\,O\big(p^{2+2h}\,n^{2h}\big) = O(p^{-2}).$$

Case II. Suppose now that $\kappa = 4h - k$, $k \ge 2$. Borrowing notation from Lemma 3, we have

$$C_2 + C_3 + \cdots + C_{8h} = \kappa \quad \text{and} \quad 2C_2 + 3C_3 + \cdots + 8h\,C_{8h} = 8h.$$

These two equations immediately give

$$C_3 + 2C_4 + \cdots + (8h-2)\,C_{8h} = 8h - 2\kappa = 2k.$$

It is also easy to see that

$$C_{2k+2+i} = 0 \ \ \forall\, i \ge 1 \quad \text{and} \quad C_5 + 2C_6 + \cdots + (2k-2)\,C_{2k+2} \le 2k-2.$$

Since the input variables are truncated at $\epsilon_p n^{1/4}$ and since $\sup_\alpha E(x_\alpha^4) < \infty$,

$$\sup_\alpha E\Big|\prod_{i=1}^{4}\hat{X}_{\pi_i}\Big| \le C\, n^{\frac{C_5 + 2C_6 + \cdots + (2k-2)C_{2k+2}}{4}} = O\big(n^{\frac{2k-2}{4}}\big).$$

Thus, by Lemma 4, for any $k \ge 2$,

$$\frac{1}{p^{4+2h}\,n^{2h}} \sum_{\kappa = 4h-k} \Big|E\prod_{i=1}^{4}\big(\hat{X}_{\pi_i} - E\hat{X}_{\pi_i}\big)\Big| \le \frac{C_h\, p^{4+[\frac{4h-k+1}{2}]}\, n^{[\frac{4h-k}{2}]}}{p^{4+2h}\,n^{2h}}\,O\big(n^{\frac{2k-2}{4}}\big) = O\big(p^{-4-4h+(4+4h-k)+\frac{k-2}{2}}\big) = O\big(p^{-(\frac{k}{2}+1)}\big).$$

For a quick explanation of the first equality above, just note that the total power of $n$ in the previous expression is negative, so that $n$ may be replaced by $p$ throughout (eventually $n \ge p$). Since $k \ge 2$, all these bounds are summable in $p$, which completes the proof. $\square$

We are now ready to summarize the results of Theorem 7 and Lemmas 2-5 in the following theorem for Regime II.

Theorem 8. Let $p, n \to \infty$ so that $p/n \to 0$ and Assumption R2 holds. Suppose $L_p$ satisfies Assumptions A(ii), A$'$ and B, and suppose $\{\epsilon_p\}$, satisfying $\epsilon_p \to 0$ and $\epsilon_p p^{1/4} \to \infty$, is appropriately chosen. Then:

(i) For every even $h$,

$$\frac{1}{p}\,E\,\mathrm{Tr}\,B_p^h - \sum_{\substack{w \text{ matched} \\ |w| = h}} \frac{1}{p^{1+h/2}\,n^{h/2}}\,\big|\Pi^{*\prime}(w)\big| \to 0.$$

(ii) For every odd $h$, $\lim_p \frac{1}{p}\,E\,\mathrm{Tr}\,B_p^h = 0$.

(iii) For each $h$, $\frac{1}{p}\,\mathrm{Tr}\,B_p^h - E\frac{1}{p}\,\mathrm{Tr}\,B_p^h \to 0$ a.s.

(iv) $\beta_{2h} := \limsup_p \sum_{\substack{w \text{ matched},\ |w| = 2h}} \frac{1}{p^{1+h}\,n^{h}}\,\big|\Pi^{*\prime}(w)\big|$ satisfies Carleman's condition.

As a consequence, the sequence $\{F^{N_p}\}$ is almost surely tight. Every subsequential limit is symmetric and sub-Gaussian. The LSD of $\{F^{N_p}\}$ exists almost surely iff $\lim_p \sum_{w \text{ matched},\ |w| = 2h}\, p^{-(1+h)} n^{-h}\,|\Pi^{*\prime}(w)|$ exists for each $h$; these limits give the $2h$-th moments of the LSD.

Below we will deal with two matrices with different link functions $L^{(1)}$ and $L^{(2)}$. The corresponding relevant quantities will be denoted with added superscripts $(1)$ and $(2)$ respectively.

Theorem 9. Let $L^{(1)}$ and $L^{(2)}$ be two link functions satisfying Assumptions A(ii), A$'$ and B which agree on the set $\{(i,j) : 1 \le i \le p,\ 1 \le j \le n,\ i < j\}$. Then, for each matched word $w$ of length $4h$ with $|w| = 2h$,

$$\frac{1}{p^{h+1}\,n^{h}}\Big(\big|\Pi^{*\prime(1)}(w)\big| - \big|\Pi^{*\prime(2)}(w)\big|\Big) \to 0.$$

Hence, in Regime II under Assumption R2, $F^{N_p^{(i)}}$, $i = 1, 2$, have identical asymptotic behaviour.

Proof of Theorem 9. Define, for each link function $L^{(i)}$, $i = 1, 2$,

$$\Gamma_j^{(i)} := \big\{\pi \in \Pi^{*\prime(i)}(w) : \pi(2j-1) \le p\big\}, \quad j = 1, 2, \ldots, 2h.$$

Now, it is enough to prove that for each fixed $j$,

$$\frac{1}{p^{h+1}\,n^{h}}\,\big|\Gamma_j^{(i)}\big| \to 0.$$

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,

More information

Inner Product Spaces

Inner Product Spaces Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

More information

Vector and Matrix Norms

Vector and Matrix Norms Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty

More information

Solution to Homework 2

Solution to Homework 2 Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

More information

Similarity and Diagonalization. Similar Matrices

Similarity and Diagonalization. Similar Matrices MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

More information

Metric Spaces. Chapter 7. 7.1. Metrics

Metric Spaces. Chapter 7. 7.1. Metrics Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

More information

A characterization of trace zero symmetric nonnegative 5x5 matrices

A characterization of trace zero symmetric nonnegative 5x5 matrices A characterization of trace zero symmetric nonnegative 5x5 matrices Oren Spector June 1, 009 Abstract The problem of determining necessary and sufficient conditions for a set of real numbers to be the

More information

BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBERT SPACE REVIEW BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

MATH 4330/5330, Fourier Analysis Section 11, The Discrete Fourier Transform

MATH 4330/5330, Fourier Analysis Section 11, The Discrete Fourier Transform MATH 433/533, Fourier Analysis Section 11, The Discrete Fourier Transform Now, instead of considering functions defined on a continuous domain, like the interval [, 1) or the whole real line R, we wish

More information

Linear Algebra Notes for Marsden and Tromba Vector Calculus

Linear Algebra Notes for Marsden and Tromba Vector Calculus Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

1 Introduction to Matrices

1 Introduction to Matrices 1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Notes on metric spaces

Notes on metric spaces Notes on metric spaces 1 Introduction The purpose of these notes is to quickly review some of the basic concepts from Real Analysis, Metric Spaces and some related results that will be used in this course.

More information

4. How many integers between 2004 and 4002 are perfect squares?

4. How many integers between 2004 and 4002 are perfect squares? 5 is 0% of what number? What is the value of + 3 4 + 99 00? (alternating signs) 3 A frog is at the bottom of a well 0 feet deep It climbs up 3 feet every day, but slides back feet each night If it started

More information

Chapter 17. Orthogonal Matrices and Symmetries of Space

Chapter 17. Orthogonal Matrices and Symmetries of Space Chapter 17. Orthogonal Matrices and Symmetries of Space Take a random matrix, say 1 3 A = 4 5 6, 7 8 9 and compare the lengths of e 1 and Ae 1. The vector e 1 has length 1, while Ae 1 = (1, 4, 7) has length

More information

a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.

a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given

More information

1 Sufficient statistics

1 Sufficient statistics 1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

More information

COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH. 1. Introduction

COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH. 1. Introduction COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH ZACHARY ABEL 1. Introduction In this survey we discuss properties of the Higman-Sims graph, which has 100 vertices, 1100 edges, and is 22 regular. In fact

More information

Gaussian Conjugate Prior Cheat Sheet

Gaussian Conjugate Prior Cheat Sheet Gaussian Conjugate Prior Cheat Sheet Tom SF Haines 1 Purpose This document contains notes on how to handle the multivariate Gaussian 1 in a Bayesian setting. It focuses on the conjugate prior, its Bayesian

More information

1 if 1 x 0 1 if 0 x 1

1 if 1 x 0 1 if 0 x 1 Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

More information

Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

More information

The Bivariate Normal Distribution

The Bivariate Normal Distribution The Bivariate Normal Distribution This is Section 4.7 of the st edition (2002) of the book Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis. The material in this section was not included

More information

Notes on Determinant

Notes on Determinant ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

Section 6.1 - Inner Products and Norms

Section 6.1 - Inner Products and Norms Section 6.1 - Inner Products and Norms Definition. Let V be a vector space over F {R, C}. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F,

More information

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint

More information

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

More information

Continuity of the Perron Root

Continuity of the Perron Root Linear and Multilinear Algebra http://dx.doi.org/10.1080/03081087.2014.934233 ArXiv: 1407.7564 (http://arxiv.org/abs/1407.7564) Continuity of the Perron Root Carl D. Meyer Department of Mathematics, North

More information

n k=1 k=0 1/k! = e. Example 6.4. The series 1/k 2 converges in R. Indeed, if s n = n then k=1 1/k, then s 2n s n = 1 n + 1 +...

n k=1 k=0 1/k! = e. Example 6.4. The series 1/k 2 converges in R. Indeed, if s n = n then k=1 1/k, then s 2n s n = 1 n + 1 +... 6 Series We call a normed space (X, ) a Banach space provided that every Cauchy sequence (x n ) in X converges. For example, R with the norm = is an example of Banach space. Now let (x n ) be a sequence

More information

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013 Notes on Orthogonal and Symmetric Matrices MENU, Winter 201 These notes summarize the main properties and uses of orthogonal and symmetric matrices. We covered quite a bit of material regarding these topics,

More information

0.1 Phase Estimation Technique

0.1 Phase Estimation Technique Phase Estimation In this lecture we will describe Kitaev s phase estimation algorithm, and use it to obtain an alternate derivation of a quantum factoring algorithm We will also use this technique to design

More information

1 The Brownian bridge construction

1 The Brownian bridge construction The Brownian bridge construction The Brownian bridge construction is a way to build a Brownian motion path by successively adding finer scale detail. This construction leads to a relatively easy proof

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

More information

Linear Algebra I. Ronald van Luijk, 2012

Linear Algebra I. Ronald van Luijk, 2012 Linear Algebra I Ronald van Luijk, 2012 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents 1. Vector spaces 3 1.1. Examples 3 1.2. Fields 4 1.3. The field of complex numbers. 6 1.4.

More information

Lecture L3 - Vectors, Matrices and Coordinate Transformations

Lecture L3 - Vectors, Matrices and Coordinate Transformations S. Widnall 16.07 Dynamics Fall 2009 Lecture notes based on J. Peraire Version 2.0 Lecture L3 - Vectors, Matrices and Coordinate Transformations By using vectors and defining appropriate operations between

More information

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively. Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

Some stability results of parameter identification in a jump diffusion model

Some stability results of parameter identification in a jump diffusion model Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss

More information

Some probability and statistics

Some probability and statistics Appendix A Some probability and statistics A Probabilities, random variables and their distribution We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X,Y,

More information

October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix

October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix Linear Algebra & Properties of the Covariance Matrix October 3rd, 2012 Estimation of r and C Let rn 1, rn, t..., rn T be the historical return rates on the n th asset. rn 1 rṇ 2 r n =. r T n n = 1, 2,...,

More information

Lecture 13: Martingales

Lecture 13: Martingales Lecture 13: Martingales 1. Definition of a Martingale 1.1 Filtrations 1.2 Definition of a martingale and its basic properties 1.3 Sums of independent random variables and related models 1.4 Products of

More information

Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

More information

Lecture 5 Principal Minors and the Hessian

Lecture 5 Principal Minors and the Hessian Lecture 5 Principal Minors and the Hessian Eivind Eriksen BI Norwegian School of Management Department of Economics October 01, 2010 Eivind Eriksen (BI Dept of Economics) Lecture 5 Principal Minors and

More information

Tiers, Preference Similarity, and the Limits on Stable Partners

Tiers, Preference Similarity, and the Limits on Stable Partners Tiers, Preference Similarity, and the Limits on Stable Partners KANDORI, Michihiro, KOJIMA, Fuhito, and YASUDA, Yosuke February 7, 2010 Preliminary and incomplete. Do not circulate. Abstract We consider

More information

Triangle deletion. Ernie Croot. February 3, 2010

Triangle deletion. Ernie Croot. February 3, 2010 Triangle deletion Ernie Croot February 3, 2010 1 Introduction The purpose of this note is to give an intuitive outline of the triangle deletion theorem of Ruzsa and Szemerédi, which says that if G = (V,

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

Math 4310 Handout - Quotient Vector Spaces

Math 4310 Handout - Quotient Vector Spaces Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable

More information

Multi-variable Calculus and Optimization

Multi-variable Calculus and Optimization Multi-variable Calculus and Optimization Dudley Cooke Trinity College Dublin Dudley Cooke (Trinity College Dublin) Multi-variable Calculus and Optimization 1 / 51 EC2040 Topic 3 - Multi-variable Calculus

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products Chapter 3 Cartesian Products and Relations The material in this chapter is the first real encounter with abstraction. Relations are very general thing they are a special type of subset. After introducing

More information

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of

More information

Lecture 4: BK inequality 27th August and 6th September, 2007

Lecture 4: BK inequality 27th August and 6th September, 2007 CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d). 1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction

More information

Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way

Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way Chapter 3 Distribution Problems 3.1 The idea of a distribution Many of the problems we solved in Chapter 1 may be thought of as problems of distributing objects (such as pieces of fruit or ping-pong balls)

More information

Separation Properties for Locally Convex Cones

Separation Properties for Locally Convex Cones Journal of Convex Analysis Volume 9 (2002), No. 1, 301 307 Separation Properties for Locally Convex Cones Walter Roth Department of Mathematics, Universiti Brunei Darussalam, Gadong BE1410, Brunei Darussalam

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability CS 7 Discrete Mathematics and Probability Theory Fall 29 Satish Rao, David Tse Note 8 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces

More information

Stationary random graphs on Z with prescribed iid degrees and finite mean connections

Stationary random graphs on Z with prescribed iid degrees and finite mean connections Stationary random graphs on Z with prescribed iid degrees and finite mean connections Maria Deijfen Johan Jonasson February 2006 Abstract Let F be a probability distribution with support on the non-negative

More information

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8 Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e

More information

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

More information

Math 55: Discrete Mathematics

Math 55: Discrete Mathematics Math 55: Discrete Mathematics UC Berkeley, Fall 2011 Homework # 5, due Wednesday, February 22 5.1.4 Let P (n) be the statement that 1 3 + 2 3 + + n 3 = (n(n + 1)/2) 2 for the positive integer n. a) What

More information

Lecture 13 - Basic Number Theory.

Lecture 13 - Basic Number Theory. Lecture 13 - Basic Number Theory. Boaz Barak March 22, 2010 Divisibility and primes Unless mentioned otherwise throughout this lecture all numbers are non-negative integers. We say that A divides B, denoted

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails 12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint

More information

9.2 Summation Notation

9.2 Summation Notation 9. Summation Notation 66 9. Summation Notation In the previous section, we introduced sequences and now we shall present notation and theorems concerning the sum of terms of a sequence. We begin with a

More information

1 The Line vs Point Test

1 The Line vs Point Test 6.875 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Low Degree Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz Having seen a probabilistic verifier for linearity

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

ON SOME ANALOGUE OF THE GENERALIZED ALLOCATION SCHEME

ON SOME ANALOGUE OF THE GENERALIZED ALLOCATION SCHEME ON SOME ANALOGUE OF THE GENERALIZED ALLOCATION SCHEME Alexey Chuprunov Kazan State University, Russia István Fazekas University of Debrecen, Hungary 2012 Kolchin s generalized allocation scheme A law of

More information

1 Norms and Vector Spaces

1 Norms and Vector Spaces 008.10.07.01 1 Norms and Vector Spaces Suppose we have a complex vector space V. A norm is a function f : V R which satisfies (i) f(x) 0 for all x V (ii) f(x + y) f(x) + f(y) for all x,y V (iii) f(λx)

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

The Characteristic Polynomial

The Characteristic Polynomial Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem

More information

The Matrix Elements of a 3 3 Orthogonal Matrix Revisited

The Matrix Elements of a 3 3 Orthogonal Matrix Revisited Physics 116A Winter 2011 The Matrix Elements of a 3 3 Orthogonal Matrix Revisited 1. Introduction In a class handout entitled, Three-Dimensional Proper and Improper Rotation Matrices, I provided a derivation

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

No: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics

No: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics No: 10 04 Bilkent University Monotonic Extension Farhad Husseinov Discussion Papers Department of Economics The Discussion Papers of the Department of Economics are intended to make the initial results

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

minimal polyonomial Example

minimal polyonomial Example Minimal Polynomials Definition Let α be an element in GF(p e ). We call the monic polynomial of smallest degree which has coefficients in GF(p) and α as a root, the minimal polyonomial of α. Example: We

More information

Spectra of Sample Covariance Matrices for Multiple Time Series

Spectra of Sample Covariance Matrices for Multiple Time Series Spectra of Sample Covariance Matrices for Multiple Time Series Reimer Kühn, Peter Sollich Disordered System Group, Department of Mathematics, King s College London VIIIth Brunel-Bielefeld Workshop on Random

More information

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called

More information

Mathematical Induction

Mathematical Induction Mathematical Induction (Handout March 8, 01) The Principle of Mathematical Induction provides a means to prove infinitely many statements all at once The principle is logical rather than strictly mathematical,

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information