Chapter 2 Fundamentals of Statistical Analysis

To make this book self-contained, this chapter reviews the relevant mathematical concepts used throughout the book. We first review basic probability and statistical concepts. Then we introduce mathematical notation for statistical processes with multiple variables, together with variable reduction methods. We then go through statistical analysis approaches such as the Monte Carlo (MC) method and the spectral stochastic method. Finally, we discuss fast techniques to compute sums of random variables with log-normal distributions.

1 Basic Concepts in Probability Theory

An understanding of probability theory is essential to statistical analysis. In this section, we explain some basic concepts in probability theory [13]. More details and other stochastic theories can be found in [13].

1.1 Experiment, Sample Space, and Event

Definition 2.1. An experiment is any process of observation or procedure that can be repeated (theoretically) an infinite number of times and has a well-defined set of possible outcomes.

Definition 2.2. A sample space is the set of all possible outcomes of an experiment.

Definition 2.3. An event is a subset of the sample space of an experiment.

Consider the following experiments as examples:

Example 1. Tossing a coin. Sample space: $S = \{\text{head}, \text{tail}\}$ or $S = \{0, 1\}$, where 0 represents a tail and 1 represents a head.

1.2 Random Variable and Expectation

Usually, we are interested in some value associated with a random event rather than in the event itself. For example, in the experiment of tossing two dice, we may only care about the sum of the two dice, not the outcome of each die.

Definition 2.4. A random variable $X$ on a sample space $S$ is a real-valued function $X: S \to \mathbb{R}$.

Definition 2.5. A discrete random variable is a random variable that takes only a finite or countably infinite number of values (it arises from counting).

Definition 2.6. A continuous random variable is a random variable whose set of assumed values is uncountable (it arises from measurement).

Let $X$ be a random variable and let $a \in \mathbb{R}$. The event $X = a$ represents the set $\{s \in S \mid X(s) = a\}$, and the probability of this event is written as
\[ \Pr(X = a) = \sum_{s \in S:\, X(s) = a} \Pr(s). \]

Example 2. Continuous random variable. A CPU is picked randomly from a group of CPUs whose area should be 1 cm². Due to errors in the manufacturing process, the area of a chip can vary from chip to chip in the range 0.9 cm² to 1.05 cm², excluding the latter. Let $X$ denote the area of a selected chip. Possible outcomes: $0.9 \le X < 1.05$.

Example 3. Refer to the previous example. The area of a selected chip is a continuous random variable. A table gives the areas in cm² of 100 chips, listing the observed values of the continuous random variable, the corresponding frequencies (number of chips), and their probabilities $\Pr(a \le X < b)$, with a Total row. [Numeric table entries not recoverable from the transcription.]

Definition 2.7. The expectation $E[X]$, or $\mu$, of a discrete random variable $X$ is
\[ E[X] = \mu = \sum_i i \Pr(X = i), \]
where the sum is taken over all values in the range of $X$. If $\sum_i |i| \Pr(X = i)$ converges, then the expectation is finite. Otherwise, the expectation is said to be unbounded. $E(X)$ is also called the mean value of the probability distribution.
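As a quick illustration of Definition 2.7, and of the two-dice experiment mentioned above, the following sketch computes $E[X]$ for the sum $X$ of two fair dice as a probability-weighted sum; the use of exact fractions is just one convenient choice.

```python
# A minimal sketch of Definition 2.7: the expectation of a discrete random
# variable as a probability-weighted sum, using the sum of two fair dice.
from fractions import Fraction

# Pr(X = s) for the sum s: count outcomes over the 36-point sample space.
pmf = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        s = d1 + d2
        pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

# E[X] = sum_i i * Pr(X = i)
expectation = sum(i * p for i, p in pmf.items())
print(expectation)  # 7
```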

1.3 Variance and Moments of a Random Variable

Theorem 2.1 (Markov's inequality). For a random variable $X$ that takes on only nonnegative values and for all $a > 0$, we have
\[ \Pr(X \ge a) \le \frac{E[X]}{a}. \]

Proof. Let $X$ be a random variable such that $X \ge 0$ and let $a > 0$. Define a random variable $I$ by
\[ I = \begin{cases} 1, & \text{if } X \ge a, \\ 0, & \text{otherwise,} \end{cases} \]
where $E[I] = \Pr(I = 1) = \Pr(X \ge a)$ and
\[ I \le \frac{X}{a}. \quad (2.1) \]
Taking expectations on both sides of (2.1) gives the inequality
\[ E[I] = \Pr(X \ge a) \le E\!\left[\frac{X}{a}\right] = \frac{E[X]}{a}, \]
where we used Lemma 2.3. □

Definition 2.8. The $k$th moment of a random variable $X$ is $E[X^k]$. The variance of $X$ is
\[ \mathrm{Var}[X] = E[(X - E[X])^2] = E[X^2 - 2X E[X] + (E[X])^2] = E[X^2] - 2E[X]E[X] + (E[X])^2 = E[X^2] - (E[X])^2, \]
and the standard deviation of $X$ is defined as
\[ \sigma(X) = \sqrt{\mathrm{Var}[X]}. \]

Theorem 2.2 (Chebyshev's inequality). For any $a > 0$ and a random variable $X$, we have
\[ \Pr(|X - E[X]| \ge a) \le \frac{\mathrm{Var}[X]}{a^2}. \]

Proof. Note that
\[ \Pr(|X - E[X]| \ge a) = \Pr\!\left((X - E[X])^2 \ge a^2\right) \]
and that the random variable $(X - E[X])^2 \ge 0$. Use Markov's inequality and the definition of variance to obtain
\[ \Pr\!\left((X - E[X])^2 \ge a^2\right) \le \frac{E[(X - E[X])^2]}{a^2} = \frac{\mathrm{Var}[X]}{a^2}, \]
as required. □

Corollary 2.1. For any $t > 1$ and a random variable $X$, we have
\[ \Pr\!\left(|X - E[X]| \ge t\,\sigma(X)\right) \le \frac{1}{t^2}, \qquad \Pr\!\left(|X - E[X]| \ge t\,E[X]\right) \le \frac{\mathrm{Var}[X]}{t^2 (E[X])^2}. \]

Proof. The results follow from the definitions of variance and standard deviation and from Chebyshev's inequality. □

1.4 Distribution Functions

Definition 2.9. A discrete probability distribution is a table (or a formula) listing all possible values that a discrete variable can take on, together with the associated probabilities.

Definition 2.10. The function $f(x)$ is called a probability density function (PDF) for the continuous random variable $X$ if
\[ \int_a^b f(x)\,dx = \Pr(a \le X \le b) \quad (2.2) \]
for any values of $a$ and $b$. That is to say, the area under the curve of $f(x)$ between any two ordinates $x = a$ and $x = b$ is the probability that $X$ lies between $a$ and $b$. It is easy to see that the total area under the PDF curve bounded by the $x$-axis is equal to 1:
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1. \quad (2.3) \]
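The following sketch checks (2.2) and (2.3) numerically for a hypothetical density $f(x) = 2x$ on $[0, 1]$ (any valid PDF would do); it assumes SciPy is available for the quadrature.

```python
# Numeric check of Definition 2.10: the integral of a PDF over [a, b] gives
# Pr(a <= X <= b) (Eq. 2.2), and the total integral is 1 (Eq. 2.3).
from scipy.integrate import quad

f = lambda x: 2.0 * x if 0.0 <= x <= 1.0 else 0.0  # hypothetical example PDF

total, _ = quad(f, 0.0, 1.0)   # area under the whole PDF
prob, _ = quad(f, 0.25, 0.5)   # Pr(0.25 <= X <= 0.5)
print(total)  # ~1.0
print(prob)   # ~0.1875 = 0.5**2 - 0.25**2
```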

Definition 2.11. For a real-valued random variable $X$, the probability distribution is completely characterized by its cumulative distribution function (CDF):
\[ F(x) = \int_{-\infty}^{x} f(t)\,dt = \Pr[X \le x], \quad x \in \mathbb{R}, \quad (2.4) \]
which describes the probability that the random variable falls in the interval $(-\infty, x]$.

1.5 Gaussian and Log-Normal Distributions

Definition 2.12. A Gaussian distribution (also called a normal distribution) is denoted as $N(\mu, \sigma^2)$, where, as usual, $\mu$ identifies the mean and $\sigma^2$ the variance. The PDF is defined as follows:
\[ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \quad (2.5) \]
The CDF of the standard normal distribution is denoted by $\Phi(x)$ and can be computed as an integral of the PDF:
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right], \quad x \in \mathbb{R}, \quad (2.6) \]
where erf is the error function.

Definition 2.13. If $X$ is distributed normally with mean $\mu$ and variance $\sigma^2$, then the exponential of $X$, $Y = \exp(X)$, follows a log-normal distribution. That is to say, a log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed. The PDF and CDF of a log-normal distribution are as follows:
\[ f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0, \quad (2.7) \]
\[ F_X(x; \mu, \sigma) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right) = \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right). \quad (2.8) \]
More details about the sum of multiple log-normal distributions are given in Sect. 4 of this chapter.
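As a numerical illustration of (2.6) and (2.8), the sketch below evaluates the standard normal CDF through the error function and checks the log-normal CDF against an empirical CDF built from samples of $\exp(X)$; the parameter values are arbitrary.

```python
# Standard normal CDF via erf (Eq. 2.6), and the log-normal CDF as
# Phi((ln x - mu)/sigma) (Eq. 2.8), checked against samples of exp(X).
import math
import numpy as np

def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, sigma = 0.5, 0.3                      # arbitrary parameters
rng = np.random.default_rng(0)
samples = np.exp(rng.normal(mu, sigma, size=200_000))  # log-normal samples

x = 1.8
empirical = np.mean(samples <= x)
analytic = phi((math.log(x) - mu) / sigma)
print(empirical, analytic)  # the two should agree to ~3 decimal places
```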

1.6 Basic Concepts for Multiple Random Variables

Definition 2.14. Two random variables $X$ and $Y$ are independent if
\[ \Pr\!\left((X = x) \cap (Y = y)\right) = \Pr(X = x) \cdot \Pr(Y = y) \]
for all $x, y \in \mathbb{R}$. Furthermore, the random variables $X_1, X_2, \ldots, X_k$ are mutually independent if for any subset $I \subseteq \{1, 2, \ldots, k\}$ and any values $x_i$, $i \in I$, we have
\[ \Pr\!\left(\bigcap_{i \in I} X_i = x_i\right) = \prod_{i \in I} \Pr(X_i = x_i). \]

Theorem 2.3 (Linearity of expectations). Let $X_1, X_2, \ldots, X_n$ be a finite collection of discrete random variables with finite expectations. Then
\[ E\!\left[\sum_i X_i\right] = \sum_i E[X_i]. \]

Proof. We use induction on the number of random variables. For the base case, let $X$ and $Y$ be random variables. Use the law of total probability to get
\[ E[X + Y] = \sum_i \sum_j (i + j) \Pr\!\left((X = i) \cap (Y = j)\right) = \sum_i \sum_j i \Pr\!\left((X = i) \cap (Y = j)\right) + \sum_i \sum_j j \Pr\!\left((X = i) \cap (Y = j)\right) \]
\[ = \sum_i i \sum_j \Pr\!\left((X = i) \cap (Y = j)\right) + \sum_j j \sum_i \Pr\!\left((X = i) \cap (Y = j)\right) = \sum_i i \Pr(X = i) + \sum_j j \Pr(Y = j) = E[X] + E[Y]. \quad \Box \]

Linearity of expectations holds for any collection of random variables, even if they are not independent. Furthermore, if $\sum_{i=1}^{\infty} E[|X_i|]$ converges, then it can be shown that

\[ E\!\left[\sum_{i=1}^{\infty} X_i\right] = \sum_{i=1}^{\infty} E[X_i]. \]

Lemma 2.1. Let $c$ be any constant and $X$ a random variable. Then $E[cX] = c\,E[X]$.

Proof. The case $c = 0$ is trivial. Suppose $c \ne 0$. Then
\[ E[cX] = \sum_i i \Pr(cX = i) = c \sum_i (i/c) \Pr(X = i/c) = c \sum_k k \Pr(X = k) = c\,E[X], \]
as required. □

If $X$ and $Y$ are two random variables, their covariance is
\[ \mathrm{Cov}(X, Y) = E\!\left[(X - E[X])(Y - E[Y])\right] = E\!\left[(Y - E[Y])(X - E[X])\right] = \mathrm{Cov}(Y, X). \]

Theorem 2.4. For any two random variables $X$ and $Y$, we have
\[ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X, Y). \]

Proof. Use the linearity of expectations and the definitions of variance and covariance to obtain
\[ \mathrm{Var}[X + Y] = E\!\left[(X + Y - E[X + Y])^2\right] = E\!\left[(X + Y - E[X] - E[Y])^2\right] = E\!\left[\left((X - E[X]) + (Y - E[Y])\right)^2\right] \]
\[ = E\!\left[(X - E[X])^2\right] + E\!\left[(Y - E[Y])^2\right] + 2E\!\left[(X - E[X])(Y - E[Y])\right] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X, Y), \]
as required. □
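Theorem 2.4 is easy to check by simulation. The sketch below builds two deliberately correlated Gaussian variables and compares both sides of the identity; the mixing coefficients are arbitrary.

```python
# Simulation check of Theorem 2.4: Var[X+Y] = Var[X] + Var[Y] + 2 Cov(X, Y).
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal((2, 500_000))
x = z[0]
y = 0.6 * z[0] + 0.8 * z[1]   # correlated with x by construction

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2.0 * np.cov(x, y)[0, 1]
print(lhs, rhs)               # the two values should match closely
```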

Theorem 2.4 can be extended to a sum of any finite number of random variables. For a collection $X_1, \ldots, X_n$ of random variables, it can be shown that
\[ \mathrm{Var}\!\left[\sum_i X_i\right] = \sum_i \mathrm{Var}[X_i] + 2 \sum_i \sum_{j > i} \mathrm{Cov}(X_i, X_j). \]

Theorem 2.5. For any two independent random variables $X$ and $Y$, we have
\[ E[X \cdot Y] = E[X] \cdot E[Y]. \]

Proof. Let the indices $i$ and $j$ assume all values in the ranges of $X$ and $Y$, respectively. As $X$ and $Y$ are independent random variables,
\[ E[X \cdot Y] = \sum_i \sum_j ij \Pr\!\left((X = i) \cap (Y = j)\right) = \sum_i \sum_j ij \Pr(X = i) \Pr(Y = j) = \left[\sum_i i \Pr(X = i)\right] \left[\sum_j j \Pr(Y = j)\right] = E[X] \cdot E[Y], \]
as required. □

Corollary 2.2. For any independent random variables $X$ and $Y$, we have
\[ \mathrm{Cov}(X, Y) = 0 \quad \text{and} \quad \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]. \]

Proof. As $X$ and $Y$ are independent, so are $X - E[X]$ and $Y - E[Y]$. For any random variable $Z$, we have
\[ E[Z - E[Z]] = E[Z] - E[E[Z]] = 0. \]
Using Theorem 2.5, the covariance of $X$ and $Y$ is
\[ \mathrm{Cov}(X, Y) = E\!\left[(X - E[X])(Y - E[Y])\right] = E[X - E[X]] \cdot E[Y - E[Y]] = 0. \]

Conclude via the latter equation and Theorem 2.4 that
\[ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X, Y) = \mathrm{Var}[X] + \mathrm{Var}[Y], \]
as required. □

Definition 2.15. For a collection of random variables $X = X_1, \ldots, X_n$, the covariance matrix $\Sigma_{n \times n}$ is defined as
\[
\Sigma = \begin{pmatrix}
\mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \ldots & \mathrm{Cov}(X_1, X_n) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \ldots & \mathrm{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_{n-1}, X_1) & \mathrm{Cov}(X_{n-1}, X_2) & \ldots & \mathrm{Cov}(X_{n-1}, X_n) \\
\mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \ldots & \mathrm{Var}(X_n)
\end{pmatrix}.
\]
When $X_1, \ldots, X_n$ are mutually independent random variables, it can be shown by induction that
\[ \mathrm{Var}\!\left[\sum_i X_i\right] = \sum_i \mathrm{Var}[X_i], \]
and the covariance matrix is a diagonal matrix in this case.

2 Multiple Random Variables and Variable Reduction

2.1 Components of Covariance in Process Variation

In general, process variation can be classified into two categories [13]: inter-die and intra-die. Inter-die variations are variations from die to die, while intra-die variations correspond to variability within a single chip. Inter-die variations are global and, hence, affect all the devices on a chip in a similar fashion. For example, they can make the channel lengths of all the devices on the same chip smaller. Intra-die variations may affect devices on the same chip differently. For example, they can cause some devices to have smaller gate oxide thicknesses and others to have larger gate oxide thicknesses. Intra-die variations may exhibit spatial correlation: for example, devices located close to each other are more likely to have similar characteristics.

[Fig. 2.1: Grid-based model for spatial correlations (Gates 1 to 5 placed on a grid of squares).]

In general, we can model parameter variation as follows:
\[ \delta_{\mathrm{total}} = \delta_{\mathrm{inter}} + \delta_{\mathrm{intra}}, \quad (2.9) \]
where $\delta_{\mathrm{inter}}$ and $\delta_{\mathrm{intra}}$ represent the inter-die variation and intra-die variation, respectively. In general [13, 95, 169], $\delta_{\mathrm{inter}}$ and $\delta_{\mathrm{intra}}$ can be modeled as Gaussian random variables with normal distribution. In this chapter, we discuss both Gaussian and non-Gaussian cases. Note that, due to the global effect of inter-die variation, a single random variable $\delta_{\mathrm{inter}}$ is used for all gates/grids in one chip. For $\delta_{\mathrm{intra}}$, the value of a parameter $p$ located at $(x, y)$ can be modeled as a normally distributed random variable [101] dependent on location:
\[ p = p_0 + \delta_x \cdot x + \delta_y \cdot y + \varepsilon, \quad (2.10) \]
where $p_0$ is the mean value (nominal design parameter value) at $(0, 0)$, $\delta_x$ and $\delta_y$ stand for the gradients of the parameter, indicating the spatial variations of $p$ along the $x$ and $y$ directions, respectively, and $\varepsilon$ represents the random intra-chip variation. Due to spatial correlations in the intra-chip variation, the vector of all random components $\varepsilon$ across the chip has a correlated multivariate normal distribution, $N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of the spatially correlated parameters.

A grid-based method is introduced by [13] for the consideration of correlation. In the grid-based method, the die is partitioned into $\sqrt{n}_{\mathrm{row}} \times \sqrt{n}_{\mathrm{col}} = n$ grids. Since devices close to each other are more likely to have similar characteristics than those placed far away, grid-based methods assume perfect correlation among the devices in the same grid, high correlation among those in close grids, and low to zero correlation between faraway grids. For example, in Fig. 2.1, Gate 1 and Gate 2 (drawn exaggeratedly large) are located in the same grid square; hence, their parameter variations, such

as the variations of their gate channel lengths, are assumed to be always identical. Gate 1 and Gate 3 lie in neighboring grids; hence, their parameter variations are not identical but are highly correlated due to their spatial proximity. For example, when Gate 1 has a larger-than-nominal gate channel length, Gate 3 is also likely to have a larger-than-nominal gate channel length. On the other hand, Gate 1 and Gate 4 are far away from each other; their parameters can be assumed to be weakly correlated or uncorrelated. For example, when Gate 1 has a larger-than-nominal gate channel length, the gate channel length of Gate 4 may be either larger or smaller than nominal.

With the grid-based model, we can use a single random variable $p(x, y)$ to model a parameter variation in a single grid at location $(x, y)$. As a result, $n$ random variables are needed for each type of parameter, where each represents the value of a parameter in one of the $n$ grids. In addition, we assume that correlation exists only among the same type of parameter in different grids. Note that this assumption is not critical and can easily be removed. For example, the gate length $L$ of transistors in the $i$th grid is correlated with those in nearby grids, but is uncorrelated with other parameters, such as the gate oxide thickness $T_{ox}$, in any grid including the $i$th grid itself. For each type of parameter, a correlation matrix of size $n \times n$ represents the spatial correlation of this parameter. Notice that the number of grid partitions needed is determined by the process, not the circuit, so we can apply the same correlation model to different designs under the same process.

2.2 Random Variable Decoupling and Reduction

Owing to correlation, the large number of random variables involved in VLSI design can be reduced. After random variable decoupling via correlation, one may further reduce the cost of statistical analysis by the spectral stochastic method, as discussed in Sect. 3. Since the random variables are correlated, this correlation should be removed before using the spectral stochastic method. In this part, we first present the theoretical basis for decoupling the correlation of random variables.

Proposition 2.1. For a set of zero-mean Gaussian-distributed variables $\eta$ whose covariance matrix is $\Sigma$, if there is a matrix $L$ satisfying $\Sigma = LL^T$, then $\eta$ can be represented by a set of independent standard normally distributed variables $\xi$ as $\eta = L\xi$.

Proof. According to the characteristics of the normal distribution, a linear transformation does not affect the zero mean of the variables and yields another normal distribution. Thus, we only need to prove that the covariance matrix remains unchanged under the transformation. According to the definition of covariance,
\[ \mathrm{cov}(L\xi) = E\!\left[L\xi (L\xi)^T\right] = L\,E[\xi \xi^T]\,L^T. \quad (2.11) \]

Since $\xi$ follows the standard normal distribution, $E[\xi \xi^T] = I_n$, so
\[ L\,E[\xi \xi^T]\,L^T = LL^T = \Sigma. \quad (2.12) \]
□

2.3 Principal Factor Analysis Technique

Note that the solution for decoupling is not unique. For example, Cholesky decomposition can be used to find $L$, since the covariance matrix is always a semipositive-definite matrix. However, Cholesky decomposition cannot reduce the number of variables. Principal factor analysis (PFA) [74] can substitute for Cholesky decomposition when variable reduction is needed. Eigendecomposition of the covariance matrix yields
\[ \Sigma = LL^T, \qquad L = \left[\sqrt{\lambda_1}\,e_1, \ldots, \sqrt{\lambda_n}\,e_n\right], \quad (2.13) \]
where the $\lambda_i$ are the eigenvalues in order of descending magnitude and the $e_i$ are the corresponding eigenvectors. PFA reduces the number of components in $\xi$ by truncating $L$ to its first $k$ columns. The error of PFA can be controlled through $k$:
\[ \mathrm{err} = \frac{\sum_{i=k+1}^{n} \lambda_i}{\sum_{i=1}^{n} \lambda_i}, \quad (2.14) \]
where a bigger $k$ leads to a more accurate result. PFA is efficient, especially when the correlation length is large. In our experiments, we set the correlation length to eight times the width of the wires. As a result, PFA can reduce the number of variables from 40 to 14 with an error of about 1% in an example with 20 parallel wires.

2.4 Weighted PFA Technique

One idea is to consider the importance of the outputs during the reduction process when using PFA. Recently, the weighted PFA (wPFA) technique has been used [204] to improve variable reduction efficiency. If a weight $w_i$ is defined for each physical variable $\eta_i$ to reflect its impact on the output, then a set of new variables is formed:
\[ \zeta = W\eta, \quad (2.15) \]

where $W = \mathrm{diag}(w_1, w_2, \ldots, w_n)$ is a diagonal matrix of weights. As a result, the covariance matrix of $\zeta$, $\Sigma(\zeta)$, now contains the weight information, and performing PFA on $\Sigma(\zeta)$ leads to the weighted variable reduction. Specifically, we have
\[ \Sigma(\zeta) = E\!\left[W\eta (W\eta)^T\right] = W\,\Sigma(\eta)\,W^T; \quad (2.16) \]
denote its eigenvalues and eigenvectors by $\lambda_i^*$ and $e_i^*$. Then the variables $\eta$ can be approximated by the linear combination of a set of independent dominant variables $\xi$:
\[ \eta = W^{-1}\zeta \approx W^{-1} \sum_{i=1}^{k} \sqrt{\lambda_i^*}\, e_i^*\, \xi_i. \quad (2.17) \]
The error-controlling process is similar to (2.14) but uses the weighted eigenvalues $\lambda_i^*$.

2.5 Principal Component Analysis Technique

We first briefly review the concept of principal component analysis (PCA), which is used here to transform correlated random variables into uncorrelated random variables [75]. Suppose that $x$ is a vector of $n$ random variables, $x = [x_1, x_2, \ldots, x_n]^T$, with covariance matrix $\Sigma$ and mean vector $\mu_x = [\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_n}]$. To find the orthogonal random variables, we first calculate the eigenvalues and corresponding eigenvectors. Then, by ordering the eigenvectors in descending order of their eigenvalues, the orthogonal matrix $A$ is obtained. Here, $A$ is expressed as
\[ A = [e_1, e_2, \ldots, e_n]^T, \quad (2.18) \]
where $e_i$ is the eigenvector corresponding to eigenvalue $\lambda_i$, which satisfies
\[ \Sigma e_i = \lambda_i e_i, \quad i = 1, 2, \ldots, n, \quad (2.19) \]
and
\[ \lambda_i < \lambda_{i-1}, \quad i = 2, 3, \ldots, n. \quad (2.20) \]
With $A$, we can perform the transformation to get the orthogonal random variables $y = [y_1, y_2, \ldots, y_n]^T$ by using
\[ y = A(x - \mu_x). \quad (2.21) \]
A code sketch of this decoupling and reduction flow is given below.
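The following sketch walks through the flow of Sects. 2.2 to 2.5 under a hypothetical exponentially decaying spatial covariance: eigendecompose $\Sigma$ as in (2.13), truncate with the error control of (2.14), and decorrelate samples via (2.21). The covariance model and tolerance are assumptions for illustration only.

```python
# PFA truncation (Eqs. 2.13-2.14) and PCA decorrelation (Eqs. 2.18, 2.21),
# sketched for a hypothetical spatial covariance matrix Sigma.
import numpy as np

def pfa_factor(sigma, err_tol=0.01):
    """Truncated factor L_k with Sigma ~ L_k @ L_k.T (Eqs. 2.13-2.14)."""
    lam, vecs = np.linalg.eigh(sigma)        # eigenvalues in ascending order
    lam, vecs = lam[::-1], vecs[:, ::-1]     # reorder to descending
    frac = np.cumsum(lam) / np.sum(lam)      # captured-variance fraction
    k = int(np.searchsorted(frac, 1.0 - err_tol)) + 1  # smallest k: err <= tol
    return vecs[:, :k] * np.sqrt(lam[:k])

# Hypothetical covariance: correlation decaying with grid distance.
n = 8
idx = np.arange(n)
sigma = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 3.0)

L = pfa_factor(sigma)
print(L.shape, np.max(np.abs(sigma - L @ L.T)))  # reduced k, small residual

# PCA transform (Eq. 2.21): the rows of A are the eigenvectors e_i.
rng = np.random.default_rng(2)
x = rng.multivariate_normal(np.zeros(n), sigma, size=100_000)
lam, vecs = np.linalg.eigh(sigma)
A = vecs[:, ::-1].T
y = (A @ x.T).T                                  # y = A(x - mu_x), mu_x = 0 here
print(np.round(np.cov(y.T), 2))                  # ~diag(lambda_1, ..., lambda_n)
```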

Each transformed variable $y_i$ is a random variable with a Gaussian distribution; its mean $\mu_{y_i}$ is 0 and its standard deviation $\sigma_{y_i}$ is $\sqrt{\lambda_i}$, on the condition that [75]
\[ e_i^T e_i = 1, \quad i = 1, 2, \ldots, n. \quad (2.22) \]
Here, because of the orthogonal property of the matrix $A$,
\[ A^{-1} = A^T. \quad (2.23) \]
To reconstruct the original random variables, we use the following equation:
\[ x = A^T y + \mu_x. \quad (2.24) \]

3 Statistical Analysis Approaches

3.1 Monte Carlo Method

Monte Carlo techniques [41] are usually used to estimate the value of a definite, finite-dimensional integral of the form
\[ G = \int_S g(X) f(X)\,dX, \quad (2.25) \]
where $S$ is a finite domain and $f(X)$ is a PDF over $X$, i.e., $f(X) \ge 0$ for all $X$ and $\int_S f(X)\,dX = 1$. We can accomplish the MC estimation of the value of $G$ by drawing a set of independent samples $X_1, X_2, \ldots, X_{MC}$ from $f(X)$ and applying
\[ G_{MC} = \frac{1}{MC} \sum_{i=1}^{MC} g(X_i). \quad (2.26) \]
The estimator $G_{MC}$ above is a random variable. Its mean value is the integral $G$ to be estimated, i.e., $E(G_{MC}) = G$, making it an unbiased estimator. The variance of $G_{MC}$ is $\mathrm{Var}(G_{MC}) = \sigma^2/MC$, where $\sigma^2$ is the variance of the random variable $g(X)$, given by
\[ \sigma^2 = \int_S g^2(X) f(X)\,dX - G^2. \quad (2.27) \]
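A minimal sketch of the estimator (2.25) to (2.27) follows, using $g(x) = x^2$ with $X \sim N(0, 1)$, so that the exact value is $G = E[X^2] = 1$; the sample standard deviation stands in for the $\sigma$ of the error measure discussed next.

```python
# Monte Carlo estimate of G = E[g(X)] per Eq. 2.26, with a sample-based
# estimate of sigma (Eq. 2.27) and the resulting 95% half-width.
import numpy as np

rng = np.random.default_rng(3)
MC = 100_000
x = rng.standard_normal(MC)     # independent samples from the PDF f
g = x**2

G_MC = g.mean()                 # Eq. 2.26; exact answer is 1
sigma_hat = g.std()             # sample estimate of sigma from Eq. 2.27
half_width = 1.96 * sigma_hat / np.sqrt(MC)
print(G_MC, "+/-", half_width)
```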

We can use the standard deviation of $G_{MC}$ to assess its accuracy in estimating $G$. If the number of samples $MC$ is sufficiently large, then by the central limit theorem, $(G_{MC} - G)/(\sigma/\sqrt{MC})$ has an approximately standard normal distribution $N(0, 1)$. Hence,
\[ P\!\left(G - 1.96\frac{\sigma}{\sqrt{MC}} \le G_{MC} \le G + 1.96\frac{\sigma}{\sqrt{MC}}\right) \approx 0.95, \quad (2.28) \]
where $P$ is the probability measure. Equation (2.28) shows that $G_{MC}$ will be in the interval $\left[G - 1.96\,\sigma/\sqrt{MC},\; G + 1.96\,\sigma/\sqrt{MC}\right]$ with 95% confidence. Thus, one can use the error measure
\[ |\mathrm{error}| \approx \frac{\sigma}{\sqrt{MC}} \quad (2.29) \]
to assess the accuracy of the estimator.

3.2 Spectral Stochastic Method Using Stochastic Orthogonal Polynomial Chaos

One recent advance in fast statistical analysis is to apply stochastic orthogonal polynomial chaos (OPC) [187] to nanometer-scale integrated circuit analysis. Based on the Askey scheme [196], any stochastic random variable can be represented by OPC, and random variables with different probability distribution types are associated with different types of orthogonal polynomials. Hermite polynomial chaos (Hermite PC or HPC) utilizes a series of orthogonal polynomials (with respect to the Gaussian distribution) to facilitate stochastic analysis [197]. These polynomials are used as an orthogonal basis to decompose a random process, in a way similar to how sine and cosine functions are used to decompose a periodic signal in a Fourier series expansion. Note that for Gaussian and log-normal distributions, Hermite polynomials are the best choice, as they lead to an exponential convergence rate [45]. For non-Gaussian and non-log-normal distributions, there are other orthogonal polynomials, such as Legendre polynomials for the uniform distribution, Charlier polynomials for the Poisson distribution, and Krawtchouk polynomials for the binomial distribution [44, 187].

For a random variable $y(\xi)$ with limited variance, where $\xi = [\xi_1, \xi_2, \ldots, \xi_n]$ is a vector of zero-mean orthogonal Gaussian random variables, the random variable can be approximated by the truncated Hermite PC expansion as follows [45]:
\[ y(\xi) = \sum_{k=0}^{P} a_k H_k^n(\xi), \quad (2.30) \]

where $n$ is the number of independent random variables, $H_k^n(\xi)$ are the $n$-dimensional Hermite polynomials, and the $a_k$ are deterministic coefficients. The number of terms $P$ is given by
\[ P = \sum_{k=0}^{p} \frac{(n - 1 + k)!}{k!\,(n - 1)!}, \quad (2.31) \]
where $p$ is the order of the Hermite PC. Similarly, a random process $v(t, \xi)$ with limited variance can be approximated as
\[ v(t, \xi) = \sum_{k=0}^{P} a_k(t) H_k^n(\xi). \quad (2.32) \]
If only one random variable/process is considered, the one-dimensional Hermite polynomials are expressed as follows:
\[ H_0^1(\xi) = 1, \quad H_1^1(\xi) = \xi, \quad H_2^1(\xi) = \xi^2 - 1, \quad H_3^1(\xi) = \xi^3 - 3\xi, \quad \ldots. \quad (2.33) \]
Hermite polynomials are orthogonal with respect to the Gaussian-weighted expectation (the superscript $n$ is dropped for simple notation):
\[ \langle H_i(\xi), H_j(\xi) \rangle = \langle H_i^2(\xi) \rangle\, \delta_{ij}, \quad (2.34) \]
where $\delta_{ij}$ is the Kronecker delta and $\langle \cdot, \cdot \rangle$ denotes an inner product defined as follows:
\[ \langle f(\xi), g(\xi) \rangle = \frac{1}{\sqrt{(2\pi)^n}} \int f(\xi)\, g(\xi)\, e^{-\frac{1}{2}\xi^T \xi}\, d\xi. \quad (2.35) \]
Similar to Fourier series, the coefficients $a_k$ for a random variable $y$ and $a_k(t)$ for a random process $v(t)$ can be found by a projection operation onto the HPC basis:
\[ a_k = \frac{\langle y(\xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle}, \quad (2.36) \]
\[ a_k(t) = \frac{\langle v(t, \xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle}, \quad \forall k \in \{0, \ldots, P\}. \quad (2.37) \]
Once we obtain the Hermite PC, we can calculate the mean and variance of the random variable $y(\xi)$ by a one-time analysis (one-Gaussian-variable case):
\[ E(y(\xi)) = y_0, \qquad \mathrm{Var}(y(\xi)) = y_1^2\,\mathrm{Var}(\xi_1) + y_2^2\,\mathrm{Var}(\xi_1^2 - 1) = y_1^2 + 2y_2^2. \quad (2.38) \]
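The identity (2.38) can be checked by sampling. The sketch below assumes hypothetical second-order coefficients $y_0, y_1, y_2$ for a single Gaussian variable.

```python
# Sampling check of Eq. 2.38 for y(xi) = y0 + y1*xi + y2*(xi^2 - 1):
# E[y] = y0 and Var[y] = y1^2 + 2*y2^2.
import numpy as np

y0, y1, y2 = 2.0, 0.5, 0.3   # hypothetical HPC coefficients
rng = np.random.default_rng(4)
xi = rng.standard_normal(1_000_000)
y = y0 + y1 * xi + y2 * (xi**2 - 1.0)

print(y.mean(), y0)                     # ~2.0
print(y.var(), y1**2 + 2.0 * y2**2)     # ~0.43
```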

Similarly, for a random process $v(t, \xi)$ (one-Gaussian-variable case), the mean and variance are as follows:
\[ E(v(t, \xi)) = v_0(t), \qquad \mathrm{Var}(v(t, \xi)) = v_1^2(t)\,\mathrm{Var}(\xi_1) + v_2^2(t)\,\mathrm{Var}(\xi_1^2 - 1) = v_1^2(t) + 2v_2^2(t). \quad (2.39) \]

One critical problem that remains is how to obtain the coefficients of the Hermite PC in (2.36) and (2.37) efficiently. There are two kinds of techniques to calculate these coefficients: the collocation-based spectral stochastic method and the Galerkin-based spectral stochastic method. In short, we refer to them in the later parts of the book as collocation-based and Galerkin-based methods.

3.3 Collocation-Based Spectral Stochastic Method

The collocation method is mainly based on computing the definite integral of a function [70]; Gaussian quadrature is the commonly used method. With it, we can compute the coefficients $a_k$ and $a_k(t)$ in (2.36) and (2.37), respectively. We review this method using the Hermite polynomials shown above. Our objective is to determine the numerical solution of the integral $\langle y(\xi), H_k(\xi) \rangle$ ($y$ can be a random variable or a random process). In our problem, this is a one-dimensional numerical quadrature problem based on Hermite polynomials [70]. Thus, we have
\[ \langle y(\xi), H_k(\xi) \rangle = \frac{1}{\sqrt{2\pi}} \int y(\xi)\, H_k(\xi)\, e^{-\frac{1}{2}\xi^2}\, d\xi \approx \sum_{i=0}^{P} y(\xi_i)\, H_k(\xi_i)\, w_i. \quad (2.40) \]
Here we have only a single random variable; $\xi_i$ and $w_i$ are the Gauss-Hermite quadrature abscissas (quadrature points) and weights. The quadrature rule states that if we select the roots of the $P$th Hermite polynomial as the quadrature points, the quadrature (2.40) is exact for all polynomials of degree $2P - 1$ or less. This is called the $(2P-1)$-level accuracy of the Gauss-Hermite quadrature.

For multiple random variables, a multidimensional quadrature is required. The traditional way of computing a multidimensional quadrature is to use a direct tensor product based on the one-dimensional Gauss-Hermite quadrature abscissas

and weights [16]. With this method, the number of quadrature points needed for $n$ dimensions at level $P$ is about $(P + 1)^n$, which is well known as the curse of dimensionality.

Smolyak quadrature [16], also known as sparse grid quadrature, is used as an efficient method to reduce the number of quadrature points. Let us define a one-dimensional sparse grid quadrature point set $\Theta_P^1 = \{\xi_1, \xi_2, \ldots, \xi_{P+1}\}$, which uses $P + 1$ points to achieve degree $2P + 1$ of exactness. The sparse grid for an $n$-dimensional quadrature at degree $P$ chooses points from the following set:
\[ \Theta_P^n = \bigcup_{P+1 \le |i| \le P+n} \left(\Theta_{i_1}^1 \times \cdots \times \Theta_{i_n}^1\right), \quad (2.41) \]
where $|i| = \sum_{j=1}^{n} i_j$. The corresponding weight is
\[ w_{j_1 \ldots j_n}^{i_1 \ldots i_n} = (-1)^{P+n-|i|} \binom{n-1}{n+P-|i|} \prod_{m=1}^{n} w_{j_m}^{i_m}, \quad (2.42) \]
where $\binom{n-1}{n+P-|i|}$ is the combinatorial number and $w_{j_m}^{i_m}$ is the weight of the corresponding one-dimensional quadrature point. It has been shown that interpolation on a Smolyak grid ensures a bound for the mean-square error [16]:
\[ |E_P| = O\!\left(N_P^{-r} (\log N_P)^{(r+1)(n-1)}\right), \]
where $N_P$ is the number of quadrature points and $r$ is the order of the maximum derivative that exists for the delay function. The number of quadrature points increases as $O(2^P n^P / P!)$.

It can be shown that a sparse grid of at least level $P$ is required for an order-$P$ representation. The reason is that the approximation contains order-$P$ polynomials for both $y(\xi)$ and $H_j(\xi)$. Thus, there exist terms $y(\xi) H_j(\xi)$ of order $2P$, which require a sparse grid of at least level $P$ with an exactness degree of $2P + 1$. Therefore, level-1 and level-2 sparse grids are required for linear and quadratic models, respectively. The number of quadrature points is about $2n$ for the linear model and $2n^2$ for the quadratic model. The computational cost is about the same as that of the Taylor-conversion method, while keeping the accuracy of the homogeneous chaos expansion.

In addition to the sparse grid technique, we can also employ several accelerating techniques. First, when $n$ is too small, the number of quadrature points for the sparse grid may be larger than that of the direct tensor product of a Gaussian quadrature. For example, if there are only two variables, the numbers of points are 5 and 15 for level-1 and level-2 sparse grids, compared to 4 and 9 for the direct tensor product. In this case, the sparse grid will not be used. Second, the set of quadrature points (2.41) may contain the same point several times with different weights. For example, the level-2 sparse grid for three variables contains four instances of the point (0, 0, 0). Combining these points by summing their weights reduces the cost of evaluating $y(\xi_i)$.
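For the one-dimensional case, the projection (2.40) can be carried out directly with Gauss-Hermite quadrature. The sketch below uses NumPy's probabilists' Hermite (HermiteE) routines, which match the basis (2.33); tensor-product and sparse grids are omitted. As a test function we take $y(\xi) = e^{\xi}$, whose exact coefficients $e^{1/2}/k!$ follow from Sect. 4.2 with $\sigma_g = 1$.

```python
# One-dimensional collocation per Eqs. 2.36/2.40 with probabilists' Hermite
# polynomials He_k (He_0 = 1, He_1 = xi, He_2 = xi^2 - 1, ...).
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hpc_coeffs(y, order, npts=20):
    # a_k = <y(xi) He_k(xi)> / <He_k(xi)^2>, with <He_k^2> = k!
    pts, wts = hermegauss(npts)
    wts = wts / np.sqrt(2.0 * np.pi)   # normalize so the weights sum to 1
    coeffs = []
    for k in range(order + 1):
        c = np.zeros(k + 1)
        c[k] = 1.0                     # coefficient vector selecting He_k
        Hk = hermeval(pts, c)
        coeffs.append(float(np.sum(wts * y(pts) * Hk)) / math.factorial(k))
    return coeffs

# y(xi) = exp(xi) is log-normal; its exact HPC coefficients are e^{1/2}/k!.
print(hpc_coeffs(np.exp, order=2))
print([math.exp(0.5) / math.factorial(k) for k in range(3)])
```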

3.4 Galerkin-Based Spectral Stochastic Method

The Galerkin-based method is based on the principle of orthogonality: the best approximation $\hat{y}(\xi)$ of $y(\xi)$ is obtained when the error $\Delta(\xi)$, defined as
\[ \Delta(\xi) = y(\xi) - \hat{y}(\xi), \quad (2.43) \]
is orthogonal to the approximation. That is,
\[ \langle \Delta(\xi), H_k(\xi) \rangle = 0, \quad k = 0, 1, \ldots, P, \quad (2.44) \]
where the $H_k(\xi)$ are Hermite polynomials. In this way, we transform the stochastic analysis process into a deterministic form, in which we only need to compute the corresponding coefficients of the Hermite PC.

For illustration purposes, consider two Gaussian variables $\xi = [\xi_1, \xi_2]$ and assume that the charge vector on the panels can be written as a second-order ($p = 2$) Hermite PC. We then have
\[ y(\xi) = y_0 + y_1 \xi_1 + y_2 \xi_2 + y_3 (\xi_1^2 - 1) + y_4 (\xi_2^2 - 1) + y_5\, \xi_1 \xi_2, \quad (2.45) \]
which is solved via (2.44). Once the Hermite PC of $y(\xi)$ is known, the mean and variance of $y(\xi)$ can be evaluated trivially. For example, for one random variable, the mean and variance are calculated as
\[ E(y(\xi)) = y_0, \qquad \mathrm{Var}(y(\xi)) = y_1^2\,\mathrm{Var}(\xi) + y_2^2\,\mathrm{Var}(\xi^2 - 1) = y_1^2 + 2y_2^2. \quad (2.46) \]
To account for correlations among random variables, we apply PCA (Sect. 2.5) to transform the correlated variables into a set of independent variables.

4 Sum of Log-Normal Random Variables

The leakage current distribution is usually log-normal. Due to the exponential convergence rate, Hermite PC can be used to represent log-normal variables and the sum of log-normal variables [109].

4.1 Hermite PC Representation of Log-Normal Variables

Let $g(\xi)$ be a Gaussian random variable and $l(\xi)$ be the random variable obtained by taking the exponential of $g(\xi)$:
\[ l(\xi) = e^{g(\xi)}, \qquad g(\xi) = \ln(l(\xi)). \quad (2.47) \]
For the log-normal random variable $l$, let the mean and the variance of $g(\xi)$ be $\mu_g$ and $\sigma_g^2$; then the mean and variance of $l(\xi)$ are
\[ \mu_l = e^{\mu_g + \sigma_g^2/2} \quad (2.48) \]
and
\[ \sigma_l^2 = e^{2\mu_g + \sigma_g^2}\left[e^{\sigma_g^2} - 1\right], \quad (2.49) \]
respectively. A general Gaussian variable $g(\xi)$ can always be represented in the following affine form:
\[ g(\xi) = \sum_{i=0}^{n} g_i \xi_i, \quad (2.50) \]
where the $\xi_i$ are orthogonal Gaussian variables, that is, $\langle \xi_i \xi_j \rangle = \delta_{ij}$, $\langle \xi_i \rangle = 0$, and $\xi_0 = 1$, and $g_i$ is the coefficient of the individual Gaussian variable. Note that such a form can always be obtained by using the Karhunen-Loeve orthogonal expansion method [45]. In our problem, we need to represent the log-normal random variable $l(\xi)$ in the Hermite PC expansion form
\[ l(\xi) = \sum_{k=0}^{P} l_k H_k^n(\xi), \quad (2.51) \]
where $l_0 = \exp(\mu_g + \sigma_g^2/2)$. To find the other coefficients, we can apply (2.36):
\[ l_k = \frac{\langle l(\xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle}, \quad \forall k \in \{0, \ldots, P\}. \quad (2.52) \]
As was shown in [44], the leading coefficient of $l(\xi)$ can be written as
\[ l_0 = \langle l(\xi) \rangle = \exp\!\left[\mu_g + \frac{1}{2} \sum_{j=1}^{n} g_j^2\right], \quad (2.53) \]
where $n$ is the number of independent Gaussian random variables.
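A quick sampling check of (2.48) and (2.49) is straightforward; the values of $\mu_g$ and $\sigma_g$ below are arbitrary.

```python
# Sampling check of Eqs. 2.48-2.49 for l = exp(g) with g ~ N(mu_g, sigma_g^2).
import numpy as np

mu_g, sigma_g = -1.0, 0.4   # arbitrary parameters
rng = np.random.default_rng(5)
l = np.exp(rng.normal(mu_g, sigma_g, size=1_000_000))

print(l.mean(), np.exp(mu_g + sigma_g**2 / 2))                      # Eq. 2.48
print(l.var(), np.exp(2*mu_g + sigma_g**2) * (np.exp(sigma_g**2) - 1.0))  # Eq. 2.49
```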

Using (2.53), the log-normal variable can then be written as
\[ l(\xi) = l_0 \left(1 + \sum_{i=1}^{n} g_i \xi_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} g_i g_j\,(\xi_i \xi_j - \delta_{ij}) + \cdots\right), \quad (2.54) \]
where $g_i$ is defined in (2.50).

4.2 Hermite PC Representation with One Gaussian Variable

In this case, $\xi = [\xi_1]$. For the second-order Hermite PC ($P = 2$), following (2.54), we have
\[ l(\xi) = l_0 \left(1 + \sigma_g \xi_1 + \frac{1}{2} \sigma_g^2 (\xi_1^2 - 1)\right). \quad (2.55) \]
Hence, the desired Hermite PC coefficients $l_i$, $i = 0, 1, 2$, can be expressed as $l_0$, $l_0 \sigma_g$, and $\frac{1}{2} l_0 \sigma_g^2$, respectively.

4.3 Hermite PC Representation of Two and More Gaussian Variables

For two random variables ($n = 2$), assume that $\xi = [\xi_1, \xi_2]$ is a normalized uncorrelated Gaussian random variable vector that represents the random variable $g(\xi)$:
\[ g(\xi) = \mu_g + \sigma_1 \xi_1 + \sigma_2 \xi_2. \quad (2.56) \]
Note that, for $i \ne j$,
\[ \langle (\xi_i \xi_j - \delta_{ij})^2 \rangle = \langle \xi_i^2 \xi_j^2 \rangle = \langle \xi_i^2 \rangle \langle \xi_j^2 \rangle = 1. \]
Therefore, the expansion of the log-normal random variable using second-order Hermite PCs can be expressed as
\[ l(\xi) = l_0 \left(1 + \sigma_1 \xi_1 + \sigma_2 \xi_2 + \frac{1}{2}\sigma_1^2 (\xi_1^2 - 1) + \frac{1}{2}\sigma_2^2 (\xi_2^2 - 1) + \sigma_1 \sigma_2\, \xi_1 \xi_2\right), \quad (2.57) \]
where
\[ \mu_l = l_0 = \exp\!\left[\mu_g + \frac{1}{2}\left(\sigma_1^2 + \sigma_2^2\right)\right]. \]
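The truncated expansion (2.57) can be validated by sampling; with small $\sigma_1, \sigma_2$ (chosen arbitrarily here), the second-order Hermite PC tracks $l = e^{g}$ closely.

```python
# Sampling check of Eq. 2.57: the second-order HPC expansion of a log-normal
# variable versus the exact l = exp(g), for two independent Gaussian components.
import numpy as np

mu_g, s1, s2 = 0.0, 0.2, 0.1   # arbitrary small sigmas
l0 = np.exp(mu_g + 0.5 * (s1**2 + s2**2))

rng = np.random.default_rng(6)
xi1, xi2 = rng.standard_normal((2, 500_000))
exact = np.exp(mu_g + s1 * xi1 + s2 * xi2)
hpc = l0 * (1 + s1*xi1 + s2*xi2
            + 0.5*s1**2*(xi1**2 - 1) + 0.5*s2**2*(xi2**2 - 1)
            + s1*s2*xi1*xi2)

rel_rms = np.sqrt(np.mean((exact - hpc)**2)) / exact.mean()
print(rel_rms)                        # small relative RMS truncation error
print(exact.mean(), hpc.mean(), l0)   # all three means agree
```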

From (2.57), the desired Hermite PC coefficients $l_i$, $i = 0, 1, 2, 3, 4, 5$, can be expressed as $l_0$, $l_0 \sigma_1$, $l_0 \sigma_2$, $\frac{1}{2} l_0 \sigma_1^2$, $\frac{1}{2} l_0 \sigma_2^2$, and $l_0 \sigma_1 \sigma_2$, respectively.

Similarly, for four Gaussian random variables, assume that $\xi = [\xi_1, \xi_2, \xi_3, \xi_4]$ is a normalized, uncorrelated Gaussian random variable vector. The random variable $g(\xi)$ can be expressed as
\[ g = \mu_g + \sum_{i=1}^{4} \sigma_i \xi_i. \quad (2.58) \]
As a result, the log-normal random variable $l(\xi)$ can be expressed as
\[ l(\xi) = l_0 \left(1 + \sum_{i=1}^{4} \sigma_i \xi_i + \sum_{i=1}^{4} \frac{1}{2}\sigma_i^2 (\xi_i^2 - 1) + \sum_{i=1}^{4} \sum_{j=i+1}^{4} \sigma_i \sigma_j\, \xi_i \xi_j\right), \quad (2.59) \]
where
\[ \mu_l = l_0 = \exp\!\left(\mu_g + \frac{1}{2} \sum_{i=1}^{4} \sigma_i^2\right). \]
Hence, the desired Hermite PC coefficients can be read off from (2.59) above.

5 Summary

An understanding of the preliminaries of probability theory is required to follow statistical analysis and modeling for VLSI design in the nanometer regime. In this chapter, we introduced the relevant fundamentals employed in statistical analysis. First, we presented the basic concepts and quantities, such as the mean, variance, and covariance arising from process variation. After that, we reviewed techniques for statistical variable decoupling and reduction based on PFA/PCA analysis. We further discussed the spectral stochastic analysis required for the extraction, mismatch, and yield analyses used in the later chapters. We also discussed different methods to estimate the sum of random variables, as required for leakage current estimation.


More information

Measurements of central tendency express whether the numbers tend to be high or low. The most common of these are:

Measurements of central tendency express whether the numbers tend to be high or low. The most common of these are: A PRIMER IN PROBABILITY This handout is intended to refresh you on the elements of probability and statistics that are relevant for econometric analysis. In order to help you prioritize the information

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

POLYNOMIAL HISTOPOLATION, SUPERCONVERGENT DEGREES OF FREEDOM, AND PSEUDOSPECTRAL DISCRETE HODGE OPERATORS

POLYNOMIAL HISTOPOLATION, SUPERCONVERGENT DEGREES OF FREEDOM, AND PSEUDOSPECTRAL DISCRETE HODGE OPERATORS POLYNOMIAL HISTOPOLATION, SUPERCONVERGENT DEGREES OF FREEDOM, AND PSEUDOSPECTRAL DISCRETE HODGE OPERATORS N. ROBIDOUX Abstract. We show that, given a histogram with n bins possibly non-contiguous or consisting

More information

Distributed computing of failure probabilities for structures in civil engineering

Distributed computing of failure probabilities for structures in civil engineering Distributed computing of failure probabilities for structures in civil engineering Andrés Wellmann Jelic, University of Bochum (andres.wellmann@rub.de) Matthias Baitsch, University of Bochum (matthias.baitsch@rub.de)

More information

Lecture 6: Discrete & Continuous Probability and Random Variables

Lecture 6: Discrete & Continuous Probability and Random Variables Lecture 6: Discrete & Continuous Probability and Random Variables D. Alex Hughes Math Camp September 17, 2015 D. Alex Hughes (Math Camp) Lecture 6: Discrete & Continuous Probability and Random September

More information

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

More information

Monte Carlo Methods in Finance

Monte Carlo Methods in Finance Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction

More information

Similarity and Diagonalization. Similar Matrices

Similarity and Diagonalization. Similar Matrices MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

More information

CITY UNIVERSITY LONDON. BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION

CITY UNIVERSITY LONDON. BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION No: CITY UNIVERSITY LONDON BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION ENGINEERING MATHEMATICS 2 (resit) EX2005 Date: August

More information

Nonlinear Iterative Partial Least Squares Method

Nonlinear Iterative Partial Least Squares Method Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

More information

T ( a i x i ) = a i T (x i ).

T ( a i x i ) = a i T (x i ). Chapter 2 Defn 1. (p. 65) Let V and W be vector spaces (over F ). We call a function T : V W a linear transformation form V to W if, for all x, y V and c F, we have (a) T (x + y) = T (x) + T (y) and (b)

More information

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8 Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e

More information

Introduction: Overview of Kernel Methods

Introduction: Overview of Kernel Methods Introduction: Overview of Kernel Methods Statistical Data Analysis with Positive Definite Kernels Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University

More information

MAS108 Probability I

MAS108 Probability I 1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13 school year.

Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13 school year. This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Algebra

More information

Tail inequalities for order statistics of log-concave vectors and applications

Tail inequalities for order statistics of log-concave vectors and applications Tail inequalities for order statistics of log-concave vectors and applications Rafał Latała Based in part on a joint work with R.Adamczak, A.E.Litvak, A.Pajor and N.Tomczak-Jaegermann Banff, May 2011 Basic

More information

Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

More information

THE DYING FIBONACCI TREE. 1. Introduction. Consider a tree with two types of nodes, say A and B, and the following properties:

THE DYING FIBONACCI TREE. 1. Introduction. Consider a tree with two types of nodes, say A and B, and the following properties: THE DYING FIBONACCI TREE BERNHARD GITTENBERGER 1. Introduction Consider a tree with two types of nodes, say A and B, and the following properties: 1. Let the root be of type A.. Each node of type A produces

More information