Adaptive Filtering and Change Detection for Streaming Data


Adaptive Filtering and Change Detection for Streaming Data

Dean Bodenham
Imperial College London
12 May, 2012

Supervisors: N. Adams and N. Heard

Abstract

Detecting change in streaming data presents two significant challenges. First, the large amount of data requires our detection algorithms to be online, and second, the initial and final underlying distributions of the data are often unknown, as is the point where one distribution changes to the other. Standard techniques customarily involve tunable parameters that control the trade-off between false alarms and delayed or entirely missed detections. However, it is often not clear what values these control parameters should take. In this report we introduce a framework utilizing exponential forgetting factors in an attempt to reduce the dependence on control parameters. We compare these methods to two well-known sequential detection methods using simulation studies.

Contents

1 Introduction
    1.1 Report structure
2 Change detection
    2.1 Introduction
    2.2 Performance Measures
    2.3 Control charts
        2.3.1 CUSUM
        2.3.2 EWMA
    2.4 Estimating the parameters of the known distribution
    2.5 Restarting an algorithm after a changepoint
    2.6 Discussion
3 Adaptive Filtering
    3.1 Introduction
    3.2 Mean and Variance with a forgetting factor
        3.2.1 The forgetting factor mean x̄_{N,λ}
        3.2.2 The purpose of the forgetting factor λ
        3.2.3 The value of λ
        3.2.4 Expectation and variance of x̄_{N,λ}
        3.2.5 The forgetting factor variance s²_{N,λ}
        3.2.6 Expectation and variance of s²_{N,λ}
        3.2.7 Recursive definitions of x̄_{N,λ} and s²_{N,λ}
    3.3 Optimal forgetting factor λ
        3.3.1 The data
        3.3.2 Definition of optimal λ
        3.3.3 Solving for optimal λ
        3.3.4 Results for optimal λ: Example
    3.4 Discussion
4 Adaptive Filtering Change Detection
    4.1 Introduction
    4.2 Change detection using Chebyshev's inequality
        4.2.1 Chebyshev's inequality
        4.2.2 Prediction intervals: notation
    4.3 Change detection: assuming normality
        4.3.1 Change detection for x̄_{N,λ} assuming normality
        4.3.2 Change detection for s²_{N,λ} assuming normality
        4.3.3 Numerically evaluating the distribution function of s²_{N,λ}
        4.3.4 A prediction interval for s²_{N,λ}
        4.3.5 Covariance of the chi-squared random variables
        4.3.6 Box's Theorem
        4.3.7 Dependent sum to independent sum
        4.3.8 The cost in accuracy of assuming independence
    4.4 Discussion
5 Adaptive forgetting factor
    5.1 Introduction
    5.2 Adaptive forgetting factors
        Updating the adaptive forgetting factor
        Choosing a cost function C_N
        The derivative with respect to the adaptive forgetting factor
        Gradient descent
        Separate forgetting factors for mean and variance
    Optimal adaptive forgetting factor
        Results for the optimal adaptive forgetting factor: Example
        Comparison of the fixed and adaptive forgetting factors
    Discussion
6 Results
    Description of simulation experiments
        The data
        Procedure for calculating ARL_0 and ARL_1
        Control parameters
    Results
        Change detection of the mean
        Change detection of the variance
    Discussion
7 Conclusion
    Future Work
Appendix 1 Derivations
    The weight w_{N,λ}
    The derivation of u_{N,λ}
    Recursive derivation of u_{N,λ}
        Non-recursive definition
        Recursive definition
    Derivation of v_{N,λ}
    K_2 and K_4 derivations
    Summary of equations
    The validity of assuming a zero mean
    The derivation of Var[s²_{N,λ}]
    The derivation of σ²_k
    Covariance calculation
        z_k are linearly dependent
        Proof that the covariance matrix is positive semi-definite
    1.10 Optimal fixed λ
        Non-recursive forms of equations
        Recursive definitions
    Adaptive λ
        Gradient of the adaptive m_{N,λ}

Chapter 1 Introduction

There are many real-world problems that require the sequential detection of a change in a process. In its simplest form, one attempts to detect a change in the mean of a sequence, where the change could be either abrupt or gradual, as shown in Figure 1.1.

Figure 1.1: An abrupt change in the mean (left), and a gradual change in the mean (right).

A data stream consists of a potentially unending sequence of data tuples. Streaming data has two challenging characteristics: first, a high arrival rate, and second, the potential to behave unexpectedly. An example of the first problem is provided by plastic card fraud detection [37], where one large bank processes 6000 plastic card transactions per second. Such a high frequency of data places a premium on efficient computation, which should ideally be sequential. Moreover, fraud detection in this context needs to be online, in the sense that decisions are required as transactions occur. The second problem, of unexpected behaviour, can manifest in many ways, since data streams do not change in a predictable manner. Returning to the example of fraud detection, it is known that fraudsters change their strategy over time, in an attempt to be unpredictable while testing bank defences. Numerous other applications require streaming data analysis [11], [27], [10].

This report is concerned with change detection methods for streaming data. The character of streaming data demands that such methods be efficient in terms of memory and computation, and operate sequentially. An additional possibility is that the change detector will operate on the stream unsupervised, in such a way that when a change is flagged the detector will automatically restart. Indeed, the final chapter introduces a framework for unsupervised restarting that will be explored further in future work. However, restarting raises further issues not directly addressed in this report, which focuses primarily on sequential univariate change detection problems where some or all of the characteristics of the pre-change distribution are known.

Traditional approaches to sequential change detection, such as CUSUM, often have control parameters, and the literature does not provide especially good guidance on setting them. For an unsupervised change detector operating on a data stream, the difficulty of setting control parameters becomes more important. Adaptive filtering [18] provides estimation methodology for keeping an estimator close to a time-varying process. A nice feature of adaptive filtering, in theory, is that the parameter controlling the time-varying component of estimation can be determined in a principled and efficient manner. The novel work in this report explores the potential of adaptive filters for change detection. Specifically, we try to use adaptive filters to handle control parameters. While there is also the possibility of our adaptive filters being used to deal with autocorrelated or slowly drifting processes, this will only be explored in future work.

1.1 Report structure

Chapter 2 gives an introduction to change detection. We start by describing the performance measures ARL_0 and ARL_1, and then discuss two well-known algorithms, CUSUM and EWMA. Finally, we give a brief review of the literature comparing CUSUM and EWMA. Chapter 3 gives a description of adaptive filtering using an exponential forgetting factor. We define the forgetting factor mean and the forgetting factor variance. A discussion of the optimal value for the forgetting factor completes the chapter. Chapter 4 combines the previous two chapters and shows how we can perform change detection using forgetting factors. Chapter 5 introduces the idea of a variable, adaptive forgetting factor in an attempt to improve on the performance of the fixed forgetting factor change detectors. Experiments were performed to compare the different algorithms using the performance measures discussed in Chapter 2. These are described in Chapter 6, where the results are discussed. Finally, Chapter 7 summarises our results and discusses potential future work. Several results and derivations used in Chapters 3, 4 and 5 have been included in Appendix 1.

Chapter 2 Change detection

2.1 Introduction

The goal of a change detection algorithm is to detect a change in the probability distribution of a sequence of random observations. One example is the following situation: suppose we have observations x_1, x_2, ..., x_N generated from the random variables X_1, X_2, ..., X_N, which are independent and distributed according to distribution D_0 or D_1, such that

X_1, X_2, ..., X_τ ~ D_0,    (2.1)
X_{τ+1}, ..., X_N ~ D_1,    (2.2)

where the changepoint τ is unknown. Two statistics that are commonly monitored for change are the mean and the variance. Figure 2.1 shows two graphs illustrating a change in the mean and a change in the variance of a sequence of observations.

Figure 2.1: A change in the mean (left), and a change in the variance (right). The vertical lines denote the true changepoints. The data are generated from a N(0, 1) distribution, with an increase in the mean (left) / variance (right) at time τ = 50 and with N = 100.

However, this particular example assumes we have a fixed data set of size N, with only one changepoint τ, and that we know both the pre-change and post-change distributions, and perhaps the parameters of those distributions. Here N is not very large (N = 100), and all the observations can be stored and then analysed in order to detect τ with an offline algorithm. Standard references for online sequential change detection and change detection with adaptive filtering are [3] and [15]. In recent years, the problem of detecting changes in data streams has arisen in areas such as astronomy [11], IP traffic analysis [27] and finance [10]. A data stream is a potentially unending sequence of ordered data points x_1, x_2, ... generated from random variables X_1, X_2, ..., that may or may not be infinite.

Detecting changes in a data stream presents a few challenges:

1. the number of observations may be very large, making it impractical to store and then analyse the data in an offline manner,
2. the data may be arriving at a very high rate, requiring the change detection algorithms to be computationally efficient,
3. the pre-change and/or post-change distributions are usually unknown, or at least their parameters may be unknown,
4. there may be multiple changepoints, requiring us to consider restarting the algorithm once a change has been detected. This is further discussed in Section 2.5.

The potentially large amount of data forces us to abandon attempts at offline analysis, and instead we only consider online change detection algorithms. In this report we shall assume that the distributions and pre-change parameters are known, but the post-change parameters are unknown. Our main method of interest uses estimation with a forgetting factor, as defined in Section 3.2 and applied to change detection in Chapter 4. We shall also review two well-known algorithms, CUSUM and EWMA, and compare the performance of these algorithms to our forgetting factor algorithms. In order to compare the algorithms we need some measure of performance, which we discuss in the next section.

2.2 Performance Measures

The perfect change detection algorithm would not detect a change until one has occurred, and when a change does occur, it would detect that change immediately. However, this ideal will never be achieved due to stochastic variation. In practice there will be times when an algorithm detects a change when none has occurred (a false alarm), and when a change actually does occur, there will always be some delay before that change is detected. This gives rise to two standard performance measures, ARL_0 and ARL_1, where ARL stands for average run length. We define the ARL_0 of an algorithm to be the average time between false alarms raised by that algorithm, while we define the ARL_1 to be the average delay between a change occurring and that change being detected. We can define these more precisely as follows: let the data stream be defined as in Equations 2.1 and 2.2, so that the changepoint is at time τ, and let τ̂ denote the time when the change is detected. Then we can define these measures as

ARL_0 = E[ τ̂ | X_1, X_2, ... ~ D_0 ],
ARL_1 = E[ τ̂ − τ | X_1, ..., X_τ ~ D_0; X_{τ+1}, ... ~ D_1 ].

Ideally we would like our algorithms to have a high ARL_0 and a low ARL_1. However, tuning an algorithm's parameters to achieve a desirable value for one of these measures will have a negative effect on the other measure. We will revisit this topic in Chapter 5.
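In practice both run lengths are estimated by repeated simulation. The sketch below is an illustration added here (it is not the report's procedure, which is described in Chapter 6): it estimates ARL_0 and ARL_1 by Monte Carlo for any detector supplied as a callable; the detector interface, stream parameters and function names are all hypothetical.

    import numpy as np

    def run_length(detector, stream):
        """Return the first index (1-based) at which the detector signals, or None."""
        for t, x in enumerate(stream, start=1):
            if detector(x):
                return t
        return None

    def estimate_arl(make_detector, mu0=0.0, sigma0=1.0, mu1=5.0, tau=None,
                     n_max=10_000, n_reps=200, seed=1):
        """Monte Carlo estimate of ARL_0 (tau is None) or ARL_1 (change in the mean at tau)."""
        rng = np.random.default_rng(seed)
        lengths = []
        for _ in range(n_reps):
            pre = rng.normal(mu0, sigma0, size=n_max if tau is None else tau)
            post = np.array([]) if tau is None else rng.normal(mu1, sigma0, size=n_max)
            t = run_length(make_detector(), np.concatenate([pre, post]))
            if tau is None and t is not None:
                lengths.append(t)              # time of a false alarm
            elif tau is not None and t is not None and t > tau:
                lengths.append(t - tau)        # detection delay after the change
        return float(np.mean(lengths)) if lengths else float("nan")

    # Example with a naive detector that signals when a single observation exceeds 3 sigma.
    print(estimate_arl(lambda: (lambda x: abs(x) > 3.0), tau=None))   # roughly 1 / 0.0027
    print(estimate_arl(lambda: (lambda x: abs(x) > 3.0), tau=50))     # short delay for a large jump

Runs where no alarm occurs within n_max observations are simply discarded here, which slightly biases the estimate; it is only meant to make the two definitions concrete.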

2.3 Control charts

The idea of a control chart was first described in [35], with the original motivation being the detection of change in manufacturing processes for the purposes of quality control. A control chart consists of points z_1, z_2, ... representing a statistic, and control limits a and b, with a < b. When z_k ∈ (a, b) we say the process is in control, and when z_k ∉ (a, b) we say that the process is out of control. Note that when we want to represent a sequence of statistics we use z_k, to distinguish them from observations x_k. We call b the upper control limit (UCL) and a the lower control limit (LCL). We call τ̂ the detected changepoint of the data stream if z_{τ̂} ∉ (a, b) but z_t ∈ (a, b) for all t < τ̂, while we reserve the letter τ for the true changepoint. Figure 2.2 is an example of a control chart, where the data stream has a changepoint at τ.

Figure 2.2: A control chart.

Two well-known control chart schemes are CUSUM and EWMA, first described in [29] and [30], respectively, and discussed below.

2.3.1 CUSUM

The Cumulative Sum (CUSUM) algorithm was first described in [29]. It has since been proved to be optimal [28] in the sense defined in [24]. Suppose we have observations x_1, x_2, ... sampled from a distribution with known mean µ and variance σ². We choose two parameters, d and B, and define the intermediate variable

T_k = x_k − µ − dσ,

which has expected value

E[T_k] = E[x_k] − µ − dσ = −dσ.

Now, supposing we wish to detect an increase in the mean, we define the CUSUM statistic by

S_0 = 0,
S_{k+1} = max{0, S_k + T_{k+1}},

and a change is detected when S_k > Bσ.

Similar logic is used for detecting a decrease in the mean. Although [28] showed that CUSUM is optimal, this is only the case when both the pre- and post-change distributions are known. If this is not the case, we do not have such strong theoretical guarantees. Moreover, the sensitivity of the change detector when deployed in practice will depend on the choice of the parameters d and B. According to [33], three common pairs of choices are

d = 0.25, B = 8.00;    d = 0.50, B = 4.77;    d = 1.00, B = ...

Although these pairs of parameters may each perform well in a given situation, it is not obvious, given a data stream, which pair should be used. Indeed, as a stream evolves and changes occur, it is unlikely that a fixed pair of parameters will continue to be optimal after each change. This is part of the motivation for introducing our forgetting factor scheme in Chapter 4, and later our adaptive forgetting factor scheme in Chapter 5: it is hoped that these two schemes together will allow the control parameters to be chosen by the algorithm during an initial monitoring period, and thereby remove our dependence on choosing parameters beforehand.

2.3.2 EWMA

The EWMA (Exponentially Weighted Moving Average) control chart is another online change detection method, first described in [30]. Suppose we have observations x_1, x_2, ... sampled from a distribution with known mean µ and variance σ². We then define the new variables Z_0, Z_1, Z_2, ... by

Z_0 = µ,
Z_k = (1 − r) Z_{k−1} + r x_k,

where r ∈ [0, 1] acts as an exponential forgetting factor. It can be shown [30] that the standard deviation of Z_k is

σ_{Z_k} = σ √( (r / (2 − r)) [1 − (1 − r)^{2k}] ).

If we wanted to detect an increase in the mean, a change would be detected when

Z_k > µ + L σ_{Z_k},

where L is a control parameter chosen to give the algorithm a desired performance in terms of ARL_0 or ARL_1. It is also possible to modify EWMA to perform two-sided detection. According to [25], the parameter r is usually chosen so that 0.05 < r < 0.5 for detecting small shifts, while L is usually chosen to be close to 3. The original paper [30] showed that, in practice, EWMA is good at detecting small shifts in the process mean. Interestingly, [25] showed that the properties of EWMA are similar to those of CUSUM schemes, a point further discussed in [26]. However, this was the case when an optimal choice of parameters was used for EWMA, compared against a CUSUM scheme using a seemingly arbitrary (not necessarily optimal) choice of parameters.
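To make the two schemes concrete, here is a minimal sketch (an illustration, not taken from the report) of one-sided CUSUM and EWMA detectors for an increase in the mean, following the recursions above; the default parameter values are simply the illustrative choices quoted from [33] and [25].

    import numpy as np

    def cusum_detect(xs, mu, sigma, d=0.5, B=4.77):
        """One-sided CUSUM: signal when S_k = max(0, S_{k-1} + x_k - mu - d*sigma) exceeds B*sigma."""
        S = 0.0
        for k, x in enumerate(xs, start=1):
            S = max(0.0, S + (x - mu - d * sigma))
            if S > B * sigma:
                return k                    # detection time
        return None

    def ewma_detect(xs, mu, sigma, r=0.25, L=3.0):
        """One-sided EWMA: Z_k = (1-r) Z_{k-1} + r x_k, signal when Z_k > mu + L * sigma_{Z_k}."""
        Z = mu
        for k, x in enumerate(xs, start=1):
            Z = (1 - r) * Z + r * x
            sigma_Z = sigma * np.sqrt((r / (2 - r)) * (1 - (1 - r) ** (2 * k)))
            if Z > mu + L * sigma_Z:
                return k
        return None

    # Example: a stream with a jump in the mean at tau = 50, as in Figure 1.1 (left).
    rng = np.random.default_rng(0)
    xs = np.concatenate([rng.normal(0, 1, 50), rng.normal(1, 1, 50)])
    print(cusum_detect(xs, mu=0.0, sigma=1.0), ewma_detect(xs, mu=0.0, sigma=1.0))

Both detectors assume the pre-change mean and variance are known, which is exactly the setting adopted in this report.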

We will come back to this topic later. For our present purpose, a central difficulty with both CUSUM and EWMA is the selection of control parameters. We shall return to this in a later chapter.

2.4 Estimating the parameters of the known distribution

In Section 2.3.1, we mentioned that CUSUM is optimal at detecting a change, but only when the pre- and post-change distribution parameters are known. With a data stream, however, we may not know what the distributions are, let alone the values of the parameters. There are cases where we may confidently model a process by a given family of distributions but not know the values of the parameters. For example, we may know that a given set of observations is sampled from a normal distribution N(µ, σ²), but not know the values of µ and σ². In these cases, we could try to estimate the values of the parameters during an initial monitoring period, assuming that no change occurs. This is called a burn-in period. However, if the burn-in period is too short, our parameter estimates will be inaccurate, which will then lead to poor performance of the change detection algorithm. An extensive literature review of this approach can be found in [21], where the key issues discussed include the sample size for the monitoring period, the impact of parameter estimation on algorithm performance, and other possible approaches. This approach will be the subject of future work; in this report we shall assume that the pre-change distribution parameters are known.

2.5 Restarting an algorithm after a changepoint

Suppose we are monitoring a data stream x_1, x_2, ..., which we assume is sampled from a distribution D, and detect a change at τ̂. Once a change has been detected at time τ̂, this signals that at least the parameter values of the distribution have changed. In a streaming data context, it is likely that we would wish to continue monitoring the new data stream x_{τ̂}, x_{τ̂+1}, ... for future changes. This leads to a problem: our change detection algorithm requires the distribution's parameter values, but it is rare that we will know the values of the post-change parameters, and we cannot use the pre-change parameter values since a change has occurred. In this situation, one approach would be to estimate the post-change parameters during a burn-in period, and then use these estimates to detect a change in the new stream x_{τ̂}, x_{τ̂+1}, .... However, as mentioned in Section 2.4, this approach will be part of future work. Therefore, in this report we only consider change detection up until the first changepoint.

2.6 Discussion

We have discussed the problems facing streaming data change detection, and for this report we have made the following compromises:

- we assume the pre-change parameters are known;
- we only monitor a stream for a change until the first changepoint.

Future work will see us build on this foundation and attempt to develop an algorithm that does not require knowledge of the parameters and is capable of continuing to monitor the data stream for further changes after an initial change has been detected. The idea of a control chart, which will be central to our forgetting factor algorithms, was described. We then discussed the performance measures ARL_0 and ARL_1 and gave a brief review of two popular algorithms, CUSUM and EWMA. In the next chapter we introduce the idea of using a forgetting factor to adaptively estimate the mean and variance.

Chapter 3 Adaptive Filtering

3.1 Introduction

Consider a sequence of values x_1, ..., x_N, ... emitted by a time-varying process, and assume that the nature of the time-variation is unknown. Now suppose we want a good estimate of the process mean at each time point, as data arrive. Simple averaging over the sequence will not necessarily provide a good estimate of the current mean, since older data are not as informative as more recent data. The purpose of adaptive filtering [18] is to provide estimation methodology that handles time-variation by placing more emphasis on recent observations. In this chapter we describe and develop methods based on exponential forgetting factors [1], parameters which control the trade-off between current and historic data. For simple statistics, such as the mean, adaptive filtering methodology provides efficient sequential estimation and is hence well suited to the demands of data streams. This chapter considers adaptive filtering for the mean and variance with a fixed forgetting factor λ ∈ [0, 1]. The problem of selecting this parameter is similar to that of choosing the parameters for CUSUM and EWMA in Sections 2.3.1 and 2.3.2. However, an elegant and efficient scheme for tuning λ sequentially is given in Chapter 5. The following chapter will explore how to construct a change detector based on adaptive filtering with a fixed forgetting factor.

3.2 Mean and Variance with a forgetting factor

We shall define both the forgetting factor sample mean x̄_{N,λ} and the forgetting factor sample variance s²_{N,λ}. Suppose we have a sequence of N observations x_1, x_2, ..., x_N ∈ R. In the rest of this chapter, and this report, we shall relax notation and not distinguish between observations x_k and random variables X_k, unless there is a specific need to do so.

3.2.1 The forgetting factor mean x̄_{N,λ}

The sample mean x̄_N is defined as

x̄_N = (1/N) Σ_{k=1}^{N} x_k.    (3.1)

The forgetting factor sample mean x̄_{N,λ} is defined by

x̄_{N,λ} = (1 / w_{N,λ}) Σ_{k=1}^{N} λ^{N−k} x_k,    (3.2)

with forgetting factor λ ∈ [0, 1], and the weight defined by

w_{N,λ} = Σ_{k=1}^{N} λ^{N−k}.    (3.3)

Figure 3.1 illustrates x̄_{N,λ} for different values of λ, computed from the data shown in Figure 3.2. Note that the sample mean x̄_N corresponds to x̄_{N,λ} with λ = 1. It is clear from Figure 3.2 that the process undergoes an abrupt change and, as claimed earlier, simple averaging does not provide a good up-to-date estimate of the process mean, compared to the filtered estimates.

Figure 3.1: x̄_{N,λ} for different values of λ when there is a jump in the mean of the data. The data used are shown in Figure 3.2.

3.2.2 The purpose of the forgetting factor λ

If we consider (3.2) in its expanded form

x̄_{N,λ} = (1 / w_{N,λ}) [ λ^{N−1} x_1 + λ^{N−2} x_2 + ... + λ x_{N−1} + x_N ],    (3.4)

we see that the forgetting factor λ ∈ [0, 1] exponentially downweights older observations x_1, x_2, ..., and thereby places more weight on more recent observations ..., x_{N−1}, x_N. Therefore, the forgetting factor mean x̄_{N,λ} is a random variable approximating the mean of the observations x_1, ..., x_N, but placing more weight on the more recent observations.
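As a small illustration (added here, not part of the report), the definition in (3.2) and (3.3) can be computed directly from a batch of observations; the function name is hypothetical.

    import numpy as np

    def ff_mean(xs, lam):
        """Forgetting factor mean: weighted average with weights lam^(N-k), k = 1..N."""
        xs = np.asarray(xs, dtype=float)
        weights = lam ** np.arange(len(xs) - 1, -1, -1)   # lam^(N-1), ..., lam, 1
        return weights @ xs / weights.sum()               # (1/w_{N,lam}) * sum of lam^(N-k) x_k

    # lam = 1 recovers the ordinary sample mean; smaller lam tracks the recent observations.
    xs = np.concatenate([np.zeros(50), 5 * np.ones(50)])
    print(ff_mean(xs, 1.0), ff_mean(xs, 0.9))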

Figure 3.2: The data used in Figure 3.1: x_1, ..., x_50 ~ N(0, 1) and x_51, ..., x_100 ~ N(5, 1).

3.2.3 The value of λ

Considering Equation 3.4, we see that the closer λ is to 0, the more it downweights older observations and the more weight it places on recent observations. The benefit of this is that the forgetting factor mean will be close to the mean of the last few (most recent) observations. However, a drawback is that it makes x̄_{N,λ} more sensitive to outliers. On the other hand, the closer λ is to 1 (say λ = 0.99), the less weight there is on the most recent observations, and the closer x̄_{N,λ} is to the unweighted mean x̄_N. This makes x̄_{N,λ} less sensitive to outliers. The distinction between an outlier and a true changepoint is subtle: changepoints result in a persistent effect, whereas outliers are ephemeral. In principle, efforts could be made to robustify estimation against outliers [19]; however, this is a subject deferred to potential future work.

3.2.4 Expectation and variance of x̄_{N,λ}

If we consider Equation 3.2, there are various possible choices for the weight. In this case, we have chosen w_{N,λ} to be defined as in (3.3) so that x̄_{N,λ} is unbiased. In other words, suppose that for x_1, ..., x_N we have E[x_k] = µ. Then

E[x̄_{N,λ}] = µ.    (3.5)

This is shown in Section 1.1 of Appendix 1. Furthermore, suppose that for k = 1, 2, ..., N we have Var[x_k] = σ². Then

Var[x̄_{N,λ}] = u_{N,λ} σ²,    (3.6)

where, as shown in Section 1.2 of Appendix 1,

u_{N,λ} = (1 / w_{N,λ}²) Σ_{k=1}^{N} λ^{2(N−k)} = (1 − λ)(1 + λ^N) / [(1 + λ)(1 − λ^N)].    (3.7)
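The unbiasedness in (3.5) and the variance factor in (3.6) and (3.7) are easy to check numerically. The following sketch is an illustration added here (not from the report); it compares Monte Carlo moments of x̄_{N,λ} against µ and u_{N,λ} σ², and the function names are hypothetical.

    import numpy as np

    def u_factor(N, lam):
        """u_{N,lam} = (1 - lam)(1 + lam^N) / ((1 + lam)(1 - lam^N)), so Var[ff mean] = u * sigma^2."""
        return (1 - lam) * (1 + lam ** N) / ((1 + lam) * (1 - lam ** N))

    def ff_mean(xs, lam):
        w = lam ** np.arange(len(xs) - 1, -1, -1)
        return w @ xs / w.sum()

    rng = np.random.default_rng(0)
    N, lam, mu, sigma = 100, 0.95, 2.0, 1.5
    samples = np.array([ff_mean(rng.normal(mu, sigma, N), lam) for _ in range(20_000)])
    print(samples.mean(), mu)                               # both close to mu (unbiased)
    print(samples.var(), u_factor(N, lam) * sigma ** 2)     # both close to u_{N,lam} * sigma^2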

The knowledge of the expected value and variance of x̄_{N,λ} enables us to construct a control chart for x̄_{N,λ}. This will be discussed in Chapter 4.

3.2.5 The forgetting factor variance s²_{N,λ}

We defined the forgetting factor sample mean x̄_{N,λ} in order to have a good estimate of the sample mean with more weight placed on recent observations. It could also be useful to have a good estimate of the sample variance, with more weight placed on more recent observations, and so in a similar manner we define the forgetting factor sample variance s²_{N,λ}. First, recall that the sample variance of x_1, ..., x_N is defined by

s²_N = (1 / (N − 1)) Σ_{k=1}^{N} (x_k − x̄_N)².

We then define the forgetting factor variance s²_{N,λ} by

s²_{N,λ} = (1 / v_{N,λ}) Σ_{k=1}^{N} λ^{N−k} (x_k − x̄_{N,λ})²,    (3.8)

where x̄_{N,λ} is the forgetting factor mean and v_{N,λ} is a weight defined as

v_{N,λ} = 2λ(1 − λ^{N−1}) / [(1 − λ)(1 + λ)].    (3.9)

The derivation of the weight v_{N,λ} is given in Section 1.4 of Appendix 1.

Figure 3.3: s²_{N,λ} for different values of λ when there is a jump in the mean of the data. The data used are shown in Figure 3.2.

Figure 3.3 shows that the closer λ is to 0, the faster s²_{N,λ} drops back down to the true variance value of 1 after the jump in the mean occurs at τ = 50 (Figure 3.2). If, on the other hand, there is a jump in the variance of the data, as in Figure 3.5, we see (Figure 3.4) that the closer λ is to 0, the faster s²_{N,λ} moves to the true variance value.

Figure 3.4: s²_{N,λ} for different values of λ when there is a jump in the variance of the data. The data used are shown in Figure 3.5.

Figure 3.5: The data used in Figure 3.4: x_1, ..., x_50 ~ N(0, 1), with an increase in the variance from τ = 51 onwards.

3.2.6 Expectation and variance of s²_{N,λ}

As for the mean, we are interested in constructing control charts for the variance. To do this, we require at least the first two moments of the distribution of s²_{N,λ}. Again, suppose we have N independent observations x_1, ..., x_N with E[x_k] = µ and Var[x_k] = σ².

Furthermore, if we assume that x_1, ..., x_N all have the same fourth moment ν_4 = E[x_k⁴], then our choice of the weight v_{N,λ} gives

E[s²_{N,λ}] = σ²,    (3.10)
Var[s²_{N,λ}] = K_4 ν_4 + (K_2 − 1)(µ² + σ²)²,

where K_2 and K_4 are functions of both N and λ given in Section 1.5 of Appendix 1. Note that the choice of v_{N,λ} given in Equation 3.9 makes s²_{N,λ} unbiased and gives us Equation 3.10. This is the same argument as used for the mean in Section 3.2.4. Again, knowledge of the expected value and variance of s²_{N,λ} enables us to construct a change detector for s²_{N,λ}. This will be discussed in Chapter 4.

3.2.7 Recursive definitions of x̄_{N,λ} and s²_{N,λ}

As discussed in the Introduction, we want to handle streaming data and detect changes in an online manner. In order to detect changes sequentially, it is useful to have sequential update equations so that our algorithms are more efficient. Both x̄_{N,λ} and s²_{N,λ} can be defined recursively. For N ≥ 1,

m_{N,λ} = λ m_{N−1,λ} + x_N,    (3.11)
w_{N,λ} = λ w_{N−1,λ} + 1,    (3.12)
x̄_{N,λ} = m_{N,λ} / w_{N,λ},    (3.13)

where we define m_{0,λ} = 0 and w_{0,λ} = 0. Alternatively, we could define x̄_{N,λ} recursively by

x̄_{N,λ} = (1 − 1/w_{N,λ}) x̄_{N−1,λ} + (1/w_{N,λ}) x_N,

with x̄_{0,λ} = 0 and w_{1,λ} = 1. The recursive definition for s²_{N,λ} is

s²_{N,λ} = (1 / v_{N,λ}) [ λ v_{N−1,λ} s²_{N−1,λ} + ((w_{N,λ} − 1) / w_{N,λ}) (x̄_{N−1,λ} − x_N)² ].

We do not have a convenient recursive equation for the weight v_{N,λ}, but we do have the relation

v_{N,λ} = w_{N,λ} (1 − u_{N,λ})

(see Section 1.4 of Appendix 1), and there is a recursive definition for the weight u_{N,λ} defined in Equations 3.6 and 3.7, given by

u_{N,λ} = ((w_{N,λ} − 1) / w_{N,λ})² u_{N−1,λ} + (1 / w_{N,λ})²

(see Section 1.3 of Appendix 1), in addition to the recursive equation for w_{N,λ} in Equation 3.12. Therefore, we implicitly have a recursive equation for v_{N,λ}. We therefore have recursive equations for x̄_{N,λ}, s²_{N,λ} and their weights, which matches the needs of streaming data.
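The recursions above translate directly into a constant-memory streaming estimator. The class below is a minimal sketch (an illustration, not the report's implementation; the class name is hypothetical) that maintains x̄_{N,λ}, s²_{N,λ} and the weights w_{N,λ}, u_{N,λ}, v_{N,λ} with one update per observation.

    class ForgettingFactorStats:
        """Sequentially maintain the forgetting factor mean and variance for a fixed lambda."""

        def __init__(self, lam):
            self.lam = lam
            self.m = 0.0      # m_{N,lam} = lam * m_{N-1,lam} + x_N
            self.w = 0.0      # w_{N,lam} = lam * w_{N-1,lam} + 1
            self.u = 0.0      # u_{N,lam}, so that Var[ff mean] = u * sigma^2
            self.v = 0.0      # v_{N,lam} = w_{N,lam} * (1 - u_{N,lam})
            self.mean = 0.0   # forgetting factor mean
            self.var = 0.0    # forgetting factor variance

        def update(self, x):
            lam = self.lam
            prev_mean, prev_v, prev_var = self.mean, self.v, self.var
            self.m = lam * self.m + x
            self.w = lam * self.w + 1.0
            self.u = ((self.w - 1.0) / self.w) ** 2 * self.u + (1.0 / self.w) ** 2
            self.v = self.w * (1.0 - self.u)
            self.mean = self.m / self.w
            if self.v > 0:    # the ff variance is only defined once v_{N,lam} > 0 (N >= 2)
                self.var = (lam * prev_v * prev_var
                            + ((self.w - 1.0) / self.w) * (prev_mean - x) ** 2) / self.v
            return self.mean, self.var

    # Example: feed a stream one observation at a time.
    stats = ForgettingFactorStats(lam=0.95)
    for x in [0.1, -0.3, 0.2, 5.1, 4.8, 5.2]:
        mean, var = stats.update(x)

For N = 2 this reproduces the direct calculation (the weighted variance of two points reduces to (x_1 − x_2)²/2), which is a convenient sanity check on the recursion.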

3.3 Optimal forgetting factor λ

We have already remarked that the forgetting factor λ ∈ [0, 1]. If we consider the forgetting factor mean in Equation 3.4, we see that for λ = 1, x̄_{N,1} = x̄_N, the unweighted mean. On the other hand, when λ = 0, x̄_{N,0} = x_N, the Nth observation. Bearing these two extremes in mind, we see that the closer λ is to 0, the more weight there is on the more recent observations, while the closer λ is to 1, the less emphasis there is on the more recent observations. It is natural to wonder whether there is an optimal value of λ. In order to proceed with this question, we first need to define what we mean by optimal, and since we are concerned with change detection, it is natural to consider the problem of finding an optimal λ with respect to a single changepoint in the data.

3.3.1 The data

Assume that we have a sequence with at least N observations

x = {x_1, x_2, ..., x_τ, x_{τ+1}, ..., x_N, ...}.

Defining D = N − τ, suppose

x_1, x_2, ..., x_τ ~ N(µ_0, σ_0²),    (3.14)
x_{τ+1}, ..., x_{τ+D} ~ N(µ_1, σ_1²).    (3.15)

If µ_0 ≠ µ_1 or σ_0² ≠ σ_1², we say there is a jump in the mean or variance of the data stream at time τ.

3.3.2 Definition of optimal λ

Given the data stream above, we define

λ̂_{τ,N} = arg inf_{λ ∈ [0,1]} E[(x̄_{N,λ} − µ_0)² | A_0]  for N ≤ τ,
λ̂_{τ,N} = arg inf_{λ ∈ [0,1]} E[(x̄_{N,λ} − µ_1)² | A_1]  for N > τ,    (3.16)

where

A_0: x_1, ..., x_N ~ N(µ_0, σ_0²),
A_1: x_1, ..., x_τ ~ N(µ_0, σ_0²) and x_{τ+1}, ..., x_N ~ N(µ_1, σ_1²).

If more than one λ gives the same infimum, we take the larger λ value. We are mainly interested in the case N > τ, where we are trying to find the optimal fixed λ that gives the smallest difference between µ_1 and the forgetting factor mean x̄_{N,λ} at time N = τ + D, for a changepoint at τ. We shall often use the shorthand

E[(x̄_{N,λ} − µ_{0,1,N})² | A_{0,1}],

where we define µ_{0,1,N} as

µ_{0,1,N} = µ_0 for N ≤ τ,  µ_1 for N > τ,    (3.17)

and A_{0,1} as

A_{0,1} = A_0 for N ≤ τ,  A_1 for N > τ.

It is shown in Section 1.10 of Appendix 1 that, for N > τ,

E[(x̄_{N,λ} − µ_{0,1,N})² | A_{0,1}] = (1 / w_{N,λ}²) [ λ^D w_{τ,λ} (µ_0 − µ_1) ]² + (1 / w_{N,λ}²) [ λ^{2D} w_{τ,λ²} σ_0² + w_{D,λ²} σ_1² ],    (3.19)

where w_{m,λ²} denotes the weight (3.3) with λ replaced by λ². This provides us with sufficient information to solve for λ̂_{τ,N}, and we then define the optimal forgetting factor vector as

λ̂_τ = (λ̂_{τ,1}, λ̂_{τ,2}, ..., λ̂_{τ,N}, ...).

3.3.3 Solving for optimal λ

In order to find

λ̂_{τ,N} = arg inf_{λ ∈ [0,1]} E[(x̄_{N,λ} − µ_{0,1,N})² | A_{0,1}],    (3.20)

we numerically evaluate (3.19) for λ ∈ {0, δ, 2δ, ..., (L−1)δ, Lδ = 1}, where δ = 1/L and L is a large number (e.g. L = 1000). Note that we could instead try to find an analytical solution for (3.20) by solving

∂/∂λ E[(x̄_{N,λ} − µ_{0,1,N})² | A_{0,1}] = 0;    (3.21)

however, the resulting derivative in (3.21) is not easy to solve for λ.

3.3.4 Results for optimal λ: Example

The development above provides the general framework for reasoning about the optimal fixed λ. To illustrate this with a specific example, suppose we have the sequence

x_1, x_2, ..., x_50 ~ N(0, 1),    (3.22)
x_51, ..., x_100 ~ N(5, 1),    (3.23)

and we find λ̂_{50,N} for each N = 1, 2, ..., 100, ..., giving the vector

λ̂_50 = (λ̂_{50,1}, λ̂_{50,2}, ..., λ̂_{50,100}, ...).

Figure 3.6 is a plot of λ̂_50. We notice that λ̂_{50,1}, ..., λ̂_{50,50} = 1, indicating that the optimal forgetting factor to use in this range is 1, which gives equal weight to all the observations up until the changepoint at τ = 50. We then see that after the changepoint at τ = 50, the optimal forgetting factor drops almost to 0, before increasing slowly back towards 1. So, for example, in this simulation the optimal forgetting factor that gives the smallest residual for the random variables x_1, ..., x_75 can be read directly from this plot. In Chapter 5 we will reconsider this framework with a time-varying forgetting factor.
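As an illustration of the grid search in Section 3.3.3 (added here, not the report's code), the sketch below approximates E[(x̄_{N,λ} − µ_1)² | A_1] by Monte Carlo on a grid of λ values and returns the minimiser; using the analytic expression (3.19) instead would avoid the sampling error, and the function names and grid sizes are assumptions.

    import numpy as np

    def ff_mean(xs, lam):
        w = lam ** np.arange(len(xs) - 1, -1, -1)
        return w @ xs / w.sum()

    def optimal_lambda_mc(N, tau, mu0, sigma0, mu1, sigma1, L=100, n_reps=1000, seed=0):
        """Grid-search the fixed lambda minimising a Monte Carlo estimate of E[(ff mean - mu1)^2]."""
        rng = np.random.default_rng(seed)
        grid = np.linspace(0.0, 1.0, L + 1)
        streams = [np.concatenate([rng.normal(mu0, sigma0, tau),
                                   rng.normal(mu1, sigma1, N - tau)]) for _ in range(n_reps)]
        mse = [np.mean([(ff_mean(xs, lam) - mu1) ** 2 for xs in streams]) for lam in grid]
        return grid[int(np.argmin(mse))]

    # Example: the stream of Equations 3.22 and 3.23, observed up to N = 75.
    print(optimal_lambda_mc(N=75, tau=50, mu0=0.0, sigma0=1.0, mu1=5.0, sigma1=1.0))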

Figure 3.6: A plot of the optimal forgetting factor λ̂_50 for the random variables given in Equations 3.22 and 3.23.

3.4 Discussion

We have used an exponential forgetting factor λ ∈ [0, 1] to define the forgetting factor mean x̄_{N,λ} and the forgetting factor sample variance s²_{N,λ}. These place more importance on recent observations than the unweighted mean and sample variance x̄_N and s²_N, and we have shown that they have sequential definitions. Finally, we considered the idea of a theoretically optimal value for λ. In the next chapter we construct a change detector using x̄_{N,λ} and s²_{N,λ}.

Chapter 4 Adaptive Filtering Change Detection

4.1 Introduction

We now apply the ideas from Chapter 2 to change detection of the forgetting factor mean x̄_{N,λ} and sample variance s²_{N,λ} defined in Chapter 3. We briefly consider the general case, before taking the pre-change distribution to be normal.

4.2 Change detection using Chebyshev's inequality

From Sections 3.2.4 and 3.2.6, we have the result that, given a sequence x_1, x_2, ..., x_N all with the same known expected value and variance (and perhaps fourth moment), there are expressions for the expected value and variance of both x̄_{N,λ} and s²_{N,λ}. Although not much more can be done without further information, Chebyshev's inequality can be used to obtain a prediction interval and hence construct a change detector.

4.2.1 Chebyshev's inequality

Suppose X is a random variable with known expected value µ = E[X] and variance σ² = Var[X]. Then for any real number k > 0 we have Chebyshev's inequality [33]:

Pr(|X − µ| ≥ kσ) ≤ 1/k².    (4.1)

This is equivalent to

Pr(|X − µ| ≤ kσ) ≥ 1 − 1/k²,
Pr(−kσ ≤ X − µ ≤ kσ) ≥ 1 − 1/k²,
Pr(X ∈ [µ − kσ, µ + kσ]) ≥ 1 − 1/k².    (4.2)

Now suppose that for j = 1, 2, ..., N we have E[x_j] = µ and Var[x_j] = σ².

From Section 3.2.4 we have, for x̄_{N,λ},

E[x̄_{N,λ}] = µ,    Var[x̄_{N,λ}] = u_{N,λ} σ².

If we want a 99% prediction interval C_{0.99} for x̄_{N,λ}, then Equation 4.2 with k = 10 gives

C_{0.99} = [µ − 10σ √u_{N,λ}, µ + 10σ √u_{N,λ}].

Then Pr(x̄_{N,λ} ∈ C_{0.99}) ≥ 0.99, which gives us a simple test for detecting a change:

1. x̄_{N,λ} ∈ C_{0.99}: in control,
2. x̄_{N,λ} ∉ C_{0.99}: out of control.

Similarly, using the results in Section 3.2.6, we can obtain a prediction interval for the forgetting factor sample variance s²_{N,λ} of x_1, ..., x_N. Simulation results for the performance of this detector are given in Chapter 6.

4.2.2 Prediction intervals: notation

In general, instead of a 99% prediction interval, we shall consider a (1 − q)% prediction interval. There is a slight abuse of notation in this definition: when we write (1 − q)% for a 99% prediction interval, we do not mean 1 − q = 99 but rather 1 − q = 0.99, i.e. q = 0.01. In other words, although it would be more appropriate to call it a (1 − q) prediction interval, without the percentage sign, we do not do this. Although this is a small point, it is necessary to clarify our notation for later.
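Before specialising to the normal case, here is a minimal, self-contained sketch (an illustration added here, not the report's code) of the distribution-free test of Section 4.2.1, combining the streaming recursions of Section 3.2.7 with the Chebyshev interval for x̄_{N,λ}; the function name and parameter choices are assumptions.

    import numpy as np

    def chebyshev_detect(xs, mu, sigma, lam=0.95, k=10.0):
        """Signal when the ff mean leaves [mu - k*sigma*sqrt(u_{N,lam}), mu + k*sigma*sqrt(u_{N,lam})]."""
        m, w, u = 0.0, 0.0, 0.0
        for t, x in enumerate(xs, start=1):
            m = lam * m + x
            w = lam * w + 1.0
            u = ((w - 1.0) / w) ** 2 * u + (1.0 / w) ** 2
            half_width = k * sigma * np.sqrt(u)
            if abs(m / w - mu) > half_width:
                return t                     # detected changepoint tau-hat
        return None

    # Known pre-change parameters mu = 0, sigma = 1; jump in the mean at tau = 50.
    rng = np.random.default_rng(2)
    xs = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
    print(chebyshev_detect(xs, mu=0.0, sigma=1.0))

With k = 10 the interval has coverage at least 99% by (4.2), regardless of the distribution of the observations, which is exactly why the bound is coarse compared with the normal-theory intervals developed next.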

4.3 Change detection: assuming normality

Although Chebyshev's inequality does give us a method for obtaining a prediction interval, it is likely that the bounds will be quite coarse, since it applies to random variables from any distribution. If, however, we know that our observations x_1, ..., x_N come from a particular distribution, it may be possible to obtain better prediction intervals for x̄_{N,λ} and s²_{N,λ}. With this in mind, we investigate the case where the observations are sampled from a normal distribution. This assumption mirrors the usual development of detectors in the literature. Although not considered here, some approaches attempt to transform the observations to standard normal random variables and then proceed as we describe [16].

4.3.1 Change detection for x̄_{N,λ} assuming normality

Suppose now that the random variables are normally distributed:

x_1, ..., x_N ~ N(µ, σ²).

Then from Section 3.2.4 we immediately have

x̄_{N,λ} ~ N(µ, u_{N,λ} σ²).    (4.3)

Suppose we now want a prediction interval for x̄_{N,λ}. Recall that the quantile function Q_D for a distribution D is defined as the function such that, for any quantile α ∈ [0, 1], Pr(X ≤ Q_D(α)) = α. Then, using the quantile function for the normal distribution D = N(µ, u_{N,λ} σ²), we obtain the (1 − q)% prediction interval

C = [Q_D(q/2), Q_D(1 − q/2)].

The quantile function for a normal distribution does not exist in closed form; however, numerical schemes for computing it are efficient and hence still applicable to streaming data change detection.

4.3.2 Change detection for s²_{N,λ} assuming normality

We again start by assuming normality,

x_1, ..., x_N ~ N(µ, σ²),

and show that with this assumption we can write s²_{N,λ} as a weighted sum of chi-squared variables. Recalling the definition of s²_{N,λ} from Equation 3.8,

s²_{N,λ} = (1 / v_{N,λ}) Σ_{k=1}^{N} λ^{N−k} (x_k − x̄_{N,λ})²,

we define the variables y_1, ..., y_N by

y_k = x_k − x̄_{N,λ}.    (4.4)

Since each x_k ~ N(µ, σ²) is normal and x̄_{N,λ} ~ N(µ, u_{N,λ} σ²) is normal, each y_k is normal with expectation and variance

E[y_k] = 0,    (4.5)
Var[y_k] = σ_k²,    so that y_k ~ N(0, σ_k²),

where the calculation

σ_k² = σ_{k,N,λ}² = [1 + u_{N,λ} − (2λ^{N−k} / w_{N,λ})] σ²    (4.6)

is given in Section 1.6 of Appendix 1. We next define the random variables t_k by

t_k = y_k / σ_k ~ N(0, 1),    so that t_k² ~ χ²_1.

If we then define the coefficients α_k by

α_k = λ^{N−k} σ_k² / v_{N,λ},    (4.7)

this allows us to write s²_{N,λ} as

s²_{N,λ} = (1 / v_{N,λ}) Σ_{k=1}^{N} λ^{N−k} (x_k − x̄_{N,λ})²
        = (1 / v_{N,λ}) Σ_{k=1}^{N} λ^{N−k} y_k²
        = Σ_{k=1}^{N} (λ^{N−k} σ_k² / v_{N,λ}) t_k²
        = Σ_{k=1}^{N} α_k t_k²,    (4.8)

which shows that s²_{N,λ} is a weighted sum of chi-squared random variables. We also note that since σ_k² > 0, v_{N,λ} > 0, and λ^{N−k} > 0, we have α_k > 0.

4.3.3 Numerically evaluating the distribution function of s²_{N,λ}

Unfortunately, there is no known analytic form for the density function or cumulative distribution function of a weighted sum of chi-squared random variables. However, there are several numerical algorithms for evaluating either the density or the distribution function ([8] and Section 18.8 in [22]). We use the algorithm given by J. P. Imhof in [20] to evaluate the distribution function of s²_{N,λ}. Imhof's algorithm numerically calculates the cumulative distribution function of a weighted sum

Σ_{k=1}^{N} β_k z_k²,    where z_k² ~ χ²_{n_k}.

The input for the algorithm is

1. the value of s²_{N,λ},
2. the vector of coefficients β_1, ..., β_N, with β_k > 0,
3. the vector of degrees of freedom n_1, ..., n_N, which all equal 1 in our case,

and the output is the cumulative distribution function value F_Imhof(s²_{N,λ}; β_1, ..., β_N; n_1, ..., n_N) ∈ [0, 1].
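The report evaluates this distribution function with Imhof's algorithm (for example via the CompQuadForm R package mentioned below). As a rough, self-contained stand-in (an illustration only, not Imhof's method), the distribution function of a positively weighted sum of independent χ²_1 variables can be approximated by Monte Carlo; the function name is hypothetical.

    import numpy as np

    def weighted_chisq_cdf_mc(x, betas, n_draws=200_000, seed=0):
        """Monte Carlo approximation of P(sum_k beta_k * Z_k^2 <= x), with independent Z_k ~ N(0, 1)."""
        rng = np.random.default_rng(seed)
        betas = np.asarray(betas, dtype=float)
        q = rng.standard_normal((n_draws, len(betas))) ** 2 @ betas   # draws of the quadratic form
        return float(np.mean(q <= x))

    # Example: three weights, playing the role of the alpha_k (or nu_j) coefficients.
    print(weighted_chisq_cdf_mc(2.0, [0.5, 0.3, 0.2]))

A Monte Carlo approximation is far slower and less accurate than Imhof's numerical inversion, which is why the report relies on the latter for streaming use.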

We will often abbreviate this to F_Imhof(s²_{N,λ}). We make a point of specifying the input here because, in Section 4.3.8, we use the algorithm for the same value of s²_{N,λ} but different values of β_1, ..., β_N. Imhof's algorithm performs well on all quantiles of the distribution (Section 18.8 in [22]) in terms of speed and accuracy, especially when N gets large. There is also an R package, CompQuadForm [12], that implements Imhof's algorithm, as well as other algorithms, in C.

4.3.4 A prediction interval for s²_{N,λ}

Assuming we have a data stream of normal variables x_1, x_2, ..., such that

x_1, x_2, ..., x_τ ~ N(µ_0, σ_0²),
x_{τ+1}, ..., x_{τ+D} ~ N(µ_1, σ_1²),

we compute the forgetting factor sample variance s²_{N,λ} for each N. We compute the coefficients described above and numerically evaluate the cumulative distribution function F_Imhof(s²_{N,λ}) using Imhof's algorithm. Supposing we want a (1 − q)% prediction interval: if

F_Imhof(s²_{N,λ}) ∈ (q/2, 1 − q/2),

we say the process is in control. If, however,

F_Imhof(s²_{N,λ}) ∉ (q/2, 1 − q/2),

we say it is out of control. It is worth thinking about the difference between the two out-of-control cases,

F_Imhof(s²_{N,λ}) < q/2    and    F_Imhof(s²_{N,λ}) > 1 − q/2.

If F_Imhof(s²_{N,λ}) < q/2, then the value of s²_{N,λ} is smaller than we expect it to be; in other words, s²_{N,λ} has decreased. Analogously, F_Imhof(s²_{N,λ}) > 1 − q/2 indicates an increase in the sample variance. This distinction will be important in Section 4.3.8.

4.3.5 Covariance of the chi-squared random variables

In order for Imhof's algorithm (and the other algorithms) to work, the chi-squared random variables need to be independent. Although the random variables x_1, ..., x_N are assumed to be independent, the variables y_k = x_k − x̄_{N,λ} are not, since it is shown in Section 1.7 of Appendix 1 that, for i ≠ j,

Cov(y_i, y_j) = [u_{N,λ} − (1/w_{N,λ})(λ^{N−i} + λ^{N−j})] σ².

It can be shown, for certain values of λ, that Cov(y_i, y_j) ≠ 0. Then, since t_k = y_k / σ_k,

Cov(t_i, t_j) = (1 / (σ_i σ_j)) Cov(y_i, y_j) = (1 / (σ_i σ_j)) [u_{N,λ} − (1/w_{N,λ})(λ^{N−i} + λ^{N−j})] σ² ≠ 0,    (4.9)

which shows that the t_k are dependent, and therefore the chi-squared variables t_k² are dependent. However, there is a theorem we can use to write the weighted sum of these dependent chi-squared variables as a weighted sum of independent chi-squared variables.

4.3.6 Box's Theorem

An extension of Cochran's Theorem, stated by G. E. P. Box in [6], is the following:

Theorem. If z denotes a column vector of N random normal variables z_1, ..., z_N having expectation zero and distributed according to a multivariate normal distribution with N × N variance-covariance matrix V, and if Q = zᵀMz is any real quadratic form of rank R ≤ N, then Q is distributed like the quantity

X = Σ_{j=1}^{R} ν_j ξ_j²,

where each ξ_j² is a χ² variate distributed independently of every other, and the ν_j are the R real nonzero eigenvalues of the matrix U = VM.

This provides the option of handling dependence, albeit at some computational cost, rather than simply asserting independence for convenience. Issues related to this will be explored in the following sections.

4.3.7 Dependent sum to independent sum

We reparametrize s²_{N,λ} one more time in order to use this theorem conveniently. We define the new variables z_1, ..., z_N by

z_k = √(λ^{N−k} / v_{N,λ}) y_k,    (4.10)

so that

s²_{N,λ} = Σ_{k=1}^{N} α_k t_k² = (1 / v_{N,λ}) Σ_{k=1}^{N} λ^{N−k} σ_k² t_k² = (1 / v_{N,λ}) Σ_{k=1}^{N} λ^{N−k} y_k² = Σ_{k=1}^{N} z_k².

We note that

E[z_k] = E[ √(λ^{N−k} / v_{N,λ}) y_k ] = 0,

using Equation 4.5. Now, if we use the theorem of Section 4.3.6 above for s²_{N,λ} = Σ_{k=1}^{N} z_k², the matrix M is just the identity matrix, and so the coefficients ν_j are the eigenvalues of V, the covariance matrix of z_1, ..., z_N. For i ≠ j, the off-diagonal entries of V are

Cov(z_i, z_j) = Cov( √(λ^{N−i}/v_{N,λ}) y_i, √(λ^{N−j}/v_{N,λ}) y_j )
             = (√(λ^{N−i} λ^{N−j}) / v_{N,λ}) Cov(y_i, y_j)
             = (√(λ^{N−i} λ^{N−j}) / v_{N,λ}) [u_{N,λ} − (1/w_{N,λ})(λ^{N−i} + λ^{N−j})] σ²,

and for i = j, the diagonal entries of V are

Cov(z_i, z_i) = Var(z_i) = (λ^{N−i} / v_{N,λ}) [1 + u_{N,λ} − (2λ^{N−i} / w_{N,λ})] σ².

However, since the z_k are linearly dependent (Section 1.8 in Appendix 1), the covariance matrix V is of rank N − 1, and so we should expect to have only N − 1 nonzero eigenvalues ν_1, ..., ν_{N−1}. Since any covariance matrix is positive semi-definite (see Section 1.9 in Appendix 1), the ν_j are all positive.
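To illustrate the construction (an illustration added here, not the report's code, and built on the covariance formulas reconstructed above), the sketch below assembles the covariance matrix V of the z_k and extracts its eigenvalues ν_j; one eigenvalue should be numerically zero, and the eigenvalues should sum to E[s²_{N,λ}] = σ² since they sum to the trace of V. The function name is hypothetical.

    import numpy as np

    def z_covariance(N, lam, sigma2=1.0):
        """Covariance matrix V of the z_k, built from the weights w_{N,lam}, u_{N,lam}, v_{N,lam}."""
        w = (1 - lam ** N) / (1 - lam)
        u = (1 - lam) * (1 + lam ** N) / ((1 + lam) * (1 - lam ** N))
        v = w * (1 - u)
        p = lam ** (N - np.arange(1, N + 1))                          # p_k = lam^(N-k)
        cov_y = (u - (p[:, None] + p[None, :]) / w) * sigma2          # Cov(y_i, y_j) for i != j
        np.fill_diagonal(cov_y, (1 + u - 2 * p / w) * sigma2)         # Var(y_k)
        scale = np.sqrt(p / v)                                        # z_k = sqrt(lam^(N-k)/v) y_k
        return scale[:, None] * cov_y * scale[None, :]

    N, lam = 50, 0.9
    V = z_covariance(N, lam)
    nu = np.linalg.eigvalsh(V)                  # eigenvalues nu_j (V is symmetric, PSD)
    print((nu > 1e-10).sum(), N - 1)            # roughly N - 1 nonzero eigenvalues
    print(nu.sum(), 1.0)                        # trace(V) = E[s^2_{N,lam}] = sigma^2

This is exactly the eigendecomposition whose per-observation cost the next section argues against for streaming use.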

So, we can write s²_{N,λ} as a sum in the following three ways:

s²_{N,λ} = Σ_{k=1}^{N} α_k t_k²    (4.11)
        = Σ_{k=1}^{N} z_k²
        = Σ_{j=1}^{N−1} ν_j ξ_j².    (4.12)

In Equation 4.11 it is a weighted sum of N dependent chi-squared variables t_k² with coefficients α_k, while in Equation 4.12 it is a weighted sum of N − 1 independent chi-squared variables ξ_j² with coefficients ν_j, using the theorem of Section 4.3.6.

4.3.8 The cost in accuracy of assuming independence

It is worth questioning the benefit of taking the dependence of the chi-squared variables t_k² into account and applying Box's Theorem. In order to obtain the coefficients ν_j of the independent chi-squared variables ξ_j², we need to construct the N × N covariance matrix V, which takes a large amount of memory for large N, and find the nonzero eigenvalues of V, which is computationally expensive for large N. However, if we ignore the dependence, we can simply use the coefficients α_k, which are easily computable. If we take dependence into account, we define the cumulative distribution function F_dep as

F_dep(s²_{N,λ}) = F_Imhof(s²_{N,λ}; ν_1, ..., ν_{N−1}; 1, ..., 1),    (4.13)

where F_Imhof is the cumulative distribution function computed using Imhof's algorithm (Section 4.3.3). Note that in this case the degrees of freedom of all the chi-squared variables are 1. On the other hand, if we ignore the dependence and simply use the coefficients α_k, the cumulative distribution function F_indep is defined as

F_indep(s²_{N,λ}) = F_Imhof(s²_{N,λ}; α_1, ..., α_N; 1, ..., 1).    (4.14)

It is reasonable to expect there to be a difference between F_dep(s²_{N,λ}) and F_indep(s²_{N,λ}). We define the quantiles

q = F_dep(s²_{N,λ}),    (4.15)

and we investigate the relative difference between assuming independence and respecting dependence by plotting

[q − F_indep(F_dep^{−1}(q))] / q.    (4.16)

Specifically, Figures 4.1, 4.2 and 4.3 respectively show the value of this relative error for q = 0.995, 0.5 and 0.02 (approximately), for λ = 0.8, 0.85, 0.9, 0.95, ... We say approximately because that is the approximate quantile for a specific value of s²_{N,λ}.

These figures show that assuming independence is acceptable for the upper and middle quantiles (Figures 4.1 and 4.2, respectively), but the approximation is poor for the lower quantiles (Figure 4.3). Although not shown here, similar figures are obtained for smaller values of q. Therefore, following our discussion in Section 4.3.4, assuming independence will be acceptable if we are trying to detect an increase in the sample variance, but unacceptable if we are trying to detect a decrease. However, our concern is primarily with streaming data, where computational efficiency is paramount. Therefore, we will only attempt to detect an increase in the sample variance, and change detectors for the forgetting factor sample variance will be based on the assumption of independence, in order to avoid the need for an eigendecomposition every time a data point arrives.

Figure 4.1: [q − F_indep(F_dep^{−1}(q))]/q for quantile q ≈ 0.995, for several values of λ.

Figure 4.2: [q − F_indep(F_dep^{−1}(q))]/q for quantile q ≈ 0.5, for several values of λ.

Figure 4.3: [q − F_indep(F_dep^{−1}(q))]/q for quantile q ≈ 0.02, for several values of λ.

4.4 Discussion

After briefly outlining a way to use Chebyshev's inequality to construct a change detector for x̄_{N,λ} or s²_{N,λ} for any distribution, we then focus on constructing prediction intervals when we can assume that the pre-change distribution is normal and the mean and variance are known. For x̄_{N,λ} this is straightforward, but for s²_{N,λ} there are two possible formulations. After investigating its accuracy, we favour the simpler method for evaluating the distribution function of s²_{N,λ}, which is also faster to implement, at the expense of only being able to detect an increase in the variance.

Chapter 5 Adaptive forgetting factor

5.1 Introduction

The forgetting factor change detection algorithms of the previous chapter still place responsibility on the user to choose a value for λ ∈ [0, 1], and while one value of λ might be appropriate for a given data stream, it almost certainly will not be the optimal value for all other data streams. In other words, the optimal value of λ for our change detection algorithm could depend on the data stream we are monitoring. Furthermore, when a change occurs and we restart the algorithm, we will be monitoring a different stream (since we detected a change), and the optimal value of λ is likely to be different for this new stream. We therefore do the following: given a data stream, one approach would be to use an initial sequence of observations x_1, x_2, ..., x_B, which we assume does not contain a changepoint, in order to choose a value of λ to be used for future monitoring of that stream. We call this initial period a burn-in period (see Section 2.4). We then choose a cost function that characterises a property we would like to minimise (e.g. deviation from the mean, or variance), and find a sequence of forgetting factors λ_1, λ_2, ..., λ_t, ..., λ_B, one at each time t, that minimises our cost function using the observations x_1, ..., x_t. The last step involves using the sequence λ_1, ..., λ_B to choose a fixed λ ∈ [0, 1]. This procedure moves the user's responsibility from choosing a parameter in the range [0, 1] to choosing which statistic of the data to minimise. Below we describe our method for initialising and updating an adaptive forgetting factor, based on the formulation of the forgetting factor mean and sample variance described in Chapter 3. We introduce a class of cost functions, and describe how we can find their minima. Following the development of these methods we shall describe the results of an extensive simulation study involving change detection with forgetting factor estimation, and then compare performance amongst the algorithms.

5.2 Adaptive forgetting factors

In [1], adaptive forgetting factors (called self-tuning regulators) are introduced in order to provide a method to control systems with unknown but constant parameters, with the main


More information

So let us begin our quest to find the holy grail of real analysis.

So let us begin our quest to find the holy grail of real analysis. 1 Section 5.2 The Complete Ordered Field: Purpose of Section We present an axiomatic description of the real numbers as a complete ordered field. The axioms which describe the arithmetic of the real numbers

More information

Numerical Methods for Option Pricing

Numerical Methods for Option Pricing Chapter 9 Numerical Methods for Option Pricing Equation (8.26) provides a way to evaluate option prices. For some simple options, such as the European call and put options, one can integrate (8.26) directly

More information

5.1 Radical Notation and Rational Exponents

5.1 Radical Notation and Rational Exponents Section 5.1 Radical Notation and Rational Exponents 1 5.1 Radical Notation and Rational Exponents We now review how exponents can be used to describe not only powers (such as 5 2 and 2 3 ), but also roots

More information

Mathematical Induction

Mathematical Induction Mathematical Induction (Handout March 8, 01) The Principle of Mathematical Induction provides a means to prove infinitely many statements all at once The principle is logical rather than strictly mathematical,

More information

DEALING WITH THE DATA An important assumption underlying statistical quality control is that their interpretation is based on normal distribution of t

DEALING WITH THE DATA An important assumption underlying statistical quality control is that their interpretation is based on normal distribution of t APPLICATION OF UNIVARIATE AND MULTIVARIATE PROCESS CONTROL PROCEDURES IN INDUSTRY Mali Abdollahian * H. Abachi + and S. Nahavandi ++ * Department of Statistics and Operations Research RMIT University,

More information

Nonparametric Methods for Online Changepoint Detection

Nonparametric Methods for Online Changepoint Detection Lancaster University STOR601: Research Topic II Nonparametric Methods for Online Changepoint Detection Author: Paul Sharkey Supervisor: Rebecca Killick May 18, 2014 Contents 1 Introduction 1 1.1 Changepoint

More information

Regions in a circle. 7 points 57 regions

Regions in a circle. 7 points 57 regions Regions in a circle 1 point 1 region points regions 3 points 4 regions 4 points 8 regions 5 points 16 regions The question is, what is the next picture? How many regions will 6 points give? There's an

More information

Covariance and Correlation

Covariance and Correlation Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such

More information

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

More information

1 Another method of estimation: least squares

1 Another method of estimation: least squares 1 Another method of estimation: least squares erm: -estim.tex, Dec8, 009: 6 p.m. (draft - typos/writos likely exist) Corrections, comments, suggestions welcome. 1.1 Least squares in general Assume Y i

More information

3. Mathematical Induction

3. Mathematical Induction 3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)

More information

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009 Notes on Algebra These notes contain as little theory as possible, and most results are stated without proof. Any introductory

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Confidence Intervals for One Standard Deviation Using Standard Deviation

Confidence Intervals for One Standard Deviation Using Standard Deviation Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

More information

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

The Characteristic Polynomial

The Characteristic Polynomial Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

A Hybrid Approach to Efficient Detection of Distributed Denial-of-Service Attacks

A Hybrid Approach to Efficient Detection of Distributed Denial-of-Service Attacks Technical Report, June 2008 A Hybrid Approach to Efficient Detection of Distributed Denial-of-Service Attacks Christos Papadopoulos Department of Computer Science Colorado State University 1873 Campus

More information

State Space Time Series Analysis

State Space Time Series Analysis State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

More information

Increasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.

Increasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all. 1. Differentiation The first derivative of a function measures by how much changes in reaction to an infinitesimal shift in its argument. The largest the derivative (in absolute value), the faster is evolving.

More information

The Exponential Distribution

The Exponential Distribution 21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation: CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm

More information

Understanding the Impact of Weights Constraints in Portfolio Theory

Understanding the Impact of Weights Constraints in Portfolio Theory Understanding the Impact of Weights Constraints in Portfolio Theory Thierry Roncalli Research & Development Lyxor Asset Management, Paris thierry.roncalli@lyxor.com January 2010 Abstract In this article,

More information

Financial TIme Series Analysis: Part II

Financial TIme Series Analysis: Part II Department of Mathematics and Statistics, University of Vaasa, Finland January 29 February 13, 2015 Feb 14, 2015 1 Univariate linear stochastic models: further topics Unobserved component model Signal

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.

a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given

More information

6 Scalar, Stochastic, Discrete Dynamic Systems

6 Scalar, Stochastic, Discrete Dynamic Systems 47 6 Scalar, Stochastic, Discrete Dynamic Systems Consider modeling a population of sand-hill cranes in year n by the first-order, deterministic recurrence equation y(n + 1) = Ry(n) where R = 1 + r = 1

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

1 Review of Least Squares Solutions to Overdetermined Systems

1 Review of Least Squares Solutions to Overdetermined Systems cs4: introduction to numerical analysis /9/0 Lecture 7: Rectangular Systems and Numerical Integration Instructor: Professor Amos Ron Scribes: Mark Cowlishaw, Nathanael Fillmore Review of Least Squares

More information

What is Linear Programming?

What is Linear Programming? Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to

More information

Multivariate Analysis (Slides 13)

Multivariate Analysis (Slides 13) Multivariate Analysis (Slides 13) The final topic we consider is Factor Analysis. A Factor Analysis is a mathematical approach for attempting to explain the correlation between a large set of variables

More information

Real Roots of Univariate Polynomials with Real Coefficients

Real Roots of Univariate Polynomials with Real Coefficients Real Roots of Univariate Polynomials with Real Coefficients mostly written by Christina Hewitt March 22, 2012 1 Introduction Polynomial equations are used throughout mathematics. When solving polynomials

More information

Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS

Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS Sensitivity Analysis 3 We have already been introduced to sensitivity analysis in Chapter via the geometry of a simple example. We saw that the values of the decision variables and those of the slack and

More information

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

Metric Spaces. Chapter 7. 7.1. Metrics

Metric Spaces. Chapter 7. 7.1. Metrics Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Package cpm. July 28, 2015

Package cpm. July 28, 2015 Package cpm July 28, 2015 Title Sequential and Batch Change Detection Using Parametric and Nonparametric Methods Version 2.2 Date 2015-07-09 Depends R (>= 2.15.0), methods Author Gordon J. Ross Maintainer

More information

Linear Algebra I. Ronald van Luijk, 2012

Linear Algebra I. Ronald van Luijk, 2012 Linear Algebra I Ronald van Luijk, 2012 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents 1. Vector spaces 3 1.1. Examples 3 1.2. Fields 4 1.3. The field of complex numbers. 6 1.4.

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Solving Linear Systems, Continued and The Inverse of a Matrix

Solving Linear Systems, Continued and The Inverse of a Matrix , Continued and The of a Matrix Calculus III Summer 2013, Session II Monday, July 15, 2013 Agenda 1. The rank of a matrix 2. The inverse of a square matrix Gaussian Gaussian solves a linear system by reducing

More information

Notes on Symmetric Matrices

Notes on Symmetric Matrices CPSC 536N: Randomized Algorithms 2011-12 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.

More information

Creating, Solving, and Graphing Systems of Linear Equations and Linear Inequalities

Creating, Solving, and Graphing Systems of Linear Equations and Linear Inequalities Algebra 1, Quarter 2, Unit 2.1 Creating, Solving, and Graphing Systems of Linear Equations and Linear Inequalities Overview Number of instructional days: 15 (1 day = 45 60 minutes) Content to be learned

More information

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013 Notes on Orthogonal and Symmetric Matrices MENU, Winter 201 These notes summarize the main properties and uses of orthogonal and symmetric matrices. We covered quite a bit of material regarding these topics,

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

Means, standard deviations and. and standard errors

Means, standard deviations and. and standard errors CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

More information

An Introduction to Machine Learning

An Introduction to Machine Learning An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Solution to Homework 2

Solution to Homework 2 Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

More information

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem Time on my hands: Coin tosses. Problem Formulation: Suppose that I have

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Analysis of a Production/Inventory System with Multiple Retailers

Analysis of a Production/Inventory System with Multiple Retailers Analysis of a Production/Inventory System with Multiple Retailers Ann M. Noblesse 1, Robert N. Boute 1,2, Marc R. Lambrecht 1, Benny Van Houdt 3 1 Research Center for Operations Management, University

More information

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012 Binary numbers The reason humans represent numbers using decimal (the ten digits from 0,1,... 9) is that we have ten fingers. There is no other reason than that. There is nothing special otherwise about

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Introduction. Appendix D Mathematical Induction D1

Introduction. Appendix D Mathematical Induction D1 Appendix D Mathematical Induction D D Mathematical Induction Use mathematical induction to prove a formula. Find a sum of powers of integers. Find a formula for a finite sum. Use finite differences to

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

An introduction to Value-at-Risk Learning Curve September 2003

An introduction to Value-at-Risk Learning Curve September 2003 An introduction to Value-at-Risk Learning Curve September 2003 Value-at-Risk The introduction of Value-at-Risk (VaR) as an accepted methodology for quantifying market risk is part of the evolution of risk

More information