On Correlating Performance Metrics


Yiping Ding and Chris Thornley
BMC Software, Inc.

Kenneth Newman
BMC Software, Inc.
University of Massachusetts, Boston

Performance metrics and their measurements are the basis of identifying and addressing computer performance related issues, from monitoring to capacity planning. IT professionals and performance analysts often face the questions of what metrics to look at, whether those metrics are related, and how to group them in meaningful ways for further analysis, e.g., workload characterization. In this paper we present a number of techniques for correlating performance metrics, which may be sampled at different rates, over different lengths and overlapping intervals, and with different response times to a common system event. We also discuss how to assess the reliability of the correlations we compute and how to efficiently find relationships of interest.

1. Introduction

For any given computer system it is likely that many different performance metrics are available. IT staff and analysts often face the question of which to consider, whether they are related, which ones to monitor, and how to group them in meaningful ways for further analysis. For instance:

- You might want to know which disks are being used by a particular software package or application;
- You might want to figure out which resources are used by which http requests;
- You might want to know which remote node is sending packets to which remote node;
- You might want to investigate which performance metric values are affecting the application response time and causing it to exceed its threshold;
- You might want to identify the related resources used on different nodes to compute the end-to-end response time for an application;
- You might want to find out the time delays among the monitored values of different performance metrics for a given system event;
- You might need to know which processes are related, so you can properly describe each workload when doing workload characterization.

The list can go on and on. One common characteristic of the above is that you want to link one object or phenomenon to another. While you may be able to answer some of these questions with probes or other specialized hardware or software, the power of the technique discussed here is that it has the potential to answer all of these questions, and many others like them, with relatively little pre-knowledge. The basic idea, searching for events with high correlation coefficients, was described in another paper [1]. In this paper the emphasis is on addressing the problems that arise when correlating real-world data and how they can be handled.

Many of the problems that occur during analysis happen because data has not been reliably produced. For example, performance metrics are often collected at different rates; a preprocessor is needed in this case before correlation can be performed. (This is discussed in Section 3.) More seriously, some metrics may not be collected continuously, due to hardware and software interruptions. Another complication is that some metrics have large variations and some only minor ones; some may even be constant for a whole time/measurement interval. These differing characteristics create interesting challenges for identifying the relationships among performance metrics. We will deal with these issues in this paper. Let us first start with some definitions that will help us to quantify the discussion.

1.1 Definitions

The following concepts are used throughout this paper. Let X and Y be any two random variables with finite means (averages) and positive variances σ_x² and σ_y². A typical example of the random variables that we will be comparing is the CPU consumption of two processes. Another typical comparison is between process page fault counts and disk activity.

Definition 1. Variance: Let X be a random variable with a distribution F(x) := Pr(X ≤ x). The variance of the distribution is defined by

    σ_x² := E([X − E(X)]²) = E(X²) − [E(X)]²,

where E(X) is the expectation (mean) of X. The variance is a measure of the spread or scattering of a distribution.

Definition 2. Covariance: Assume that X and Y have finite variances. Their covariance is defined as

    COV(X, Y) := E(XY) − E(X)E(Y).

From the above definition we know that, if X and Y are independent, i.e., E(XY) = E(X)E(Y), then COV(X, Y) = 0. (The converse is not true.) Since covariance is not a normalized number, when it is not equal to 0 we do not know how closely the two random variables are related. If we normalize the two variables by X* = (X − E(X))/σ_x and Y* = (Y − E(Y))/σ_y, then X* and Y* have mean 0 and variance 1, which leads to a more informative relationship indicator:

Definition 3. Correlation Coefficient: The correlation coefficient of X and Y is defined by

    C(X, Y) = COV(X*, Y*) = COV(X, Y) / (σ_x σ_y).

In other words, the correlation coefficient of two random variables is the covariance of the two normalized variables. Note that |C(X, Y)| ≤ 1 and that C(X, Y) = ±1 if and only if Y and X are related linearly, i.e., Y = aX + b. Note further that the correlation coefficient is independent of the origins and units of measurement. In other words, for any constants a, b, c, d, with a > 0 and c > 0, we have C(aX + b, cY + d) = C(X, Y). This property allows us to manipulate the scale of the data for a better visual representation of the relationship among performance metrics. One way to scale the measured data is to normalize it, so that the values are between 0 and 1.
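As a concrete illustration of Definition 3 and the scaling invariance just described, here is a minimal sketch in Python; the function name correlation and the sample values are ours, not from the paper:

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length samples,
    computed directly from Definitions 1-3:
    C(X, Y) = COV(X, Y) / (sigma_x * sigma_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    if var_x <= 0 or var_y <= 0:
        # a constant metric has zero variance; its correlation is undefined
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

x = [3.0, 7.0, 2.0, 9.0, 4.0]
y = [30.0, 80.0, 25.0, 95.0, 40.0]
print(correlation(x, y))                       # about 0.99
print(correlation([2 * v + 5 for v in x], y))  # unchanged: C(aX + b, Y) = C(X, Y) for a > 0
```

Min-max normalization, defined next in equation (1), is one such linear map with a > 0.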

For instance, let

    a = 1 / (X_max − X_min)  and  b = −X_min / (X_max − X_min),        (1)

where X_max (X_min) is the largest (smallest) value of the data. Then the linear transformation X' = aX + b will normalize the data, but not change the correlation coefficient. In other words, C(X', Y) = C(X, Y). Y can also be changed similarly. Please note that a and c have to be positive. Otherwise the manipulation will cause the curve to change its direction (first derivative), which will cause the correlation coefficient to change sign. (Though if both a and c are negative, the changes will cancel out.) Intuitively, high positive correlation of two random variables implies that the peaks (and valleys) of their values tend to occur together. A linear transformation will not change this visual property. We will see in a later section that the correlation coefficient is strongly affected by the trending or shape, i.e., the second derivative, of the performance curves.

The basic technique here for finding relationships among different performance metrics is to find the correlation coefficient between pairs. When the correlation is high (close to 1), we can assume that the two metrics represent resources that were working on the same task. Because correlation coefficients are invariant under linear transformations, the units of the metrics do not matter (that is, changing a metric from, say, bytes to MBytes will not change its correlation coefficient with another metric). However, correlations are sensitive to the time phase of the data. For this reason, the measurement interval should not be too small: there might be a delay between the various activities that complete a transaction. On the other hand, the measurement or sample interval should not be too large either. A very large sample interval not only reduces the potential number of samples for a given time interval, but also could link otherwise unrelated events. As a rule of thumb, the interval should be several times greater than the average delay at the different resources/devices.

1.2 A Model for Presenting Performance Data

Before we address the issues related to performance data correlation, let us first define a time serial data stream D:

    D = {(d_1, t_1), (d_2, t_2), ..., (d_n, t_n)}.

Generally, in performance data collection or measurement, it is true that t_{i+1} − t_i = t_{i+2} − t_{i+1}, for i = 1, 2, ..., n − 2. In other words, data or events are sampled at regular intervals. Unless stated otherwise we use this assumption in this paper. Now let us consider another data set, D':

    D' = {(d'_1, t'_1), (d'_2, t'_2), ..., (d'_{n'}, t'_{n'})}.

In general, t'_i may not equal t_i. When t'_i is a subsequence of t_i (or vice versa) we will say the two sequences are in sync. When the sequences are not in sync, it is still possible to compute a correlation, but the results are less reliable. In the following discussion (unless stated otherwise) we assume that performance data is collected in sync.

2. Not All Metrics are the Same

Consider a data set {(d_1, t_1), (d_2, t_2), ..., (d_n, t_n)}. The first complexity we need to deal with is that the {d_i} can have different meanings for different performance metrics.

They might represent:

1. the current value of a fluctuating metric (e.g., memory utilization); or
2. the average value of a fluctuating metric since time t_{i−1}; or
3. the current value of a cumulative metric (e.g., disk reads since system start-up); or
4. an event count since t_{i−1}.

When computing correlation coefficients it is best if:

A. the data represents events that occurred during the same period of time, i.e., is in sync;
B. the data is of the same type or has been changed to the same type.

For instance, if you try to compare an average value with an instantaneous value, you will not be getting the sharpest possible correlation, since the average represents work that occurred over a longer period of time. If both values are averages, then the correlation coefficient is more reliable when the averages are over the same number of samples.

Another issue comes with cumulative metrics (which is how the majority of metrics are presented). Such metrics, of course, are always increasing (except when they roll over). Further, they represent events that happened before the current interval as well as events in the current interval. Because of this, correlating such metrics directly will underplay their variations. This means that, when computing correlations, one should transform such metrics into a sequence of differences and not use the metric itself. The correlation coefficients for the sequence of differences and for the original cumulative metrics are often quite different. Here is a simple example: consider the two cumulative sequences shown in Figure 1; their correlation coefficient (CC) is 0.998. The correlation coefficient of the sequences of their differences, however, is −0.49 (see Figure 2).

Figure 1. Two cumulative sequences appear to be highly correlated, with correlation coefficient CC = 0.998.

Figure 2. Although the two cumulative sequences in Figure 1 appear to be highly correlated, the sequences of their differences are not. In fact, the CC = −0.49.

It is appropriate that these latter two sequences have a negative correlation, since more often than not, one tends to increase its activity whenever the other decreases.
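A short sketch of this transformation, reusing a Pearson correlation routine (Python's statistics.correlation, available in Python 3.10+, equivalent to the sketch in Section 1.1); the counter values below are made up for illustration:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def differences(cumulative):
    """Convert a cumulative counter into per-interval increments."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

# Two ever-increasing counters look almost perfectly correlated...
c1 = [10, 25, 30, 55, 60, 90, 95, 130]
c2 = [5, 10, 30, 35, 60, 65, 95, 100]
print(correlation(c1, c2))                            # close to 1
# ...but the per-interval activity they record is not.
print(correlation(differences(c1), differences(c2)))  # negative here: the two alternate
```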

It is interesting to note (though not directly relevant to this paper) that although two cumulative metrics are more likely to produce positive correlation, this is not always true. In fact two cumulative metrics could have close to zero correlation. For example, the two curves in Figure 3 have a correlation coefficient of 0.07.

Figure 3. Two cumulative sequences with correlation coefficient CC = 0.07.

In theory, two cumulative data sets could have a correlation coefficient anywhere from 0 to 1 (two straight lines, for instance). Figure 4 shows two cumulative series with correlation coefficient CC = 0.55. We could expect that, as the two curves' knees become closer, their correlation coefficient should approach 1.

As we have mentioned before, two positively correlated cumulative performance data sets may not, in reality, be related. For each cumulative data set, if we take the difference of consecutive samples and form a new sequence based on the differences, we will have a new data set in which each data sample captures the change since the previous sample. It is this new data set that should be used for computing the correlation coefficient. Correlating cumulative metrics directly would not give useful information.

Figure 4. Two cumulative series with correlation coefficient CC = 0.55.

Figure 5. The two data series derived by taking differences of consecutive cumulative samples. The CC for the two series is −0.77, which is very different from the CC for the two original cumulative data sets, which have CC = 0.55 (see Figure 4).

3. Correlating Metrics with Different Sampling Rates

Performance metrics are often sampled at different rates. Experience shows that even collectors programmed to sample at the same rate will eventually drift or skip for any number of reasons. When correlating two metrics with different sampling rates, one has to find a common interval in order to correlate them. First, we will assume that one interval is a multiple of the other. Let the two data sets be D_1 and D_2:

    D_1 = {[(d_1, t_1), (d_2, t_2), ..., (d_n, t_n)],
           [(d_{n+1}, t_{n+1}), (d_{n+2}, t_{n+2}), ..., (d_{2n}, t_{2n})],
           ...,
           [(d_{(m−1)n+1}, t_{(m−1)n+1}), ..., (d_{mn}, t_{mn})]}

and

    D_2 = {(d'_1, t'_1), (d'_2, t'_2), ..., (d'_m, t'_m)},

where for each data sample of D_2 there are n data samples in D_1. We can only correlate data that represents the same period of time. So before we compute the correlation, the first sequence above must be modified so that the n data samples that correspond to a particular data sample in the second sequence are merged or aggregated. This generally means that either one difference is computed or a sum (or average) is taken; it depends on the nature of the first metric. Different ways of handling the additional information gained through the extra n samples in data set D_1 for each sample in data set D_2 will usually yield different correlation coefficient values. Since data set D_2 has coarse sample intervals, the nature of the metric (cumulative, difference, or average) will determine whether the n data samples in D_1 that correspond to a particular data sample in D_2 are merged or aggregated. For instance, if both D_1 and D_2 are cumulative, then we need to take the differences between samples for both data sets. In this case, d'_2 − d'_1 corresponds to d_{n+1} − d_1, d'_3 − d'_2 corresponds to d_{2n+1} − d_{n+1}, and so on. For the purpose of computing their correlation coefficient, the values of the skipped samples in D_1 are irrelevant.
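A minimal sketch of this merging step, assuming the two metrics have already been converted to per-interval values and that D_1 is sampled n times faster than D_2 (function name and sample values are ours):

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def aggregate(fine, n, how="sum"):
    """Merge every n consecutive fine-grained samples into one value so the
    result lines up with a series sampled n times more coarsely.
    Use "sum" for per-interval counts/differences, "mean" for averaged metrics."""
    usable = len(fine) - len(fine) % n          # drop a trailing partial group
    groups = [fine[i:i + n] for i in range(0, usable, n)]
    return [sum(g) if how == "sum" else sum(g) / n for g in groups]

# d1 sampled three times as often as d2 (e.g., 10-second vs. 30-second intervals)
d1 = [2, 3, 1, 7, 8, 6, 1, 0, 2]   # per-interval counts at the fine rate
d2 = [6, 22, 3]                    # per-interval counts at the coarse rate
print(correlation(aggregate(d1, 3), d2))   # close to 1 for this made-up data
```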

4. Correlating Data with Holes

It would be simplest if the data we wanted to correlate had been collected continuously. But, quite frequently, there are interruptions in data collection, for either short periods (a few samples) or long periods (perhaps hours or days). In the cases where some sections of data (for either metric) are not measured/collected for more than one sample interval, those (time) sections for both should be removed from the correlation computation. We define a period that is not sampled for more than one sample interval as a hole in the measurement data set (Figure 6).

Figure 6. Two data series, each with one hole in it. The duration of each hole is represented by a value of 0.

The question now becomes: what is the logical way of computing the correlation coefficient? There are several ways that the correlation coefficient can be computed. One method is to remove samples from the data sets when either metric contains a hole and correlate the remaining points. Another way is to compute the correlation coefficient for each continuous section and then compute the weighted average of those coefficients. While it may not be obvious, the first method will give more accurate results, as averaging correlation coefficients over pieces of intervals will give misleading answers. As a simple example (Figure 7), it could be that two metrics have a positive linear relationship on each of two subintervals (and thus would have a correlation coefficient of 1 on each subinterval), but that the parameters of the linear relationship differ on the two subintervals, which means that overall the relationship is not linear and the correlation coefficient is less than 1. In the example presented in Figure 7, the overall correlation coefficient for the two data sets is close to zero.

Figure 7. Although there are two subintervals in which the two data sets are linearly correlated, i.e., with correlation coefficient CC = 1, overall the two data series are not correlated: their correlation coefficient is close to zero. Therefore, using the correlation coefficients of the subintervals to compute the overall correlation coefficient through, say, a weighted average is not accurate.

Figure 8 shows a scenario, based on the example in Figure 6, where holes are removed and continuous sections are connected for analysis. Note that, as we remove holes, the other metric's data intervals corresponding to the holes are removed as well. In this example, the two holes were artificially introduced. Before they were introduced, the correlation coefficient for the two data sets was 0.53. After the holes are removed, the two data sets presented in Figure 8 have correlation coefficient 0.5, which is close to the original one.

Figure 8. The two data series after removing the holes that existed in Figure 6. For the time intervals where either hole exists, samples for both data sets are removed.
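The first method is straightforward to implement; a sketch follows, in which representing unmeasured intervals as None is our convention, not the paper's:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def drop_holes(x, y):
    """Keep only the intervals in which both series were actually measured.
    Unmeasured intervals (holes) are marked here as None."""
    kept = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in kept], [b for _, b in kept]

x = [0.2, 0.4, None, None, 0.5, 0.7, 0.6]
y = [0.1, 0.3, 0.2, 0.4, None, 0.8, 0.5]
xs, ys = drop_holes(x, y)
print(correlation(xs, ys))   # computed over the 4 intervals both series cover
```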

Note that it is not generally useful to interpolate or otherwise attempt to "fill" a hole. In the first place, while it is easier to use data without holes, a hole does not have any impact on the reliability of the calculation of the correlation coefficient. Secondly, interpolation will tend to give more weight to the values bordering the holes: an "advantage" that cannot be justified.

5. Correlating Events not Occurring at the Same Time

Software execution is a chain of events. Even with parallel processing on a multiprocessor machine, some events happen before others. That is, correlated events may well happen with a delay. We could identify those potentially correlated events by systematically shifting the correlation interval. In other words, we could correlate the values of one metric with those of another, where the values of the second metric were collected at a later time than those of the first. When the correlation coefficient goes up after a shift, it is an indication that the two metrics support the same activity and that one runs with a delay relative to the other. The amount of the delay can be estimated by finding what shift gives the maximal correlation coefficient. Note that, in general, the spill interval will be much larger than the delay, so the scenario described here will not happen very often. A spill interval is defined as one or more (usually more) consecutive sample intervals for which statistical aggregations, such as averages, are often computed for the samples.

Note also that, since the length of the delay is affected by the system performance, i.e., how busy the system is, it is difficult to identify a constant delay. However, if one does find the amount of delay by shifting correlation intervals, the delay information could be used for many other performance management related activities, such as performance prediction for certain metrics.

One problem in looking for delays by shifting is that when you are looking at a large number of metrics, it is hard to know which ones to test and how much to shift. Trying everything is too expensive. In order to discover metrics that are likely to be related with a delay, we decided to combine intervals (that is, make two or more neighboring intervals into one by adding the utilizations of those intervals) and see if the correlation between the two metrics goes up significantly. If it does, then there is a reasonable chance that the two metrics are related via a delay. Note that under ordinary circumstances the correlation will go up some when fewer intervals are used. So to suspect that there is a delayed correlation, one must see a significant jump in the correlation coefficient. Of course, this must be confirmed directly, i.e., by correlating the metrics after a shift.

For instance, we collected data at short, fixed-length sample intervals for about half an hour and found that the System metric Exception Dispatches and the CPU utilization of a process called LSASS had a correlation coefficient of −0.45, i.e., it appeared that these two metrics were not correlated (Figure 9).

Figure 9. The metric Exception Dispatches and the CPU utilization of the process LSASS had a correlation coefficient of −0.45. Data is sampled at the original (shorter) intervals.
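A sketch of the shift-and-correlate idea described above (names and data are ours; the data is constructed so that the second metric reacts one interval after the first):

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def shifted_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a high value at some lag > 0 suggests
    that y reacts to x with a delay of `lag` sample intervals."""
    return correlation(x[:len(x) - lag], y[lag:]) if lag else correlation(x, y)

def best_lag(x, y, max_lag):
    """Try every shift from 0 to max_lag and return (lag, CC) with the highest CC."""
    return max(((lag, shifted_correlation(x, y, lag)) for lag in range(max_lag + 1)),
               key=lambda pair: pair[1])

# y roughly echoes x one interval later, so the unshifted CC is low but lag 1 is high
x = [1, 9, 2, 8, 1, 7, 3, 9, 2]
y = [5, 2, 8, 3, 7, 2, 6, 4, 8]
print(best_lag(x, y, max_lag=3))   # expected: lag 1 with a CC close to 1
```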

But when we converted the data into longer intervals (by combining neighboring samples), we found that the correlation coefficient jumped to 0.94 (Figure 10). We confirmed that there was a delayed correlation by going back to the original data and matching the process metric with the Exception Dispatches numbers that were recorded one interval later. The correlation coefficient was 0.97. Now if we go back to Figure 9 and look at the activities of the two metrics, we can clearly see that, more or less, the Exception Dispatches numbers react to LSASS process activity one interval later.

Figure 10. When the data is sampled at the longer intervals, the metric Exception Dispatches and the CPU utilization of the process LSASS have a correlation coefficient (CC) of 0.94. Here we have normalized the data to [0, 1]. (Note that normalization does not change the CC.)

Another scenario to consider is that two metrics could at times correlate with a delay and at other times correlate without a delay or with a different delay. Which happens might depend on how busy the machine is. This means that one has to be cautious when using data taken at a high sampling rate. Merging the data into effectively a lower sampling rate might be an appropriate standard procedure. This should allow correlated metrics with small varying delays to be identified effectively.

6. Correlation Quality

Let us define the ideal correlation coefficient between two metrics as the limit of their correlation coefficient when we compute it with an unlimited number of sample points for a given time interval. Of course, we cannot expect to find precisely the ideal correlation coefficient. We discuss here the relationship between the accuracy of computing the correlation coefficient (how close we are to the ideal correlation coefficient) and the number of samples used to compute it. We look at this in two ways:

1. How many samples do you need to use to ensure that the computed correlation coefficient is within a predetermined tolerance of the ideal correlation coefficient 95% of the time?
2. If we use N samples to compute the correlation coefficient, what is the 95% confidence interval?
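Question 2 can be answered directly with Fisher's Z transformation, which the paper uses again in Section 6.2; a minimal sketch (the function name is ours):

```python
import math

def cc_confidence_interval(r, n, z=1.96):
    """Approximate 95% confidence interval for a correlation coefficient r
    computed from n samples, via Fisher's Z transformation (two-sided, z = 1.96)."""
    Z = 0.5 * math.log((1 + r) / (1 - r))
    sigma = 1 / math.sqrt(n - 3)
    to_r = lambda v: (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)
    return to_r(Z - z * sigma), to_r(Z + z * sigma)

print(cc_confidence_interval(0.8, 100))   # the interval around 0.8 narrows as n grows
print(cc_confidence_interval(0.8, 400))
```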

Generally, for our purposes, such as the performance analysis activities listed in Section 1, it is not essential that the correlation coefficient be computed very accurately. However, it is useful to know that it is reasonably correct. The reason the second question is of special interest is described in Section 6.2. Note that one can never be absolutely sure that a computed correlation coefficient has any given accuracy. All we can say is that there is a certain probability of having a particular accuracy.

6.1 How Many Samples to Use

In general, based on the central limit theorem [2], the accuracy grows with the square root of the number of samples. That is, if you use 4 times as many samples, the accuracy is doubled. For instance, if we compute a correlation coefficient of 0.8 from a given number of samples, using 4 times as many samples roughly halves the width of the 95% confidence interval, and 16 times as many samples halves it again. As high accuracy is not essential for our purposes, generally 50-100 samples will give an acceptable result.

The following example should give some sense of how increasing the number of samples improves the accuracy of the computation. We computed sets of estimated correlation coefficients as follows: we started with a long data set. Then, for a large number of iterations, we chose a small number of samples at random and computed an estimated correlation coefficient from each random subset. We repeated the experiment for three progressively larger subset sizes. Finally, we put the coefficients into histogram buckets of size 0.05 and drew the graphs in Figure 11. As you can see, the graphs get sharper as we increase the number of samples. That is, the standard deviation decreases, which means our confidence in any one computation increases.

With the larger subset sizes, as we can see in Figure 11, the graph of correlation coefficients becomes normal-like and therefore starts to become fairly reliable. For instance, if the correlation threshold had been chosen as 0.55, then the graph implies that data whose ideal correlation coefficient is around 0.85 would almost certainly be identified as correlated. But we can infer that, as the ideal correlation coefficient comes closer to the correlation threshold, there is an increasing chance that we will erroneously compute the correlation coefficient as less than the threshold and thus mistakenly conclude that the two metrics' correlation coefficient is below the threshold. However, there is a significant degree of uncertainty in choosing correlation thresholds, and an error of this type is equivalent to choosing a slightly different threshold.

Figure 11. The distribution of the computed correlation coefficient for each of the four subset sizes, with each subset chosen at random many times.

The example with the smallest subset size is especially interesting. With so few samples the graph is very far from normal, as it is more likely than with larger subsets that the correlation coefficient will be near one. This is because, with a small number of samples, it is more likely that the chosen samples are linearly (or nearly linearly) related. The lesson is that so few samples are much too few to give reliable results.

Note that these computations were done with randomly chosen samples. It is logical to assume that if the samples were chosen more cleverly, then the correlation coefficient that we compute would be more accurate. However, it would be difficult to do this in a way that is more efficient than just taking more samples.
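The experiment is easy to reproduce on synthetic data; a sketch (the function name and the synthetic series are ours), again reusing a Pearson correlation routine:

```python
import random
from statistics import correlation, mean, pstdev  # correlation needs Python 3.10+

def cc_spread(x, y, subset_size, trials=1000):
    """Estimate how much the computed CC varies when only `subset_size`
    randomly chosen samples are used, as in the Figure 11 experiment."""
    estimates = []
    for _ in range(trials):
        chosen = sorted(random.sample(range(len(x)), subset_size))
        estimates.append(correlation([x[i] for i in chosen],
                                     [y[i] for i in chosen]))
    return mean(estimates), pstdev(estimates)

# Synthetic correlated data: the spread of the estimates should roughly halve
# each time the subset size is quadrupled (the square-root rule above).
random.seed(1)
x = [random.gauss(0, 1) for _ in range(1600)]
y = [v + random.gauss(0, 1) for v in x]
for n in (20, 80, 320):
    print(n, cc_spread(x, y, n))
```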

6.2 When There are Too Many Metrics

As we have mentioned in Section 1, IT staff and analysts often face many, sometimes too many, performance metrics. Identifying the relationships among the metrics is a daunting task, even for computers (to finish within a reasonable amount of time). When there are so many metrics, it is impractical to compute the correlation coefficient between all pairs of metrics with the desired accuracy. However, since we only care when the correlation coefficient is above a predefined correlation threshold, we can do a preliminary calculation to determine which pairs appear to be poorly correlated. If we can eliminate many such pairs with a quick computation (that does not eliminate pairs of interest), we will have greatly sped up our calculations.

The basic idea is to compute the correlation coefficient initially with substantially fewer samples than we intend to use. While this will give us a less reliable value for the correlation coefficient, as long as we adjust for the greater margin of error, it should be reliable enough to eliminate many metric pairs that are not of interest, i.e., unlikely to have an ideal correlation coefficient above the correlation threshold. On the other hand, once we decide a pair might be of interest, we can compute the correlation coefficient with more samples to get a more reliable value.

Let us assume that we have chosen a correlation threshold of T to determine whether or not a significant correlation exists between two metrics. That is, if the correlation coefficient is below T, we will assume that there is no relationship between the metrics and, if it is above T, we will assume that there is a relationship. We wish to use N samples (with N being small) to estimate the correlation coefficient. What lesser threshold should we use to determine whether or not it is possible that the ideal correlation coefficient is below T? If it might be greater than T, we will then make a more accurate computation on a greater number of samples. But, if we can be reasonably sure that it is less than T, we can proceed to other pairs of metrics, and will have saved a considerable amount of computation.

Let us require that, to discard a pair, we have to be 95% certain that its ideal correlation coefficient is less than T. Then, if we assume that the sequence is normally distributed, the algorithm is:

1. Compute the statistic

       Z = (1/2) ln[(1 + T) / (1 − T)].

   This is Fisher's Z transformation, which produces an approximately normal statistic.

2. Compute its standard deviation, which is

       σ_z = 1 / sqrt(N − 3).

3. Then the lower threshold will be

       [exp(2(Z − 1.64 σ_z)) − 1] / [exp(2(Z − 1.64 σ_z)) + 1].

   (This is the first equation solved for T, with Z replaced by Z − 1.64 σ_z. The value 1.64 is taken from a table and is the multiplier for the standard deviation that gives 95% certainty.)

For instance, say that we wish to consider pairs of metrics only if the correlation coefficient exceeds T = 0.8 and that we wish to estimate the correlation coefficient by looking at 20 samples. Then Z will be 1.0986, σ_z will be 0.2425, and we should set the lower threshold at 0.6049. That is, if the computation of the correlation coefficient using 20 samples is less than 0.6049, we can be 95% certain that the ideal correlation coefficient is less than our threshold of 0.8, so we can assume that we do not have to consider the pair further.
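A sketch of the pre-screening computation (the function name is ours):

```python
import math

def lower_threshold(T, N, z_mult=1.64):
    """Cutoff for a preliminary CC computed from only N samples: below this
    value we are ~95% certain (one-sided) that the ideal CC is below the
    chosen correlation threshold T, using Fisher's Z transformation."""
    Z = 0.5 * math.log((1 + T) / (1 - T))      # step 1: Fisher's Z of T
    sigma_z = 1 / math.sqrt(N - 3)             # step 2: standard deviation
    z_low = Z - z_mult * sigma_z               # shift down by 1.64 sigma
    return (math.exp(2 * z_low) - 1) / (math.exp(2 * z_low) + 1)  # step 3: back-transform

print(lower_threshold(0.8, 20))   # about 0.605, matching the worked example
```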

7. Summary

To understand what is happening within a single computer or a network of computers, it is often necessary to make connections among events and metrics. Generally, the metrics made available by the operating system give little help. A priori, computing correlation coefficients appears to be an important technique for discovering this information. However, real-world data creates many issues that need to be dealt with before this technique can be effectively used. The solutions that we have presented should make using correlation coefficients a practical tool in the arsenal of the performance analyst.

8. References

[1] Ding, Yiping and Newman, Kenneth, "Automatic Workload Characterization," CMG Proceedings, pp. 53-66, Orlando, Florida, December 2000.

[2] Kachigan, Sam Kash, Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods, Radius Press, New York, 1986.


More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

amleague PROFESSIONAL PERFORMANCE DATA

amleague PROFESSIONAL PERFORMANCE DATA amleague PROFESSIONAL PERFORMANCE DATA APPENDIX 2 amleague Performance Ratios Definition Contents This document aims at describing the performance ratios calculated by amleague: 1. Standard Deviation 2.

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

In this article, we go back to basics, but

In this article, we go back to basics, but Asset Allocation and Asset Location Decisions Revisited WILLIAM REICHENSTEIN WILLIAM REICHENSTEIN holds the Pat and Thomas R. Powers Chair in Investment Management at the Hankamer School of Business at

More information

Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

More information

Copyrighted Material. Chapter 1 DEGREE OF A CURVE

Copyrighted Material. Chapter 1 DEGREE OF A CURVE Chapter 1 DEGREE OF A CURVE Road Map The idea of degree is a fundamental concept, which will take us several chapters to explore in depth. We begin by explaining what an algebraic curve is, and offer two

More information

Characterization and Modeling of Packet Loss of a VoIP Communication

Characterization and Modeling of Packet Loss of a VoIP Communication Characterization and Modeling of Packet Loss of a VoIP Communication L. Estrada, D. Torres, H. Toral Abstract In this work, a characterization and modeling of packet loss of a Voice over Internet Protocol

More information

FORECASTING. Operations Management

FORECASTING. Operations Management 2013 FORECASTING Brad Fink CIT 492 Operations Management Executive Summary Woodlawn hospital needs to forecast type A blood so there is no shortage for the week of 12 October, to correctly forecast, a

More information

STAT355 - Probability & Statistics

STAT355 - Probability & Statistics STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive

More information

Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results

Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results Evaluation of a New Method for Measuring the Internet Distribution: Simulation Results Christophe Crespelle and Fabien Tarissan LIP6 CNRS and Université Pierre et Marie Curie Paris 6 4 avenue du président

More information

Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

More information

Skewness and Kurtosis in Function of Selection of Network Traffic Distribution

Skewness and Kurtosis in Function of Selection of Network Traffic Distribution Acta Polytechnica Hungarica Vol. 7, No., Skewness and Kurtosis in Function of Selection of Network Traffic Distribution Petar Čisar Telekom Srbija, Subotica, Serbia, petarc@telekom.rs Sanja Maravić Čisar

More information

Covariance and Correlation

Covariance and Correlation Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

Risk Analysis and Quantification

Risk Analysis and Quantification Risk Analysis and Quantification 1 What is Risk Analysis? 2. Risk Analysis Methods 3. The Monte Carlo Method 4. Risk Model 5. What steps must be taken for the development of a Risk Model? 1.What is Risk

More information

Data Cleansing for Remote Battery System Monitoring

Data Cleansing for Remote Battery System Monitoring Data Cleansing for Remote Battery System Monitoring Gregory W. Ratcliff Randall Wald Taghi M. Khoshgoftaar Director, Life Cycle Management Senior Research Associate Director, Data Mining and Emerson Network

More information