# Applying the General Path Model to Estimation of. Remaining Useful Life

Save this PDF as:

Size: px
Start display at page:

Download "Applying the General Path Model to Estimation of. Remaining Useful Life"

## Transcription

5 X can be populated with any function of degradation measures, including higher order terms, interaction terms, and functions such as sin(x) or e x. If prior information is available for a specific model parameter, i.e. β j ~N(β jo,σ β ), then the matrix X should be appended with an additional row with value one at the j th position and zero elsewhere, and the Y matrix should be appended with the a priori value of the j th parameter. Y * y1 y ym j X * x x xm x1 n x n xmn Finally, the variance-covariance matrix is augmented with a final row and column of zeros, with the variance of the a priori information in the diagonal element. * y y1 y If knowledge is available about multiple regression parameters, the matrices should be appended multiple times with one row for each parameter. It is convenient to assume that the noise in the degradation measurements is constant and uncorrelated. Some a priori knowledge of the noise variance is available from the exemplar degradation paths. If this assumption is not valid for a particular system, then other methods of estimating the noise variance may be used; however, it has been seen anecdotally that violating this assumption does not have a significant impact on RUL estimation. In addition, it is also convenient to assume that the noise measurements are uncorrelated across observations of y; this allows the variance-covariance matrix to be a diagonal matrix consisting of noise variance estimates and a priori knowledge variance estimates. If this assumption is not valid, including covariance terms is trivial; again, these terms can be estimated from historical degradation paths x x x 1 m y m 1 j (5) (6) After a priori knowledge is used in conjunction with n current data observations to obtain a posterior estimate of degradation parameters, this estimate becomes the new prior distribution for the next estimation of regression parameters. The variance of this new knowledge is estimated as: s post æ = ç è 1 + n ö s y ø s prior -1 (7) The Bayesian information may be used to dynamically update the model fit as new data become available for each desired RUL estimate..4. Combined Monitoring and Prognostic Systems Figure 3 shows a combined monitoring, fault detection, and prognostics system similar to the one used in this research. The monitoring system employs an Auto-Associative Kernel Regression (AAKR) model for monitoring and the Sequential Probability Ratio Test (SPRT) for fault detection. Both of these methods are described in broad detail below. The interested reader is referred to (Hines et al., 8 and Garvey et al., 7) for a more complete discussion of AAKR and (Wald, 1943) for SPRT. Auto-Associative models can generally be considered an error correction technique. These models compare a new observation to those seen in the past to estimate how the system should be running. These corrected predictions can be compared to the measured data to identify faulted operation. Several auto-associative architectures are available, including auto-associative neural networks, auto-associative kernel regression, and multivariate state estimation technique (Hines et al., 8). This research employs the AAKR algorithm for system monitoring. AAKR is a non-parametric, empirical technique. Exemplar historical observations of system operation are stored in a data matrix. As a new observation is collected, it is compared to each of Figure 3: Combined Monitoring and Prognostic System 5

6 the exemplar observations to determine how similar the new observation is to each of the exemplars. This similarity is quantified by evaluating the distance between the new observation and the exemplar. Most commonly, the Euclidean distance is used: d i = m å j=1 ( X j - x i, j ) where d i is the distance between the new observation, X, and the i th exemplar, x i. The distance is converted to a similarity measure through the use of a kernel. Many kernels are available; this research employs the Gaussian kernel: æ s i = exp - d ö i ç è h ø where s i is the similarity of the new observation to the i th exemplar and h is the kernel bandwidth, which controls how close vectors must be to be considered similar. Finally, the corrected observation value is calculated as a weighted average of the exemplar observations: X ˆ ås i x i = ås i (8) (9) (1) Monitoring system residuals are then generated as the difference between the actual observation and the errorcorrected prediction. These residuals are used with a SPRT to determine if the system is operating in a faulted or nominal condition. As the name suggests, the SPRT looks at a sequence of residuals to determine if the time series of data is more likely from a nominal distribution or a pre-specified faulted distribution. As new observations are made, the SPRT compares the cumulative sum of the log-likelihood ratio: s lo g i s i 1 (11) to two thresholds, which depend on the acceptable false positive and false negative fault rates: æ b ö a» logç è 1-a ø æ b» log 1- b ö ç è a ø (1) where is the acceptable false alarm (false positive) rate and is the acceptable missed alarm (false negative) rate. For this research, false alarm and missed alarm rates of 1% and 1% respectively are used. If s i < a, then the null hypothesis cannot be rejected; that is, the system is assumed to be operating in a nominal condition. If s i > b, i then the null hypothesis is rejected; that is, the system is assumed to be operating in a faulted condition. When a determination is made, the sum, s i, is reset to zero and the test is restarted. After a fault is detected in the system, the prognostic system can be engaged to determine the RUL for the system. As discussed above, the GPM methodology uses a measure of system degradation, called a prognostic parameter, to make prognostic estimates. An ideal prognostic parameter has three key qualities: monotonicity, prognosability, and trendability. Monotonicity characterizes the underlying positive or negative trend of the parameter. This is an important feature of a prognostic parameter because it is generally assumed that systems do not undergo self-healing, which would be indicated by a nonmonotonic parameter. This assumption is not valid for some components such as batteries, which may experience some degree of self repair during short periods of nonuse, but it tends to hold for mechanical systems or for complex systems as a whole. Prognosability gives a measure of the variance in the critical failure value of a population of systems. A wide spread in critical failure values can make it difficult to accurately define a critical failure threshold and to extrapolate a prognostic parameter to failure. Prognosability may be very susceptible to noise in the prognostic parameter, but this effect may be reduced by traditional variance reduction methods such as parameter bagging and data denoising. Finally, trendability indicates the degree to which the parameters of a population of systems have the same underlying shape and can be described by the same functional form. The population of noise-free prognostic parameters shown in Figure 4 exhibits the three desired features. The parameters are monotonic: they all generally trend upward through time. They are prognosable: the parameter value at failure for each unit is at approximately the same value, as indicated by the red markers. Finally, they are trendable: each parameter appears to follow the same upward exponential or quadratic trend. Monitoring system residuals, or combinations of residuals, are natural candidates for prognostic parameters because they inherently measure the deviation of a system from normal operation. The following section investigates the application of this monitoring/prognostic method to the 8 PHM challenge problem. 6

7 Figure 4: Population of "good" prognostic parameters 3. PHM 8 CHALLENGE APPLICATION This section presents an application of the proposed GPM prognostic method to the PHM Challenge data set. The efficacy of the method is analyzed based on the given cost function for the 18 test cases. RUL estimates far from the actual value are penalized exponentially. The cost function is asymmetric; RUL predictions greater than the actual value are penalized more heavily than those which predict failure before it happens. The cost for each case is given by the following formula: d = RUL estimated - RUL actual score(d < ) = exp(-d /13) -1 score(d > ) = exp(d /1) -1 (13) where d is the difference between the estimated and the actual RUL. If d is negative, then the algorithm underestimates the RUL leading one to end operation before failure occurs; if d is positive, then the algorithm overestimates the RUL and results in a greater penalty because one may attempt to operate the component longer than possible and thereby experience a failure. The following sections give a brief description of the simulated data set used for the challenge problem, then outline the data analysis and identification of an appropriate prognostic parameter for GPM trending. Finally, the application of the GPM method and Bayesian updating are presented with final results given for the described method. The performance of the GPM algorithm with and without Bayesian updating is compared PHM Challenge Data Set Description The PHM Challenge data set consists of 18 cases of multivariate data that track from nominal operation through fault onset to system failure. Data were provided which modeled the damage propagation of aircraft gas turbine engines using the Commercial Modular Aerop- Propulsion System Simulation (C-MAPSS). This engine simulator allows faults to be injected in any of the five rotating components and gives output responses for 58 sensed engine variables. The PHM Challenge data set included 1 of these 58 output variables as well as three operating condition indicators. Each simulated engine was given some initial level of wear which would be considered within normal limits, and faults were initiated at some random time during the simulation. Fault propagation was assumed to evolve in an exponential way based on common fault propagation models and the results seen in practice. Engine health was determined as the minimum health margin of the rotating equipment, where the health margin was a function of efficiency and flow for that particular component; when this health indicator reached zero, the simulated engine was considered failed. The interested reader is referred to Saxena et al. (8) for a more complete description of the data simulation. The data have three operational variables altitude, Mach number, and TRA and 1 sensor measurements. Initial data analysis resulted in the identification of six distinct operational settings; based on this result, the operating condition indicators were collapsed into one indicator which fully defined the operating condition of the engine for a single observation (flight). In addition, ten sensed variables were identified whose statistical properties changed through time and were well correlated (linear correlation coefficient of at least.7, shown in Figure 5) to each other. In this way, the 4 sensor data set was reduced to 11 variables, with original variable numbers: 1 (the operating condition indicator), 5, 6, 7, 1, 14, 17, 18,, 3, and 4. The GPM method uses degradation information, either directly measured or inferred, to estimate the system RUL. Initial analysis of the raw data does Figure 5: Correlation Coefficient Matrix for Eleven Monitored Variables 7

8 Figure 6: Eleven PHM Data Set Variables not reveal any trendable degradation parameter. That is, no sensed measurement has an identifiable trend toward failure. Figure 6 is a plot of the eleven variables that were determined to statistically change with time. These variables were used to develop a monitoring and prognostics system. Visual inspection of the data does not indicate any obvious trends toward failure. The monitoring system provides much greater sensitivity to subtle changes that may be indicative of failure. 3.. Monitoring and Prognostics Results An AAKR model is used to determine the expected values of the eleven variables of interest. The baseline model is developed using the first 15% of each run as training data; this assumes that faults occur after at least 15% of each run is completed. This assumption is not universally valid, but seems to be reasonable for this data set. Based on the AAKR predictions, a residual is calculated between the nominal prediction and the actual value. These residuals are potential candidates for inclusion into the degradation parameter. Figure 7 is a plot of sensed variable 17 and the corresponding residual for five of the training cases. The final value for each of the five cases is indicated in the lower plot by red asterisks. In this case, the residual does not provide a useful prognostic parameter. The residual is not trendable; that is, the five cases show several distinct residual shapes. In addition, the residual's prognosability is not high. The residuals end at very different values for each case. This could be indicative of different failure modes, but is not directly useful as a degradation parameter. For the purpose of this analysis, it is assumed that all fault modes may be lumped together into one prognostic model. Therefore, a single degradation parameter is desired to prognose all systems. Several of the residuals grow in a similar manner with time for all the units and have failure values without much variation. These residuals can be used as a degradation parameter by trending them through regression and extrapolating the functional fit to some degradation threshold to give an estimate of RUL. The top plot in Figure 8 shows one such sensed variable while the lower plot is the residual between the predicted and measured value. It shows the unfiltered residual for 5 different training cases. This variable is a good prognostic parameter because the corresponding monitoring system residuals of each of the training systems have the same basic shape and failures occur at approximately the same negative value. The task is to model the degradation parameter and predict the failure point when only a subset of the case is given. Five residuals were found to have a similar shape with well clustered values at failure. Figure 7: Residual trend indicating possible different failure modes Figure 8: Residual trend candidate for degradation parameter 8

11 Figure 14: GPM Results With and Without Bayesian Updating Figure 15: Weibull Probability Fit The GPM/Type I model which does not include prior information gives an average error of approximately 55% when only 5% of the full lifetime is available and relies on the Type I method for approximately half of the cases. Conversely, the GPM/Bayes method gives approximately 5% error and is able to predict an RUL for every case. As Figure 14 shows, the average error of both methods decreases as more data becomes available and eventually converges to approximately equal error values when the available data overpowers the prior information in the GPM/Bayes model. 4. CONCLUSION This paper presented a method for performing prognostics on individual components or systems. The General Path Model (GPM) method is used to extrapolate a prognostic parameter curve to a predefined critical failure threshold to obtain an estimate of the Remaining Useful Life (RUL). In cases where only a few data points are available or the data are contaminated by significant noise, a Bayesian method was introduced. The Bayesian method includes prior information about the prognostic parameter distribution to "force" the functional fit to follow the trend seen in historic systems. The method was applied to the 8 PHM conference challenge problem to illustrate its efficacy. The given application utilized the results of a condition monitoring and fault detection system to characterize the degradation in a specific system. A prognostic parameter was generated from a subset of the monitoring system residuals; monitoring system residuals are well-suited components of a prognostic parameter because they naturally characterize the deviation of a system from nominal condition. A parametric, linear-in-parameters regression fit of the time series prognostic parameter was extrapolated to the critical failure threshold to give an estimate of the system RUL. A Bayesian updating method was applied to allow for the inclusions of prior information, which improves model performance particularly when faced with small amounts of data or extremely noisy data. The results show that the GPM/Bayes method greatly improved RUL predictive performance over a conventional regression solution. The need for a diversity of algorithms suggests that development of a large variety of prognostic methods can only strengthen the field. While the algorithm described here may not be the best performing method for this data set, it has several key advantages that the winning PHM Challenge algorithms lack, which may make it better suited for other applications. The proposed GPM/Bayes algorithm is qualitatively compared to the three best performing algorithms at the 8 PHM Challenge in the following discussion. The similarity-based approach described in (Wang et al., 8) shares several assumptions with the GPM method, namely (1) run-to-failure data from multiple units are available and () the history of each training unit ends at a soft failure, but may begin at some random level of initial degradation. However, this similarity-based method suffers the same deficiency that all similarity-based models suffer; it is only applicable within the range of data used for training. The proposed GPM/Bayes method will trend toward the training data when the Bayesian information is dominant, early in equipment degradation or when data are very noisy, but as more data become available, the method will accommodate degradation paths outside those seen in training. Additionally, the proposed similarity method requires storage of a large bank of historical data. This may not be a problem for large computer systems, but it can become cumbersome for onboard prognostic algorithms and systems with many fault 11

12 modes requiring many historical paths. Conversely, the GPM/Bayes method requires storage for only the regression model to be fit and the Bayesian prior information. The second and third place submissions both focused on recurrent neural networks for prognostic estimation (Heimes, 8; Peel, 8). Neural networks require a certain level of expertise and finesse to develop. While they are very powerful modeling tools, neural networks lack the accessibility of the GPM/Bayes method or other regression models. A well-developed neural network may outperform many other prognostic algorithms, but development is not a trivial task. Neural network approaches should not be discounted by any means, but the advantage of the GPM/Bayes method is its relative simplicity. ACKNOWLEDGEMENT We would like to acknowledge funding from Ridgetop Group, Inc. under Grant Number NAV-UTK-7, entitled Prognostics and Health Management (PHM) for Digital Electronics Using Existing Parameters and Measurands. The information and conclusions presented herein are those of the authors and do not necessarily represent the views or positions of Ridgetop Group. REFERENCES Callan, R., B. Larder, and J. Sandiford (6), An Integrated Approach to the Development of an Intelligent Prognostic Health Management System, Proc. Of 6 IEEE Aerospace Conference, Big Sky, MT. Carlin, B.P and T.A. Louis (), Bayes and Empirical Bayes Methods for Data Analysis, nd ed. Boca Raton: Chapman and Hall/CRC. Chinnam, R.B. (1999), "On-line Reliability Estimation of Individual Components, Using Degradation Signals", IEEE Transactions on Reliability, Vol 48, No 4, pp Garvey, D. and J.W. Hines (6), Robust Distance Measures for On-Line Monitoring: Why Use Euclidean? 7 th International Fuzzy Logic and Intelligent Technologies in Nuclear Science (FLINS) Conference on Applied Artificial Intelligence. Garvey, J., R. Seibert, D. Garvey, and J.W. Hines (7), "Application of On-Line Monitoring Techniques to Nuclear Plant Data," Nuclear Engineering and Technology, Vol. 39 No.. Gelman, A., J. Carlin, H. Stern, and D. Rubin (4), Bayesian Data Analysis nd ed. Boca Raton: Chapman and Hall/CRC. Girish, T., S.W. Lam, J.S.R. Jayaram (3), "Reliability Prediction Using Degradation Data A Preliminary Study Using Neural Network-based Approach," Proc. European Safety and Reliability Conference (ESREL 3), Maastricht, The Netherlands, Jun Heimes, F.O. (8), Recurrent Neural Networks for Remaining Useful Life Estimation, PHM '8, Denver CO, Oct 6-9. Hines, J.W., D. Garvey, R. Seibert, A. Usynin, and S.A. Arndt, (8) Technical Review of On- Line Monitoring Techniques for Performance Assessment, Volume : Theoretical Issues, NUREG/CR-6895 Vol. Hines, J.W., J. Garvey, J. Preston, and A. Usynin (7), Empirical Methods for Process and Equipment Prognostics, Reliability and Maintainability Symposium RAMS. Koppen, M. (4), No-Free-Lunch Theorems and the Diversity of Algorithms, Congress on Evolutionary Computation, 1, pp Kothamasu, R., S.H. Huang, and W.H. VerDuin (6), System Health Monitoring and Prognostics A Review of Current Paradigms and Practices, International Journal of Advanced Manufacturing Technology 8: Lindely, D.V. and A.F. Smith (197), "Bayes Estimates for Linear Models," Journal of the Royal Statistical Society (B), Vol 34, No 1, pp Lu, C.J. and W.Q. Meeker (1993), "Using Degradation Measures to Estimate a Time-to- Failure Distribution," Technometrics, Vol 35, No, pp Meeker, W.Q., L.A. Escobar, and C.J. Lu (1998), Accelerated degradation tests: modeling and analysis, Technometrics, vol. 4 () pp Oja, M., J.K. Line, G. Krishnan, R.G. Tryon (7), Electronic Prognostics with Analytical Models using Existing Measurands, 61 st Conference of the Society for Machinery Failure Prevention Technology (MFPT) 7, Virginia Beach, April Pecht, M. and A. Dasgupta (1995), Physics of Failure: An Approach to Reliable Product Development, Journal of the Institute of Environmental Sciences 38 (5): Peel, L. (8), Data Driven Prognostics using a Kalman Filter Ensemble of Neural Network Models, PHM '8, Denver CO, Oct 6-9. Robinson, M.E. and M.T. Crowder (), "Bayesian Methods for a Growth-Curve Degradation Model with Repeated Measures," Lifetime Data Analysis, Vol 6, pp Saxena, A., K. Goebel, D. Simon, N. Eklund (8), "Prognostics Challenge Competition Summary: Damage Propagation Modeling for Aircraft 1

### SureSense Software Suite Overview

SureSense Software Overview Eliminate Failures, Increase Reliability and Safety, Reduce Costs and Predict Remaining Useful Life for Critical Assets Using SureSense and Health Monitoring Software What SureSense

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Health Management for In-Service Gas Turbine Engines

Health Management for In-Service Gas Turbine Engines PHM Society Meeting San Diego, CA October 1, 2009 Thomas Mooney GE-Aviation DES-1474-1 Agenda Legacy Maintenance Implementing Health Management Choosing

### System Aware Cyber Security

System Aware Cyber Security Application of Dynamic System Models and State Estimation Technology to the Cyber Security of Physical Systems Barry M. Horowitz, Kate Pierce University of Virginia April, 2012

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement

Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Toshio Sugihara Abstract In this study, an adaptive

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts

Page 1 of 20 ISF 2008 Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Andrey Davydenko, Professor Robert Fildes a.davydenko@lancaster.ac.uk Lancaster

### Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

### A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data

A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### 11. Time series and dynamic linear models

11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### CBM IV Prognostics and Maintenance Scheduling

FL Lewis, Assoc Director for Research Moncrief-O Donnell Endowed Chair Head, Controls, Sensors, MEMS Group Automation & Robotics Research Institute (ARRI) The University of Texas at Arlington CBM IV Prognostics

### Statistical basics for Biology: p s, alphas, and measurement scales.

334 Volume 25: Mini Workshops Statistical basics for Biology: p s, alphas, and measurement scales. Catherine Teare Ketter School of Marine Programs University of Georgia Athens Georgia 30602-3636 (706)

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments

### 3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

### Lecture - 32 Regression Modelling Using SPSS

Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

### DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

### Multivariate Normal Distribution

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### 5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### A Proven Approach to Stress Testing Consumer Loan Portfolios

A Proven Approach to Stress Testing Consumer Loan Portfolios Interthinx, Inc. 2013. All rights reserved. Interthinx is a registered trademark of Verisk Analytics. No part of this publication may be reproduced,

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### SIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS. J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID

SIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID Renewable Energy Laboratory Department of Mechanical and Industrial Engineering University of

### Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### DEVELOPMENT OF AN ONLINE PREDICTIVE MONITORING SYSTEM FOR POWER GENERATING PLANTS

DEVELOPMENT OF AN ONLINE PREDICTIVE MONITORING SYSTEM FOR POWER GENERATING PLANTS Randall Bickford Expert Microsystems, Inc. Orangevale, California rando@expmicrosys.com Richard Rusaw South Carolina Electric

### Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

### Propulsion Gas Path Health Management Task Overview. Donald L. Simon NASA Glenn Research Center

Propulsion Gas Path Health Management Task Overview Donald L. Simon NASA Glenn Research Center Propulsion Controls and s Research Workshop December 8-10, 2009 Cleveland, OH www.nasa.gov 1 National Aeronautics

### Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues

Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the

### pp. 4 8: Examples 1 6 Quick Check 1 6 Exercises 1, 2, 20, 42, 43, 64

Semester 1 Text: Chapter 1: Tools of Algebra Lesson 1-1: Properties of Real Numbers Day 1 Part 1: Graphing and Ordering Real Numbers Part 1: Graphing and Ordering Real Numbers Lesson 1-2: Algebraic Expressions

### How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power

How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power Linda K. Muthén Muthén & Muthén 11965 Venice Blvd., Suite 407 Los Angeles, CA 90066 Telephone: (310) 391-9971 Fax: (310) 391-8971

### Market Risk: FROM VALUE AT RISK TO STRESS TESTING. Agenda. Agenda (Cont.) Traditional Measures of Market Risk

Market Risk: FROM VALUE AT RISK TO STRESS TESTING Agenda The Notional Amount Approach Price Sensitivity Measure for Derivatives Weakness of the Greek Measure Define Value at Risk 1 Day to VaR to 10 Day

### Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

### APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

### Linear and Piecewise Linear Regressions

Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis

### The CUSUM algorithm a small review. Pierre Granjon

The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Advanced Higher Mathematics Course Assessment Specification (C747 77)

Advanced Higher Mathematics Course Assessment Specification (C747 77) Valid from August 2015 This edition: April 2016, version 2.4 This specification may be reproduced in whole or in part for educational

### Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

### INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Signal Detection. Outline. Detection Theory. Example Applications of Detection Theory

Outline Signal Detection M. Sami Fadali Professor of lectrical ngineering University of Nevada, Reno Hypothesis testing. Neyman-Pearson (NP) detector for a known signal in white Gaussian noise (WGN). Matched

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

### Calculating Interval Forecasts

Calculating Chapter 7 (Chatfield) Monika Turyna & Thomas Hrdina Department of Economics, University of Vienna Summer Term 2009 Terminology An interval forecast consists of an upper and a lower limit between

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

### Gaussian Processes in Machine Learning

Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl

### Linear regression methods for large n and streaming data

Linear regression methods for large n and streaming data Large n and small or moderate p is a fairly simple problem. The sufficient statistic for β in OLS (and ridge) is: The concept of sufficiency is

### Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124

### Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation

Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation James K. Kimotho, Christoph Sondermann-Woelke, Tobias Meyer, and Walter Sextro Department

### Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science

### Outline. Random Variables. Examples. Random Variable

Outline Random Variables M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno Random variables. CDF and pdf. Joint random variables. Correlated, independent, orthogonal. Correlation,

### Quantitative Inventory Uncertainty

Quantitative Inventory Uncertainty It is a requirement in the Product Standard and a recommendation in the Value Chain (Scope 3) Standard that companies perform and report qualitative uncertainty. This

### Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

### A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

### An introduction to Value-at-Risk Learning Curve September 2003

An introduction to Value-at-Risk Learning Curve September 2003 Value-at-Risk The introduction of Value-at-Risk (VaR) as an accepted methodology for quantifying market risk is part of the evolution of risk

### Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

### Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening

, pp.169-178 http://dx.doi.org/10.14257/ijbsbt.2014.6.2.17 Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening Ki-Seok Cheong 2,3, Hye-Jeong Song 1,3, Chan-Young Park 1,3, Jong-Dae

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### Predictive Analytics uses data

Predictive Analytics in Equipment Reliability Programs Predictive Analytics in conjunction with Threat Based Maintenance (TBM ) reduces and, in some cases, can eliminate the need to perform time-based

Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Precalculus REVERSE CORRELATION. Content Expectations for. Precalculus. Michigan CONTENT EXPECTATIONS FOR PRECALCULUS CHAPTER/LESSON TITLES

Content Expectations for Precalculus Michigan Precalculus 2011 REVERSE CORRELATION CHAPTER/LESSON TITLES Chapter 0 Preparing for Precalculus 0-1 Sets There are no state-mandated Precalculus 0-2 Operations

### Lessons Learned From the U.S. Nuclear Power Plant On-Line Monitoring Programs

Lessons Learned From the U.S. Nuclear Power Plant On-Line Monitoring Programs J. Wesley Hines Nuclear Engineering Department, The University of Tennessee, Knoxville, Tennessee 37996-2300 jhines2@utk.edu

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### Quantitative Methods for Finance

Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

### Industry Environment and Concepts for Forecasting 1

Table of Contents Industry Environment and Concepts for Forecasting 1 Forecasting Methods Overview...2 Multilevel Forecasting...3 Demand Forecasting...4 Integrating Information...5 Simplifying the Forecast...6