Bayesian sample size determination of vibration signals in machine learning approach to fault diagnosis of roller bearings

Size: px
Start display at page:

Download "Bayesian sample size determination of vibration signals in machine learning approach to fault diagnosis of roller bearings"

Transcription

1 Bayesian sample size determination of vibration signals in machine learning approach to fault diagnosis of roller bearings Siddhant Sahu *, V. Sugumaran ** * School of Mechanical and Building Sciences, VIT University, Chennai siddhant.sahu2011@vit.ac.in ** School of Mechanical and Building Sciences, VIT University, Chennai v_sugu@yahoo.com ABSTRACT Sample size determination for a data set is an important statistical process for analysing the data to an optimum level of accuracy and using minimum computational work. The applications of this process are credible in every domain which deals with large data sets and high computational work. This study uses Bayesian analysis for determination of minimum sample size of vibration signals to be considered for fault diagnosis of a bearing using pre-defined parameters such as the inverse standard probability and the acceptable margin of error. Thus an analytical formula for sample size determination is introduced. The fault diagnosis of the bearing is done using a machine learning approach using an entropy-based J48 algorithm. The following method will help researchers involved in fault diagnosis to determine minimum sample size of data for analysis for a good statistical stability and precision. KEYWORDS: Bayesian Analysis, Fault Diagnosis, Machine Learning, Posterior Estimates, Sample Size Determination. I. INTRODUCTION Bearings are an integral part of any machinery used in the industry nowadays. It is basically a machine element that enables relative motion of one part with respect to the other thus reducing the friction between any two parts. It enables linear movement of a specific part or rotary motion about a fixed axis. Thus, being such an essential component, if the bearing carries some fault or abnormality, it directly affects the efficiency of the machine and leads to further damage or wear and tear. This leads to a need of fault diagnosis thus ensuring proper working and maintenance of the machine in which it is being used.

2 This paper focuses on the machine learning approach for evaluating the fault in various sections of a bearing namely inner race and outer race. For this particular data set, four conditions of the bearing that were taken into account namely Normal, Inner Race Fault, Outer Race Fault and Inner Outer Race Fault. The machine learning approach uses a classifier which uses various decision tree algorithms to analyse the data, classify the conditions and finally provide the researcher with the results. The classifier requires a specific number of vibration signals to be able to classify the data with an optimum level of accuracy termed as the classifier accuracy. The major question that arises here is that what is the optimum number of vibration signals to be given as an input to the classifier to have an optimum level of classification accuracy for precise factor. A higher level of classifier accuracy makes it robust. Secondly, a larger number of vibrational signals, when given as an input into a classifier would lead to increase in the computational effort. Thus, in order to meet the above stated requirements, an optimum sample size needs to be chosen. In this paper, a Bayesian approach based method is used to evaluate the optimum sample size from a normally distributed population of a single type of statistical feature of the vibration signal which best represents the vibration signal as inferred by the classifier. This method is based on the Bayesian theory which uses the present or the prior information and the observed data to evaluate the posterior estimates or the estimated or expected data. Certain parameters of the prior and posterior data and used in a combined way to obtain the optimum sample size. This method is however different from the classical/frequentist method [1] which uses only the prior estimates to evaluate the sample size. Bayesian sample size determination has been applied in various fields, to name a few, clinical trials [2], Continuous medical tests [3], Stratified random sampling of pond water [4], ANOVA Bayesian point of view [5]. However, little work has been done for sample size determination in the field of fault diagnosis using machine learning approach. One of the works done is this field is Sample size Determination using power analysis [6]. Inspection of general corrosion of process components [7], Case-control studies with misclassification [8]. The literature survey for this study was found in R. Weiss [9] Adcock [10], De Santis [11]. The formula used in this study depends on the parameters namely sample standard deviation, prior standard deviation, acceptable margin of error, specified confidence level. The formula finally yields a sample size which ensures that the posterior estimate of the data does not exceed a pre-defined acceptable margin of error.

3 II. EXPERIMENTAL DATA The data set which is being considered for analysis in the present study is the same as the one used by Cohen (1988 [12]), Sugumaran, Sabareesh, and Ramachandran (2008 [13]). The details about the experiment that was conducted which include the setup, fault simulation details and experimental procedure have been explained in Sugumaran and Ramachandran (2007 [14]). The various parameters for the experiment are, sampling frequency (12,000 Hz) and sample length of (Every vibrational signal has a total count of 8192 data points). For this experimental setup, SKF6205 bearing was considered. Hundred vibration signals were taken by simulating four different conditions on the bearings namely Normal, Inner race fault, Outer Race Fault, Inner-outer race fault. Thus, the final data set consists of a total of 400 vibration signals. The transducer used for data acquisition was a piezo-electric type accelerometer. III. FEATURE EXTRACTION 3.1. Definitions Each vibration signal is a culmination of 8192 vibration values. Thus, in order to obtain parameters that suitably represent a specific vibration signal and distinguish it from the other, feature extraction is done. They are explained below as:- (a) Standard Error: It is defined as a measure of the amount of error in the prediction of y for an individual x in the regression, where x and y are the sample means and n is the sample size. Standard error of the predicted Y * ( ) [ ( )( )] ( ) + (b) Standard deviation: The measure of the effective energy or power content of the sound signal. The following formula was used for computation of standard deviation. Standard Deviation = ( ) ( ) (c) Sample variance: The measure of spread of the signal points. Simple Variance = ( ) ( ) (d) Kurtosis: Indicates the measure of flatness or the spikiness of the signal. ( ) Kurtosis = { ( ) } ( )( )( ) ( ) ( )( ) (e) Skewness: The characterization of the degree of asymmetry of a distribution about its mean. Skewness = ( )

4 (f) Range: Difference in global maxima and minima of the signal. (g) Minimum value: It is that value which is the global minima of the signal. (h) Maximum value: It is that value which is the global maxima of the signal. (i) Sum: Sum of all feature values for each sample. 3.2 Best Feature Selection Firstly, all the statistical features stated above were extracted from each vibration signal containing 8192 data points. Since, all the eleven extracted statistical features are not necessary for classification; these statistical features were subjected to dimensionality reduction. It is a process which eliminates the less useful or unwanted statistical features thus reducing the dimension of the data-set and making it more suitable to be given as an input to the classifier. The classification is done using the J48 algorithm whose theory, procedure and implementation are explained in Sugumaran et. al. (2007) [15]. The decision tree dimensionally reduces the statistical features to four most important features. In this study, the four features in the decreasing order of importance were Kurtosis, Standard Error, Range, Minimum. Out of these four features, further classification using J48 algorithm yields the best statistical feature that could represent the population of vibration signals as a whole with high statistically stability and accuracy. This is shown in the fig. 1. Fig. 1. Decision tree representing the best feature selection process From fig. 1, it can be inferred that the best statistical feature that represents the data is Kurtosis. Thus, the final data set that was used for analysis was the kurtosis of 400 vibration signals belonging to four different classes, hundred signals in each class.

5 IV. BAYESIAN APPROACH BASED METHOD Bayesian theory is based on the concept of using the prior data to calculate the posterior estimates. The prior information is defined as the information which is currently available or could be based on historical data and theoretical models. Posterior information is defined as the updated information which is obtained using Bayesian updating theory and thus yields a new set of inspection data for analysis. This method stands apart as it used the updated posterior estimates obtained from the prior information whose details and procedures are explained in M. Khalifa, Khan & Haddara (2011) [7]. 4.1 Margins of Error and Finite Population Correction Factor Margin of error of mean is the difference between the population mean, µ and the selected sample mean, µ s. mathematically, it is represented as: (1) Thus, this parameter determines the accuracy to which the selected sample represents the population as a whole. The lesser the margin of error, the more accurate is the sample, and the greater is the statistical stability. Acceptable margin of error is pre-defined value assigned by the user who is evaluating the sample size. It is one of the two major parameters which decide the optimum sample size using a specific formula. An adequate sample is chosen such that the margin of error does not exceed the pre-defined value at a particular confidence level. In this case, the confidence level signifies the likelihood that the deviation between the population and the sample estimate is not exceeding the margin of error. The margin of error is expressed as: ( ) (2) Where, P is the inverse standard normal probability at a defined confidence interval α. It is the probability for half the width of the confidence interval for the probability distribution of the mean. The above equation can be rearranged to obtain a formula for the required sample size: * ( ) + (3) The above formula is only valid if the population size (N) is so large in comparison to the sample size that it can be considered as infinite. However, in order to evaluate the sample size for the case where the population is finite, a correction factor is used which is termed as the finite population correction factor (FPCF) which is expressed as follows: (Bernstein & Bernstein, 1999 [16])

6 ( ) ( ) (4) The above approximation is done on the basis that the value of N is very large in comparison to Bayes Theorem and Posterior Estimates Bayes Theorem states that if x denotes a variable representing a specific data-set, and the distribution of x has a parameter β, such that the parameter β is a random variable with a distribution denoted by f(β), the updated or posterior probability distribution, ( ) of this prior distribution can be obtained using Bayesian updating theory using the following formula: ( ) ( ) ( ) ( ) ( ) (5) Where ( ) is the conditional probability of observing the outcome of x for a given β. This is also termed as the likelihood function of β. In order to evaluate the posterior estimate of the parameter β, the following formula can be used: ( ) (6) The data set considered in this study is the statistical feature which best represents the whole vibration signal, which in this case is the Kurtosis. Thus the variable x representing the data set in this case is kurtosis. Considering the sample mean of the kurtosis of vibration signals of different classes to be the parameter β, certain assumptions are taken. Assuming that the prior distribution of the sample β is normally distributed, using Bayes Theorem, the posterior sample mean and standard deviation can be evaluated as (Ang & Tang, 2007 [1]): ( ) ( ) ( ) ( ) (7) ( ) ( ) ( ) ( ) (8) Where, is the prior standard deviation of the sample mean probability distribution.

7 is the standard deviation of the sample of size n. is the prior mean of the population. As discussed earlier, if from a population of prior standard deviation σ, if all the possible samples of size n are sampled, then the posterior estimate of standard deviation of the probability distribution of the sample mean, can be expressed as: (9) Substituting equation (9) in equations (7) and (8) (10) ( ) ( ) ( ) ( ) (11) (12) (13) Where (14) Thus, we can estimate the posterior estimates of sample mean and standard deviation from the prior estimates using equations (10) and (13). 4.3 Sample Size Calculation Thus, the posterior estimates for sample mean and standard deviation can be estimated for a finite population using the finite population correction factor. Thus, combining equation (3) and (13): (15) Thus, in this case, the margin of error of mean will depend on the inverse standard normal probability at a specific confidence level and the posterior estimate of the standard deviation of the sample mean thus replacing the standard deviation of the prior standard deviation of sample mean.

8 ( ) (16) Thus, equation (15) and (16) can be used define a formula to evaluate the required sample size n at a pre-defined margin of error. [ ( ) ( )] * ( ) ( )+ (17) Where V. RESULTS AND DISCUSSION In this study, initially, all the thirteen statistical features were extracted from the vibration signals belonging to different classes. In order to perform dimensionality reduction on the data, initially, all the signals were given as in input to the classifier and the best features was identified from the decision tree obtained after applying the J48 algorithm. For this data set, the best feature was Kurtosis. The kurtosis of each vibration signal was separated from the data containing all the statistical features and finally, a population of 400 kurtosis values of vibration signals was obtained containing 100 values from each of the four conditions namely Normal, Inner Race Fault, Outer Race Fault, Inner-Outer Race Fault. The population mean and standard deviation was computed which were and respectively. Next, a Bayesian approach based method was applied to a multi-class problem where a confidence level of 95 % was considered and different values of pre-defined acceptable margin of error for each class were considered to obtain different sample sizes using the formula in equation (17). The sample size obtained using this formula is the total sample size which is the sum of sample sizes of all the four classes. Since, the population of vibration signals contains equal number of data points from each class, thus the same can be assumed for the chosen sample size as well. The formula in equation (17) was applied using an iterative process wherein, a specific sample size was assumed and then a random sample for the particular sample was generated.. The sample standard deviation, σ s was calculated from this randomly generated sample. If the sample size n and the sample standard deviation satisfy the above proposed formula, then the iterative process was stopped and the optimum sample size was achieved. In this study, the random sample was generated, whereas when the experiment is being actually conducted, the vibration signals should be randomly taken. The obtained sample sizes corresponding to different values of pre-defined margin of errors using a confidence level of 95 % are shown in Table 1.

9 Table 1: Sample sizes of vibration signals for different margin of errors. S. No. Margin of Error Total Sample Size Sample Size Per Class Variation of sample size with classification accuracy The variation of sample size with classification accuracy was studied by reducing 5 data points every time from the class of 100 data points and noting the respective classification accuracy which is shown in fig. 2. Fig. 2. Classification accuracy as a function of sample size. From the graph in fig. 2, it can be observed that classification accuracy is stable with less oscillation at sample sizes more than 50 and is less stable at sample sizes less than 50. It falls abruptly to a low value of 80 % below a sample size of 15. For obtaining a sample size, the classification accuracy of a class sample size of 100 will be considered as a reference value.

10 5.2 Variation of sample size with error values The graphs shown in fig. 3 and 4 depict the variation of the root mean square error and mean absolute error as a function of the sample size by using the same method as mentioned in section 7.1. Fig. 3. Variation of absolute mean error with sample size. Fig. 4. Variation of root mean square error with sample size. The two types of errors namely mean absolute error and root mean square error play an important role in the decision of an optimum sample size as these reflect upon the robustness of the classifier. The lesser the error, the better will be the classification accuracy. It can be inferred from fig. 3 and 4 that a sample size higher sample size yields a low absolute mean and root mean square error.. The error values increase drastically as the sample size falls below 20.

11 5.3 Optimum Sample Size Determination After considering all the arguments discussed in section 7.1, 7.2, 7.3, and considering the population size to be the reference value, it can be observed that, sample sizes around 45 per class yield a classification a classification accuracy of around 92 % which is prominently high and also yield a low absolute mean and root mean square error. Thus, it could represent a statistically stable sample. However, a sample size of 25 per class yields a classification accuracy of %, an absolute mean error of and a root mean square error of which is almost near to the parameters of the actual population which has an accuracy, absolute mean error and root mean square error of 90.25, 0.07, Though, the parameters of this sample size are comparatively lower as compared to those of the sample sizes near 45, but a sample size of 25 per class is a much lower sample size as compared to 45 per class and possesses an almost equally prominent statistically stability as the actual population. Choosing a total sample size of 100 (25 per class) would yield an accurate result and reduce computational effort as well. 5.4 Variation of sample size with pre-defined acceptable margin of error The variations of the sample size with the input parameters in the iterative formula were studied. The variation of sample size with the change in pre-defined acceptable value of margin of error for population of vibration signals for all the four conditions fixed confidence levels (1- σ) of 90 %, 95 %, 99%, 99.5%, 99.8% is shown in Fig. 5. Fig. 5. Variation Of Sample Size With Margin Of Error From the graph shown in fig. 5, it could be inferred that the sample size increases with a decrease in the pre-defined acceptable value margin of error. Lower margin of error yield a bigger, yet more precise sample size whereas, a higher margin of error yield a smaller, yet less

12 precise sample size. The basic aim is finding a sample size which is satisfactorily precise, and has good classification accuracy as well. 5.5 Variation of sample size with confidence interval The variation of sample size with the change in the inverse standard probability was studied for confidence intervals, (1-σ) of 90 %, 95 %, 99%, 99.5%, 99.8% keeping the pre-defined acceptable margin of error constant for the population. The evaluated results for population of all the four classes are shown in Fig. 6. The three curves correspond to margin of errors: 0.78, 1.56 and 2.34 respectively. Fig. 6. Variation of sample size with confidence level From the graph shown in fig. 6, it could be inferred that the sample size increases with the increase in the confidence level. At confidence levels beyond 99 %, the sample size increases drastically. Thus, a larger sample size indicates more likelihood that the deviation between the population and the sample estimate is not exceeding the margin of error. Confidence levels beyond 99 % represent an extremely high level of likelihood and larger sample sizes are required for such high levels. VI. CONCLUSION Thus, the sample size was calculated at a specific confidence level and an acceptable margin of error in the posterior estimate of mean both of which were defined by the user. This particular formula is applicable for a population of size N. The variation of the sample size with the predefined parameters was discussed. Further, the variation of classification accuracy, root mean square and mean absolute error is also discussed. The classification accuracy and the two types of errors were considered as a basic for choosing the optimum sample size with reference to the respective values of the original population to make the classifier robust and to maintain

13 statistical stability. Finally, a total sample size of 25 per class was chosen as the optimum sample size of vibration signals to be taken for fault diagnosis using machine learning. Thus, the number of vibration signals to be obtained randomly from various conditions of a bearing is reduced from 100 to 25 per class for future experiments. REFERENCES [1] Ang, A., & Tang, W., probability concepts in engineering (2 nd edition) (New York: John Wiley and Sons, 2007). [2] Pierpaolo Brutti, Fulvio De Santis, Robust Bayesian sample size determination for avoiding the range of equivalence in clinical trials, Journal of Statistical Planning and Inference, 138, 2008, [3] Dunlei Chenga, Adam J. Branscum, James D. Stamey, A Bayesian approach to sample size determination for studies designed to evaluate continuous medical tests, Computational Statistics and Data Analysis, 54, 2010, [4] A.A. Bartolucci, A.D. Bartolucci, S. Bae, K.P. Singh, A Bayesian method for computing sample size and cost requirements for stratified random sampling of pond water, Environmental Modelling & Software, 21, 2006, [5] Anirban Das Gupta, Brani Vidakovic, Sample size problems in ANOVA Bayesian point of view, Journal of Statistical Planning and Inference, 65, 1997, [6] V. Indira, R. Vasanthakumari & V. Sugumaran, Sample Size Determination Of Vibration Signals using Power Analysis, Expert Systems with Applications, 37, 2010, [7] Mohamed Khalifa, Faisal Khan & Mahmoud Haddara, Bayesian sample size determination for inspection of general corrosion of process components, Journal of Loss Prevention in the Process Industries, 25, 2012, [8] James Stameya, Richard Gerlach, Bayesian sample size determination for case-control studies with misclassification, Computational Statistics & Data Analysis, 51, 2007, [9] Robert Weiss, Bayesian sample size calculations for hypothesis testing, Journal of the Royal Statistical Society: Series D (The Statistician), 46(2), 1997, [10] Adcock, C. T., Sample size determination: a review, Journal of the Royal Statistical Society, Series D (The Statistician), 46(2), 1997,

14 [11] De Santis, F., Using historical data for Bayesian sample size determination, Journal of the Royal Statistical Society, Series A (Statistics in Society), 170 (1), 2007, [12] Cohen, J., statistical power analysis for the behavioural sciences (2nd edition), (Hillsdale, NJ: Erlbaum, 1988) [13] Sugumaran, Sabareesh, and Ramachandran, Fault diagnostics of roller bearing using kernel based neighbourhood score multi-class support vector machine, Expert Systems with Applications, 34(4), 2008, [14] Sugumaran, V., & Ramachandran, K. I., Automatic rule learning using decision tree for fuzzy classifier in fault diagnosis of roller bearing, Mechanical System and Signal Processing, 211, 2007, [15] Sugumaran, V., Muralidharan, V., & Ramachandran, K. I., Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearings, Mechanical System and Signal Processing, 21, 2007, [16] Bernstein, S., & Bernstein, R., elements of statistics II, schaum s outlines series (1 st edition) (New York: McGraw-Hill).

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

MEASURES OF VARIATION

MEASURES OF VARIATION NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing! MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics

More information

Efficient Curve Fitting Techniques

Efficient Curve Fitting Techniques 15/11/11 Life Conference and Exhibition 11 Stuart Carroll, Christopher Hursey Efficient Curve Fitting Techniques - November 1 The Actuarial Profession www.actuaries.org.uk Agenda Background Outline of

More information

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

More information

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

Measuring Line Edge Roughness: Fluctuations in Uncertainty

Measuring Line Edge Roughness: Fluctuations in Uncertainty Tutor6.doc: Version 5/6/08 T h e L i t h o g r a p h y E x p e r t (August 008) Measuring Line Edge Roughness: Fluctuations in Uncertainty Line edge roughness () is the deviation of a feature edge (as

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Introduction to nonparametric regression: Least squares vs. Nearest neighbors

Introduction to nonparametric regression: Least squares vs. Nearest neighbors Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Risk Analysis and Quantification

Risk Analysis and Quantification Risk Analysis and Quantification 1 What is Risk Analysis? 2. Risk Analysis Methods 3. The Monte Carlo Method 4. Risk Model 5. What steps must be taken for the development of a Risk Model? 1.What is Risk

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

OUTLIER ANALYSIS. Data Mining 1

OUTLIER ANALYSIS. Data Mining 1 OUTLIER ANALYSIS Data Mining 1 What Are Outliers? Outlier: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism Ex.: Unusual credit card purchase,

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER

CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER 93 CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER 5.1 INTRODUCTION The development of an active trap based feeder for handling brakeliners was discussed

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

APPLIED MATHEMATICS ADVANCED LEVEL

APPLIED MATHEMATICS ADVANCED LEVEL APPLIED MATHEMATICS ADVANCED LEVEL INTRODUCTION This syllabus serves to examine candidates knowledge and skills in introductory mathematical and statistical methods, and their applications. For applications

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009 Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy BMI Paper The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy Faculty of Sciences VU University Amsterdam De Boelelaan 1081 1081 HV Amsterdam Netherlands Author: R.D.R.

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

PeakVue Analysis for Antifriction Bearing Fault Detection

PeakVue Analysis for Antifriction Bearing Fault Detection August 2011 PeakVue Analysis for Antifriction Bearing Fault Detection Peak values (PeakVue) are observed over sequential discrete time intervals, captured, and analyzed. The analyses are the (a) peak values

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1 Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Georgia Department of Education Kathy Cox, State Superintendent of Schools 7/19/2005 All Rights Reserved 1

Georgia Department of Education Kathy Cox, State Superintendent of Schools 7/19/2005 All Rights Reserved 1 Accelerated Mathematics 3 This is a course in precalculus and statistics, designed to prepare students to take AB or BC Advanced Placement Calculus. It includes rational, circular trigonometric, and inverse

More information

Variables Control Charts

Variables Control Charts MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

More information

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Qian Wu, Yahui Wang, Long Zhang and Li Shen Abstract Building electrical system fault diagnosis is the

More information

Synthesis of Four Bar Mechanism for Polynomial Function Generation by Complex Algebra

Synthesis of Four Bar Mechanism for Polynomial Function Generation by Complex Algebra Synthesis of Four Bar Mechanism for Polynomial Function Generation by Complex Algebra Mrs. Tejal Patel Mechanical Department Nirma Institute of Technology, Ahmedabad, India 09mme015@nirmauni.ac.in Prof.M.M.Chauhan

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Aachen Summer Simulation Seminar 2014

Aachen Summer Simulation Seminar 2014 Aachen Summer Simulation Seminar 2014 Lecture 07 Input Modelling + Experimentation + Output Analysis Peer-Olaf Siebers pos@cs.nott.ac.uk Motivation 1. Input modelling Improve the understanding about how

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Statistical estimation using confidence intervals

Statistical estimation using confidence intervals 0894PP_ch06 15/3/02 11:02 am Page 135 6 Statistical estimation using confidence intervals In Chapter 2, the concept of the central nature and variability of data and the methods by which these two phenomena

More information

A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM

A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Journal of Computational Information Systems 10: 17 (2014) 7629 7635 Available at http://www.jofcis.com A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Tian

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information

Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

Least-Squares Intersection of Lines

Least-Squares Intersection of Lines Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

More information

Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

More information

Proceeding of 5th International Mechanical Engineering Forum 2012 June 20th 2012 June 22nd 2012, Prague, Czech Republic

Proceeding of 5th International Mechanical Engineering Forum 2012 June 20th 2012 June 22nd 2012, Prague, Czech Republic Modeling of the Two Dimensional Inverted Pendulum in MATLAB/Simulink M. Arda, H. Kuşçu Department of Mechanical Engineering, Faculty of Engineering and Architecture, Trakya University, Edirne, Turkey.

More information

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Adaptive feature selection for rolling bearing condition monitoring

Adaptive feature selection for rolling bearing condition monitoring Adaptive feature selection for rolling bearing condition monitoring Stefan Goreczka and Jens Strackeljan Otto-von-Guericke-Universität Magdeburg, Fakultät für Maschinenbau Institut für Mechanik, Universitätsplatz,

More information

RANDOM VIBRATION AN OVERVIEW by Barry Controls, Hopkinton, MA

RANDOM VIBRATION AN OVERVIEW by Barry Controls, Hopkinton, MA RANDOM VIBRATION AN OVERVIEW by Barry Controls, Hopkinton, MA ABSTRACT Random vibration is becoming increasingly recognized as the most realistic method of simulating the dynamic environment of military

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Syllabus for the TEMPUS SEE PhD Course (Podgorica, April 4 29, 2011) Franz Kappel 1 Institute for Mathematics and Scientific Computing University of Graz Žaneta Popeska 2 Faculty

More information

Leveraging Ensemble Models in SAS Enterprise Miner

Leveraging Ensemble Models in SAS Enterprise Miner ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Advantages of Auto-tuning for Servo-motors

Advantages of Auto-tuning for Servo-motors Advantages of for Servo-motors Executive summary The same way that 2 years ago computer science introduced plug and play, where devices would selfadjust to existing system hardware, industrial motion control

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Empirical Model-Building and Response Surfaces

Empirical Model-Building and Response Surfaces Empirical Model-Building and Response Surfaces GEORGE E. P. BOX NORMAN R. DRAPER Technische Universitat Darmstadt FACHBEREICH INFORMATIK BIBLIOTHEK Invortar-Nf.-. Sachgsbiete: Standort: New York John Wiley

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information