DOEACC Society Ministry of C&IT, Govt. of India. PSIT, Kanpur, U.P., India

Size: px
Start display at page:

Download "DOEACC Society Ministry of C&IT, Govt. of India. PSIT, Kanpur, U.P., India"

Transcription

1 Necessity of Goodness of Fit Tests in Research and Development 1 Sanjeev Kumar Jha, 2 Dr. A.K.D.Dwivedi, 3 Dr. Amod Tiwari 1,2 DOEACC Society Ministry of C&IT, Govt. of India 3 PSIT, Kanpur, U.P., India Abstract There are various life data distributions which can be used for reliability and other statistical analysis. The main problem in these types of analysis is identification of distributions to be fitted for collected data. Thus it is essential to identify appropriate distribution to be fitted for given data. Goodness of fit Test is a technique which is used for identification of appropriate distribution to be fitted for given data. In this paper Goodness of fit Test is discussed in detail. Initially theoretical background of this test is explained and then entire test is applied on live data collected from online bug repositories for Linux kernel 2.6 series. As there are different sub versions of Linux Kernel, thus these records are further grouped on the basis of sub versions and different samples are constructed. On each of these samples all three important methods for goodness of fit test, and Chi-squared Tests are used and result is analyzed. On the basis of these results necessity of goodness of fit test is explained. Keywords Goodness of Test, Open Source, Reliability, Linux Kernel, Distribution. I. Introduction The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions or whether outcome frequencies follow a specified distribution. There are various methods used for goodness of fit test. Most important among them are as given below: A. Test [1] This test is used to decide if a sample comes from a hypothesized continuous distribution. It is based on the empirical cumulative distribution function (ECDF). Assume that we have a random sample x 1,.,x n from some distribution with CDF F(x). The empirical CDF is denoted by 1. Definition The statistic (D) is based on the largest vertical difference between the theoretical and the empirical cumulative distribution function: 2. Hypothesis Testing [4] The null and the alternative hypotheses are: (1) (2) H0: the data follow the specified distribution. HA: the data do not follow the specified distribution. The hypothesis regarding the distributional form is rejected at the chosen significance level (α) if the test statistic, D, is greater than the critical value obtained from a table. The fixed values of α (0.01, 0.05 etc.) are generally used to evaluate the null hypothesis (H0) at various significance levels. A value of 0.05 is typically used for most applications, however, in some critical industries; a lower (α) value may be applied. The standard tables of critical values used for this test are only valid when testing whether a data set is from a completely specified distribution. If one or more distribution parameters are estimated, the results will be conservative: the actual significance level will be smaller than that given by the standard tables and the probability that the fit will be rejected in error will be lower. 3. The P-value, in contrast to fixed α values, is calculated based on the test statistic, and denotes the threshold value of the significance level in the sense that the null hypothesis (H0) will be accepted for all values of α less than the P-value. For example, if P=0.025, the null hypothesis will be accepted at all significance levels less than P (i.e and 0.02), and rejected at higher levels, including 0.05 and 0.1. The P-value can be useful in particular, when the null hypothesis is rejected at all predefined significance levels, and you need to know at which level it could be accepted. B. Test [3] The procedure is a general test to compare the fit of an observed cumulative distribution function to an expected cumulative distribution function. This test gives more weight to the tails than the test. 1. Definition The statistic (A2) is defined as 2. Hypothesis Testing The null and the alternative hypotheses are: H0: the data follow the specified distribution. H A : the data do not follow the specified distribution. The hypothesis regarding the distributional form is rejected at the chosen significance level (α) if the test statistic, A 2, is greater than the critical value obtained from a table. The fixed values of α (0.01, 0.05 etc.) are generally used to evaluate the null hypothesis (H0) at various significance levels. A value of 0.05 is typically used for most applications however; in some critical industries a lower α value may be applied. In general, critical values of the Anderson- Darling test statistic depend on the specific distribution being tested. However, tables of critical values for many distributions (except several the most widely used ones) are not easy to find. The test implemented in Easy Fit uses the same critical values for all distributions. These values are calculated (3) International Journal of Computer Science and Technology 135

2 using the approximation formula, and depend on the sample size only. This kind of test (compared to the original A-D test) is less likely to reject the good fit, and can be successfully used to compare the goodness of fit of several fitted distributions. C. Test [2] The test is used to determine if a sample comes from a population with a specific distribution. This test is applied to binned data, so the value of the test statistic depends on how the data is binned. This test is used for continuous sample data only. Although there is no optimal choice for the number of bins (k), there are several formulas which can be used to calculate this number based on the sample size (N). For example, Easy Fit employs the following empirical formula: k=1+log 2 N. The data can be grouped into intervals of equal probability or equal width. Each bin should contain at least 5 or more data points, so certain adjacent bins sometimes need to be joined together for this condition to be satisfied. 1. Definition The statistic is defined as: (4) where O i is the observed frequency for bin i, and E i is the expected frequency for bin i calculated by E i = F( x2 )-F( x1 ),where F is the CDF[6] of the probability distribution being tested, and x 1, x 2 are the limits for bin i. 2. Hypothesis Testing The null and the alternative hypotheses are: H 0 : the data follow the specified distribution. H A : the data do not follow the specified distribution. The hypothesis regarding the distributional form is rejected at the chosen significance level (α) if the test statistic is greater than the critical value defined as meaning the inverse CDF with k-1 degrees of freedom and a significance level of α. Though the number of degrees of freedom can be calculated as k-c-1 (where c is the number of estimated parameters), Easy Fit calculates it as k-1 since this kind of test is least likely to reject the fit in error. The fixed values of α (0.01, 0.05 etc.) are generally used to evaluate the null hypothesis (H 0 ) at various significance levels. A value of 0.05 is typically used for most applications, however, in some critical industries; a lower α value may be applied. 3. The P-value [6], in contrast to fixed α values, is calculated based on the test statistic, and denotes the threshold value of the significance level in the sense that the null hypothesis (H 0 ) will be accepted for all values of α less than the P-value. For example, if P=0.025, the null hypothesis will be accepted at all significance levels less than P (i.e and 0.02), and rejected at higher levels, including 0.05 and 0.1. The P-value can be useful; in particular, when the null hypothesis is rejected at all predefined significance levels, and you need to know at which level it could be accepted. Easy Fit displays the P-values based on the test statistics (χ 2 ) calculated for each fitted distribution. In this research goodness of fit tests are applied on failure data of Linux Kernel 2.6 series collected from various bug repositories. For statistical calculations Easy Fit which can be easily incorporated with Microsoft Excel 2007 is used. 136 International Journal of Computer Science and Technology II. Related Work The concept of goodness of fit is used for identification of statistical model for different analysis. In research area this concept is used in almost all the fields directly or indirectly. Here we have studied several papers and discussed the situations where this test is used. In [3] authors have used test to identify whether Weibull Distribution is suitable for collected sample or not. In [4] Test is used to identify whether the selected distribution is suitable for sample or not. In [7] Bayesian approach for software reliability is discussed. In this paper authors have used Test to identify the best distributions. In [8] user oriented reliability model is derived and used. In [17] authors have used Weibull distribution for construction of reliability model. A. Problem Identification After a detail study of research papers, articles and books related to reliability and other statistical analysis it has been found that in maximum of researches either a single goodness of fit test is applied or a particular distribution is selected randomly. Very little importance is given to Goodness of fit test due to which sometimes researchers got unexpected results. The main problems in all these researches are: If a distribution is accepted by using a particular goodness of fit test will it be accepted by all other tests? Can we select a particular distribution randomly for fitting on a given sample and hence for statistical analysis? Here in this research to answer above questions Goodness of fit test is applied on collected samples and result is analyzed. III. Research Methodology used To answer above questions we have collected failure data of Linux kernel 2.6 series from online bug repositories Bugzilla [18]. Total of 2821 records were collected. Initially these data were in raw format. During initial analysis phase all irrelevant records were deleted and finally we have selected 1600 records. The preprocessed data is stored in Mysql Table. The structure of table is as given below: Field Data Type Purpose Bug_Id Opendate Bug_ Severity Int Date Varchar Identification Number (Primary Key) Date of submission of the bug How severe is this bug Product Varchar Module of the kernel Component Varchar Sub module of the kernel cf_kernel_ version ISSN : (Print) ISSN : (Online) Varchar Sub Version of the kernel It is clear from above that there are different sub versions of Linux kernel 2.6 series with different release date. Hence minimum value of Open date for each of the sub versions will be different. Thus on the basis of kernel sub versions, the entire sample is subdivided into different sub samples and then goodness of fit test is applied on each of these sub samples. Separate samples are constructed for each of the kernel subversions from to Here in this paper to discuss the necessity of goodness of fit only versions is considered and discussed in detail. Time to Failure Data

3 for each of these two samples is calculated and goodness of fit test is applied on these data. For mathematical calculation Easy fit 5.5 software (which is very light weight software) with Microsoft Excel 2007 is used. Goodness of fit Test for both the samples are discussed here. A. Sub Version [Release Date: 24th January 2008] This sub version of the Linux Kernel was released on 24th January 2008 [19]. For Linux kernel series we have total of preprocessed bugs. Time to Failure Data in terms of week is as given below: Table 1 : TTFDATA Goodness of Fit Test is applied for collected sample data mentioned in Table 1. For goodness of fit test all important life data distributions are tested by using all three tests: Kolmogorov Smirnov Test, Anderson Darling Test and Chi Square Test. All above tests are applied on 1%, 2%, 5%, 10% and 20% level of significance. The detail of the test is elaborated in Table2. Table 2: Goodness of Fit Detail Kernel Fatigue Life Distribution Critical Value Deg. of freedom Fatigue Life (3P) Frechet Deg. of freedom E-5 Critical Value Frechet (3P) Deg. of freedom International Journal of Computer Science and Technology 137

4 Gamma E-5 Gamma (3P) Gen. Gamma E-4 Gen. Gamma (4P) Reject? Yes Yes Yes No No Gumbel Max Deg. of freedom E-4 Critical Value Gumbel Min ISSN : (Print) ISSN : (Online) E International Journal of Computer Science and Technology

5 Deg. of freedom E-6 Critical Value Kumaraswamy E Log-Logistic Log-Logistic (3P) Logistic E Lognormal Lognormal (3P) International Journal of Computer Science and Technology 139

6 Normal E Deg. of freedom Critical Value Reject? Yes Yes Yes Yes No Weibull Reject? Yes Yes Yes No No Weibull (3P) 140 International Journal of Computer Science and Technology Rank ISSN : (Print) ISSN : (Online) IV Conclusions After analyzing the results of Table 2 it has been found that there are various distributions available which are accepted by using one test while rejected by other test. Let us take the case of Gen. Gamma Distribution with 4 parameters, this distribution is rejected by Test at all level of significance but accepted by having at 1% and 2% level of significance as well as by Chi square test having P value at all level of significance. Thus here it is clear that merely acceptance by a single test does not guarantee its acceptance by all other tests. Similarly Weibull distribution with 3 parameters which is one of widely used life data distribution is rejected by Anderson darling but it is accepted by Chi Square Test. Thus without proper goodness of fit test one should not decide about the distribution to be fitted. We should consider those distributions for fitting which is accepted by maximum number of tests at maximum level of significances. Here for this version of the kernel Frechet distribution with 3 parameters is considered as best fit. Thus it indicates necessity of goodness of fit test in every research. If it is not considered properly normally we get unexpected results. Thus we must use goodness of fit test properly for decision about distribution to be fitted. This paper will be useful for researchers from all the fields. References [1] " Test", [Online] Available : en.wikipedia.org/wiki/kolmogorovsmirnov_test. [2] "General Concept of Goodness of Fit". [Online] Available : htm. [3], "A Goodness of Fit Test for Small Sample", [Online] Available : pdf/a_dtest.pdf [4] J.L., RAC START, " GOF Test Romeu: A Publication of the Reliability Centre", Volume 10 Number 6. [5] On line Manual: Easy Fit 5.5 [6] Rohatagi, V.K., "An Introduction to Probability Theory and Mathematical s", Willey. [7] Littlewood, Verrall, A Bayesian Reliability Growth Model for Computer Software.

7 [8] R.C. Cheung, A user-oriented software reliability model, IEEE Transactions on Software Engineering, vol. 6, no. 2 [9] J.De.S Coutinho, Software reliability growth. IEEE Symposium on Computer Software Reliability, [10] A.L. Goel, K. Okumoto, A time-dependent error-detection rate model for software reliability and other performance measure, IEEE Transactions on Reliability, vol. R-28, [11] S.S. Gokhale, M.R. Lyu, K.S. Trivedi, Reliability simulation of component-based software systems, Proceedings of 9th Int l Symposium on Software Reliability Engineering, [12] IEEE Reliability Society, IEEE recommended practice on software reliability, IEEE Std , June [13] Littlewood, J.L. Verrall, A bayesian reliability model with a stochastically monotone failure rate, IEEE Transactions on Reliability, vol. R-23, June [14] A. Mockus, T.R. Fielding, J.D. Herbsleb, Two case studies of open source software development: Apache and Mozilla, ACM Transactions on Software Engineering and Methodology, vol. 11, no. 3, July [15] T.R. Moss, The Reliability Data Handbook, ASME Press, [16] J.D. Musa, K. Okumoto, A logarithmic poisson execution time model for software reliability measurement, 7th Int l Conference on Software Engineering (ICSE), [17] Cobra Rahmani, Harvey Siy, Azad Azadmanesh An experimental Analysis of open Source Software Reliability. [18] "On line bug repositories Bugzilla", [Online] Available : [19] Linux Kernel, [Online] Available : org. Sanjeev Kumar Jha, Senior Systems Analyst, DOEACC Society Chandigarh B.O: Lucknow, Ministry of CO&IT Government of India. Bachelors and Masters Degree in s [Honors] from Patna University and presently doing his PhD in Computer Science from Singhania University. He has attended 2 international Conferences. Area of research interest is Reliability and Open Source Software Dr. A.K.D. Dwivedi, Director, DOEACC Society Chandigarh, Ministry of C&IT Government of India. M.Tech from Allahabad University and PhD from Gorakhpur University. Attended many international and national Conferences. Published many papers in international and national journals. Executed many government projects successfully. Area of research interest is Signal Processing, Reliability and Open Source Software. Dr. Amod Tiwari, Associate Professor, PSIT Kanpur. PhD from IIT Kanpur. Attended many international and national Conferences. Published many papers in international and national journals. Area of research interest is Image Processing, Reliability and Open Source Software. International Journal of Computer Science and Technology 141

An Experimental Analysis of Open Source Software Reliability*

An Experimental Analysis of Open Source Software Reliability* An Experimental Analysis of Open Source Software Reliability* Cobra Rahmani, Harvey Siy, Azad Azadmanesh College of Information Science & Technology University of Nebraska-Omaha Omaha, U.S. E-mail: (crahmani,

More information

START Selected Topics in Assurance

START Selected Topics in Assurance START Selected Topics in Assurance Related Technologies Table of Contents Introduction Some Statistical Background Fitting a Normal Using the Anderson Darling GoF Test Fitting a Weibull Using the Anderson

More information

A Comparative Analysis of Open Source Software Reliability

A Comparative Analysis of Open Source Software Reliability 384 JOURNAL OF SOFTWARE, VOL. 5, NO. 2, DECEMBER 2 A Comparative Analysis of Open Source Software Reliability Cobra Rahmani, Azad Azadmanesh and Lotfollah Najjar College of Information Science & Technology

More information

Exploratory Failure Analysis of Open Source Software 1

Exploratory Failure Analysis of Open Source Software 1 Exploratory Failure Analysis of Open Source Software Cobra Rahmani, Satish M. Srinivasan, Azad Azadmanesh College of Information Science & Technology University of Nebraska-Omaha, Omaha, U.S. e-mail: {crahmani,

More information

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

12.5: CHI-SQUARE GOODNESS OF FIT TESTS 125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

More information

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Optimal parameter choice in modeling of ERP system reliability

Optimal parameter choice in modeling of ERP system reliability Proceedings of the 22nd Central European Conference on Information and Intelligent Systems 365 Optimal parameter choice in modeling of ERP system reliability Frane Urem, Želimir Mikulić Department of management

More information

35 th Design Automation Conference Copyright 1998 ACM

35 th Design Automation Conference Copyright 1998 ACM Design Reliability - Estimation through Statistical Analysis of Bug Discovery Data Yossi Malka, Avi Ziv IBM Research Lab in Haifa MATAM Haifa 3195, Israel email: fyossi, aziv@vnet.ibm.comg Abstract Statistical

More information

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i. Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

More information

PROCESS FLOW IMPROVEMENT PROPOSAL OF A BATCH MANUFACTURING SYSTEM USING ARENA SIMULATION MODELING

PROCESS FLOW IMPROVEMENT PROPOSAL OF A BATCH MANUFACTURING SYSTEM USING ARENA SIMULATION MODELING PROCESS FLOW IMPROVEMENT PROPOSAL OF A BATCH MANUFACTURING SYSTEM USING ARENA SIMULATION MODELING Chowdury M. L. RAHMAN 1 Shafayet Ullah SABUJ 2 Abstract: Simulation is frequently the technique of choice

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Mobile Systems Security. Randomness tests

Mobile Systems Security. Randomness tests Mobile Systems Security Randomness tests Prof RG Crespo Mobile Systems Security Randomness tests: 1/6 Introduction (1) [Definition] Random, adj: lacking a pattern (Longman concise English dictionary) Prof

More information

Distribution is a χ 2 value on the χ 2 axis that is the vertical boundary separating the area in one tail of the graph from the remaining area.

Distribution is a χ 2 value on the χ 2 axis that is the vertical boundary separating the area in one tail of the graph from the remaining area. Section 8 4B Finding Critical Values for a Chi Square Distribution The entire area that is to be used in the tail(s) denoted by. The entire area denoted by can placed in the left tail and produce a Critical

More information

TEACHING SIMULATION WITH SPREADSHEETS

TEACHING SIMULATION WITH SPREADSHEETS TEACHING SIMULATION WITH SPREADSHEETS Jelena Pecherska and Yuri Merkuryev Deptartment of Modelling and Simulation Riga Technical University 1, Kalku Street, LV-1658 Riga, Latvia E-mail: merkur@itl.rtu.lv,

More information

Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170 Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label

More information

Modeling Individual Claims for Motor Third Party Liability of Insurance Companies in Albania

Modeling Individual Claims for Motor Third Party Liability of Insurance Companies in Albania Modeling Individual Claims for Motor Third Party Liability of Insurance Companies in Albania Oriana Zacaj Department of Mathematics, Polytechnic University, Faculty of Mathematics and Physics Engineering

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Jan Krhovják Basic Idea Behind the Statistical Tests Generated random sequences properties as sample drawn from uniform/rectangular

More information

Empirical Study of effect of using Weibull. NIFTY index options

Empirical Study of effect of using Weibull. NIFTY index options Empirical Study of effect of using Weibull distribution in Black Scholes Formula on NIFTY index options 11 th Global Conference of Actuaries Harsh Narang Vatsala Deora Overview NIFTY NSE index options

More information

How To Calculate The Power Of A Cluster In Erlang (Orchestra)

How To Calculate The Power Of A Cluster In Erlang (Orchestra) Network Traffic Distribution Derek McAvoy Wireless Technology Strategy Architect March 5, 21 Data Growth is Exponential 2.5 x 18 98% 2 95% Traffic 1.5 1 9% 75% 5%.5 Data Traffic Feb 29 25% 1% 5% 2% 5 1

More information

Source Anonymity in Sensor Networks

Source Anonymity in Sensor Networks Source Anonymity in Sensor Networks Bertinoro PhD. Summer School, July 2009 Radha Poovendran Network Security Lab Electrical Engineering Department University of Washington, Seattle, WA http://www.ee.washington.edu/research/nsl

More information

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

More information

Monitoring Software Reliability using Statistical Process Control: An Ordered Statistics Approach

Monitoring Software Reliability using Statistical Process Control: An Ordered Statistics Approach Monitoring Software Reliability using Statistical Process Control: An Ordered Statistics Approach Bandla Srinivasa Rao Associate Professor. Dept. of Computer Science VRS & YRN College Dr. R Satya Prasad

More information

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1. **BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

More information

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 11, November 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U Previous chapters of this text have explained the procedures used to test hypotheses using interval data (t-tests and ANOVA s) and nominal

More information

Inference on the parameters of the Weibull distribution using records

Inference on the parameters of the Weibull distribution using records Statistics & Operations Research Transactions SORT 39 (1) January-June 2015, 3-18 ISSN: 1696-2281 eissn: 2013-8830 www.idescat.cat/sort/ Statistics & Operations Research c Institut d Estadstica de Catalunya

More information

I. INTRODUCTION OBJECTIVES OF THE STUDY

I. INTRODUCTION OBJECTIVES OF THE STUDY ISSN: 2349-7637 (Online) RESEARCH HUB International Multidisciplinary Research Journal (RHIMRJ) Research Paper Available online at: www.rhimrj.com An Analysis of Working Capital Trends P. R. Halani 1st

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

( ) = 1 x. ! 2x = 2. The region where that joint density is positive is indicated with dotted lines in the graph below. y = x

( ) = 1 x. ! 2x = 2. The region where that joint density is positive is indicated with dotted lines in the graph below. y = x Errata for the ASM Study Manual for Exam P, Eleventh Edition By Dr. Krzysztof M. Ostaszewski, FSA, CERA, FSAS, CFA, MAAA Web site: http://www.krzysio.net E-mail: krzysio@krzysio.net Posted September 21,

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Crosstabulation & Chi Square

Crosstabulation & Chi Square Crosstabulation & Chi Square Robert S Michael Chi-square as an Index of Association After examining the distribution of each of the variables, the researcher s next task is to look for relationships among

More information

The Chi-Square Test. STAT E-50 Introduction to Statistics

The Chi-Square Test. STAT E-50 Introduction to Statistics STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed

More information

Variables Control Charts

Variables Control Charts MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

More information

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

People have thought about, and defined, probability in different ways. important to note the consequences of the definition: PROBABILITY AND LIKELIHOOD, A BRIEF INTRODUCTION IN SUPPORT OF A COURSE ON MOLECULAR EVOLUTION (BIOL 3046) Probability The subject of PROBABILITY is a branch of mathematics dedicated to building models

More information

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Presented by Work done with Roland Bürgi and Roger Iles New Views on Extreme Events: Coupled Networks, Dragon

More information

C. Wohlin, "Managing Software Quality through Incremental Development and Certification", In Building Quality into Software, pp. 187-202, edited by

C. Wohlin, Managing Software Quality through Incremental Development and Certification, In Building Quality into Software, pp. 187-202, edited by C. Wohlin, "Managing Software Quality through Incremental Development and Certification", In Building Quality into Software, pp. 187-202, edited by M. Ross, C. A. Brebbia, G. Staples and J. Stapleton,

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other 1 Hypothesis Testing Richard S. Balkin, Ph.D., LPC-S, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric

More information

Performance evaluation of large-scale data processing systems

Performance evaluation of large-scale data processing systems Proceedings of the 7 th International Conference on Applied Informatics Eger, Hungary, January 28 31, 2007. Vol. 1. pp. 295 301. Performance evaluation of large-scale data processing systems Attila Adamkó,

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

More information

Skewness and Kurtosis in Function of Selection of Network Traffic Distribution

Skewness and Kurtosis in Function of Selection of Network Traffic Distribution Acta Polytechnica Hungarica Vol. 7, No., Skewness and Kurtosis in Function of Selection of Network Traffic Distribution Petar Čisar Telekom Srbija, Subotica, Serbia, petarc@telekom.rs Sanja Maravić Čisar

More information

Statistical Functions in Excel

Statistical Functions in Excel Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

More information

Weight of Evidence Module

Weight of Evidence Module Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,

More information

A Selective Survey and direction on the software of Reliability Models

A Selective Survey and direction on the software of Reliability Models A Selective Survey and direction on the software of Reliability Models Vipin Kumar Research Scholar, S.M. Degree College, Chandausi Abstract: Software development, design and testing have become very intricate

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

Simulating Chi-Square Test Using Excel

Simulating Chi-Square Test Using Excel Simulating Chi-Square Test Using Excel Leslie Chandrakantha John Jay College of Criminal Justice of CUNY Mathematics and Computer Science Department 524 West 59 th Street, New York, NY 10019 lchandra@jjay.cuny.edu

More information

Testing Random- Number Generators

Testing Random- Number Generators Testing Random- Number Generators Raj Jain Washington University Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse574-08/

More information

Comparing Multiple Proportions, Test of Independence and Goodness of Fit

Comparing Multiple Proportions, Test of Independence and Goodness of Fit Comparing Multiple Proportions, Test of Independence and Goodness of Fit Content Testing the Equality of Population Proportions for Three or More Populations Test of Independence Goodness of Fit Test 2

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

THE PROCESS CAPABILITY ANALYSIS - A TOOL FOR PROCESS PERFORMANCE MEASURES AND METRICS - A CASE STUDY

THE PROCESS CAPABILITY ANALYSIS - A TOOL FOR PROCESS PERFORMANCE MEASURES AND METRICS - A CASE STUDY International Journal for Quality Research 8(3) 399-416 ISSN 1800-6450 Yerriswamy Wooluru 1 Swamy D.R. P. Nagesh THE PROCESS CAPABILITY ANALYSIS - A TOOL FOR PROCESS PERFORMANCE MEASURES AND METRICS -

More information

Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,

More information

How to Measure Software Quality in Vain

How to Measure Software Quality in Vain Empirical Estimates of Software Availability in Deployed Systems Avaya Labs Audris Mockus audris@mockus.org Avaya Labs Research Basking Ridge, NJ 07920 http://www.research.avayalabs.com/user/audris Motivation

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Estimating Industry Multiples

Estimating Industry Multiples Estimating Industry Multiples Malcolm Baker * Harvard University Richard S. Ruback Harvard University First Draft: May 1999 Rev. June 11, 1999 Abstract We analyze industry multiples for the S&P 500 in

More information

Diagnosis of Students Online Learning Portfolios

Diagnosis of Students Online Learning Portfolios Diagnosis of Students Online Learning Portfolios Chien-Ming Chen 1, Chao-Yi Li 2, Te-Yi Chan 3, Bin-Shyan Jong 4, and Tsong-Wuu Lin 5 Abstract - Online learning is different from the instruction provided

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Conditional probability of actually detecting a financial fraud a neutrosophic extension to Benford s law

Conditional probability of actually detecting a financial fraud a neutrosophic extension to Benford s law Conditional probability of actually detecting a financial fraud a neutrosophic extension to Benford s law Sukanto Bhattacharya Alaska Pacific University, USA Kuldeep Kumar Bond University, Australia Florentin

More information

This PDF is a selection from a published volume from the National Bureau of Economic Research. Volume Title: The Risks of Financial Institutions

This PDF is a selection from a published volume from the National Bureau of Economic Research. Volume Title: The Risks of Financial Institutions This PDF is a selection from a published volume from the National Bureau of Economic Research Volume Title: The Risks of Financial Institutions Volume Author/Editor: Mark Carey and René M. Stulz, editors

More information

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem Time on my hands: Coin tosses. Problem Formulation: Suppose that I have

More information

A logistic approximation to the cumulative normal distribution

A logistic approximation to the cumulative normal distribution A logistic approximation to the cumulative normal distribution Shannon R. Bowling 1 ; Mohammad T. Khasawneh 2 ; Sittichai Kaewkuekool 3 ; Byung Rae Cho 4 1 Old Dominion University (USA); 2 State University

More information

Measurement and Modelling of Internet Traffic at Access Networks

Measurement and Modelling of Internet Traffic at Access Networks Measurement and Modelling of Internet Traffic at Access Networks Johannes Färber, Stefan Bodamer, Joachim Charzinski 2 University of Stuttgart, Institute of Communication Networks and Computer Engineering,

More information

Distribution Analysis

Distribution Analysis Finding the best distribution that explains your data ENMAX Energy Corporation 8 October, 2015 Introduction Introduction Statistical tests Goodness of fit We often fit observations to a model (e.g., lognormal

More information

COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore*

COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore* COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore* The data collection phases for evaluation designs may involve

More information

Operational Risk Modeling *

Operational Risk Modeling * Theoretical and Applied Economics Volume XVIII (2011), No. 6(559), pp. 63-72 Operational Risk Modeling * Gabriela ANGHELACHE Bucharest Academy of Economic Studies anghelache@cnvmr.ro Ana Cornelia OLTEANU

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Dynamical Simulation Models for the Development Process of Open Source Software Projects

Dynamical Simulation Models for the Development Process of Open Source Software Projects Dynamical Simulation Models for the Development Process of Open Source Software Proects I. P. Antoniades, I. Stamelos, L. Angelis and G.L. Bleris Department of Informatics, Aristotle University 54124 Thessaloniki,

More information

Vulnerability Discovery in Multi-Version Software Systems

Vulnerability Discovery in Multi-Version Software Systems 1th IEEE High Assurance Systems Engineering Symposium Vulnerability Discovery in Multi-Version Software Systems Jinyoo Kim, Yashwant K. Malaiya, Indrakshi Ray Computer Science Department Colorado State

More information

Probability Calculator

Probability Calculator Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

T test as a parametric statistic

T test as a parametric statistic KJA Statistical Round pissn 2005-619 eissn 2005-7563 T test as a parametric statistic Korean Journal of Anesthesiology Department of Anesthesia and Pain Medicine, Pusan National University School of Medicine,

More information

Goodness of Fit. Proportional Model. Probability Models & Frequency Data

Goodness of Fit. Proportional Model. Probability Models & Frequency Data Probability Models & Frequency Data Goodness of Fit Proportional Model Chi-square Statistic Example R Distribution Assumptions Example R 1 Goodness of Fit Goodness of fit tests are used to compare any

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

MATH2740: Environmental Statistics

MATH2740: Environmental Statistics MATH2740: Environmental Statistics Lecture 6: Distance Methods I February 10, 2016 Table of contents 1 Introduction Problem with quadrat data Distance methods 2 Point-object distances Poisson process case

More information

RELIABILITY IMPROVEMENT WITH PSP OF WEB-BASED SOFTWARE APPLICATIONS

RELIABILITY IMPROVEMENT WITH PSP OF WEB-BASED SOFTWARE APPLICATIONS RELIABILITY IMPROVEMENT WITH PSP OF WEB-BASED SOFTWARE APPLICATIONS Leticia Dávila-Nicanor, Pedro Mejía-Alvarez CINVESTAV-IPN. Sección de Computación ldavila@yahoo.com.mx, pmejia@cs.cinvestav.mx ABSTRACT

More information

The Study of Relationship between Customer Relationship Management, Patrons, and Profitability (A Case Study: all Municipals of Kurdistan State)

The Study of Relationship between Customer Relationship Management, Patrons, and Profitability (A Case Study: all Municipals of Kurdistan State) International Journal of Basic Sciences & Applied Research. Vol., 3 (SP), 316-320, 2014 Available online at http://www.isicenter.org ISSN 2147-3749 2014 The Study of Relationship between Customer Relationship

More information

Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails. Chi-square Goodness of Fit Test The chi-square test is designed to test differences whether one frequency is different from another frequency. The chi-square test is designed for use with data on a nominal

More information

Validating Market Risk Models: A Practical Approach

Validating Market Risk Models: A Practical Approach Validating Market Risk Models: A Practical Approach Doug Gardner Wells Fargo November 2010 The views expressed in this presentation are those of the author and do not necessarily reflect the position of

More information

Chi Square Tests. Chapter 10. 10.1 Introduction

Chi Square Tests. Chapter 10. 10.1 Introduction Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS)

MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) R.KAVITHA KUMAR Department of Computer Science and Engineering Pondicherry Engineering College, Pudhucherry, India DR. R.M.CHADRASEKAR Professor,

More information

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010 MONT 07N Understanding Randomness Solutions For Final Examination May, 00 Short Answer (a) (0) How are the EV and SE for the sum of n draws with replacement from a box computed? Solution: The EV is n times

More information

Learning and Researching with Open Source Software

Learning and Researching with Open Source Software Learning and Researching with Open Source Software Minghui Zhou zhmh@pku.edu.cn Associate Professor Peking University Outline A snapshot of Open Source Software (OSS) Learning with OSS Research on OSS

More information