SPC on Ungrouped Data: Power Law Process Model

International Journal of Software Engineering. ISSN 0974-3162 Volume 5, 1 (2014), pp. 7-16
International Research Publication House
http://www.irphouse.com
Paper code: 14647-IJSE

Dr. R. Satya Prasad, Mrs. A. Srisaila and Dr. G. Krishna Mohan

Assoc. Professor, Dept. of CS & Engg., Acharya Nagarjuna University, profrsp@gmail.com
Asst. Professor, V.R. Siddhartha Engg. College, sr.saila@gmail.com
Assoc. Professor, Dept. of CSE, KL University, km_mm_2000@yahoo.com

Abstract

The Duane model, also known as the Power-Law Process model, is a two-parameter Non-Homogeneous Poisson Process model which is widely used in software reliability growth modeling. In this paper, we propose to apply Statistical Process Control (SPC) to monitor the software reliability process. A control mechanism is proposed based on the cumulative observations of failures, which are ungrouped, using the mean value function of the Power Law Process model. The Maximum Likelihood Estimation (MLE) approach is used to obtain point estimates of the unknown parameters of the model. The process is illustrated by applying it to real software failure data.

Keywords: Power-law process, Non-Homogeneous Poisson Process, Maximum Likelihood Estimation, SPC, Point estimation.

1. INTRODUCTION

Many software reliability models have been proposed in the last 40 years to compute the reliability growth of products during the software development phase. These models can be of two types, static and dynamic. A static model uses software metrics to estimate the number of defects in the software. A dynamic model uses the past failure discovery rate during software execution over time to estimate the number of failures. Various software reliability growth models (SRGMs) exist to estimate the expected number of total defects (or failures) or the expected number of remaining defects (or failures).

The goal of software engineering is to produce high quality software at low cost. Because human beings are involved in the development of software, there is a possibility of errors in the software. Statistical Process Control concepts and methods are well suited to identifying and eliminating human errors in the software development process and to improving software reliability. SPC concepts and methods are used to monitor the performance of a software process over time in order to verify that the process remains in a state of statistical control. This helps in finding assignable causes and in achieving long-term improvements in the software process. Software quality and reliability can be achieved by eliminating the causes or improving the software process or its operating procedures (Kimura et al., 1995).

The most popular technique for maintaining process control is control charting. The control chart is one of the seven tools for quality control. Software process control is used to ensure that the quality of the final product will conform to predefined standards. In any process, regardless of how carefully it is maintained, a certain amount of natural variability will always exist. A process is said to be statistically in control when it operates with only chance causes of variation. On the other hand, when assignable causes are present, we say that the process is statistically out of control.

Control charts should be capable of raising an alarm when a shift in the level of one or more parameters of the underlying distribution occurs or when non-random behavior appears. Normally, such a situation will be reflected in the control chart by points plotted outside the control limits or by the presence of specific patterns. The most common non-random patterns are cycles, trends, mixtures and stratification (Koutras et al., 2007). For a process to be in control, the control chart should not have any trend or non-random pattern.

The selection of proper SPC charts is essential to effective statistical process control implementation and use. The SPC chart selection is based on data, situation and need (MacGregor and Kourti, 1995). Chan et al. (2000) proposed a procedure based on the monitoring of cumulative quantity.
This approach has been shown to have a number of advantages: it does not involve the choice of a sample size; it raises fewer false alarms; it can be used in any environment; and it can detect further process improvement. Xie et al. (2002) proposed a t-chart for reliability monitoring where the control limits are defined in such a manner that the process is considered to be out of control when one failure is less than LCL or greater than UCL. The control limits were defined assuming an acceptable false alarm risk α = 0.0027. In Section 5 of the present paper, a method is presented to estimate the parameters and define the limits. Process control is decided by taking the successive differences of mean values.

2. BACKGROUND THEORY

This section presents the theory that underlies NHPP models, the SRGMs under consideration and maximum likelihood estimation for ungrouped data. Let t be a continuous random variable with pdf $f(t; \theta_1, \theta_2, \ldots, \theta_k)$, where $\theta_1, \theta_2, \ldots, \theta_k$ are k unknown constant parameters which need to be estimated, and cdf $F(t)$. The mathematical relationship between the pdf and cdf is given by:

$$f(t) = F'(t)$$

Let a denote the expected number of faults that would be detected given infinite testing time. Then, the mean value function of the NHPP models can be written as:

$$m(t) = a F(t)$$

where F(t) is a cumulative distribution function. The failure intensity function $\lambda(t)$ in the case of NHPP models is given by (Pham, 2006):

$$\lambda(t) = a F'(t)$$

2.1. NHPP MODEL

The Non-Homogeneous Poisson Process (NHPP) based software reliability growth models (SRGMs) have proved to be quite successful in practical software reliability engineering. The main issue in the NHPP model is to determine an appropriate mean value function to denote the expected number of failures experienced up to a certain time point. Model parameters can be estimated by using Maximum Likelihood Estimation (MLE). Various NHPP SRGMs have been built upon various assumptions. Many of the SRGMs assume that each time a failure occurs, the fault that caused it can be immediately removed and no new faults are introduced, which is usually called perfect debugging. Imperfect debugging models have proposed a relaxation of the above assumption.

2.2. POWER LAW PROCESS MODEL

Software reliability growth models (SRGMs) are useful to assess the reliability for quality management and testing-progress control of software development. They have been grouped into two classes of models: concave and S-shaped. The most important thing about both models is that they have the same asymptotic behavior, i.e., the defect detection rate decreases as the number of defects detected (and repaired) increases, and the total number of defects detected asymptotically approaches a finite value.

The Power Law Process model was proposed by Duane (1964). The model has been applied in analyzing failure data of repairable systems. In the software reliability context, this model has been discussed by many authors; see Khoshgoftaar and Woodstock (1991) and Lyu and Nikora (1991). This model is characterized by the following mean value function:

$$m(t) = a t^{b}, \qquad a, b > 0, \; t \ge 0$$

The failure intensity function of the model, which is defined as the derivative of the mean value function m(t), is given by:

$$\lambda(t) = a b t^{b-1}$$

2.3. MAXIMUM LIKELIHOOD ESTIMATION

In much of the literature the preferred method of obtaining parameter estimates is to use the maximum likelihood equations. Likelihood equations are derived from the model equations and the assumptions which underlie these equations. The parameters are then taken to be those values which maximize these likelihood functions. These values are found by taking the partial derivatives of the likelihood function with respect to the model parameters (the maximum likelihood equations) and setting them to zero. Iterative routines are then used to solve these equations. Unfortunately, the SRGM literature offers little advice on which iterative routines to use, and with what starting values. This matters because the accuracy of the parameter estimates, and thus the accuracy of the models themselves, greatly depends on the ability of the iterative search methods used to overcome local optima and find good values for the parameters.

If we conduct an experiment and obtain N independent observations $t_1, t_2, \ldots, t_N$, the likelihood function may be given by the following product:

$$L(t_1, t_2, \ldots, t_N \mid \theta_1, \theta_2, \ldots, \theta_k) = \prod_{i=1}^{N} f(t_i; \theta_1, \theta_2, \ldots, \theta_k)$$

The likelihood function using $\lambda(t)$ is:

$$L = e^{-m(t_N)} \prod_{i=1}^{N} \lambda(t_i)$$

The log likelihood function for ungrouped data [10] is given as:

$$\log L = \sum_{i=1}^{N} \log[\lambda(t_i)] - m(t_N)$$

The maximum likelihood estimators (MLE) of $\theta_1, \theta_2, \ldots, \theta_k$ are obtained by maximizing L or $\Lambda$, where $\Lambda = \ln L$. By maximizing $\Lambda$, which is much easier to work with than L, the maximum likelihood estimators of $\theta_1, \theta_2, \ldots, \theta_k$ are the simultaneous solutions of the k equations:

$$\frac{\partial \Lambda}{\partial \theta_j} = 0, \qquad j = 1, 2, \ldots, k$$

3. ILLUSTRATION: PARAMETER ESTIMATION

We used cumulative time between failures data for software reliability monitoring. The use of cumulative quantity is a different and new approach, which is of particular advantage in reliability. Using the estimators of a and b we can compute m(t). The likelihood function of the Power Law Process model is given as:

$$L = e^{-a t_N^{b}} \prod_{i=1}^{N} a b t_i^{b-1} \qquad (3.1)$$

Taking the natural logarithm on both sides, the log likelihood function is given as:

$$\log L = \sum_{i=1}^{N} \log\left(a b t_i^{b-1}\right) - a t_N^{b} = \sum_{i=1}^{N} \left[\log a + \log b + (b-1) \log t_i\right] - a t_N^{b} \qquad (3.2)$$

Taking the partial derivative of log L with respect to a and equating it to 0 gives $N/a - t_N^{b} = 0$, so that:

$$\hat{a} = \frac{N}{t_N^{\hat{b}}} \qquad (3.3)$$

Taking the partial derivative of log L with respect to b and equating it to 0 gives $N/b + \sum_{i=1}^{N} \log t_i - a t_N^{b} \log t_N = 0$. Substituting (3.3) yields:

$$\hat{b} = \frac{N}{\sum_{i=1}^{N} \log\left(t_N / t_i\right)} \qquad (3.4)$$

4. TIME DOMAIN FAILURE DATA SETS

The techniques examined here deal with data about the time at which failures occurred or, alternatively, data about the time between failure occurrences. These two forms can be considered equivalent. Although most software reliability growth models use data of this form, and such models have been in use for several decades, finding suitable data to verify models and improvement techniques is difficult. Early work generally focused on data based on calendar or wall clock time. Musa asserts that CPU execution time is a better measure than wall clock time, during which the actual time spent running a program can vary greatly based on CPU load, man hours, and other factors. The performance of the model under consideration is exemplified by applying it to the data sets given in Tables 4.1 and 4.2.

Data Set #1: SONATA Software Limited. The data was collected for a project X in SONATA Software Limited during the testing phase (Ashoka, 2010).

Data Set #2: (Lyu, 1996)

Table 4.1: Data Set #1

F. No   Inter-failure time    F. No   Inter-failure time    F. No   Inter-failure time
1       52.5                  11      52.5                  21      105
2       52.5                  12      52.5                  22      105
3       26.25                 13      105                   23      52.5
4       52.5                  14      35                    24      52.5
5       17.5                  15      52.5                  25      52.5
6       105                   16      52.5                  26      52.5
7       105                   17      35                    27      105
8       21                    18      52.5                  28      52.5
9       35                    19      105                   29      52.5
10      35                    20      105                   30      52.5

Table 4.2: Data Set #2

F. No   Inter-failure time    F. No   Inter-failure time    F. No   Inter-failure time
1       0.5                   9       1.4                   17      3.2
2       1.2                   10      3.5                   18      2.5
3       2.8                   11      3.4                   19      2
4       2.7                   12      1.2                   20      4.5
5       2.8                   13      0.9                   21      3.5
6       3                     14      1.7                   22      5.2
7       1.8                   15      1.4                   23      7.2
8       0.9                   16      2.7                   24      10.7
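The closed-form estimators of Section 3, $\hat{b} = N / \sum_{i=1}^{N} \log(t_N / t_i)$ and $\hat{a} = N / t_N^{\hat{b}}$, can be applied directly to the inter-failure times of Table 4.2. The following is a minimal sketch rather than the authors' implementation: the function names `cumulative_times`, `fit_power_law`, `mean_value` and `intensity` are hypothetical, and the sketch assumes that observation stops at the last failure time $t_N$.

```python
import math
from itertools import accumulate

# Inter-failure times of Data Set #2 (Table 4.2).
INTER_FAILURE_DS2 = [
    0.5, 1.2, 2.8, 2.7, 2.8, 3.0, 1.8, 0.9,
    1.4, 3.5, 3.4, 1.2, 0.9, 1.7, 1.4, 2.7,
    3.2, 2.5, 2.0, 4.5, 3.5, 5.2, 7.2, 10.7,
]


def cumulative_times(inter_failure):
    """Turn times between failures into cumulative failure times t_1..t_N."""
    return list(accumulate(inter_failure))


def fit_power_law(times):
    """Closed-form MLE of (a, b) for m(t) = a * t**b (hypothetical helper)."""
    n = len(times)
    t_n = times[-1]
    # The i = N term contributes log(1) = 0 to the sum.
    b_hat = n / sum(math.log(t_n / t) for t in times)
    a_hat = n / t_n ** b_hat
    return a_hat, b_hat


def mean_value(t, a, b):
    """Mean value function m(t) = a * t**b."""
    return a * t ** b


def intensity(t, a, b):
    """Failure intensity lambda(t) = a * b * t**(b - 1)."""
    return a * b * t ** (b - 1)


times = cumulative_times(INTER_FAILURE_DS2)
a_hat, b_hat = fit_power_law(times)
print(f"a = {a_hat:.6f}, b = {b_hat:.6f}")
print(f"m(t_N) = {mean_value(times[-1], a_hat, b_hat):.6f}")
```

By construction the fitted mean value function passes through the observed failure count at the last failure time, so m(t_N) equals N up to floating-point rounding.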

5. ESTIMATED PARAMETERS AND THEIR CONTROL LIMITS

The estimated parameters and the calculated control limits of the control chart for Data Set #1 and Data Set #2, with the false alarm risk α = 0.0027, are given in Table 5.1. Using the estimated parameters and the estimated limits, we calculated the control limits UCL = m(t_U), CL = m(t_C) and LCL = m(t_L). They are used to find whether the software process is in control or not. The estimated values of a and b and their control limits are as follows.

$$t_U = (0.99865)^{1/b}, \qquad t_C = (0.5)^{1/b}, \qquad t_L = (0.00135)^{1/b}$$

Because $m(t) = a t^{b}$, each limit reduces to $m(t_p) = a p$ for p = 0.99865, 0.5 and 0.00135.

Table 5.1: Parameter estimates and control limits

Data Set   a          b          UCL = m(t_U)   CL = m(t_C)   LCL = m(t_L)
LYU        0.988826   0.748933   0.987491       0.494413      0.001335
SONATA     0.019820   0.974573   0.019793       0.009910      0.000027

Goodness-of-fit

Model comparison and selection are the most common problems of statistical practice, with numerous procedures for choosing among a set of models proposed in the literature. Goodness-of-fit tests for this process have been proposed by Park and Kim (1992) and Rigdon (1989). The AIC is a measure of the relative quality of a statistical model for a given set of data. As such, AIC provides a means for model selection. AIC deals with the tradeoff between the goodness of fit of the model and the complexity of the model:

$$\mathrm{AIC} = -2 \log L + 2k$$

where k is the number of parameters in the statistical model, and L is the maximized value of the likelihood function for the estimated model. Given a set of candidate models for the data, the preferred model is the one with the minimum AIC value. Hence AIC not only rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters. This penalty discourages over-fitting. The log likelihood and the AIC for 5 real life failure data sets for the power law process model are given in Table 5.2 as follows.
Table 5.2: Goodness-of-fit for Power law process model

Data Set   Log L         AIC
LYU        -48.563908    101.127817
SONATA     -39.642182    83.284365
Xie        -95.102089    194.204178
NTDS       -50.386846    104.773692
ATT        -74.844868    153.689737
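The AIC column of Table 5.2 follows from the Log L column with k = 2, since the power law process has the two parameters a and b. The sketch below only re-derives AIC from the tabulated log likelihoods; it does not recompute the log likelihoods themselves, and the names `LOG_L` and `aic` are hypothetical.

```python
# Maximized log likelihood values as reported in Table 5.2.
LOG_L = {
    "LYU": -48.563908,
    "SONATA": -39.642182,
    "Xie": -95.102089,
    "NTDS": -50.386846,
    "ATT": -74.844868,
}


def aic(log_likelihood, k=2):
    """AIC = -2 log L + 2k; k = 2 for the power law process (a and b)."""
    return -2.0 * log_likelihood + 2.0 * k


aic_values = {name: aic(value) for name, value in LOG_L.items()}
for name, value in aic_values.items():
    print(f"{name:<7} {value:.6f}")
```

Since AIC ranks candidate models fitted to the same data, the values are meaningful for comparison within a row's data set rather than across data sets.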

6. DISTRIBUTION OF TIME BETWEEN FAILURES

The successive differences of the mean values at the cumulative times between failures of the considered data sets are tabulated in Tables 6.1 and 6.2. Plotting the successive differences of the mean values on the y axis and the failure numbers on the x axis, together with the control limits, we obtained Figures 6.1 and 6.2. A point below the control limit m(t_L) indicates an alarming signal. A point above the control limit m(t_U) indicates better quality. If the points fall within the control limits, the software process is stable.

Table 6.1: Successive differences of mean values, LYU
(F. No = failure number, C_TBF = cumulative time between failures, SD = successive difference)

F. No   C_TBF   m(t)        SD          F. No   C_TBF   m(t)        SD
1       0.5     0.588395    0.882938    13      26.1    11.378600   0.550641
2       1.7     1.471332    1.578896    14      27.8    11.929241   0.447138
3       4.5     3.050229    1.286923    15      29.2    12.376379   0.847489
4       7.2     4.337152    1.209782    16      31.9    13.223867   0.981469
5       10      5.546934    1.204390    17      35.1    14.205336   0.751171
6       13      6.751324    0.688585    18      37.6    14.956507   0.591928
7       14.8    7.439909    0.336314    19      39.6    15.548434   1.305230
8       15.7    7.776223    0.513719    20      44.1    16.853664   0.992103
9       17.1    8.289942    1.240621    21      47.6    17.845767   1.440910
10      20.6    9.530562    1.155206    22      52.8    19.286677   1.937759
11      24      10.685768   0.397686    23      60      21.224436   2.775564
12      25.2    11.083454   0.295146    24      70.7    24.000000

Figure 6.1: Control chart of LYU
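The quantities behind Table 6.1 and the control chart can be reproduced from the estimates in Table 5.1. The sketch below is illustrative only: it takes the LYU estimates a and b as given, computes m(t) at each cumulative failure time, forms the successive differences, and labels each difference against the limits m(t_L) and m(t_U) with t_p = p^(1/b) as in Section 5. The names `control_limits` and `classify` are hypothetical, and splitting α = 0.0027 equally between the two tails is an assumption inferred from the probabilities 0.00135 and 0.99865.

```python
# Parameter estimates for the LYU data set (Table 5.1).
A_HAT, B_HAT = 0.988826, 0.748933

# Cumulative times between failures (C_TBF column of Table 6.1).
C_TBF = [
    0.5, 1.7, 4.5, 7.2, 10.0, 13.0, 14.8, 15.7, 17.1, 20.6, 24.0, 25.2,
    26.1, 27.8, 29.2, 31.9, 35.1, 37.6, 39.6, 44.1, 47.6, 52.8, 60.0, 70.7,
]


def mean_value(t, a=A_HAT, b=B_HAT):
    """Mean value function m(t) = a * t**b."""
    return a * t ** b


def control_limits(a=A_HAT, b=B_HAT, alpha=0.0027):
    """Return (LCL, CL, UCL) = (m(t_L), m(t_C), m(t_U)) with t_p = p**(1/b)."""
    probs = (alpha / 2.0, 0.5, 1.0 - alpha / 2.0)
    return tuple(mean_value(p ** (1.0 / b), a, b) for p in probs)


def classify(sd, lcl, ucl):
    """Label one successive difference relative to the control limits."""
    if sd < lcl:
        return "below LCL"
    if sd > ucl:
        return "above UCL"
    return "within limits"


m_values = [mean_value(t) for t in C_TBF]
successive_diffs = [nxt - cur for cur, nxt in zip(m_values, m_values[1:])]
lcl, cl, ucl = control_limits()

print(f"LCL = {lcl:.6f}, CL = {cl:.6f}, UCL = {ucl:.6f}")
for number, sd in enumerate(successive_diffs, start=1):
    print(f"{number:2d}  {sd:.6f}  {classify(sd, lcl, ucl)}")
```

The printed m(t) differences can be compared row by row with the SD column of Table 6.1; small discrepancies in the last digit are expected because a and b are taken at the six decimals reported in Table 5.1.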

Table 6.2: Successive differences of mean values, SONATA
(F. No = failure number, C_TBF = cumulative time between failures, SD = successive difference)

F. No   C_TBF     m(t)        SD          F. No   C_TBF     m(t)        SD
1       52.5      0.940860    0.907986    16      852.25    14.228392   0.569177
2       105       1.848845    0.449136    17      887.25    14.797569   0.852703
3       131.25    2.297981    0.891786    18      939.75    15.650272   1.701839
4       183.75    3.189767    0.295716    19      1044.75   17.352111   1.697488
5       201.25    3.485483    1.762191    20      1149.75   19.049599   1.693545
6       306.25    5.247674    1.746578    21      1254.75   20.743144   1.689941
7       411.25    6.994252    0.347850    22      1359.75   22.433085   0.843710
8       432.25    7.342102    0.578805    23      1412.25   23.276795   0.842913
9       467.25    7.920908    0.577703    24      1464.75   24.119708   0.842145
10      502.25    8.498611    0.864657    25      1517.25   24.961853   0.841404
11      554.75    9.363268    0.862576    26      1569.75   25.803257   1.680686
12      607.25    10.225843   1.719617    27      1674.75   27.483943   0.839328
13      712.25    11.945460   0.571723    28      1727.25   28.323271   0.838679
14      747.25    12.517184   0.856319    29      1779.75   29.161950   0.838050
15      799.75    13.373502   0.854890    30      1832.25   30.000000

Figure 6.2: Control chart of SONATA

7. CONCLUSION

The given time between failures data are plotted through the estimated mean value function against the failure serial order. The graphs have shown out of control signals, i.e. below the LCL. Hence we conclude that our method of estimation and the control chart give a positive recommendation for their use in finding a preferable control process or a desirable out of control signal. By observing the control chart it is identified that, for DS#1, the failure process goes beyond the UCL. For DS#2 the failure situation is detected at the 15th point below the LCL. Hence our proposed control chart detects out of control situations. Many of the successive differences have gone beyond the upper control limit for the present model.

References

1. Ashoka, M. (2010). Sonata Software Limited Data Set, Bangalore.
2. Chan, L.Y., Xie, M. and Goh, T.N. (2000). Cumulative quantity control charts for monitoring production processes, Int. J. Prod. Res., 38(2), pp. 397-408.
3. Duane, J.T. (1964). Learning curve approach to reliability monitoring, IEEE Trans. Aerospace, AS-2, pp. 563-566.
4. Gaudoin, O., Yang, B. and Xie, M. (March 2003). A simple goodness-of-fit test for the power-law process, based on the Duane plot, IEEE Trans. Rel., Vol. 52, No. 1.
5. Khoshgoftaar, T.M. and Woodstock, T.G. (1991). Software reliability model selection: a case study, Proc. Int. Symp. on Software Reliability Engineering, May 18-19, Austin, Texas, pp. 183-191.
6. Kimura, M., Yamada, S. and Osaki, S. (1995). Statistical software reliability prediction and its applicability based on mean time between failures, Mathematical and Computer Modeling, Vol. 22, Issues 10-12, pp. 149-155.
7. Koutras, M.V., Bersimis, S. and Maravelakis, P.E. (2007). Statistical process control using Shewhart control charts with supplementary runs rules, Springer Science + Business Media, 9, pp. 207-224.
8. Lyu, M.R. and Nikora, A. (1991). A heuristic approach for software reliability prediction: the equally weighted linear combination model, Proc. Int. Symp. on Software Reliability Engineering, May 18-19, Austin, Texas, pp. 172-181.
9. Lyu, M.R. (1996). Handbook of Software Reliability Engineering, McGraw-Hill, New York.
10. MacGregor, J.F. and Kourti, T. (1995). Statistical process control of multivariate processes, Control Engineering Practice, Vol. 3, Issue 3, March 1995, pp. 403-414.
11. Park, W.J. and Kim, Y.G. (1992). Goodness-of-fit tests for the power law process, IEEE Trans. Rel., Vol. 41, pp. 107-111.
12. Park, W.J. and Scoh, M. (1994). More goodness-of-fit tests for the power law process, IEEE Trans. Rel., Vol. 43, pp. 275-278.
13. Pham, H. (2006). System Software Reliability, Springer.
14. Rigdon, S.E. (1989). Testing goodness-of-fit for the power law process, Communications in Statistics: Theory and Methods, Vol. 18, pp. 4665-4676.
15. Xie, M., Goh, T.N. and Ranjan, P. (2002). Some effective control chart procedures for reliability monitoring, Reliability Engineering and System Safety, 77, pp. 143-150.

Authors Profile:

Dr. R. Satya Prasad received his Ph.D. degree in Computer Science in the Faculty of Engineering in 2007 from Acharya Nagarjuna University, Andhra Pradesh. He received a gold medal from Acharya Nagarjuna University for his outstanding performance in his Masters degree. He is currently working as Associate Professor in the Department of Computer Science & Engineering, Acharya Nagarjuna University. His current research is focused on Software Engineering. He has published 75 papers in national and international journals.

Mrs. A. Srisaila received her M.Tech (CSE) from Bapatla Engineering College, Acharya Nagarjuna University. Presently she is working as Asst. Professor in the Department of Information Technology at V.R. Siddhartha Engineering College. Her research areas include Software Reliability Engineering, Graphical Passwords and Human Computer Interaction.

Dr. G. Krishna Mohan is working as an Assoc. Professor in the Department of CSE, KL University, Vaddeswaram, Guntur. He obtained his M.C.A degree from Acharya Nagarjuna University in 2000, M.Tech from JNTU, Kakinada, M.Phil from Madurai Kamaraj University and Ph.D from Acharya Nagarjuna University. His research interests lie in Data Mining and Software Engineering. He has 14 years of teaching experience. He qualified APSLET in 2012. He has published 28 papers in national and international journals.