INTERNATIONAL JOURNAL OF STATISTICS - THEORY AND APPLICATIONS, DECEMBER 2014, VOL. 1, NO. 1, PAGES 43-49

Reliability Analysis Based on AHP and Software Reliability Models for Big Data on Cloud Computing

Yoshinobu Tamura and Shigeru Yamada

Yoshinobu Tamura, Yamaguchi University, Tokiwadai 2-16-1, Ube-shi, Yamaguchi, 755-8611, Japan. Email: tamura@yamaguchi-u.ac.jp
Shigeru Yamada, Tottori University, Minami 4-101, Koyama, Tottori-shi, 680-8552, Japan. Email: yamada@sse.tottori-u.ac.jp
(Received: 30 August 2014; Revised 01 November 2014; Accepted 05 November 2014)

ABSTRACT
In this paper, we propose a method of software reliability assessment for the infrastructure software of big data on cloud computing that uses the analytic hierarchy process (AHP) and software reliability growth models. In particular, the proposed method focuses on the characteristics caused by the interaction between big data and cloud computing. We then assess the performance of our method by a sensitivity analysis of the AHP based on the 3V's model. Moreover, we show several numerical examples of the proposed method.

KEYWORDS
Big data, Cloud computing, AHP, Software reliability model

I. INTRODUCTION
It is difficult for many companies to assess the reliability of cloud computing built on OSS (Open Source Software), because OSS involves multiple software versions, vulnerability issues, openly available source code, security holes, etc. When assessing the reliability of cloud computing and OSS, it is important to deal with the external factors resulting from big data, because the external factors arising from the relationship between big data and cloud computing affect the infrastructure software for cloud computing. However, it is difficult for software managers to take these external factors from big data, mobile clouds, and cloud computing into account.

There are also some interesting research papers on cloud hardware, cloud services, mobile clouds, and cloud performance evaluation. However, most of them focus on case studies of cloud services and cloud data storage technologies. Only a few effective methods of dynamic reliability assessment that consider environments such as cloud computing and OSS have been presented [2, 6, 18]. Several research papers [3, 5, 10, 11, 14] have also been published in the area of cloud computing; however, these papers focus on security, service optimization, secure control, resource allocation techniques, etc. Only a few research papers on the reliability of cloud computing have been presented.

In particular, it is very interesting to consider the status of fault detection together with big data, because big data, i.e., huge and complicated data gathered through the Internet, can cause system-wide failures owing to the complexity of data management. Big data means not only huge-capacity data but also high-speed, real-time data, from which rules and pattern characteristics can be obtained. As an example of a failure involving big data, there are software system failures related to the DataNode or NameNode used in Hadoop; such failures originate from the database. We assume that big data can be assessed by using the subjective experience of software managers, because the skill of software managers is important for controlling big data on cloud computing. For the above reasons, it is important to consider the indirect influences of big data on reliability.

We have proposed several methods of software reliability assessment for cloud computing in the past [15, 16]. However, only a few effective methods of reliability assessment that consider the network, big data, and fault factors together have been presented, because it is very difficult to describe the indirect influence of network data, big data, and fault data as reliability assessment measures, as shown in Figure 1.
Figure 1: The relationship between big data and cloud computing

We summarize the indirect relationship between big data and cloud computing as follows:
• Big data, i.e., huge and complicated data gathered through the Internet, can cause system-wide failures because of the complexity of data management.
• Cloud computing has a particular maintenance phase, such as the provisioning processes.
• Various mobile devices are connected via the network to the cloud service.
• The data storage areas for cloud computing are reconfigured via the various mobile devices.

From the points discussed above, we assume that all factors of big data, cloud computing, and network access affect cloud computing, directly and indirectly. We consider the 3V's model defined by Gartner Group, Inc. [12] for big data. In this paper, we propose a method of software reliability analysis that considers the 3V's model of big data on cloud computing. Moreover, we estimate several parameters included in the proposed model by using the AHP (analytic hierarchy process) based on the 3V's model. We also propose a new approach to describing the indirect effect on reliability by using three kinds of Brownian motion. In particular, we assess the performance of the proposed three-dimensional stochastic differential equation models based on the estimation results of the AHP. Finally, we show performance examples of the proposed models to evaluate the method of software reliability assessment for big data on cloud computing.

II. ANALYSIS OF BIG DATA ON CLOUD COMPUTING BASED ON DECISION-MAKING
The AHP, developed in the 1970s, is widely used in Europe and the United States for management issues such as energy problems, decision-making, and urban planning. In particular, the AHP is considered one of the most effective methods of decision-making support [13]. When considering the effect of the debugging process on the entire system while developing a method of software reliability assessment for big data on cloud computing, it is necessary to grasp deeply intertwined factors, such as the characteristics of big data, the application of cloud computing, and the system reliability.
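As a rough illustration of how the AHP weights of this section can be computed, the following Python sketch implements the geometric-mean weighting of (1) and the normalization of (2), followed by a weighted-sum synthesis of the alternative weights $p_i$. The synthesis step is the conventional AHP rule and is an assumption here, since its exact form is not written out in this paper; the function names `ahp_geometric_weights` and `ahp_synthesize` and all judgment values are hypothetical.

```python
import numpy as np


def ahp_geometric_weights(x):
    """Criterion weights mu_i from a positive pairwise-comparison matrix x.

    lambda_i is the geometric mean of row i, as in (1), and mu_i normalizes
    the lambda_i so that they sum to one, as in (2).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2 or x.shape[0] != x.shape[1] or np.any(x <= 0):
        raise ValueError("x must be a square matrix of positive ratios")
    n = x.shape[0]
    lam = np.prod(x, axis=1) ** (1.0 / n)   # lambda_i = (prod_j x_ij)^(1/n)
    return lam / lam.sum()                  # mu_i = lambda_i / sum_j lambda_j


def ahp_synthesize(mu, v):
    """Total alternative weights p_k = sum_i mu_i * v[i, k] (assumed synthesis rule).

    v[i, k] is the local priority of alternative k under criterion i.
    """
    return np.asarray(mu, dtype=float) @ np.asarray(v, dtype=float)


# Hypothetical judgments for Volume, Velocity, Variety, built from the weights
# (0.5, 0.3, 0.2) so that x_ij = w_i / w_j holds exactly.
w = np.array([0.5, 0.3, 0.2])
x = w[:, None] / w[None, :]
mu = ahp_geometric_weights(x)

# Hypothetical local priorities of the three alternatives (fault detection,
# network-traffic change, data renewal) under each criterion; rows sum to one.
v = np.array([[0.2, 0.3, 0.5],
              [0.3, 0.5, 0.2],
              [0.4, 0.4, 0.2]])
p = ahp_synthesize(mu, v)
print(np.round(mu, 3), np.round(p, 3))  # mu ~ (0.5, 0.3, 0.2), p ~ (0.27, 0.38, 0.35)
```

For a perfectly consistent matrix the geometric mean recovers the underlying weights exactly, which is a convenient sanity check; with real, slightly inconsistent judgments the same two functions apply unchanged.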
In this paper, we propose a method of reliability assessment based on the AHP to estimate the effect of each factor on big data on cloud computing as a whole in a complex situation. In particular, we apply the 3V's model, which describes big data, to the evaluation criteria of the AHP. The 3V's of big data are Volume, Velocity, and Variety, as defined by Gartner Group, Inc. [12]. Volume, Velocity, and Variety are very important for assessing big data as an external factor for reliability.

Let $w_i$ ($i = 1, 2, \ldots, n$) be the weight parameters for the evaluation criteria of the AHP. Then, the weight parameter $\lambda_i$ for each evaluation criterion is obtained from the following geometric average:

$$\lambda_i = \sqrt[n]{\prod_{j=1}^{n} x_{ij}}, \qquad x_{ij} = \frac{w_i}{w_j}. \tag{1}$$

Therefore, the total weight parameter for each evaluation criterion is given by

$$\mu_i = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}. \tag{2}$$

By using the weight parameters $\mu_i$ ($i = 1, 2, \ldots, n$) in (2), we obtain the total weight parameters $p_i$ ($i = 1, 2, 3$), which represent the alternatives of the AHP. In this paper, the Volume, Velocity, and Variety of the 3V's model are used as the evaluation criteria of the AHP. Moreover, we consider three probabilities as the alternatives of the AHP, i.e., the fault-detection rate per unit time, the changing rate of network traffic per unit time, and the renewal rate of data per unit time. Figure 2 shows the basic concept of the factor analysis for big data using the AHP in this paper.

Figure 2: The basic concept of factor analysis for big data

III. THREE DIMENSIONAL WIENER PROCESS MODELS
At present, the amount of data used by cloud users is becoming large. Therefore, we consider big data in order to assess the reliability of cloud computing. Let $M(t)$ be the cumulative number of faults detected by operation time $t$ ($t \ge 0$) in the cloud software. Suppose that $M(t)$ takes on continuous real values.
Since latent faults in the cloud software are detected and eliminated during the operation phase, $M(t)$ gradually increases as operation proceeds. Thus, under common assumptions of software reliability growth modeling [4, 7, 9, 21], we extend the model to the following stochastic differential equations with three Brownian motions [1, 19]:

$$\frac{dM_1(t)}{dt} = \{b_1(t) + \sigma_1\nu_1(t)\}\{R_1(t) - M_1(t)\}, \tag{3}$$

$$\frac{dM_2(t)}{dt} = \{b_2(t) + \sigma_2\nu_2(t)\}\{R_2(t) - M_2(t)\}, \tag{4}$$

$$\frac{dM_3(t)}{dt} = \{b_3(t) + \sigma_3\nu_3(t)\}\{R_3(t) - M_3(t)\}, \tag{5}$$

where $b_i(t)$ ($i = 1, 2, 3$) is the software fault-detection rate at operation time $t$, a non-negative function, and $R_i(t)$ ($i = 1, 2, 3$) represents the amount of change in the requirements specification [20]. $R_i(t)$ is assumed to be $\alpha_i e^{-\beta_i t}$, where $\alpha_i$ ($i = 1, 2, 3$) is the number of faults latent in the cloud OSS and $\beta_i$ ($i = 1, 2, 3$) is the changing rate of the requirements specification. It is assumed that the fault-prone requirements specification of OSS grows exponentially in $t$. Thus, the OSS shows a reliability regression trend if $\beta_i$ ($i = 1, 2, 3$) is negative. On the other hand, the OSS shows
a reliability growth trend if $\beta_i$ ($i = 1, 2, 3$) is positive. Moreover, $\sigma_1$, $\sigma_2$, and $\sigma_3$ are positive constants representing the magnitudes of the irregular fluctuations, and $\nu_1(t)$, $\nu_2(t)$, and $\nu_3(t)$ are standardized Gaussian white noises. We assume that $M_1(t)$ and $R_1(t)$ are related to the software fault-detection rate $b_1(t)$, which depends on the failure-occurrence phenomenon. Also, $M_2(t)$ and $R_2(t)$ are related to the software fault-detection rate $b_2(t)$, which depends on the big data. Moreover, $M_3(t)$ and $R_3(t)$ are related to the software fault-detection rate $b_3(t)$, which depends on the network of cloud computing. Assuming that the noises are mutually independent, we obtain the following integrated stochastic differential equation:

$$\frac{dM(t)}{dt} = \{b(t) + \sigma_1\nu_1(t) + \sigma_2\nu_2(t) + \sigma_3\nu_3(t)\}\{R(t) - M(t)\}. \tag{6}$$

We extend these to the following stochastic differential equations of Itô type [22]:

$$dM_1(t) = \left\{b_1(t) - \frac{1}{2}\sigma_1^2\right\}\{R_1(t) - M_1(t)\}\,dt + \sigma_1\{R_1(t) - M_1(t)\}\,d\omega_1(t), \tag{7}$$

$$dM_2(t) = \left\{b_2(t) - \frac{1}{2}\sigma_2^2\right\}\{R_2(t) - M_2(t)\}\,dt + \sigma_2\{R_2(t) - M_2(t)\}\,d\omega_2(t), \tag{8}$$

$$dM_3(t) = \left\{b_3(t) - \frac{1}{2}\sigma_3^2\right\}\{R_3(t) - M_3(t)\}\,dt + \sigma_3\{R_3(t) - M_3(t)\}\,d\omega_3(t), \tag{9}$$

where $\omega_i(t)$ is the $i$-th one-dimensional Wiener process, formally defined as the integral of the white noise $\nu_i(t)$ with respect to time $t$. Similarly, assuming independent noises, we obtain the following integrated stochastic differential equation:

$$dM(t) = \left\{b(t) - \frac{1}{2}(\sigma_1^2 + \sigma_2^2 + \sigma_3^2)\right\}\{R(t) - M(t)\}\,dt + \sigma_1\{R(t) - M(t)\}\,d\omega_1(t) + \sigma_2\{R(t) - M(t)\}\,d\omega_2(t) + \sigma_3\{R(t) - M(t)\}\,d\omega_3(t). \tag{10}$$

We define the three-dimensional process $[\omega_1(t), \omega_2(t), \omega_3(t)]$ through the following combined process [8]:

$$\tilde{\omega}(t) = (\sigma_1^2 + \sigma_2^2 + \sigma_3^2)^{-1/2}\{\sigma_1\omega_1(t) + \sigma_2\omega_2(t) + \sigma_3\omega_3(t)\}. \tag{11}$$

By using Itô's formula [1, 19], we obtain the solution of (10) under the initial condition $M(0) = 0$ as follows [22]:

$$M(t) = R(t)\left[1 - \exp\left\{-\int_0^t b(s)\,ds - \sigma_1\omega_1(t) - \sigma_2\omega_2(t) - \sigma_3\omega_3(t)\right\}\right]. \tag{12}$$

Using the solution process $M(t)$ of (10), we can derive several software reliability assessment measures. Moreover, we assume that the software fault-detection rate per fault $b(t)$ is defined as

$$b(t) \equiv \frac{dI(t)/dt}{a - I(t)} = \frac{b}{1 + c\exp(-bt)}, \tag{13}$$

where $I(t) = a\{1 - \exp(-bt)\}/\{1 + c\exp(-bt)\}$ is the mean value function of the inflection S-shaped SRGM based on a nonhomogeneous Poisson process (NHPP) [21]. In (13), $a$ is the expected total number of latent faults prior to operation and $b$ is the fault-detection rate per fault. The parameter $c$ is generally defined as $(1 - l)/l$, where we define the parameter $l$ as the value of the fault factor. Integrating (13) gives $\exp\{-\int_0^t b(s)\,ds\} = (1 + c)\exp(-bt)/\{1 + c\exp(-bt)\}$, and substituting this into (12), the cumulative number of faults detected up to time $t$ is obtained as follows:

$$M(t) = R(t)\left[1 - \frac{1 + c}{1 + c\exp(-bt)}\exp\{-bt - \sigma_1\omega_1(t) - \sigma_2\omega_2(t) - \sigma_3\omega_3(t)\}\right]. \tag{14}$$

In the proposed model, we assume that the parameter $\sigma_1$ depends on the failure-occurrence phenomenon. We also assume that the parameter $\sigma_2$ depends on the network changing rate per unit time resulting from cloud computing, and that the parameter $\sigma_3$ depends on the renewal rate per unit time resulting from big data.

IV. PARAMETER ESTIMATION
In this section, we present the method of estimating the unknown parameters $\alpha$, $\beta$, $b$, and $\sigma_1$ in (14). We assume that the software managers estimate the unknown parameters $\alpha$, $\beta$, $b$, and $\sigma_1$ of the proposed stochastic differential equations by the method of maximum likelihood. The known parameters $l$, $\sigma_2$, and $\sigma_3$ of the proposed stochastic differential equations are obtained from the actual network traffic per unit time and the actual renewal rate per unit time, gathered by database analysis of big data on cloud computing.

The joint probability distribution function of the process $M(t)$ is denoted as

$$P(t_1, y_1; t_2, y_2; \ldots; t_K, y_K) \equiv \Pr[M(t_1) \le y_1, \ldots, M(t_K) \le y_K \mid M(t_0) = 0]. \tag{15}$$

The probability density of (15) is denoted as

$$p(t_1, y_1; t_2, y_2; \ldots; t_K, y_K) \equiv \frac{\partial^K P(t_1, y_1; t_2, y_2; \ldots; t_K, y_K)}{\partial y_1\,\partial y_2 \cdots \partial y_K}. \tag{16}$$

Since $M(t)$ takes on continuous values, the likelihood function $\lambda$ for the observed data $(t_k, y_k)$ ($k = 1, 2, \ldots, K$) is constructed as follows:

$$\lambda = p(t_1, y_1; t_2, y_2; \ldots; t_K, y_K). \tag{17}$$

For convenience in mathematical manipulations, the following logarithmic likelihood function is used:

$$\Lambda = \log \lambda. \tag{18}$$
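To make (14) to (18) more concrete, the following Python sketch simulates one sample path of (14) and evaluates the log-likelihood $\Lambda$ of (18) for discretely observed data. It relies on the fact that, under (14), $z(t) = -\log\{1 - M(t)/R(t)\}$ equals $\int_0^t b(s)\,ds$ plus a Wiener process with variance $(\sigma_1^2 + \sigma_2^2 + \sigma_3^2)t$, so its increments are independent Gaussians and the density of $M$ follows by a change of variables. This construction is our assumption for the sketch rather than a formula stated in this paper, and the function names `integrated_rate`, `sample_path`, and `log_likelihood`, as well as all parameter values, are hypothetical and are not estimates from the OpenStack data.

```python
import numpy as np


def integrated_rate(t, b, c):
    """Integral of b(s) = b / (1 + c exp(-b s)) from 0 to t, cf. (13)."""
    t = np.asarray(t, dtype=float)
    return b * t + np.log((1.0 + c * np.exp(-b * t)) / (1.0 + c))


def sample_path(t, alpha, beta, b, l, sigmas, rng):
    """One simulated path of M(t) in (14) on an increasing grid with t[0] == 0."""
    t = np.asarray(t, dtype=float)
    c = (1.0 - l) / l                       # c = (1 - l) / l
    dt = np.diff(t)
    noise = np.zeros_like(t)                # sigma_1 w_1 + sigma_2 w_2 + sigma_3 w_3
    for s in sigmas:                        # one independent Wiener process per sigma_i
        dw = rng.normal(0.0, np.sqrt(dt))   # increments ~ N(0, dt)
        noise[1:] += s * np.cumsum(dw)
    r = alpha * np.exp(-beta * t)           # R(t) = alpha exp(-beta t)
    return r * (1.0 - np.exp(-integrated_rate(t, b, c) - noise))


def log_likelihood(t, y, alpha, beta, b, l, sigmas):
    """Log-likelihood (18) of observations y_k = M(t_k) with 0 < t_1 < ... < t_K."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    c = (1.0 - l) / l
    r = alpha * np.exp(-beta * t)
    if np.any(y >= r):
        return -np.inf                      # (14) always keeps M(t) below R(t)
    s2 = float(np.sum(np.square(sigmas)))   # only the total variance enters
    z = -np.log1p(-y / r)                   # z = int_0^t b(s) ds + sum_i sigma_i w_i(t)
    g = integrated_rate(t, b, c)
    dt = np.diff(t, prepend=0.0)            # t_0 = 0 with M(t_0) = 0
    dz = np.diff(z, prepend=0.0)
    dg = np.diff(g, prepend=0.0)
    gauss = -0.5 * np.log(2.0 * np.pi * s2 * dt) - (dz - dg) ** 2 / (2.0 * s2 * dt)
    return float(np.sum(gauss - np.log(r - y)))  # Jacobian |dz/dy| = 1 / (R - y)


# Hypothetical parameter values for illustration only.
params = dict(alpha=300.0, beta=-0.001, b=0.05, l=0.5, sigmas=(0.02, 0.01, 0.01))
rng = np.random.default_rng(0)
t = np.linspace(0.0, 100.0, 101)
m = sample_path(t, rng=rng, **params)
ll_true = log_likelihood(t[1:], m[1:], **params)
ll_wrong_b = log_likelihood(t[1:], m[1:], **{**params, "b": 0.02})
print(ll_true, ll_wrong_b)
```

Maximizing this function over $\alpha$, $\beta$, $b$, and $\sigma_1$ with $l$, $\sigma_2$, and $\sigma_3$ held fixed (for example, by applying `scipy.optimize.minimize` to its negative) would give numerical solutions of (19). In this sketch the data identify only the total variance $\sigma_1^2 + \sigma_2^2 + \sigma_3^2$, which is consistent with treating $\sigma_2$ and $\sigma_3$ as externally supplied known parameters.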
The maximum-likelihood estimates $\hat{\alpha}$, $\hat{\beta}$, $\hat{b}$, and $\hat{\sigma}_1$ are the values that maximize $\Lambda$ in (18). They can be obtained as the solutions of the following simultaneous likelihood equations:

$$\frac{\partial \Lambda}{\partial \alpha} = \frac{\partial \Lambda}{\partial \beta} = \frac{\partial \Lambda}{\partial b} = \frac{\partial \Lambda}{\partial \sigma_1} = 0. \tag{19}$$

Based on the estimated parameters, we can draw three-dimensional plots of the sample path of the number of detected faults by using (14), noise by noise.

V. PERFORMANCE EXAMPLES
OSS attracts close attention from the viewpoint of cost reduction and quick delivery, and there are several open source projects in the area of cloud computing. In particular, we focus on OpenStack [17] to evaluate the performance of our method. In this paper, we show numerical examples using data sets for OpenStack, a cloud OSS. The data used in this paper were collected from the bug tracking system on the website of the OpenStack open source project. The actual data set is shown in Figure 3.

Figure 3: The actual data set in OpenStack Project

7. Sensitivity analysis in terms of evaluation criteria in AHP
The specified evaluation criteria of the AHP considering the 3V's model of big data in the case of small variety are shown in Figure 4, and those in the case of small volume are shown in Figure 5. Moreover, Figure 6 shows the estimated weight parameters for the fault-detection rate, the traffic changing rate, and the data changing rate in the case of small variety. Similarly, the estimated weight parameters for the fault-detection rate, the traffic changing rate, and the data changing rate in the case of small volume are shown in Figure 7. We assume the two cases in Figures 4 and 5 as characteristics of big data for the AHP. From Figures 6 and 7, we can confirm that the estimated weight parameter of the network traffic factor becomes large.
In particular, we confirm that the impact of the network factor becomes large in Figure 7. Figure 7 indicates that the impact of the data factor is small, because the data volume in the early operation phase is small.

8. Sensitivity analysis in terms of several noises in the proposed model
In this section, we show the behavior of the software reliability assessment measures when the characteristic noise parameters $\sigma_i$ ($i = 1, 2, 3$) in the proposed stochastic differential equation model are changed. The sample path of the number of detected faults (fault and network factors) in the case of small variety is shown in Figure 8, and that in the case of small volume is shown in Figure 9. The model parameters $\sigma_i$ ($i = 1, 2, 3$) represent the most important factors in the characteristics of big data on cloud computing. From Figures 8 and 9, we find that the behavior of the network factor changes with the difference in the estimation results of the AHP.

Similarly, the sample path of the number of detected faults (fault and data factors) in the case of small variety is shown in Figure 10, and that in the case of small volume is shown in Figure 11. From Figures 10 and 11, we find that the behavior of the data factor changes with the difference in the estimation results of the AHP. In particular, the difference in the data factor becomes large in the estimation results of the proposed stochastic differential equation model.

Moreover, the sample path of the number of detected faults (network and data factors) in the case of small variety is shown in Figure 12, and that in the case of small volume is shown in Figure 13. From Figures 12 and 13, we find that the behavior of the data factor changes with the difference in the estimation results of the AHP.
Again, the difference in the data factor becomes large in the estimation results of the proposed stochastic differential equation model. We also find that the combined network and big data factors become large in all operating phases of cloud computing.

The above results show that the proposed model can not only cover many conventional models but also assess software reliability considering big data on cloud computing, because the proposed model can describe a wide range of growth curves by using several noises. The proposed model will also reduce the effort needed to select a suitable model for data sets collected on cloud computing, and the proposed method will be useful for visually understanding the characteristics of big data on cloud computing. The characteristics of the proposed model are as follows:
• The proposed model can assess indirect influences on software reliability, i.e., software managers can comprehend the situations of both the network factor and the data factor through the noise. On the other hand, the existing models cannot assess multiple factors such as these indirect influences.
• The sample path of the number of detected faults will be useful for software managers to confirm the status of cloud computing, because the proposed model can represent multiple aspects of cloud computing and big data through several noises. The magnitude of the noise for each factor will also be useful for confirming the operating status of cloud computing and big data.
• In the conventional models, it is difficult to formulate indirect influences, because typical models can include such influences only as model parameters, not as noise. In contrast, a stochastic differential equation model can formulate the indirect influences on software reliability by using noise.
• The proposed model includes several noises. When considering noise, it is important to apply it to large-scale targets; on the other hand, it is difficult to treat small-scale targets as noise. From this standpoint, we propose the three-noise model for big data on cloud computing.

VI. CONCLUSION
We have focused on big data on cloud computing. In particular, we have proposed a method of reliability assessment that considers the characteristics of cloud computing under big data, and we have incorporated it to analyze the interaction among the elements of the 3V's model of big data. Moreover, we have shown several performance examples of the proposed method for actual data. As a result, we have found that the proposed model can assess the integrated reliability considering the relationship among software failure, network traffic, and big data. When considering the effect of external factors on the entire system while developing software reliability assessment methods for cloud computing, it is necessary to grasp deeply intertwined factors. In this paper, we have shown that the proposed method can grasp such deeply intertwined factors by assuming the 3V's model of big data. We have also analyzed actual data to show a sensitivity analysis of software reliability assessment for cloud computing by using the proposed method. The results of the sensitivity analysis suggest that the proposed models will be useful for visually assessing reliability, considering the characteristics of big data on cloud computing, by using three types of noise.
Figure 4: The specified evaluation criteria of the AHP considering the 3V's model of big data in the case of small variety
Figure 5: The specified evaluation criteria of the AHP considering the 3V's model of big data in the case of small volume
Figure 6: The estimated weight parameters for the traffic changing rate, the data changing rate, and the fault-detection rate in the case of small variety
Figure 7: The estimated weight parameters for the traffic changing rate, the data changing rate, and the fault-detection rate in the case of small volume
Figure 8: The sample path of the number of detected faults (fault and network factors) in the case of small variety
Figure 9: The sample path of the number of detected faults (fault and network factors) in the case of small volume
Figure 10: The sample path of the number of detected faults (fault and big data factors) in the case of small variety
Figure 11: The sample path of the number of detected faults (fault and big data factors) in the case of small volume
Figure 12: The sample path of the number of detected faults (network and big data factors) in the case of small variety
Figure 13: The sample path of the number of detected faults (network and big data factors) in the case of small volume

ACKNOWLEDGMENT
This work was supported in part by JSPS KAKENHI Grant No. 24500066 and No. 25350445 in Japan.

REFERENCES
[1] Arnold, L. (1974). Stochastic Differential Equations: Theory and Applications, John Wiley & Sons, New York.
[2] Cotroneo, D., Grottke, M., Natella, R., Pietrantuono, R. and Trivedi, K. S. (2013). Fault triggers in open-source software: an experience, Proceedings of the 24th IEEE International Symposium on Software Reliability Engineering, Pasadena, CA, 178-187.
[3] Gabner, R., Schwefel, H. P., Hummel, K. A. and Haring, G. (2011). Optimal model-based policies for component migration of mobile cloud services, Proceedings of the 10th IEEE International Symposium on Network Computing and Applications, Cambridge, MA, USA, 195-202.
[4] Kapur, P. K., Pham, H., Gupta, A. and Jha, P. C. (2011). Software Reliability Assessment with OR Applications, Springer-Verlag, London.
[5] Khalifa, A. and Eltoweissy, M. (2013). Collaborative autonomic resource management system for mobile cloud computing, Proceedings of the Fourth International Conference on Cloud Computing, GRIDs, and Virtualization, Valencia, Spain, 5-2.
[6] Li, X., Li, Y. F., Xie, M. and Ng, S. H. (2011). Reliability analysis and optimal version-updating for open source software, Journal of Information and Software Technology, v. 53, n. 9, 929-936.
[7] Lyu, M. R. (ed.) (1996). Handbook of Software Reliability Engineering, IEEE Computer Society Press, Los Alamitos, CA.
[8] Mikosch, T. (1998). Elementary Stochastic Calculus, with Finance in View, Advanced Series on Statistical Science and Applied Probability, v. 6, World Scientific, Singapore.
[9] Musa, J. D., Iannino, A. and Okumoto, K. (1987). Software Reliability: Measurement, Prediction, Application, McGraw-Hill, New York.
[10] Park, J., Yu, H. C. and Lee, E. Y. (2012). Resource allocation techniques based on availability and movement reliability for mobile cloud computing, in Distributed Computing and Internet Technology, Lecture Notes in Computer Science, Springer-Verlag, Berlin, v. 7154, 263-264.
[11] Park, N. (2011). Secure data access control scheme using type-based re-encryption in cloud environment, in Semantic Methods for Knowledge Management and Communication, Studies in Computational Intelligence, Springer-Verlag, Berlin, v. 381, 319-327.
[12] Pettey, C. and Goasduff, L. (2011). Gartner Special Report: Examines How to Leverage Pattern-Based Strategy to Gain Value in Big Data, Press Releases, Gartner Inc., 27 June.
[13] Saaty, T. (1980). The Analytic Hierarchy Process, McGraw-Hill, New York.
[14] Suo, H., Liu, Z., Wan, J. and Zhou, K. (2013). Security and privacy in mobile cloud computing, Proceedings of the 9th International Wireless Communications and Mobile Computing Conference, Cagliari, Italy, 655-659.
[15] Tamura, Y. and Yamada, S. (2010). Reliability analysis methods for an embedded open source software, in Mechatronic Systems, Simulation, Modelling and Control, A. Milella, D. D. Paola, and G. Cicirelli (eds.), Chapter 3, 239-254, IN-TECH, Vukovar, Croatia.
[16] Tamura, Y., Miyahara, H. and Yamada, S. (2012). Reliability analysis based on jump diffusion models for an open source cloud computing, Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management, Hong Kong, 752-756.
[17] The OpenStack project, OpenStack. [Online]. Available: http://www.openstack.org/
[18] Ullah, N., Morisio, M. and Vetro, A. (2012). A comparative analysis of software reliability growth models using defects data of closed and open source software, Proceedings of the 35th IEEE Software Engineering Workshop, Greece, 87-92.
[19] Wong, E. (1971). Stochastic Processes in Information and Systems, McGraw-Hill, New York.
[20] Yamada, S. and Fujiwara, T. (2001). Testing-domain dependent software reliability growth models and their comparisons of goodness-of-fit, International Journal of Reliability, Quality and Safety Engineering, v. 8, n. 3, 205-218.
[21] Yamada, S. (2013). Software Reliability Modeling: Fundamentals and Applications, Springer-Verlag, Tokyo/Heidelberg.
[22] Yamada, S., Kimura, M., Tanaka, H. and Osaki, S. (1994). Software reliability measurement and assessment with stochastic differential equations, IEICE Transactions on Fundamentals, v. E77-A, n. 1, 109-116.

Yoshinobu Tamura
Dr. Yoshinobu Tamura is an Associate Professor at the Graduate School of Science and Engineering, Yamaguchi University, Japan.
Yoshinobu has received several awards, including the Best Paper Award of the IEEE International Conference on Industrial Engineering and Engineering Management in 2012, the Research Leadership Award in Area of Reliability from the ICRITO in 2010, the IEEE Reliability Society Japan Chapter Awards in 2007, and the Presentation Award of the Seventh International Conference on Industrial Management in 2004.

Shigeru Yamada
Dr. Shigeru Yamada is a Professor in the Department of Social Management Engineering, Graduate School of Engineering, Tottori University, Japan. He has published over 500 reviewed technical papers in the areas of software reliability engineering, project management, and quality control. Shigeru has received several awards, including the Best Paper Award from the IEEE Reliability Society Japan Chapter in 2012; the Exceptional International Leadership and Contribution Award in Software Reliability at ICRITO 2010 and 2011; the International Leadership and Pioneering Research Award in Software Reliability Engineering from SREQOM/ICQRIT in 2009; numerous outstanding paper awards from IEEE and reliability engineering associations; and the Best Author Award from the Information Processing Society of Japan in 1992.