Software reliability analysis of laptop computers

W. Wang* and M. Pecht**
*Salford Business School, University of Salford, UK, w.wang@salford.ac.uk
**PHM Centre of City University of Hong Kong, Hong Kong, m.pecht@cityu.edu.hk

Abstract. A computer freezing during use is a common problem and can cause unexpected damage if important work has not yet been saved. This kind of problem is often caused by software failures, which are hard to predict. This paper focuses on the reliability analysis of laptop computers experiencing multiple occurrences of failures in operation. A failure is defined as the event that the computer freezes and has to be restarted, either automatically or by a forced restart. The failure data were collected during three experiments running on one laptop computer continuously for a period of time under three different running environments. We fitted a number of candidate distributions to the time between failures data. The results showed that the conventional Weibull distribution, often used in practice, is the best choice for all three data sets. One interesting finding is that the mean time between failures decreases substantially when the computer is heavily stressed. This shows that if the computer is running at full load, then work should be saved frequently to avoid damage.

1. Introduction

Reliability is defined as the ability of a product to perform as intended (without failure and within specified performance limits) for a specified time period, in its life cycle application environment, Pecht et al. (2002). A product's health is a description of its state under its specified operating environment. Health monitoring is a method for evaluating a product's health as a means to determine whether and when failure will occur, Ramakrishnan and Pecht (2003). We focus specifically on laptop computers as the product in this paper, since they are now almost household items widely used by both private individuals and companies.
For laptop computers, hardware degradation is difficult to detect because of the complexity of the hardware, which consists of various sub-systems assembled together. One approach, used by Vichare et al. (2004), was to monitor the internal temperatures of laptop computers for health and usage monitoring. However, laptop computers can fail due to hardware or software failures. Hardware failures are rare, though they do occur, but software failures as defined in the abstract are observed by virtually everyone who has used a laptop computer. Because of the complexity associated with the various software installed, the laptop computer is treated as a single system, and we are interested in the reliability characteristics of its software failures. A primary objective of the paper is to identify whether different intensities of usage influence the time to failure, and what the best distribution is for describing the software failures. For this objective, we designed and conducted a number of experiments to verify whether the intensity of use of laptop computers does influence the software failure characteristics. Another objective is to find whether the failure rate is increasing, constant or decreasing. We examined eight field-returned laptop computers which had been sent back to the manufacturer by users who reported having problems with them. The manufacturer tested these computers and found no problems. These computers are of different models and have different system configurations, such as processor speed, hard drive space, RAM, etc. Different experiments were designed by stressing the computers. Stressing was done by running different applications under different scenarios on these computers. Two of the eight computers showed multiple failures (as defined in the abstract and also in section 2), but the others did not. The different performance of these eight laptop computers might be due to their different configurations and hardware quality.
Since we could not wait to obtain sufficient data from each of the eight computers, the analysis for one computer that showed repeated failures is performed and presented in this paper. This may limit our conclusions to this type of computer only, but the analysis nevertheless revealed some interesting findings. Our hypothesis was that each computer would fail when stressed by a set of specific applications. Applications that use different levels of memory and processing capability were selected for the experiments. Various applications were run at the same time to make the CPU of the computer work at different levels of capacity. The belief was that an intensively used CPU would force the computer to restart or freeze more often.

2. Failure identification and determination

For the purposes of this study, we have defined the failure of a laptop as an automatic restart or a forced restart after the system freezes. If the laptop restarted by itself, it was considered an automatic
restart. If the computer did not respond to the commands given by the user and the user was forced to restart the computer to resume operation, this was considered a forced restart. A forced restart can be triggered by a system lockdown. In this study, other types of computer failures, such as a faulty screen or a faulty keyboard, are not considered. The times of failure are determined by failure identifier events. Failure identifier events are event messages generated by the event log service, an inbuilt logging service in MS Windows, after a restart. The benchmark event used is EventSystem, which is logged every time the computer restarts. Hence, the occurrence of this event is determined from the data, and a gap in time is manually searched for to find the time of failure. To illustrate the process, a sample data record is shown in Table 1.

Level         Date and Time    Source
Error         4/9/ 5:9:7       Usbperf
Information   4/9/ 5::5        User Profile Service
Information   4/9/ 5::5        Security-Licensing-SLC
Information   4/9/ 5::5        EventSystem
Information   4/9/ 5::55       Desktop Window Manager

Table 1: Sample of event log data

In Table 1, EventSystem (or alternatively Microsoft-Windows-EventSystem) is found. The time at which it was logged is not the time of failure, but the time when the computer restarted. Hence, the time when the computer actually froze is found by scanning the data upwards for a gap in time. At 4/9/ 5:9:7, a gap of around one and a half minutes occurred after the event Usbperf. It is assumed that this was the actual time when the computer froze, and it is designated as the time of failure. The time stamp occurring after it (4/9/ 5::5, EventSystem) is chosen as the start time for the next lifetime. This is done so that we do not count the durations when the computer was frozen, since there can be a large gap in time if the user was not present to manually restart the computer.
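The gap-scan procedure described above can be sketched in code. This is a minimal illustration, not the authors' actual tooling: the records, the timestamps, and the 60-second gap threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical event-log records (level, timestamp, source), oldest first.
# Timestamps are illustrative; the paper's own (damaged) table is not reproduced.
events = [
    ("Error",       datetime(2011, 4, 9, 5, 9, 7),   "Usbperf"),
    ("Information", datetime(2011, 4, 9, 5, 10, 45), "User Profile Service"),
    ("Information", datetime(2011, 4, 9, 5, 10, 45), "Security-Licensing-SLC"),
    ("Information", datetime(2011, 4, 9, 5, 10, 45), "EventSystem"),
    ("Information", datetime(2011, 4, 9, 5, 10, 55), "Desktop Window Manager"),
]

def find_failures(events, gap=timedelta(seconds=60)):
    """From each EventSystem record (logged at restart), scan upwards for the
    preceding gap in time; the event just before the gap marks the freeze."""
    failures = []
    for i, (_, ts, source) in enumerate(events):
        if source != "EventSystem":
            continue
        # walk backwards looking for a time gap exceeding the threshold
        for j in range(i, 0, -1):
            if events[j][1] - events[j - 1][1] >= gap:
                failures.append({"froze_at": events[j - 1][1],
                                 "restart_logged_at": ts,
                                 "next_lifetime_starts": events[j][1]})
                break
    return failures

for f in find_failures(events):
    print(f)
```

In this toy record, the scan attributes the freeze to the Usbperf event, since roughly one and a half minutes pass before the next logged event.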
After determining the start time and the end time in a lifetime of the computer, the times to failure and the times between failures can be found.

3. Experiments

We designed three experiments corresponding to three different user environments.

3.1 Experiment 1

A specific set of applications was identified for this experiment to simulate user behavior. The applications included Matlab (mathematical tool), Excel (office tool), Real Player (video playback software), and Internet Explorer (web browsing tool). These applications were initiated manually and, if idle, were re-run at random. For example, when Excel had finished a task, it might be re-run some time later by a script we set up that randomly selected the time to restart the same application. The same applied to the other applications, such as playing a video or running a Matlab code. Clearly, in this experiment the applications were not run in any specified order and exerted various levels of stress at various times. There could be times when no application was running, or when all were running at the same time. This is perhaps the closest situation to actual user operation. The test computers were kept at room temperature and humidity. The times of failure were found by scanning the data for failure identifier events. Data were collected between /3/ and 4/6/. This amounted to 7446 minutes for the experiment. During this time period, there were a total of 8 failures in computer 4.

3.2 Experiment 2

Experiment 2 focused on stressing the computers under a more controlled setting. The test computers were stressed by the Matlab application, which ran several specified codes in a loop with various uses of the CPU capacity. Internet access was cut off so that other applications running by default in the background could not communicate with the Internet. This experiment simulated the behavior of a user running only one application extensively, one which used up a lot of memory. The test computers were kept at room temperature and humidity.
Data were collected between 4/8/ and 4/3/. During this time, there were a total of 4 failures in computer 4.
3.3 Experiment 3

Experiment 3 was an extension of experiment 2. In experiment 3, the computer automatically started the Matlab application and ran a specified code in a loop whenever the computer was rebooted. This code was taken from experiment 2 and had the highest use of the CPU capacity. The aim of experiment 3 was to generate failures that were all associated with the same stressing, produced by running Matlab with a more intensive use of the CPU than in experiment 2. This gave the computer continuous stressing at the same, but higher, level of processing usage. Data were collected between 5/7/ and 5//. During this period, there were a total of 9 failures in computer 4. (Observed times to failure, in minutes: 6.4, 6447.63, 794.83, 8654.73, 955.8, 457.95, 5.48, 3.95, 55., 698.3, 695.4, 7796.57.)

4. Failure analysis

In this study, we fitted the following four distributions to the time between failures (TBF) data of the three experiments using the maximum likelihood estimation method: Weibull, gamma, lognormal, and exponential. The best-suited distribution may be selected by the AIC measure, Akaike (1980), which is given by

AIC = -2l + 2N,    (1)

where l is the log-likelihood function value at the maximum and N is the number of estimated parameters. The AIC balances the log-likelihood value against the number of parameters. The distribution giving the lowest AIC is the best fit to the data.

4.1 Experiment 1

Table 2 shows the results from the data of experiment 1.

Distribution      Weibull                         Gamma                           Lognormal                             Exponential
PDF               (β/α)(x/α)^(β-1) e^(-(x/α)^β)   x^(β-1) e^(-x/α) / (α^β Γ(β))   e^(-(log x - α)²/(2β²)) / (xβ√(2π))   (1/α) e^(-x/α)
Estimated values  α = 67.49, β = .657             α = .5, β = 836.9               α = 5.573, β = .9388                  α = 95.59
Mean              93.63                           98.45                           63.7                                  95.58
Log-likelihood    -65.74                          -54.977                         -58.36                                -64.686
AIC               5.44                            33.95                           6.7                                   5.36
χ²                .55                             .343                            .3769                                 3.83
χ² test           Cannot be rejected              Cannot be rejected              Cannot be rejected                    Can be rejected

Table 2: Estimated parameter values, means and goodness-of-fit measures from experiment 1

From Table 2 we can see that the gamma distribution is the best in terms of the AIC.
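The fitting-and-selection procedure of this section, maximum likelihood fits of the four candidate distributions compared by the AIC, plus the renewal approximation of the expected failure count, can be sketched as follows. This is an illustrative sketch assuming Python with scipy rather than the authors' Matlab workflow; the synthetic TBF sample stands in for the paper's data, and fixing the location parameter at zero is our modelling choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Illustrative TBF sample (minutes); the paper's own data are not reproduced here.
tbf = rng.weibull(0.8, size=80) * 900.0

candidates = {
    "weibull":     stats.weibull_min,
    "gamma":       stats.gamma,
    "lognormal":   stats.lognorm,
    "exponential": stats.expon,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(tbf, floc=0)               # maximum likelihood estimates
    loglik = np.sum(dist.logpdf(tbf, *params))   # maximized log-likelihood l
    n_free = len(params) - 1                     # location was fixed at 0
    aic = -2.0 * loglik + 2.0 * n_free           # AIC, equation (1)
    results[name] = (aic, params)

best = min(results, key=lambda k: results[k][0])
print("best by AIC:", best)

# Renewal-theory approximation: expected failure count over a test period T
# is roughly T divided by the mean time between failures.
T = 7446.0  # duration of experiment 1, in minutes (from the text)
print("expected failures ~", T / tbf.mean())
```

The exponential fit has one free parameter and the other three have two each, so the AIC's parameter penalty differs little across the candidates here, which matches the paper's later observation that the AIC is not very discriminating in this setting.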
This is obvious, since it has the highest log-likelihood, which is another measure of goodness of fit. The total duration of the experiment was 7446 minutes and there were 8 failures in experiment 1. If we adopt the independent and identically distributed assumption for the TBF, then by renewal theory, Cox (1962), the expected number of failures over the experiment period can be approximated by 7446 divided by the mean time between failures. From Table 2 we have E(TBF) = 98.45 from the chosen distribution, and therefore the expected number of failures over 7446 minutes is 78.88, which is not far from the observed number of 8. However, if we choose the exponential distribution we get an even better fit in terms of the difference between the fitted and observed mean numbers of failures. This is because the mean of the fitted exponential distribution always equals the average time between failures computed from the data. It can be misleading, however, to look only at the mean numbers. We therefore use another goodness-of-fit test to decide which distribution should be chosen. This is the Chi-Squared goodness-of-fit test, Corder and Foreman (2009), given by

χ² = Σ_{i=1}^{n} (O_i - E_i)² / E_i,    (2)

where the observed failure data are arranged into bins to form a histogram, O_i and E_i denote the observed and expected numbers of failures within the ith bin, and n is the total number of bins. We used the statistics toolbox in Matlab to perform this task, and the result showed that the exponential distribution can be
rejected at the 5% significance level; see Table 2. The other three distributions cannot be rejected, but the Weibull produced the smallest Chi-Squared value, and therefore, on this statistic, we choose the Weibull distribution. Clearly, different criteria produced different results, but the Chi-Squared statistic is generally accepted as a better measure of goodness of fit than the AIC, since it uses more information from the data. Using the Weibull distribution we have E(TBF) = 93.63, and then the mean number of failures over the experiment period is 79.9, which is very close to the observed 8 failures. Figure 1 shows the probability density functions (PDFs) of the four fitted distributions.

Figure 1. PDFs of the TBF of the four distributions for experiment 1

The gamma, Weibull and lognormal PDFs have similar shapes, but the exponential PDF is not close to them.

4.2 Experiment 2

Table 3 shows the results of the parameter estimation based on the data of experiment 2.

Distribution      Weibull                Gamma                  Lognormal               Exponential
Estimated values  α = .893, β = .47      α = .3, β = .5         α = 3.4459, β = .7846   α = 356.337
Mean              54.4                   333.75                 8.5                     356.33
Log-likelihood    -33.36                 -54.78                 -85.63                  -69.9
AIC               47.6                   33.564                 375.64                  538.8
χ²                .                      .97                    4.888                   3.63
χ² test           Cannot be rejected     Cannot be rejected     Cannot be rejected      Cannot be rejected

Table 3: Estimated parameter values, means and goodness-of-fit measures from experiment 2

From Table 3 we can see that the gamma distribution also produced the best fit in terms of the AIC. But, as in experiment 1, if we use the Chi-Squared measure then the Weibull is again the best choice, as seen from Table 3. The experiment lasted 453 minutes and produced 4 failures. Following the same approach as for experiment 1 and using the mean of 54.4 from the Weibull distribution, the expected number of failures over the experiment period is 56., which is not very good since we observed 4 failures in the data.
However, we note that the sample size in this experiment is small, and comparing means alone can be misleading, as stated before, since the exponential distribution will always produce the best fit in terms of the means. We show the PDFs of the four distributions in Figure 2, which shows that the exponential distribution is again singled out from the competition.
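The binned Chi-Squared comparison of equation (2) can be sketched as follows. This is a minimal sketch assuming Python with scipy rather than the authors' Matlab toolbox; the synthetic sample, the choice of six equiprobable bins, and the degrees-of-freedom correction are our assumptions, since the paper does not state its binning choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Illustrative TBF sample (minutes); stands in for the paper's data.
tbf = rng.weibull(0.8, size=80) * 900.0

def chi_squared_gof(data, dist, params, n_bins=6):
    """Equation (2): chi^2 = sum_i (O_i - E_i)^2 / E_i over histogram bins.
    Bins have equal expected probability under the fitted model, done by
    transforming the data through the fitted CDF and binning on [0, 1]."""
    u = dist.cdf(data, *params)                     # probability-integral transform
    observed, _ = np.histogram(u, bins=np.linspace(0.0, 1.0, n_bins + 1))
    expected = np.full(n_bins, len(data) / n_bins)  # equal E_i by construction
    return np.sum((observed - expected) ** 2 / expected)

params = stats.weibull_min.fit(tbf, floc=0)
chi2 = chi_squared_gof(tbf, stats.weibull_min, params)
# Critical value at the 5% level; degrees of freedom are taken as
# bins - 1 - number of estimated parameters (2 here), a common convention.
crit = stats.chi2.ppf(0.95, df=6 - 1 - 2)
print(chi2, crit, "reject" if chi2 > crit else "cannot reject")
```

Repeating the same computation for each candidate distribution and comparing the χ² values reproduces the selection step used in Tables 2 to 4.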
Figure 2. PDFs of the TBF of the four distributions for experiment 2

4.3 Experiment 3

In a similar way, Table 4 shows the fitted results based on the data of experiment 3. In this case the lognormal should be chosen, since it has the smallest AIC. But in terms of the Chi-Squared statistic, the Weibull again produced the smallest χ², so we chose the Weibull as our distribution. Using the same method as before, the expected number of failures under the Weibull distribution is 6.8.

Distribution      Weibull                 Gamma                   Lognormal              Exponential
Estimated values  α = 76.63, β = .787     α = .667, β = 57.544    α = 3.7433, β = .84    α = 5.
Mean              93.4                    5.75                    79.83                  5.3
Log-likelihood    -4.5                    -944.63                 -89.                   -7.
AIC               87                      893.4                   64.4                   4.
χ²                .884                    .938                    .5574                  .7
χ² test           Cannot be rejected      Cannot be rejected      Cannot be rejected     Cannot be rejected

Table 4: Estimated parameter values, means and goodness-of-fit measures from experiment 3

We plot the PDFs of the four distributions in Figure 3.
Figure 3. PDFs of the TBF of the four distributions for experiment 3

In this case all PDFs are similar, but the means are considerably shorter than those in Figures 1 and 2. We then examined the hazard function of the chosen PDF from each experiment; they are shown in Figure 4.

Figure 4. Hazard plot for the chosen PDF of each experiment

We can see clearly that the hazards from all experiments decrease. This is not surprising, since the hazard of the Weibull distribution is always decreasing when the shape parameter is less than one. This implies that the probability of a chance failure, given that the laptop has already survived for a while, becomes smaller and smaller as time goes on. However, the hazard of experiment 3 is larger than that of experiment 2, which in turn is larger than that of experiment 1. This shows that intensive use of the CPU produces more failures.

5. Conclusions

Experiments were designed to study the reliability of laptop computers. It was hypothesized that stressing the computers with various applications would cause them to restart or freeze, resulting in a failure. These failures were analyzed. The model parameters were estimated using the maximum likelihood method, and two measures of model fit were used to select the best model. It turned out that the AIC is not an informative measure in this analysis, since none of the models involved a large number of unknown parameters. Eventually we used the Chi-Squared statistic to select the best model. From this analysis, the Weibull was chosen for all three experiments, since this distribution produced the smallest Chi-Squared values in every case. The analysis also shows that applying stress at different levels produced notably different results. The expected times between failures from the fitted models and the observations show a clear trend: the means get shorter as the stress increases.
References

Pecht, M., Das, D., and Ramakrishnan, A. (2002), The IEEE standards on reliability program and reliability prediction methods for electronic equipment, Microelectronics Reliability, 4, 59-66.

Ramakrishnan, A., and Pecht, M. (2003), A life consumption monitoring methodology for electronic systems, IEEE Transactions on Components and Packaging Technologies, 6, 65-634.

Vichare, N., Rodgers, P., Eveloy, V., and Pecht, M. (2004), In situ temperature measurement of a notebook computer: a case study in health and usage monitoring of electronics, IEEE Transactions on Device and Materials Reliability, 4, 658-663.

Akaike, H. (1980), Likelihood and the Bayes procedure, in Bayesian Statistics, ed. Bernardo et al., University Press.

Cox, D.R. (1962), Renewal Theory, Methuen.

Corder, G.W., and Foreman, D.I. (2009), Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach, Wiley.