Journal of Theoretical and Applied Computer Science
Vol. 6, No. 1, 2012, pp. 3-12
ISSN 2299-2634
http://www.jtacs.org

Method for detecting software anomalies based on recurrence plot analysis

Michał Mosdorf
Warsaw University of Technology, Institute of Computer Science, Poland
m.mosdorf@stud.elka.pw.edu.pl

Abstract: This paper evaluates a method for detecting software anomalies based on recurrence plot analysis of the trace log generated by software execution. The method relies on windowed recurrence quantification analysis of selected measures (e.g. recurrence rate, RR, or determinism, DET). Initial results show that the proposed method is useful for detecting silent software anomalies that do not result in typical crashes (e.g. exceptions).

Keywords: anomaly detection, fault injection, recurrence plot, software dependability

1. Introduction

Detection of software anomalies caused by various failures is an important part of software dependability methods. It makes it possible to decide on corrective actions when a failure is detected. The literature describes different methods for detecting software failures.

Some methods for detecting software anomalies are based on building very accurate, often formal, assertions that check program invariants. These methods usually impose execution overhead and require the resulting assertion set to be optimized. The authors of [1][2] present a technique for detecting software faults based on dynamic derivation of detectors that check discovered invariants. These methods are based on analysis of the Dynamic Dependence Graph (DDG) [3][4], which represents dependencies between values observed during program execution. A similar approach is taken by the DAIKON tool [5], which can be used to generate an assertion set.
A different approach to failure detection is presented in [6][7][8], where the authors describe two techniques: EDDI (Error Detection by Duplicated Instructions) and CFC (Control Flow Checking). EDDI is based on the idea of duplicating software instructions and inserting additional instructions that compare the results obtained from the original and the redundant instructions. CFC performs control flow checking by generating run-time signatures for control flow changes, which are then verified by different program blocks.

This paper evaluates an alternative method for detecting software execution anomalies based on recurrence plot analysis. The method focuses on detecting anomalies in the dynamics of data flow rather than on checking value invariants in the examined software. It targets anomalies caused by software errors that do not result in typical application crashes, which are relatively easy to detect and compensate for. Instead, the proposed approach aims at detecting software errors that change the overall dynamical properties of the software data flow, as characterized by a few recurrence quantification measures.
This paper is organized as follows. Section 2 gives a short overview of recurrence plot analysis and of the Recurrence Quantification Analysis measures used. Section 3 describes the proposed method for software anomaly detection. Section 4 describes the architecture of the artificial software model used to evaluate the method. Section 5 describes the software anomalies introduced into the model and the events caused by those anomalies. Section 6 presents the analysis and the obtained results. Section 7 contains conclusions and future plans.

2. Recurrence Plot analysis

A recurrence plot is a technique for nonlinear data analysis that allows recurrent behavior of an m-dimensional phase space trajectory to be investigated through a two-dimensional representation. The concept of recurrence on which it builds was introduced by Poincaré in 1890 [9]; see [10] for a review of recurrence plots. Calculation of a recurrence plot starts with reconstruction of the phase space of the dynamical system. For this purpose the time delay method can be applied, with the autocorrelation function used to determine the time delay τ [10]. In the next step the Grassberger-Procaccia method can be applied to calculate the dimension required for attractor reconstruction. The recurrence plot that visualizes recurrences is described by the matrix:

$$R_{i,j}(\varepsilon) = \Theta\left(\varepsilon - \lVert x_i - x_j \rVert\right), \quad i, j = 1, \ldots, N \qquad (1)$$

where: N is the number of considered states x_i in m-dimensional space, ε is the threshold distance, ‖·‖ is a norm and Θ(·) is the Heaviside function.

The proposed method for detecting software anomalies is based on windowed Recurrence Quantification Analysis (RQA) of selected measures (e.g. recurrence rate, RR, or determinism, DET). Anomalies are reported based on changes in the selected RQA measures. The results of this research focus mainly on two measures, RR and DET, which are obtained with the following equations [10]:

$$RR = \frac{1}{N^2} \sum_{i,j=1}^{N} R_{i,j}(\varepsilon) \qquad (2)$$

$$DET = \frac{\sum_{l=l_{\min}}^{N} l\,P(l)}{\sum_{l=1}^{N} l\,P(l)} \qquad (3)$$

where: N is the number of points on the phase space trajectory, P(l) is the histogram of the lengths l of the diagonal lines, l_min is the minimal length of a diagonal line taken into account and ε is the neighborhood size.
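The definitions above can be illustrated with a minimal NumPy sketch. It is not the implementation used in the paper: the function names, the Euclidean norm, the convention Θ(0) = 1, the inclusion of the main diagonal, and the default l_min = 2 are all assumptions made for the example.

```python
import numpy as np


def embed(series, dim, delay):
    """Time-delay embedding: row i is (s[i], s[i+delay], ..., s[i+(dim-1)*delay])."""
    s = np.asarray(series, dtype=float)
    n = len(s) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("series too short for this dim/delay")
    return np.stack([s[k * delay:k * delay + n] for k in range(dim)], axis=1)


def recurrence_matrix(states, eps):
    """R[i, j] = 1 when ||x_i - x_j|| <= eps, else 0.

    Assumptions: Euclidean norm, and Theta(0) taken as 1.
    """
    diff = states[:, None, :] - states[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return (dist <= eps).astype(int)


def recurrence_rate(R):
    """RR: fraction of recurrence points among all N*N entries (main diagonal included)."""
    return R.sum() / R.size


def diagonal_histogram(R):
    """P(l): hist[l] is the number of diagonal lines of exactly length l."""
    n = R.shape[0]
    hist = np.zeros(n + 1, dtype=int)
    for k in range(-(n - 1), n):
        run = 0
        for v in np.diagonal(R, offset=k):
            if v:
                run += 1
            else:
                hist[run] += 1  # run == 0 lands in hist[0], cleared below
                run = 0
        hist[run] += 1          # close a line that reaches the matrix edge
    hist[0] = 0
    return hist


def determinism(R, l_min=2):
    """DET: share of recurrence points lying on diagonal lines of length >= l_min."""
    hist = diagonal_histogram(R)
    lengths = np.arange(len(hist))
    total = (lengths * hist).sum()
    return (lengths[l_min:] * hist[l_min:]).sum() / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical strictly periodic trace of program-point codes.
    states = embed([1, 2, 3, 1, 2, 3, 1, 2, 3], dim=1, delay=1)
    R = recurrence_matrix(states, eps=0.0)
    print("RR =", recurrence_rate(R), "DET =", determinism(R))
```

For a strictly periodic series every recurrence point lies on a long diagonal line, so DET is at its maximum; irregular traces break the diagonals into shorter pieces and lower it.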
RR measures the density of recurrence points in the recurrence plot. DET is the ratio of recurrence points that form diagonal lines to all recurrence points.

3. Software anomaly detection method

The method is based on comparing the results of windowed RQA of trace data generated by a program execution without anomalies with those of a program execution that may be influenced by anomalies. Figure 1 shows the steps of the proposed method. In the first step the examined software must be executed without anomalies to gather an undisturbed execution trace. The trace log contains a series of integer values that represent different transitions in the program state (e.g. function calls). In the next step the obtained execution trace is analyzed with the autocorrelation function and the Grassberger-Procaccia method to determine the delay and the dimension required for attractor reconstruction. With these quantities, windowed RQA of the obtained trace log is performed. The time series obtained from this analysis describes the dynamical properties of the undisturbed software execution and is used as the comparison pattern for software anomaly detection.

Figure 1. Algorithm of proposed anomaly detection method

During the anomaly detection process the obtained RQA data is compared with the RQA data generated from the original software execution. At the current stage of method development this comparison is performed offline, after the software execution has completed. This assumption was made to simplify the evaluation of the proposed approach. Future work will focus on developing a method for real-time software anomaly detection and for classification of different dynamical states of the software.

4. Architecture of tested software

The proposed approach was verified in experiments performed on an artificial software model that simulated message flow between separate threads. The aim of the model was to simulate typical data flow between modules of, e.g., real-time software divided into separate application threads, as found in typical software based on operating systems such as FreeRTOS or RTEMS. The architecture of the tested software is shown in Figure 2.

Figure 2. Architecture of tested software
The prepared software consists of one sender thread that generates message events with a Poisson distribution. Each message contained a randomly generated destination designator and an additional number describing the amount of time required by the receiver thread to process it. This number was also generated randomly with a Poisson distribution. Each message was inserted into the first queue, which connected the sender thread with the router thread. The router thread was responsible for routing received messages to the correct destination queue according to the destination designator. The model contained 6 receiver threads organized into 3 groups. Each thread group was responsible for receiving messages from its group queue.

To create the execution trace, selected program points were equipped with log generation procedures. In the whole program 13 points were selected, representing message generation, send and receive events in the different threads. Execution of each selected point generated a log entry containing a single integer in the range 1 to 13.

5. Simulated anomalies

During the experiment 6 execution traces were collected: one for the proper execution and 5 for different simulated software anomalies. The anomalies were introduced artificially and concerned either the amount of time required to handle a message at the destination thread or the status of the thread (enabled or disabled; by default all threads were enabled). Each trace was collected for 3 minutes and contained about 14k reported events. The list below gives more details about the collected trace logs.

Execution without anomalies
1. A1 thread requires 2 times more time to handle messages
2. A1 thread requires 4 times more time to handle messages
3. A1 is not working
4. A1 and B1 require 2 times more time to handle messages
5. A1 and B1 are not working

For all experiments it was assumed that if the router thread was not able to insert a message into a receiver thread queue (the queue was full), the message was lost. No dedicated trace entry was generated for this kind of event. Figure 3 shows an example of the time series gathered for the execution without anomalies.

Figure 3. Example of time series from execution trace without anomalies
It is important to note that the software test model was tuned so that, without anomalies, the program worked in a stable way: the number of messages in all queues stayed at a low level and no messages were lost. The introduced anomalies caused special events to occur. The list below gives a short description of those events for anomalies 1 to 5.

(1) Queue A full at 1 minute and 50s
(2) Queue A full at 1 minute and 10s
(3) Queue A full at 1 minute
(4) Queue A full at 1 minute and 40s and queue B full at 2 minutes and 40s
(5) Queue A full at 55s and queue B full at 1 minute

For an initial examination of the trace logs obtained from the different executions, all reported program points were counted. The results are presented in Figure 4. This initial inspection does not show much difference between the gathered trace logs. It can only reveal differences in the number of registered points associated with the operations of given threads. The total number of calls for thread A1 decreases for anomalies 1, 2 and 3, because the introduced anomalies increase the time required to handle a message received from the queue (log number 6).

Figure 4. Number of occurrences of the different program points in the analyzed execution trace logs

6. Analysis of obtained results

In the first stage the execution trace without anomalies was analyzed with the autocorrelation function and the Grassberger-Procaccia method to determine the delay and the dimension required for attractor reconstruction. The value of ε, required for recurrence plot calculation, was also selected based on the execution trace without errors. In the next step, for each of the execution traces with anomalies, many recurrence plots were created with a window size of 300 samples.
For each of the resulting recurrence plots, the selected RQA measures were calculated. Figure 5 shows an example of a recurrence plot calculated for one window of the trace log collected from the execution without anomalies.
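The windowed step can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used to produce the figures: the embedding dimension is fixed at 1 for brevity, the step between windows (`step`), the threshold `eps`, `l_min` and all function names are hypothetical, and the trace in the usage part is synthetic.

```python
import numpy as np


def window_rr_det(window, eps=0.0, l_min=2):
    """Return (RR, DET) for one window of integer trace codes.

    Assumption: embedding dimension 1, so states are the raw codes.
    """
    x = np.asarray(window, dtype=float)
    R = np.abs(x[:, None] - x[None, :]) <= eps
    n = len(x)
    total = on_lines = 0
    for k in range(-(n - 1), n):
        run = 0
        # The trailing False is a sentinel that closes a run at the matrix edge.
        for v in list(np.diagonal(R, offset=k)) + [False]:
            if v:
                run += 1
            else:
                total += run
                if run >= l_min:
                    on_lines += run
                run = 0
    rr = R.sum() / R.size
    det = on_lines / total if total else 0.0
    return rr, det


def windowed_rqa(trace, window=300, step=50, eps=0.0):
    """Slide a window over the trace; row i of the result is (RR, DET) of window i."""
    starts = range(0, len(trace) - window + 1, step)
    return np.array([window_rr_det(trace[s:s + window], eps) for s in starts])


if __name__ == "__main__":
    # Synthetic stand-in for a trace log: integers in the range 1..13.
    rng = np.random.default_rng(0)
    trace = rng.integers(1, 14, size=1000)
    series = windowed_rqa(trace, window=300, step=100)
    print(series.shape)  # one (RR, DET) pair per window
```

Running the same function over the reference trace and over a trace under test yields two RR series and two DET series that can be compared window by window.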
Figure 5. Example of calculated recurrence plot for window size of 300 samples of trace log collected from execution without anomalies

Figure 6 shows an example of a recurrence plot calculated for the trace log collected from the execution with anomaly 5. The two recurrence plots clearly differ in the number and structure of recurrence points.

Figure 6. Example of calculated recurrence plot for window size of 300 samples of trace log collected from execution with anomaly 5

Figures 7 and 8 show the RR and DET measures calculated for the trace logs without anomalies and with anomaly 5. Both the RR and the DET series are noisy. In both figures the series associated with anomaly 5 changes value drastically at about 30% of the experiment time. This is caused by the anomaly 5 event in which queues A and B become full. Additionally, the value of RR from the beginning of the experiment shows that the data from the anomaly 5 execution trace has a different dynamic character than the original data without anomalies.
Figure 7. RR measure calculated for trace logs without anomalies and with anomaly 5

Figure 8. DET measure calculated for trace logs without anomalies and with anomaly 5

Because of the noise present in the RR and DET series, some anomalies may be difficult to distinguish from the original series. For this reason Figures 9 and 10 show the series after filtering with a moving average window of 500 samples. After filtering, the anomaly series can be easily distinguished from the original data obtained from the trace log of the system without anomalies.
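The smoothing step can be sketched with a plain moving average. The paper does not state a decision rule for the comparison, so the band test below (`outside_reference_band`, with its `margin` parameter) is a hypothetical illustration only, and the series in the usage part are synthetic rather than measured.

```python
import numpy as np


def moving_average(values, size):
    """Moving average over `size` samples; only fully covered positions are returned."""
    values = np.asarray(values, dtype=float)
    if size > len(values):
        raise ValueError("window larger than series")
    return np.convolve(values, np.ones(size) / size, mode="valid")


def outside_reference_band(reference, candidate, size, margin=3.0):
    """Flag smoothed candidate samples far from the smoothed reference level.

    Hypothetical rule: a sample is flagged when it lies more than `margin`
    standard deviations of the smoothed reference away from its mean.
    """
    ref = moving_average(reference, size)
    cand = moving_average(candidate, size)
    return np.abs(cand - ref.mean()) > margin * ref.std()


if __name__ == "__main__":
    # Synthetic RR-like series: a stable reference and a copy that shifts halfway.
    t = np.arange(200)
    reference = 0.3 + 0.01 * np.sin(t)
    candidate = reference.copy()
    candidate[100:] += 0.3
    flags = outside_reference_band(reference, candidate, size=20)
    print(int(flags.argmax()), "is the first flagged smoothed sample")
```

A larger averaging window suppresses more noise but also delays the point at which a shift becomes visible, which is the trade-off behind the difficulty of making rapid decisions noted in the conclusions.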
Figure 9. RR measures calculated for all trace logs containing data without anomaly and with all simulated anomalies. Original plot was filtered by moving averaging window with size of 500 samples.

Figure 10. DET measures calculated for all trace logs containing data without anomaly and with all simulated anomalies. Original plot was filtered by moving averaging window with size of 500 samples.

The results show that the RR and DET measures from the trace log without anomalies stay within a relatively small range. This is due to the stable character of program execution without anomalies. For all introduced anomalies the value of the RR measure after averaging differed from the value computed from the trace log without anomalies. This property makes it possible to distinguish executions with anomalies from the original one. Additionally, the values of both measures vary over a much greater range for the trace logs with anomalies. This is an effect of the anomalies, which caused the affected queues to hold more data and eventually become full. The effect is especially visible for anomaly 5, which causes a very rapid increase in the number of messages held in queues A and B and blocks the queues in a relatively short time.

7. Conclusions

This paper proposed a method for software anomaly detection. The approach is based on performing windowed RQA on software execution trace logs and deciding whether an anomaly is present by comparison with the RQA measures calculated for the original, undisturbed software execution.

For evaluation, the method was applied to a very simple, artificial software model that simulated message flow between different program threads. Five anomalies were introduced into that model, each influencing the performance of the threads responsible for handling messages. The anomalies disturbed the stable character of the model and caused the affected queues to hold more messages. The results of the tests showed that the RQA measures made it possible to distinguish executions with an anomaly from the original execution.

The results of this initial study show that recurrence plot analysis can be a useful tool for detecting anomalies in software execution. The approach can help to detect silent software errors that do not result in typical application crashes (e.g. exceptions). Errors of this type may change the statistical behavior of the system or degrade its performance. In the future the method can be applied to anomaly detection in more complex systems, such as an operating system kernel.

A drawback of this solution is the high computational power required to perform recurrence plot analysis. For this reason the applicability of the method to real-time applications will be investigated in future research. Additionally, due to the presence of noise, the data obtained from RQA may be difficult to read.
In this paper an additional moving average was used to show the differences between the anomaly series and the original series. Because of this, making a reliable and rapid decision about a possible anomaly may be difficult. This issue will be investigated in future work.

References

[1] Pattabiraman K., Kalbarczyk Z., Iyer R. K., Application-Based Metrics for Strategic Placement of Detectors, Proceedings of the 11th Pacific Rim International Symposium on Dependable Computing, 12-14 Dec. 2005.
[2] Pattabiraman K., Saggese G. P., Chen D., Kalbarczyk Z., Iyer R. K., Dynamic Derivation of Application-Specific Error Detectors and their Implementation in Hardware, Sixth European Dependable Computing Conference (EDCC '06), 18-20 Oct. 2006.
[3] Austin T. M., Sohi G. S., Dynamic Dependency Analysis of Ordinary Programs, ISCA '92: Proceedings of the 19th Annual International Symposium on Computer Architecture, 1992.
[4] Tip F., A Survey of Program Slicing Techniques, Journal of Programming Languages, Volume: 5399, Issue: 3, Publisher: Citeseer, Pages: 1-65, 1995.
[5] Ernst M., Cockrell J., Griswold W., Notkin D., Dynamically Discovering Likely Program Invariants to Support Program Evolution, IEEE Trans. on Software Engineering, 27(2), 2001.
[6] Reis G. A., Chang J., Vachharajani N., Rangan R., August D. I., SWIFT: Software Implemented Fault Tolerance, Proceedings of the 3rd International Symposium on Code Generation and Optimization, 2005.
[7] Oh N., Shirvani P. P., McCluskey E. J., Control-flow checking by software signatures, volume 51, pages 111-122, March 2002.
[8] Oh N., Shirvani P. P., McCluskey E. J., Error detection by duplicated instructions in super-scalar processors, IEEE Transactions on Reliability, 51(1):63-75, March 2002.
[9] Poincaré H., Sur le problème des trois corps et les équations de la dynamique, Acta Mathematica 13 (1890) 1-271.
[10] Marwan N., Romano M. C., Thiel M., Kurths J., Recurrence plots for the analysis of complex systems, Physics Reports, Volume 438, Issues 5-6, Pages 237-329, January 2007.