A Fast Host-Based Intrusion Detection System Using Rough Set Theory

A Fast Host-Based Intrusion Detection System Using Rough Set Theory Sanjay Rawat 1,2, V P Gulati 2, and Arun K Pujari 1 1 AI Lab, Dept. of Computer and Information Sciences University of Hyderabad, Hyderabad-500046, INDIA sanjayr@idrbt.ac.in, akpcs@uohyd.ernet.in 2 IDRBT Castle Hills, Road No.1 Masab Tank, Hyderabad-500057, INDIA vp.gulati@tcs.com Abstract. Intrusion Detection system has become the main research focus in the area of information security. Last few years have witnessed a large variety of technique and model to provide increasingly efficient intrusion detection solutions. We advocate here that the intrusive behavior of a process is highly localized characteristics of the process. There are certain smaller episodes in a process that make the process intrusive in an otherwise normal stream. As a result it is unnecessary and most often misleading to consider the whole process in totality and to attempt to characterize its abnormal features. In the present work we establish that subsequences of reasonably small length of sequence of system calls would suffice to identify abnormality in a process. We make use of rough set theory to demonstrate this concept. Rough set theory also facilitates identifying rules for intrusion detection. The main contributions of the paper are the following- (a) It is established that very small subsequence of system call is sufficient to identify intrusive behavior with high accuracy. We demonstrate our result using DARPA 98 BSM data; (b) A rough set based system is developed that can extract rules for intrusion detection; (c) An algorithm is presented that can determine the status of a process as either normal or abnormal on-line. Keywords:Data mining, Decision Table, Rough Set, Intrusion Detection, Anomaly, Misuse. 1 Introduction Intrusion detection systems (IDSs) have become a major area of research and product development. They work on the premise that intrusions can be detected through examinations of various parameters such as network traffic, CPU utilization, I/O utilization, user location, and various file activities. Based on the various approaches, different types of IDS are proposed in the literature. On the basis of audit data, there are two types of IDS. The network-based systems collect data directly from the network that is being monitored, in the form of packets

[29] and the host-based systems collect data from the host being protected [2]. Based on processing of data to detect attacks, IDS can also be classified into two types misuse-based systems and anomaly-based systems. While the former keeps the signatures of known attacks in the database and compares new instances with the stored signatures to find attacks, the latter learns the normal behavior of the monitored system and then looks out for any deviation in it for signs of intrusions. It is clear that misuse based IDS cannot detect new attacks and we have to add manually any new attack signature in the list of known patterns. IDS based on anomaly detection, on the other hand, are capable of detecting new attacks as any attack is assumed to be different from normal activity. However anomaly based IDS sometimes sets false alarms because it cannot differentiate properly between deviations due to authentic user s activity and that of an intruder. Among various IDS approaches, signature-analysis stores patterns of attacks as semantic descriptions [21]. The main drawback of the signature analysis technique, like all misuse-based approaches, is the need for frequent updates to keep up with the stream of new vulnerabilities/attacks discovered. Rule-based intrusion detection [34][20][13] assumes that intrusion attempts can be characterized by sequences of events that lead to the state of compromised-system. Such systems are characterized by their expert system properties that fire rules when audit records or system status information begin to indicate suspicious activity. The main limitations of this approach are the difficulty of extracting knowledge about attacks and the processing speed. State transition analysis technique describes an attack with a set of goals and transitions, and represents them as state transition diagrams [18][19][32]. The most widely used approach of anomaly-based intrusion detection is statistical [16][27]. User or system behavior is measured by a number of variables sampled over time and stored in a profile. The current behavior of each user is maintained in a profile. At regular intervals the current profile is merged with the stored profile. Anomalous behavior is determined by comparing the current profile with the stored profile. Forrest et al [11][12] suggest that system calls trace of a process under normal execution can be taken as its normal behavior in terms of system calls, as variation in sequences of system calls is very small. On the other hand, this variation is relatively higher when compared to a sequence of system calls under abnormal execution. This variation can be attributed to the presence of one or more alien (thus malicious) subsequences in the abnormal process. It should be noted that not all the subsequences of an abnormal process are malicious. Thus intrusive part should be detectable as a subsequence of the whole abnormal sequence of the process. In this paper we present a technique of discovering rules for intrusion detection. We make use of rough set theory for this purpose. To best of our knowledge, Lin was the first to propose the idea of applying rough sets to the problem of anomaly detection [25]. Though the paper lacks the experimental results [25], it provides some solid theoretical background. The following two theorems are important:

1. Every sequence of records in computer has a repeating sequence 2. If the audit trail is long enough, then there are repeating records Following the argument of Forrest et al and in the view of above theorems, our approach is based on subsequences of system calls. We formulate the problem as a classification problem by writing the set of subsequences as a decision table. The proposed method is a combination of signature-based and anomaly based approaches. A program behavior is monitored as a sequence of system calls. These sequences are further converted into the subsequences of shorter length. These subsequences are considered as the signatures for malicious as well as normal activities. By doing so, one of the disadvantages of signature-based approach of frequently updating the signature database can be avoided. Empirical results show that the proposed system is able to detect new abnormal activities without updating the signatures. Further, these signatures are represented in the form of IF-THEN type decision rules. The advantage of representing signatures in this form is that such signatures are easy to interpret for further analysis. Rough set theory is used to induce decision rules. Rules induced by using rough set theory are very compact because before inducing rules, all the redundant features of the audit data are removed. This makes the matching of rules faster, thus making the system suitable for on-line detection. The proposed system is also fast in the sense that process is compared, in parts, as it starts calling system calls. So we do not have to wait until it exits. The major contributions of the paper are: It is established empirically that short sequences of system calls are sufficient to detect intrusive behavior with high accuracy; A rough set based approach is developed that can extract decision rules for intrusion detection; An algorithm is presented that can classify a process as normal or abnormal on-line. Rest of the paper is organized as follows: Section 2 gives an overview of research work on process profiling using sliding window approaches and learning rules for intrusion detection. Section 3 presents some preliminary background to understand the approach. A detailed description of the proposed scheme is given in the section 4. Section 5 covers the experimental setup and analysis of the results. Section 6 concludes the paper. 2 Related Work Recently, process monitoring for the sign of intrusions has attracted the attention of many researchers and active research is being done in this area. In the approach, called time-delay embedding (tide), initiated by Forrest et al [11][12], normal behavior of processes is captured because programs show a stable behavior over the period of time under normal execution. In this approach, short

attribute values. Knowledge representation is very simple and learning rate is very fast as compared to other techniques. Our study shows that it is possible to detect an attack by mare looking at some portion of the abnormal process. This reduces the dimension of the data to be processed and thus makes the subsequent computations much faster. The decision rules induced by rough set theory are easy to interpret and thus can be useful in further analyzing the events. We have tested our scheme by conducting experiments on DARPA 98 data. Empirical results, reported in the paper, justify our approach of making use of rough set for intrusion detection. As our future work, we intend to use the concept of incremental learning so that new rules can be learnt without retraining on whole data. We are also analyzing the IF-THEN rules to better understand the relationship among system calls to gain more insight about attacks. Our future work also includes to combine rough set method with other learning techniques, e.g. neural networks to propose a more robust IDS in terms of accuracy. Acknowledgement The authors are thankful to anonymous reviewers for their useful comments to improve the presentation and quality of the paper. The first author is associated with IDRBT as research fellow and thankful to IDRBT for providing financial assistance and infrastructure to carry out this work. The third author is thankful to MIT, India for its funding. References 1. An A., Huang Y., Huang X., Cercone N.: Feature Selection with Rough Sets for Web Page Classification. In Dubois D., Grzymala-Busse J.W., Inuiguchi M., and Polkowski L. (eds), Rough Sets and Fuzzy Sets, Springer-Verlag (2004) 2. Bace R., Mell P.: NIST special publication on intrusion detection system. SP800-31, NIST, Gaithersburg, MD (2001) 3. Bazan J.: A Comparison of Dynamic and non-dynamic Rough Set Methods for Extracting Laws from Decision Tables, In: Skowron A., Polkowski L.(ed.), Rough Sets in Knowledge Discovery 1, Physica-Verlag, Heidelberg, (1998) 321 365 4. Bazan J., Nguyen H. S., Nguyen S. H., Synak P., and Wrblewski J.: Rough set algorithms in classification problem. In: Polkowski L., Tsumoto S., Lin T.Y. (eds.), Rough Set Methods and Applications, Physica-Verlag, Heidelberg, (2000) 49-88. 5. Bazan J. G., Szczuka M. S., Wrblewski A.: A New Version of Rough Set Exploration System. In: Proceedings of the Third International Conference on Rough Sets and Current Trends in Computing RSCTC, Malvern, PA, Lecture Notes in Artificial Intelligence vol. 2475, Springer-Verlag (2002) 397-404 Available at: http://logic.mimuw.edu.pl/~rses/ 6. Cabrera J. B. D., Ravichandran B., Mehra R. K.: Detection and classification of intrusions and faults using sequences of system calls. In: ACM SIGMOD Record, Special Issue: Special Section on Data Mining for Intrusion Detection and treat Analysis, Vol. 30(4) (2001) 25-34

7. Cai Z., Guan X., Shao P., Peng Q., Sun G.: A Rough Set Theory Based Method for Anomaly intrusion Detection in Computer Network Systems. J Expert System 20(5) (2003) 251-259 8. Cios K., Pedrycz W., Swiniarski Roman W.: Data mining methods for Knowledge discovery. Kluwer Academic Publisher USA, (2000) 9. DARPA 1998 Data Set, MIT Lincoln Laboratory, available at: http://www.ll.mit.edu/ist/ideval/data/data index.html 10. Delic D., Lenz Hans-J, Neiling M.: Improving the Quality of Association Rule Mining by Means of Rough Sets. In: Proceedings of the First International Workshop on Soft Methods in Probability and Statistics (SMPS 02), Warsaw (poland) (2002) 11. Forrest S., Hofmeyr S. A., Somayaji A.: Computer Immunology. Communications of the ACM, 40(10) (1997) 88-96 12. Forrest S., Hofmeyr S. A., Somayaji A., Longstaff T. A.: A Sense of Self for Unix Processes. In: Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy. Los Alamitos, CA. IEEE Computer Society Press, (1996) 120-128 13. Garvey T., Lunt T. F.: Model-based Intrusion Detection. In: Proceedings of the 14th National Computer Security Conference. (1991) 372-385 14. Grzymala-Busse J. W.: A New Version of the Rule Induction System LERS. Fundamenta Informaticae, 31(1) (1997) 27-39 15. Guan J. W., Bell D. A., Liu D. Y.: The Rough Set Approach to Association Rule Mining. In: Proceedings of the Third IEEE International Conference on Data Mining (ICDM 03), (2003) 16. Helman P., Liepins G.: Statistical Foundations of Audit Trail Analysis for the Detection of Computer Misuse. IEEE Transactions on Software Engineering, 19(9) (1993) 886-901 17. Hofmeyr S. A., Forrest A., Somayaji A.: Intrusion Detection Using Sequences of System Calls. Journal of Computer Security, 6 (1998) 151-180 18. Ilgun K.: USTAT: A Real-Time Intrusion Detection System for UNIX. In: Proceedings of the 1993 IEEE Symposium on Research in Security and Privacy. (1993) 16-28 19. Ilgun K., Kemmerer R. A., Porras P. A.: State Transition Analysis: A Rule-Based Intrusion Detection Approach. IEEE Transactions on Software Engineering 21(3) (1995) 181-199 20. Kemmerer R. A.: NSTAT: A Model-based Real-time Network Intrusion Detection System. Technical Report, Number TRCS97-18, Computer Science, University of California, Santa Barbara. (1998) 21. Kumar S., Spafford E.: A Pattern-Matching Model for Intrusion Detection. In: Proceedings National Computer Security Conference, (1994) 11-21 22. Lee W., Stolfo S., Chan P.: Learning Patterns from Unix Process Execution Traces for Intrusion Detection. In: Proceedings of the AAAI97 workshop on AI methods in Fraud and risk management. AAAI Press. (1997) 50-56 23. Lee W., Stolfo Salvatore J.: Data Mining Approaches for Intrusion Detection. In: Proceedings of the 7th USENIX Security Symposium (SECURITY-98), Usenix Association, January 26-29. (1998) 79-94 24. Lian-hua Z., Guan-hua Z., Lang YU., Jie Z., Ying-cai B.: Intrusion Detection Using Rough Set Classification. Journal of Zhejiang University SCIENCE Vol. 5(9) (2004) 1076-1086 25. Lin T. Y.: Anomaly Detection: A Soft Computing Approach. In: Proceedings of the 1994 Workshop on New Security Paradigms, Little Compton, Rhode Island, United States, IEEE Computer Society Press (1994) 44-53

26. Lingras P.: Rough Set Clustering for Web Mining. In: Proceedings of the IEEE International Conference on Fuzzy Systems 2002, Honolulu, Hawaii (2002) 27. Lunt T. F.: Using Statistics to Track Intruders. In: Proceedings of the Joint Statistical Meetings of the American Statistical Association (1990) 28. Lunt T. F., Tamaru A., Gilham F., Jagannathan R., Neumann P. G., Javitz H. S., Valdes A., Garvey T. D.: A Real-Time Intrusion Detection Expert System (IDES) Technical Report, SRI Computer Science Laboratory (1992) 29. Mukherjee B., Heberlein L. T., Levitt K. N.: Network Intrusion Detection. IEEE Network. 8(3) (1994) 26-41 30. Mukkamala R., Gagnon J., Jajodia S.: Integrating Data Mining Techniques with Intrusion detection Methods. In: Research Advances in database and Information System Security: IFIPTCII, 13th working conference on Database security, July, USA, Kluwer Academic Publishers (2000) 31. Pawlak Z.: Rough sets: Theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht (1991) 32. Porras P. A.: STAT A State Transition Analysis Tool For Intrusion Detection. Technical Report, Number TRCS93-25, Computer Science. University of California, Santa Barbara (1993) 33. Rawat S., Gulati V. P., Pujari A. K.: Frequecy And Ordering Based Similarity Measure For Host Based Intrusion Detection. J Information Management and Computer Security. 12(5), Emerald Press (2004) 411-421 34. Sebring M. M., Shellhouse E., Hanna M. E., Whitehurst R. A.: Expert System in Intrusion Detection: A Case Study. In: Proceedings of the 11th National Computer Security Conference, (1988) 74-81 35. Stefanowski J.: On Rough Set Based Approaches to Induction of Decision Rules. In: Polkowski L, Skowron A (eds) Rough Sets in Data Mining and Knowledge Discovery, vol 1. Physica Verlag, Heidelberg. (1998) 500-529 36. Tandon G., Chan P.: Learning Rules from System Calls Arguments and Sequences for Anomaly Detection. In: ICDM Workshop on Data Mining for Computer Security (DMSEC), Melbourne, FL. (2003) 20-29 37. Warrender C., Forrest S., Pearlmutter B.: Detecting Intrusions Using System Calls: Alternative Data Modelss. In: IEEE Symposium on Security and Privacy (1999) 38. Wespi A., Dacier M., Debar H.: Intrusion Detection Using Variable-Length Audit Trail Patter. In : LNCS # 1907, RAID 2000. Toulouse, France. (2000) 110-129 39. Zhu D., Premkumar G., Zhang X., Chu Chao-Hsien: Data mining for Network Intrusion Detection: A comparison of alternative methods. J. Decision Sciences 32(4) (2001) 635-660