Data Mining for Network Intrusion Detection: A Comparison of Alternative Methods *


Decision Sciences, Volume 32, Number 4, Fall 2001. Printed in the U.S.A.

Dan Zhu and G. Premkumar, Department of Logistics, Operations and MIS, Iowa State University, Ames, IA 50011, dzhu@iastate.edu, prem@iastate.edu
Xiaoning Zhang, Tellabs Operations, Inc., 4951 Indiana Avenue, Lisle, IL 60532, mzhang@tellabs.com
Chao-Hsien Chu, School of Information Sciences and Technology, Pennsylvania State University, University Park, PA 16802, chu@ist.psu.edu

*This research is supported in part by the Chinese National Science Foundation, grant number

ABSTRACT

Intrusion detection systems help network administrators prepare for and deal with network security attacks. These systems collect information from a variety of system and network sources, and analyze it for signs of intrusion and misuse. A variety of analysis techniques have been employed, ranging from traditional statistical methods to new data mining approaches. This study examines the performance of three data mining methods in detecting network intrusion. A 3 × 2 × 2 experimental design is used to evaluate the impact of three data mining methods, two data representation formats, and two data proportion schemes on the classification accuracy of intrusion detection systems. The results indicate that the data mining method and the data proportion have a significant impact on classification accuracy. Among the data mining methods, rough sets provide the highest accuracy, followed by neural networks and then inductive learning. A balanced data proportion performs better than an unbalanced one. There are no major differences in performance between binary and integer data representations.

Subject Areas: Data Mining, Inductive Learning, Intrusion Detection, Network Security, Neural Networks, Rough Sets, and Telecommunications.

INTRODUCTION

Information technology has become a key component in supporting critical infrastructure services across many sectors of our society. In an effort to share information and streamline operations, organizations are creating complex networked systems and opening their networks to customers, suppliers, and other business partners (Chin, 1999). While most users of these networks are legitimate, an open network is exposed to illegitimate access and use. Increased network complexity, greater access, and a growing emphasis on the Internet have made network security a major concern for organizations. The number of computer security breaches has risen significantly in the last three years. In February 2000, several major web sites, including Yahoo, Amazon, eBay, Datek, and E-Trade, were shut down by denial-of-service attacks on their web servers. The U.S. General Accounting Office (GAO) disclosed that approximately 250,000 break-ins into Federal computer systems were attempted in one year, and that 64% of these attacks were successful (Durst, Champion, Witten, Miller, & Spagnuolo, 1999). Worse yet, the number of attacks is doubling every year. Based on previous studies, the GAO estimates that only 1-4% of these attacks will be detected, and only about 1% will be reported. While traditional approaches to network security have focused on prevention, network intrusion detection has become increasingly important in recent years as a way for firms to reduce undetected intrusions. Typically, network intrusion is detected by examining the data trail left by users and searching for abnormal user behavior. Data mining has become a very useful technique for reducing information overload and improving decision making: it extracts and refines useful knowledge by searching for relationships and patterns in the extensive data collected by organizations. The extracted information is used to predict, classify, model, and summarize the data being mined (Fayyad, Piatetsky-Shapiro, & Smyth, 1996).
Data mining technologies, such as rule induction, neural networks, genetic algorithms, fuzzy logic, and rough sets, are used for classification and pattern recognition in many industries (Tam & Kiang, 1992; Chu & Widjaja, 1994; Bigus, 1996; Desai, Crook, & Overstreet, 1996; Zhu & Padman, 1997). They have been used extensively to discriminate normal from abnormal behavior in a variety of contexts (Chung & Tam, 1993; Spangler, May, & Vargas, 1999; Fox, Henning, Reed, & Simonian, 1990; Fanning & Cogger, 1998; Mé, 1998). In recent years, data mining techniques have been used successfully in network intrusion detection (Hofmeyr, Forrest, & Somayaji, 1998; Hosmer, 1995; Lee & Stolfo, 2000). Studies have compared the performance of various data mining methods, but the results have been conflicting (Sen & Gibbs, 1994; Sung, Chang, & Lee, 1999; Spangler et al., 1999). Identifying the appropriate method is important in network intrusion detection, since performance in terms of detection accuracy, false alarm rate, and detection time becomes critical for near real-time network monitoring. Very few studies have examined the relative performance of various methods in the context of intrusion detection (Warrender, Forrest, & Pearlmutter, 1999). This study addresses this research gap. The primary objective of this paper is to compare the detection accuracy of three data mining methods: neural networks, inductive learning, and rough sets. Comparing multiple data mining methods will provide significant insights into selecting the appropriate model for detecting intrusions. This research therefore has important implications for electronic commerce and information security.

In the next section, we provide the research background on data mining and network security, the two areas that form the foundation for this study. Following that, we introduce the three data mining methods used in this study. We then describe the experiments, including the network security problem, the data creation and representation, and the experimental design, followed by the results of the study. The final section discusses the results and their implications for future research.

BACKGROUND

Network Security

There are two broad types of techniques for network security: protection and detection (Lunt, 1993). Protection techniques are designed to guard hardware, software, and user data against threats from both outsiders and malicious insiders. A common protection device is a firewall, which sets up a barrier at the point of connection between the external network and the internal corporate network, and ensures that only valid data are allowed to pass through it. However, firewalls are not foolproof. They require accurate configuration of numerous and confusing access control lists, and continuous updating to allow access to new network services and to keep up with changing security policies. Even properly configured firewalls are known to have weak spots. Also, firewalls cannot prevent attacks originating inside the network, which is a frequent source of break-ins. In addition to firewalls, operating systems provide user authentication through passwords and multilevel access control to information. Unfortunately, most of these mechanisms are powerless against misbehavior by legitimate users who perform unauthorized actions. Intrusion detection systems (IDS) and vulnerability assessment systems (also known as scanners) are good complements to these protection mechanisms. Vulnerability assessment systems, such as SATAN and COPS (Farmer & Spafford, 1990), perform rigorous examinations of systems to identify weaknesses that might allow security violations.
These products inspect system configuration files for problematic settings, system password files for weak passwords, and other system objects for security policy violations. Although these systems cannot reliably detect an attack in progress, they can identify possible weak points for attack. Intrusion detection systems collect information from a variety of system and network sources, and then analyze the information for signs of intrusion and misuse. They can be host-based or network-based (Lippmann, Haines, Fried, Korba, & Das, 2000). The major functions performed by an IDS are: (1) monitoring and analyzing user and system activity, (2) assessing the integrity of critical system and data files, (3) recognizing activity patterns that reflect known attacks, (4) responding automatically to detected activity, and (5) reporting the outcome of the detection process. The success of an intrusion detection system can be characterized by both its false alarm rate and its detection efficiency (Stillerman, Marceau, & Stillman, 1999). Because audit services record the occurrence of all security-relevant events, producing an enormous quantity of audit data, using the correct method to analyze the appropriate system data is essential for achieving high detection efficiency and a low false alarm rate.

Intrusion detection can be broadly divided into two categories: misuse detection and anomaly detection. Misuse detection systems detect attacks based on well-known vulnerabilities and intrusions stored in a database (i.e., signatures), while anomaly detection systems detect deviations in activity from normal profiles. Misuse detection systems use various techniques, including rule-based expert systems, model-based reasoning systems, state transition analysis, genetic algorithms, fuzzy logic, and keystroke monitoring. Rule-based expert systems have been used to encode knowledge about vulnerabilities and past intrusions (Snapp & Smaha, 1992; Porras & Valdes, 1998). This approach has been used extensively in commercial systems. Model-based reasoning systems identify intrusions by comparing user actions with a database of attack scenarios specified as sequences of user behavior (Garvey & Lunt, 1991). State transition analysis views a penetration as a sequence of actions that take the system from its initial state prior to an attack to the final compromised state after the attack (Porras, 1992). The system converts penetration scenarios into state transition diagrams in which successive states are connected by arcs that represent the events required to change the state. Fuzzy logic is a set of concepts, techniques, and theorems designed to handle vagueness and imprecision. RETISS (Carrettoni, Castano, Martella, & Samarati, 1991) uses fuzzy logic to evaluate the probability of a given threat by using knowledge about all the possible attempts against security in the target system. Keystroke monitoring is based on matching specific keystroke sequences, but it has not been used extensively. A major shortcoming of misuse detection is that it relies solely on existing knowledge of attacks and vulnerabilities.
The rules and logic have to be continuously updated as new forms of attack are identified. Anomaly detection systems detect deviations from normal behavior and use a threshold value to determine whether the observed behavior is normal or abnormal. There are various approaches to anomaly detection, including statistical analysis, sequence analysis, neural networks, machine learning, and artificial immune systems. In statistical analysis, profiles are created for various system objects (users, files, directories, and devices) using various attributes of normal use (number of accesses, time of day, number of logon failures), and intrusions are detected if the values fall outside the normal range. EMERALD (Porras & Valdes, 1998) and NIDES (Lunt et al., 1992) are examples of this approach. Statistical approaches provide well-researched, robust procedures, but they have shortcomings. They are insensitive to the order in which events occur, and the system can be gradually trained, over a period of time, to accept abnormal behavior as normal. Sequence analysis addresses some of these shortcomings by examining the sequence of activities over a fixed window size and comparing normal data with test data to identify anomalous behavior. The unit of analysis is a sequence of system calls rather than individual calls. Hofmeyr et al. (1998) used lookahead pairs (time-delay embedding, TIDE) and fixed-length sequences of system calls (sequence time-delay embedding, STIDE) to build a normal profile, and compared them with actual system traces using statistical indexes such as local frame frequency and Hamming distance. Neural network approaches train a network on both normal and abnormal data sets and then apply it to actual system traces. This is a robust approach that makes limited assumptions about the data distribution and automatically accounts for correlations between the various input measures. However, training neural networks is a time-consuming task. Many studies have explored the use of neural networks for intrusion detection (Bonifacio, Cansian, Carvalho, & Moreira, 1997; Debar, Becker, & Siboni, 1992; Fox et al., 1990). Machine learning has been used extensively in pattern recognition and optimization problems. Lee, Stolfo, and Chan (1997) used a machine learning tool called RIPPER for intrusion detection; it extracts a collection of decision rules from the available information using an inductive learning algorithm. Forrest, Hofmeyr, Somayaji, and Longstaff (1996), Forrest, Javornik, Smith, and Perelson (1993), and Kim and Bentley (1999) proposed artificial immune systems that simulate human immunology to detect intrusions. Their primary premise was that a security system, much like the human immune system, should be able to protect itself from unauthorized intruders.

Comparison of Data Mining Methods

Data mining methods search through a database using specialized algorithms to identify general patterns that are useful in classifying individual observations and in making reasoned predictions about outcomes (Fayyad et al., 1996). A variety of algorithms are used, including statistical analysis, multidimensional analysis, neural networks, expert systems, fuzzy logic, rough set theory, intelligent agents, genetic algorithms, machine learning, data visualization, and inductive learning or decision trees (Chung & Gray, 1999; Berry & Linoff, 1997). Each method uses a different algorithm to search for, extract, and explore different kinds of knowledge. Chen, Han, and Yu (1996) classified the techniques for knowledge discovery into six categories: (1) mining of association rules, (2) data generalization and summarization, (3) classification, (4) data clustering, (5) pattern-based similarity search, and (6) mining of path traversal patterns.
Prior research comparing data mining methods in different domains has produced mixed results. Many studies have compared the performance of these methods in the context of bankruptcy prediction, but the results have been conflicting. For example, while Tam and Kiang (1992) reported that backpropagation neural nets performed better than discriminant analysis, logit analysis, k-nearest neighbor, and ID3, Weiss and Kapouleas (1989) found the opposite: inductive learning performed better than neural networks, discriminant analysis, and other statistical methods. Messier and Hansen (1998) and Sung et al. (1999) found that inductive learning outperformed discriminant analysis in bankruptcy prediction. In other contexts, Fanning and Cogger (1998) found that neural networks performed better than traditional statistical methods in identifying fraudulent financial statements. In contrast, Sasisekharan, Seshadri, and Weiss (1994) found that inductive learning performed better than neural networks, discriminant analysis, and nearest neighbor in the network performance field. Chung and Tam (1993) compared three data mining methods across five managerial tasks in construction project assessment and concluded that performance was generally task dependent, although neural networks tended to perform better across task domains. Sen and Gibbs (1994) compared neural networks and logistic regression for analyzing corporate takeovers and found little difference in performance between them.

There has been very limited research comparing the performance of various data mining methods in the intrusion detection domain. Recently, Warrender et al. (1999) examined the performance of four different methods on a suite of data sets covering different programs and intrusion techniques. Hidden Markov models (HMMs) provided better accuracy, but at a higher computational cost. They found that no single method consistently gave the best results on all programs, and that results varied more between programs than between methods. Hofmeyr et al. (1998) and Hofmeyr and Forrest (1999) conducted a series of experiments evaluating various intrusion detection methods using data from sequences of system calls. They found that short sequences of system calls could detect some common sources of anomalous behavior in some Unix programs.

DATA MINING METHODS

Three data mining methods were used in this study: neural networks, inductive learning, and rough sets. We chose these three methods based on prior research and their relevance to our problem context. Two of the three methods, neural networks and inductive learning, have been used in prior studies on intrusion detection. Neural networks have been widely used for data mining and have also been found to be effective in intrusion detection (Lippmann & Cunningham, 2000; Bonifacio et al., 1997; Debar et al., 1992; Fox et al., 1990). Inductive learning systems have recently been used in intrusion detection with much success (Lee, Stolfo, & Mok, 1998). Rough sets have been very successful as a data mining technique in many fields, including medicine, business, market research, and conflict analysis. Prior studies on rough sets have found that they consistently performed better than statistical approaches (Dimitras, Slowinski, Susmaga, & Zopounidis, 1999). However, there has been very little research on the use of rough sets for intrusion detection systems.
Given that the rough set technique is useful for data reduction, data classification, pattern discovery, and other data mining applications, it should be well suited for intrusion detection. Applying rough sets is a unique contribution of this research on intrusion detection, since the method has seen limited prior use in this context. Hence, the rough set technique was included as the third method in our study.

Neural Networks

Neural networks were originally inspired by an attempt to mimic the neural functions of the human brain (Rumelhart, Hinton, & Williams, 1986). They are powerful prediction and classification tools, and they provide new opportunities for solving difficult problems that have traditionally been modeled using statistical approaches. Among the numerous neural networks that have been proposed, backpropagation networks, as shown in Figure 1, are probably the most popular and widely used. As shown, the network consists of several components: (1) a set of neurons, or processing units, arranged in three layers (input, hidden, and output) that receive signals from the outside environment or from other neurons in the network and send signals onward; (2) connectivity, which defines the interaction between neurons; (3) propagation rules, which aggregate the input signals from other neurons; (4) activation/transfer functions, which convert the aggregated inputs into outputs to be sent to other connected neurons; and (5) learning algorithms, which update the pattern and strength of connectivity.

Figure 1: Backpropagation neural network architecture. Input neurons ($i$) receive $a_1, \ldots, a_n$; hidden neurons ($j$) apply the propagation rule $S_j = \sum_{i=1}^{n} a_i w_{ji}$ and the activation/transfer function $O_j = f_j(a_j, S_j)$; output neurons ($k$) produce the computed outputs $O_1, \ldots, O_m$, which are compared with the desired outputs $a_1, \ldots, a_m$ to give the errors $(a_1 - O_1), \ldots, (a_m - O_m)$.

Typically, the network starts with a random set of weights $w_{ji}$ and adjusts the weights each time it observes an error between the desired and the computed output for an input-output pair. This process is called learning. During the training period, various classes of training data are fed into the network. Activation flows from the input layer, through the hidden layer, and then to the output layer. Each neuron receives as input the outputs of all neurons in the previous layer. After input data are applied as a stimulus to the input layer of the network, they are propagated through the neurons of each successive layer until an output, $O_k$, is generated. The error signals $(a_k - O_k)$ are then transmitted backward from the output layer to the hidden layer. This process repeats layer by layer and, based on the error signal received, the connection weights are updated so that the network converges toward a stable state. A detailed description of the process can be found in Rumelhart et al. (1986). Overall, neural networks are comparable to their statistical counterparts. For real-world problems with high nonlinearity and short memory dynamics, neural networks usually achieve better prediction and classification accuracy. Furthermore, neural network models are more robust, adapt more easily to a changing environment, and are less sensitive to changes in sample size, number of variables, and data distribution. They work well when the form of the mapping function is unknown (Sun, Wang, & Zhu, 1997).
Owing to their fault tolerance and adaptability to noisy data, neural networks are being used in a growing number of industrial and research applications, including pattern recognition in engineering, control, manufacturing, and financial investment (Tam & Kiang, 1992; Chu & Widjaja, 1994; Desai et al., 1996; Zhu & Padman, 1997).
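To make the forward propagation and backward error transmission described above concrete, here is a minimal pure-Python sketch of a network with one hidden layer of sigmoid units, trained online one input-output pair at a time. The sigmoid activation, the bias inputs, the learning rate, the network size, and the toy OR data set are all illustrative assumptions; the paper does not specify them here, and the class name `BackpropNet` is hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic activation/transfer function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))


class BackpropNet:
    """Hypothetical one-hidden-layer network trained by online backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # w_hidden[j][i]: weight from input i to hidden neuron j; last column is a bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # w_out[k][j]: weight from hidden neuron j to output neuron k; last column is a bias weight.
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, a):
        """Propagation rule S_j = sum_i a_i * w_ji, followed by the sigmoid activation."""
        x = list(a) + [1.0]  # constant 1.0 feeds the bias weight
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w_hidden]
        h = hidden + [1.0]
        out = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in self.w_out]
        return hidden, out

    def train_step(self, a, target, lr=0.5):
        """Present one input-output pair and move the weights against the error gradient."""
        hidden, out = self.forward(a)
        x = list(a) + [1.0]
        h = hidden + [1.0]
        # Output error signal (a_k - O_k), scaled by the sigmoid derivative O_k (1 - O_k).
        delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Send the error backward to the hidden layer using the not-yet-updated output weights.
        delta_hidden = [hj * (1.0 - hj) * sum(d * self.w_out[k][j] for k, d in enumerate(delta_out))
                        for j, hj in enumerate(hidden)]
        for k, d in enumerate(delta_out):
            for j, hj in enumerate(h):
                self.w_out[k][j] += lr * d * hj
        for j, d in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w_hidden[j][i] += lr * d * xi

    def mse(self, data):
        """Mean squared error over (input, target) pairs."""
        total = 0.0
        for a, t in data:
            _, out = self.forward(a)
            total += sum((ti - oi) ** 2 for ti, oi in zip(t, out))
        return total / len(data)


# Toy usage: learn logical OR as a stand-in for a two-class labeling task.
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
net = BackpropNet(n_in=2, n_hidden=3, n_out=1)
initial_error = net.mse(data)
for _ in range(5000):
    for a, t in data:
        net.train_step(a, t)
final_error = net.mse(data)
predictions = [round(net.forward(a)[1][0]) for a, _ in data]
```

The hidden-layer error terms are computed from the output weights before those weights change, as standard backpropagation requires; updating the output weights first would mix two different error gradients.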

Inductive Learning

Inductive learning attempts to induce general concepts from examples by creating a decision-tree-like knowledge structure. Each internal node of the decision tree is labeled with an attribute, each edge with an attribute value, and each leaf with a class. The ID3 algorithm (Quinlan, 1984) and its descendants, such as C4.5 (Quinlan, 1993), are simple yet powerful algorithms for learning from examples. ID3 performs a top-down heuristic search through a problem space and uses information gain as the criterion for selecting the branching attribute of a node. Let the node contain a set $T$ of cases, with $|C_j|$ of the cases belonging to the predefined class $C_j$. The information needed for classification in the current node is

$$\mathrm{inf}(T) = -\sum_{j} \frac{|C_j|}{|T|} \log_2 \frac{|C_j|}{|T|}. \tag{1}$$

This value measures the average amount of information needed to identify the class of a case. Assume that using attribute $X$ as the branching attribute divides the cases into $n$ subsets, and let $T_i$ denote the set of cases in subset $i$. The information required for subset $i$ is $\mathrm{inf}(T_i)$. Thus, the expected information required after choosing attribute $X$ as the branching attribute is the weighted average of the subtree information:

$$\mathrm{inf}_X(T) = \sum_{i=1}^{n} \frac{|T_i|}{|T|}\, \mathrm{inf}(T_i). \tag{2}$$

The information gain is therefore

$$\mathrm{gain}(X) = \mathrm{inf}(T) - \mathrm{inf}_X(T). \tag{3}$$

After the branching attribute is selected, the training cases are divided according to the values of the branching attribute. If all examples in one branch belong to the same class, that branch becomes a leaf labeled with that class. When every branch ends in a labeled leaf, the algorithm terminates. ID3 uses the chi-square test to avoid overfitting due to noise: if the $\chi^2$ value is lower than a threshold, the attribute is not used. This avoids creating unnecessary branches and complicating the tree. The use of information gain in ID3 has a serious deficiency: it favors tests with many outcomes.
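Equations (1) through (3), together with the split information and gain ratio that C4.5 uses (equations (4) and (5)), can be computed directly from a table of cases. The following Python sketch is illustrative only: the function names and the two-attribute toy table, in which attribute 0 determines the class and attribute 1 is noise, are hypothetical.

```python
import math
from collections import Counter


def info(labels):
    """Equation (1): inf(T) = -sum_j (|C_j|/|T|) log2(|C_j|/|T|)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the class labels of T into subsets T_i by the value of attribute `attr`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    return list(subsets.values())


def info_after_split(rows, labels, attr):
    """Equation (2): inf_X(T), the weighted average of inf(T_i) over the subsets."""
    n = len(labels)
    return sum(len(s) / n * info(s) for s in partition(rows, labels, attr))


def gain(rows, labels, attr):
    """Equation (3): gain(X) = inf(T) - inf_X(T), the ID3 criterion."""
    return info(labels) - info_after_split(rows, labels, attr)


def split_info(rows, labels, attr):
    """Equation (5): split inf_X(T) = -sum_i (|T_i|/|T|) log2(|T_i|/|T|)."""
    n = len(labels)
    return -sum(len(s) / n * math.log2(len(s) / n) for s in partition(rows, labels, attr))


def gain_ratio(rows, labels, attr):
    """Equation (4), the C4.5 criterion; taken as 0 here if the attribute does not split T."""
    si = split_info(rows, labels, attr)
    return gain(rows, labels, attr) / si if si > 0 else 0.0


# Hypothetical toy table: attribute 0 determines the class, attribute 1 is noise.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["abnormal", "abnormal", "normal", "normal"]
for attr in (0, 1):
    print(attr, gain(rows, labels, attr), gain_ratio(rows, labels, attr))
```

Dividing by the split information penalizes attributes that fragment $T$ into many small subsets, which is how the gain ratio counters ID3's preference for many-outcome tests.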
C4.5 addresses this bias toward many-outcome tests by using the gain ratio

$$\mathrm{gain\ ratio}(X) = \frac{\mathrm{gain}(X)}{\mathrm{split\ inf}_X(T)}, \tag{4}$$

where

$$\mathrm{split\ inf}_X(T) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}. \tag{5}$$

The attribute with the maximum gain ratio is selected as the branching attribute. In addition, C4.5 does not use the $\chi^2$ test to avoid overfitting. Instead, it allows the tree to grow and later prunes unnecessary branches. A detailed discussion of the algorithm can be found in Quinlan (1993). The C4.5 algorithm generates a decision tree from the data samples. It constructs classification rules in the form of a decision tree, recursively, starting at the root. At each node, an attribute $a_i$ is selected to split the training examples into two subsets, those with $a_i = 0$ and those with $a_i = 1$. The algorithm is then invoked recursively on the two subsets of training data until all examples in a node belong to the same class. At this point, a leaf node is created and labeled with the class of the records described by the path from the root to that leaf. C4.5 has been tested in many domains and has been shown to be a good classification model in machine learning.

Rough Sets

Rough set theory provides a mathematical approach to extracting knowledge from imprecise and uncertain data. It was introduced by Pawlak (1982) and was motivated by practical needs in concept formation (Pawlak, Grzymala-Busse, Slowinski, & Ziarko, 1995). A brief tutorial on rough sets is provided in the Appendix. The essence of rough set theory is that objects may be indiscernible in terms of the values of their attributes. A rough set is a set of objects that cannot be precisely characterized on the basis of the available attributes. In this case, the vague concept is replaced by a pair of lower and upper approximations; these two approximations are the basic operations of rough set theory. Rough sets can identify and characterize non-deterministic systems and incorporate probabilistic information in decision making.
Rough set theory is characterized by its knowledge representation system (KRS), indiscernibility relations, approximation of sets, dependency of attributes, reduction of attributes, and decision rules. Let $S = \langle U, Q, V, f \rangle$ be a KRS, where $U$ is a non-empty, finite set of objects, the universe of data, e.g., $U = \{u_1, u_2, \ldots, u_n\}$; $Q$ is the set of attributes, consisting of a non-empty set of condition attributes $C$ and a non-empty set of decision attributes $D$, with $Q = C \cup D$ and $C \cap D = \emptyset$; $V = \bigcup_{q \in Q} V_q$, where for each $q \in Q$, $V_q$ is the domain of attribute $q$ and its elements are called the values of attribute $q$; and $f: U \times Q \to V$ is an information function that assigns a unique value $f(u_i, q) \in V_q$ of attribute $q$ to each object $u_i \in U$.

Suppose $P$ is a non-empty subset of $Q$, and $u_i$ and $u_j$ are members of $U$. We can associate an approximation space with $S$ by defining a binary indiscernibility relation as follows:

$$\mathrm{IND}(P) = \{(u_i, u_j) \in U \times U : f(u_i, q) = f(u_j, q) \ \text{for all}\ q \in P\}.$$

We say that $u_i$ and $u_j$ are indiscernible, or equivalent, with respect to the attribute set $P$ in $S$ if and only if $f(u_i, q) = f(u_j, q)$ for every $q \in P$. This indiscernibility relation partitions $U$ into elementary sets. Each elementary set of $\mathrm{IND}(P)$ consists of a group of objects that have the same values on the attributes in $P$; thus, indiscernible objects $u_i$ and $u_j$ lie in the same elementary set with respect to $P$. In this way, the universe $U$ can be partitioned into elementary sets by any subset of the attributes $Q$. Suppose $X$ is a non-empty subset of $C$. With respect to $X$, $U$ is divided into the elementary sets $A_1, A_2, \ldots, A_i$, and with respect to $D$, it is divided into the decision classes $D_1, D_2, \ldots, D_j$. The lower approximation of a decision class $D_n$, denoted $\underline{X}D_n$, is the union of all elementary sets $A_m$ in the positive region, i.e., those contained entirely in $D_n$. The upper approximation of $D_n$, denoted $\overline{X}D_n$, is the union of all $A_m$ in the positive and boundary regions, i.e., those that intersect $D_n$:

$$\underline{X}D_n = \bigcup \{A_m : A_m \subseteq D_n\}, \qquad \overline{X}D_n = \bigcup \{A_m : A_m \cap D_n \neq \emptyset\}.$$

The dependency of attributes is the relationship between the condition attributes $C$ and the decision attributes $D$. Dependency analysis is used to determine whether $D$ can be characterized by the values of $C$, and it is of primary importance in rough sets for discovering the data regularities from which rules are derived. The degree of dependency of the decision attributes $D$ on the condition attributes $C$ equals the ratio of the number of objects in the positive region to the number of objects in the universe $U$:

$$\gamma_C(D) = \frac{\left|\bigcup_{n} \underline{C}D_n\right|}{|U|},$$

which ranges from zero to one. Another important issue is the identification and elimination of redundant condition attributes. The objective is to find a subset of attributes that has the same discriminating power as the original set of attributes, without losing any essential information.
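The indiscernibility partition, the lower and upper approximations, the degree of dependency, and the search for attribute subsets with the same discriminating power can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the decision table, its attribute names (`c1`, `c2`, `c3`, `d`), and all function names are hypothetical; "same discriminating power" is taken to mean the same degree of dependency; and the exhaustive search over subsets is exponential, so it only suits very small tables.

```python
from itertools import combinations


def elementary_sets(universe, table, attrs):
    """Partition U by IND(attrs): objects with equal values on every attribute in attrs."""
    classes = {}
    for u in universe:
        key = tuple(table[u][a] for a in attrs)
        classes.setdefault(key, set()).add(u)
    return list(classes.values())


def lower_approx(universe, table, attrs, target):
    """Union of the elementary sets that lie entirely inside target."""
    return set().union(*[e for e in elementary_sets(universe, table, attrs) if e <= target])


def upper_approx(universe, table, attrs, target):
    """Union of the elementary sets that intersect target."""
    return set().union(*[e for e in elementary_sets(universe, table, attrs) if e & target])


def dependency(universe, table, cond, decision):
    """Degree of dependency: |positive region| / |U|, a value between 0 and 1."""
    positive = set()
    for d_class in elementary_sets(universe, table, [decision]):
        positive |= lower_approx(universe, table, cond, d_class)
    return len(positive) / len(universe)


def smallest_reducts(universe, table, cond, decision):
    """Exhaustive search for the smallest attribute subsets that keep the full dependency."""
    full = dependency(universe, table, cond, decision)
    for size in range(1, len(cond) + 1):
        found = [list(s) for s in combinations(cond, size)
                 if dependency(universe, table, list(s), decision) == full]
        if found:
            return found
    return [list(cond)]  # reached only when cond is empty


# Hypothetical decision table: condition attributes c1..c3, decision attribute d.
table = {
    "t1": {"c1": 1, "c2": 0, "c3": "a", "d": "yes"},
    "t2": {"c1": 1, "c2": 1, "c3": "a", "d": "yes"},
    "t3": {"c1": 0, "c2": 0, "c3": "b", "d": "no"},
    "t4": {"c1": 0, "c2": 1, "c3": "c", "d": "no"},
    "t5": {"c1": 0, "c2": 1, "c3": "c", "d": "yes"},
}
universe = list(table)
yes = {u for u in universe if table[u]["d"] == "yes"}
```

In this table, objects t4 and t5 agree on every condition attribute but have different decisions, so they fall in the boundary region and the dependency is 3/5 rather than 1. Attribute `c3` alone reproduces that dependency, so it is the smallest subset the search returns; because no smaller subset qualifies, none of its attributes is redundant.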
After all redundant attributes have been eliminated, the remaining subset of attributes is called a minimal subset or reduct. Decision rules are generated from the non-redundant attributes contained in the chosen reduct. The values of these attributes are analyzed to identify patterns in the data, and the patterns are expressed as logical statements that link the values of specific conditions with an outcome. The decision rules can then be used to analyze new objects and assign them to classes. If a new object matches one or more rules, the strength of each decision class suggested by the matching rules is assessed, and the object is assigned to the class with the greatest strength. The performance of decision rules can be measured by their accuracy and/or their coverage. The main advantage of rough sets is that they do not require any preliminary or additional information about the data. The method can work with missing values, switch between different reducts, and use less expensive or alternative sets of measurements. It is able to discover important facts hidden in the data and express them in the natural language of decision rules. The rough set method can handle large amounts of both quantitative and qualitative data. Its ability to model highly nonlinear or discontinuous functional relationships provides a powerful method for characterizing complex, multidimensional patterns. It also offers transparency of classification decisions, allowing them to be justified. The rough set method has been successfully applied in knowledge acquisition, forecasting and predictive modeling, and decision support (Pawlak et al., 1995; Hashemi, LeBlanc, Rucks, & Rajaratnam, 1998; Dimitras et al., 1999; Slowinski & Zopounidis, 1995).

EXPERIMENT

Data

Systems can be monitored at various levels, and the choice is influenced by factors including cost, accuracy, and the ability to differentiate normal from abnormal behavior. Typically, intrusion detection systems monitor either user behavior or privileged processes. Although the former approach was more popular earlier (Denning, 1987), recent studies have used the latter (Lee, Stolfo, & Mok, 1998; Hofmeyr et al., 1998). Privileged processes are programs that require access to system resources that are usually inaccessible to ordinary users. Privileged processes are easier to monitor because, unlike a user with wide latitude of action, each performs a specific, limited function; its range of behaviors is narrow compared with that of users and is fairly stable over time. In Unix, a user has to be granted super-user status to run a privileged process, and with super-user status a normal user gets a broad range of permissions to perform tasks that are typically not allowed. Normally, these processes are trusted to access only relevant system resources, but they can be misused because of improper configuration or modification of code. A privileged process is observed through the system calls it uses to access system resources. Hofmeyr et al. (1998) found that short sequences of system calls are a good discriminator for several types of intrusion.
A detection system should be reliable and efficient: reliable in discriminating between acceptable and unacceptable behavior, and efficient in detecting intrusions with nominal use of computer resources. We can record a variety of information from the system calls, including timing, parameters passed, instruction sequence, and interactions with other processes. Hofmeyr et al. used the temporal ordering of system calls for intrusion detection. In intrusion detection, as in most data mining operations, a database of normal behavior is developed, and data from system calls are compared with this database to detect abnormal behavior. Traces of system calls generated by a particular program (e.g., the sendmail program) are analyzed, and a database is created of all unique sequences of a given length. The following example, based on Hofmeyr et al., illustrates the creation of the database. Assume we have the following trace of system calls:

open, read, mmap, mmap, open, read, mmap

For a window size of 3, sliding the window one call at a time yields five sequences, of which four are unique:

open, read, mmap
read, mmap, mmap
mmap, mmap, open
mmap, open, read
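The database construction just illustrated, together with the mismatch comparison described in the next paragraph, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' or Hofmeyr et al.'s implementation: the function names are hypothetical, a set of tuples stands in for the per-call trees (the membership test is the same), and the mismatch percentage is assumed to be taken over all windows of the new trace.

```python
def sliding_windows(trace, k):
    """Return every overlapping length-k window of `trace`, in order, as tuples."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_normal_db(trace, k):
    """Database of normal behavior: the set of unique length-k windows.

    A set stands in for the trees rooted at each system call (assumption).
    """
    return set(sliding_windows(trace, k))


def count_mismatches(trace, db, k):
    """Compare a new trace with the normal database.

    Returns the raw number of windows absent from `db` and that number as a
    percentage of all windows checked (assumed denominator).
    """
    windows = sliding_windows(trace, k)
    misses = sum(1 for w in windows if w not in db)
    pct = 100.0 * misses / len(windows) if windows else 0.0
    return misses, pct


normal_trace = ["open", "read", "mmap", "mmap", "open", "read", "mmap"]
db = build_normal_db(normal_trace, 3)
print(len(db))  # 4 unique windows out of 5

# Hypothetical new trace whose last window was never seen in normal behavior.
new_trace = ["open", "read", "mmap", "mmap", "open", "close"]
print(count_mismatches(new_trace, db, 3))  # (1, 25.0)
```

Only the window (mmap, open, close) is absent from the database, so the sketch reports one mismatch, 25% of the four windows compared.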

The unique sequences are stored as trees, with each tree rooted at a particular system call. To evaluate a new trace, the overlapping sequences of length K in the new trace are compared with the database of normal traces, and those that do not occur in the database are counted as mismatches. The number of mismatches, both as a raw count and as a percentage of the total number of sequences compared, is an indicator of abnormal behavior. Abnormal behavior can include both legal and illegal actions. The normal database may not contain all possible actions, so some legal but infrequent sequences may be labeled abnormal by the comparison procedure. Identifying them may be as important as identifying illegal behavior, since they may signal other, non-security problems with the system.

In this study we used the above procedure to capture data. The data used in this study are based on an immune system developed at the University of New Mexico (Lee & Stolfo, 2000; Lee et al., 1998). They cover one privileged program, sendmail, and include both normal and abnormal traces. The normal trace is a trace of the sendmail daemon and several invocations of the sendmail program. During the period in which these traces were collected, no intrusions or suspicious activities occurred. The abnormal traces include intrusions that exploit well-known problems in Unix systems. For example, sunsendmailcp (sscp) is a script that uses sendmail to append a message to a file; when it is used on a file such as /.rhosts, a local user may obtain root access. The syslog attack uses the syslog interface to overflow a buffer in sendmail. Forwarding loops occur in sendmail when a set of $home/.forward files forms a logical loop.
In our study, the intrusion traces include five error conditions of forwarding loops, three sunsendmailcp (sscp) attacks, two traces of syslog-remote attacks, two traces of syslog-local attacks, two traces of decode attacks, and two traces of the unsuccessful intrusion attempts sm5x and sm565a. Detailed descriptions of these intrusions can be found in Hofmeyr et al. (1998). Each trace has two attributes: the first is the process ID, indicating the process to which the system call belongs, and the second is the system call value. There are a total of 182 kinds of system calls. The system calls are converted from strings to integer values using a lookup table. Table 1 illustrates a section of one sendmail trace from a single process, with the process ID, the system call number, and the actual system call in its three rows. These traces can be normal or abnormal.

Data Preprocessing

Prior research indicates that short sequences of system calls made by a program during its normal execution are very consistent and can be used for anomaly detection (Hofmeyr et al., 1998). Our objective is to recognize the different patterns of normal and abnormal behavior by using various learning algorithms. First, we need to build these sequences from the original data sets. One system call and the N − 1 subsequent system calls in the same process make up one sequence of length N. Sequences from normal traces are normal sequences; sequences from suspicious traces are compared with the sequences from normal traces and, if no match is found, are labeled abnormal. Table 2 shows two labeled sequences. Using a sliding window of length N, all traces are searched and two data sets are created, one consisting of normal sequences and the other of abnormal ones. The selection of sequence length is determined by two conflicting criteria. While a short sequence helps to minimize computation and database size, it may not be adequate to discriminate normal from abnormal behavior. Prior research (Hofmeyr et al., 1998) indicates that a window size of around 6-7 is appropriate for most instances. The length of the sliding window was set to 7 to facilitate comparison of results with prior studies. In total, 1,112 normal sequences and 1,576 abnormal sequences were identified from the data. The number of abnormal sequences for each intrusion type is listed in Table 3.

Table 1: Sendmail trace.
Row | Values
Process ID |
System Call Number |
System Call* | write, fork, sstk, sstk, write, sethostid, sstk
*The system calls in the last row will not appear in the data set.

Table 2: Normal and abnormal sequence.
System Call Sequences (Length 7) | Class Labels
 | normal
 | abnormal

Experimental Design

While the primary objective of our study was to compare the performance of the three data mining methods, we were also interested in evaluating the impact of two other variables, data proportion and data representation, on performance. Hence, an experimental design (3 × 2 × 2) incorporating all three variables was used. The design includes:

Data mining method (3): neural networks, inductive learning, and rough sets
Data set representation (2): binary and integer
Data set proportion (2): balanced and unbalanced

The three data mining methods were discussed in the earlier section. For neural networks, 42 input neurons were used, since there were seven attributes, each represented by six bits. Two output neurons were used, since the decision is binary (normal or abnormal). The network had one hidden layer with 15 hidden neurons and was trained for 1,000 training cycles with a learning rate of 1.0. The data representation can be in binary or integer form.
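The two input layouts can be sketched as follows, under assumptions that go beyond the text: the records are hypothetical (process ID, system-call number) pairs, windows are formed within each process in trace order as the Data Preprocessing subsection describes, and the six-bit code is simply the call number written in binary. Because six bits hold only 64 codes while the text mentions 182 kinds of system calls, the sketch assumes the calls that actually occur in the data are numbered below 64 and rejects anything else rather than guessing the authors' mapping.

```python
from collections import defaultdict

WINDOW = 7  # sliding-window length used in the study
BITS = 6    # bits per system call in the binary representation


def windows_by_process(records, n=WINDOW):
    """Group (pid, call_number) records by process, then slide a length-n
    window over each process's calls in trace order.

    Each returned tuple is one record in the integer representation:
    n categorical attributes.
    """
    calls_by_pid = defaultdict(list)
    for pid, call in records:
        calls_by_pid[pid].append(call)
    windows = []
    for calls in calls_by_pid.values():
        windows.extend(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))
    return windows


def to_binary(window, bits=BITS):
    """Binary representation: concatenate a fixed-width code for each call,
    most significant bit first (7 calls x 6 bits = 42 inputs)."""
    vector = []
    for call in window:
        if not 0 <= call < 2 ** bits:
            raise ValueError(f"call number {call} does not fit in {bits} bits")
        vector.extend((call >> shift) & 1 for shift in range(bits - 1, -1, -1))
    return vector


# Hypothetical interleaved trace: process 101 makes 8 calls, process 202 makes 3.
records = [(101, 4), (202, 4), (101, 3), (202, 3), (101, 45), (202, 6),
           (101, 45), (101, 4), (101, 3), (101, 45), (101, 6)]
windows = windows_by_process(records)
print(windows)  # [(4, 3, 45, 45, 4, 3, 45), (3, 45, 45, 4, 3, 45, 6)]
print(len(to_binary(windows[0])))  # 42
```

Process 202 contributes no window because it makes fewer than seven calls; the integer representation keeps each tuple as seven categorical attributes, while the binary one expands it to 42 bits.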
In the integer representation, the original integer values of the system calls were used directly in the training system, but they were treated as qualitative attributes rather than quantitative numbers. Since the window size was 7, the number of attributes, or input units, in each record was 7. In the binary representation, each system call was converted to 6 bits, each valued 0 or 1. Hence, the number of condition attributes was 42 instead of 7. In both representations the output unit had one value: a normal sequence was classified as 0, and an abnormal sequence as 1.

Table 3: Sequences of system calls in different traces.
Traces | # of Sequences
Normal | 1112
Total normal | 1112
Abnormal |
Decode | 16
Forwarding loops | 258
Sunsendmailcp | 219
Syslog-local | 359
Syslog-remote | 439
Sm565a | 23
Sm5x | 262
Total abnormal | 1576

The proportion of normal and abnormal sequences in the training and testing data sets is another important variable (Wilson & Sharda, 1994). The proportion can affect performance in multiple ways. Some methods do not perform well when the number of records of abnormal traces (the base rate) is very low, since they may not be able to identify all the features necessary for classification. The base-rate proportion in the training data set could also differ from that in the testing data set. For a system to be robust, it should be able to work with different proportions in the testing data after learning the classification rules. In this study we used two data proportions, balanced and unbalanced, according to whether the numbers of normal and abnormal sequences are equal or not.

Multiple data points are required in each of the 12 cells of the experimental design to statistically evaluate the research model. Hence, a three-fold cross-validation approach was used to create the data sets. For the balanced proportion, we split the normal data set into three parts, each consisting of about 370 sequences. Each time, two parts were put in the training set and combined with the same number of abnormal sequences. The remaining normal part was placed in the testing set, combined with 300 abnormal sequences that are different from those in the training set. For the unbalanced proportion, the training data consist of 80% normal sequences and 20% abnormal sequences. The testing data consist of the remaining 20% of normal sequences and the remaining abnormal sequences.
The three-fold approach generates data sets for three experiments. The exercise is repeated three times with three different, randomly generated sets of partitions. Hence, data sets for nine experiments are created in each cell, for a total of 108 data points. However, since neural networks can use only the binary representation, only 10 cells are feasible, providing 90 data points for this study (see Table 4).
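The balanced three-fold procedure and the cell arithmetic above can be sketched as follows. The paper does not say how abnormal sequences were chosen for each fold, so random sampling without overlap between training and testing is an assumption, as are the function name, the placeholder sequence IDs, and the seeds; the unbalanced split is omitted.

```python
import random
from itertools import product


def balanced_folds(normal, abnormal, n_test_abnormal=300, seed=0):
    """Yield three (train, test) pairs for the balanced proportion.

    The normal sequences are shuffled and cut into three near-equal parts.
    Each fold trains on two parts plus an equal number of abnormal sequences
    and tests on the third part plus n_test_abnormal other abnormal sequences.
    """
    rng = random.Random(seed)
    shuffled = list(normal)
    rng.shuffle(shuffled)
    parts = [shuffled[i::3] for i in range(3)]  # sizes 371, 371, 370 for 1,112
    for k in range(3):
        train_normal = [s for j, part in enumerate(parts) if j != k for s in part]
        test_normal = parts[k]
        pool = list(abnormal)
        rng.shuffle(pool)
        n = len(train_normal)
        train_abnormal = pool[:n]
        test_abnormal = pool[n:n + n_test_abnormal]  # disjoint from training
        yield (train_normal, train_abnormal), (test_normal, test_abnormal)


# Placeholder IDs standing in for the 1,112 normal and 1,576 abnormal sequences.
normal_ids = list(range(1112))
abnormal_ids = list(range(10_000, 10_000 + 1576))

# Three repetitions with different random partitions -> nine data sets per cell.
datasets = [fold for rep in range(3)
            for fold in balanced_folds(normal_ids, abnormal_ids, seed=rep)]
print(len(datasets))  # 9

methods = ["neural networks", "inductive learning", "rough sets"]
representations = ["binary", "integer"]
proportions = ["balanced", "unbalanced"]
all_cells = list(product(methods, representations, proportions))
# Neural networks are run only on the binary representation.
feasible = [c for c in all_cells
            if not (c[0] == "neural networks" and c[1] == "integer")]
print(len(all_cells) * 9, len(feasible) * 9)  # 108 90
```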

Table 4: Data proportion: balanced and unbalanced.
Proportion Type | Training: Normal | Training: Abnormal | Testing: Normal | Testing: Abnormal
Balanced |
Unbalanced |

RESULTS

Table 5 shows the average classification accuracy rate for the various combinations of data proportion, data representation, and data mining method. The columns represent the two data representations, with the data mining methods under each, and the rows represent the two data proportions. A cursory analysis of the data indicates that there are differences in performance due to the three variables. The results of the ANOVA, examining the impact of the three variables on performance, are presented in Table 6. The results indicate that the overall model is significant (p < …). Two of the three variables, data mining method and data proportion, are significant (p < …). All the interaction effects except method*proportion are also significant.

Table 5: Accuracy rates for different learning methods.
Proportion | Binary: Neural Networks | Binary: Inductive Learning | Binary: Rough Sets | Integer*: Inductive Learning | Integer*: Rough Sets
Balanced Proportion |
Unbalanced Proportion |
*Integer representation is not possible for neural networks.

Table 6: ANOVA analysis.
Source | Mean Square | df | F-Value | Significance
Model |
Residual |
Main Effects: Method |
Main Effects: Representation |
Main Effects: Proportion |
Method * Representation |
Method * Proportion |
Representation * Proportion |
Method * Representation * Proportion |

DISCUSSION

Although ANOVA tests the overall model and evaluates the impact of the three variables on performance, it does not explain the reasons for the differences or the interactions. A more detailed analysis of the individual variables provides these explanations. Table 7 provides the results of t-tests for each of the individual variables, which are discussed in detail below.

Data Mining Method

The results of the ANOVA provide sufficient empirical evidence that the data mining method has the most influence on performance. The results of Duncan's test, shown in Table 7a, indicate that the performance of the three data mining methods is significantly different. Rough sets had the best performance, followed by neural networks and, finally, inductive learning. The finding that neural networks outperform inductive learning is consistent with a few prior studies (Tam & Kiang, 1992; Chung & Tam, 1993). However, other studies (Weiss & Kapouleas, 1989; Sashisekharan et al., 1994) have found conflicting results, and Chung and Tam (1993) claimed that the results depend on the problem context. Since prior studies have not compared these two methods in the context of intrusion detection, based on our results we can claim that neural networks perform better than inductive learning in the IDS context. More studies may be needed to validate this finding conclusively. There have been no studies comparing rough sets with the other two methods. The results of this study are very encouraging, since rough sets performed better than neural networks, which are a very popular method in this area. The performance of rough sets is also noteworthy because prior studies have not explored this method for intrusion detection.
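Table 7 reports its t-values with 88 degrees of freedom. That is what a pooled-variance (Student) two-sample t-test gives for the 90 data points however they are split: 45 and 45 runs for balanced versus unbalanced, or 36 integer-representation runs (two methods × two proportions × nine) and 54 binary runs (three methods × two proportions × nine), both giving 90 − 2 = 88. Assuming that is the test used, a minimal standard-library sketch follows; the function name is hypothetical and the example values are toy numbers, not the study's accuracies.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, df), where df = len(a) + len(b) - 2.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (denominator n - 1).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Toy values, for illustration only.
t, df = pooled_t_test([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)  # -3.674 4

# Any 45/45 or 36/54 split of 90 runs gives 88 degrees of freedom.
print(pooled_t_test(list(range(45)), list(range(45, 90)))[1])  # 88
```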
Data Proportion

Balanced proportion was significantly better than unbalanced proportion in classification accuracy, which is consistent with the results of prior studies (Wilson & Sharda, 1994). The balanced proportion had equal numbers of normal and abnormal sequences in the training set. This indicates that a greater number of abnormal sequences in the training set leads to better learning and more accurate classification. Another important difference between the two proportions is the source of the abnormal sequences in the training set. While the balanced proportion draws abnormal sequences from all the intrusion traces, the unbalanced proportion contains abnormal sequences from only four traces. All classifiers require an adequate amount of training samples to achieve satisfactory performance, so the lower accuracy of the unbalanced proportion could be attributed to the loss of some important abnormal patterns.

Data Representation

The results of the t-test comparing the two data representation schemes, shown in Table 7c, indicate that there is no significant difference in classification accuracy between the two representations. The data mining methods are able to learn from the data set regardless of the representation. Data representation does not significantly affect performance, since the same phenomenon is captured in different formats. The data in this study were primarily discrete values of qualitative variables (system calls), which could be one reason for the lack of significance. The results may differ in other contexts where the values are continuous (Zhu & Padman, 1997).

Table 7: Performance comparison I.
a. Comparison of data mining method.
Variable | Mean (SD) | Inductive Learning: Significance | Rough Sets: Significance
Neural networks | (2.30) | |
Inductive learning | (7.08) | | .0001
Rough sets | (3.87) | |

b. t-test: proportion.
Variable | Mean (SD) | t-value (df) | Significance
Balanced | (9.6) | 2.91 (88) | .005
Unbalanced | (13.88) | |

c. t-test: representation.
Variable | Mean (SD) | t-value (df) | Significance
Integer | (13.85) | .64 (88) | .521
Binary | (11.49) | |

Overall Performance

Although the individual analysis shows the impact of each variable, it is also useful to study the interactions among the variables. Table 8 provides the mean values for each of the 10 combinations of the three factors and the results of t-tests for statistical significance. Table 8a indicates that, while data proportion has a significant impact on all three methods, the impact is least pronounced for neural networks. The ability to achieve good classification accuracy with an unbalanced data proportion provides greater flexibility in practice, since in some contexts one may not be able to obtain large data sets with abnormal cases for training. As expected, the impact of data representation on the two data mining methods is negligible (Table 8b). An interesting analysis is the comparison of data representation and data proportion.
The difference between balanced and unbalanced proportion is more pronounced for the integer representation than for the binary representation. Binary representation performs better than integer representation for the unbalanced proportion, but the difference is not statistically significant.

Table 8: Performance comparison II.
a. Comparison of method and proportion.
Proportion | Neural Networks Mean (SD) | Inductive Learning Mean (SD) | Rough Sets Mean (SD)
Balanced | (1.21) | (2.54) | (2.23)
Unbalanced | (2.77) | (3.61) | (3.77)
t-value | | |
Significance | | |

b. Comparison of method and representation.
Representation | Inductive Learning Mean (SD) | Rough Sets Mean (SD)
Integer | (5.96) | (5.08)
Binary | (8.23) | 76 (2.19)
t-value | |
Significance | |

c. Comparison of representation and proportion.
Proportion | Binary Mean (SD) | Integer Mean (SD) | t-value | Significance
Balanced | (10.15) | (9.06) | |
Unbalanced | (11.94) | (16.64) | |
t-value | | | |
Significance | | | |

Table 9: Classification of normal and abnormal cases (correct classification).
Proportion | Case | Neural Networks | Inductive Learning | Rough Sets
Balanced | Normal | | |
Balanced | Abnormal | | |
Unbalanced | Normal | | |
Unbalanced | Abnormal | | |

Data mining methods need to perform well on both normal and abnormal cases. Classification accuracy is determined by the ability of the system to identify normal cases as normal and abnormal cases as abnormal. The system should minimize two kinds of error: identifying normal cases as abnormal, which could lead to many false alarms, and identifying abnormal cases as normal, which could let intrusions go undetected. The classification accuracy of the three data mining methods and two data proportions for both cases (normal and abnormal) is provided in Table 9. Neural networks classify 65.52% of normal cases as normal and 77.37% of abnormal cases as abnormal. They are rather conservative in classification, because the probability of labeling normal behavior as abnormal is greater than the probability of labeling abnormal behavior as normal. Inductive learning has a very high classification accuracy on normal cases in the unbalanced proportion, but its accuracy drops significantly in identifying abnormal cases. This is a serious issue, since it accepts almost 74% of abnormal cases as normal, thereby creating a false sense of security even when intrusions are taking place in the network. Rough sets are very good at identifying normal cases as normal, thereby considerably reducing false alarm rates. However, their performance in identifying abnormal cases is lower, and on abnormal cases their prediction accuracy is lower than that of neural networks. All three methods have lower accuracy in classifying abnormal sequences for the unbalanced proportion. The relatively limited number of sequences in the data set may be inadequate to learn all the patterns and develop classification rules.

CONCLUSIONS

The tremendous growth of the Internet and electronic commerce has created serious challenges for network security. Advances in data mining and knowledge discovery provide new approaches to network intrusion detection. In this study, experiments were conducted to evaluate the prediction accuracy of three data mining methods applied to sequences of system calls in sendmail privileged processes. An experimental design (3 × 2 × 2) was created to evaluate the impact of three data mining methods, two data representation formats, and two data proportion schemes on the classification accuracy of intrusion detection systems. The results indicated that data mining method and data proportion had a significant impact on classification accuracy. Among data mining methods, rough sets provided the best accuracy, followed by neural networks and then inductive learning. Balanced data proportion performed better than unbalanced data proportion. There were no major differences in performance between binary and integer data representation. To the best of our knowledge, our research was the first attempt to evaluate and compare multiple data mining methods, including rough sets, in the IDS context.
This study suggests several new directions for future research. The data used in this study were created from a limited set of programs in a single environment. The data set can be expanded to include more variation in settings, so that the results can be generalized to a broader set of parameters. We could include more programs and processes within the Unix operating system, as well as new versions of intrusions. Another possibility is to extend the testing to other operating systems, such as Linux, NT, or other Unix versions.

Data mining techniques are only as good as the training data from which they learn to make decisions. A key issue is where to get the training data: real-life data, artificially created data, or a combination of both? While real-life data provide a better representation of actual activity, data from rarely invoked features of the program may be missing. Artificial data could be used to incorporate those features as a supplement to real-life data. The feasibility of self-learning systems, which learn from the detection of new intrusions in their daily operation, also needs to be explored.

In this study the testing was primarily offline, using data collected earlier. Ideally, an IDS should operate in real time, so that it can detect intrusions as they occur. False alarm rates and computational efficiency become very significant in these situations. The system should not be a drain on the computing resources of the server, and it should not generate so many false alarms that it becomes ineffective. System architecture issues need to be addressed while designing such systems.

[Received: October 2, Accepted: August 22, 2001.]

REFERENCES

Berry, M. J. A., & Linoff, G. (1997). Data mining techniques for marketing, sales, and customer support. New York: John Wiley & Sons, Inc.
Bigus, J. (1996). Data mining with neural networks: Solving business problems from application development to decision support. New York: McGraw-Hill.
Bonifacio, J. M. Jr., Cansian, A. M., Carvalho, A. C. P. L. F., & Moreira, E. S. (1997). Neural networks applied in intrusion detection systems. Proceedings of the International Conference on Computational Intelligence and Multimedia Application. Gold Coast, Australia.
Carrettoni, F., Castano, S., Martella, G., & Samarati, P. (1991). RETISS: A real time security system for threat detection using fuzzy logic. Proceedings of the 25th IEEE International Carnahan Conference on Security Technology, Taipei, Taiwan, ROC.
Chen, M. S., Han, J., & Yu, S. (1996). Data mining: An overview from database perspective. IEEE Transactions on Knowledge and Data Engineering, 8.
Chin, S. K. (1999). High confidence design for security. Communications of the ACM, 42(7).
Chu, C. H., & Widjaja, D. (1994). A neural network system for forecasting method selection. Decision Support Systems, 12.
Chung, H. M., & Gray, P. (1999). Special section: Data mining. Journal of Management Information Systems, 16(1).
Chung, H. M., & Tam, K. Y. (1993). A comparative analysis of inductive learning algorithms. Intelligent Systems in Accounting, Finance and Management, 2(1).
Debar, H., Becker, M., & Siboni, D. (1992). A neural network component for an intrusion detection system. Proceedings of the 1992 IEEE Symposium on Research in Security and Privacy. Oakland, CA: IEEE Computer Society Press.
Denning, D. E. (1987). An intrusion-detection model. IEEE Transactions on Software Engineering, 12(2).
Desai, V. S., Crook, J. N., & Overstreet, G. A. (1996). A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research, 95.
Dimitras, A. I., Slowinski, R., Susmaga, R., & Zopounidis, C. (1999). Business failure prediction using rough sets. European Journal of Operational Research, 114.


More information

Intrusion Detection System using Log Files and Reinforcement Learning

Intrusion Detection System using Log Files and Reinforcement Learning Intrusion Detection System using Log Files and Reinforcement Learning Bhagyashree Deokar, Ambarish Hazarnis Department of Computer Engineering K. J. Somaiya College of Engineering, Mumbai, India ABSTRACT

More information

Keywords - Intrusion Detection System, Intrusion Prevention System, Artificial Neural Network, Multi Layer Perceptron, SYN_FLOOD, PING_FLOOD, JPCap

Keywords - Intrusion Detection System, Intrusion Prevention System, Artificial Neural Network, Multi Layer Perceptron, SYN_FLOOD, PING_FLOOD, JPCap Intelligent Monitoring System A network based IDS SONALI M. TIDKE, Dept. of Computer Science and Engineering, Shreeyash College of Engineering and Technology, Aurangabad (MS), India Abstract Network security

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

A Neural Network Based System for Intrusion Detection and Classification of Attacks

A Neural Network Based System for Intrusion Detection and Classification of Attacks A Neural Network Based System for Intrusion Detection and Classification of Attacks Mehdi MORADI and Mohammad ZULKERNINE Abstract-- With the rapid expansion of computer networks during the past decade,

More information

Modeling System Calls for Intrusion Detection with Dynamic Window Sizes

Modeling System Calls for Intrusion Detection with Dynamic Window Sizes Modeling System Calls for Intrusion Detection with Dynamic Window Sizes Eleazar Eskin Computer Science Department Columbia University 5 West 2th Street, New York, NY 27 eeskin@cs.columbia.edu Salvatore

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

CSC574 - Computer and Network Security Module: Intrusion Detection

CSC574 - Computer and Network Security Module: Intrusion Detection CSC574 - Computer and Network Security Module: Intrusion Detection Prof. William Enck Spring 2013 1 Intrusion An authorized action... that exploits a vulnerability... that causes a compromise... and thus

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

More information

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining 6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

More information

Web Application Security

Web Application Security Web Application Security Richard A. Kemmerer Reliable Software Group Computer Science Department University of California Santa Barbara, CA 93106, USA http://www.cs.ucsb.edu/~rsg www.cs.ucsb.edu/~rsg/

More information

Computer Network Intrusion Detection, Assessment And Prevention Based on Security Dependency Relation

Computer Network Intrusion Detection, Assessment And Prevention Based on Security Dependency Relation Computer Network Intrusion Detection, Assessment And Prevention Based on Security Dependency Relation Stephen S. Yau and Xinyu Zhang Computer Science and Engineering Department Arizona State University

More information

Intrusion Detection Systems. Overview. Evolution of IDSs. Oussama El-Rawas. History and Concepts of IDSs

Intrusion Detection Systems. Overview. Evolution of IDSs. Oussama El-Rawas. History and Concepts of IDSs Intrusion Detection Systems Oussama El-Rawas History and Concepts of IDSs Overview A brief description about the history of Intrusion Detection Systems An introduction to Intrusion Detection Systems including:

More information

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

A Framework for Intelligent Online Customer Service System

A Framework for Intelligent Online Customer Service System A Framework for Intelligent Online Customer Service System Yiping WANG Yongjin ZHANG School of Business Administration, Xi an University of Technology Abstract: In a traditional customer service support

More information

Enhanced data mining analysis in higher educational system using rough set theory

Enhanced data mining analysis in higher educational system using rough set theory African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review

More information

CMSC 421, Operating Systems. Fall 2008. Security. URL: http://www.csee.umbc.edu/~kalpakis/courses/421. Dr. Kalpakis

CMSC 421, Operating Systems. Fall 2008. Security. URL: http://www.csee.umbc.edu/~kalpakis/courses/421. Dr. Kalpakis CMSC 421, Operating Systems. Fall 2008 Security Dr. Kalpakis URL: http://www.csee.umbc.edu/~kalpakis/courses/421 Outline The Security Problem Authentication Program Threats System Threats Securing Systems

More information

Classification and Prediction

Classification and Prediction Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Index Terms Domain name, Firewall, Packet, Phishing, URL.

Index Terms Domain name, Firewall, Packet, Phishing, URL. BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

An Anomaly-Based Method for DDoS Attacks Detection using RBF Neural Networks

An Anomaly-Based Method for DDoS Attacks Detection using RBF Neural Networks 2011 International Conference on Network and Electronics Engineering IPCSIT vol.11 (2011) (2011) IACSIT Press, Singapore An Anomaly-Based Method for DDoS Attacks Detection using RBF Neural Networks Reyhaneh

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

A Stock Pattern Recognition Algorithm Based on Neural Networks

A Stock Pattern Recognition Algorithm Based on Neural Networks A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo guoxinyu@icst.pku.edu.cn Xun Liang liangxun@icst.pku.edu.cn Xiang Li lixiang@icst.pku.edu.cn Abstract pattern respectively. Recent

More information

CS 356 Lecture 17 and 18 Intrusion Detection. Spring 2013

CS 356 Lecture 17 and 18 Intrusion Detection. Spring 2013 CS 356 Lecture 17 and 18 Intrusion Detection Spring 2013 Review Chapter 1: Basic Concepts and Terminology Chapter 2: Basic Cryptographic Tools Chapter 3 User Authentication Chapter 4 Access Control Lists

More information

Outline Intrusion Detection CS 239 Security for Networks and System Software June 3, 2002

Outline Intrusion Detection CS 239 Security for Networks and System Software June 3, 2002 Outline Intrusion Detection CS 239 Security for Networks and System Software June 3, 2002 Introduction Characteristics of intrusion detection systems Some sample intrusion detection systems Page 1 Page

More information

An Artificial Immune Model for Network Intrusion Detection

An Artificial Immune Model for Network Intrusion Detection An Artificial Immune Model for Network Intrusion Detection Jungwon Kim and Peter Bentley Department of Computer Science, University Collge London Gower Street, London, WC1E 6BT, U. K. Phone: +44-171-380-7329,

More information

NEURAL NETWORKS IN DATA MINING

NEURAL NETWORKS IN DATA MINING NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,

More information

An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation

An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation Shanofer. S Master of Engineering, Department of Computer Science and Engineering, Veerammal Engineering College,

More information

Prediction of DDoS Attack Scheme

Prediction of DDoS Attack Scheme Chapter 5 Prediction of DDoS Attack Scheme Distributed denial of service attack can be launched by malicious nodes participating in the attack, exploit the lack of entry point in a wireless network, and

More information

Hybrid Model For Intrusion Detection System Chapke Prajkta P., Raut A. B.

Hybrid Model For Intrusion Detection System Chapke Prajkta P., Raut A. B. www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume1 Issue 3 Dec 2012 Page No. 151-155 Hybrid Model For Intrusion Detection System Chapke Prajkta P., Raut A. B.

More information

Network- vs. Host-based Intrusion Detection

Network- vs. Host-based Intrusion Detection Network- vs. Host-based Intrusion Detection A Guide to Intrusion Detection Technology 6600 Peachtree-Dunwoody Road 300 Embassy Row Atlanta, GA 30348 Tel: 678.443.6000 Toll-free: 800.776.2362 Fax: 678.443.6477

More information

SURVEY OF INTRUSION DETECTION SYSTEM

SURVEY OF INTRUSION DETECTION SYSTEM SURVEY OF INTRUSION DETECTION SYSTEM PRAJAPATI VAIBHAVI S. SHARMA DIPIKA V. ASST. PROF. ASST. PROF. MANISH INSTITUTE OF COMPUTER STUDIES MANISH INSTITUTE OF COMPUTER STUDIES VISNAGAR VISNAGAR GUJARAT GUJARAT

More information

How To Detect Denial Of Service Attack On A Network With A Network Traffic Characterization Scheme

How To Detect Denial Of Service Attack On A Network With A Network Traffic Characterization Scheme Efficient Detection for DOS Attacks by Multivariate Correlation Analysis and Trace Back Method for Prevention Thivya. T 1, Karthika.M 2 Student, Department of computer science and engineering, Dhanalakshmi

More information

Introduction... Error! Bookmark not defined. Intrusion detection & prevention principles... Error! Bookmark not defined.

Introduction... Error! Bookmark not defined. Intrusion detection & prevention principles... Error! Bookmark not defined. Contents Introduction... Error! Bookmark not defined. Intrusion detection & prevention principles... Error! Bookmark not defined. Technical OverView... Error! Bookmark not defined. Network Intrusion Detection

More information

CIS 433/533 - Computer and Network Security Intrusion Detection

CIS 433/533 - Computer and Network Security Intrusion Detection CIS 433/533 - Computer and Network Security Intrusion Detection Professor Kevin Butler Winter 2011 Computer and Information Science Intrusion An Authorized Action (or subversion of auth)... That Can Lead

More information

Artificial Neural Networks for Misuse Detection

Artificial Neural Networks for Misuse Detection Artificial Neural Networks for Misuse Detection James Cannady School of Computer and Information Sciences Nova Southeastern University Fort Lauderdale, FL 33314 cannadyj@scis.nova.edu Abstract Misuse detection

More information

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

More information

Intrusion Detection. Overview. Intrusion vs. Extrusion Detection. Concepts. Raj Jain. Washington University in St. Louis

Intrusion Detection. Overview. Intrusion vs. Extrusion Detection. Concepts. Raj Jain. Washington University in St. Louis Intrusion Detection Overview Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse571-14/

More information

Taxonomy of Intrusion Detection System

Taxonomy of Intrusion Detection System Taxonomy of Intrusion Detection System Monika Sharma, Sumit Sharma Abstract During the past years, security of computer networks has become main stream in most of everyone's lives. Nowadays as the use

More information

Computational intelligence in intrusion detection systems

Computational intelligence in intrusion detection systems Computational intelligence in intrusion detection systems --- An introduction to an introduction Rick Chang @ TEIL Reference The use of computational intelligence in intrusion detection systems : A review

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

UNOBSERVABLE INTRUSION DETECTION BASED ON CALL TRACES IN PARAVIRTUALIZED SYSTEMS

UNOBSERVABLE INTRUSION DETECTION BASED ON CALL TRACES IN PARAVIRTUALIZED SYSTEMS UNOBSERVABLE INTRUSION DETECTION BASED ON CALL TRACES IN PARAVIRTUALIZED SYSTEMS Carlo Maiero, Marino Miculan Department of Mathematics and Computer Science, University of Udine, Italy carlo.maiero@uniud.it,

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

A survey on Data Mining based Intrusion Detection Systems

A survey on Data Mining based Intrusion Detection Systems International Journal of Computer Networks and Communications Security VOL. 2, NO. 12, DECEMBER 2014, 485 490 Available online at: www.ijcncs.org ISSN 2308-9830 A survey on Data Mining based Intrusion

More information

Role of Anomaly IDS in Network

Role of Anomaly IDS in Network Role of Anomaly IDS in Network SumathyMurugan 1, Dr.M.Sundara Rajan 2 1 Asst. Prof, Department of Computer Science, Thiruthangal Nadar College, Chennai -51. 2 Asst. Prof, Department of Computer Science,

More information

INTRUSION PREVENTION AND EXPERT SYSTEMS

INTRUSION PREVENTION AND EXPERT SYSTEMS INTRUSION PREVENTION AND EXPERT SYSTEMS By Avi Chesla avic@v-secure.com Introduction Over the past few years, the market has developed new expectations from the security industry, especially from the intrusion

More information

Intrusion Detection for Mobile Ad Hoc Networks

Intrusion Detection for Mobile Ad Hoc Networks Intrusion Detection for Mobile Ad Hoc Networks Tom Chen SMU, Dept of Electrical Engineering tchen@engr.smu.edu http://www.engr.smu.edu/~tchen TC/Rockwell/5-20-04 SMU Engineering p. 1 Outline Security problems

More information

Rule based Classification of BSE Stock Data with Data Mining

Rule based Classification of BSE Stock Data with Data Mining International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Numerical Algorithms Group

Numerical Algorithms Group Title: Summary: Using the Component Approach to Craft Customized Data Mining Solutions One definition of data mining is the non-trivial extraction of implicit, previously unknown and potentially useful

More information

STUDY OF IMPLEMENTATION OF INTRUSION DETECTION SYSTEM (IDS) VIA DIFFERENT APPROACHS

STUDY OF IMPLEMENTATION OF INTRUSION DETECTION SYSTEM (IDS) VIA DIFFERENT APPROACHS STUDY OF IMPLEMENTATION OF INTRUSION DETECTION SYSTEM (IDS) VIA DIFFERENT APPROACHS SACHIN MALVIYA Student, Department of Information Technology, Medicaps Institute of Science & Technology, INDORE (M.P.)

More information

Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results 1

Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results 1 Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results 1 Salvatore J. Stolfo, David W. Fan, Wenke Lee and Andreas L. Prodromidis Department of Computer Science Columbia University

More information

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One

More information

APPLICATION OF MULTI-AGENT SYSTEMS FOR NETWORK AND INFORMATION PROTECTION

APPLICATION OF MULTI-AGENT SYSTEMS FOR NETWORK AND INFORMATION PROTECTION 18-19 September 2014, BULGARIA 137 Proceedings of the International Conference on Information Technologies (InfoTech-2014) 18-19 September 2014, Bulgaria APPLICATION OF MULTI-AGENT SYSTEMS FOR NETWORK

More information

Classification On The Clouds Using MapReduce

Classification On The Clouds Using MapReduce Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal simao.martins@tecnico.ulisboa.pt Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal claudia.antunes@tecnico.ulisboa.pt

More information

Speedy Signature Based Intrusion Detection System Using Finite State Machine and Hashing Techniques

Speedy Signature Based Intrusion Detection System Using Finite State Machine and Hashing Techniques www.ijcsi.org 387 Speedy Signature Based Intrusion Detection System Using Finite State Machine and Hashing Techniques Utkarsh Dixit 1, Shivali Gupta 2 and Om Pal 3 1 School of Computer Science, Centre

More information

Fuzzy Network Profiling for Intrusion Detection

Fuzzy Network Profiling for Intrusion Detection Fuzzy Network Profiling for Intrusion Detection John E. Dickerson (jedicker@iastate.edu) and Julie A. Dickerson (julied@iastate.edu) Electrical and Computer Engineering Department Iowa State University

More information

Chapter 23. Database Security. Security Issues. Database Security

Chapter 23. Database Security. Security Issues. Database Security Chapter 23 Database Security Security Issues Legal and ethical issues Policy issues System-related issues The need to identify multiple security levels 2 Database Security A DBMS typically includes a database

More information

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of ireless Intrusion Detection Systems

More information

Neural Networks and Back Propagation Algorithm

Neural Networks and Back Propagation Algorithm Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland mirzac@gmail.com Abstract Neural Networks (NN) are important

More information

Clustering as an add-on for firewalls

Clustering as an add-on for firewalls Clustering as an add-on for firewalls C. Caruso & D. Malerba Dipartimento di Informatica, University of Bari, Italy. Abstract The necessary spread of the access points to network services makes them vulnerable

More information

Intrusion Detection. Tianen Liu. May 22, 2003. paper will look at different kinds of intrusion detection systems, different ways of

Intrusion Detection. Tianen Liu. May 22, 2003. paper will look at different kinds of intrusion detection systems, different ways of Intrusion Detection Tianen Liu May 22, 2003 I. Abstract Computers are vulnerable to many threats. Hackers and unauthorized users can compromise systems. Viruses, worms, and other kinds of harmful code

More information

Performance Evaluation of Intrusion Detection Systems

Performance Evaluation of Intrusion Detection Systems Performance Evaluation of Intrusion Detection Systems Waleed Farag & Sanwar Ali Department of Computer Science at Indiana University of Pennsylvania ABIT 2006 Outline Introduction: Intrusion Detection

More information

!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"

!!!#$$%&'()*+$(,%!#$%$&'()*%(+,'-*&./#-$&'(-&(0*.$#-$1(2&.3$'45 !"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:

More information

HYBRID INTRUSION DETECTION FOR CLUSTER BASED WIRELESS SENSOR NETWORK

HYBRID INTRUSION DETECTION FOR CLUSTER BASED WIRELESS SENSOR NETWORK HYBRID INTRUSION DETECTION FOR CLUSTER BASED WIRELESS SENSOR NETWORK 1 K.RANJITH SINGH 1 Dept. of Computer Science, Periyar University, TamilNadu, India 2 T.HEMA 2 Dept. of Computer Science, Periyar University,

More information