Robust Quick String Matching Algorithm for Network Security



Similar documents
A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW)

Contributing Efforts of Various String Matching Methodologies in Real World Applications

Improved Approach for Exact Pattern Matching

A Fast Pattern-Matching Algorithm for Network Intrusion Detection System

Lecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching

Improving the Database Logging Performance of the Snort Network Intrusion Detection Sensor

Configurable String Matching Hardware for Speeding up Intrusion Detection. Monther Aldwairi*, Thomas Conte, Paul Franzon

A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms

A Partition-Based Efficient Algorithm for Large Scale. Multiple-Strings Matching

Traffic Analysis: From Stateful Firewall to Network Intrusion Detection System

A FAST STRING MATCHING ALGORITHM

Protecting Websites from Dissociative Identity SQL Injection Attacka Patch for Human Folly

Fast string matching

Presenter: Hamed Vahdat-Nejad

An efficient matching algorithm for encoded DNA sequences and binary strings

arxiv: v2 [cs.ds] 15 Oct 2008

High Performance String Matching Algorithm for a Network Intrusion Prevention System (NIPS)

Extensible Network Configuration and Communication Framework

Intrusion Detection System Based Network Using SNORT Signatures And WINPCAP

2ND QUARTER 2006, VOLUME 8, NO. 2

Approximate Search Engine Optimization for Directory Service

Improved Single and Multiple Approximate String Matching

Announcements. Lab 2 now on web site

Optimizing Pattern Matching for Intrusion Detection

Secure Way of Storing Data in Cloud Using Third Party Auditor

An Efficient Combination of Dispatch Rules for Job-shop Scheduling Problem

A Collaborative Network Security Management System in Metropolitan Area Network

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

Detecting Computer Worms in the Cloud

A Double-Filter Structure Based Scheme for Scalable Port Scan Detection

CSE331: Introduction to Networks and Security. Lecture 18 Fall 2006

Network Intrusion Simulation Using OPNET

String Search. Brute force Rabin-Karp Knuth-Morris-Pratt Right-Left scan

UNITE: Uniform hardware-based Network Intrusion detection Engine

Searching BWT compressed text with the Boyer-Moore algorithm and binary search

HIDS and NIDS Hybrid Intrusion Detection System Model Design Zhenqi Wang 1, a, Dankai Zhang 1,b

In-Place File Carving

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE

ALGORITHM FOR DISTRIBUTED AGENT BASED NETWORK INTRUSION DETECTION SYSTEM (NIDS)

New Hash Function Construction for Textual and Geometric Data Retrieval

DRAFT Gigabit network intrusion detection systems

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches

An enhanced TCP mechanism Fast-TCP in IP networks with wireless links

Multi-level Metadata Management Scheme for Cloud Storage System

An overview to Software Architecture in Intrusion Detection System

The Dirty Secret Behind the UTM: What Security Vendors Don t Want You to Know

Virtual Desktops Security Test Report

Signature-aware Traffic Monitoring with IPFIX 1

Double guard: Detecting Interruptions in N- Tier Web Applications

We examine the amount of memory saved by our string matching memory optimizations and the improvement in

Implementation of the Remote Control and Management System. in the Windows O.S

The Effects of Filtering Malicious Traffic. under DoS Attacks

Energy Constrained Resource Scheduling for Cloud Environment

Efficient Prevention of Credit Card Leakage from Enterprise Networks

AC : WORK-IN-PROGRESS: CREATING AN INTRUSION DE- TECTION EXPERIMENTAL ENVIRONMENT USING CLOUD-BASED VIR- TUALIZATION TECHNOLOGY

Sandeep Mahapatra Department of Computer Science and Engineering PEC, University of Technology

An Efficiency Keyword Search Scheme to improve user experience for Encrypted Data in Cloud

A Visualization Technique for Monitoring of Network Flow Data

Knuth-Morris-Pratt Algorithm

Design of a NAND Flash Memory File System to Improve System Boot Time

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages

Benchmarking the Performance of XenDesktop Virtual DeskTop Infrastructure (VDI) Platform

Mining and Detecting Connection-Chains in Network Traffic

Bounded Cost Algorithms for Multivalued Consensus Using Binary Consensus Instances

SURVEY OF INTRUSION DETECTION SYSTEM

Prediction System for Reducing the Cloud Bandwidth and Cost

Compiling PCRE to FPGA for Accelerating SNORT IDS

Load Distribution in Large Scale Network Monitoring Infrastructures

Detecting Distributed Signature-based Intrusion: The Case of Multi-Path Routing Attacks

A Data De-duplication Access Framework for Solid State Drives

Design Issues in a Bare PC Web Server

A Review on Network Intrusion Detection System Using Open Source Snort

On Optimizing Load Balancing of Intrusion Detection and Prevention Systems

Shift-based Pattern Matching for Compressed Web Traffic

Improved Single and Multiple Approximate String Matching

Advancement in Virtualization Based Intrusion Detection System in Cloud Environment

Performance Modeling and Analysis of a Database Server with Write-Heavy Workload

A Study of Network Security Systems

Performance Analysis of IPv4 v/s IPv6 in Virtual Environment Using UBUNTU

Review Study on Techniques for Network worm Signatures Automation

Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Karagpur

Data Storage - I: Memory Hierarchies & Disks

Online Remote Data Backup for iscsi-based Storage Systems

HyLARD: A Hybrid Locality-Aware Request Distribution Policy in Cluster-based Web Servers

This is an author-deposited version published in : Eprints ID : 12518

Data Networks Summer 2007 Homework #3

Awase-E: Image-based Authentication for Mobile Phones using User s Favorite Images

Reliable Systolic Computing through Redundancy

PERFORMANCE TESTING CONCURRENT ACCESS ISSUE AND POSSIBLE SOLUTIONS A CLASSIC CASE OF PRODUCER-CONSUMER

Prevention of Anomalous SIP Messages

Blog Post Extraction Using Title Finding

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

String Matching on a multicore GPU using CUDA

The Probabilistic Model of Cloud Computing

Nixu SNS Security White Paper May 2007 Version 1.2

Privacy-Preserving Models for Comparing Survival Curves Using the Logrank Test

High-Speed TCP Performance Characterization under Various Operating Systems

Transcription:

18 IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.7B, July 26 Robust Quick String Matching Algorithm for Network Security Jianming Yu, 1,2 and Yibo Xue, 2,3 1 Department of Automation, Tsinghua University, Beijing, China 2 Research Institute of Information Technology, Tsinghua University, Beijing, China 3 Tsinghua National Lab for Information Science and Technology, Beijing, China Summary String matching is one of the key algorithms in network security, and many areas could be benefit from a faster string matching algorithm. Based on the most efficient string matching algorithm in usual applications, the Boyer-Moore () algorithm, a novel algorithm called is proposed. utilizes an improved bad character heuristic to achieve bigger shift value area and an enhanced good suffix heuristic to dramatically improve the worst case performance. The two heuristics combined with a novel determinant condition to switch between them enable achieve a higher performance than both under normal and worst case situation. The experimental results reveal that appears efficient than many times in worst case, and the longer the pattern, the bigger the performance improvement. The performance of is 7.57~36.34% higher than in English text searching, 16.26~26.18% higher than in uniformly random text searching, and 9.77% higher than in the real world Snort pattern set searching. Key words: String matching; network security; algorithmic performance attack Introduction String matching is one of the basic and important research subjects in computer science. String matching consists in finding one, or more generally, all the occurrences of a search string, also be called as pattern, in an input string. If more than one search strings are matched against the input string simultaneously, it is called multiple pattern matching. Otherwise, it is called single pattern matching. Single pattern matching algorithm will be referred only in this paper. Single pattern matching algorithm is widely used in network security environments. (In network security realm, pattern is a string indicating network intrusion, attack, virus, Spam or dirty network information et al). For example, in Snort, the famous open sources lightweight NIDS [1, 2], the Boyer-Moore () [3] algorithm is called while the number of patterns needed to be matched is less than five. Single pattern matching algorithm is also the basis to construct exclusion-based pattern matching algorithm and hybrid pattern matching engine to handle enormous network security detection patterns. The exclusion-based string matching algorithm utilizes heuristics to identify the patterns that could not occur in the input string first, and then use single pattern matching algorithm to match the patterns could not be excluded. The ExB [4] and E 2 xb [5] are typical exclusion-based algorithms. The hybrid pattern matching engine triggers different algorithms, generally combines single pattern matching and multiple pattern matching algorithms, depending on different application environments such as the number of patterns and the size of input string. Considering the fact that no single algorithm performs best in all cases, a hybrid pattern matching engine appears to be the best approach in network security applications [6, 7]. Along with the development of network attack technologies, the network security equipment itself becomes the attack objective. So does the string matching algorithm. An effective attack method to string matching algorithm is algorithmic performance attack : an attacker intentionally provides inputs that will overload or cause the worst case performance of an algorithm [6]. It could slow down the search and cause dropped packets, upon which the intruder could begin his attack. For example, the processing time of Snort can be raised by up to 25 times under such attack [8]. Improving the average-case and worst-case performance of string matching algorithm simultaneously would effectively defend from the algorithmic performance attack. The algorithm is considered as the most efficient string-matching algorithm in usual applications, and is widely regarded as providing the best average-case performance of any known algorithm [6, 13]. In this work we concentrate on improving the average and worst case performance of at the same time. A novel algorithm named as Robust Quick String Matching () is proposed. Manuscript received July 5, 26. Manuscript revised July 25, 26.

IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.7B, July 26 181 The rest of the paper is organized as follows. Section 2 gives the description of the algorithm. Section 3 is a formal description of. In section 4, detailed evaluation and analysis of is presented. We summarize our contributions in section 5, where the open issues for future investigation is outlined as well.from this section, input the body of your manuscript according to the constitution that you had. For detailed information for authors, please refer to [1]. 2. Related Works: The Algorithm String matching deals with a string T=t 1 t 2 t n (the input string) of size n and searches it for all occurrences of another, shorter string =p 1 p 2 p m (the search string) of size m (n > m). Both strings build over a finite set of character called an alphabet denoted by Σ. String matching algorithms scan the T with the help of the sliding window mechanism. The size of the window is generally equal to m. To a check point j (1 j n), the window is positioned on t j t j+1 t j+m-1. At each check point, the characters of the window are compared with the characters of the search string. After a whole match or after a mismatch, the window is shifted along the T according to the heuristics of each algorithm. utilizes two heuristics, bad character and good suffix, to reduce the number of comparisons (compared to Brute Force string matching). The search is carried out by shifting the window from left to right. t 1 is the first check point. Within each window, the characters of the window and the are compared from right to left. Both heuristics are triggered on a mismatch. Suppose the matching check is taken between t i+1 t i+2 t i+m, i n-m and p 1 p 2 p m now, and p j+1 p j+2 p m =t i+j+1 t i+j+2 t i+m, 1 j m-1, the mismatching occurs at p j, p j t i+j. The bad character heuristic works this way: when a mismatch appears, the window is shifted to right so that the mismatching character in T, t i+j, is aligned with p k1, k1=max{k2 p k2 =t i+j, 1 k2<j}. If t i+j dose not appears in p 1 p 2 p j-1, the window is shifted to the position that p 1 is one position past t i+j. The good suffix heuristic, works another way: when a mismatching occurs, there is a non-empty suffix that matches. The window is shifted to the next occurrence of the suffix in p 1 p 2 p m-1. If the suffix does not appear, the window is shifted to the position that p k3 is aligned with t i+m, k3=max{k4 p 1 p 2 p k4 =t i+m-k4+1 t i+m }, 1 k4 m-j-1. If k 3 does not exist, the window is shifted to the position that p 1 is one position past t i+m. 3. : Robust Quick String Matching The algorithm utilizes an improved bad character heuristic and an enhanced good suffix heuristic to improve the average and worst case performance of simultaneously. The improved bad character heuristic of the algorithm is shown in Figure 1. The character next to the rightmost character of the current window is always used as the bad character. For example in Figure 1, at current check point marked by the broken line window, use character s of T as the bad character to calculate the shift value. Compared with the bad character heuristic of, this will provide a larger shift value area and easy to implement [14]. T x x k m f s h a 3 y s f m f 2 y s f m f. Fig 1. The improved bad character heuristic of algorithm The main drawback of is that after a shift it forgets the information about characters already matched. So in worst case situation, its behavior is same as Brute Force. We proposed a novel enhanced good suffix heuristic which could remember the characters already matched when needed. The enhanced good suffix heuristic is: while the information of the characters already matched is needed to remember at a specific check point, only good suffix heuristic is used at that check point. Compared to the policy of (at every check point always calculate good suffix and bad character shift value and select the bigger one), we can easily remember the characters have already matched by a variable and do not compare them again at next check point. As shown in Figure 2, if the shift value, shift, is calculated by good suffix heuristic at current check point, so p p 1 p m-shift-1 =p shift p shift+1 p m-1. We use variable offset=m-shift to remember the starting position at next check point, then at next check point only need to compare p offset p m-1. 1 bad character takes the far most shift caused by the two heuristics.

182 IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.7B, July 26 T a b a b x x x a b a b shift shift. j = j+gs[]; offset = m-shift; //remember the character already matched else do j = j+gs[i]; offset = ; a b a b Fig 3. The search procedure of offset 4. Experiments and Analysis Fig 2. The enhanced good suffix heuristic of To keep the simplicity of implementation and achieve bigger shift value, a simple determinant condition is proposed to determine if the enhanced good suffix heuristic need to be called to remember the information of the characters already matched. Under the worst case situation, the behavior of is same as the Brute Force. This means the shift value is always equal to one. So in the design of, if the shift value calculated by the improved bad character heuristic is equal to one at current check point, the enhanced good suffix heuristic is utilized. This has two benefits. First, we could get a bigger shift value because the shift value calculated by good suffix heuristic is no less than one. Second, we could decrease the characters need to check at next check point because the enhanced good suffix heuristic remembered the characters already matched. The search procedure of the proposed algorithm is shown in Figure 3. Two tables are used to store the shift value calculated by the two heuristics: Bc[] for bad character and Gs[] for good suffix. Where is the search string, m is its length. T is the input string, n is its length. If (n<m) do exit; j = ; // j represents the check point offset = ; while (j <= n - m) do begin i = m-1; if (Bc[T[j + m]] > 1) do // using bad character heuristic while([i] = T[j+i] and i>=) do begin i = i-1; j = j+bc [T[j + m]]; else // using good suffix heuristic while ([i] = T[i + j] && i >= offset) do begin i = i-1; if (i < offset) do // a match found We implemented the algorithm and compared the performance of it with the algorithm. The codes of in [13] are used, and it is a high quality implementation. 4.1 Experimental Environment In the experiments we used a C with Intel entium 4 2.4 GHz processor, with L1 cache of 8KB and L2 cache of 512KB, and 512MB of RAM. The host operating system is Microsoft Windows X professional. The compiler used is Microsoft Visual C++ 6.. In all the experiments the length of the input string T is 16KB. We performed experiments with three kinds of input string T. The first was made up of the same character A to test worst case performance. The second is a piece of English text obtained from a software manual. And the third consists of uniformly random characters from 256-character alphabet. 4.2 The Worst Case Test To the input string and search string both be made up of the same character, the shift value is always equal to one and the match number is maximum. It provides the worst case scenario to string matching algorithm. We found that there are such keywords used in practice, for example the blow detection rule of Snort. alert tcp any any -> any any (ack:; flags:sfu12; content: AAAAAAAAAAAAAAAA ; depth: 16;). It tries to find the keyword AAAAAAAAAAAAAAAA in all the tcp flow. T=AA A and =AA A was used to test the performance of and. The results are shown in Figure 4. The horizontal axis represents the pattern length, and the vertical axis represents processing time.

IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.7B, July 26 183 25 4.4 Experiments with Uniformly Random Text 2 15 1 One text which consists of uniformly random characters from 256-character alphabet is generated. The patterns are randomly selected from the text. For every pattern length, we selected 25 different patterns, and the 25 patterns are compared with the text one by one. 5 attern Length The results are shown in Figure 6. The horizontal axis represents the pattern length, and the vertical axis represents processing time. The processing time is the average searching time of 25 patterns. 4 Fig 4. The performance with =AA A and T=AA A We could find that the performance of is stable and higher than for many times under this situation. Because the information of the characters already matched is remembered, the number of character comparison is dramatically reduced. In fact, because the shift value is always equal to one under this situation, the enhanced good suffix heuristic is called at every check point. So the number of characters compared is equal to the length of input string and regardless of the length of the keyword. 4.3 Experiments with an English Text 35 3 25 2 15 1 5 attern Length The text is obtained from a software manual. The patterns are randomly selected from the text. For every pattern length, we selected 25 different patterns, and the 25 patterns are compared with the text one by one. The results are shown in Figure 5. The horizontal axis represents the pattern length, and the vertical axis represents processing time. The processing time is the average searching time of 25 patterns. 6 5 Fig. 6. The performance with uniformly random text We can find that the performance of is 16.26~26.18% higher than in the uniformly random text searching. 4.5 Experiments with Snort attern Set The real world pattern set of Snort rules is extracted. The latest Snort rules distributed in Jun. 15, 26 are used. There are totally 241 different patterns. The pattern length arranges from 1 to 122. The text used is uniformly random text with 256-character alphabet. The 241 patterns are compared with the text one by one. 4 3 2 1 The results are shown in Figure 7. The horizontal axis represents different algorithm, and the vertical axis represents processing time. The processing time in Figure 6 is the total searching time of the 241 patterns. attern Length 52 51 5 Fig 5. The performance with an English text searching We can find that the performance of is 7.57~36.34% higher than in the English text searching. Time (ms) 49 48 47 46 45 Fig. 7. The performance with Snort pattern set

184 IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.7B, July 26 We can find that the performance of is 9.77% higher than in the real world Snort pattern set searching. 5. Conclusion and Future Work We have examined the problem of string matching for network security, especially the anti-algorithmic performance attack problem, and presented the design of an improved algorithm called. utilizes an improved bad character heuristic to achieve bigger shift value area and an enhanced good suffix heuristic to remember the information of characters already matched when needed. A novel determinant condition to determine if the enhanced good suffix heuristic need to be called is proposed. So the worst case performance of is dramatically improved compared with and the normal performance is also improved at the same time.. We have evaluated against using different texts and patterns. The experimental results reveal that appears efficient than many times in worst case, and the longer the pattern, the bigger the performance improvement. is also efficient than in normal situation. The performance of is 7.57~36.34% higher than in the English text searching, 16.26~26.18% higher than in the uniformly random text searching, and 9.77% higher than in the real world Snort pattern set searching. Future work would include designing exclusion-based algorithm and hybrid pattern matching engine for network security applications based on. The design idea of this paper, improving and balancing the normal and worst case performance of string matching algorithms for network security, will be applied to multiple pattern matching algorithms. Acknowledgments We would like to thank the anonymous reviewers for providing useful comments on this paper. We would thank our colleagues at the Network Security Laboratory, Research Institute of Information Technology, Tsinghua University for their comments and enlightening discussions on draft of the paper. And this work is supported by Intel IXA University rogram. References [1] M. Norton, and D. Roelker, The new Snort, Computer security journal, vol. 19, no. 3, pp. 37-47, 23. [2] M. Roesch, Snort: lightweight intrusion detection for networks, roc. 13 th System Administration Conference and Exhibition (LISA 1999), pp. 229-238, 1999. [3] R. Boyer, and J. Moore, A fast string searching algorithm, Communications of the ACM, vol. 2, no. 1, pp. 762-772, 1977. [4] E.. Markatos, S. Antonatos, M. olychronakis, and K. G. Anagnostakis, EXB: Exclusion-based signature matching for intrusion detection, roc. The CCN 2, 22. [5] K. G. Anagnostakis, E.. Markatos, S. Antonatos, and M. olychronakis, E 2 XB: A domain-specific string matching algorithm for intrusion detection, roc. 18 th IFI International Information Security Conference (SEC23), 23. [6] M. Fisk, and G. Varghese, Fast content-based packet handling for intrusion detection, UCSD Technical Report CS21-67, May 21. [7] S. Antonatos, K. G. Anagnostakis, E.. Markatos, and M. olychronakis, erformance analysis of content matching intrusion detection system, roc. 24 International Symposium on Applications and the Internet (SAINT 4), 24. [8] S. Antonatos, K. G. Anagnostakis, and E.. Markatos, Generating realistic workloads for network intrusion detection systems, Software engineering notes, vol. 29, no. 1, pp. 27-215, 24. [9] R. N. Horspool, ractical fast searching in strings, Software practice and experience, vol. 1, no. 6, pp. 51-56, 198. [1] R. M. Karp, and M. O. Rabin, Efficient randomized pattern-matching algorithms, I J. Res. Dev., vol. 31, no. 2, pp. 249-26, 1987. [11] D. Knuth, J. Morris, and V. ratt, Fast pattern matching in strings, SIAM journal on computing, vol. 6, no. 2, pp. 323-35, 1977. [12] M. Crochemore, M. C. Hancart, attern Matching in Algorithms and Theory of Computation Handbook. CRC ress Inc., Boca aton, FL, 1999. [13] C. Charras, and T. Lecroq, Exact string matching algorithms, http://www-igm.univ-mlv.fr/~lecroq/string/, 1997. [14] D. M. Sunday, A very fast substring search algorithm, Communications of the ACM, vol. 33, no. 8, pp. 132-142, 199. Jianming Yu received the B.S. degree in Electrical ower System and Its Automation from Zhengzhou University of Technology in 1999 and the M.S. degree in Control Theory and Control Engineering from Harbin Institute of Technology in 23. From 23 to now, he stayed in Network Security Laboratory, Research Institute of Information Technology, Tsinghua University as a hd candidate to study network intrusion detection, string matching for network security, and network equipments design based on network processor.