A Partition-Based Efficient Algorithm for Large-Scale Multiple-Strings Matching


Ping Liu, Jianlong Tan, Yanbing Liu
Software Division, Institute of Computing Technology, Chinese Academy of Sciences, Beijing

Abstract
Filtering procedures play an important role in Internet security and information retrieval, and usually employ a multiple-strings matching algorithm as their key part. All the classical matching algorithms, however, perform poorly when the number of keywords exceeds 5000, which makes large-scale multiple-strings matching a great challenge. Based on the observation that the speed of the classical algorithms depends on the minimal length of the keywords, we propose a partition strategy that decomposes the keyword set into a series of subsets, on each of which a classical algorithm is run. We prove that, in the optimal partition, keywords of the same length fall into one subset, and the length ranges of different subsets do not interlace. We then propose a shortest-path model for the optimal partition problem. Experiments on both a random data set and the ClamAV data set demonstrate that our algorithm works much better than the classical ones.

Key Words: large-scale multiple-strings matching, partition, shortest path

1 Introduction
With the development of the Internet, more and more information, bad along with good, has appeared and congested the network. To secure the Internet and retrieve useful information, filtering systems are designed and deployed on gateways to filter out unwanted content. A filtering system usually employs a string matching procedure as its key part, and always contains a large-scale keyword set to cover various focuses. Hence, designing an efficient multi-strings matching algorithm for a large-scale keyword set is a great challenge.

This paper is supported by an NSFC grant.
1.1 Related Work
The string matching problem has received extensive research. Most approaches follow a common procedure: compare the keywords with the substring of the text inside a fixed-length window, and then shift the window from left to right as far as possible. According to the way the patterns are compared with the text in the window, string matching algorithms can be categorized into three classes: prefix searching, suffix searching and factor searching.

In prefix searching methods, window shifting is accomplished by computing the longest common prefix between the text and the patterns. There are two ways to compute the length of the longest prefix. The first is to compute the longest suffix of the text read so far that is also a prefix of the pattern, typified by the famous KMP (Knuth-Morris-Pratt) algorithm [2] and the Aho-Corasick algorithm [3]. The second maintains the set of prefixes of the keywords that are also suffixes of the text, and updates the set at each character read. This is what the Shift-And and Shift-Or algorithms do [4].

The suffix searching approach searches backward for a suffix of the window that is also a suffix of a keyword; its distinguishing feature is the right-to-left scan within the window. The Boyer-Moore algorithm [5] is of this kind, and so are the Commentz-Walter algorithm [6] and the Wu-Manber algorithm [7].

Factor searching, the most efficient approach in practice for long keywords, can be treated as the integration of prefix searching and suffix searching: it searches backward in the window like suffix searching, but looks for the longest suffix of the window that is also a factor of a keyword. The most famous algorithms of this kind are the BDM (Backward DAWG Matching) algorithm [8], the BOM (Backward Oracle Matching) algorithm [9], the SBDM (Set Backward DAWG Matching) algorithm [10] and the SBOM (Set Backward Oracle Matching) algorithm [11].
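To make the second prefix-searching technique concrete, here is a minimal single-pattern Shift-And sketch in Python. It is only an illustration of the bit-parallel idea described above, not code from the paper; the function name `shift_and_search` is our own, and Python's unbounded integers stand in for a machine word.

```python
def shift_and_search(pattern: str, text: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    # Invariant: bit i of state is set iff pattern[:i + 1] is a suffix
    # of the text read so far (the "set of active prefixes").
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Extend every active prefix by one character, start a new
        # prefix at bit 0, and keep only those that match ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

For example, `shift_and_search("abab", "abababab")` reports the overlapping occurrences at positions 0, 2 and 4. Each text character costs a constant number of word operations, which is why this family reads every character and runs in time linear in the text length regardless of the pattern length.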
The performance of the classical multi-strings matching algorithms is determined mainly by three factors: the number of keywords, the minimal length of the keywords and the size of the alphabet. In addition, the distribution of the keywords in the text also affects the performance. None of the classical algorithms is applicable when the keyword set is large. Experiments on real data demonstrate that these algorithms perform badly when the number of patterns exceeds 5,000 or the minimal length of the patterns is 4 bytes. In this paper, we propose a partition-based algorithm suitable for filtering with a large-scale keyword set.

1.2 Our Contribution
Our contributions in this paper are as follows:
1) We analyzed the average time complexity of three representative algorithms, i.e., SBOM, Wu-Manber and advanced Aho-Corasick, and surveyed the relationship between their speed and the minimal length of the patterns.
2) We proposed a partition-based strategy to bound the influence of the shortest keywords on the performance. Since the speed of the classical algorithms is tightly related to the minimal length of the keywords, partitioning the keyword set into smaller subsets bounds the influence of the shortest keywords and thus improves the performance.
3) We proposed a shortest-path model of the optimal partition. Two properties of the optimal partition were proved: keywords of the same length lie in one subset of the optimal partition, and the length ranges of different subsets do not interlace. Based on these theorems, a shortest-path model was proposed for the optimal partition.

We implemented our partition-based algorithm as a C program and ran experiments on real data, which demonstrated its efficiency on large-scale keyword sets compared with the classical algorithms.

The rest of this paper is organized as follows: Section 2 describes the average time complexity and properties of three classical algorithms; Section 3 proposes the partition-based strategy and proves two theorems for the optimal partition; Section 4 reports the implementation and the experimental results on real data; Section 5 mentions further work.

2 Properties about Performance of Classical Algorithms
In this section, we analyze the average-case time complexity of three representative multi-strings matching algorithms: SBOM, Wu-Manber and advanced Aho-Corasick. Let $\Sigma$ denote the alphabet, $n$ the length of the text, $r$ the number of patterns, and $b$ the block size in the Wu-Manber algorithm. Let $m(S)$, or simply $m$, denote the minimal length of a keyword set $S$, and $M(S)$, or simply $M$, the maximal one.
Assuming a uniform distribution of text and keywords over the alphabet $\Sigma$, and that $n$ is large enough, it is easy to estimate the average-case time complexities: the advanced Aho-Corasick algorithm runs in $O(n)$, the Wu-Manber algorithm in

$$O\left(\frac{n}{(m-b+1)\left(1-\dfrac{(m-b+1)\,r}{2|\Sigma|^{b}}\right)}\right),$$

and the SBOM algorithm in

$$O\left(\frac{n\,\log_{|\Sigma|}(mr)}{m-\log_{|\Sigma|}(mr)}\right).$$

The above analysis implies the following properties about the speed of the classical algorithms, which are confirmed by experimental results on real data:

Property 1. The main factors affecting the speed of multi-strings matching algorithms are the size of the alphabet, the number of patterns and the minimal length of the patterns. Hence,
the matching time can be denoted as $T(r, m)$ if the size of the alphabet is fixed.

Property 2. The matching time of a multi-strings matching algorithm increases monotonically as the number of patterns increases, i.e., $T'_r(r, m) > 0$. (See Fig. 1)

Property 3. The matching time of a multi-strings matching algorithm decreases monotonically as the minimal length of the patterns increases, i.e., $T'_m(r, m) < 0$. (See Fig. 2)

Property 4. The rate at which the matching time grows with the number of patterns is independent of the number of patterns, and increases as the minimal length decreases. Writing the matching time as $F(x, y)$, with $x$ the number of patterns and $y$ the minimal length, this reads $F'_x(x, y) = H(y) > 0$ and $H'(y) < 0$.

3 A Partition-Based Matching Algorithm
The shortest keywords, though very few in number, have a great effect on the matching time. To bound their influence, an intuitive idea is to decompose the keyword set into a series of smaller subsets, and then choose an appropriate classical matching algorithm to run on each subset. Since the influence of the shortest keywords is confined to a smaller subset rather than the entire set, the sum of the matching times on the individual subsets can be even smaller than the time needed to run on the entire set directly.

For a given keyword set $P = \{p_1, p_2, p_3, \dots, p_n\}$ there are many feasible partitions, among which the optimal one yields the minimal matching time. Here we assume that the keywords are already sorted by length, i.e., $|p_1| \le |p_2| \le \dots \le |p_n|$. The optimal partition finding problem can then be defined as follows:

Optimal Partition Finding Problem. Given a sorted keyword set $P = \{p_1, p_2, \dots, p_n\}$, construct a partition $S_1, S_2, \dots, S_k$ such that $\bigcup_{i=1}^{k} S_i = P$, $S_i \cap S_j = \emptyset$ for all $1 \le i, j \le k$ with $i \ne j$, and $\sum_{i=1}^{k} T(m(S_i), |S_i|)$ is minimized. Here $T(m, n)$ denotes the matching time on a subset with minimal length $m$ and $n$ keywords.

3.1 Properties about Optimal Partition
In the following, two properties of the optimal partition are proved to provide a solid foundation for finding it.
Theorem 1. There exists an optimal partition $S_1, S_2, \dots, S_k$ of the sorted keyword set $P = \{p_1, p_2, \dots, p_n\}$ such that, for $i \ne j$, either $\forall a \in S_i, \forall b \in S_j$: $|a| \le |b|$, or $\forall a \in S_i, \forall b \in S_j$: $|a| \ge |b|$.

This theorem demonstrates the contiguity of the optimal partition, that is, the intervals formed by the subsets, $[m(S_i), M(S_i)]$, do not cross each other. For the sake of simplicity, we use $m_i$ and $m_j$ to denote the minimal lengths of $S_i$ and $S_j$, and $n_i$ and $n_j$ the numbers of keywords in them, respectively.
Proof: Suppose that in an optimal partition $S_1, S_2, \dots, S_k$ there exist two subsets $S_i$ and $S_j$ with crossing intervals, i.e., $[m(S_i), M(S_i)]$ crosses $[m(S_j), M(S_j)]$. We give the proof for the case $m_i \le m_j$; the case $m_i \ge m_j$ is similar and thus omitted. Two new subsets $S'_i$ and $S'_j$ are constructed by exchanging the longest keyword of $S_i$ with the shortest one in $S_j$. Thus we have

$$m'_i = m_i,\quad m'_j \ge m_j;\qquad n'_i = n_i,\quad n'_j = n_j.$$

Hence,

$$\bigl(T(m_i, n_i) + T(m_j, n_j)\bigr) - \bigl(T(m'_i, n'_i) + T(m'_j, n'_j)\bigr) = \bigl(T(m_i, n_i) - T(m'_i, n'_i)\bigr) + \bigl(T(m_j, n_j) - T(m'_j, n'_j)\bigr) \ge 0$$

(by Property 3 in Section 2, and $m(S'_j) > m(S_j)$). Thus the new partition is better than the original one, which contradicts the assumption. Hence, the theorem holds.

Theorem 2. In the optimal partition of the sorted keyword set $P = \{p_1, p_2, \dots, p_n\}$, keywords of the same length are not dispersed into different subsets.

Theorem 1 ensures that the lengths of the keywords in different subsets do not interlace; hence keywords of the same length cannot be dispersed into more than two subsets. So it suffices to prove the case of two subsets.

Proof: In an optimal partition $S_1, S_2, \dots, S_k$, suppose the $C$ keywords of the same length $l$ are split into two subsets, i.e., $C_i$ keywords in $S_i$ and $C_j$ in $S_j$ ($C_i > 0$, $C_j > 0$, $C_i + C_j = C$). From $S_i$ and $S_j$, two new subsets $S'_i$ and $S'_j$ can be constructed by moving the $C_i$ keywords from $S_i$ to $S_j$. For the sake of simplicity, we use $m'_i$ and $m'_j$ to denote the minimal lengths of $S'_i$ and $S'_j$, and $n'_i$ and $n'_j$ the numbers of keywords in them, respectively. We give the proof for the case $m(S_i) \le m(S_j)$; the case $m(S_i) \ge m(S_j)$ is similar and thus omitted.
In this case, we have

$$m'_i = m_i,\quad m'_j = m_j;\qquad n'_i = n_i - C_i,\quad n'_j = n_j + C_i.$$

Hence, by the mean value theorem,

$$\bigl(T(m_i, n_i) + T(m_j, n_j)\bigr) - \bigl(T(m'_i, n'_i) + T(m'_j, n'_j)\bigr) = \bigl(T(m_i, n_i) - T(m_i, n_i - C_i)\bigr) + \bigl(T(m_j, n_j) - T(m_j, n_j + C_i)\bigr)$$

$$= C_i\,\frac{\partial T}{\partial n}(m_i,\, n_i - \delta_1 C_i) - C_i\,\frac{\partial T}{\partial n}(m_j,\, n_j + \delta_2 C_i) \ge 0 \qquad (0 \le \delta_1 \le 1,\ 0 \le \delta_2 \le 1)$$

(by Property 4 in Section 2, and $m_i < m_j$). Thus the new partition is better than the original one, which conflicts with the assumption. Therefore, in an optimal partition, keywords of the same length are in one subset.

The above two properties imply that keywords of the same length work as a block, that is, they are not separated in an optimal partition. Moreover, a subset $S_i$ of an optimal partition contains all the blocks whose length lies in the interval $[m(S_i), M(S_i)]$.
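Because an optimal partition can be taken to consist of contiguous runs of length blocks, the search reduces to choosing cut points between blocks, which is the shortest-path formulation developed in Section 3.2. The following Python sketch illustrates this under stated assumptions: `optimal_partition` and its `cost(min_len, count)` argument are hypothetical names, and `cost` is a stand-in for $T(m, n)$, whereas the paper estimates edge weights by measuring the matching time on a representative text. Since the partition graph is acyclic and its nodes are ordered by length, the shortest path is computed here by a simple dynamic program rather than a general shortest-path routine.

```python
from collections import Counter
from typing import Callable, Sequence


def optimal_partition(keywords: Sequence[str],
                      cost: Callable[[int, int], float]):
    """Return (total_cost, parts); each part is a list of keyword lengths.

    cost(min_len, count) is an assumed model of the matching time T(m, n).
    """
    counts = Counter(len(k) for k in keywords)
    lengths = sorted(counts)              # one block per distinct length
    n = len(lengths)
    # prefix[i] = number of keywords in the first i blocks
    prefix = [0]
    for length in lengths:
        prefix.append(prefix[-1] + counts[length])
    # best[j] = cheapest way to cover blocks 0..j-1; node j is a cut point
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            # edge (i, j): one subset holding blocks i..j-1
            w = cost(lengths[i], prefix[j] - prefix[i])
            if best[i] + w < best[j]:
                best[j] = best[i] + w
                back[j] = i
    # Walk the back-pointers to recover the subsets along the shortest path.
    parts = []
    j = n
    while j > 0:
        i = back[j]
        parts.append(lengths[i:j])
        j = i
    parts.reverse()
    return best[n], parts


def toy_cost(min_len: int, count: int) -> float:
    """Toy model consistent with Properties 2 to 4; not a measured time."""
    return (1.0 + 0.1 * count) / min_len
```

With two keywords of length 2 and one hundred of length 8, `optimal_partition(keywords, toy_cost)` separates the two short keywords from the rest, because under the toy model the cost of running the large block with window length 2 outweighs the fixed cost of a second pass. The number of edges is quadratic in the number of distinct lengths, which is small in practice even when the keyword set is large.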
3.2 Algorithm to Find the Optimal Partition
In this section, we model the optimal partition problem as a shortest-path problem in a weighted graph. Given a sorted keyword set $P = \{p_1, p_2, \dots, p_n\}$, we create a partition graph $G$ as follows. For each block of length $i$ in $P$, a node $N_i$ is created to represent it, and an auxiliary node $N_{M(P)+1}$ is created to represent the end of $P$. Let $V = \{N_{m(P)}, N_{m(P)+1}, \dots, N_{M(P)}, N_{M(P)+1}\}$. The edges of $G$ are specified as follows. For $N_i, N_j \in V$, there is an edge from $N_i$ to $N_j$, denoted $(N_i, N_j)$, if $i < j$. Indeed, an edge $(N_i, N_j)$ represents a subset containing the blocks with length greater than or equal to $i$ but less than $j$. Each edge $(N_i, N_j)$ is assigned a weight $W(N_i, N_j)$ that measures the cost of making the corresponding blocks one subset; the matching time of a representative text on that subset is used as an estimate of $W(N_i, N_j)$. Therefore, the optimal partition corresponds to the shortest path in the partition graph $G$.

An example is shown in the figure. The shortest path in the graph is 2 → 6 → 8; hence the optimal partition has two subsets, one containing the keywords of length 2, 3, 4, 5 and the other the keywords of length 6, 7.

The algorithm to find the optimal partition is given as follows:

Algorithm to Find the Optimal Partition
Input: a sorted keyword set $P$, a representative text $T$;
Output: the optimal partition of $P$;
1. Construct the partition graph $G = \langle V, E \rangle$, where $V = \{N_{m(P)}, N_{m(P)+1}, \dots, N_{M(P)}, N_{M(P)+1}\}$, $E = \{(N_i, N_j) \mid N_i, N_j \in V,\ i < j\}$, and $W(N_i, N_j)$ is set as above;
2. Find the shortest path $(e_{i_1}, e_{i_2}, \dots, e_{i_k})$ from $N_{m(P)}$ to $N_{M(P)+1}$, where each $e_{i_j}$ is an edge in $G$;
3. For each $e_{i_j}$, output a subset containing the corresponding blocks.

4 Experimental Result
4.1 Results on Random Data Set
In the test on random data, the size of the alphabet is 32. A program was written to build the patterns randomly.
We built two groups of patterns, with 5000 patterns and lengths ranging from 4 bytes to 40 bytes. The random patterns are evenly distributed over the lengths. The search text was also built randomly; its size is about 200 MB.
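The setup above can be reproduced along the following lines. This is only a sketch under assumptions of ours: the paper does not give its generator, so the function name `make_random_dataset`, the choice of byte values 0 to 31 as the 32-symbol alphabet, the round-robin assignment of lengths and the fixed seed are all hypothetical. The default text length is also kept far below the roughly 200 MB used in the paper so the example runs quickly.

```python
import random


def make_random_dataset(num_patterns: int = 5000, min_len: int = 4,
                        max_len: int = 40, alphabet_size: int = 32,
                        text_len: int = 1_000_000, seed: int = 0):
    """Build random byte patterns and a random byte text (illustrative)."""
    rng = random.Random(seed)
    alphabet = bytes(range(alphabet_size))   # assumed symbols: bytes 0..31
    lengths = list(range(min_len, max_len + 1))
    patterns = []
    for i in range(num_patterns):
        # Cycle through the lengths so each length gets an (almost)
        # equal share of the patterns.
        length = lengths[i % len(lengths)]
        patterns.append(bytes(rng.choices(alphabet, k=length)))
    text = bytes(rng.choices(alphabet, k=text_len))
    return patterns, text
```

Calling `make_random_dataset()` yields 5000 patterns spread over the 37 lengths from 4 to 40, so each length receives 135 or 136 patterns.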
It can be seen from the results above that, among the three classical algorithms, Wu-Manber is much faster than the other two. Yet when the number of patterns increases rapidly, the speed of Wu-Manber decreases more quickly than that of the other two algorithms. In both groups of experiments, COM (the partition-based algorithm proposed here) is the fastest. In the COM algorithm, the positions of the partition and the basic algorithms chosen differ between the test pattern sets. Overall, the speed of the COM algorithm is on average 13 times that of the classical algorithms. The larger the number of patterns, the larger the speed-up.

4.2 Results on Real Data Set
The best test is to use real patterns from real systems. In this test we use two groups of data: one is extracted from Snort, the other from the signatures of ClamAV. Snort is an open source IDS, and its latest version is available for download. We extracted 2086 patterns whose lengths are larger than 1 byte. ClamAntiVirus is an open source anti-virus system. Its virus database is updated every day and the latest version is available for download. We used version 0.83 and extracted the patterns that contain no wildcards.

The training text comes from MIT: a collection of real network data that is used to evaluate the capability of IDSs. We use the file mit 1999 training week1 Friday inside.dat. We cut the file down from 64 MB to 16 MB for quick training. The matched text is mit 1999 training week1 Friday inside.dat, about 64 MB.

In the following table, the left part shows the distribution of pattern lengths and the right part shows the test results.

It can be seen from the results above that in COM the optimal partition of the Snort patterns is a single group, matched with the Wu-Manber algorithm. This means that COM is not necessarily better than the classical algorithms on some special pattern sets. On the other hand, it also means that COM is never slower than the classical algorithms: its speed is at least equal to that of the fastest classical algorithm, and generally higher. The superiority of COM is distinct on the ClamAV patterns. When the length range of the patterns is large and the length distribution is very asymmetrical, using the partition strategy increases the matching speed. The gain in matching speed becomes more obvious as the number of patterns increases.
5 Conclusions
Conclusion is here.

Acknowledgment: The author expresses his deep appreciation to his advisors for their help on the subject of this paper.

References
[1] G. Navarro, M. Raffinot, Flexible Pattern Matching in Strings: Practical On-line Search Algorithms for Texts and Biological Sequences, Cambridge University Press, 2002, pp. 74-76.
[2] D. E. Knuth, J. H. Morris, V. R. Pratt, Fast pattern matching in strings, SIAM Journal on Computing, 1977.
[3] A. V. Aho, M. J. Corasick, Efficient string matching: an aid to bibliographic search, Communications of the ACM, 18(6), 1975.
[4] S. Wu, U. Manber, Fast text searching allowing errors, Communications of the ACM, 35(10): 83-91, 1992.
[5] R. S. Boyer, J. S. Moore, A fast string searching algorithm, Communications of the ACM, 20(10), 1977.
[6] B. Commentz-Walter, A string matching algorithm fast on the average, In Proceedings of the 6th International Colloquium on Automata, Languages and Programming, number 71 in Lecture Notes in Computer Science, 1979.
[7] S. Wu, U. Manber, A fast algorithm for multi-pattern searching, Report TR-94-17, Department of Computer Science, University of Arizona, Tucson, AZ, 1994.
[8] M. Crochemore, A. Czumaj, L. Gasieniec, S. Jarominek, T. Lecroq, W. Plandowski, W. Rytter, Speeding up two string matching algorithms, Algorithmica, 12(4/5), 1994.
[9] C. Allauzen, M. Crochemore, M. Raffinot, Efficient experimental string matching by weak factor recognition, In Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, number 2089 in Lecture Notes in Computer Science, Springer-Verlag, 2001.
[10] A. Blumer, J. Blumer, A. Ehrenfeucht, D. Haussler, R. McConnell, Complete inverted files for efficient text retrieval and analysis, Journal of the ACM, 34(3), 1987.
[11] C. Allauzen, M. Raffinot, Factor oracle of a set of words, Technical report 99-11, Institut Gaspard-Monge, Université de Marne-la-Vallée, 1999.
[12] Xiaodong Wang, The Design and Analysis of Computer Algorithms, Publishing House of Electronic Industry, Beijing.
More informationNotes: Chapter 2 Section 2.2: Proof by Induction
Notes: Chapter 2 Section 2.2: Proof by Induction Basic Induction. To prove: n, a W, n a, S n. (1) Prove the base case  S a. (2) Let k a and prove that S k S k+1 Example 1. n N, n i = n(n+1) 2. Example
More informationSingle machine parallel batch scheduling with unbounded capacity
Workshop on Combinatorics and Graph Theory 21th, April, 2006 Nankai University Single machine parallel batch scheduling with unbounded capacity Yuan Jinjiang Department of mathematics, Zhengzhou University
More informationLongest Common Extensions via Fingerprinting
Longest Common Extensions via Fingerprinting Philip Bille, Inge Li Gørtz, and Jesper Kristensen Technical University of Denmark, DTU Informatics, Copenhagen, Denmark Abstract. The longest common extension
More informationTopic: Greedy Approximations: Set Cover and Min Makespan Date: 1/30/06
CS880: Approximations Algorithms Scribe: Matt Elder Lecturer: Shuchi Chawla Topic: Greedy Approximations: Set Cover and Min Makespan Date: 1/30/06 3.1 Set Cover The Set Cover problem is: Given a set of
More informationSHORT CYCLE COVERS OF GRAPHS WITH MINIMUM DEGREE THREE
SHOT YLE OVES OF PHS WITH MINIMUM DEEE THEE TOMÁŠ KISE, DNIEL KÁL, END LIDIKÝ, PVEL NEJEDLÝ OET ŠÁML, ND bstract. The Shortest ycle over onjecture of lon and Tarsi asserts that the edges of every bridgeless
More informationLoad Balancing and Switch Scheduling
EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load
More informationTriangle deletion. Ernie Croot. February 3, 2010
Triangle deletion Ernie Croot February 3, 2010 1 Introduction The purpose of this note is to give an intuitive outline of the triangle deletion theorem of Ruzsa and Szemerédi, which says that if G = (V,
More informationAnalysis of Server Provisioning for Distributed Interactive Applications
Analysis of Server Provisioning for Distributed Interactive Applications Hanying Zheng and Xueyan Tang Abstract Increasing geographical spreads of modern distributed interactive applications DIAs make
More informationPrivate Approximation of Clustering and Vertex Cover
Private Approximation of Clustering and Vertex Cover Amos Beimel, Renen Hallak, and Kobbi Nissim Department of Computer Science, BenGurion University of the Negev Abstract. Private approximation of search
More informationReading 13 : Finite State Automata and Regular Expressions
CS/Math 24: Introduction to Discrete Mathematics Fall 25 Reading 3 : Finite State Automata and Regular Expressions Instructors: Beck Hasti, Gautam Prakriya In this reading we study a mathematical model
More informationSingleLink Failure Detection in AllOptical Networks Using Monitoring Cycles and Paths
SingleLink Failure Detection in AllOptical Networks Using Monitoring Cycles and Paths Satyajeet S. Ahuja, Srinivasan Ramasubramanian, and Marwan Krunz Department of ECE, University of Arizona, Tucson,
More informationWeb Mining and Searching
Lecture s Outline Web and Searching Approaches to Efficient Automated Knowledge Discovery from Semistructured Web sources Brief Introduction Web Content CASE STUDY: Extracting Patterns & Relations Books
More informationRegular Expressions with Nested Levels of Back Referencing Form a Hierarchy
Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy Kim S. Larsen Odense University Abstract For many years, regular expressions with back referencing have been used in a variety
More informationTheory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras
Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture No. # 31 Recursive Sets, Recursively Innumerable Sets, Encoding
More informationBreaking Generalized DiffieHellman Modulo a Composite is no Easier than Factoring
Breaking Generalized DiffieHellman Modulo a Composite is no Easier than Factoring Eli Biham Dan Boneh Omer Reingold Abstract The DiffieHellman keyexchange protocol may naturally be extended to k > 2
More informationThe Orthogonal Art Gallery Theorem with Constrained Guards
The Orthogonal Art Gallery Theorem with Constrained Guards T. S. Michael 1 Mathematics Department United States Naval Academy Annapolis, MD, U.S.A. Val Pinciu 2 Department of Mathematics Southern Connecticut
More informationUNIVERSIDADE DE SÃO PAULO
UNIVERSIDADE DE SÃO PAULO Instituto de Ciências Matemáticas e de Computação ISSN 01032569 Comments on On minimizing the lengths of checking sequences Adenilso da Silva Simão N ō 307 RELATÓRIOS TÉCNICOS
More informationDistributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
More informationCSC2420 Fall 2012: Algorithm Design, Analysis and Theory
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms
More informationCompletion Time Scheduling and the WSRPT Algorithm
Completion Time Scheduling and the WSRPT Algorithm Bo Xiong, Christine Chung Department of Computer Science, Connecticut College, New London, CT {bxiong,cchung}@conncoll.edu Abstract. We consider the online
More informationClassification  Examples
Lecture 2 Scheduling 1 Classification  Examples 1 r j C max given: n jobs with processing times p 1,...,p n and release dates r 1,...,r n jobs have to be scheduled without preemption on one machine taking
More information1 Introduction. Dr. T. Srinivas Department of Mathematics Kakatiya University Warangal 506009, AP, INDIA tsrinivasku@gmail.com
A New Allgoriitthm for Miiniimum Costt Liinkiing M. Sreenivas Alluri Institute of Management Sciences Hanamkonda 506001, AP, INDIA allurimaster@gmail.com Dr. T. Srinivas Department of Mathematics Kakatiya
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NPCompleteness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationLecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching
COSC 348: Computing for Bioinformatics Definitions A pattern (keyword) is an ordered sequence of symbols. Lecture 4: Exact string searching algorithms Lubica Benuskova http://www.cs.otago.ac.nz/cosc348/
More informationInternational Journal of Information Technology, Modeling and Computing (IJITMC) Vol.1, No.3,August 2013
FACTORING CRYPTOSYSTEM MODULI WHEN THE COFACTORS DIFFERENCE IS BOUNDED Omar Akchiche 1 and Omar Khadir 2 1,2 Laboratory of Mathematics, Cryptography and Mechanics, Fstm, University of Hassan II MohammediaCasablanca,
More informationOn the independence number of graphs with maximum degree 3
On the independence number of graphs with maximum degree 3 Iyad A. Kanj Fenghui Zhang Abstract Let G be an undirected graph with maximum degree at most 3 such that G does not contain any of the three graphs
More information17.6.1 Introduction to Auction Design
CS787: Advanced Algorithms Topic: Sponsored Search Auction Design Presenter(s): Nilay, Srikrishna, Taedong 17.6.1 Introduction to Auction Design The Internet, which started of as a research project in
More informationprinceton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora
princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora Scribe: One of the running themes in this course is the notion of
More informationNotes on Complexity Theory Last updated: August, 2011. Lecture 1
Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection  Social networks 
More informationGood luck, veel succes!
Final exam Advanced Linear Programming, May 7, 13.0016.00 Switch off your mobile phone, PDA and any other mobile device and put it far away. No books or other reading materials are allowed. This exam
More informationMinimizing the Number of Machines in a UnitTime Scheduling Problem
Minimizing the Number of Machines in a UnitTime Scheduling Problem Svetlana A. Kravchenko 1 United Institute of Informatics Problems, Surganova St. 6, 220012 Minsk, Belarus kravch@newman.basnet.by Frank
More information2.1 Complexity Classes
15859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look
More informationOHJ2306 Introduction to Theoretical Computer Science, Fall 2012 8.11.2012
276 The P vs. NP problem is a major unsolved problem in computer science It is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute to carry a $ 1,000,000 prize for the
More informationBreaking An IdentityBased Encryption Scheme based on DHIES
Breaking An IdentityBased Encryption Scheme based on DHIES Martin R. Albrecht 1 Kenneth G. Paterson 2 1 SALSA Project  INRIA, UPMC, Univ Paris 06 2 Information Security Group, Royal Holloway, University
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 4, 2011 Raquel Urtasun and Tamir Hazan (TTIC) Graphical Models April 4, 2011 1 / 22 Bayesian Networks and independences
More informationCycles and cliqueminors in expanders
Cycles and cliqueminors in expanders Benny Sudakov UCLA and Princeton University Expanders Definition: The vertex boundary of a subset X of a graph G: X = { all vertices in G\X with at least one neighbor
More informationRadix Sort. The time complexity is O(n + σ). Counting sort is a stable sorting algorithm, i.e., the relative order of equal elements stays the same.
Radix Sort The Ω(n log n) sorting lower bound does not apply to algorithms that use stronger operations than comparisons. A basic example is counting sort for sorting integers. Algorithm 3.10: CountingSort(R)
More informationHandout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, 2010. Chapter 7: Digraphs
MCS236: Graph Theory Handout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, 2010 Chapter 7: Digraphs Strong Digraphs Definitions. A digraph is an ordered pair (V, E), where V is the set
More informationAn approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student startups
An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student startups Abstract Yan Shen 1, Bao Wu 2* 3 1 Hangzhou Normal University,
More informationAnalysis of Algorithms, I
Analysis of Algorithms, I CSOR W4231.002 Eleni Drinea Computer Science Department Columbia University Thursday, February 26, 2015 Outline 1 Recap 2 Representing graphs 3 Breadthfirst search (BFS) 4 Applications
More information14.1 Rentorbuy problem
CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms
More informationInstructor: Bobby Kleinberg Lecture Notes, 5 May The MillerRabin Randomized Primality Test
Introduction to Algorithms (CS 482) Cornell University Instructor: Bobby Kleinberg Lecture Notes, 5 May 2010 The MillerRabin Randomized Primality Test 1 Introduction Primality testing is an important
More information2.3 Convex Constrained Optimization Problems
42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions
More information