An efficient matching algorithm for encoded DNA sequences and binary strings
|
|
|
- Justin Robbins
- 10 years ago
- Views:
Transcription
1 An efficient matching algorithm for encoded DNA sequences and binary strings Simone Faro and Thierry Lecroq Dipartimento di Matematica e Informatica, Università di Catania, Italy University of Rouen, LITIS EA 4108, Mont-Saint-Aignan Cedex, France Combinatorial Pattern Matching June 2009 Lille, France
2 Outline 1 Introduction 2 A new algorithm 3 Experimental Results Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 2 / 38
3 Outline 1 Introduction 2 A new algorithm 3 Experimental Results Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 3 / 38
4 Problem Searching for all exact occurrences of a pattern p ( p = m) in a text t ( t = n) where both p and t are bitstreams Example p = and t = Requirement Avoid the access to individual bits access to blocks of k bits Special cases Each character of p and t consists of a single bit binary sequences a couple of bits encoded DNA sequences Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 4 / 38
5 Problem Searching for all exact occurrences of a pattern p ( p = m) in a text t ( t = n) where both p and t are bitstreams Example p = and t = Requirement Avoid the access to individual bits access to blocks of k bits Special cases Each character of p and t consists of a single bit binary sequences a couple of bits encoded DNA sequences Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 4 / 38
6 Existing solutions S. T. Klein and M. K. Ben-Nissan Accelerating Boyer Moore searches on binary texts CIAA, LNCS 4783, pp , 2007 J. W. Kim, E. Kim, and K. Park Fast matching method for DNA sequences Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, LNCS 4614, pp , 2007 S. Faro and T. Lecroq Efficient pattern matching on binary strings SOFSEM, poster, 2009 Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 5 / 38
7 Outline 1 Introduction 2 A new algorithm 3 Experimental Results Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 6 / 38
8 Preprocessing The algorithm computes 1 a table of k copies of p, in order to process text and pattern block by block (as in [Klein & Ben-Nissan 2007]) 2 bit-mask vectors to implement a multi-pattern version of the BNDM algorithm 3 an index-list table to identify candidate alignments during the searching phase 4 a shift table based on the bad-character heuristic to increase the length of the shifts Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 7 / 38
9 Byte We suppose that the block size k is fixed All references to both text and pattern will only be to entire blocks of k bits We refer to a k-bit block as a byte though larger values than k = 8 could be supported T [i] and P [i] denote, respectively, the (i + 1)-th byte of the text and of the pattern The last byte may be only partially defined. We suppose that the undefined bits of the last byte are set to 0. Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 8 / 38
10 k copies of p We define k copies, denoted by Patt[i] of the pattern p shifted by i position to the right, for 0 i < k i P = {0, 1,..., k 1} In each pattern P att[i], the i leftmost bits of the first byte remain undefined and are set to 0 Similarly the rightmost ((k ((m + i) mod k) last byte are set to 0 mod k) bits of the Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM 09 9 / 38
11 Example p = of length 27 Patt i Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
12 Additional information to the k copies b i : the index of the first byte in P att[i] containing a k-substring of p e i : the index of the last byte of the pattern P att[i]. m i : the number of bytes in P att[i] containing k-substrings of p F 1[i] : bit mask for the first byte of Patt[i] F 2[i] : bit mask for the last byte of Patt[i] Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
13 Example p = of length 27 Patt i b i e i m i F F Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
14 Example p = of length 27 Patt i Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
15 Example p = of length 27 Patt i Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
16 Bit-parallelism The algorithm uses bit-parallelism to simulate the behavior of a NFA constructed over the set of patterns P att[i] However, in order to let the automaton fit in a single machine word of size ω, only the substrings P att[i][b i.. b i + m 1] are handled by the automaton m = min({m i } {ω}) P=set of remaining k patterns of length m Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
17 Bit-parallelism m + 1 different states: Q = {0, 1, 2, 3,..., m} m different transitions: state q, with 0 < q m, has a transition towards state q 1 labeled with the class of characters {P att[i][s i + q]} m is the initial state Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
18 p = of length 27 Patt i =L =A =H =F =K =C =G =I =J =E =F =B =C =H =I =D ω = 32 M =A =B =C =D =E =F =G =H =I =J =K =L c {A, B, C, D, E, F, G, H, I, J, K, L} Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
19 Index list The NFA recognizes also words that are not substrings of the pattern However, in order to make a filter the algorithm maintains, for each block B {0,..., 2 k 1}, a linked list λ which is used to find candidate patterns In particular, for each block B {0,..., 2 k 1}: λ[b] = {i Patt[i, b i + m 1] = B} When a block sequence is recognized by the automaton, ending at block position j of the text, the algorithm naively checks for the occurrence of any pattern Patt[g], with g λ[t [j]] Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
20 Example p = of length 27 Patt i =L =A =H =F =K =C =G =I =J =E =F =B =C =H =I =D λ =A {0} =B {5} =C {2} =D {7} =E {4} =F {1} =H {6} =I {3} c {A, B, C, D, E, F, H, I} Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
21 Shift table text patterns Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
22 Shift table text patterns Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
23 Shift table text patterns Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
24 Shift table The algorithm uses a long shift rule which is a multi-pattern version of the original bad-character shift heuristic, improved with an efficient look-ahead This shift rule is used when no substring is recognized while scanning the text from right to left In such a case the current window of the text can be safely advanced by m positions to the right Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
25 Shift table Observe that, if j is the right position of the current window of the text, the block at position j + m is always involved in the next attempt, thus we can use it to compute the next window position ls : {0,..., 2 k 1} {m,..., 2 m 1} ls[b] = m 1 + min{distance of B from the right end of a pattern in P} the next right position of the window is j + ls[t [j + m 1]] Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
26 p = of length 27 Patt i =L =A =H =F =K =C =G =I =J =E =F =B =C =H =I =D ls =A =B =C =D =E =F =G =H =I =J =K =L c {A, B, C, D, E, F, G, H, I, J, K, L} Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
27 The searching phase 2 parts: a match phase a shift phase Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
28 The shift phase The NFA is represented by a state vector D of size m (the first state of the automaton is not represented) Like in the SBndm algorithm [Holub & Durian 2005], each iteration starts with a test of 2 consecutive text characters and implements a fast-loop to obtain better results on average Such a fast loop makes use of the long shift table to compute the next window alignment Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
29 The match phase Right to left scan in a window of size m, ending at position j in the text, as in the BNDM algorithm The state vector is updated in a similar fashion as in the Shift-And algorithm [BYG92] If the state vector D is equal to 0 after l + 1 updates of D, then a word of length l has been recognized by the automaton If l = m a candidate alignment has been found and the algorithm naively checks the occurrence of any pattern contained in the index list λ[t [j]] In all cases, after the match phase, the index j is advanced of m l + 1 to the right and the algorithm restarts its computation with the shift phase Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
30 The match phase If a candidate alignment is found for a text position j, for each index i λ[t [j]], the algorithm uses the precomputed table P att[i] to check whether s = j m b i + 1 is a valid shift Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
31 Example p = A= B= C= D= E= F = G= H= I= J= K= L= Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
32 Example p = A= B= C= D= E= F = G= H= I= J= K= L= j Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
33 Example p = A= B= C= D= E= F = G= H= I= J= K= L= j D = Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
34 Example p = A= B= C= D= E= F = G= H= I= J= K= L= j j {A, B, C, D, E, F, G, H, I, J, K, L} ls = 3 Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
35 Example p = A= B= C= D= E= F = G= H= I= J= K= L= j D = Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
36 Example p = A= B= C= D= E= F = G= H= I= J= K= L= j j D = 0 ls[c] = 1 Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
37 Example p = A= B= C= D= E= F = G= H= I= J= K= L= D 0 j Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
38 Example p = A= B= C= D= E= F = G= H= I= J= K= L= F 1[2] j K C D 0 λ(c) = F 2[2] Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
39 Complexity time space Patt, b i, e i, m i, F 1, F 2 O(k m/k ) = O(m) O(m) Bit masks O(2 k + km) O(2 k ) Index list O(2 k + k) O(2 k ) Shift table O(2 k + m) O(2 k ) Searching phase O( n/k m/k k) = O(n m) Overall O(2 k + (n + k)m) O(2 k + m) Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
40 Handling encoded DNA sequences Each base is represented by a couple of bits Thus a DNA sequence γ can be represented with a bitstream of (2 γ ) bits Any occurrence of a given encoded pattern p starts at an even position of the text This suggests that only even alignments of the pattern have to be processed The only change to be applied, when handling encoded DNA sequences, is in the preprocessing of the set of patterns Specifically the set P is defined by P = {i 0 i < k and (i mod 2) = 0} For instance, if each block consists of k = 8 bits, we have P = {0, 2, 4, 6} Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
41 Outline 1 Introduction 2 A new algorithm 3 Experimental Results Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
42 Experimental Results BBM, Klein and Ben-Nissan, 2007 FED, Kim, Kim and Park, 2007 BHM, Faro and Lecroq, SOFSEM 2009 BSKS, Faro and Lecroq, SOFSEM 2009 BFL, Faro and Lecroq, CPM 2009 All algorithms have been implemented in the C programming language and were used to search for the same binary strings in large fixed text buffers on a PC with Intel Core2 processor of 1.66GHz. Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
43 Experimental results for a Rand(0/1) 50 problem BHM BSKS FED BFL 20 time pattern length Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
44 Experimental results for a Rand(0/1) 70 problem BHM BSKS FED BFL time pattern length Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
45 Experimental results for an encoded DNA sequence BHM BSKS FED BFL time pattern length Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
46 Conclusion and perspectives algorithm for exact matching on binary strings and encoded DNA sequences combines a multi-pattern version of the BNDM algorithm with a simplified shift strategy of CW algorithm efficient in practical cases adapts easily to the exact multiple string matching case Faro and Lecroq (Catania and Rouen) Matching encoded sequences CPM / 38
A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms
A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms Simone Faro and Thierry Lecroq Università di Catania, Viale A.Doria n.6, 95125 Catania, Italy Université de Rouen, LITIS EA 4108,
Improved Single and Multiple Approximate String Matching
Improved Single and Multiple Approximate String Matching Kimmo Fredriksson Department of Computer Science, University of Joensuu, Finland Gonzalo Navarro Department of Computer Science, University of Chile
A FAST STRING MATCHING ALGORITHM
Ravendra Singh et al, Int. J. Comp. Tech. Appl., Vol 2 (6),877-883 A FAST STRING MATCHING ALGORITHM H N Verma, 2 Ravendra Singh Department of CSE, Sachdeva Institute of Technology, Mathura, India, [email protected]
A The Exact Online String Matching Problem: a Review of the Most Recent Results
A The Exact Online String Matching Problem: a Review of the Most Recent Results SIMONE FARO, Università di Catania THIERRY LECROQ, Université de Rouen This article addresses the online exact string matching
Fast string matching
Fast string matching This exposition is based on earlier versions of this lecture and the following sources, which are all recommended reading: Shift-And/Shift-Or 1. Flexible Pattern Matching in Strings,
Improved Single and Multiple Approximate String Matching
Improved Single and Multiple Approximate String Matching Kimmo Fredrisson and Gonzalo Navarro 2 Department of Computer Science, University of Joensuu [email protected] 2 Department of Computer Science,
A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW)
Journal of Computer Science 4 (5): 393-401, 2008 ISSN 1549-3636 2008 Science Publications A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW) Amjad Hudaib, Rola Al-Khalid, Dima Suleiman, Mariam
Robust Quick String Matching Algorithm for Network Security
18 IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.7B, July 26 Robust Quick String Matching Algorithm for Network Security Jianming Yu, 1,2 and Yibo Xue, 2,3 1 Department
The Goldberg Rao Algorithm for the Maximum Flow Problem
The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }
Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix-2)
Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 683-690 Research India Publications http://www.ripublication.com/aeee.htm Implementation of Modified Booth
Notes on Complexity Theory Last updated: August, 2011. Lecture 1
Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look
A PPENDIX G S IMPLIFIED DES
A PPENDIX G S IMPLIFIED DES William Stallings opyright 2010 G.1 OVERVIEW...2! G.2 S-DES KEY GENERATION...3! G.3 S-DES ENRYPTION...4! Initial and Final Permutations...4! The Function f K...5! The Switch
Lecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching
COSC 348: Computing for Bioinformatics Definitions A pattern (keyword) is an ordered sequence of symbols. Lecture 4: Exact string searching algorithms Lubica Benuskova http://www.cs.otago.ac.nz/cosc348/
Arithmetic Coding: Introduction
Data Compression Arithmetic coding Arithmetic Coding: Introduction Allows using fractional parts of bits!! Used in PPM, JPEG/MPEG (as option), Bzip More time costly than Huffman, but integer implementation
Automata and Computability. Solutions to Exercises
Automata and Computability Solutions to Exercises Fall 25 Alexis Maciel Department of Computer Science Clarkson University Copyright c 25 Alexis Maciel ii Contents Preface vii Introduction 2 Finite Automata
Efficient Prevention of Credit Card Leakage from Enterprise Networks
Efficient Prevention of Credit Card Leakage from Enterprise Networks Matthew Hall 1, Reinoud Koornstra 2, and Miranda Mowbray 3 1 No Institutional Affiliation, mhall @ mhcomputing.net 2 HP Networking,
Memory Systems. Static Random Access Memory (SRAM) Cell
Memory Systems This chapter begins the discussion of memory systems from the implementation of a single bit. The architecture of memory chips is then constructed using arrays of bit implementations coupled
A Dynamic Programming Approach for Generating N-ary Reflected Gray Code List
A Dynamic Programming Approach for Generating N-ary Reflected Gray Code List Mehmet Kurt 1, Can Atilgan 2, Murat Ersen Berberler 3 1 Izmir University, Department of Mathematics and Computer Science, Izmir
The enhancement of the operating speed of the algorithm of adaptive compression of binary bitmap images
The enhancement of the operating speed of the algorithm of adaptive compression of binary bitmap images Borusyak A.V. Research Institute of Applied Mathematics and Cybernetics Lobachevsky Nizhni Novgorod
Improved Approach for Exact Pattern Matching
www.ijcsi.org 59 Improved Approach for Exact Pattern Matching (Bidirectional Exact Pattern Matching) Iftikhar Hussain 1, Samina Kausar 2, Liaqat Hussain 3 and Muhammad Asif Khan 4 1 Faculty of Administrative
Chapter 4: Computer Codes
Slide 1/30 Learning Objectives In this chapter you will learn about: Computer data Computer codes: representation of data in binary Most commonly used computer codes Collating sequence 36 Slide 2/30 Data
New Techniques for Regular Expression Searching
New Techniques for Regular Expression Searching Gonzalo Navarro Mathieu Raffinot Abstract We present two new techniques for regular expression searching and use them to derive faster practical algorithms.
String Search. Brute force Rabin-Karp Knuth-Morris-Pratt Right-Left scan
String Search Brute force Rabin-Karp Knuth-Morris-Pratt Right-Left scan String Searching Text with N characters Pattern with M characters match existence: any occurence of pattern in text? enumerate: how
Reconfigurable FPGA Inter-Connect For Optimized High Speed DNA Sequencing
Reconfigurable FPGA Inter-Connect For Optimized High Speed DNA Sequencing 1 A.Nandhini, 2 C.Ramalingam, 3 N.Maheswari, 4 N.Krishnakumar, 5 A.Surendar 1,2,3,4 UG Students, 5 Assistant Professor K.S.R College
MACHINE INSTRUCTIONS AND PROGRAMS
CHAPTER 2 MACHINE INSTRUCTIONS AND PROGRAMS CHAPTER OBJECTIVES In this chapter you will learn about: Machine instructions and program execution, including branching and subroutine call and return operations
International Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 7, July 23 ISSN: 2277 28X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Greedy Algorithm:
Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm
Journal of Al-Nahrain University Vol.15 (2), June, 2012, pp.161-168 Science Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Manal F. Younis Computer Department, College
Section 1.4 Place Value Systems of Numeration in Other Bases
Section.4 Place Value Systems of Numeration in Other Bases Other Bases The Hindu-Arabic system that is used in most of the world today is a positional value system with a base of ten. The simplest reason
Learning from Data: Naive Bayes
Semester 1 http://www.anc.ed.ac.uk/ amos/lfd/ Naive Bayes Typical example: Bayesian Spam Filter. Naive means naive. Bayesian methods can be much more sophisticated. Basic assumption: conditional independence.
Web Data Extraction: 1 o Semestre 2007/2008
Web Data : Given Slides baseados nos slides oficiais do livro Web Data Mining c Bing Liu, Springer, December, 2006. Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008
Rethinking SIMD Vectorization for In-Memory Databases
SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest
Compression algorithm for Bayesian network modeling of binary systems
Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the
FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers
FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers Bogdan Mătăsaru and Tudor Jebelean RISC-Linz, A 4040 Linz, Austria email: [email protected]
Operating Systems. Virtual Memory
Operating Systems Virtual Memory Virtual Memory Topics. Memory Hierarchy. Why Virtual Memory. Virtual Memory Issues. Virtual Memory Solutions. Locality of Reference. Virtual Memory with Segmentation. Page
CS101 Lecture 11: Number Systems and Binary Numbers. Aaron Stevens 14 February 2011
CS101 Lecture 11: Number Systems and Binary Numbers Aaron Stevens 14 February 2011 1 2 1 3!!! MATH WARNING!!! TODAY S LECTURE CONTAINS TRACE AMOUNTS OF ARITHMETIC AND ALGEBRA PLEASE BE ADVISED THAT CALCULTORS
Streaming Lossless Data Compression Algorithm (SLDC)
Standard ECMA-321 June 2001 Standardizing Information and Communication Systems Streaming Lossless Data Compression Algorithm (SLDC) Phone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http://www.ecma.ch
A Fast Pattern-Matching Algorithm for Network Intrusion Detection System
A Fast Pattern-Matching Algorithm for Network Intrusion Detection System Jung-Sik Sung 1, Seok-Min Kang 2, Taeck-Geun Kwon 2 1 ETRI, 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-700, Korea [email protected]
DNA Mapping/Alignment. Team: I Thought You GNU? Lars Olsen, Venkata Aditya Kovuri, Nick Merowsky
DNA Mapping/Alignment Team: I Thought You GNU? Lars Olsen, Venkata Aditya Kovuri, Nick Merowsky Overview Summary Research Paper 1 Research Paper 2 Research Paper 3 Current Progress Software Designs to
Formal Languages and Automata Theory - Regular Expressions and Finite Automata -
Formal Languages and Automata Theory - Regular Expressions and Finite Automata - Samarjit Chakraborty Computer Engineering and Networks Laboratory Swiss Federal Institute of Technology (ETH) Zürich March
Finite Automata. Reading: Chapter 2
Finite Automata Reading: Chapter 2 1 Finite Automaton (FA) Informally, a state diagram that comprehensively captures all possible states and transitions that a machine can take while responding to a stream
Extended Finite-State Machine Inference with Parallel Ant Colony Based Algorithms
Extended Finite-State Machine Inference with Parallel Ant Colony Based Algorithms Daniil Chivilikhin PhD student ITMO University Vladimir Ulyantsev PhD student ITMO University Anatoly Shalyto Dr.Sci.,
Randomized Hashing for Digital Signatures
NIST Special Publication 800-106 Randomized Hashing for Digital Signatures Quynh Dang Computer Security Division Information Technology Laboratory C O M P U T E R S E C U R I T Y February 2009 U.S. Department
respectively. Section IX concludes the paper.
1 Fast and Scalable Pattern Matching for Network Intrusion Detection Systems Sarang Dharmapurikar, and John Lockwood, Member IEEE Abstract High-speed packet content inspection and filtering devices rely
The Mixed Binary Euclid Algorithm
Electronic Notes in Discrete Mathematics 35 (009) 169 176 www.elsevier.com/locate/endm The Mixed Binary Euclid Algorithm Sidi Mohamed Sedjelmaci LIPN CNRS UMR 7030 Université Paris-Nord Av. J.-B. Clément,
Regular Expressions and Automata using Haskell
Regular Expressions and Automata using Haskell Simon Thompson Computing Laboratory University of Kent at Canterbury January 2000 Contents 1 Introduction 2 2 Regular Expressions 2 3 Matching regular expressions
A survey on click modeling in web search
A survey on click modeling in web search Lianghao Li Hong Kong University of Science and Technology Outline 1 An overview of web search marketing 2 An overview of click modeling 3 A survey on click models
An Efficient DNA Molecule Clustering using GCC Algorithm
DOI 10.7603/s40601-014-0011-y GSTF Journal on Computing (JoC) Vol.4 No.2, March 2015 An Efficient DNA Molecule Clustering using GCC Algorithm Faisal Alsaby and Kholood Alnowaiser Received 31 Dec 2014 Accepted
Binary Coded Web Access Pattern Tree in Education Domain
Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: [email protected] M. Moorthi
Bitlist: New Full-text Index for Low Space Cost and Efficient Keyword Search
Bitlist: New Full-text Index for Low Space Cost and Efficient Keyword Search Weixiong Rao [1,4] Lei Chen [2] Pan Hui [2,3] Sasu Tarkoma [4] [1] School of Software Engineering [2] Department of Comp. Sci.
BLAST. Anders Gorm Pedersen & Rasmus Wernersson
BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise
Instruction Set Architecture (ISA)
Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine
Computer Science 281 Binary and Hexadecimal Review
Computer Science 281 Binary and Hexadecimal Review 1 The Binary Number System Computers store everything, both instructions and data, by using many, many transistors, each of which can be in one of two
Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016
Master s Degree Course in Computer Engineering Formal Languages FORMAL LANGUAGES AND COMPILERS Lexical analysis Floriano Scioscia 1 Introductive terminological distinction Lexical string or lexeme = meaningful
Private Record Linkage with Bloom Filters
To appear in: Proceedings of Statistics Canada Symposium 2010 Social Statistics: The Interplay among Censuses, Surveys and Administrative Data Private Record Linkage with Bloom Filters Rainer Schnell,
A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER. Figure 1. Basic structure of an encoder.
A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER Manoj Kumar 1 Mohammad Zubair 1 1 IBM T.J. Watson Research Center, Yorktown Hgts, NY, USA ABSTRACT The MPEG/Audio is a standard for both
The LCA Problem Revisited
The LA Problem Revisited Michael A. Bender Martín Farach-olton SUNY Stony Brook Rutgers University May 16, 2000 Abstract We present a very simple algorithm for the Least ommon Ancestor problem. We thus
How To Compare A Markov Algorithm To A Turing Machine
Markov Algorithm CHEN Yuanmi December 18, 2007 1 Abstract Markov Algorithm can be understood as a priority string rewriting system. In this short paper we give the definition of Markov algorithm and also
Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework
Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Sergio De Agostino Computer Science Department Sapienza University of Rome Internet as a Distributed System Modern
A Partition-Based Efficient Algorithm for Large Scale. Multiple-Strings Matching
A Partition-Based Efficient Algorithm for Large Scale Multiple-Strings Matching Ping Liu Jianlong Tan, Yanbing Liu Software Division, Institute of Computing Technology, Chinese Academy of Sciences, Beijing,
ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout
: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout Ziqiang Feng Eric Lo Ben Kao Wenjian Xu Department of Computing, The Hong Kong Polytechnic University Department of Computer
RSA Question 2. Bob thinks that p and q are primes but p isn t. Then, Bob thinks Φ Bob :=(p-1)(q-1) = φ(n). Is this true?
RSA Question 2 Bob thinks that p and q are primes but p isn t. Then, Bob thinks Φ Bob :=(p-1)(q-1) = φ(n). Is this true? Bob chooses a random e (1 < e < Φ Bob ) such that gcd(e,φ Bob )=1. Then, d = e -1
FAST 11. Yongseok Oh <[email protected]> University of Seoul. Mobile Embedded System Laboratory
CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of flash Memory based Solid State Drives FAST 11 Yongseok Oh University of Seoul Mobile Embedded System Laboratory
6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, 2010. Class 4 Nancy Lynch
6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, 2010 Class 4 Nancy Lynch Today Two more models of computation: Nondeterministic Finite Automata (NFAs)
Sheet 7 (Chapter 10)
King Saud University College of Computer and Information Sciences Department of Information Technology CAP240 First semester 1430/1431 Multiple-choice Questions Sheet 7 (Chapter 10) 1. Which error detection
6 3 4 9 = 6 10 + 3 10 + 4 10 + 9 10
Lesson The Binary Number System. Why Binary? The number system that you are familiar with, that you use every day, is the decimal number system, also commonly referred to as the base- system. When you
A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment
A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed
Machine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov [email protected] Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
Computability Theory
CSC 438F/2404F Notes (S. Cook and T. Pitassi) Fall, 2014 Computability Theory This section is partly inspired by the material in A Course in Mathematical Logic by Bell and Machover, Chap 6, sections 1-10.
Shift-based Pattern Matching for Compressed Web Traffic
Shift-based Pattern Matching for Compressed Web Traffic Anat Bremler-Barr Computer Science Dept. Interdisciplinary Center, Herzliya, Israel Email: [email protected] Yaron Koral Blavatnik School of Computer
14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)
Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,
Binary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.
CS54 Turing Machines Turing Machine q 0 AI N P U T IN TAPE read write move read write move Language = {0} q This Turing machine recognizes the language {0} Turing Machines versus DFAs TM can both write
EE282 Computer Architecture and Organization Midterm Exam February 13, 2001. (Total Time = 120 minutes, Total Points = 100)
EE282 Computer Architecture and Organization Midterm Exam February 13, 2001 (Total Time = 120 minutes, Total Points = 100) Name: (please print) Wolfe - Solution In recognition of and in the spirit of the
Theoretical Aspects of Storage Systems Autumn 2009
Theoretical Aspects of Storage Systems Autumn 2009 Chapter 3: Data Deduplication André Brinkmann News Outline Data Deduplication Compare-by-hash strategies Delta-encoding based strategies Measurements
MACM 101 Discrete Mathematics I
MACM 101 Discrete Mathematics I Exercises on Combinatorics, Probability, Languages and Integers. Due: Tuesday, November 2th (at the beginning of the class) Reminder: the work you submit must be your own.
LZ77. Example 2.10: Let T = badadadabaab and assume d max and l max are large. phrase b a d adadab aa b
LZ77 The original LZ77 algorithm works as follows: A phrase T j starting at a position i is encoded as a triple of the form distance, length, symbol. A triple d, l, s means that: T j = T [i...i + l] =
Nino Pellegrino October the 20th, 2015
Learning Behavioral Fingerprints from NetFlows... using Timed Automata Nino Pellegrino October the 20th, 2015 Nino Pellegrino Learning Behavioral Fingerprints October the 20th, 2015 1 / 32 Use case Nino
Email Spam Detection A Machine Learning Approach
Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn
Temporal Difference Learning in the Tetris Game
Temporal Difference Learning in the Tetris Game Hans Pirnay, Slava Arabagi February 6, 2009 1 Introduction Learning to play the game Tetris has been a common challenge on a few past machine learning competitions.
HY345 Operating Systems
HY345 Operating Systems Recitation 2 - Memory Management Solutions Panagiotis Papadopoulos [email protected] Problem 7 Consider the following C program: int X[N]; int step = M; //M is some predefined constant
CPSC 121: Models of Computation Assignment #4, due Wednesday, July 22nd, 2009 at 14:00
CPSC 2: Models of Computation ssignment #4, due Wednesday, July 22nd, 29 at 4: Submission Instructions Type or write your assignment on clean sheets of paper with question numbers prominently labeled.
Factoring & Primality
Factoring & Primality Lecturer: Dimitris Papadopoulos In this lecture we will discuss the problem of integer factorization and primality testing, two problems that have been the focus of a great amount
Lecture 18: Applications of Dynamic Programming Steven Skiena. Department of Computer Science State University of New York Stony Brook, NY 11794 4400
Lecture 18: Applications of Dynamic Programming Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena Problem of the Day
A Catalogue of the Steiner Triple Systems of Order 19
A Catalogue of the Steiner Triple Systems of Order 19 Petteri Kaski 1, Patric R. J. Östergård 2, Olli Pottonen 2, and Lasse Kiviluoto 3 1 Helsinki Institute for Information Technology HIIT University of
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Learning Tetris Using the Noisy Cross-Entropy Method
NOTE Communicated by Andrew Barto Learning Tetris Using the Noisy Cross-Entropy Method István Szita [email protected] András Lo rincz [email protected] Department of Information Systems, Eötvös
what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?
Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the
Lempel-Ziv Coding Adaptive Dictionary Compression Algorithm
Lempel-Ziv Coding Adaptive Dictionary Compression Algorithm 1. LZ77:Sliding Window Lempel-Ziv Algorithm [gzip, pkzip] Encode a string by finding the longest match anywhere within a window of past symbols
