Algorithms. Algorithms 5.3 SUBSTRING SEARCH. introduction brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp ROBERT SEDGEWICK KEVIN WAYNE

Similar documents
String Search. Brute force Rabin-Karp Knuth-Morris-Pratt Right-Left scan

Lecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching

Robust Quick String Matching Algorithm for Network Security

Fast string matching

A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW)

4.2 Sorting and Searching

Knuth-Morris-Pratt Algorithm

C H A P T E R Regular Expressions regular expression

A FAST STRING MATCHING ALGORITHM

A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms

Searching BWT compressed text with the Boyer-Moore algorithm and binary search

Contributing Efforts of Various String Matching Methodologies in Real World Applications

Automata and Computability. Solutions to Exercises

Compiler Construction

Improved Approach for Exact Pattern Matching

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

03 - Lexical Analysis

2ND QUARTER 2006, VOLUME 8, NO. 2

Protecting Websites from Dissociative Identity SQL Injection Attacka Patch for Human Folly

6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, Class 4 Nancy Lynch

The Halting Problem is Undecidable

CAs and Turing Machines. The Basis for Universal Computation

Sources: On the Web: Slides will be available on:

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework

Regular Expressions and Automata using Haskell

Reconfigurable FPGA Inter-Connect For Optimized High Speed DNA Sequencing

Introduction to Programming (in C++) Loops. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

Example. Introduction to Programming (in C++) Loops. The while statement. Write the numbers 1 N. Assume the following specification:

Finite Automata. Reading: Chapter 2

Theory of Computation Chapter 2: Turing Machines

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.

6.080/6.089 GITCS Feb 12, Lecture 3

Composability of Infinite-State Activity Automata*

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

A Partition-Based Efficient Algorithm for Large Scale. Multiple-Strings Matching

An efficient matching algorithm for encoded DNA sequences and binary strings

Basics of Java Programming Input and the Scanner class

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions

Unit 6. Loop statements

CS106A, Stanford Handout #38. Strings and Chars

arxiv: v2 [cs.ds] 15 Oct 2008

A Fast String Searching Algorithm

public static void main(string[] args) { System.out.println("hello, world"); } }

Turing Machines, Part I

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

FRACTAL RECOGNITION AND PATTERN CLASSIFIER BASED SPAM FILTERING IN SERVICE

A The Exact Online String Matching Problem: a Review of the Most Recent Results

Notes on Complexity Theory Last updated: August, Lecture 1

BALANCING FOR DISTRIBUTED BACKUP

In-Place File Carving

New Techniques for Regular Expression Searching

Configurable String Matching Hardware for Speeding up Intrusion Detection. Monther Aldwairi*, Thomas Conte, Paul Franzon

Analysis of Micromouse Maze Solving Algorithms

On line construction of suffix trees 1

Data storage Tree indexes

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

4.5 Symbol Table Applications

Finite Automata. Reading: Chapter 2

CPSC 121: Models of Computation Assignment #4, due Wednesday, July 22nd, 2009 at 14:00

Automata on Infinite Words and Trees

Genetic programming with regular expressions

3515ICT Theory of Computation Turing Machines

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems.

Longest Common Extensions via Fingerprinting

java.util.scanner Here are some of the many features of Scanner objects. Some Features of java.util.scanner

Fast Arithmetic Coding (FastAC) Implementations

Tibetan For Windows - Software Development and Future Speculations. Marvin Moser, Tibetan for Windows & Lucent Technologies, USA

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

AP Computer Science Static Methods, Strings, User Input

ELFRING FONTS UPC BAR CODES

J a v a Quiz (Unit 3, Test 0 Practice)

Solving Linear Equations in One Variable. Worked Examples

CS 2112 Spring Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions

1.00 Lecture 1. Course information Course staff (TA, instructor names on syllabus/faq): 2 instructors, 4 TAs, 2 Lab TAs, graders

Dynamic Programming. Lecture Overview Introduction

Hardware Pattern Matching for Network Traffic Analysis in Gigabit Environments

Computer Programming I

Pushdown Automata. place the input head on the leftmost input symbol. while symbol read = b and pile contains discs advance head remove disc from pile

Compilers Lexical Analysis

Turing Machines: An Introduction

COS 333: Advanced Programming Techniques

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions

Programming Languages CIS 443

Building a Multi-Threaded Web Server

Introduction to Parallel Programming and MapReduce

Regular Languages and Finite State Machines

DRAFT Gigabit network intrusion detection systems

Informatique Fondamentale IMA S8

Symbol Tables. Introduction

High Performance String Matching Algorithm for a Network Intrusion Prevention System (NIPS)

Class Notes CS Creating and Using a Huffman Code. Ref: Weiss, page 433

International Journal of Advanced Research in Computer Science and Software Engineering

The Model Checker SPIN

Longest Common Extensions via Fingerprinting

How to Format a Bibliography or References List in the American University Thesis and Dissertation Template

NP-Completeness and Cook s Theorem

1 Definition of a Turing machine

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

The Classes P and NP

Transcription:

lgorithms ROBERT SEDGEWIK KEVIN WYNE 5.3 SUBSTRING SERH lgorithms F O U R T H E D I T I O N ROBERT SEDGEWIK KEVIN WYNE introduction brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp http://algs4.cs.princeton.edu

5.3 SUBSTRING SERH lgorithms ROBERT SEDGEWIK KEVIN WYNE introduction brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp http://algs4.cs.princeton.edu

Substring search Goal. Find pattern of length M in a text of length N. typically N >> M pattern text N E E D L E I N H Y S T K N E E D L E I N match 3

Substring search applications Goal. Find pattern of length M in a text of length N. typically N >> M pattern text N E E D L E I N H Y S T K N E E D L E I N match 4

Substring search applications Goal. Find pattern of length M in a text of length N. typically N >> M pattern N E E D L E text I N H Y S T K N E E D L E I N match Substring search omputer forensics. Search memory or disk for signatures, e.g., all URLs or RS keys that the user has entered. http://citp.princeton.edu/memory 5

Substring search applications Goal. Find pattern of length M in a text of length N. typically N >> M pattern text N E E D L E I N H Y S T K N E E D L E I N match Identify patterns indicative of spam. PROFITS L0SE WE1GHT herbal Viagra There is no catch. This is a one-time mailing. This message is sent in compliance with spam regulations. 6

Substring search applications Screen scraping. Extract relevant data from web page. Ex. Find string delimited by <b> and </b> after first occurrence of pattern Last Trade:. http://finance.yahoo.com/q?s=goog... <tr> <td class= "yfnc_tablehead1" width= "48%"> Last Trade: </td> <td class= "yfnc_tabledata1"> <big><b>452.92</b></big> </td></tr> <td class= "yfnc_tablehead1" width= "48%"> Trade Time: </td> <td class= "yfnc_tabledata1">... 8

Screen scraping: Java implementation Java library. The indexof() method in Java's string library returns the index of the first occurrence of a given string, starting at a given offset. public class StockQuote { public static void main(string[] args) { String name = "http://finance.yahoo.com/q?s="; In in = new In(name + args[0]); String text = in.readll(); int start = text.indexof("last Trade:", 0); int from = text.indexof("<b>", start); int to = text.indexof("</b>", from); String price = text.substring(from + 3, to); StdOut.println(price); } } % java StockQuote goog 582.93 % java StockQuote msft 24.84 9

5.3 SUBSTRING SERH lgorithms ROBERT SEDGEWIK KEVIN WYNE introduction brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp http://algs4.cs.princeton.edu

Brute-force substring search heck for pattern starting at each text position. i j i+j 0 1 2 3 4 5 6 7 8 9 10 txt B D B R 0 2 2 B R pat 1 0 1 B R entries in red are mismatches 2 1 3 B R 3 0 3 B R entries in gray are for reference only 4 1 5 B R entries in black 5 0 5 match the text B R 6 4 10 B R return i when j is M match 11

Brute-force substring search: Java implementation heck for pattern starting at each text position. i j i+j 0 1 2 3 4 5 6 7 8 9 10 B D B R 4 3 7 D R 5 0 5 D R public static int search(string pat, String txt) { int M = pat.length(); int N = txt.length(); for (int i = 0; i <= N - M; i++) { int j; for (j = 0; j < M; j++) if (txt.chart(i+j)!= pat.chart(j)) } break; if (j == M) return i; } return N; not found index in text where pattern starts 12

Brute-force substring search: worst case Brute-force algorithm can be slow if text and pattern are repetitive. i j i+j 0 1 2 3 4 5 6 7 8 9 txt B 0 4 4 B pat 1 4 5 B 2 4 6 B 3 4 7 B 4 4 8 B 5 5 10 B Brute-force substring search (worst case) match Worst case. ~ M N char compares. 13

Backup In many applications, we want to avoid backup in text stream. Treat input as stream of data. bstract model: standard input. TTK T DWN substring search machine found Brute-force algorithm needs backup for every mismatch. matched chars mismatch B B backup B B shift pattern right one position pproach 1. Maintain buffer of last M characters. pproach 2. Stay tuned. 14

Brute-force substring search: alternate implementation Same sequence of char compares as previous implementation. i points to end of sequence of already-matched chars in text. j stores # of already-matched chars (end of sequence in pattern). i j 0 1 2 3 4 5 6 7 8 9 10 B D B R 7 3 D R 5 0 D R public static int search(string pat, String txt) { int i, N = txt.length(); int j, M = pat.length(); for (i = 0, j = 0; i < N && j < M; i++) { if (txt.chart(i) == pat.chart(j)) j++; else { i -= j; j = 0; } } if (j == M) return i - M; else return N; } explicit backup 15

lgorithmic challenges in substring search Brute-force is not always good enough. Theoretical challenge. Linear-time guarantee. Practical challenge. void backup in text stream. fundamental algorithmic problem often no room or time to save text Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for each good person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their attack at dawn party. Now is the time for each person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. 16

5.3 SUBSTRING SERH lgorithms ROBERT SEDGEWIK KEVIN WYNE introduction brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp http://algs4.cs.princeton.edu

Knuth-Morris-Pratt substring search Intuition. Suppose we are searching in text for pattern B. Suppose we match 5 chars in pattern, with mismatch on 6th char. We know previous 6 chars in text are BB. Don't need to back up text pointer! assuming {, B } alphabet text after mismatch on sixth char brute-force backs up to try this B B B and this and this i B and this B and this pattern B B B but no backup is needed B Knuth-Morris-Pratt algorithm. lever method to always avoid backup. (!) 18

Deterministic finite state automaton (DF) DF is abstract string-searching machine. Finite number of states (including start and halt). Exactly one transition for each char in alphabet. ccept if sequence of transitions leads to halt state. internal representation j pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B 1 1 3 1 5 1 0 2 0 4 0 4 0 0 0 0 0 6 If in state j reading char c: if j is 6 halt and accept else move to state dfa[c][j] graphical representation B, 0 1 B 2 3 B 4 5 6 B, B, B 19

Knuth-Morris-Pratt demo: DF simulation B B B pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B 1 1 3 1 5 1 0 2 0 4 0 4 0 0 0 0 0 6 B, B 0 1 B 2 3 B 4 5 6 B, B, 20

Knuth-Morris-Pratt demo: DF simulation B B B pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B 1 1 3 1 5 1 0 2 0 4 0 4 0 0 0 0 0 6 B, B 0 1 B 2 3 B 4 5 6 B, substring found B, 21

Interpretation of Knuth-Morris-Pratt DF Q. What is interpretation of DF state after reading in txt[i]?. State = number of characters in pattern that have been matched. Ex. DF is in state 3 after reading in txt[0..6]. length of longest prefix of pat[] that is a suffix of txt[0..i] i txt 0 1 2 3 4 5 6 7 8 B B B pat 0 1 2 3 4 5 B B suffix of txt[0..6] prefix of pat[] B, B 0 1 B 2 3 B 4 5 6 B, B, 22

Knuth-Morris-Pratt substring search: Java implementation Key differences from brute-force implementation. Need to precompute dfa[][] from pattern. Text pointer i never decrements. public int search(string txt) { } int i, j, N = txt.length(); for (i = 0, j = 0; i < N && j < M; i++) j = dfa[txt.chart(i)][j]; if (j == M) return i - M; else return N; no backup Running time. Simulate DF on text: at most N character accesses. Build DF: how to do efficiently? [warning: tricky algorithm ahead] 23

Knuth-Morris-Pratt substring search: Java implementation Key differences from brute-force implementation. Need to precompute dfa[][] from pattern. Text pointer i never decrements. ould use input stream. public int search(in in) { } int i, j; for (i = 0, j = 0;!in.isEmpty() && j < M; i++) j = dfa[in.readhar()][j]; if (j == M) return i - M; else return NOT_FOUND; no backup B, B 0 1 B 2 3 B 4 5 6 B, B, 24

Knuth-Morris-Pratt demo: DF construction Include one state for each character in pattern (plus accept state). pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B onstructing the DF for KMP substring search for B B 0 1 2 3 4 5 6 25

Knuth-Morris-Pratt demo: DF construction pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B 1 1 3 1 5 1 0 2 0 4 0 4 0 0 0 0 0 6 onstructing the DF for KMP substring search for B B B, B 0 1 B 2 3 B 4 5 6 B, B, 26

How to build DF from pattern? Include one state for each character in pattern (plus accept state). pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B 0 1 2 3 4 5 6 27

How to build DF from pattern? Match transition. If in state j and next char c == pat.chart(j), go to j+1. first j characters of pattern have already been matched next char matches now first j +1 characters of pattern have been matched pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B 1 3 5 2 4 6 0 1 B 2 3 B 4 5 6 28

How to build DF from pattern? Mismatch transition. If in state j and next char c!= pat.chart(j), then the last j-1 characters of input are pat[1..j-1], followed by c. To compute dfa[c][j]: Simulate pat[1..j-1] on DF and take transition c. Running time. Seems to require j steps. still under construction (!) Ex. dfa[''][5] = 1; dfa['b'][5] = 4 simulate BB; take transition '' = dfa[''][3] simulate BB; take transition 'B' = dfa['b'][3] j pat.chart(j) 0 1 2 3 4 5 B B B, 0 1 B 2 3 B 4 5 6 B, simulation of BB B j B, 29

How to build DF from pattern? Mismatch transition. If in state j and next char c!= pat.chart(j), then the last j-1 characters of input are pat[1..j-1], followed by c. state X To compute dfa[c][j]: Simulate pat[1..j-1] on DF and take transition c. Running time. Takes only constant time if we maintain state X. Ex. dfa[''][5] = 1; dfa['b'][5] = 4 X' = 0 from state X, take transition '' = dfa[''][x] from state X, take transition 'B' = dfa['b'][x] from state X, take transition '' = dfa[''][x] 0 1 2 3 4 5 B B B, X B j 0 1 B 2 3 B 4 5 6 B, B, 30

Knuth-Morris-Pratt demo: DF construction in linear time Include one state for each character in pattern (plus accept state). pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B onstructing the DF for KMP substring search for B B 0 1 2 3 4 5 6 31

Knuth-Morris-Pratt demo: DF construction in linear time pat.chart(j) dfa[][j] B 0 1 2 3 4 5 B B 1 1 3 1 5 1 0 2 0 4 0 4 0 0 0 0 0 6 onstructing the DF for KMP substring search for B B B, B 0 1 B 2 3 B 4 5 6 B, B, 32

onstructing the DF for KMP substring search: Java implementation For each state j: opy dfa[][x] to dfa[][j] for mismatch case. Set dfa[pat.chart(j)][j] to j+1 for match case. Update X. public KMP(String pat) { this.pat = pat; M = pat.length(); dfa = new int[r][m]; dfa[pat.chart(0)][0] = 1; for (int X = 0, j = 1; j < M; j++) { for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][x]; dfa[pat.chart(j)][j] = j+1; X = dfa[pat.chart(j)][x]; } } copy mismatch cases set match case update restart state Running time. M character accesses (but space/time proportional to R M). 33

KMP substring search analysis Proposition. KMP substring search accesses no more than M + N chars to search for a pattern of length M in a text of length N. Pf. Each pattern char accessed once when constructing the DF; each text char accessed once (in the worst case) when simulating the DF. Proposition. KMP constructs dfa[][] in time and space proportional to R M. Larger alphabets. Improved version of KMP constructs nfa[] in time and space proportional to M. 0 1 B 2 3 B 4 5 6 KMP NF for BB 34

Knuth-Morris-Pratt: brief history Independently discovered by two theoreticians and a hacker. Knuth: inspired by esoteric theorem, discovered linear algorithm Pratt: made running time independent of alphabet size Morris: built a text editor for the D 6400 computer Theory meets practice. SIM J. OMPUT. Vol. 6, No. 2, June 1977 FST PTTERN MTHING IN STRINGS* DONLD E. KNUTHf, JMES H. MORRIS, JR.:l: ND VUGHN R. PRTT bstract. n algorithm is presented which finds all occurrences of one. given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems. theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language {can}*, can be recognized in linear time. Other algorithms which run even faster on the average are also considered. Don Knuth Jim Morris Vaughan Pratt 35

5.3 SUBSTRING SERH lgorithms ROBERT SEDGEWIK KEVIN WYNE introduction brute force Knuth-Morris-Pratt Boyer-Moore Rabin-Karp http://algs4.cs.princeton.edu Robert Boyer J. Strother Moore

Boyer-Moore: mismatched character heuristic Intuition. Scan characters in pattern from right to left. an skip as many as M text chars when finding one not in the pattern. i text j 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 F I N D I N H Y S T K N E E D L E I N 0 5 N E E D L E pattern 5 5 N E E D L E 11 4 N E E D L E 15 0 N E E D L E return i = 15 Mismatched character heuristic for right-to-left (Boyer-Moore) substring search 37

Boyer-Moore: mismatched character heuristic Q. How much to skip? ase 1. Mismatch character not in pattern. before i txt pat...... T L E...... N E E D L E i after txt pat...... T L E...... N E E D L E mismatch character 'T' not in pattern: increment i one character beyond 'T' 38

Boyer-Moore: mismatched character heuristic Q. How much to skip? ase 2a. Mismatch character in pattern. before i txt pat...... N L E...... N E E D L E i after txt pat...... N L E...... N E E D L E mismatch character 'N' in pattern: align text 'N' with rightmost pattern 'N' 39

Boyer-Moore: mismatched character heuristic Q. How much to skip? ase 2b. Mismatch character in pattern (but heuristic no help). before i txt pat...... E L E...... N E E D L E i aligned with rightmost E? txt pat...... E L E...... N E E D L E mismatch character 'E' in pattern: align text 'E' with rightmost pattern 'E'? 40

Boyer-Moore: mismatched character heuristic Q. How much to skip? ase 2b. Mismatch character in pattern (but heuristic no help). before i txt pat...... E L E...... N E E D L E i after txt pat...... E L E...... N E E D L E mismatch character 'E' in pattern: increment i by 1 41

Boyer-Moore: mismatched character heuristic Q. How much to skip?. Precompute index of rightmost occurrence of character c in pattern. (-1 if character not in pattern) right = new int[r]; for (int c = 0; c < R; c++) right[c] = -1; for (int j = 0; j < M; j++) right[pat.chart(j)] = j; c N E E D L E 0 1 2 3 4 5 right[c] -1-1 -1-1 -1-1 -1-1 B -1-1 -1-1 -1-1 -1-1 -1-1 -1-1 -1-1 -1-1 D -1-1 -1-1 3 3 3 3 E -1-1 1 2 2 2 5 5... -1 L -1-1 -1-1 -1 4 4 4 M -1-1 -1-1 -1-1 -1-1 N -1 0 0 0 0 0 0 0... -1 Boyer-Moore skip table computation 42

Boyer-Moore: Java implementation } public int search(string txt) { int N = txt.length(); int M = pat.length(); int skip; for (int i = 0; i <= N-M; i += skip) { } skip = 0; for (int j = M-1; j >= 0; j--) { } if (pat.chart(j)!= txt.chart(i+j)) { } skip = Math.max(1, j - right[txt.chart(i+j)]); break; if (skip == 0) return i; return N; compute skip value in case other term is nonpositive match 43

Boyer-Moore: analysis Property. Substring search with the Boyer-Moore mismatched character heuristic takes about ~ N / M character compares to search for a pattern of length M in a text of length N. sublinear! Worst-case. an be as bad as ~ M N. i skip 0 1 2 3 4 5 6 7 8 9 txt B B B B B B B B B B 0 0 B B B B pat 1 1 B B B B 2 1 B B B B 3 1 B B B B 4 1 B B B B 5 1 B B B B Boyer-Moore-Horspool substring search (worst case) Boyer-Moore variant. an improve worst case to ~ 3 N character compares by adding a KMP-like rule to guard against repetitive patterns. 44

Substring search cost summary ost of searching for an M-character pattern in an N-character text. algorithm version operation count guarantee typical backup in input? correct? extra space brute force M N 1.1 N yes yes 1 Knuth-Morris-Pratt full DF (lgorithm 5.6 ) mismatch transitions only 2 N 1.1 N no yes MR 3 N 1.1 N no yes M full algorithm 3 N N / M yes yes R Boyer-Moore mismatched char heuristic only (lgorithm 5.7 ) M N N / M yes yes R Rabin-Karp Monte arlo (lgorithm 5.8 ) 7 N 7 N no yes 1 Las Vegas 7 N 7 N no yes yes 1 probabilisitic guarantee, with uniform hash function 55