Sparse Suffix Tree Construction in Small Space

Sparse Suffix Tree Construction in Small Space

Joint work with Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj (Technical University of Denmark), Tsvi Kopelowitz (Weizmann Institute of Science) and Johannes Fischer (Karlsruhe Institute of Technology). To appear in ICALP 2013.

The sparse suffix tree (SST)

Example text: T = bananas, of length n.

[Figure: the suffix tree of T, with the part spanned by a few chosen suffixes highlighted.]

The full suffix tree can be built in $O(n)$ time and $O(n)$ extra space.

What if we only care about a few of the suffixes, say b of them? The sparse suffix tree on those suffixes has size $O(b)$.

Beating the naive bound has been open since 1960. Our results:

$O(n \log^2 b)$ time (Monte Carlo)
$O((n + b^2) \log^2 b)$ time with high probability (Las Vegas)
both in $O(b)$ space.

The sparse suffix array (SSA)

Example: T = bananas, n = 7. The suffixes of T, numbered by starting position:

1 bananas
2 ananas
3 nanas
4 anas
5 nas
6 as
7 s

Sort the suffixes lexicographically:

2 ananas
4 anas
6 as
1 bananas
3 nanas
5 nas
7 s

Suffix array: 2 4 6 1 3 5 7. It can be built in $O(n)$ time and $O(n)$ extra space.

What if we only care about a few of the suffixes? For the b = 3 chosen suffixes 2 (ananas), 5 (nas) and 6 (as), the sparse suffix array is 2 6 5.

Conversion between the SSA and the SST is simple and takes $O(n \log b)$ time, so we will focus on building the sparse suffix array.
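As a point of reference, the example can be reproduced with a naive sort. This is only a sketch of the definition, not the small-space construction from the talk: it assumes the example text is `bananas` (which is consistent with the suffix array 2 4 6 1 3 5 7 shown on the slide), and the function name `naive_sparse_suffix_array` is made up for illustration.

```python
def naive_sparse_suffix_array(text, starts):
    """Sort 1-based suffix start positions by the suffixes they denote.

    Naive baseline only: each sort key is a full copy of a suffix, so this
    uses far more than O(b) space and is not the algorithm from the talk.
    """
    return sorted(starts, key=lambda i: text[i - 1:])


T = "bananas"  # assumed example text
full = naive_sparse_suffix_array(T, range(1, len(T) + 1))
sparse = naive_sparse_suffix_array(T, [2, 5, 6])
print(full)    # [2, 4, 6, 1, 3, 5, 7]
print(sparse)  # [2, 6, 5]
```

The sparse result is simply the full suffix array restricted to the chosen positions, which is why the two structures are compared against the same naive baseline.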

LCE: a fundamental tool for pattern matching

[Figure: a text T of length n with two positions i and j marked.]

For any pair (i, j), the longest common extension is the largest $\ell$ such that

$T[i \ldots i+\ell-1] = T[j \ldots j+\ell-1]$.

It is the furthest you can go before hitting a mismatch.

LCE data structures are typically based on the suffix array or suffix tree. We do the opposite: we use batched LCE queries to construct the sparse suffix array.

These LCE queries will be answered using Karp-Rabin fingerprints to ensure that the space remains small.
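The definition can be made concrete with a direct character-by-character scan. The sketch below uses 0-based positions and a hypothetical helper name `lce_naive`; it illustrates what an LCE query returns and is not the fingerprint-based method used later in the talk.

```python
def lce_naive(text, i, j):
    """Longest common extension of 0-based positions i and j, by direct scanning."""
    n = len(text)
    length = 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


T = "bananas"  # assumed example text
print(lce_naive(T, 1, 3))  # "ananas" vs "anas" share "ana": 3
print(lce_naive(T, 0, 1))  # "bananas" vs "ananas" differ immediately: 0
```

A single query of this kind can take time linear in n, which is what the batched, fingerprint-based approach is meant to avoid.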

Karp-Rabin fingerprints of strings

[Figure: an example string S.]

For a string S, define

$\phi(S) = \sum_{k=0}^{|S|-1} S[k]\, r^k \bmod p$

Here $p = \Theta(n^4)$ is a prime and $1 \le r < p$ is a random integer.

With high probability, $S_1 = S_2$ iff $\phi(S_1) = \phi(S_2)$.

Observe that $\phi(S)$ fits in an $O(\log n)$-bit word.

Given $\phi(S[0, l])$ and $\phi(S[0, r])$ we can compute $\phi(S[l+1, r])$ in $O(1)$ time.
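A minimal sketch of these fingerprints follows. Several choices are assumptions made for illustration: the prime is fixed to the Mersenne prime 2^61 - 1 rather than chosen as $\Theta(n^4)$, characters are mapped to integers with `ord`, ranges are half-open (`s[lo:hi]`) rather than inclusive as on the slide, and the function names are invented. The sketch also recomputes a modular power on each call, so a substring fingerprint costs a logarithmic number of multiplications here rather than the constant time quoted on the slide.

```python
import random

P = (1 << 61) - 1  # fixed Mersenne prime (assumption; the slide uses p = Theta(n^4))


def prefix_fingerprints(s, r, p=P):
    """pre[i] = phi(s[0:i]) = sum over k < i of ord(s[k]) * r^k, mod p."""
    pre = [0] * (len(s) + 1)
    power = 1
    for k, ch in enumerate(s):
        pre[k + 1] = (pre[k] + ord(ch) * power) % p
        power = power * r % p
    return pre


def substring_fingerprint(pre, lo, hi, r, p=P):
    """phi(s[lo:hi]) computed from two prefix fingerprints.

    pre[hi] - pre[lo] equals sum_{k=lo}^{hi-1} s[k] * r^k; multiplying by
    r^(-lo) shifts the exponents so that they start at 0 again.
    """
    inv_r = pow(r, p - 2, p)  # modular inverse of r, valid because p is prime
    return (pre[hi] - pre[lo]) * pow(inv_r, lo, p) % p


s = "abaccbabcb"
r = random.randrange(1, P)
pre = prefix_fingerprints(s, r)
# s[1:3] and s[5:7] are both "ba", so their fingerprints must agree.
print(substring_fingerprint(pre, 1, 3, r) == substring_fingerprint(pre, 5, 7, r))  # True
```

Equal substrings always produce equal fingerprints; the randomness in `r` only matters for the converse direction, where two different substrings collide with small probability.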

Simple, Monte Carlo batched LCE queries

Input: a string T of length n and b pairs (i, j).
Output: for each pair (i, j), the largest $\ell$ such that $T[i \ldots i+\ell-1] = T[j \ldots j+\ell-1]$.

[Figure: the text T with the query pairs 1, 2, 3, 4, ..., b marked, and the prefix fingerprints stored during one pass.]

We find the largest $\ell$ for each pair by binary search, running all the searches in parallel. Comparisons are performed using fingerprints.

In each pass we store at most 4b prefix fingerprints.

This takes $O(n \log b)$ time and $O(b)$ space, and is correct with high probability.
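The lockstep binary search can be sketched as follows. This is a simplified version under stated assumptions: it binary-searches over the full range of possible lengths, so it makes about log n passes over the text rather than achieving the $O(n \log b)$ bound on the slide; the prime is fixed to 2^61 - 1; positions are 0-based; and the names `prefix_fps_at` and `batched_lce` are invented. What it does share with the slide is that each pass scans T once and keeps only the (at most 4b) prefix fingerprints needed for that pass.

```python
import random

P = (1 << 61) - 1  # fixed Mersenne prime (assumption; the slides use p = Theta(n^4))


def prefix_fps_at(text, positions, r, p=P):
    """One left-to-right scan; returns {q: phi(text[0:q])} for each requested q."""
    wanted = set(positions)
    out = {0: 0} if 0 in wanted else {}
    fp, power = 0, 1
    for k, ch in enumerate(text):
        fp = (fp + ord(ch) * power) % p
        power = power * r % p
        if k + 1 in wanted:
            out[k + 1] = fp
    return out


def batched_lce(text, pairs, r=None, p=P):
    """Monte Carlo LCE for 0-based pairs (i, j); all binary searches advance together."""
    n = len(text)
    if r is None:
        r = random.randrange(1, p)
    lo = [0] * len(pairs)                   # invariant: the first lo[q] characters match
    hi = [n - max(i, j) for i, j in pairs]  # the answer cannot exceed this
    while any(l < h for l, h in zip(lo, hi)):
        mid = [(l + h + 1) // 2 for l, h in zip(lo, hi)]
        need = []
        for (i, j), m in zip(pairs, mid):
            need += [i, i + m, j, j + m]    # at most 4b prefix fingerprints per pass
        fps = prefix_fps_at(text, need, r, p)
        for q, ((i, j), m) in enumerate(zip(pairs, mid)):
            # (F[i+m] - F[i]) = r^i * phi(text[i:i+m]); scale both sides to r^(i+j)
            # so the two fingerprints can be compared without modular inverses.
            left = (fps[i + m] - fps[i]) * pow(r, j, p) % p
            right = (fps[j + m] - fps[j]) * pow(r, i, p) % p
            if left == right:
                lo[q] = m
            else:
                hi[q] = m - 1
    return lo


print(batched_lce("bananas", [(1, 3), (0, 1), (2, 4)]))
```

For the assumed example text `bananas`, the true answers for these three pairs are 3, 0 and 2; the function returns them unless a fingerprint collision occurs, which is the Monte Carlo aspect.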

Building the sparse suffix array using batched LCEs

[Figure: T = bananas with its suffixes 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s.]

The LCE of two suffixes gives us their order. For example, 2 < 4: suffixes 2 (ananas) and 4 (anas) have LCE 3, and the characters that follow satisfy n < s.

We perform randomised quicksort on the b suffixes, using batched LCEs for suffix comparisons.

Pick a random pivot and compare each other suffix to it.
- This partitions the suffixes in $O(n \log b)$ time and $O(b)$ space.
- Recurse on each partition (the batch still contains b LCEs).

[Figure: with pivot 1 (bananas), suffixes 2, 4, 6 fall before the pivot and suffixes 3, 5, 7 after it.]

The depth of the recursion is $O(\log b)$ with high probability, so the total time is $O(n \log^2 b)$ and the space is $O(b)$.

This algorithm is both Monte Carlo and Las Vegas. It can be made Monte Carlo only by aborting the quicksort early.
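The quicksort step can be sketched in a few lines. As a loudly stated simplification, the LCE queries here are answered by a naive scan (`lce`) instead of one batch of fingerprint comparisons per level, so the sketch shows the control flow of the sort but not its time or space bounds. Positions are 0-based and assumed distinct, and all function names are invented for illustration.

```python
import random


def lce(text, i, j):
    """Naive LCE of 0-based positions i and j (stand-in for a fingerprint-based query)."""
    n = len(text)
    length = 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


def suffix_less(text, i, j):
    """True if suffix i sorts before suffix j (0-based, i != j)."""
    length = lce(text, i, j)
    if i + length == len(text):  # suffix i ran out first: it is a prefix of suffix j
        return True
    if j + length == len(text):
        return False
    return text[i + length] < text[j + length]


def sparse_suffix_sort(text, starts, rng=random):
    """Randomised quicksort of distinct 0-based suffix start positions."""
    if len(starts) <= 1:
        return list(starts)
    pivot = rng.choice(starts)
    smaller, larger = [], []
    for i in starts:
        if i == pivot:
            continue
        # One LCE query per non-pivot suffix; the talk answers these as one batch.
        (smaller if suffix_less(text, i, pivot) else larger).append(i)
    return sparse_suffix_sort(text, smaller, rng) + [pivot] + sparse_suffix_sort(text, larger, rng)


T = "bananas"       # assumed example text
chosen = [1, 4, 5]  # 0-based starts of suffixes 2, 5 and 6 in the slides' numbering
print([i + 1 for i in sparse_suffix_sort(T, chosen)])  # [2, 6, 5]
```

The output does not depend on which pivots are drawn; the random choice only affects the recursion depth.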


Verifying the sparse suffix array

T = bananas, suffix array: 2 4 6 1 3 5 7.

How can we tell if this suffix array is correct? Check that 2 < 4, 4 < 6, 6 < 1, 1 < 3, and so on, comparing each entry with its successor. This suffices because lexicographical ordering is transitive.

We could check 2 < 4 using an LCE query, if we had verified its answer. So it suffices to verify a batch of b LCE queries.
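The adjacent-pairs check can be written down directly. In this sketch plain Python string comparison stands in for the verified LCE queries, so it demonstrates the transitivity argument rather than the small-space verifier; positions are 0-based, the example text `bananas` is assumed, and the function name is hypothetical.

```python
def adjacent_pairs_in_order(text, sa):
    """Check a (sparse) suffix array by comparing each entry with its successor.

    sa holds 0-based start positions. Because the order is transitive,
    len(sa) - 1 comparisons are enough to certify the whole array.
    """
    return all(text[a:] < text[b:] for a, b in zip(sa, sa[1:]))


T = "bananas"  # assumed example text
print(adjacent_pairs_in_order(T, [1, 3, 5, 0, 2, 4, 6]))  # True
print(adjacent_pairs_in_order(T, [1, 5, 4]))              # True
print(adjacent_pairs_in_order(T, [1, 4, 5]))              # False
```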

Verifying batched LCE queries

Input: a string T of length n and b triples (i, j, $\ell$).
Output: for each triple (i, j, $\ell$), check that $T[i+\ell] \ne T[j+\ell]$ and $T[i \ldots i+\ell-1] = T[j \ldots j+\ell-1]$.

The mismatch check $T[i+\ell] \ne T[j+\ell]$ is easy: $O(n)$ time. What remains is to check that $T[i \ldots i+\ell-1] = T[j \ldots j+\ell-1]$ for each triple.

[Figure: the text T with the queries 1, 2, 3, 4, ..., b marked.]

We proceed in rounds in descending order; in the k-th round every $\ell = 2^k$.

Let's focus on a single round.
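To pin down exactly what is being verified, here is a reference checker that tests both conditions by direct comparison. It is deliberately naive: the substring comparison can cost time linear in n per triple, which is what the round-based scheme in the talk is designed to avoid. Positions are 0-based and the function name is made up for this sketch.

```python
def verify_lce_triples_naive(text, triples):
    """Return True if every claimed triple (i, j, l) is a correct LCE answer."""
    n = len(text)
    for i, j, l in triples:
        if i + l > n or j + l > n:
            return False
        # Mismatch check: if both suffixes continue past the common part,
        # the next characters must differ (otherwise l is not the largest).
        if i + l < n and j + l < n and text[i + l] == text[j + l]:
            return False
        # Equality check: the two length-l substrings must match.
        if text[i:i + l] != text[j:j + l]:
            return False
    return True


T = "bananas"  # assumed example text
print(verify_lce_triples_naive(T, [(1, 3, 3), (0, 1, 0)]))  # True
print(verify_lce_triples_naive(T, [(1, 3, 2)]))             # False: 2 is not maximal
print(verify_lce_triples_naive(T, [(1, 3, 4)]))             # False: "anan" != "anas"
```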

A first example

[Figure: the text T with three queries, 1 (yellow), 2 (blue) and 3 (green).]

If yellow (1) and blue (2) match then the right half of green (3) matches.

This is a lock-stepped cycle.

A second example

[Figure: the text T with three queries, 1 (yellow), 2 (blue) and 3 (green), each of length $\ell$, and the offsets between them.]

If yellow (1), blue (2) and green (3) match then the right 3/4 of green (3) is periodic.

This is an unlocked cycle.

These tricks only work when the offsets are small, in particular when they sum to at most $\ell/4$.
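Periodicity claims of this kind usually rest on a standard fact about a substring that matches a shifted copy of itself. The statement below is a general observation, not the slide's exact argument; the symbols $d$ (a shift) and $m$ (a length) are introduced here purely for illustration.

```latex
% Assumption: d > 0 and all indices stay inside T.
\[
  T[i \ldots i+m-1] = T[i+d \ldots i+d+m-1]
  \;\Longrightarrow\;
  T[k] = T[k+d] \quad \text{for all } i \le k \le i+m-1,
\]
% so the substring T[i .. i+d+m-1] has period d.
```

When the shift $d$ is small compared with the matched length, the substring consists of many repetitions of one short block, which is presumably why small offsets are what make the tricks applicable.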

The overall idea

[Figure: the text T with queries 1, 2, 3 of length $\ell$; the lengths $\ell$ and $\ell/(9 \log b)$ are marked.]

We build a graph containing at most $9n/(\ell \log b)$ nodes and one edge per LCE query.

We can apply one of the two tricks to any short cycle (length at most $2 \log b + 1$). This breaks the cycle, because we delete an edge.

Fact: if every node has degree at least three, there is a short cycle. (Low-degree nodes are easily handled.)

We can find a short cycle in the graph via a BFS in $O(|E|) = O(b)$ time. This gives the additive $O(b^2 \log b)$ term.

All other steps take $O(n \log b)$ time over all rounds (and use $O(b)$ space).
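The BFS step can be illustrated on an ordinary graph. The sketch below assumes a simple undirected graph given as an adjacency dictionary (no parallel edges or self-loops, which a graph built from LCE queries might contain), and the function name `short_cycle_from` is invented. It returns the cycle closed by the first non-tree edge the search meets; when every node has degree at least three the BFS tree branches at every level, so such an edge appears at logarithmic depth.

```python
from collections import deque


def short_cycle_from(adj, root):
    """BFS from root; return the cycle closed by the first non-tree edge, or None."""
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
            elif v != parent[u]:
                # Non-tree edge (u, v): climb from both ends to their lowest common ancestor.
                path_u, path_v = [u], [v]
                a, b = u, v
                while a != b:
                    if depth[a] >= depth[b]:
                        a = parent[a]
                        path_u.append(a)
                    else:
                        b = parent[b]
                        path_v.append(b)
                # u ... ancestor, then back down to v (the ancestor is listed once).
                return path_u + path_v[-2::-1]
    return None


# The 3-dimensional cube graph: 8 nodes, every node has degree exactly three.
cube = {v: [v ^ 1, v ^ 2, v ^ 4] for v in range(8)}
print(short_cycle_from(cube, 0))
```

Each search costs time linear in the number of edges, and one edge is deleted per cycle found, which matches the shape of the additive term quoted on the slide.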

Summary

[Figure: T = bananas with suffix array 2 4 6 1 3 5 7 and sparse suffix array 2 6 5 for the b chosen suffixes.]

$O(n \log^2 b)$ time (Monte Carlo)
$O((n + b^2) \log^2 b)$ time with high probability (Las Vegas)
both in $O(b)$ space.